Mean Squared Error – Explained | What is Mean Square Error?

What is Mean Squared Error?

In statistics, Mean Squared Error (MSE) is the mean (average) of the squared differences between actual and estimated values.
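
Written in symbols, the definition looks as follows. The notation is chosen here for illustration and is not part of the original text: n is the number of observations, y_i is the actual value, and \hat{y}_i is the estimated value for observation i.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```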

Contributed by: Swati Deval

To understand it better, let us take an example of actual demand and forecasted demand for a brand of ice cream in a shop over one year.

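| Month | Actual Demand | Forecasted Demand | Error | Squared Error |
|---|---|---|---|---|
| 1 | 42 | 44 | -2 | 4 |
| 2 | 45 | 46 | -1 | 1 |
| 3 | 49 | 48 | 1 | 1 |
| 4 | 55 | 50 | 5 | 25 |
| 5 | 57 | 55 | 2 | 4 |
| 6 | 60 | 60 | 0 | 0 |
| 7 | 62 | 64 | -2 | 4 |
| 8 | 58 | 60 | -2 | 4 |
| 9 | 54 | 53 | 1 | 1 |
| 10 | 50 | 48 | 2 | 4 |
| 11 | 44 | 42 | 2 | 4 |
| 12 | 40 | 38 | 2 | 4 |
| Sum | | | | 56 |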

MSE = 56/12 ≈ 4.6667
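
The calculation in the table can be reproduced with a few lines of Python. This is a minimal sketch: the function name `mean_squared_error` and the variable names are chosen here for illustration, and the numbers are copied from the table above.

```python
# Actual and forecasted demand for months 1-12, copied from the table above.
actual = [42, 45, 49, 55, 57, 60, 62, 58, 54, 50, 44, 40]
forecast = [44, 46, 48, 50, 55, 60, 64, 60, 53, 48, 42, 38]


def mean_squared_error(actual_values, predicted_values):
    """Return the mean of the squared differences between two equal-length lists."""
    if len(actual_values) != len(predicted_values):
        raise ValueError("both lists must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual_values, predicted_values)]
    return sum(squared_errors) / len(squared_errors)


# Signed error per month: actual minus forecast.
errors = [a - f for a, f in zip(actual, forecast)]

print("Sum of errors:", sum(errors))  # positive and negative errors partly cancel
print("Sum of squared errors:", sum(e ** 2 for e in errors))
print("MSE:", round(mean_squared_error(actual, forecast), 4))
```

The signed errors add up to only 8 because several of them cancel, while the squared errors add up to 56, which gives the MSE of about 4.6667 shown above.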

From the above example, we can observe the following.

  1. As forecasted values can be less than or more than actual values, a simple sum of the differences can be close to zero because positive and negative errors cancel out. This can lead to the false interpretation that the forecast is accurate.
  2. Because we square each error, every term is non-negative, so the mean is positive whenever the estimates differ from the actual values. A lower mean indicates that the forecast is closer to the actual values.
  3. All errors in the above example have a magnitude between 0 and 2 except one, which is 5. Squaring widens the gap between this error and the others (25 versus at most 4), and this single value accounts for 25 of the total 56. So MSE is strongly influenced by large deviations or outliers.
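
The third observation can be checked by recomputing the MSE with the largest error left out. The sketch below uses the signed errors from the table; the helper name `mse_from_errors` is hypothetical and chosen only for this illustration.

```python
# Signed errors (actual minus forecast) for months 1-12, from the table above.
errors = [-2, -1, 1, 5, 2, 0, -2, -2, 1, 2, 2, 2]


def mse_from_errors(error_values):
    """Return the mean of the squared errors."""
    return sum(e ** 2 for e in error_values) / len(error_values)


# Drop the single largest-magnitude error (month 4) to see its influence.
largest = max(errors, key=abs)
without_largest = list(errors)
without_largest.remove(largest)

# Fraction of the total squared error contributed by that one month.
share = largest ** 2 / sum(e ** 2 for e in errors)

print("MSE with all months:", round(mse_from_errors(errors), 4))
print("MSE without the largest error:", round(mse_from_errors(without_largest), 4))
print("Share of squared error from one month:", round(share, 2))
```

Removing one month out of twelve lowers the MSE from 56/12 to 31/11, which shows how much weight squaring gives to a single large deviation.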

Because MSE indicates how close a forecast or estimate is to the actual value, it can be used as a measure to evaluate models in Data Science.

MSE as Model Evaluation Measure

In supervised learning, the data set contains a dependent (target) variable along with independent variables. We build models that use the independent variables to predict the target variable. If the target variable is numeric, regression models are used to predict it, and MSE can be used to evaluate them.

In linear regression, we look for the line that best describes the given data points. Many lines can describe the data points, but MSE tells us which one describes them best.

For a given dataset, the number of data points is constant, say N. Let SSE1, SSE2, …, SSEn denote the sums of squared errors of n candidate lines. The MSE for each line is then SSE1/N, SSE2/N, …, SSEn/N.

Dividing every SSE by the same N does not change their order, so the line with the least sum of squared errors is also the line with the minimum MSE. This is why many best-fit algorithms use the least-squares method to find a regression line.
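
The argument can be illustrated with a small sketch. The data points and the three candidate lines below are hypothetical and not taken from the article; the third candidate was chosen to be the ordinary least-squares fit for these particular points.

```python
# Hypothetical data points (x, y); not from the article.
points = [(1, 3), (2, 5), (3, 7), (4, 10), (5, 11)]

# Hypothetical candidate lines, each given as (slope, intercept).
candidate_lines = [(2.0, 1.0), (1.5, 2.5), (2.1, 0.9)]


def sum_squared_error(line, data):
    """Return the SSE of the line y = slope * x + intercept over the data."""
    slope, intercept = line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in data)


n = len(points)  # the same N for every candidate line
sse_by_line = {line: sum_squared_error(line, points) for line in candidate_lines}
mse_by_line = {line: sse / n for line, sse in sse_by_line.items()}

best_by_sse = min(sse_by_line, key=sse_by_line.get)
best_by_mse = min(mse_by_line, key=mse_by_line.get)

for line in candidate_lines:
    print(line, "SSE:", round(sse_by_line[line], 3), "MSE:", round(mse_by_line[line], 3))
print("Same best line under both criteria:", best_by_sse == best_by_mse)
```

Because N is the same for every line, ranking the candidates by SSE or by MSE picks the same winner.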

MSE is expressed in squared units of the target variable because each error is squared. To get back to the original units, the square root of MSE is often taken. This is called the Root Mean Squared Error (RMSE).

RMSE = SQRT(MSE)

RMSE is also used as a measure for model evaluation. Other measures, such as MAE and R², are also used to evaluate regression models. Let us see how these compare with MSE and RMSE.

Mean Absolute Error (MAE) is the mean of the absolute differences between actual and predicted values.

R² (R squared) is the coefficient of determination: the variance explained by the model divided by the total variance.
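
As a rough illustration, the three measures can be computed for the ice cream example from earlier in the article. This sketch treats the forecast as the model's prediction and uses the common form R squared = 1 - SSE / SST, where SST is the total sum of squares of the actual values around their mean; the variable names are chosen here for illustration.

```python
import math

# Actual and forecasted demand for months 1-12, from the ice cream example.
actual = [42, 45, 49, 55, 57, 60, 62, 58, 54, 50, 44, 40]
forecast = [44, 46, 48, 50, 55, 60, 64, 60, 53, 48, 42, 38]

n = len(actual)
errors = [a - f for a, f in zip(actual, forecast)]

# MSE and RMSE are based on squared errors; RMSE is in the original units.
mse = sum(e ** 2 for e in errors) / n
rmse = math.sqrt(mse)

# MAE is the mean of the absolute errors.
mae = sum(abs(e) for e in errors) / n

# R squared = 1 - (sum of squared errors / total sum of squares around the mean).
mean_actual = sum(actual) / n
total_sum_squares = sum((a - mean_actual) ** 2 for a in actual)
r_squared = 1 - sum(e ** 2 for e in errors) / total_sum_squares

print("RMSE:", round(rmse, 4))
print("MAE:", round(mae, 4))
print("R squared:", round(r_squared, 4))
```

RMSE and MAE come out in units of demand, while R squared is unitless, which is one reason the measures are read differently.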

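| MSE / RMSE | MAE | R² |
|---|---|---|
| Based on the square of the error | Based on the absolute value of the error | Based on the share of variance explained (for least-squares linear regression, the squared correlation between actual and predicted values) |
| Value lies between 0 and ∞ | Value lies between 0 and ∞ | Value usually lies between 0 and 1 (it can be negative when a model fits worse than predicting the mean) |
| Sensitive to outliers; punishes larger errors more | Treats every unit of error equally; less sensitive to outliers | Computed from squared errors, so also affected by outliers |
| Smaller value indicates a better model | Smaller value indicates a better model | Value near 1 indicates a better model |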

RMSE is always greater than or equal to MAE (RMSE >= MAE). A larger gap between them indicates greater variation in the size of the individual errors in the sample.
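
A small sketch makes the relationship concrete. The two error sets below are hypothetical and were chosen to have the same MAE but a different spread; the function names `mae` and `rmse` are defined here only for this example.

```python
import math


def mae(error_values):
    """Return the mean of the absolute errors."""
    return sum(abs(e) for e in error_values) / len(error_values)


def rmse(error_values):
    """Return the square root of the mean of the squared errors."""
    return math.sqrt(sum(e ** 2 for e in error_values) / len(error_values))


# Two hypothetical error sets with the same MAE but different spread.
uniform_errors = [2, -2, 2, -2]  # every error has magnitude 2
uneven_errors = [0, 0, 0, 8]     # one large error, the rest zero

for name, errs in [("uniform", uniform_errors), ("uneven", uneven_errors)]:
    print(name, "MAE:", mae(errs), "RMSE:", rmse(errs))
```

When all errors have the same magnitude, RMSE equals MAE (both are 2 here). When the same total error is concentrated in one observation, MAE stays at 2 while RMSE rises to 4.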

Both R and Python have functions that compute these values for a regression model. Which measure to choose depends on the data set and the problem being addressed. If we want to treat all errors equally, MAE is a better measure. If we want to give more weight to large errors, MSE/RMSE is better.

Conclusion

MSE is used to check how close estimates or forecasts are to actual values. The lower the MSE, the closer the forecast is to the actual values. It is used as a model evaluation measure for regression models, where a lower value indicates a better fit.
