What is Mean Squared Error?
In statistics, the Mean Squared Error (MSE) is defined as the mean or average of the squares of the differences between actual and estimated values.
Contributed by: Swati Deval
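Written as a formula, for n observations with actual values actual_i and estimated values estimated_i, this is:

MSE = (1/n) * Σ (actual_i − estimated_i)²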
To understand it better, let us take an example of the actual demand and forecasted demand for a brand of ice cream in a shop over a year.
| Month | Actual Demand | Forecasted Demand | Error | Squared Error |
|-------|---------------|-------------------|-------|---------------|
| 1 | 42 | 44 | -2 | 4 |
| 2 | 45 | 46 | -1 | 1 |
| 3 | 49 | 48 | 1 | 1 |
| 4 | 55 | 50 | 5 | 25 |
| 5 | 57 | 55 | 2 | 4 |
| 6 | 60 | 60 | 0 | 0 |
| 7 | 62 | 64 | -2 | 4 |
| 8 | 58 | 60 | -2 | 4 |
| 9 | 54 | 53 | 1 | 1 |
| 10 | 50 | 48 | 2 | 4 |
| 11 | 44 | 42 | 2 | 4 |
| 12 | 40 | 38 | 2 | 4 |
| Sum |  |  |  | 56 |
MSE = 56 / 12 = 4.6667
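As a quick check, here is a minimal Python sketch that reproduces this calculation, with the month-by-month demand figures copied directly from the table above:

```python
# Actual and forecasted demand for each of the 12 months (from the table above)
actual     = [42, 45, 49, 55, 57, 60, 62, 58, 54, 50, 44, 40]
forecasted = [44, 46, 48, 50, 55, 60, 64, 60, 53, 48, 42, 38]

# Square each month's error, then take the mean
squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecasted)]
mse = sum(squared_errors) / len(squared_errors)

print(sum(squared_errors))  # 56
print(round(mse, 4))        # 4.6667
```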
From the above example, we can observe the following.
- As forecasted values can be less than or greater than the actual values, a simple sum of the differences can come out close to zero. This can lead to the false interpretation that the forecast is accurate.
- Because the errors are squared, all terms are positive, and a positive mean indicates that there is some difference between the estimates and the actual values. A lower mean indicates the forecast is closer to the actual values.
- All absolute errors in the above example are in the range of 0 to 2, except one, which is 5. Squaring widens the gap between this value and the other squared errors, and this single large value pushes up the mean. So MSE is influenced by large deviations or outliers, as the sketch after this list shows.
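To make the outlier effect concrete, here is a small sketch that recomputes the MSE after dropping the month with the error of 5 (the figures are the same as in the table above):

```python
actual     = [42, 45, 49, 55, 57, 60, 62, 58, 54, 50, 44, 40]
forecasted = [44, 46, 48, 50, 55, 60, 64, 60, 53, 48, 42, 38]

errors = [a - f for a, f in zip(actual, forecasted)]

# MSE over all 12 months
mse_all = sum(e ** 2 for e in errors) / len(errors)

# Drop month 4 (the error of 5) and recompute over the remaining 11 months
errors_no_outlier = errors[:3] + errors[4:]
mse_no_outlier = sum(e ** 2 for e in errors_no_outlier) / len(errors_no_outlier)

print(round(mse_all, 4))         # 4.6667
print(round(mse_no_outlier, 4))  # 2.8182 -> one large error pulled the MSE up
```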
As MSE indicates how close a forecast or estimate is to the actual values, it can be used as a measure to evaluate models in Data Science.
MSE as Model Evaluation Measure
In Supervised Learning, the data set contains a dependent or target variable along with independent variables. We build models using the independent variables and predict the dependent or target variable. If the dependent variable is numeric, regression models are used to predict it, and in this case MSE can be used to evaluate the models.
In linear regression, we find the line that best describes the given data points. Many lines can describe the data, but which one describes it best can be found using MSE.
For a given dataset, the number of data points is constant, say N. Let SSE1, SSE2, …, SSEn denote the sum of squared errors for each candidate line. The MSE for each line is then SSE1/N, SSE2/N, …, SSEn/N.
Hence the line with the least sum of squared errors is also the line with the minimum MSE. This is why many best-fit algorithms use the least sum of squared errors (least squares) method to find the regression line.
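As an illustration, here is a minimal Python sketch that compares a few candidate lines by their MSE and then lets NumPy's least-squares fit find the best line. The data points and candidate slopes/intercepts here are made up for the example, not taken from the demand table above:

```python
import numpy as np

# Small made-up dataset (x, y) for illustration
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 4.0, 6.2, 8.1, 9.9])

def mse_for_line(m, c):
    """MSE of the line y = m*x + c against the data points."""
    predictions = m * x + c
    return np.mean((y - predictions) ** 2)

# Compare a few hypothetical candidate lines (slope, intercept)
candidates = [(1.5, 0.5), (2.0, 0.0), (2.5, -0.5)]
best = min(candidates, key=lambda mc: mse_for_line(*mc))
print(best, round(mse_for_line(*best), 4))

# np.polyfit minimises the sum of squared errors directly,
# which (for fixed N) is the same as minimising MSE
m_fit, c_fit = np.polyfit(x, y, 1)
print(round(m_fit, 3), round(c_fit, 3))
```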
As the error is squared, the unit of MSE is the square of the unit of the error. To get back to the original unit, the square root of MSE is often taken. This is called the Root Mean Squared Error (RMSE).
RMSE = SQRT(MSE)
RMSE is also used as a measure for model evaluation. There are other measures, like MAE and R2, used for regression model evaluation. Let us see how these compare with MSE or RMSE.
Mean Absolute Error (MAE) is the mean of the absolute differences between actual and predicted values.
R2, or R Squared, is the coefficient of determination. It is the proportion of the total variance that is explained by the model (explained variance / total variance).
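For reference, here is a short sketch of how these measures can be computed in Python with scikit-learn (assuming scikit-learn is available), using the demand figures from the table above as the actual and predicted values:

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

actual    = [42, 45, 49, 55, 57, 60, 62, 58, 54, 50, 44, 40]
predicted = [44, 46, 48, 50, 55, 60, 64, 60, 53, 48, 42, 38]

mse  = mean_squared_error(actual, predicted)
rmse = mse ** 0.5                               # RMSE = SQRT(MSE)
mae  = mean_absolute_error(actual, predicted)
r2   = r2_score(actual, predicted)

print(round(mse, 4), round(rmse, 4), round(mae, 4), round(r2, 4))
```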
| MSE / RMSE | MAE | R2 |
|------------|-----|----|
| Based on the square of the error | Based on the absolute value of the error | Based on the correlation between actual and predicted values |
| Value lies between 0 and ∞ | Value lies between 0 and ∞ | Value lies between 0 and 1 |
| Sensitive to outliers; punishes larger errors more | Treats large and small errors equally; not sensitive to outliers | Not sensitive to outliers |
| A smaller value indicates a better model | A smaller value indicates a better model | A value near 1 indicates a better model |
RMSE is always greater than or equal to MAE (RMSE >= MAE). A greater difference between them indicates greater variance in the individual errors in the sample, as the sketch below illustrates.
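Here is a small sketch with made-up error values that shows this relationship: two sets of errors with the same MAE, where the more spread-out set has the larger RMSE.

```python
# Hypothetical error sets with the same total absolute error
errors_even   = [2, 2, 2, 2]   # errors all the same size
errors_spread = [0, 0, 0, 8]   # same absolute total, but one large error

def mae(errs):
    return sum(abs(e) for e in errs) / len(errs)

def rmse(errs):
    return (sum(e ** 2 for e in errs) / len(errs)) ** 0.5

print(mae(errors_even), rmse(errors_even))      # 2.0 2.0 -> no gap
print(mae(errors_spread), rmse(errors_spread))  # 2.0 4.0 -> larger gap
```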
Both R and Python have functions that give these values for a regression model. Which measure to choose depends on the data set and the problem being addressed. If we want to treat all errors equally, MAE is the better measure. If we want to give more weight to large errors, MSE/RMSE is better.
Conclusion
MSE is used to check how close estimates or forecasts are to actual values. The lower the MSE, the closer the forecast is to the actual values. It is used as a model evaluation measure for regression models, and a lower value indicates a better fit.