# Thread: Linear Regression: Mean square error (MSE) ?

1. ## Linear Regression: Mean square error (MSE) ?

Simple linear regression model:
Y_i = β0 + β1*X_i + ε_i , i = 1,...,n
where n is the number of data points and ε_i is the random error.

Let σ^2 = V(ε_i) = V(Y_i)

Then an unbiased estimator of σ^2 is
s^2 = [1/(n-2)] ∑(e_i)^2
where e_i's are the residuals

s^2 is called the "mean square error" (MSE).

My concerns:
1) The GENERAL formula for sample variance is s^2 = [1/(n-1)] ∑(y_i - y bar)^2. It's defined on the first pages of my statistics textbook, and I've been using it again and again. Now I don't see how this general formula (which always holds) can reduce to the formula for s^2 above. How come we have (n-2) and e_i in the formula for s^2?

2) From what I've learnt in previous stat courses, the "mean square error" of a point estimator is by definition
MSE(θ hat) = E[(θ hat - θ)^2]

Is this the same MSE as the one above? Are they related at all?

Any help is greatly appreciated!

note: also under discussion in math help forum

2. My textbook also says that the sample variance s^2 = [1/(n-1)] ∑(y_i - y bar)^2 has n-1 in the denominator because it has n-1 degrees of freedom.

And s^2 = [1/(n-2)] ∑(e_i)^2 has n-2 in the denominator because it has n-2 degrees of freedom.
Now I am puzzled...what is "degrees of freedom"? Why does it have n-2 degrees of freedom? What is the simplest way to understand this?

Thanks!

3. Originally Posted by kingwinner
What is the simplest way to understand this?

Thanks!

The easiest way to understand this is to follow a basic rule for sums of squares. Your degrees of freedom are:

the number of independent observations (N) minus the number of estimated population parameters (the betas).

So, with a simple regression you have: N - 2 because you have two estimates of two parameters (B0 and B1).

As another example, if you have a regression model such as:

Yhat = b0 + b1X1 + b2X2 +b3X3 + b4X4

you would have N - 5 degrees of freedom because you have 5 estimates of 5 parameters. Do you follow?

When you compute the standard deviation for a set of N data points you have N - 1 degrees of freedom because you have one estimate (XBar) of one parameter (Mu).
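As a sanity check on the N minus number-of-estimated-parameters rule, here is a small simulation (a sketch, assuming normal errors with a known σ² = 4, and using `np.polyfit` as one convenient way to fit the line): dividing the residual sum of squares by N - 2 recovers σ² on average.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 20, 4.0, 20_000
x = np.linspace(0.0, 10.0, n)

mse_vals = []
for _ in range(reps):
    # True model: Y = 1 + 2X + eps, with eps ~ N(0, sigma2)
    y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    b1, b0 = np.polyfit(x, y, 1)            # estimates of the 2 parameters
    sse = np.sum((y - (b0 + b1 * x)) ** 2)  # residual sum of squares
    mse_vals.append(sse / (n - 2))          # SSE divided by its df

print(np.mean(mse_vals))  # hovers near sigma2 = 4
```

Dividing by n-1 (or n) instead would give an average noticeably below 4, which is exactly the bias the df correction removes.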


5. Thanks for the helpful comments about degrees of freedom. It makes a lot more sense now!

There is still something that I don't understand...

The GENERAL formula (which always holds) for sample variance is
s^2 = [1/(n-1)] ∑(y_i - y bar)^2.

I don't see how this can possibly reduce to the formula
s^2 = [1/(n-2)] ∑(e_i)^2
in this special case.

If s^2 = [1/(n-1)] ∑(y_i - y bar)^2 is the general formula, then it should also hold for the estimate of σ^2 = V(ε_i) = V(Y_i), right? But I don't see how this can happen...

6. I think you need to first take a look at the link below:

http://en.wikipedia.org/wiki/Regression_analysis

The Residual Sum of Squares (RSS) is defined as ∑(e_i)^2, i = 1,2,...,N, where e_i = y_i - y_i hat.

7. Your link is great with a lot of helpful information, but it doesn't seem to explain the discrepancy between s^2 = [1/(n-1)] ∑(y_i - y bar)^2 and s^2 = [1/(n-2)] ∑(e_i)^2, which is what I don't understand...

8. I need to stress that it is y hat, not y bar, in this formula:

s^2 = [1/(n-2)] ∑(y_i - y_i hat)^2

The denominator is (n-2), which is the degrees of freedom. Why? You can see that e_i = y_i - y_i hat, and there are TWO estimated parameters in y_i hat, namely beta_0 and beta_1. Here n is the # of observations, so df = n-2.

∑(y_i - y_i hat)^2 is called the SSE (error sum of squares), as the link I provided earlier indicates. To get the MSE, the "mean square error", we divide the SSE by its df. Hence we have

s^2 = [1/(n-2)] ∑(y_i - y_i hat)^2
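To make the SSE/df arithmetic concrete, here is a minimal sketch (made-up data for illustration; `np.polyfit` is just one convenient way to get the least-squares line):

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n = len(y)

b1, b0 = np.polyfit(x, y, 1)  # least-squares slope and intercept
y_hat = b0 + b1 * x           # fitted values
e = y - y_hat                 # residuals e_i = y_i - y_i hat

sse = np.sum(e ** 2)          # error sum of squares
mse = sse / (n - 2)           # s^2 = SSE / df, with df = n - 2
print(mse)
```

Note that the residuals of a least-squares line with an intercept always sum to (numerically) zero, which is one of the two linear constraints that cost the two degrees of freedom.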

9. Typically, to estimate V(X_i), we use the sample variance S^2 = [1/(n-1)] ∑(X_i - X bar)^2.

Now, by the definition of variance, V(ε_i) = E[(ε_i - E(ε_i))^2], so to estimate V(ε_i), shouldn't we use S^2 = [1/(n-2)] ∑(ε_i - ε bar)^2? This form looks much more similar to the sample variance formula above (compare the corresponding parts).

Thanks for clearing my doubts!

10. Originally Posted by kingwinner
Typically, to estimate V(X_i), we use the sample variance S^2 = [1/(n-1)] ∑(X_i - X bar)^2.

Now, by the definition of variance, V(ε_i) = E[(ε_i - E(ε_i))^2], so to estimate V(ε_i), shouldn't we use S^2 = [1/(n-2)] ∑(ε_i - ε bar)^2?
kingwinner: You are misinterpreting.

Look: for any regression model with one dependent variable (Y) we would have:

S = Sqrt[ Sum((Y – Yhat)^2) / (N – 1) ]

where S is the standard deviation of the error terms (e).

Now, we also have (more commonly) for a regression model with 1 predictor (X),

S_y.x = Sqrt[ Sum((Y – Yhat)^2) / (N – 2) ]

where S_y.x is the standard deviation of the regression line. This is also commonly referred to as the standard error of the estimate (e.g., SPSS will refer to S_y.x as such).

More generally, with k predictors the standard error of the estimate can be written as:

S_y.x = Sqrt[ Sum((Y – Yhat)^2) / (N – k – 1) ].
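A quick numeric illustration of the k-predictor case (simulated data; `np.linalg.lstsq` is one of several ways to fit the multiple regression):

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 200, 2
X = rng.normal(size=(N, k))
# True model: Y = 1 + 2*X1 - 1*X2 + eps, with eps ~ N(0, 0.5^2)
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(0.0, 0.5, N)

A = np.column_stack([np.ones(N), X])  # intercept column + k predictors
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Standard error of the estimate: Sqrt[ SSE / (N - k - 1) ]
s_yx = np.sqrt(np.sum(resid ** 2) / (N - k - 1))
print(s_yx)  # close to the true error sd, 0.5
```

Here k + 1 = 3 parameters (the intercept plus two slopes) are estimated, so the df is N - k - 1 = 197.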

11. Originally Posted by a little boy
I need to stress that it is y hat, not y bar, in this formula:

s^2 = [1/(n-2)] ∑(y_i - y_i hat)^2

The denominator is (n-2), which is the degrees of freedom. Why? You can see that e_i = y_i - y_i hat, and there are TWO estimated parameters in y_i hat, namely beta_0 and beta_1. Here n is the # of observations, so df = n-2.

∑(y_i - y_i hat)^2 is called the SSE (error sum of squares), as the link I provided earlier indicates. To get the MSE, the "mean square error", we divide the SSE by its df. Hence we have

s^2 = [1/(n-2)] ∑(y_i - y_i hat)^2
But why are we using y_i hat here instead of y bar (the sample mean)?

S^2 = [1/(n-1)] ∑(X_i - X bar)^2 (estimator for V(X_i); the GENERAL formula for sample variance taught in first-year STAT, which I believe ALWAYS holds)

S^2 = [1/(n-2)] ∑(Y_i - Y_i hat)^2 (estimate for σ^2 = V(ε_i) = V(Y_i))

Why are we using Y_i hat here instead of Y bar (the sample mean)? What explains such a discrepancy? Y bar is always the best unbiased estimator of the population mean μ = E(Y_i) from what I have learnt in first-year STAT, so shouldn't we always use Y bar in calculating the sample variance?

Thanks for explaining!

12. ## This is a REGRESSION problem

http://en.wikipedia.org/wiki/Errors_..._in_statistics

S^2 = [1/(n-1)] ∑(X_i - X bar)^2 (estimator for V(X_i); the GENERAL formula for sample variance): This is the definition and is of course true. However, I think the question you posted is about REGRESSION ANALYSIS, and the residual e_i is DEFINED as y_i - y_i hat. You may refer to the link in my first reply, under the "Linear regression" section, for details.
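One way to see why y_i hat (not y bar) belongs in the regression formula: for a least-squares line with an intercept, the spread of the y's around y bar splits exactly into a part explained by the line plus the residual sum of squares. A sketch with made-up numbers:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

b1, b0 = np.polyfit(x, y, 1)  # least-squares slope and intercept
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # deviations from y bar (total)
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained by the fitted line
sse = np.sum((y - y_hat) ** 2)         # residual part (uses y hat)

# For OLS with an intercept, SST = SSR + SSE holds exactly
print(sst, ssr + sse)
```

So ∑(y_i - y bar)^2 estimates the total variation of Y, while only the leftover piece ∑(y_i - y_i hat)^2 estimates the error variance σ² = V(ε_i).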

13. ## Re: Linear Regression: Mean square error (MSE) ?

kingwinner, you are missing one crucial point..

y (the dependent variable in this regression) depends on 2 population parameters - b0 (the intercept) and b1 (the slope coefficient). So y_hat also depends on 2 estimates (remember we are working with a sample, so by definition we don't know the population parameters): y_hat = b0_hat + b1_hat*x.

Then the error comes from the difference between each y actually in the data and its y_hat. The SSres (sum of squared residuals) is the sum of the squared differences between the y's and their y_hats. The MSres (mean square of residuals) is SSres divided by its degrees of freedom, which, as mentioned above, is N (the number of observations, equivalent to the number of y's) minus the number of population parameters we have estimated in the process of getting y_hat.

Where you got confused is in applying the one-sample variance formula here. In the textbooks, x_bar is given, but x_bar is the same as x_hat when the mean is the only parameter we estimate - which is why the divisor there is n - 1!!

Hope that helped. And also, trust me, there are days when you can doubt yourself and your ability to understand stats, but just remind yourself that it's not meant to be easy, and you're doing better than the average person (ignore the pathetic pun) to even begin to look at it.
