# E[MSE] simple linear regression

#### B_Miner

##### New Member
Hi All-

I am trying to figure out how to prove that MSE = SSE/(n-2) is an unbiased estimator of sigma^2 in simple linear regression.

So far I have (1/(n-2))*E{SUM[Yi^2 - 2b1XiYi - 2boYi + bo^2 + 2bob1Xi + b1^2Xi^2]}.

Are bo and b1 random variables?

Thanks!

#### Dragan

##### Super Moderator
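Yes, b0 and b1 are random variables (they are computed from the Y_i), but expanding the square that way gets messy. You have to bring the parameters (Beta0, Beta1, u_i) and the estimates (b0, b1, e_i) in together. I’ll sketch the proof and then you can do the rest.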

Here goes, we know that
(1) Y_i = Beta0 + Beta1X_i + u_i

Thus,
(2) Ybar = Beta0 + Beta1Xbar + ubar.

Subtracting (2) from (1) gives
(3) (Y_i – Ybar) = Beta1(X_i – Xbar) + (u_i – ubar)

It is also true (since b0 = Ybar – b1*Xbar) that
(4) e_i = (Y_i – Ybar) – b1(X_i – Xbar)

As such, substituting (3) into (4) yields
(5) e_i = Beta1(X_i – Xbar) + (u_i – ubar) – b1(X_i – Xbar)

Now, squaring and summing will give
(6) Sum[e^2_i] = (b1 – Beta1)^2 * Sum[(X_i – Xbar)^2] + Sum[(u_i – ubar)^2] – 2*(b1 – Beta1)*Sum[(X_i – Xbar)*(u_i – ubar)]

Take expectations on both sides
(7) E[Sum[e^2_i]] = E[ (b1 – Beta1)^2 * Sum[(X_i – Xbar)^2] + Sum[(u_i – ubar)^2] – 2*(b1 – Beta1)*Sum[(X_i – Xbar)*(u_i – ubar)] ].

Next, while taking expectations, you have to impose the classical regression assumptions (nonstochastic X_i, E[u_i] = 0, constant variance Sigma^2, uncorrelated errors) and this will yield
(8) E[Sum[e^2_i]] = Sigma^2 + (N – 1)Sigma^2 – 2*Sigma^2 = (N – 2)*Sigma^2.

Define the MSE as
(9) MSE = Sum[e^2_i] / (N – 2).

Thus,
(10) E[MSE] = E[Sum[e^2_i]] / (N – 2) = Sigma^2

which shows that the MSE is an unbiased estimator of Sigma^2.
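
For anyone who wants a numerical sanity check of (10), here is a minimal simulation sketch in Python. The parameter values, the fixed X grid, and the normal errors are all assumptions made for illustration; the proof itself only needs zero-mean, uncorrelated errors with constant variance.

```python
import random

def simulate_mse(n=10, beta0=1.0, beta1=2.0, sigma=2.0, reps=20000, seed=0):
    """Average SSE/(n-2) over many simulated samples with fixed X values."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]  # nonstochastic regressors (hypothetical grid)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    total = 0.0
    for _ in range(reps):
        # Equation (1): Y_i = Beta0 + Beta1*X_i + u_i
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        ybar = sum(y) / n
        b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
        # Equation (4): residuals in deviation form
        sse = sum(((yi - ybar) - b1 * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
        total += sse / (n - 2)
    return total / reps

avg_mse = simulate_mse()
print(avg_mse)  # should land close to sigma^2 = 4.0
```

Dividing the same SSE by n instead of n - 2 would give an average near (n - 2)/n times sigma^2, which is the bias the n - 2 divisor removes.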

#### B_Miner

##### New Member
Thanks so much for this Dragan! It has helped me a lot!

#### statgirl11

##### New Member
Hey Dragan,

I can follow your response quite well up to (7), however I'm having some trouble moving from (7) to (8) with the regression principles...

Here's where I am so far

E[(b1 – Beta1)^2 * Sum[(X_i – Xbar)^2]]
= Sum[(X_i – Xbar)^2] * E[(b1 – Beta1)^2]
= Sxx * Var(b1)
= Sxx * (sigma^2/Sxx)
= sigma^2

...am I following correctly here?

I can't seem to figure out how/why
E[Sum[(u_i – ubar)^2]] = (N – 1)*Sigma^2, and E[2*(b1 – Beta1)*Sum[(X_i – Xbar)*(u_i – ubar)]] = 2*Sigma^2

Can you give me a few tips on how you went about that?

Thank you so much for your help so far!

#### Dragan

##### Super Moderator
1. Yes, that’s fine.

2. Just think of the usual explanation for the expected value of a sample variance:
e.g. E[ Sum[(X_i – Xbar)^2] / N ] = (N – 1)*Sigma^2 / N. Note: We don't have N in the denominator, so here E[ Sum[(u_i – ubar)^2] ] = (N – 1)*Sigma^2. And, remember why we divide by N – 1 instead of N when we compute the sample variance.

3. This is a bit trickier.

The term –2*(b1 – Beta1)*Sum[(X_i – Xbar)*(u_i – ubar)] can be expressed as (removing the b1 and Beta1 terms)

–2*( Sum[(X_i – Xbar)*u_i] / Sum[(X_i – Xbar)^2] ) * Sum[(X_i – Xbar)*u_i]

Taking expectations while noting that the X_i are nonstochastic gives

–2*E[ (Sum[(X_i – Xbar)*u_i])^2 ] / Sum[(X_i – Xbar)^2]

= –2*E[u_i^2] = –2*Sigma^2

since the u_i are assumed to be uncorrelated with constant variance Sigma^2, so that E[ (Sum[(X_i – Xbar)*u_i])^2 ] = Sigma^2 * Sum[(X_i – Xbar)^2].
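
The three expectations in (8) can also be checked numerically. The sketch below is only an illustration: the X grid, sigma = 1, and normal errors are assumed values, and it uses the identity b1 – Beta1 = Sum[(X_i – Xbar)*u_i] / Sum[(X_i – Xbar)^2] to get the slope error directly from the simulated u_i.

```python
import random

def expectation_pieces(n=8, sigma=1.0, reps=20000, seed=1):
    """Monte Carlo averages of the three terms on the right-hand side of (7)."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]   # hypothetical fixed regressors
    xbar = sum(x) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    sum_a = sum_b = sum_c = 0.0
    for _ in range(reps):
        u = [rng.gauss(0.0, sigma) for _i in range(n)]
        ubar = sum(u) / n
        slope_err = sum(d * ui for d, ui in zip(dx, u)) / sxx   # b1 - Beta1
        sxu = sum(d * (ui - ubar) for d, ui in zip(dx, u))      # Sum[(X_i - Xbar)(u_i - ubar)]
        sum_a += slope_err ** 2 * sxx                 # first term
        sum_b += sum((ui - ubar) ** 2 for ui in u)    # second term
        sum_c += -2.0 * slope_err * sxu               # cross term
    return sum_a / reps, sum_b / reps, sum_c / reps

first, second, third = expectation_pieces()
print(first, second, third)  # roughly sigma^2, (n - 1)*sigma^2, -2*sigma^2
```

With n = 8 and sigma = 1 the three averages should sit near 1, 7 and -2, so their sum is near n - 2 = 6.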

#### statgirl11

##### New Member
I'm still very confused about how we can express the term –2*(b1 – Beta1)*Sum[(X_i – Xbar)*(u_i – ubar)] as –2*( Sum[(X_i – Xbar)*u_i] / Sum[(X_i – Xbar)^2] ) * Sum[(X_i – Xbar)*u_i]

How can you just remove the b1 and Beta1 terms? Also, where did the ubar go from the first term? (when I try and manipulate the first expression to get the second, I seem to end up with a ui and ubar still...), also, how did we end up with a (X_i-Xbar) term in the denominator...

I understand why the term –2*( Sum[(X_i – Xbar)*u_i] / Sum[(X_i – Xbar)^2] ) * Sum[(X_i – Xbar)*u_i] results in -2*Sigma^2, I just can't seem to get the expression into that form...

#### Dragan

##### Super Moderator
Okay let’s take the (b1 – Beta1) term. I’ll try to make things clearer.

We know that b1 can be computed as:

b1 = Sum[(X_i – Xbar)*Y_i] / Sum[(X_i – Xbar)^2] = Sum[k_i*Y_i]

where k_i = (X_i – Xbar) / Sum[(X_i – Xbar)^2].

Next, substitute the population regression function in for Y_i as:

b1 = Sum[k_i*(Beta0 + Beta1*X_i + u_i)]

Expand,

b1 = Beta0*Sum[k_i] + Beta1*Sum[k_i*X_i] + Sum[k_i*u_i]

Simplify,

b1 = Beta1 + Sum[k_i*u_i]

because Beta0*Sum[k_i] = 0 since Sum[X_i – Xbar] = 0, and Sum[k_i*X_i] = Sum[k_i*(X_i – Xbar)] = 1.

Now, in the term (above) –2*E[ (b1 – Beta1)…] substitute in for b1 as

–2*E[ (Beta1 + Sum[k_i*u_i] – Beta1) …]

which is equal to

–2*E[ (Sum[k_i*u_i]) …]

which is what I gave above when you substitute back in the expression for k_i.

And, you should end up with what I gave in my previous post (above)...I hope

Note: the ubar term drops out because Sum[(X_i – Xbar)*ubar] = ubar*Sum[X_i – Xbar] = 0.

I hope this notation helps.
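
The weight identities are exact, so they can be checked on any small data set. Here is a quick sketch; the X values, the true parameters, and the normal draws for u_i are made up purely for illustration.

```python
import random

rng = random.Random(42)
beta0, beta1 = 3.0, -1.5           # hypothetical true parameters
x = [1.0, 2.0, 4.0, 7.0, 11.0]     # hypothetical fixed regressors
u = [rng.gauss(0.0, 1.0) for _ in x]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
k = [(xi - xbar) / sxx for xi in x]   # the weights k_i

sum_k = sum(k)                                   # should be 0
sum_kx = sum(ki * xi for ki, xi in zip(k, x))    # should be 1
b1 = sum(ki * yi for ki, yi in zip(k, y))        # slope estimate as Sum[k_i*Y_i]
b1_decomposed = beta1 + sum(ki * ui for ki, ui in zip(k, u))

print(sum_k, sum_kx, b1 - b1_decomposed)  # 0, 1 and 0 up to rounding error
```

The last difference is zero (up to floating point) for every sample, not just on average, which is why the substitution for b1 – Beta1 is legitimate inside the expectation.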

#### statgirl11

##### New Member
That notation is much much clearer, thank you so much! I understand how you did that now...(sorry, my math is a little rusty)

but when you say that...

–2*E[ (Sum[(X_i – Xbar)*u_i])^2 / Sum[(X_i – Xbar)^2] ]

= -2*E[u_i] = –2*Sigma^2

...wouldn't you end up with -2*E[u_i^2] (rather than -2*E[u_i]), because of the ^2 in the previous expression?

#### Dragan

##### Super Moderator
Oh Yes, that's correct, I just forgot to put it in. I'll change it. Thanks.

#### statgirl11

##### New Member
Phew, okay, then I've got it now!

Thank you so much again for all your help!

#### edbrown2

##### New Member
In my book, it states that the expectation of the sum of the squared error terms is equal to the error variance times n-1, but wouldn't it instead be the error variance times n, since the expectation of a sum is the sum of the expectations and the expectation of the squared error term is sigma^2 by assumption? I'm a bit confused here!

#### edbrown2

##### New Member
Never mind, I just figured it out: the book was expressing the data in deviations form, with the assumption that the sample mean of the error terms was zero. Therefore, upon taking the expected value of the sum of the squared error terms, they had to account for the sample variance by adjusting for degrees of freedom.

#### ursou1smine

##### New Member
Var(b1)= (sigma^2/Sxx)

Can someone explain this for me? Thanks for your help, guys!
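
One way to see this, sketched under the assumptions used earlier in the thread (nonstochastic X_i, uncorrelated u_i with common variance sigma^2), with the weights k_i from above and S_xx = Sum[(X_i – Xbar)^2]:

```latex
\begin{aligned}
b_1 &= \beta_1 + \sum_i k_i u_i, \qquad k_i = \frac{X_i - \bar X}{S_{xx}},\\
\operatorname{Var}(b_1) &= \operatorname{Var}\Big(\sum_i k_i u_i\Big)
  = \sum_i k_i^2 \operatorname{Var}(u_i)
  = \sigma^2 \sum_i k_i^2\\
&= \sigma^2\,\frac{\sum_i (X_i - \bar X)^2}{S_{xx}^2}
  = \frac{\sigma^2}{S_{xx}}.
\end{aligned}
```

The second equality on the middle line is where the uncorrelated-errors assumption is used: the covariance terms between different u_i drop out.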

#### GKJohn

##### New Member
Hello Dragan, I have been browsing the internet looking for a proof that the estimator is unbiased, and I just can't find anything to hang my hat on. The proofs are all a little different, there is at least one thing in each that I just don't understand, and some of them seem circular to me. I am going back to school for a PhD in math, 25 years removed from my last math course. This may seem like simple algebra, but I not only want to follow each step, I want to understand it to the point where, even if it is not completely intuitive, I can make 'some' logical sense of why it works as it does. Where you say "Note: We don't have N in the denominator. And, remember why we divide by N – 1 instead of N when we compute the sample variance", I am not sure what you mean. Do you mean that you put N in the denominator inadvertently? I doubt it, because you would have just taken it out. Anyway, I appreciate what you wrote here 3 years ago, and I sure hope you are still with this site. You have a nice delivery about you; I just don't see all the pieces. Thanks.
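
A possible reading of that note, written out as a short derivation (assuming the u_i are uncorrelated with mean zero and variance sigma^2): the sum of squared deviations is taken without dividing by N, and its expectation is (N – 1)*sigma^2 rather than N*sigma^2 because ubar is itself estimated from the same u_i.

```latex
\begin{aligned}
\sum_{i=1}^{N} (u_i - \bar u)^2 &= \sum_{i=1}^{N} u_i^2 - N\bar u^2,\\
E\Big[\sum_{i=1}^{N} u_i^2\Big] &= N\sigma^2, \qquad
E[\bar u^2] = \operatorname{Var}(\bar u) = \frac{\sigma^2}{N},\\
E\Big[\sum_{i=1}^{N} (u_i - \bar u)^2\Big] &= N\sigma^2 - N\cdot\frac{\sigma^2}{N} = (N-1)\,\sigma^2.
\end{aligned}
```

Dividing both sides by N gives the (N – 1)*sigma^2/N from the quoted example, and dividing by N – 1 instead gives exactly sigma^2, which is the usual reason for the N – 1 in the sample variance.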

#### matt freeman

##### New Member
I'm really struggling to see how
–2*E [((Sum [ X_i – Xbar]*u_i )^2/ (Sum[ X_i – Xbar]^2)) ] = -2*E[u_i^2].

Is there 'cancellation' involved???

Isn't the u_i part of the summation, as below?
-2*E{(Sum[(X_i - Xbar)*u_i])^2 / Sum[(X_i – Xbar)^2]}

Thanks!
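
Reading the expression with u_i inside the summation, as in the last line above, the step can be sketched as follows. It assumes the X_i are nonstochastic and the u_i are uncorrelated with E[u_i] = 0 and E[u_i^2] = sigma^2, so there is no cancellation of u_i itself; the cross-product expectations simply vanish.

```latex
\begin{aligned}
E\Big[\Big(\sum_i (X_i - \bar X)\,u_i\Big)^2\Big]
  &= \sum_i \sum_j (X_i - \bar X)(X_j - \bar X)\,E[u_i u_j]\\
  &= \sum_i (X_i - \bar X)^2\,E[u_i^2]
     \qquad \text{since } E[u_i u_j] = 0 \text{ for } i \neq j\\
  &= \sigma^2 \sum_i (X_i - \bar X)^2,\\[4pt]
-2\,\frac{E\big[\big(\sum_i (X_i - \bar X)\,u_i\big)^2\big]}{\sum_i (X_i - \bar X)^2}
  &= -2\,\sigma^2 .
\end{aligned}
```

So the sum of squared X deviations in the numerator cancels the one in the denominator, leaving -2*E[u_i^2] = -2*sigma^2.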