
Thread: Linear Regression: Mean square error (MSE) ?

  1. #1
    Linear Regression: Mean square error (MSE) ?




    Simple linear regression model:
    Y_i = β0 + β1*X_i + ε_i , i=1,...,n
    where n is the number of data points, ε_i is random error

    Let σ^2 = V(ε_i) = V(Y_i)

    Then an unbiased estimator of σ^2 is
    s^2 = [1/(n-2)] ∑(e_i)^2
    where e_i's are the residuals

    s^2 is called the "mean square error" (MSE).


    My concerns:
    1) The GENERAL formula for sample variance is s^2 = [1/(n-1)] ∑(y_i - y bar)^2. It's defined on the first pages of my statistics textbook, and I've been using it again and again. Now I don't see how this general formula (which always holds) can reduce to the formula for s^2 above. How come we have (n-2) and e_i in the formula for s^2?


    2) From what I've learnt in previous stat courses, the "mean square error" of a point estimator is by definition
    MSE(θ hat) = E[(θ hat - θ)^2]

    Is this the same MSE as the s^2 above? Are they related at all?

    Any help is greatly appreciated!

    note: also under discussion in math help forum
    Last edited by kingwinner; 05-22-2009 at 01:48 AM.

  2. #2
    My textbook also says that the sample variance s^2 = [1/(n-1)] ∑(y_i - y bar)^2 has n-1 in the denominator because it has n-1 degrees of freedom.

    And s^2 = [1/(n-2)] ∑(e_i)^2 has n-2 in the denominator because it has n-2 degrees of freedom.
    Now I am puzzled... what is "degrees of freedom"? Why does it have n-2 degrees of freedom? What is the simplest way to understand this?

    Thanks!

  3. #3 (Dragan, Super Moderator)
    Quote Originally Posted by kingwinner
    What is the simplest way to understand this?

    Thanks!

    The easiest way to understand this is to follow a basic rule for sums of squares. That is, your degrees of freedom are:

    # of independent observations (N) minus (-) the number of estimates of population parameters (the betas).

    So, with a simple regression you have N - 2, because you have two estimates of two parameters (B0 and B1).

    As another example, if you have a regression model such as:

    Yhat = b0 + b1X1 + b2X2 + b3X3 + b4X4

    you would have degrees of freedom of N - 5 because you have 5 estimates of 5 parameters. Do you follow....

    When you compute the standard deviation for a set of N data points you have N - 1 degrees of freedom because you have one estimate (XBar) of one parameter (Mu).
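
    (To make the counting rule concrete, here is a minimal NumPy sketch, not from the thread itself, that fits a simple regression by least squares and computes MSE = SSE/(N - 2); the data and variable names are made up purely for illustration.)

    Code:
# Illustration of df = N - (number of estimated betas) for a simple regression.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.5, n)   # true error SD = 1.5, so sigma^2 = 2.25

X = np.column_stack([np.ones(n), x])          # columns for B0 and B1: two estimated parameters
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

e = y - X @ beta_hat                          # residuals
df = n - X.shape[1]                           # N - 2
mse = np.sum(e**2) / df                       # should come out near sigma^2 = 2.25
print(df, mse)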


  5. #4
    Thanks for the helpful comments about degrees of freedom. It makes a lot more sense now!

    There is still something that I don't understand...

    The GENERAL formula (which always holds) for sample variance is
    s^2 = [1/(n-1)] ∑(y_i - y bar)^2.

    I don't see how this can possibly reduce to the formula
    s^2 = [1/(n-2)] ∑(e_i)^2
    in this special case.

    If s^2 = [1/(n-1)] ∑(y_i - y bar)^2 is the general formula, then it should also hold for the estimate of σ^2 = V(ε_i) = V(Y_i), right? But I don't see how this can happen...

  6. #5
    I think you need to first take a look at the link below:

    http://en.wikipedia.org/wiki/Regression_analysis

    The Residual Sum of Squares (RSS) is defined as ∑(e_i)^2, i = 1, 2, ..., N, where e_i = y_i - y_i hat.

  7. #6
    Your link is great, with a lot of helpful information, but it doesn't seem to explain the discrepancy between s^2 = [1/(n-1)] ∑(y_i - y bar)^2 and s^2 = [1/(n-2)] ∑(e_i)^2, which is what I don't understand...

  8. #7
    I need to stress that it is y hat, not y bar, in this formula:

    s^2 = [1/(n-2)] ∑(y_i - y_i hat)^2

    The denominator is (n-2), which is the degrees of freedom. Why? You can see that e_i = y_i - y_i hat, and there are TWO parameters in y_i hat, namely beta_0 and beta_1. Here n is the # of observations, so df = n-2.

    ∑(y_i - y_i hat)^2 is called the SSE, as the link I provided earlier indicates. To get the MSE, the "mean square error", we divide the SSE (error sum of squares) by its df. Hence we have

    s^2 = [1/(n-2)] ∑(y_i - y_i hat)^2
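
    (A quick simulation sketch, assuming the usual model with independent normal errors, shows why the SSE has to be divided by its df of n-2, rather than n-1, to estimate sigma^2 without bias; the numbers here are invented purely for illustration.)

    Code:
# Simulate many samples from a known simple regression model and compare
# the average of SSE/(n-2) with the average of SSE/(n-1).
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 20, 4.0, 20000
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])

sse = np.empty(reps)
for r in range(reps):
    y = 1.0 + 3.0 * x + rng.normal(0.0, np.sqrt(sigma2), n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat
    sse[r] = np.sum(e**2)

print(np.mean(sse / (n - 2)))   # approximately 4.0: unbiased for sigma^2
print(np.mean(sse / (n - 1)))   # approximately 3.8: biased downward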

  9. #8
    Typically, to estimate V(X_i), we use the sample variance S^2 = [1/(n-1)] ∑(X_i - X bar)^2.

    Now, by the definition of variance, V(ε_i) = E[(ε_i - E(ε_i))^2], so to estimate V(ε_i), shouldn't we use S^2 = [1/(n-2)] ∑(ε_i - ε bar)^2? This form looks much more similar to the formula for the sample variance above (compare the corresponding parts).

    Thanks for clearing my doubts!
    Last edited by kingwinner; 05-23-2009 at 05:15 AM.

  10. #9 (Dragan, Super Moderator)
    Quote Originally Posted by kingwinner
    Typically, to estimate V(X_i), we use the sample variance S^2 = [1/(n-1)] ∑(X_i - X bar)^2.

    Now, by the definition of variance, V(ε_i) = E[(ε_i - E(ε_i))^2], so to estimate V(ε_i), shouldn't we use S^2 = [1/(n-2)] ∑(ε_i - ε bar)^2?
    Kingwinner: You are misinterpreting.

    Look: for any regression model with one dependent variable (Y) we would have:

    S = Sqrt[ Sum(Y - Yhat)^2 / (N - 1) ]

    where S is the standard deviation of the error terms (e).

    Now, we also have (more commonly) for a regression model with 1 predictor (X),

    S_y.x = Sqrt[ Sum(Y - Yhat)^2 / (N - 2) ]

    where S_y.x is the standard deviation of the regression line. This is also commonly referred to as the standard error of the estimate (e.g., SPSS will refer to S_y.x as such).

    More generally, with k predictors the standard error of the estimate can be written as:

    S_y.x = Sqrt[ Sum(Y - Yhat)^2 / (N - k - 1) ].
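
    (For concreteness, a small NumPy sketch of the general k-predictor version; the data are simulated and the variable names are mine, not the output of any package.)

    Code:
# Standard error of the estimate with k predictors: S_y.x = sqrt( SSE / (N - k - 1) ).
import numpy as np

rng = np.random.default_rng(2)
N, k = 100, 4
X = rng.normal(size=(N, k))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0, 0.0]) + rng.normal(0.0, 1.0, N)   # true error SD = 1.0

design = np.column_stack([np.ones(N), X])     # intercept plus k predictors: k + 1 estimates
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
e = y - design @ beta_hat

s_yx = np.sqrt(np.sum(e**2) / (N - k - 1))    # standard error of the estimate
print(s_yx)                                   # should be near the true error SD of 1.0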

  11. #10
    Quote Originally Posted by a little boy
    I need to stress that it is y hat, not y bar, in this formula:

    s^2 = [1/(n-2)] ∑(y_i - y_i hat)^2

    The denominator is (n-2), which is the degrees of freedom. Why? You can see that e_i = y_i - y_i hat, and there are TWO parameters in y_i hat, namely beta_0 and beta_1. Here n is the # of observations, so df = n-2.

    ∑(y_i - y_i hat)^2 is called the SSE, as the link I provided earlier indicates. To get the MSE, the "mean square error", we divide the SSE (error sum of squares) by its df. Hence we have

    s^2 = [1/(n-2)] ∑(y_i - y_i hat)^2
    But why are we using y_i hat here instead of y bar (the sample mean)?

    S^2 = [1/(n-1)] ∑(X_i - X bar)^2 (estimator for V(X_i); the GENERAL formula for the sample variance taught in first-year STAT, which I believe ALWAYS holds)

    S^2 = [1/(n-2)] ∑(Y_i - Y_i hat)^2 (estimate for V(ε_i) = V(Y_i))

    Why are we using Y_i hat here instead of Y bar (the sample mean)? What explains such a discrepancy? Y bar is always the best unbiased estimator of the population mean μ = E(Y_i), from what I have learnt in first-year STAT, so shouldn't we always use Y bar in calculating the sample variance?

    Thanks for explaining!

  12. #11

    This is a REGRESSION problem

    Please first have a look at the link:

    http://en.wikipedia.org/wiki/Errors_..._in_statistics

    S^2 = [1/(n-1)] ∑(X_i - X bar)^2 (estimator for V(X_i), the GENERAL formula for the sample variance): this is the definition and is of course true. However, I think the question you posted is about REGRESSION ANALYSIS, and there the residual e_i is DEFINED as y_i - y_i hat. You may refer to the link in my first reply, under the "Linear regression" item, for details.
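
    (If it helps, here is a small illustrative sketch, my own rather than from the link, contrasting the regression residuals e_i = y_i - y_i hat with the deviations y_i - y bar that appear in the first-year formula.)

    Code:
# Residuals are measured around the fitted line, not around y bar.
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(0.0, 5.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat                  # residuals, the regression definition

sse = np.sum(e**2)                    # sum of squared residuals (around the line)
sst = np.sum((y - y.mean())**2)       # total sum of squares (around y bar)
print(sse <= sst)                     # True: the fitted line never does worse than y bar
print(sse / (n - 2), sst / (n - 1))   # estimate of V(eps) vs plain sample variance of Y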

  13. #12

    Re: Linear Regression: Mean square error (MSE) ?


    kingwinner, you are missing one crucial point...

    y (the dependent variable in this regression) depends on 2 population parameters: b0 (the intercept) and b1 (the slope coefficient). So y_hat also depends on 2 estimates (remember we are working with a sample, so by definition we don't know the population parameters), and y_hat = b0_hat + b1_hat*x.

    Then the error comes from the difference between each y that is actually in the data and its y_hat. The SSres (sum of squared residuals) is the sum of the squared differences between the y's and their y_hats. So the MSres (mean square of residuals) is SSres divided by the degrees of freedom, which, as mentioned above, is N (the number of observations, equivalently the number of y's) minus the number of population parameters we have estimated in the process of getting y_hat.

    Where you got confused is in carrying over the sample-variance formula: in the textbooks, x_bar is given, but x_bar plays exactly the role of x_hat when there is only one variable, so only one parameter is estimated and that formula divides by n-1.
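
    (A tiny sketch of that last point, assuming an intercept-only "regression": the fitted value is just the sample mean, one parameter is estimated, and SSE/(n-1) is exactly the first-year sample variance.)

    Code:
# With no predictors the least-squares fit is y_hat = y_bar for every observation,
# so the residual formula with df = n - 1 reduces to the ordinary sample variance.
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(10.0, 2.0, size=30)
n = y.size

y_hat = np.full(n, y.mean())                            # intercept-only fit
s2_from_residuals = np.sum((y - y_hat)**2) / (n - 1)    # one estimated parameter => df = n - 1
s2_textbook = y.var(ddof=1)                             # the usual first-year formula
print(np.isclose(s2_from_residuals, s2_textbook))       # True: they coincide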

    Hope that helped. And also, trust me, there are days when you will doubt yourself and your ability to understand stats, but just remind yourself that it's not meant to be easy, and you're doing better than the average person (ignore the pathetic pun) just by beginning to look at it.
