Thread: Why do you divide by n-1 to find sample variance? Linear regression

  1. #1
    Okay, I have very limited knowledge of statistics, but I'm wondering why the hell you divide the numerator in the sample variance formula by n - 1. I've read so much on it and I still don't understand. Some have said that because you are using the sample mean, you're taking a degree of freedom away, and therefore you divide by n - 1 (obviously you can see my lack of statistical knowledge). But that makes no sense to me. It doesn't really help me understand why...

  2. #2
    Buckeye

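    I believe the n-1 comes from the distinction between biased and unbiased estimators: dividing by n-1 makes the sample variance an unbiased estimator of the population variance. https://en.wikipedia.org/wiki/Unbias...dard_deviation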
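
    For anyone who wants to see where the n-1 actually falls out, here is a short sketch of the usual argument. It assumes (my assumption, not something stated above) that X_1, ..., X_n are independent draws from a population with mean \mu and variance \sigma^2, and it writes \bar{X} for the sample mean.

```latex
% Split each deviation from the true mean into two pieces
\sum_{i=1}^{n} (X_i - \mu)^2
  = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2

% Rearrange and take expectations, using
% E[(X_i - \mu)^2] = \sigma^2 and E[(\bar{X} - \mu)^2] = \sigma^2 / n
E\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]
  = n\sigma^2 - n \cdot \frac{\sigma^2}{n}
  = (n - 1)\sigma^2

% So dividing by n - 1 is unbiased, while dividing by n is too small on average
E\left[ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = \sigma^2,
\qquad
E\left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = \frac{n-1}{n}\,\sigma^2
```

    In words: deviations measured from the sample mean are, on average, a bit smaller than deviations measured from the true mean, because the sample mean is fitted to the same data. Dividing by n-1 instead of n compensates for exactly that shortfall.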

  3. #3
    Dragan (Super Moderator)

    It is because the sample variance for a single set of data uses one estimate (XBar) of one parameter (the population mean), so the degrees of freedom (df) are N - 1. More generally, for a sum of squares the rule is: df = N minus the number of parameter estimates. An example is a regression model with one predictor. There you have two estimates (b_0, b_1) of two parameters (Beta_0, Beta_1), so the df are N - 2, and the Mean Square for Error is SS(error) divided by N - 2.
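
    If a simulation is more convincing than the rule, the following is a rough sketch that checks both cases numerically. Everything specific in it is an assumption chosen for illustration: normal errors with a true variance of 4, N = 6 observations, a true mean of 10 for the single-sample case, a true line of 1 + 0.5x for the regression case, and the helper names `ss_about_mean`, `ss_error_simple_regression`, and `simulate`.

```python
import random


def ss_about_mean(ys):
    """Sum of squares around the sample mean (one estimate: XBar)."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)


def ss_error_simple_regression(xs, ys):
    """Residual sum of squares after a least-squares line (two estimates: b_0, b_1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def simulate(n=6, sigma=2.0, trials=20000, seed=0):
    """Average each estimator over many simulated data sets."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(n)]  # fixed predictor values 0, 1, ..., n-1
    totals = {"var_n": 0.0, "var_n_minus_1": 0.0,
              "mse_n": 0.0, "mse_n_minus_2": 0.0}
    for _ in range(trials):
        # Single sample: noise around a constant mean (illustrative value 10)
        ys = [10.0 + rng.gauss(0.0, sigma) for _ in xs]
        ss = ss_about_mean(ys)
        totals["var_n"] += ss / n
        totals["var_n_minus_1"] += ss / (n - 1)

        # Regression with one predictor (illustrative line 1 + 0.5x)
        ys = [1.0 + 0.5 * x + rng.gauss(0.0, sigma) for x in xs]
        sse = ss_error_simple_regression(xs, ys)
        totals["mse_n"] += sse / n
        totals["mse_n_minus_2"] += sse / (n - 2)
    return {name: total / trials for name, total in totals.items()}


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.3f}  (true variance = 4.0)")
```

    With these settings, the averages that divide by N - 1 (single sample) and N - 2 (regression) should land close to the true variance, while the versions that divide by N should come out too small, by a factor of about (N - 1)/N and (N - 2)/N respectively.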
    Last edited by Dragan; 08-28-2016 at 04:08 PM. Reason: Clarity
