Clarify this explanation of why we use d.f.?

bruin

New Member
#1
I have read some variant of this explanation of d.f.'s many times - I'll quote the latest one that I read:

Degrees of Freedom: 1-Sample t test

You have a data set with 10 values. If you’re not estimating anything, each value can take on any number, right? Each value is completely free to vary.

But suppose you want to test the population mean with a sample of 10 values, using a 1-sample t test. You now have a constraint—the estimation of the mean.

This explanation would make more sense to me if they said that the constraint arises from using a sample variance to estimate a population variance. That would explain why you use d.f. on one-sample t but not one-sample z.

But, given that they say the constraint arises from using the sample mean to estimate the population mean, I can't understand the discrepancy between the z-test and the t-test. Both the z-statistic and the t-statistic use a sample mean to estimate a population mean.
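
To make my confusion concrete, here's a quick simulation sketch (Python with numpy; my own toy code, not from the article I quoted): with the known sigma the statistic keeps its normal tails, but plugging in the estimated s makes the tails heavier, and how much heavier depends on n.

Code:
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma, reps = 10, 0.0, 1.0, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)

z = (xbar - mu) / (sigma / np.sqrt(n))   # known sigma: standard normal
t = (xbar - mu) / (s / np.sqrt(n))       # estimated s: t with n-1 d.f.

print("P(|z| > 1.96) ~", np.mean(np.abs(z) > 1.96))  # ~0.05
print("P(|t| > 1.96) ~", np.mean(np.abs(t) > 1.96))  # noticeably larger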
 

bruin

New Member
#2
I think I kind of get the z/t discrepancy based on the fact that there is only one z (normal) distribution but many t-distributions (one for each n). Since n is fixed for a given t-distribution, there's now a constraint: you don't just need a certain sample mean, you need a certain n of scores to produce that sample mean.
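
To make my own idea concrete, here's a toy sketch (Python/numpy, purely illustrative): once the sample mean is pinned down, only n-1 of the values are actually free to vary.

Code:
import numpy as np

rng = np.random.default_rng(1)
n, required_mean = 10, 5.0

free = rng.uniform(0, 10, n - 1)          # 9 values can be anything at all
forced = n * required_mean - free.sum()   # ...but the 10th is then determined
sample = np.append(free, forced)

print(sample.mean())   # exactly 5.0: fixing the mean costs one d.f.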

But is there a more detailed explanation someone can give me that fleshes this idea out a lot more? Do you simply have to have studied "mathematical statistics" (as opposed to the "practical stats" many students are taught) to really appreciate degrees of freedom at a deeper level than what I've said above?
 

hlsmith

Omega Contributor
#3
Well, the variance and the mean are both parameters, if that makes a difference.


Your Z and T comparison makes sense to me, but I am "practical".
 

bruin

New Member
#4
Thanks hlsmith - I'll only bump this one time, I promise.

But does anyone have anything to add here? Isn't there any meatier/more-satisfying explanation possible than the one I gave, without recourse to the "mathematical" calc-based stats?
 

rogojel

TS Contributor
#5
hi,
I am also a purely "practical" guy, and to me this is an issue of using the same term (d.f.) in two different ways. For a t-test, d.f. seems to me to be a simple label for the relevant distribution. I can understand why they call it d.f., but any other name would do (like "underlying sample size"? just an idea). This labelling has IMHO nothing to do with constraints on our data.

In calculating the variance we do have degrees of freedom instead of the sample size, because we do have a constraint: the value of the mean.
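
To illustrate what I mean, a tiny sketch (Python/numpy, just illustrative): the deviations from the sample mean always sum to zero, so only n-1 of them are free.

Code:
import numpy as np

x = np.random.default_rng(2).normal(size=10)
dev = x - x.mean()

print(dev.sum())       # ~0 (up to rounding): the deviations are constrained
print(dev[:-1].sum())  # the first 9 deviations...
print(-dev[-1])        # ...already determine the 10th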

I wonder what others think?
 
ondan

#6
If you both are referring to the sample variance being calculated using (N-1) rather than N, this is done to provide an unbiased estimate of the population variance. Using N creates a downward bias in the estimate, and (N-1) corrects for that.
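
A quick sanity check of that bias, if you want to see it (Python/numpy sketch, illustrative only):

Code:
import numpy as np

rng = np.random.default_rng(3)
n, true_var, reps = 10, 4.0, 200_000

x = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

print("E[SS / N]     ~", (ss / n).mean())        # ~3.6: biased low
print("E[SS / (N-1)] ~", (ss / (n - 1)).mean())  # ~4.0: unbiased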

The Z test assumes the population variance is known, which means you're technically not "estimating" the variance, since it is a known quantity (look at the background on this if you're curious). You're also assumed to be working with a "large" sample (ideally infinite, or the size of the population, I would imagine).

Maybe another person can add a bit more or correct anything I noted that's incorrect, gotta run!
 

hlsmith

Omega Contributor
#7
d.f., the enigma. Yeah, to piggy-back on ondan: when you have a larger sample, closer to the population, subtracting 1 doesn't mean as much.


I always try to remember it as the number of values that are free to vary, or the related idea in dummy coding, where you only need k-1 terms to represent k groups because the last one is determined by the others. I am sure I just butchered that concept.
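
Roughly this, in a toy sketch (Python/numpy, illustrative):

Code:
import numpy as np

# Three groups but only two dummy columns: group "a" is the baseline,
# so a third indicator would be redundant (it is fixed by the other two).
groups = np.array(["a", "b", "c", "a", "c"])
d_b = (groups == "b").astype(int)
d_c = (groups == "c").astype(int)

print(np.column_stack([d_b, d_c]))
# k categories -> k-1 free columns: the same "one value is pinned
# down by the others" idea that shows up in degrees of freedom.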
 

Dason

Ambassador to the humans
#8
One thing to note (and this explains the difference between the z-test and the t-test) is that the degrees of freedom are associated with the error term. So when you're thinking about how many parameters need to be estimated, what matters is specifically how many parameters have to be estimated for you to get an estimate of the variance of the test statistic. For a z-test you *know* the variance already, so you don't have any degrees-of-freedom issues to worry about. For a simple one-sample t-test you have to estimate a mean, and that mean gets used in the estimation of the standard deviation; that's where you lose your degree of freedom.

Ultimately the answer is "that's how the math works out", but the intuition that follows the math has you looking at the estimate of the variance and how many independent observations you actually end up with to estimate that variance.
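
If it helps, here's a small sketch of "how the math works out" (Python/numpy, illustrative only): once the mean is estimated, the sum of squared deviations behaves like a chi-square with n-1 degrees of freedom, which is exactly the d.f. of the t.

Code:
import numpy as np

rng = np.random.default_rng(4)
n, reps = 10, 200_000

x = rng.normal(size=(reps, n))   # sigma = 1, so SS is already on the right scale
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

# SS/sigma^2 matches a chi-square with n-1 (not n) degrees of freedom:
print("mean of SS:", ss.mean(), " (df = n-1 =", n - 1, ")")
print("var  of SS:", ss.var(), " (2*df =", 2 * (n - 1), ")")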