t, Z and sigma again

#1
Wikipedia says: "In probability and statistics, Student's t-distribution (or simply the t-distribution) is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown." I've asked two questions here, several times:
Why t when sigma is unknown vs Z when sigma is known?
Is sigma ever known?
I've spent a week on this, and find no mathematical reason.
I realize that this isn't as impressive as opining about the matrix of Smergel-Grundhaus chi squared inequalities, but I need to get straightened out on this oh-so-simple matter.
Anyone?
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Good question. I never really think about using the z-test. I feel like they teach it, but given my field I have never used it. Also, both the t- and z-test get taught, but in real life most settings require a multivariable approach. I also agree that I don't grasp the setting where you would have a sample but also know the accurate SD of the exact population the sample was drawn from.

The t- and z-distributions have different peakedness; I wonder if that is related to accurate coverage.
 
#3
Why t when sigma is unknown vs Z when sigma is known?
This "why" stuff is why they gave Socrates the hemlock! But good on ya for yo persistence.

I guess one way to look at it is that the reason "why" is that the t-test is the most powerful test for normal data when the SD is not known, and the Z-test is the most powerful when it is known. If it had been some other test statistic, we'd be asking why that one. So get back to runnin' your t-tests!
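One way to make the "most powerful" claim concrete is a quick simulation (a minimal sketch in R; the sample size, effect size, and seed below are arbitrary illustrations, not from this thread): when sigma really is known, a Z-test that uses it rejects a false null a bit more often than a t-test of the same size.

```r
set.seed(1)
reps <- 20000                          # simulated experiments
n <- 15                                # sample size per experiment
mu0 <- 0; mu_true <- 0.6; sigma <- 1   # null mean, true mean, known SD
rej_t <- rej_z <- logical(reps)
for (i in 1:reps) {
  x <- rnorm(n, mu_true, sigma)
  tstat <- (mean(x) - mu0) / (sd(x) / sqrt(n))    # uses the estimated SD
  zstat <- (mean(x) - mu0) / (sigma / sqrt(n))    # uses the KNOWN sigma
  rej_t[i] <- abs(tstat) > qt(0.975, n - 1)
  rej_z[i] <- abs(zstat) > qnorm(0.975)
}
mean(rej_z)   # power of the Z-test with known sigma
mean(rej_t)   # power of the t-test, slightly lower
```

Both tests have size 0.05 here; the Z-test's edge comes entirely from not having to pay for estimating sigma.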
 
#4
How much more powerful? What are the units of measure? It sounds a lot like "it's so because I say it's so." If I don't know WHY, I either find out WHY or admit I don't know WHY.
 

Dason

Ambassador to the humans
#5
The mathematical justification is covered in almost all mathematical statistics textbooks. Requires a decent amount of math background to derive the results though.
 

hlsmith

Less is more. Stay pure. Stay poor.
#6
Not the answer you are looking for, but you could do some simulations: simulate a population, then draw samples from it without replacement and conduct t-tests and z-tests on the samples.
 
#7
Not the answer you are looking for, but you could do some simulations: simulate a population, then draw samples from it without replacement and conduct t-tests and z-tests on the samples.
I did it. The standard error of sigma (i.e., of s) with n = 30 is .129. With s set at sigma plus 0, +1, +2, +3 standard errors, I solved for t and Z, looked up P(t ≤ t crit) and P(Z ≤ Z crit), and found the ratios of the Ps.
t vs Z.jpg
For example, the ratio between sigma and sigma + 3 standard errors of sigma is 1.04 (4%) for a t-test and 1.039 (3.9%) for a Z-test. The same error in sigma makes ~ the same change in both tests. I did this for a week, looking for the difference; it eludes me. Anyone?
 

obh

Active Member
#8
Not the answer you are looking for, but you could do some simulations: simulate a population, then draw samples from it without replacement and conduct t-tests and z-tests on the samples.
Code:
if (!requireNamespace("BSDA", quietly = TRUE)) install.packages("BSDA")
library(BSDA)   # provides z.test()

reps <- 100000   # number of simulations
n1 <- 20         # sample size
# population
sigma1 <- 12     # true SD
mu1 <- 100       # true mean

pvalues_t <- numeric(reps)
pvalues_z <- numeric(reps)
set.seed(1)
for (i in 1:reps) {
  x1 <- rnorm(n1, mu1, sigma1)   # take a sample
  s1 <- sd(x1)                   # sample SD
  pvalues_t[i] <- t.test(x1, mu = mu1, alternative = "two.sided")$p.value
  # z-test fed the SAMPLE SD as if it were sigma -- the practice in question
  pvalues_z[i] <- z.test(x1, alternative = "two.sided", mu = mu1, sigma.x = s1)$p.value
}
mean(pvalues_t < 0.05)   # empirical type I error rate, t-test
mean(pvalues_z < 0.05)   # empirical type I error rate, z-test
Result, as expected:
> mean(pvalues_t < 0.05)
[1] 0.04986
> mean(pvalues_z < 0.05)
[1] 0.06479
 
#9
I don't know what this contraption is; it looks like n = 20 and many reps. We're not breaking new ground here, but P(t) ~ .05 vs P(Z) ~ .065 is not a big difference. Please do it again with n = 30, and if possible, n = 100. Thanks;
joe b.
 
#10
Just google 'Neyman-Pearson lemma'; that's basically how you know the t-test is the most powerful test of its size, as I recall. Could be wrong about that, since it is basically not something to think about too much once you get to runnin' t-tests.
 

obh

Active Member
#11
Hi Joe,

The t-distribution has heavier tails compared to the normal distribution; the tails are heavier for small DF values and approach the normal distribution's tails for large DF values.
The difference between 0.05 and 0.065 is very big, because the region of rejection is in the tail.
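The tail difference is easy to see directly from the quantile functions in base R (the df values below are arbitrary illustrations):

```r
# Two-sided 5% critical values for the t-distribution at several df,
# versus the fixed normal critical value they converge to.
df <- c(5, 10, 30, 100, 1000)
round(qt(0.975, df), 3)   # 2.571 2.228 2.042 1.984 1.962
round(qnorm(0.975), 3)    # 1.96
```

The t critical value shrinks toward 1.96 as df grows, but even at df = 29 it is still about 2.05, which is exactly why normal cutoffs over-reject when the SD is estimated.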
 

obh

Active Member
#12
Hi Joe,
I ran the simulation over a list of sample sizes.
Even for a sample size of 30, it is better to use the t-test. Actually, there is no reason to use the z-test ...
t_z_pvalue0.png
 
#13
Hi Joe,
I ran the simulation over a list of sample sizes.
Even for a sample size of 30, it is better to use the t-test. Actually, there is no reason to use the z-test ...
View attachment 2181
The question: Why is this true? "In probability and statistics, Student's t-distribution (or simply the t-distribution) is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown." Why is t better when sigma is unknown?
I think mine answers that t is not better.
Does this graph answer the question? Please explain. (What are the p values of?)
Thanks;
joe b.
 

Dason

Ambassador to the humans
#14
When sigma actually isn't known, using the t distribution is the mathematically appropriate thing to do, because the t-score that you calculate, (x bar - mu_0)/(s/sqrt(n)), will have a t distribution if the null hypothesis is true. If you pretend that s is actually sigma and use a Z distribution, then you're doing it wrong. I'm not sure what you're referring to when you say "I think mine answers that t is not better".
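This claim can be checked by simulation (a sketch in R; n, mu, and sigma below are arbitrary choices): generate many samples under a true null, form the t-score with the sample SD, and see which reference distribution its tails actually follow.

```r
set.seed(42)
reps <- 50000; n <- 10; mu <- 5; sigma <- 2
tstat <- replicate(reps, {
  x <- rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))   # t-score under a true null
})
quantile(tstat, 0.975)                # near qt(0.975, 9) = 2.262, not qnorm(0.975) = 1.96
mean(abs(tstat) > qnorm(0.975))       # well above 0.05: normal cutoffs over-reject
mean(abs(tstat) > qt(0.975, n - 1))   # close to 0.05: t cutoffs have the right size
```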
 
#15
When sigma actually isn't known, using the t distribution is the mathematically appropriate thing to do, because the t-score that you calculate, (x bar - mu_0)/(s/sqrt(n)), will have a t distribution if the null hypothesis is true. If you pretend that s is actually sigma and use a Z distribution, then you're doing it wrong. I'm not sure what you're referring to when you say "I think mine answers that t is not better".
If you'll look above, you will see my contribution: a table.
I calculated t and Z using the same x bar, mu, and s, with n = 30.
s started at 50.
At t = 2.19089, P(t < 2.19089) = .9817
At Z = 2.19089, P(Z < 2.19089) = .986
The standard error of sigma at n = 30 is .129.
I then increased s to 1.129 * 50, one standard error north, simulating estimation error.
At t = 1.940558, P(t < 1.940558) = .969
At Z = 1.940558, P(Z < 1.940558) = .974

.9817/.969 = 1.013
.986/.974 = 1.012

Then to s + 2 standard errors; then to +3.

As s varied, P(t < x) and P(Z < x) varied little.
Both have ~ the same sensitivity to sigma +/- delta sigma.

What's going on is: when t = Z, P(t < t crit) ~ P(Z < Z crit).
Close, but not equal, at least at n = 30.

t works for n < 30 or so because s goes wonky below ~ 30, even with Bessel's correction.
 


#16
The question is: "Does t give a more accurate picture of x bar - mu than Z?"
If n < 30: YES, we knew that.
If n > 30: NO; whether we know or estimate the variance, neither t nor Z is more accurate.
Proof:
t vs Z 060320.jpg
 
#17
Just google 'Neyman-Pearson lemma'; that's basically how you know the t-test is the most powerful test of its size, as I recall. Could be wrong about that, since it is basically not something to think about too much once you get to runnin' t-tests.
I googled and read, and found no mention of t or Z tests. E-mailed Neyman, Pearson, and Lemma (Leonard); no response to date. Thanks for the cite; joe b.
 
#18
Yeah, I think the thing here is you have to recognize that a t-test is a 'likelihood ratio test', or something like that. That is a little opaque so you might have to dig a bit.
 
#19
I thought I might understand WHY "when sigma actually isn't known, using the t distribution is the mathematically appropriate thing to do, because the t-score that you calculate, (x bar - mu_0)/(s/sqrt(n)), will have a t distribution if the null hypothesis is true," but I cannot find a case (see the tables) where Z and t results differ for n > 30 or so. Perhaps the "unknown" part of "Student's t-distribution (or simply the t-distribution) is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown" is an example of conventional wisdom. Guess the WHY doesn't exist.
Thanks;
joe b.
 

obh

Active Member
#20
Hi Joe,

I showed that when you estimate the standard deviation, the t-distribution gives better results than the normal distribution, even when n > 30.
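That can be checked at n = 30 specifically (a minimal sketch; the seed and parameters are arbitrary): with the SD estimated from the sample, normal cutoffs reject a true null more often than the nominal 5%, while t cutoffs hold the rate.

```r
set.seed(7)
reps <- 100000; n <- 30; mu <- 0; sigma <- 1
stat <- replicate(reps, {
  x <- rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))   # statistic with estimated SD, null true
})
mean(abs(stat) > qnorm(0.975))      # above 0.05 even at n = 30
mean(abs(stat) > qt(0.975, n - 1))  # close to the nominal 0.05
```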