T distribution - basic question I think

Bh78

New Member
#1
Hello,

Is it possible to use a t test to test whether an individual observation is from the same population as a (small) sample?
 

hlsmith

Less is more. Stay pure. Stay poor.
#2

I did not understand the last part of your question "... as a ..." It might help if you describe your question and its context. You would not have a mean with only one observation?
 

Bh78

New Member
#3
I did not understand the last part of your question "... as a ..." It might help if you describe your question and its context. You would not have a mean with only one observation?
It is possible to say that a single observation is unlikely to be from a population by considering its z-score. I just wondered whether it is possible to make a similar statement about a single observation when you only have a small sample.
 

Dason

Ambassador to the humans
#4
You would not have a mean with only one observation?
Of course you would. It's pretty easy to take the mean of one observation.

It is possible to say that a single observation is unlikely to be from a population by considering its z-score. I just wondered whether it is possible to make a similar statement about a single observation when you only have a small sample.
If you assume that the two groups have equal variances and are normally distributed then it would be feasible to do a t-test to test what you're interested in.
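
For reference, here is a sketch of the statistic this leads to, assuming normality and a common variance. The notation is mine and not from the thread: x_new is the single observation, and x-bar and s are the mean and standard deviation of the n values in the small sample. With only one value in the second group, the pooled variance reduces to s squared, so the equal-variance two-sample statistic becomes:

```latex
t \;=\; \frac{\bar{x} - x_{\text{new}}}{s\,\sqrt{1 + \dfrac{1}{n}}}
\;\sim\; t_{\,n-1}
\qquad \text{under } H_0:\ \text{both come from the same normal distribution.}
```

The extra 1 under the square root is the variance contribution of the single observation itself. As n grows this approaches the usual z-score, which matches the intuition in the question.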
 

Dason

Ambassador to the humans
#6
Not really. It would still be a two-sample t-test. You wouldn't want to treat this as a one-sample t-test, where you test the sample with more than one observation against a "mean" equal to the single value in the other sample, because you would be treating that second value as having no variation.

Code:
> # generate some fake data
> # both sets come from the same distribution
> y.grp1 <- rnorm(20, 0, 1)
> y.grp2 <- rnorm(1, 0, 1)
> y <- c(y.grp1, y.grp2)
> 
> y.grp1
 [1]  0.649044532 -2.256491187  1.097277554 -1.254684115  0.156735591
 [6]  0.272097697  1.142179114  0.095524117  1.612587487 -0.862135249
[11]  0.044291385  0.004164832 -0.569598917 -0.314329979 -1.853036794
[16] -0.517776731  0.331797174 -1.175629448  0.149509153  1.507455853
> y.grp2
[1] 0.6222212
> 
> grp <- rep(c("group 1", "group 2"), c(20, 1))
> 
> # Two sample t-test - this is what would be appropriate here.
> t.test(y ~ grp, var.equal = TRUE)

        Two Sample t-test

data:  y by grp
t = -0.6614, df = 19, p-value = 0.5163
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.953803  1.535259
sample estimates:
mean in group group 1 mean in group group 2 
           -0.0870509             0.6222212 

> # One sample t-test where you treat the value in the second group
> # as the mean you're testing against
> # This is what I'm saying you shouldn't do.
> t.test(y.grp1, mu = y.grp2)

        One Sample t-test

data:  y.grp1
t = -3.0309, df = 19, p-value = 0.006874
alternative hypothesis: true mean is not equal to 0.6222212
95 percent confidence interval:
 -0.5768476  0.4027459
sample estimates:
 mean of x 
-0.0870509
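
As a rough check on what the two-sample call is doing, the statistic can also be computed by hand. This is only a sketch under the same assumptions (normal data, equal variances). The seed and the names `sample_vals`, `new_obs`, `t_manual` and `p_manual` are hypothetical and chosen for illustration, so the numbers will not match the output above.

```r
# Sketch: hand computation of the equal-variance two-sample t statistic
# when the second group holds a single observation.
set.seed(1)                        # hypothetical seed, for reproducibility only
sample_vals <- rnorm(20, 0, 1)     # the small sample
new_obs     <- rnorm(1, 0, 1)      # the single observation
n <- length(sample_vals)

# With one value in group 2 the pooled SD is just sd(sample_vals);
# the "+ 1" accounts for the variation of the single observation.
t_manual <- (mean(sample_vals) - new_obs) / (sd(sample_vals) * sqrt(1 + 1 / n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Same test via t.test(), set up as in the transcript above
y   <- c(sample_vals, new_obs)
grp <- rep(c("group 1", "group 2"), c(n, 1))
fit <- t.test(y ~ grp, var.equal = TRUE)

# Compare the two computations
all.equal(unname(fit$statistic), t_manual)
all.equal(fit$p.value, p_manual)
```

Dropping the `1 +` inside the square root turns this into the one-sample statistic, which is why that version looks so much more "significant".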
 

hlsmith

Less is more. Stay pure. Stay poor.
#7
But the second observation doesn't have any variation. Unless you assume a distribution?


What does the "rep" do?


So you would conclude that you fail to reject the hypothesis that it came from the same distribution, because the 95% CI contains "0"?


Still feels a little wonky, since you don't know the distribution of group 2.
 

Dason

Ambassador to the humans
#8
But the second observation doesn't have any variation. Unless you assume a distribution?
To do something like this you would have to make quite a few assumptions. In this case I assumed that both were normal and had the same variance. If that's the case then asking if the two groups came from the same distribution boils down to asking if they have the same mean.
What does the "rep" do?
In this case it created 20 "group 1"s followed by a single "group 2" which is used to identify the "group" that each observation belongs to.
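
A small illustration of that call, in case it helps. The second argument gives how many times each element of the first argument is repeated.

```r
# rep() with a vector of counts: repeat "group 1" 20 times, "group 2" once
grp <- rep(c("group 1", "group 2"), c(20, 1))

length(grp)   # 21 labels, one per observation
table(grp)    # counts per label
```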
So you would conclude that you fail to reject the hypothesis that it came from the same distribution, because the 95% CI contains "0"?
If you're talking about the first test then yes. But that's a good thing since all the observations came from a standard normal distribution. In the second case the CI crossing 0 doesn't matter since we were comparing against a mean of 0.6222212. But as I said before this isn't the approach you should take.
Still feels a little wonky, since you don't know the distribution of group 2.
Well of course you don't. But you never *really* know the distribution. Sometimes you gotta make assumptions to be able to do anything.
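
If you want to see what those assumptions buy you, one option is a quick simulation. The sketch below assumes both groups really are standard normal (so the null is true by construction) and counts how often each test rejects at the 5% level. The seed, the number of simulations and all variable names are hypothetical choices for illustration.

```r
# Sketch: rejection rates under a true null for the two approaches.
set.seed(42)      # hypothetical seed
n_sims <- 2000    # hypothetical number of simulated data sets
n      <- 20      # size of the small sample

grp        <- rep(c("group 1", "group 2"), c(n, 1))
reject_two <- logical(n_sims)
reject_one <- logical(n_sims)

for (i in seq_len(n_sims)) {
  sample_vals <- rnorm(n)   # small sample
  new_obs     <- rnorm(1)   # single observation, same distribution
  y <- c(sample_vals, new_obs)

  # Equal-variance two-sample t-test (the recommended setup)
  reject_two[i] <- t.test(y ~ grp, var.equal = TRUE)$p.value < 0.05

  # One-sample t-test against the single value (not recommended)
  reject_one[i] <- t.test(sample_vals, mu = new_obs)$p.value < 0.05
}

rate_two <- mean(reject_two)   # share of false rejections, two-sample test
rate_one <- mean(reject_one)   # share of false rejections, one-sample test
rate_two
rate_one
```

When the assumptions hold, the two-sample rate should sit near the nominal 5%, while the one-sample rate should come out far higher because it ignores the variation in the single observation.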