# small sample size problems

#### Hannes

##### New Member
Hi

I'm doing my master's thesis and I'm looking for a way to analyse differences with small samples. The smallest is n=4 (1 vs 3). Is there any way to test for significant differences with samples this small? I don't think so, but I'm pretty new to statistics, so I would be glad to hear your answers.

friendly regards

#### hlsmith

##### Less is more. Stay pure. Stay poor.
To put it another way, what do you hope to gain from comparing 1 vs 3, what would that contribute to your field?

#### Hannes

##### New Member
> To put it another way, what do you hope to gain from comparing 1 vs 3, what would that contribute to your field?

It's about efficiency: the time that users take to fulfil a task. In the end I want to be able to say, for a specific task, that there is a significant difference between the users of one group and the users of the other group.

#### ooostats

##### Member
Your sample sounds like it is just way too small to make any claims like this, especially between groups. You'll need to collect more data and that is your only option if you want to talk about differences.

#### Hannes

##### New Member
> Your sample sounds like it is just way too small to make any claims like this, especially between groups. You'll need to collect more data and that is your only option if you want to talk about differences.

Okay, thanks for the info.

#### Dason

I somewhat disagree.

I'm not saying it's ideal and you would be relying on lots of assumptions that you wouldn't have any way of checking. But you could theoretically do a two sample t-test assuming equal variances.
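For anyone who wants to see what that test looks like with 1 vs 3, here is a minimal sketch using only the Python standard library. The task times are made up for illustration, and the function name `pooled_t_test` is just a placeholder. It relies on the fact that with n=1 and n=3 the pooled test has 2 degrees of freedom, where the t distribution has a simple closed-form CDF; the single-observation group contributes nothing to the variance estimate, so everything rests on the equal-variance and normality assumptions.

```python
import math

def pooled_t_test(a, b):
    """Two-sided Student t-test with pooled variance; returns (t, df, p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    if df != 2:
        raise ValueError("the closed-form p-value below is only valid for df = 2")
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Pooled variance: a group of size 1 contributes zero squared deviations.
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    sp2 = ss / df
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # For df = 2 the t CDF is 1/2 + t / (2 * sqrt(2 + t^2)),
    # so the two-sided p-value is 1 - |t| / sqrt(2 + t^2).
    p = 1 - abs(t) / math.sqrt(2 + t * t)
    return t, df, p

# Hypothetical task times in seconds (not data from this thread).
group_a = [95.0]
group_b = [60.0, 64.0, 71.0]
t, df, p = pooled_t_test(group_a, group_b)
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```

With these invented numbers the single user has to be far away from the other three, relative to their spread, before the p-value drops below 0.05.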

#### ooostats

##### Member
> I somewhat disagree.
>
> I'm not saying it's ideal and you would be relying on lots of assumptions that you wouldn't have any way of checking. But you could theoretically do a two sample t-test assuming equal variances.

Right, but how can the results possibly be interpretable? Whatever the result, the risk of it being a Type I or Type II error is too high. Also, if any test were to be used, shouldn't it be a Mann-Whitney U?

#### Dason

I believe I said you would be relying on a lot of assumptions. And assuming those assumptions hold there isn't more risk of an error. The power will most likely be extremely low regardless.

#### obh

##### Well-Known Member
> I believe I said you would be relying on a lot of assumptions. And assuming those assumptions hold there isn't more risk of an error. The power will most likely be extremely low regardless.

It may be okay if you want to compare the average weight of a mouse to that of an elephant, but do we need statistics for this?

#### obh

##### Well-Known Member
> It may be okay If you want to compare the average mouse weight to the elephant but do we need statistics for this?

I probably exaggerated... Your point is correct, and the power also depends on the effect size you want to be able to detect. But a sample size of one is too small...

#### GretaGarbo

##### Human
We actually don't know anything about the size of the difference or the size of the standard deviation. So we just don't know if the power is high or low.

Remember that most "investigations"/"tests" have sample size n=1. You go to the doctor and she takes one blood sample. You go to buy new glasses and they test your eyesight once (n=1). Your friend comes to visit you in your new home and you ask how long the trip took. But nobody suggests going back and driving again because it would be too uncertain to rely on just one observation.

> Also if any test were to be used, shouldn't it be a Mann-Whitney U?

No, Mann-Whitney is not good here. There is a lower limit on the sample size below which you cannot get significance at all. (I don't remember the limits. Please inform us.)

But if you do a t-test with a huge difference in population means, the t-test will give significance in all 1000 replications but the Mann-Whitney in none. The difference in height between women and men (in this country) will be significant in 20% of replications with a t-test but in none with Mann-Whitney, with n=1 and n=3.
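A replication experiment of this kind can be sketched in a few lines. This is not the simulation referred to above, just a rough stand-in under stated assumptions: both groups are normal with standard deviation 1, the shifts (0, 2 and 1000 standard deviations) are arbitrary choices, and the names `t_stat_1_vs_3` and `rejection_rate` are placeholders. It counts how often the pooled 1-vs-3 t-test exceeds the two-sided 5% critical value for 2 degrees of freedom.

```python
import math
import random

T_CRIT_DF2 = 4.3027  # two-sided 5% critical value of Student's t with 2 df

def t_stat_1_vs_3(a, b):
    """Pooled-variance t statistic for one observation `a` versus three in `b`."""
    mean_b = sum(b) / 3
    sp2 = sum((x - mean_b) ** 2 for x in b) / 2  # df = 1 + 3 - 2 = 2
    return (a - mean_b) / math.sqrt(sp2 * (1 + 1 / 3))

def rejection_rate(shift_in_sd, reps=20000, seed=1):
    """Share of simulated 1-vs-3 experiments where |t| exceeds the critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = rng.gauss(shift_in_sd, 1.0)          # the single observation
        b = [rng.gauss(0.0, 1.0) for _ in range(3)]  # the group of three
        if abs(t_stat_1_vs_3(a, b)) > T_CRIT_DF2:
            hits += 1
    return hits / reps

# Assumed shifts: no difference, a moderate difference, an absurdly large one.
for shift in (0, 2, 1000):
    print(f"shift = {shift} SD: rejection rate = {rejection_rate(shift):.3f}")
```

If the assumptions hold, the rate with no shift should land near 5%, which is the point made earlier about the error risk not being inflated. A Mann-Whitney test on the same data can never reject at 5% with these group sizes, so its rate would be 0 in every row.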

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Is this a random sample, and are users randomly assigned? Are all of the comparisons planned? Do multiple comparisons/false discovery need to be controlled for?

Quit encouraging this person. Yeah, theoretically some things are possible, but should they be done, and do they contribute to the reproducibility crisis?

#### obh

##### Well-Known Member
> shouldn't it be a Mann-Whitney U?

With a small sample you can't check the normality assumption, so you can use the non-parametric test and get a clear-cut result, whereas you don't know how to interpret a borderline result in a t-test when you aren't sure about the assumptions.

But since we use ranks we also lose information, which is critical in very small samples. If you check all the combinations with only 4 subjects, you can't get a one-tailed p-value below 0.25: of the 4! possible orderings, only 1!*3! put group A's single observation at the extreme, so the best case is 1!*3!/4! = 1/4.
(If one group has only 1 observation, then the other group must have 19 observations to reach a "best" p-value of 0.05. With a group of 2 vs a group of 4 you may get a better result, a "best" p-value of (2!*4!)/6! = 0.067, and with 2 groups of 3 the best p-value is 3!*3!/6! = 0.05.)
This is only the best possible p-value, for the extreme case.
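The arithmetic above is easy to check: n1!*n2!/(n1+n2)! is the same as 1 / C(n1+n2, n1), the chance that one particular split of the ranks occurs. A small sketch, assuming no ties and a one-tailed test (`best_one_tailed_p` is just an illustrative name):

```python
from math import comb

def best_one_tailed_p(n1, n2):
    """Smallest exact one-tailed p-value of a rank test with no ties."""
    # Only one of the C(n1+n2, n1) equally likely splits is the most extreme one.
    return 1 / comb(n1 + n2, n1)

for n1, n2 in [(1, 3), (1, 19), (2, 4), (3, 3)]:
    print(f"n1={n1}, n2={n2}: best one-tailed p = {best_one_tailed_p(n1, n2):.4f}")
# n1=1, n2=3: best one-tailed p = 0.2500
# n1=1, n2=19: best one-tailed p = 0.0500
# n1=2, n2=4: best one-tailed p = 0.0667
# n1=3, n2=3: best one-tailed p = 0.0500
```

For a two-tailed test the smallest achievable p-value is twice as large, so the limits are even stricter.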

Edge Example (as Greta suggested)
With a rank test, if you compare 2 groups of elephants, A: [800 kg], B: [802 kg, 860 kg, 890 kg], you will get the same result as when comparing a mouse to elephants, A: [0.02 kg], B: [802 kg, 860 kg, 890 kg]: a "best" p-value of 0.25.
But if you run a t-test and get p-value=0.0000001, it may be incorrect because you don't meet the normality assumption, and the p-value should be only 0.01 (and you need to know/guess the standard deviation).
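The rank-test half of this example can be checked by brute force. The sketch below enumerates every way of assigning ranks to group A and counts the splits at least as extreme as the observed one. It assumes no ties and a one-tailed alternative ("group A is smaller"), and `exact_one_tailed_rank_p` is a made-up helper name, not a library function.

```python
from itertools import combinations

def exact_one_tailed_rank_p(a, b):
    """Exact one-tailed p for 'group a is smaller', by enumerating all rank splits."""
    pooled = sorted(a + b)
    ranks = {v: i + 1 for i, v in enumerate(pooled)}  # assumes no tied values
    observed = sum(ranks[v] for v in a)
    # Every choice of len(a) ranks out of the pooled ranks is equally likely under H0.
    splits = list(combinations(range(1, len(pooled) + 1), len(a)))
    as_extreme = sum(1 for s in splits if sum(s) <= observed)
    return as_extreme / len(splits)

elephants = [802.0, 860.0, 890.0]
p_elephant = exact_one_tailed_rank_p([800.0], elephants)  # a slightly lighter elephant
p_mouse = exact_one_tailed_rank_p([0.02], elephants)      # a mouse
print(p_elephant, p_mouse)  # 0.25 0.25
```

Both comparisons give the same p-value because only the ordering enters the test; the size of the gap between 0.02 kg and 802 kg is thrown away.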
