# T test on whole sample or on averages

#### theblackalchemist

##### New Member
Background:

Consider the following experiment:

There are two groups of subjects: Group A has been treated with a drug that affects muscle fibre diameter, and Group B with a placebo. After treatment, a biopsy is performed on each subject and the diameters of a number of muscle fibres are measured. The resulting data look like the lists below.

Note that because this is a controlled experiment, the only sources of variation are the treatment and the subjects themselves.

(Note: random data typed out to illustrate the concept; the variances do not represent actual data.)

Treatment A:
Subject 1: 10,12,15,16,17,20,45
Subject 2: 9,11,12,14,15,16,20,37

Treatment B:
Subject 3: 21,22,25,26,34,43
Subject 4: 24,25,25,26,35,40

Now, when evaluating the differences by a t test, I have two options:

Option 1:

I pool the data from subjects 1 & 2, pool separately the data from subjects 3 & 4, and perform a t test on the extended dataset.

So essentially, the groups will be:

Treated: 21,22,25,26,34,43, 24,25,25,26,35,40
Placebo: 10,12,15,16,17,20,45, 9,11,12,14,15,16,20,37
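
To make Option 1 concrete, here is a minimal sketch of the pooled, equal-variance t statistic computed over all individual fibres, using only the Python standard library. The helper name `pooled_t` and the variable names are hypothetical, and the group labels simply follow the lists above.

```python
from math import sqrt
from statistics import mean, variance

# Fibre diameters copied from the Option 1 lists above.
treated = [21, 22, 25, 26, 34, 43, 24, 25, 25, 26, 35, 40]
placebo = [10, 12, 15, 16, 17, 20, 45, 9, 11, 12, 14, 15, 16, 20, 37]


def pooled_t(x, y):
    """Return the equal-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df


t_fibres, df_fibres = pooled_t(treated, placebo)
print(f"t = {t_fibres:.2f}, df = {df_fibres}")
```

Note that the degrees of freedom come out as 12 + 15 - 2 = 25, even though only four subjects were measured. The test treats every fibre as an independent observation.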

Option 2:

I take the average for each subject and then perform the t test on those averages.

Essentially the groups will be:

Treated: 28.5, 29.17
Placebo: 19.29, 16.75
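
As a rough sketch of Option 2, the per-subject means can be computed from the raw fibre data and then compared with the same equal-variance t statistic. This uses only the Python standard library; the helper name `pooled_t` and the dictionary layout are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Raw fibre diameters per subject, labelled as in the group lists above.
subjects = {
    "treated": [[21, 22, 25, 26, 34, 43], [24, 25, 25, 26, 35, 40]],
    "placebo": [[10, 12, 15, 16, 17, 20, 45], [9, 11, 12, 14, 15, 16, 20, 37]],
}

# One summary value per subject: the mean fibre diameter.
treated_means = [mean(fibres) for fibres in subjects["treated"]]
placebo_means = [mean(fibres) for fibres in subjects["placebo"]]


def pooled_t(x, y):
    """Return the equal-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df


t_means, df_means = pooled_t(treated_means, placebo_means)
print(f"subject means: {treated_means} vs {placebo_means}")
print(f"t = {t_means:.2f}, df = {df_means}")
```

Here the unit of analysis is the subject, so the test has only 2 + 2 - 2 = 2 degrees of freedom. The within-subject spread no longer enters the calculation at all.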

Question:

Now, convention (my team) says I should go for Option 2, but what I am afraid of losing when I take the average is the variability within each subject. Can I have your thoughts on this? I know that neither Option 1 nor Option 2 is ideal, and I am at a loss as to what test to use. For the simplicity of the question, assume a normal distribution and equal variances.

#### mostater

##### New Member
Hi theblackalchemist,

You can't go with option 1 and a t-test because that would exaggerate your sample size. In the example you gave, instead of 2 patients on placebo it would look like you had 15! Option 2 would be better in this sense. However, you are correct that you would lose the variation within each subject if you did a t-test using the averages. To account for the variation within subject, consider an analysis for clustered data, with the fibres clustered within subjects. This can be done with a mixed model that has group (treatment/placebo) as a fixed effect and a random intercept for each subject. You will probably want to do this using stat software like SAS, R, SPSS, Stata, etc. You should be able to obtain mean estimates from the model. Testing the difference between groups is the same as testing whether the beta coefficient for group differs from 0.
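
For reference, a random-intercept model of this kind can be written out as below. The notation is an assumption chosen for illustration, not taken from any particular package: $y_{ij}$ is the diameter of fibre $j$ in subject $i$, $g_i$ is 1 if subject $i$ is in the treatment group and 0 if on placebo, $u_i$ is the random intercept for subject $i$, and $\varepsilon_{ij}$ is the fibre-level residual.

```latex
$$
y_{ij} = \beta_0 + \beta_1 g_i + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_e^2)
$$

$$
H_0 : \beta_1 = 0 \quad \text{versus} \quad H_1 : \beta_1 \neq 0
$$
```

In this form, $\sigma_u^2$ captures the variation between subjects and $\sigma_e^2$ the variation between fibres within a subject, so neither source is discarded.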

#### theblackalchemist

##### New Member
Hi Mostater,

Many thanks. Essentially, if I understand you correctly, you are asking me to set up a GLM? I do remember playing around with those a while ago; I will have to look up my notes. Thanks again!

KR
PH

#### Miner

##### TS Contributor
This appears to be a spatial repeated measures design.