Paired t-test for groups with unequal sample sizes

#1
Hi,
I want to compare two paired groups (pre and post). The two groups have different sample sizes, and when I use a paired Student's t-test, much of my data is not taken into account. For example, with Group 1 = 12 samples and Group 2 = 8 samples, the test computes the p-value from only 8 samples. I hope the idea is clear. If you have any recommendations or an explanation, I would be very grateful.
Thank you
 

hlsmith

Not a robit
#2
So you use paired tests when you have two measures on the exact same sample. Thus, the two samples are paired on units (subjects). Given this, I completely don't understand why your samples would not be the EXACT same size since you should have one sample with two measurements for all units?

Please clarify!
 

j58

Active Member
#3
Before you do anything, you need to give serious consideration to whether the surviving samples are comparable to the lost samples. If you believe that they are, then read on.

If you have access to a decent statistical package, like SPSS, SAS, or R, you can use a linear mixed model to handle this situation. The data set will need to be formatted with one observation per line, and you'll need two independent variables: a sample ID number, which will be the same for observations taken on the same sample, and a variable to indicate whether the observation is pre or post. The IDs will be a random factor; the pre–post indicator, a fixed factor. The software will calculate a t-test using all the data while taking into account the partial matching.
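As a rough illustration of the data layout described above, here is a minimal Python sketch. The column names (`sample_id`, `post`, `value`), the helper names, and the numbers are all made up for the example. The fitting step assumes pandas and statsmodels are installed and uses a random intercept per sample ID with the pre/post indicator as the fixed effect; treat it as one possible way to set this up, not the only one.

```python
def to_long_format(pre, post):
    """Turn {sample_id: value} dicts into one row per observation."""
    rows = []
    for sample_id, value in pre.items():
        rows.append({"sample_id": sample_id, "post": 0, "value": value})
    for sample_id, value in post.items():
        rows.append({"sample_id": sample_id, "post": 1, "value": value})
    return rows


def fit_mixed_model(rows):
    """Random intercept per sample_id, fixed effect for the pre/post indicator.

    Assumes pandas and statsmodels are available; they are imported here so
    the reshaping code above runs without them.
    """
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame(rows)
    model = smf.mixedlm("value ~ post", data=df, groups=df["sample_id"])
    return model.fit()


# Made-up values: 12 samples measured pre, only the first 8 measured post.
pre_values = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 9.5, 12.4, 10.7, 11.8, 10.1, 11.0]
post_values = [11.0, 12.9, 10.1, 13.0, 12.2, 11.9, 10.6, 13.1]
pre = {i + 1: v for i, v in enumerate(pre_values)}
post = {i + 1: v for i, v in enumerate(post_values)}

rows = to_long_format(pre, post)  # 20 rows: all 12 pre plus 8 post observations
```

Calling `fit_mixed_model(rows).summary()` should then report the `post` coefficient, which is the estimated pre-to-post change using all 20 observations. Note that statsmodels reports a Wald z statistic for that coefficient rather than a small-sample t statistic, so the p-value may differ somewhat from what SPSS, SAS, or R would give.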

Alternatively you can perform a t-test by hand. The t-statistic is

t = (m2 - m1) / se , where

m1 and m2 are the sample pre and post means, respectively, calculated using all the data; and se is the standard error of the difference, calculated as follows:

se = sqrt[s1^2/n1 + s2^2/n2 - (2n/(n1*n2))cov(x,y)] ,

where s1 and s2 are the sample pre and post standard deviations, n1 and n2 are the pre and post sample sizes, and n is the number of complete pre-post pairs. You will have to look up how to compute the sample covariance, cov(x,y), which you will compute using only those samples having both pre and post observations. You will also need to look up how to compute the degrees of freedom for the t-test. See the Wikipedia article on the "Welch-Satterthwaite equation".
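The hand calculation can be sketched in plain Python as follows. The function name and the data layout (`{sample_id: value}` dicts) are assumptions made for the example. The covariance term is written as 2·n·cov(x,y)/(n1·n2), which is the covariance between the two sample means when only n of the samples are paired; it reduces to (2/n)·cov(x,y) when n1 = n2 = n. The degrees of freedom use the plain Welch-Satterthwaite expression, which is only an approximation for partially paired data.

```python
import math
from statistics import mean, variance


def partially_paired_t(pre, post):
    """pre, post: {sample_id: value}. Returns (t, se, welch_df)."""
    x_all, y_all = list(pre.values()), list(post.values())
    n1, n2 = len(x_all), len(y_all)
    m1, m2 = mean(x_all), mean(y_all)          # means from all the data
    v1, v2 = variance(x_all), variance(y_all)  # s1^2 and s2^2

    # Covariance uses only samples that have both a pre and a post value.
    paired_ids = sorted(set(pre) & set(post))
    n = len(paired_ids)
    if n < 2:
        raise ValueError("need at least two complete pre-post pairs")
    xp = [pre[i] for i in paired_ids]
    yp = [post[i] for i in paired_ids]
    mx, my = mean(xp), mean(yp)
    cov = sum((a - mx) * (b - my) for a, b in zip(xp, yp)) / (n - 1)

    se_sq = v1 / n1 + v2 / n2 - 2 * n * cov / (n1 * n2)
    if se_sq <= 0:
        raise ValueError("estimated variance of the difference is not positive")
    se = math.sqrt(se_sq)
    t = (m2 - m1) / se

    # Welch-Satterthwaite degrees of freedom (approximation, ignores pairing).
    a, b = v1 / n1, v2 / n2
    welch_df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, se, welch_df
```

A quick sanity check on this sketch: when every sample has both measurements, `t` equals the ordinary paired t-statistic computed from the differences. Turning `t` and `welch_df` into a p-value needs a t-distribution table or a statistics library.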
 
#4
Thank you for your response.
Actually, I have one group of subjects, and each subject gives me a different number of observations before and after using a system. That means every subject has a different number of observations in the "pre" session than in the "post" session.
For example, one subject gives me 11 observations in the "pre" session and 15 in the "post" session. That is why I would lose data if I used a paired t-test to test for a difference between the pre and post sessions in the same subjects.
I hope you get my point.
 

j58

Active Member
#5
Why didn't you describe what you "actually have" in your first post? What you "actually have" bears almost no resemblance to what you described in your original post, so thank you for wasting my time. Nonetheless, the problem you "actually have" is best handled by using the linear mixed model I described above. If you cannot extrapolate my explanation, above, to the problem you "actually have," then you should hire a statistician.
 
#6
I had no intention of wasting your time. I am new to statistics and may not have explained the problem very well at first. Thank you for your effort; the explanation was very helpful and I will work on it. I will come back to you with my findings.
Many thanks
 

hlsmith

Not a robit
#7
@lhoucine - you are not wasting anyone's time; this is how we all get better at statistics and at formulating and presenting questions. I look forward to hearing from you again.

@j58 - I liked the suggestion for using multi-level models. I am not overly versed in the method, but I wonder about the sparsity in clusters (e.g., 1 or 2 observations clustered in an individual). Do you have experience conducting such analyses? Lastly, you make a good point to the OP about the mechanism for data missingness!
 

j58

Active Member
#8
@hlsmith - I have a lot of experience using the mixed model I described from having analyzed crossover trials with missing data. The data aren't actually sparse, since (in the OP's situation) only four parameters are estimated: intercept, treatment effect, subject variance, and residual variance.
 

hlsmith

Not a robit
#9
@j58 - hmm, I get a little suspicious with small samples. But if you have done a lot of modeling similar to this, you would know better whether 12 subjects with a total of 20 obs is sparse or not.

I don't know the context of your work, but with the cross-overs did you ever have to control for elapsed time if you had two groups? Or was the passing of time deemed negligible (e.g., control group not initially receiving treatment having to go longer before entering the experimental group or was it a point experiment with no or minimal washout)?
 

j58

Active Member
#10
@hlsmith - Yes, you have to control for time, or what is typically called a period effect in the crossover trial literature. In the basic 2-period, 2-treatment crossover design (call the treatments A and B), subjects are randomized to treatment sequences AB and BA. With equal numbers of subjects in each treatment sequence and no missing data, period effect is completely controlled, since the effect it will have on the observed treatment effect will be equal and opposite in each sequence. More typically (in my experience), there is slight imbalance between the treatment sequences, creating slight confounding of period and treatment effect, which one can control by including a period variable in the linear model.
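The cancellation argument can be checked with a small noise-free Python sketch. The function name and the effect sizes are invented for illustration, and it assumes purely additive treatment and period effects with no carryover.

```python
def mean_within_subject_difference(n_ab, n_ba, treatment_effect, period_effect):
    """Mean of the (B minus A) within-subject differences, without noise."""
    # Sequence AB: B is given in period 2, so B - A = treatment + period.
    diffs = [treatment_effect + period_effect] * n_ab
    # Sequence BA: B is given in period 1, so B - A = treatment - period.
    diffs += [treatment_effect - period_effect] * n_ba
    return sum(diffs) / len(diffs)


# Equal numbers per sequence: the period effect cancels exactly.
balanced = mean_within_subject_difference(6, 6, treatment_effect=2.0, period_effect=0.5)

# Slight imbalance: the estimate is shifted by period_effect * (n_ab - n_ba) / (n_ab + n_ba).
imbalanced = mean_within_subject_difference(7, 5, treatment_effect=2.0, period_effect=0.5)
```

Here `balanced` recovers the treatment effect of 2.0, while `imbalanced` is off by 0.5 × 2/12, which is the confounding that a period term in the linear model is meant to remove.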
 
#11
Well, I tried your suggestion of using a linear mixed model, but unfortunately an error appears when I fit a model equivalent to a paired-samples t-test in SPSS: "The levels of the repeated effect are not different for each observation within a repeated subject". I don't know what's wrong.
I would be very grateful for any suggestions.
 

j58

Active Member
#12
I don't know SPSS, but the situation that you eventually described cannot be analyzed by using a paired t-test. You don't have paired samples. You have multiple replicate observations for each sample before and after treatment. You don't have the statistical background to analyze this data. You need to find an expert who does and allow them to analyze your data for you.