I remember reading somewhere that the minimum sample size for a paired t-test is ~30, as this is supposed to ensure adequate power, robustness to non-normality, and reliable results.
I have to critically review a study for my course, and they used paired t-tests with a sample size of 17. Could I say that this was perhaps an inappropriate statistical test because of the low sample size, and that the non-parametric alternative (the Wilcoxon signed-rank test) would have been more appropriate?
As for your critiques: the power critique would perhaps have been useful before the data were collected, but after the study has been run it becomes somewhat moot. You can compute the retrospective power based on the observed effect size if you like, and argue that in future a larger sample size would be desirable for power reasons (assuming the power analysis indicates the study was underpowered, which historically has typically been the case), but either way it doesn't have much bearing on the present study after the fact. The critique about the Wilcoxon test is unfortunately just off base, since when the t-test's assumptions hold, the non-parametric alternative certainly won't have greater power than the parametric test.
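To make the retrospective power idea concrete, here is a sketch of that calculation. The numbers are hypothetical: I'm assuming the study's 17 pairs and an observed effect of d = 0.5; substitute the actual effect size reported in the paper.

```r
# Retrospective power sketch (hypothetical observed effect d = 0.5;
# replace with the effect size actually reported in the study)
library(pwr)

obs <- pwr.t.test(n = 17, d = 0.5, sig.level = 0.05,
                  type = "paired", alternative = "two.sided")
obs$power  # comes out well below the conventional .80 benchmark
```

If the observed effect really were around the medium range, 17 pairs would leave the study noticeably underpowered by the usual standards.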
In case you are interested in playing around with some power analyses yourself, the table above was generated with the following R code:
library(pwr)  # for pwr.t.test()

# all combinations of effect size and desired power
params <- expand.grid(d = c(.2, .5, .8), power = seq(.5, .9, .1))

getPwr <- function(D, pow) pwr.t.test(n = NULL, d = D, sig.level = 0.05,
                                      power = pow, type = "paired",
                                      alternative = "two.sided")$n

# table of MINIMUM SAMPLE SIZES (# of pairs)
matrix(ceiling(mapply(getPwr, params[,1], params[,2])), ncol = 3, byrow = TRUE,
       dimnames = list(Power = seq(.5, .9, .1),
                       EffectSize_CohensD = paste(c(.2, .5, .8),
                                                  c("(small)", "(medium)", "(large)"))))
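And to back up the point about the Wilcoxon test, here is a quick simulation sketch comparing the two tests' power when the t-test's assumptions hold. The setup is hypothetical (17 pairs, true effect d = 0.5, normally distributed differences, 2000 replicates):

```r
# Simulated power comparison: paired t-test vs. Wilcoxon signed-rank,
# under normality (hypothetical setup: n = 17 pairs, true d = 0.5)
set.seed(1)
nSim  <- 2000
diffs <- matrix(rnorm(nSim * 17, mean = 0.5, sd = 1), nrow = nSim)

pT <- apply(diffs, 1, function(x) t.test(x)$p.value)       # paired t on the differences
pW <- apply(diffs, 1, function(x) wilcox.test(x)$p.value)  # signed-rank on the differences

mean(pT < .05)  # simulated power of the paired t-test
mean(pW < .05)  # simulated power of the Wilcoxon test, typically a touch lower
```

Under normality the signed-rank test's asymptotic relative efficiency is about 3/π ≈ 0.955, so it gives up a little power rather than gaining any.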
In God we trust. All others must bring data.
~W. Edwards Deming
As long as certain assumptions (normality of the differences, no extreme outliers, etc.) are met, then a sample size of 17 pairs is not necessarily a concern. As Jake mentioned above, talking about power would be a good thing to point out.
As with a lot of things in statistics (like p-value cutoffs), certain numbers are rather arbitrary and simply reflect how sure you want to be about something (http://www.jedcampbell.com/?p=262).
I will add that if the assumptions held and they found a significant difference, then they did not have a sample size issue. If they reported no difference, then you would wonder whether they had enough observations to discern one. The other thing to question with a sample this size is whether it was reflective of the population: can 17 people tell you about all of the other people not included in the study?
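One way to frame the "enough observations" question is to flip it around and ask what effect size 17 pairs would have had a decent chance of detecting. A sketch, assuming the usual two-sided alpha of .05 and 80% power:

```r
# Minimum detectable effect for n = 17 pairs at 80% power (sketch,
# assuming two-sided alpha = .05)
library(pwr)

mde <- pwr.t.test(n = 17, d = NULL, sig.level = 0.05, power = 0.80,
                  type = "paired", alternative = "two.sided")$d
mde  # roughly 0.7 -- only fairly large effects were reliably detectable
```

So if the true effect were small or medium, a null result from this study wouldn't tell you much.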
Thanks so much for your help. The points you have made have definitely made me think about it a lot more, and I have come up with a number of other issues that I can point out.
With regard to whether the sample can be seen as representative of the population: I have also been thinking about this, as their findings are specifically for children and adolescents with clinical anxiety disorders who are seeking treatment from a speciality community mental health clinic. To me, it seems like having only 17 participants would make it difficult to say that any significant findings can be generalised to this specific population group, particularly when they did not have a wide variety of anxiety disorders within their sample. What do you guys think?
Bögels, S. M., & Siqueland, L. (2006). Family cognitive behavioral therapy for children and adolescents with clinical anxiety disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 45(2), 134-141.
Let me know how you go. I have already identified a couple of methodological limitations, but I am a little unsure whether they are correct, so it would be interesting to see if we identify similar ones.