I am running a learning experiment in which both training subjects and
controls complete a pretest and posttest. All analyses are being
conducted in R. We are looking to compare two training methodologies,
and so have run this experiment twice, once with each methodology.
Methodology is a between-subjects factor. Trying to run this analysis
with every factor included (i.e., subject as a random factor, session
nested within group nested within experiment) seems to me, having tried
it, to be clumsy and probably uninterpretable.
My favoured model for the analysis is a linear mixed-effects model, and
to combine the data meaningfully, I have collated all the pretest data
for controls and trained subjects from each experiment, and assumed this
data to represent a population sample for naive subjects for each
experiment. I have also ditched the posttest data for the controls, and
assumed the posttest training data to represent a population sample for
trained subjects for each experiment. I have confirmed the validity of
these assumptions by ascertaining that a) controls and trained listeners
did not differ significantly at pretest for either experiment; and b)
control listeners did not learn significantly between pretest and
posttest (and therefore their posttest data are not relevant). This was
done using a linear mixed-effects model for each experiment, with
subject as a random factor and session (pretest vs posttest) nested
within Group (trained vs control).
Therefore, the model I want to use to analyse the data would ideally be
a linear mixed-effects model, with subject as a random factor, and
session (pre vs post) nested within experiment. Note that my removal of
the Group (Trained vs Control) factor simplifies the model somewhat, and
makes it more interpretable in terms of evaluating the relative effects
of each experiment.
What I would like to know is: a) would people agree that this is a
meaningful way to combine my data? I believe the logic is sound, but am
slightly concerned that I am ignoring a whole block of posttest data for
the controls (even though this does not account for a significant amount
of the variance); and b) given that each of my trained subjects appears
twice (once in the pretest and once in the posttest), while the controls
appear only once (in the pretest sample), is there any problem with
making subject a random factor? Conceptually, I see no problem with
this, but I would like to be sure before I finish writing up.
I have a couple of comments about your approach. Since you are collecting measurements from the same subjects before and after a treatment, we can assume that these measurements are not independent. In such cases, an appropriate measure of learning is the gain score: each subject's post-test result minus their pre-test result. In this way, you can eliminate one nasty factor (session) from your model. I feel that this approach is quite accepted in the statistical community.
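To make the subtraction concrete, here is a minimal sketch of computing one difference score per subject. The subject IDs and scores are invented for illustration, and `gain_scores` is a hypothetical helper name, not something from your data or from any library.

```python
# Hypothetical records: (subject, pretest score, posttest score).
# These values are made up purely to illustrate the calculation.
scores = [
    ("s01", 52.0, 61.0),
    ("s02", 47.0, 55.0),
    ("s03", 60.0, 58.0),
]


def gain_scores(records):
    """Return {subject: posttest - pretest}, one value per subject."""
    return {subject: post - pre for subject, pre, post in records}


gains = gain_scores(scores)
```

Each subject now contributes a single number, so the repeated-measures structure (and with it the session factor) drops out of the model. Note that this needs both a pretest and a posttest score for every subject, controls included.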
Now all that remains is the control versus treatment factor and the type of treatment factor. These factors are now cross-classified (no nesting), so you can perform a simple two-way ANOVA on the difference scores to test for a difference between the two learning schemes while accounting for the variation due to both factors.
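As a rough illustration of what the two-way ANOVA computes, here is a sketch that works out the sums of squares and F ratios by hand for a balanced design. The data, the factor labels ("control"/"trained", "method_A"/"method_B") and the function name `two_way_anova` are all assumptions made up for the example. In practice you would probably fit this in R with something along the lines of `aov(gain ~ group * experiment)`; the code below only makes the arithmetic explicit. It does not compute p-values, since that would need an F distribution that the standard library does not provide.

```python
from statistics import mean

# Hypothetical gain scores (posttest minus pretest), keyed by
# (group, experiment). Balanced design: the same n in every cell.
gains = {
    ("control", "method_A"): [1.0, 2.0, 0.0, 1.0],
    ("trained", "method_A"): [6.0, 8.0, 7.0, 7.0],
    ("control", "method_B"): [2.0, 1.0, 1.0, 0.0],
    ("trained", "method_B"): [4.0, 3.0, 5.0, 4.0],
}


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction: SS, df and F per term."""
    groups = sorted({g for g, _ in cells})
    exps = sorted({e for _, e in cells})
    n = len(next(iter(cells.values())))
    if any(len(values) != n for values in cells.values()):
        raise ValueError("this sketch only handles balanced designs")
    a, b = len(groups), len(exps)

    # Grand mean, cell means, and marginal means for each factor.
    grand = mean(x for values in cells.values() for x in values)
    cell_mean = {key: mean(values) for key, values in cells.items()}
    group_mean = {g: mean(cell_mean[(g, e)] for e in exps) for g in groups}
    exp_mean = {e: mean(cell_mean[(g, e)] for g in groups) for e in exps}

    # Sums of squares for the two main effects, interaction, and residual.
    ss_group = b * n * sum((group_mean[g] - grand) ** 2 for g in groups)
    ss_exp = a * n * sum((exp_mean[e] - grand) ** 2 for e in exps)
    ss_inter = n * sum(
        (cell_mean[(g, e)] - group_mean[g] - exp_mean[e] + grand) ** 2
        for g in groups
        for e in exps
    )
    ss_error = sum(
        (x - cell_mean[key]) ** 2 for key, values in cells.items() for x in values
    )
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error

    table = {}
    for name, ss, df in [
        ("group", ss_group, a - 1),
        ("experiment", ss_exp, b - 1),
        ("group:experiment", ss_inter, (a - 1) * (b - 1)),
    ]:
        table[name] = {"ss": ss, "df": df, "F": (ss / df) / ms_error}
    table["residual"] = {"ss": ss_error, "df": df_error}
    return table


table = two_way_anova(gains)
```

The `group:experiment` row is the term that asks whether the trained-minus-control difference in gain is the same under both training methods. With unbalanced cell sizes the simple formulas above no longer apply, which is one reason to prefer a proper statistics package for the real analysis.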
However, this model fails to test for a significant increase from control to treatment.