I am running a learning experiment in which both trained subjects and
controls complete a pretest and a posttest. All analyses are being
conducted in R. We want to compare two training methodologies, and so
have run the experiment twice, once with each methodology. Methodology
is a between-subjects factor. Running the analysis with every factor
included (i.e., subject as a random factor, and session nested within
group nested within experiment) seems to me, having tried it, to be
clumsy and probably uninterpretable.

My favoured model for the analysis is a linear mixed-effects model. To
combine the data meaningfully, I have collated all the pretest data for
controls and trained subjects from each experiment, and assumed these
data to represent a population sample of naive subjects for each
experiment. I have also discarded the posttest data for the controls,
and assumed the posttest data for the trained subjects to represent a
population sample of trained subjects for each experiment. I have
checked these assumptions by ascertaining that a) controls and trained
listeners did not differ significantly at pretest in either experiment;
and b) control listeners did not learn significantly between pretest
and posttest (and therefore their posttest data are not relevant). This
was done using a linear mixed-effects model for each experiment, with
subject as a random factor and session (pretest vs posttest) nested
within group (trained vs control).
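
For concreteness, a per-experiment check of this kind might look
something like the sketch below. It runs on simulated data and assumes
the nlme package; the data frame exp1, its columns (score, session,
group, subject) and all the numbers are placeholders rather than the
real data or the exact call that was used.

```r
library(nlme)  # assumption: nlme is used for the mixed-effects fit

set.seed(1)
n <- 20  # hypothetical number of subjects per group

# Simulated long-format data for ONE experiment: every subject has a
# pretest and a posttest row. All names and values are made up.
exp1 <- expand.grid(
  subject = factor(seq_len(2 * n)),
  session = factor(c("pre", "post"), levels = c("pre", "post"))
)
exp1$group <- factor(
  ifelse(as.integer(exp1$subject) <= n, "control", "trained"),
  levels = c("control", "trained")
)

subj_effect <- rnorm(2 * n, sd = 5)  # per-subject random intercept
learned <- exp1$group == "trained" & exp1$session == "post"
exp1$score <- 50 + subj_effect[as.integer(exp1$subject)] +
  8 * learned + rnorm(nrow(exp1), sd = 3)

# Session nested within group, with a random intercept for subject
fit_check <- lme(score ~ group / session, random = ~ 1 | subject,
                 data = exp1)
summary(fit_check)$tTable
```

With R's default treatment contrasts, the group coefficient should
correspond to the pretest difference between groups (check a), and the
session coefficient within the control group to learning among the
controls (check b).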

Therefore, the model I want to use to analyse the data would ideally be
a linear mixed-effects model, with subject as a random factor, and
session (pre vs post) nested within experiment. Note that my removal of
the Group (Trained vs Control) factor simplifies the model somewhat, and
makes it more interpretable in terms of evaluating the relative effects
of each experiment.
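
The combined model described above might be fitted along these lines.
This is only a sketch on simulated data, assuming the nlme package; the
data frame combined, its column names, the group sizes and the training
gains are all hypothetical. It keeps pretest rows for everyone and
posttest rows for trained subjects only, so controls contribute one row
each and trained subjects two.

```r
library(nlme)  # assumption: nlme is used for the mixed-effects fit

set.seed(2)
n <- 20  # hypothetical subjects per group per experiment

# One row per subject: 2 experiments x 2 groups x n subjects (made up)
subjects <- expand.grid(id = seq_len(n),
                        group = c("control", "trained"),
                        experiment = c("exp1", "exp2"))
# Subject labels must be unique across experiments
subjects$subject <- factor(paste(subjects$experiment, subjects$group,
                                 subjects$id, sep = "_"))
subjects$intercept <- rnorm(nrow(subjects), sd = 5)

# Pretest rows for everyone; posttest rows for trained subjects only
pre <- transform(subjects, session = "pre")
post <- transform(subjects[subjects$group == "trained", ],
                  session = "post")
combined <- rbind(pre, post)
combined$session <- factor(combined$session, levels = c("pre", "post"))

# Assumed training gains: 8 points in exp1, 12 points in exp2
gain <- ifelse(combined$session == "post",
               ifelse(combined$experiment == "exp1", 8, 12), 0)
combined$score <- 50 + combined$intercept + gain +
  rnorm(nrow(combined), sd = 3)

# Session nested within experiment, random intercept for subject
fit <- lme(score ~ experiment / session, random = ~ 1 | subject,
           data = combined)
summary(fit)$tTable

# How many subjects contribute one row and how many contribute two
table(table(combined$subject))
```

The final table makes the unbalanced structure explicit: the random
intercept is estimated from subjects with either one or two
observations.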

What I would like to know is: a) would people agree that this is a
meaningful way to combine my data? I believe the logic is sound, but am
slightly concerned that I am ignoring a whole block of posttest data for
the controls (even though this does not account for a significant amount
of the variance); and b) given that each of my trained subjects appears
twice, once in the pretest and once in the posttest, whereas the
controls appear only once, in the pretest sample, is there any problem
with making subject a random factor? Conceptually, I see no problem
with this, but I would like to be sure before I finish writing up.

Many thanks for your time

Dan