I don’t think that a t-test would be correct.

It seems to me that each sentence can be treated as a 0/1 (Bernoulli) trial with probability p of being a subject-verb sentence. If each sentence is independent of the previous ones, then the number of subject-verb sentences is binomially distributed with parameters n and p, where n is the number of sentences.
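A minimal sketch of that binomial model, using hypothetical values n = 5 and p = 0.3 (the `binomial_pmf` helper is illustrative, not taken from any library). The probabilities of 0 to n subject-verb sentences sum to 1, and the mean count is n·p.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p): k SV sentences out of n independent sentences."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 5, 0.3  # hypothetical: 5 sentences, each SV with probability 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(probs))  # should equal n * p

print([round(q, 4) for q in probs])
print(round(mean, 4))
```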

However, the number of sentences seems to vary wildly between children. I made up some data based on the example:

Name   SV (%)   VS (%)   n
John   30       70       5
Katy   40       60       20
Jane   50       50       7

Of course, the different values of n mean that the proportions p are estimated with different precision.
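To make the precision point concrete, here is a sketch that computes the usual (Wald) standard error sqrt(p(1-p)/n) for each child in the made-up table, reading SV as a percentage. The Wald approximation is crude for n as small as 5, so treat these numbers as rough.

```python
import math

# Made-up data from the table: (proportion of SV sentences, number of sentences n)
children = {"John": (0.30, 5), "Katy": (0.40, 20), "Jane": (0.50, 7)}


def binomial_se(p: float, n: int) -> float:
    """Wald standard error of an estimated proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


se = {name: binomial_se(p, n) for name, (p, n) in children.items()}
for name, (p, n) in children.items():
    print(f"{name}: p = {p:.2f}, n = {n:2d}, SE = {se[name]:.3f}")
```

Katy's estimate, based on 20 sentences, has roughly half the standard error of John's, which is based on only 5.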

It seems reasonable to assume that not all children have the same population probability p, but that the probability of producing a subject-verb sentence differs between children. That leads us to a mixed model with a random effect for each child, and within each child a binomial count of subject-verb sentences governed by that child's own probability.
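A small simulation illustrates why a single binomial is not enough when children differ. It assumes, purely for illustration, that each child's log-odds is mu plus a normally distributed deviation with standard deviation sigma (all parameter values are hypothetical). With sigma = 0 every child has the same p and the variance of the counts matches the binomial variance n·p(1-p); with sigma = 1 the counts are much more spread out (overdispersion).

```python
import math
import random
import statistics


def expit(x: float) -> float:
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_counts(num_children, n, mu, sigma, rng):
    """SV counts for children who each say n sentences, with child-specific p."""
    counts = []
    for _ in range(num_children):
        b = rng.gauss(0.0, sigma)  # child's random deviation on the log-odds scale
        p = expit(mu + b)          # child-specific probability of an SV sentence
        counts.append(sum(rng.random() < p for _ in range(n)))  # binomial draw
    return counts


rng = random.Random(1)
n, mu = 20, 0.0                       # hypothetical: 20 sentences, overall p = 0.5
same_p = simulate_counts(5000, n, mu, 0.0, rng)   # all children share p
diff_p = simulate_counts(5000, n, mu, 1.0, rng)   # p varies between children
binom_var = n * 0.5 * 0.5

print(statistics.variance(same_p), statistics.variance(diff_p), binom_var)
```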

A model like this:

log(p_i/(1-p_i)) = mu + b_i

Here p_i is the probability that child i produces a subject-verb sentence, mu is the overall log-odds of a subject-verb sentence (common to all children), and b_i is an individual random effect for child i, assumed to have zero mean (which simply means that some children are above the overall average and some are below it).
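Written out in full, with y_i the number of subject-verb sentences among child i's n_i sentences, the model and its marginal likelihood are as follows. The normal distribution for b_i, with variance sigma_b^2, is the usual default in mixed-model software but is an assumption not stated above; the likelihood integrates each child's random effect out.

```latex
\begin{aligned}
y_i \mid b_i &\sim \operatorname{Binomial}(n_i,\, p_i), \\
\log\!\left(\frac{p_i}{1 - p_i}\right) &= \mu + b_i, \\
b_i &\sim \mathcal{N}(0,\, \sigma_b^2), \\
p(b) &= \frac{1}{1 + e^{-(\mu + b)}}, \\
L(\mu, \sigma_b) &= \prod_i \int_{-\infty}^{\infty} \binom{n_i}{y_i}\, p(b)^{y_i} \bigl(1 - p(b)\bigr)^{n_i - y_i}\, \phi(b;\, 0, \sigma_b^2)\, db .
\end{aligned}
```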

This might sound complicated, but it is just a standard generalized linear mixed model (a logistic regression with a random intercept for each child), which is available in most statistical packages.
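In R, for instance, lme4's `glmer` with `family = binomial` fits this kind of model. To show roughly what such software does, here is a deliberately crude pure-Python sketch: it integrates the random effect out numerically (midpoint rule against a normal density) to get the marginal likelihood, then maximizes it by grid search over mu and sigma. Real packages use better integration (Laplace approximation or adaptive Gauss-Hermite quadrature) and proper optimizers. The counts in `DATA` are hypothetical, not from the question, and the normal random effect is an assumption.

```python
import math

# Hypothetical counts (NOT from the question): (SV sentences k, total sentences n)
DATA = [(2, 5), (8, 20), (4, 7), (12, 15), (3, 10), (6, 9), (1, 6), (9, 12)]

# Midpoint-rule nodes and weights for a standard normal density on [-5, 5].
N_NODES = 61
_H = 10.0 / N_NODES
Z_NODES = [-5.0 + (j + 0.5) * _H for j in range(N_NODES)]
Z_WEIGHTS = [math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi) * _H for z in Z_NODES]


def expit(x):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def child_loglik(k, n, mu, sigma):
    """log of the integral over b ~ N(0, sigma^2) of Binomial(k; n, expit(mu + b))."""
    log_c = math.log(math.comb(n, k))
    total = 0.0
    for z, w in zip(Z_NODES, Z_WEIGHTS):
        p = expit(mu + sigma * z)  # b = sigma * z
        total += w * math.exp(log_c + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.log(total)


def marginal_loglik(data, mu, sigma):
    """Children are independent, so their log-likelihoods add up."""
    return sum(child_loglik(k, n, mu, sigma) for k, n in data)


def fit_grid(data):
    """Crude maximum likelihood: mu in [-2, 2], sigma in [0, 2], step 0.1."""
    best = None
    for i in range(41):
        mu = -2.0 + 0.1 * i
        for j in range(21):
            sigma = 0.1 * j
            ll = marginal_loglik(data, mu, sigma)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best


ll, mu_hat, sigma_hat = fit_grid(DATA)
print(f"mu_hat = {mu_hat:.1f}, sigma_hat = {sigma_hat:.1f}, "
      f"expit(mu_hat) = {expit(mu_hat):.3f}")
```

A fitted sigma near 0 would suggest the children share a common p, while a clearly positive sigma supports the child-level random effect. Note that expit(mu_hat) is the estimated probability for a child with b_i = 0, which is not quite the same as the population-average proportion.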

I am curious whether others think this is a reasonable model.