Comparing small sample to large population using ANOVA


I have a problem I was hoping to get some help with. I am trying to evaluate the impact of an after school program on GPA, discipline offenses, and attendance for a school district.

The district has provided me with the quarterly data for each of the DVs, as well as whether or not a student was in the after school intervention in a given quarter. Imperfect data, but it's what I've got. I am planning to do a repeated measures ANOVA to compare the DV before, during, and after the intervention quarter for those students in the intervention.

My problem:
I've got data for about 100 kids in the intervention, plus data for 15,000 kids not in the intervention (basically the whole middle and high school population of the district). I want to do a group comparison of kids in the program to those not in the program, but I've got 15,000 kids to choose from. If I include all of them, the groups are extremely unbalanced (about 100 vs. 15,000), and I'm worried about major power issues. Notably, the after school kids differ markedly from the rest of the population on other variables I plan to include in the model as fixed effects: race (Black/white), sex, lunch status (free/reduced or self-pay), and IEP status (whether or not they have an IEP).

I've tried taking random samples from the larger population, but every random sample gives different results. I've considered some sort of stratified sampling, but which IV would I stratify on? Maybe I could form groups from all combinations of the dichotomous IVs and sample so that the proportions match those among the after school kids. Or should I match the students in the program with students outside the program who have similar profiles?
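To make the stratified idea concrete, here is a rough sketch in Python. It assumes each student is a dict with hypothetical keys `race`, `sex`, `lunch`, and `iep`; the function name and the `per_program_student` ratio are made up for illustration. Fixing the seed only makes one draw reproducible; it does not remove the sample-to-sample variation.

```python
import random
from collections import Counter, defaultdict

# Hypothetical field names; the real district data will differ.
STRATA_VARS = ("race", "sex", "lunch", "iep")

def stratum(student):
    """Return the combination of dichotomous IVs that defines a stratum."""
    return tuple(student[v] for v in STRATA_VARS)

def stratified_comparison_sample(program, pool, per_program_student=3, seed=0):
    """Draw comparison students so stratum proportions mirror the program group."""
    rng = random.Random(seed)  # fixed seed so the same draw can be repeated
    target = Counter(stratum(s) for s in program)

    by_stratum = defaultdict(list)
    for s in pool:
        by_stratum[stratum(s)].append(s)

    sample = []
    for key, n_program in target.items():
        candidates = by_stratum.get(key, [])
        # Cap at what is available if a stratum is thin in the population.
        n_draw = min(len(candidates), n_program * per_program_student)
        sample.extend(rng.sample(candidates, n_draw))
    return sample
```

Strata that contain no program students are never sampled, so the comparison group ends up with roughly the same mix of profiles as the after school kids.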

Any suggestions?


Less is more. Stay pure. Stay poor.
As your quandary suggests, there are many options, all with pros and cons. As for matching, you could exactly match 2-3 students to each student in your program sample. You would then not need to include these variables in the model (though you would not be able to examine their effects). I wonder whether you also need to match on baseline GPA, or at least run descriptive statistics to check that matching or the other sampling procedures produce equal initial GPAs (or other variables) if you're not controlling for them. For example, if students start with a 4.0 GPA, there is little room for change or variability.
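A minimal sketch of that exact matching plus a baseline check might look like the following. The field names in `PROFILE`, the function names, and the choice of a standardized mean difference as the descriptive check are all assumptions for illustration, not something taken from the district data.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical field names for the matching profile.
PROFILE = ("race", "sex", "lunch", "iep")

def match_exact(program, pool, k=2, seed=0):
    """Match up to k pool students to each program student on an identical profile.

    Matching is without replacement: a pool student is used at most once.
    """
    rng = random.Random(seed)
    available = defaultdict(list)
    for s in pool:
        available[tuple(s[v] for v in PROFILE)].append(s)
    for group in available.values():
        rng.shuffle(group)  # randomize which candidates get picked

    matches = {}
    for i, s in enumerate(program):
        group = available[tuple(s[v] for v in PROFILE)]
        matches[i] = [group.pop() for _ in range(min(k, len(group)))]
    return matches

def standardized_mean_diff(a, b):
    """Difference in means divided by the pooled SD (needs non-zero variance)."""
    pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd
```

After matching, you could pass the baseline GPAs of the program students and of their matches to `standardized_mean_diff`; a value far from zero would suggest the groups still differ at baseline.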

It also sounds like, if you are adding levels, you might switch to a mixed model instead of continuing with the ANOVA, since repeated measures ANOVA may have trouble with missing data and with determining covariance structures.
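As a rough illustration, one possible form of such a mixed model is a random intercept per student. The symbols below are assumptions for the sketch: $y_{it}$ is the outcome (e.g., GPA) for student $i$ in period $t$, $\text{Program}_i$ indicates program participation, $D_{tp}$ are indicator variables for the during and after periods (before is the reference), and $x_i$ holds the covariates (race, sex, lunch status, IEP status).

```latex
y_{it} = \beta_0
       + \beta_1 \,\text{Program}_i
       + \sum_{p \in \{\text{during},\,\text{after}\}} \beta_{2p}\, D_{tp}
       + \sum_{p \in \{\text{during},\,\text{after}\}} \beta_{3p}\, \text{Program}_i \, D_{tp}
       + \gamma^{\top} x_i
       + u_i + \varepsilon_{it},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)
```

Under this specification the interaction terms $\beta_{3p}$ carry the program-by-period comparison. A random intercept alone implies equal correlation between all pairs of periods, so a different covariance structure may fit your data better.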