Effectiveness of Tutoring Program on Student Graduation?

Hi all,

I'm hoping that someone can help me verify my planned approach to analyze the effectiveness of a new tutoring program on student graduation status/rates.

The tutoring program began several years ago. Students in the program are called "PRB students" and the students not enrolled in it are called "non-PRB students". One of the leaders of this program approached me to ask if we could show:

1) that the program has had a statistically-significant positive impact on student graduation status/rates, and
2) how the predictive impact of the program (if it exists) compares to other predictive factors, e.g. GPA, SAT scores, etc.

Based on my reading, it seems like a Discriminant Analysis and/or Logistic Regression would be most appropriate, since multiple independent variables (PRB participation, GPA, SAT, etc.) likely play a role in determining final membership in the dependent variable categories of "graduated" or "not graduated".

According to a reference book, the "Structure Matrix" produced from a Discriminant Analysis should provide me with the strengths of the correlations between each separate predictor and the overall discriminant function. This allows me to both rank the importance of PRB participation in comparison to other factors AND to gauge its strength.

Likewise, if using a Logistic Regression analysis, exponentiating the coefficient for each independent variable should give me its odds ratio, telling me how much more or less predictive each factor is than the others, as well as giving some idea of the "strength" of the predictive factor.

Can someone confirm for me if these are appropriate techniques to use for this kind of analysis?



Not a robit
Was enrollment into the program random, or did students have to self-select? Knowing this will help you identify variables that may be mediating, interacting, or confounding. If struggling students sign up, their abilities may have been lower to start with; if it was overachievers, the opposite comes into play. I always think of it like a weight-loss intervention: if people's starting weights differed, then judging success by an end BMI under, say, 30 is a poor study design without controlling for baseline covariate differences. If everyone in your intervention group started with a BMI around 50 while the control group started around 30, reaching < 30 is a biased comparison.

You need to run a logistic regression on student characteristics with the outcome being enrollment in the program. This will help you find imbalances between the groups. exp(coefficient) gives the odds ratio of the outcome. For example, for sex with females as the reference group and a coefficient of 0.7, exp(0.7) ≈ 2.01, so males would have about twice the odds of enrollment compared to females.
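That exponentiation step can be checked directly in a couple of lines; the coefficient value below is just the hypothetical 0.7 from the example above, not a fitted estimate:

```python
import math

# Hypothetical fitted logistic-regression coefficient for sex
# (reference group = females), taken from the example above.
beta_sex = 0.7

# Exponentiating a logistic coefficient gives the odds ratio:
# how many times greater the odds of enrollment are for males vs. females.
odds_ratio = math.exp(beta_sex)
print(round(odds_ratio, 2))  # about 2: males have roughly twice the odds
```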

Analytic design: if you have historical data from before the program, I would look at an interrupted time series with a negative control group (non-enrollees), using inverse-probability-of-treatment weights from propensity scores. This isn't a novice design, but it's not too complicated either. Other approaches may not be as well suited here.
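A minimal sketch of just the weighting piece, assuming propensity scores (each student's estimated probability of enrolling in PRB, e.g. from the enrollment logistic regression above) are already in hand. All records and scores below are invented for illustration:

```python
# Inverse-probability-of-treatment weighting (IPTW) sketch.
# Each record: (enrolled_in_prb, propensity_score, graduated).
students = [
    (True,  0.8, True),
    (True,  0.6, True),
    (True,  0.7, False),
    (False, 0.3, True),
    (False, 0.2, False),
    (False, 0.4, True),
]

def iptw_weight(enrolled, propensity):
    # Enrolled students are weighted by 1/p, non-enrolled by 1/(1-p),
    # which reweights both groups toward the covariate mix of the
    # full sample.
    return 1.0 / propensity if enrolled else 1.0 / (1.0 - propensity)

def weighted_grad_rate(group_enrolled):
    rows = [(iptw_weight(e, p), g) for e, p, g in students
            if e == group_enrolled]
    total = sum(w for w, _ in rows)
    return sum(w for w, g in rows if g) / total

print(weighted_grad_rate(True))   # weighted graduation rate, PRB group
print(weighted_grad_rate(False))  # weighted graduation rate, non-PRB group
```

The difference between the two weighted rates is then a (crude) estimate of the program effect, under the usual assumption that the covariates in the propensity model capture why students self-selected.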

Tell us more about the sample size and years of data you have!


New Member
Hi hlsmith,

Thanks so much for the quick reply and helpful advice! Let me answer some of your questions:

1) Enrollment into the program was self-selected, not random.

2) Per your advice, I ran the logistic regression on some selected student characteristics, with enrollment in the program as the outcome. The characteristics were high school GPA, high school percentile rank, total SAT score, Pell Grant recipient status, sex, and minority status. "stype" is the group membership variable, with value PRB representing students who joined the tutoring group and TDL representing traditional students. Here are the results:

View attachment 6690

My interpretation: the global tests indicate that our model is significantly better than no model. The type 3 analysis of effects indicates that several of our predictor variables - hsgpa (HS GPA), hspctrnk (HS percentile rank), sattotal (SAT score), sex, and minority status - very likely have an impact on group membership.

Based on the odds ratio estimates, HS GPA appears to be the strongest predictor of joining the tutoring group (PRB). Since several other variables also appear to predict group membership, my guess is that we reject the null hypothesis that these variables (except for Pell_Recipient status) play no role in predicting membership. My planned next steps would then be to:

1) "normalize" the TDL group data to the PRB group data, i.e. massage the TDL group to look more like the PRB group (e.g. by removing students with low hsgpa, hspctrnk, and sattotal);

2) rerun the logistic regression on that data to verify that these variables no longer predict group membership; and

3) run a logistic regression on this new population with graduation status as the outcome variable.

Am I close to the mark or way off? :D

Just for fun, I am also including the results of the Log Reg with graduation status as outcome, without having normalized the data in any way:

View attachment 6691

I suspect it's not worth trying to interpret those results at the moment, so I'll leave it up to your wisdom to comment on anything that might look interesting.

3) We do have historical data going back many years, so I did some reading on the interrupted time series you suggested, but that analysis seems most appropriate for treatments applied to an entire population, not one where a new opportunity arises and some subjects elect to participate. Can you help me understand why this analysis is justifiable for our situation? Can you also advise some next steps, if appropriate?

Thanks again for the excellent help, hlsmith! :)