I work at an alternative high school in Oakland, CA, and I volunteered to do some research into our graduation and retention rates. I have a lot of descriptive statistics from the historical data, but my supervisor and I want to be able to make some inferences about the likelihood of a new student graduating from our program if, for instance, they make it through the first 6 months.

Here are some of the basic details about my data set:

471 - Students

101 - Graduated (~21%)

370 - Left the program (~79%)

Of the 471 Students

212 - Lasted at least 6 months (~45%)

99 - Graduated (~47%)

113 - Left the program (~53%)

Given these numbers, my main question is: what type of analysis do I need to undertake to be able to state "If a student makes it 6 (or 9, or 12, etc.) months, there is an X% chance that s/he will graduate"?

This seems to be a simple conditional probability problem, but I am getting hung up on the probabilities of the different events. Is it statistically accurate to say that the probability of a student graduating from our program is 21%, or that the probability of a student lasting at least 6 months is 45%?
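To make the conditional probability I have in mind concrete, here is a minimal Python sketch using the counts above. It assumes the 99 graduates and 113 leavers listed after the 6-month figure are both subsets of the 212 students who lasted at least 6 months, and that past students are representative of new ones. The Wilson interval is my own addition to show the uncertainty around the estimate, not something taken from our data; the variable and function names are just illustrative.

```python
import math

# Counts taken from the data set described above.
total_students = 471
graduated = 101
lasted_6_months = 212
graduated_and_lasted_6 = 99  # graduates among those who lasted at least 6 months


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Unconditional (marginal) proportions.
p_grad = graduated / total_students
p_six = lasted_6_months / total_students

# Conditional probability: P(graduate | lasted >= 6 months)
#   = P(graduate and lasted >= 6 months) / P(lasted >= 6 months)
p_grad_and_six = graduated_and_lasted_6 / total_students
p_grad_given_six = p_grad_and_six / p_six

low, high = wilson_interval(graduated_and_lasted_6, lasted_6_months)

print(f"P(graduate)               = {p_grad:.3f}")
print(f"P(lasted >= 6 months)     = {p_six:.3f}")
print(f"P(graduate | >= 6 months) = {p_grad_given_six:.3f}")
print(f"Approx. 95% interval      = ({low:.3f}, {high:.3f})")
```

The same calculation could be repeated for 9 or 12 months by swapping in the corresponding counts.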

Thank you again for your help and guidance in this matter.

Quick question about the interpretation of effect sizes.

I was taught that (in line with APA formatting) you should always report an effect size (so for a Pearson's correlation, I should report r²), but that you only

Is the interpretation of r² the statement about the % of variability that can be explained? And if so, does that mean I just report r² and leave out the statement about % when I have non-significant results?
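To be clear about the arithmetic I mean, here is a small Python sketch that computes Pearson's r and then r² as a percentage of variance. The data and the names `hours` and `scores` are made up purely for illustration and are not from my study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example data (not real results).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 58, 67, 70]

r = pearson_r(hours, scores)
r_squared = r ** 2

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
print(f"Share of variance in scores linearly associated with hours: {r_squared:.0%}")
```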

Thanks!

I'm analyzing count data in Stata using negative binomial regression because of overdispersion. I'm reporting incidence rate ratios using the model I've described below. Is there also a way to calculate an absolute rate reduction based on the level of exposure? This study is policy related, so I think reporting the absolute rate reduction would be of additional benefit.

Exposure: 4-level categorical variable

Outcome: Injury count

Covariables: several continuous variables

Offset: log of the population at risk

Clustered at the state level b/c of correlated data.
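To illustrate what I mean by an absolute rate reduction, here is a rough Python sketch. With a log link, the predicted rate at an exposure level equals the reference-level rate times that level's incidence rate ratio (covariates held fixed), so the difference is the reference rate times (1 - IRR). All numbers and names below are hypothetical placeholders, not output from my model, and the sketch ignores the uncertainty in both the reference rate and the IRRs.

```python
# Hypothetical values for illustration only (not estimates from the actual model).
reference_rate_per_100k = 50.0  # predicted injury rate at the reference exposure level
irr_by_level = {"level 2": 0.90, "level 3": 0.80, "level 4": 0.65}


def absolute_rate_reduction(reference_rate, irr):
    """Rate difference versus the reference level, covariates held fixed.

    The rate at the exposure level is reference_rate * irr, so the
    reduction is reference_rate - reference_rate * irr.
    """
    return reference_rate * (1.0 - irr)


reductions = {
    level: absolute_rate_reduction(reference_rate_per_100k, irr)
    for level, irr in irr_by_level.items()
}

for level, reduction in reductions.items():
    print(f"{level}: {reduction:.1f} fewer injuries per 100,000 vs. reference")
```

What I'm unsure of is how to get the reference rate and a confidence interval for the difference directly from the fitted model, given the offset and the state-level clustering.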

Thanks for any advice you can offer.