Suppose there are 1,050 students, all aged 15, and each student is randomly assigned to one of seven different teachers. However, teachers are allowed to decide how many students they will accept in their classrooms. For example, one teacher is only willing to accept 32 students, while another is willing to accept 279. The distribution of students is shown below.
The seven teachers are given one task only. In just 5 minutes, they are supposed to teach their class how to draw a pony. After the lesson has finished, each teacher must give her students a test which involves drawing a pony on a sheet of paper. The students cannot cheat when they are taking the test. They can either pass or fail the test. Passing (or failing) is completely subjective and entirely at the discretion of the teacher who taught the lesson.
The overall "pass rate" is 84% for all 1,050 students. However, some teachers' pass rates are much lower than this, while others are much higher.
Using SAS, I want to be able to tell quickly, for each of the 7 teachers, whether her pass/fail rate is within an acceptable range, but I don't know how to calculate that when each teacher has a different class size. For example, at first glance teacher Rogers appears to have a higher-than-average pass rate (86% versus 84%), but how much that 86% means depends on the fact that she taught only 32 students.
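One common way to get an "acceptable range" that widens for small classes and narrows for large ones is a funnel-plot style control limit around the overall rate. This technique is not from the thread; it is swapped in here as a minimal sketch, assuming a normal (Wald) approximation to the binomial, which is rough for small classes with rates near 100% (an exact binomial interval would be more defensible). The function names `acceptable_range` and `is_outside` are hypothetical.

```python
import math

def acceptable_range(overall_rate, n, z=1.96):
    """Approximate (Wald) limits for a class of n students whose true pass rate
    equals overall_rate; z=1.96 gives roughly 95% coverage."""
    se = math.sqrt(overall_rate * (1 - overall_rate) / n)
    # Clip to [0, 1] because a proportion cannot leave that interval.
    return max(0.0, overall_rate - z * se), min(1.0, overall_rate + z * se)

def is_outside(pass_rate, overall_rate, n, z=1.96):
    """True if a teacher's observed pass rate falls outside the limits for her class size."""
    lo, hi = acceptable_range(overall_rate, n, z)
    return not (lo <= pass_rate <= hi)

OVERALL = 0.84  # overall pass rate quoted in the post

for n in (32, 279):  # smallest and largest class sizes mentioned in the post
    lo, hi = acceptable_range(OVERALL, n)
    print(f"n = {n:3d}: {lo:.3f} to {hi:.3f}")

print(is_outside(0.86, OVERALL, 32))  # Rogers: 86% with 32 students
```

With these assumptions the range is roughly 71% to 97% for a class of 32 but only about 80% to 88% for a class of 279, so Rogers' 86% is well within what chance alone would produce for her class size.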
Also, if I compare each individual teacher's "pass rate" with the mean rate for all 7 teachers, there is a problem: the mean for the 7 teachers includes the pass/fail data for the teacher I am comparing. In other words, if I want to compare teacher Rogers with the other teachers (the "peer" group), I need to remove teacher Rogers' data from the peer group, correct?
For example:
I can manually write the code to calculate odds ratios and run a chi-square test for each of the seven teachers, making sure to exclude the data for the teacher being evaluated from the "peer group" she is compared against. But that seems like a very slow and cumbersome process.
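The per-teacher procedure described above is mechanical enough to loop over. The sketch below uses Python only to show the arithmetic, not SAS; the function name `compare_to_peers` is hypothetical. For each teacher it builds the 2x2 table of her class versus everyone else (leave-one-out), then computes the odds ratio and a Pearson chi-square test with 1 degree of freedom. Only the two classrooms whose counts appear later in this thread are filled in; the other five would be added to the dict.

```python
import math

def compare_to_peers(classes):
    """Compare each teacher's class with all other classes pooled (leave-one-out).

    classes: dict mapping teacher name -> (passed, failed).
    Returns: dict mapping teacher name -> (odds_ratio, chi_square, p_value).
    """
    total_pass = sum(passed for passed, _ in classes.values())
    total_fail = sum(failed for _, failed in classes.values())
    results = {}
    for name, (a, b) in classes.items():
        # Peer group: every student NOT taught by this teacher.
        c, d = total_pass - a, total_fail - b
        n = a + b + c + d
        # Odds of passing in this class divided by odds of passing among peers.
        # Assumes no zero cells (a common fix is to add 0.5 to every cell).
        odds_ratio = (a * d) / (b * c)
        # Pearson chi-square for the 2x2 table, without continuity correction.
        chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
        # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
        p_value = math.erfc(math.sqrt(chi_square / 2))
        results[name] = (odds_ratio, chi_square, p_value)
    return results

# Counts taken from the thread (passed, failed); add the other five teachers here.
classes = {"Johnson": (174, 44), "Rogers": (27, 5)}
for name, (odds_ratio, chi_square, p_value) in compare_to_peers(classes).items():
    print(f"{name}: OR = {odds_ratio:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

With only these two classrooms, each teacher's peer group is simply the other classroom, so both rows share the same chi-square; with all seven classes each row gets its own peer group.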
Can anyone suggest an easier, faster way to compare the pass/fail rates of the 7 teachers, ideally in SAS? I would be most grateful! =)
1. Do you think that your "teacher willingness" criterion is at odds with your random assignment criterion?
2. Why do you want to exclude each teacher's class from the odds calculation? Since your experiment involves a census of all students, why not count everybody?
Please forgive me if this hypothetical scenario seems a little weird, and please remember that it is fictitious and purely hypothetical (and admittedly unrealistic). It is based on a real-life scenario that doesn't involve teachers, students, or tests of drawing ponies. Instead, the real-life scenario involves something comparable, but much more complicated to explain explicitly. That's why I made up this scenario, knowing in advance that some of the assumptions are unrealistic. For example, the idea that all students would be age 15 is unrealistic, as we all know there are no "homogeneous classrooms" of students, and variation in SES, race, sex, etc., abounds.
1. I see what you mean. This is a hypothetical scenario, and for my purposes I needed each sample to be a different size. Perhaps it would be better if we just pretend that each class size is different for reasons beyond our knowledge. In this scenario, it is important for each teacher to have a different number of students. I don't want to simply compare the 80% "pass rate" in the Johnson classroom (odds of passing 174/44 = 3.95) against the 86% pass rate in the Rogers classroom (odds of passing 27/5 = 5.4) without taking the differing class sizes into consideration. Odds of 3.95 may seem lower than odds of 5.4 at first glance, but that impression needs a test of statistical significance, which takes into account the size of each sample.
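To see how class size enters, here is a minimal sketch of the odds ratio between the 32-student classroom (27 passed, 5 failed; Rogers) and the Johnson classroom (174 passed, 44 failed), with a 95% confidence interval computed by Woolf's log method. The choice of Woolf's method and the function name `odds_ratio_ci` are assumptions for illustration; the standard error is driven by the reciprocals of the cell counts, so the small class (and especially its 5 failures) dominates the uncertainty.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio of group 1 (a passed, b failed) versus group 2 (c passed, d failed),
    with a Woolf (log-scale) confidence interval; z=1.96 gives roughly 95%."""
    odds_ratio = (a / b) / (c / d)
    # Standard error of log(OR): small cell counts make this large.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)

# Rogers (27 passed, 5 failed) versus Johnson (174 passed, 44 failed), counts from the thread.
or_, lo, hi = odds_ratio_ci(27, 5, 174, 44)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The odds ratio is about 1.37 (5.4 / 3.95), but the interval runs from roughly 0.5 to 3.7 and includes 1, so these two classrooms are not distinguishable at the 5% level.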
I will answer your question #3 next, because it helps answer your question #2.
3. I want to measure each teacher's "leniency" in passing students. I am assuming that the test assignment is the same for all students, that the outcome of the test is completely subjective and at the teacher's discretion, and that there is very little random variation among the students; I also need to take into account that not all teachers taught the same number of students. Some teachers might be more likely than others to pass students because they are more lenient. Essentially, I want to see whether some teachers are "stricter" or "more lenient" than some sort of "benchmark." But I am having difficulty deciding on a "benchmark" against which each teacher should be compared. Should the benchmark be the overall odds of passing for the total population? That leads me to your question #2 . . .
2. In comparing rates between two groups (such as group #1 = Rogers' classroom and group #2 = peers), I don't want to intermingle the data for the two groups being compared, because that seems strange to me. Do you agree? My concern is that the rate for one classroom might be an extreme outlier that influences the overall population rate, which is why I thought there might be a better way to "benchmark" the rate against which each classroom is compared.
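The concern about one classroom pulling the benchmark can be made concrete with a little algebra. Writing N for the total number of students, n_i and p_i for teacher i's class size and pass rate, p_{-i} for the pass rate of her peers (everyone else), and p for the overall rate, the overall rate is a weighted average of the two groups, so the gap from the overall rate is a scaled-down copy of the gap from the peer rate:

```latex
\begin{aligned}
p &= \frac{n_i\,p_i + (N - n_i)\,p_{-i}}{N} \\
p_i - p &= p_i - \frac{n_i\,p_i + (N - n_i)\,p_{-i}}{N}
         = \frac{N - n_i}{N}\,\bigl(p_i - p_{-i}\bigr)
\end{aligned}
```

With N = 1050, the factor (N - n_i)/N is about 0.970 for the 32-student class and about 0.734 for the 279-student class. So including a teacher in her own benchmark barely matters for small classes, but it shrinks the apparent difference by roughly a quarter for the largest one.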
You should just describe your actual scenario; this hypothetical is too arduous. We all work with cumbersome data and scenarios; this just seems silly.
If you just want basic odds ratios, run proc freq with the oddsratio option, and make sure you adjust your significance level to account for the 20+ comparisons. For example, you might set alpha to 0.001.
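The "20+ comparisons" figure matches comparing every pair of the 7 teachers, which gives 21 tests. A Bonferroni correction (named here as one standard choice; the reply above does not specify a method) divides the overall alpha by the number of tests. The sketch below, with the hypothetical helper `bonferroni_alpha`, computes that threshold for the pairwise design and for the 7 teacher-versus-peers tests.

```python
from math import comb

def bonferroni_alpha(family_alpha, num_tests):
    """Per-test significance level that keeps the family-wise error rate at or below family_alpha."""
    return family_alpha / num_tests

pairwise_tests = comb(7, 2)  # every teacher against every other teacher
peer_tests = 7               # each teacher against her own peer group

print(pairwise_tests, round(bonferroni_alpha(0.05, pairwise_tests), 5))
print(peer_tests, round(bonferroni_alpha(0.05, peer_tests), 5))
```

This gives about 0.0024 for 21 pairwise tests and about 0.0071 for 7 peer-group tests, so an alpha of 0.001 is somewhat stricter than Bonferroni requires in either design.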
It isn't prospective data. The tests are completed and finished. I didn't think it was that complicated a scenario. We are just measuring how often a subjective decision is made (passing or failing a student), and we want to compare the rates to see whether some teachers are being too lenient in that decision.
Thank you for letting me know that this is just too silly for you. I won't bother you any further.
I did not say it was too silly; it's just difficult to explain the particulars if you are not writing directly about what you are doing. In the long run, it probably took more time, since you had to shape the hypothetical to fit the actual scenario and deal with people's confusion.
When people do lost-to-follow-up or non-respondent analyses to see whether a subgroup differs from the sample, they typically compare the subgroup to the total sample including the subgroup itself, but I don't think you would compute ORs for that. Perhaps someone else can respond if they have an opinion.