I'd like some feedback on two approaches I've considered for an analysis. Here is the setup:

Unit of analysis: Physician

Variables: Physician Specialty, Number of Total Patients they take care of, Number of patients who are diabetic.

A typical line of data looks like this:

Code:

```
Physician  Specialty     Number of Patients  Number of Patients Diabetic
1          Cancer        50                  25
2          Internal Med  30                  10
3          Pediatrics    20                  5
4          Cancer        40                  5
```

N = 10,000 physicians in total.

Our first approach is to compute a rate for each physician (number of diabetic patients divided by total number of patients) and use that as the outcome. Once we have these rates, we run a simple one-way ANOVA on them, with specialty as the between-groups factor.
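To make the first approach concrete, here is a minimal sketch in plain Python. It uses only the four illustrative rows shown above (not the full 10,000 physicians), and the helper name `one_way_anova_f` is my own, not from any library. With so few rows the F value itself means nothing; the point is only to show what gets computed.

```python
from collections import defaultdict

# The four illustrative rows from the post:
# (specialty, total patients, diabetic patients).
rows = [
    ("Cancer", 50, 25),
    ("Internal Med", 30, 10),
    ("Pediatrics", 20, 5),
    ("Cancer", 40, 5),
]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict {group: [values]}."""
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    ss_between = 0.0
    ss_within = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        ss_between += len(vals) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Step 1: one rate per physician, grouped by specialty.
rates_by_specialty = defaultdict(list)
for specialty, total, diabetic in rows:
    rates_by_specialty[specialty].append(diabetic / total)

# Step 2: one-way ANOVA on those rates.
f_stat, df_between, df_within = one_way_anova_f(rates_by_specialty)
print(f"F({df_between}, {df_within}) = {f_stat:.4f}")
```

Note that each physician's rate enters with equal weight here, so a rate based on 20 patients counts as much as one based on 50.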

The second approach is to use a Poisson regression. Here I would model the number of diabetic patients as a function of the specialty, with the total number of patients as an offset term.
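As a sketch of what the second approach estimates: when specialty is the only predictor and log(total patients) is the offset, the Poisson maximum-likelihood estimate of each specialty's rate reduces to pooled diabetic patients divided by pooled total patients. The code below uses that closed form on the four illustrative rows instead of calling a GLM routine. The choice of Cancer as the reference level and the function name `pooled_rates` are assumptions for illustration only.

```python
import math
from collections import defaultdict

# The four illustrative rows from the post:
# (specialty, total patients, diabetic patients).
rows = [
    ("Cancer", 50, 25),
    ("Internal Med", 30, 10),
    ("Pediatrics", 20, 5),
    ("Cancer", 40, 5),
]


def pooled_rates(rows):
    """Poisson MLE of the rate per specialty: sum(diabetic) / sum(total)."""
    sums = defaultdict(lambda: [0, 0])
    for specialty, total, diabetic in rows:
        sums[specialty][0] += diabetic
        sums[specialty][1] += total
    return {s: d / n for s, (d, n) in sums.items()}


rates = pooled_rates(rows)

# Assumed reference level for the specialty factor.
reference = "Cancer"

# Intercept and coefficients on the log scale, as a log-link model reports them.
intercept = math.log(rates[reference])
coefs = {s: math.log(r) - intercept for s, r in rates.items() if s != reference}

# Exponentiated coefficients are rate ratios relative to the reference.
rate_ratios = {s: r / rates[reference] for s, r in rates.items() if s != reference}

for s, rr in rate_ratios.items():
    print(f"{s} vs {reference}: rate ratio = {rr:.3f}")
```

For the two Cancer physicians the pooled rate is 30/90, whereas the unweighted mean of their individual rates is (0.5 + 0.125)/2, so the two approaches weight physicians differently.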

I know the second way is probably more sound. My question is whether the first method is really that poor. We plan to submit this for journal publication eventually, so if going with option 2 will nip any reviewer comments in the bud, I'm happy to go there.

Note: what we have here isn't a rate per se but a proportion, since the diabetic patients are a subset of the total patients (i.e., # diabetic ≤ # total), so logistic regression would be more appropriate. However, for interpretive reasons, Poisson regression would be better for us and our audience.
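To illustrate the interpretive difference, here is a small sketch comparing the two effect measures on the illustrative rows. With specialty as the only predictor, both models fit the pooled proportion per specialty; the Poisson model reports a ratio of proportions, while logistic regression reports a ratio of odds. Comparing Pediatrics with Cancer is an arbitrary choice for the example.

```python
# Pooled counts from the four illustrative rows in the post.
cancer_diabetic, cancer_total = 25 + 5, 50 + 40
peds_diabetic, peds_total = 5, 20

p_cancer = cancer_diabetic / cancer_total
p_peds = peds_diabetic / peds_total

# What exp(coefficient) means under a log link (Poisson with offset).
risk_ratio = p_peds / p_cancer

# What exp(coefficient) means under a logit link (logistic regression).
odds_ratio = (p_peds / (1 - p_peds)) / (p_cancer / (1 - p_cancer))

print(f"ratio of proportions: {risk_ratio:.3f}")
print(f"odds ratio:           {odds_ratio:.3f}")
```

The two numbers diverge more as the proportions move away from zero, which is why the choice matters for how the results are read.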

TIA