How to assess the association of two categorical independent variables with a continuous dependent variable (Medical)

I'm studying the variation in cost of the same procedure (specifically, the total cost of disposable items used for stone laser surgery, aka ureteroscopy with laser lithotripsy) across different hospital sites and surgeons. My dataset contains information from 8 surgeons and 6 sites. My 1st hypothesis is that the site and surgeon predict cost, e.g., surgeon #1 may prefer a more expensive device, or site #3 may have only the cheaper item in their inventory. My 2nd hypothesis is that site is the more dominant factor (or even confounds the effect of the surgeon), since the surgeon may not get much say if the site's inventory doesn't carry the item the surgeon prefers to use.

So, I have two categorical independent variables (surgeons #1-8, sites #1-6), and a continuous dependent variable (total case cost in $). I know that I could model this with linear regression using dummy variables for the levels of each categorical predictor, or could use ANOVA, but I'm not sure which test is more appropriate. Thanks


Active Member
ANOVA and linear regression with dummy variables for categories are algebraically equivalent. They use the same statistical tests and give exactly the same p-values. More importantly, we need to find out whether these two methods are suitable for your problem. How many measurements do you have for each combination of surgeon and site?
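
To make the equivalence concrete, here is a minimal sketch in Python (NumPy only). The data are made up for illustration (3 hypothetical sites with unequal case counts), and the helper names `anova_f` and `regression_f` are my own, not from any library. It computes the one-way ANOVA F statistic from between/within sums of squares, then the overall F statistic of a dummy-coded least-squares fit. The two should agree to rounding error, and since the degrees of freedom are also the same, the p-values match.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 sites with unequal case counts, costs in dollars.
site = np.repeat([0, 1, 2], [12, 7, 9])
cost = np.array([900.0, 1100.0, 1000.0])[site] + rng.normal(0, 80, site.size)

def anova_f(y, g):
    """Classic one-way ANOVA F statistic from between/within sums of squares."""
    levels = np.unique(g)
    grand = y.mean()
    ss_between = sum((g == k).sum() * (y[g == k].mean() - grand) ** 2 for k in levels)
    ss_within = sum(((y[g == k] - y[g == k].mean()) ** 2).sum() for k in levels)
    df_b, df_w = len(levels) - 1, len(y) - len(levels)
    return (ss_between / df_b) / (ss_within / df_w)

def regression_f(y, g):
    """Overall F statistic of an OLS fit on an intercept plus dummy variables."""
    levels = np.unique(g)
    # Intercept column plus one 0/1 dummy per non-reference level.
    X = np.column_stack([np.ones(len(y))] + [(g == k).astype(float) for k in levels[1:]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()   # residual sum of squares
    tss = ((y - y.mean()) ** 2).sum()   # total sum of squares
    df_m, df_r = X.shape[1] - 1, len(y) - X.shape[1]
    return ((tss - rss) / df_m) / (rss / df_r)

f_anova = anova_f(cost, site)
f_reg = regression_f(cost, site)
print(f_anova, f_reg)  # the two F statistics should be identical up to rounding
```

The same identity holds with two factors, as long as both approaches fit the same model (same main effects, same interaction terms, same type of sums of squares).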


Fortran must die
ANOVA and linear regression are exactly the same method developed by different people using different nomenclature.

You probably want to do a power calculation to start with, to find out whether you have adequate power for the test. For federal analysis this is pretty much required these days.

Thank you for clarifying this.

@staassis - I have 18 surgeon x site combinations/interactions, and 402 total cases. Exact number of cases per combination is 30, 4, 4, 19, 32, 1, 1, 30, 18, 13, 21, 2, 56, 20, 64, 9, 28, 50.

@noetsi - great point. I guess the first question is what test would be most appropriate before doing the power calculation?


Active Member
Yes, run a linear regression where the categories of Surgeon and Site are coded as dummy variables. If the residuals are not normally distributed, use the bootstrap to calculate standard errors. SPSS allows this easily. The second convenient choice is R.
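
For anyone who would rather script it, the same recipe can be sketched in Python with NumPy alone. Everything below is illustrative: the data are simulated (240 hypothetical cases, 4 surgeons, 3 sites, deliberately right-skewed noise), the effect sizes are invented, and `dummies` and `fit` are helper names I made up. It fits cost on surgeon and site dummies by least squares, then uses a case-resampling bootstrap (refitting on rows drawn with replacement) to get standard errors that do not rely on normal residuals.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 240 cases, 4 surgeons, 3 sites, right-skewed cost noise.
n = 240
surgeon = rng.integers(0, 4, n)
site = rng.integers(0, 3, n)
site_effect = np.array([0.0, 150.0, -100.0])         # invented effects, in dollars
surgeon_effect = np.array([0.0, 40.0, -30.0, 60.0])  # invented effects, in dollars
cost = 1000.0 + site_effect[site] + surgeon_effect[surgeon] + rng.exponential(100.0, n)

def dummies(codes, n_levels):
    """0/1 columns for levels 1..n_levels-1; level 0 is the reference."""
    return (codes[:, None] == np.arange(1, n_levels)[None, :]).astype(float)

def fit(y, surg, st):
    """OLS coefficients: intercept, 3 surgeon dummies, 2 site dummies."""
    X = np.column_stack([np.ones(len(y)), dummies(surg, 4), dummies(st, 3)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_hat = fit(cost, surgeon, site)

# Case-resampling bootstrap: refit the model on rows drawn with replacement.
n_boot = 1000
boot = np.empty((n_boot, beta_hat.size))
for b in range(n_boot):
    idx = rng.integers(0, n, n)
    boot[b] = fit(cost[idx], surgeon[idx], site[idx])
boot_se = boot.std(axis=0, ddof=1)  # bootstrap standard error per coefficient

names = ["intercept", "surgeon2", "surgeon3", "surgeon4", "site2", "site3"]
for name, est, se in zip(names, beta_hat, boot_se):
    print(f"{name:10s} {est:9.1f}  (bootstrap SE {se:6.1f})")
```

Each dummy coefficient is the estimated cost difference from the reference surgeon or site, holding the other factor fixed. One caveat for data like yours: with cells of only 1 or 2 cases, some bootstrap resamples will contain no cases for a given level, so the corresponding coefficient is not identified in that resample. The sketch does not handle that, and it would need some care (for example, resampling within surgeon-by-site cells).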