New to hierarchical regression, looking for advice

Hey all,

I am relatively experienced in plain old OLS and I'm trying to learn how to do mixed-effects, hierarchical modeling. I got Andrew Gelman's textbook on the subject and am 60% of the way in and feeling ready to dive in and give it a shot. But, I'm having some trouble getting started and wanted to reach out here and see if anyone could give me some advice and see if I'm going in the right direction. I'm just going to lay out the problem I'm trying to model and the specifications I've come up with. I appreciate any feedback people can provide about my approach or other ideas. The post is really, really, long, so thank you so much in advance for taking your time to help me!

The problem:

I am trying to model the cost of medical procedures. My Y variable is the log(price), and my X variables are the facility the procedure is done at, the procedure_name, and the insurer. My Y is continuous, and my x variables are almost all categorical/factor. These variables have a non-nested, hierarchical structure, I think, as follows:

20 procedure categories -> 200 procedures (I also know the medicare price for each procedure)
3 facility categories -> 200 facilities
6 insurers

With OLS fixed-effects, I have to choose between no-pooling and complete-pooling.

Model 1: complete pooling


This model is missing out on a lot of interaction. Certain facilities obviously may charge a lot for certain procedures etc. and certain faciltiies charge less for certain payers and more for others.

Model 2: Very little pooling

Log(price)=B1(Procedure_id)+B2(facility_id)+B3(insurer_id)+B4(Facility_id)x(procedure category)x(payer)

This model almost perfectly fits the data, which can cause some overfitting.

The hierarchical model seems appropriate in that it will give a compromise between those two models and also be better able to take in new procedures and faciltiies.

Here is my shot at a hierarchical model:



There's an option in stata for unconstrained variance/covariance matrix, and I thiiiink that takes care of the interaction terms? I'm not really sure. This is the way I think of these models, rather than the different error terms, although I realize they're equivalent and I should learn to think of them with different error terms. Especially because right now I'm trying to code this in stata and that's how their documentation is set up. If I need to put the interaction terms back in obviously I can do that.

Does this look at all right?! Do you have any idea how I code this in stata? I'm looking at the "mixed" command and I can't figure out how to handle the non-nested hierarchical element. After I code it in stata I want to learn to do it in python using pi-sci or something.