Hierarchical Linear Modeling (HLM) Assistance

Sorry if this gets posted twice (it doesn't look like it has posted the first time as of now).

I asked a question about fixed vs. random effects several weeks ago (http://www.talkstats.com/showthread.php/70066-Fixed-vs.-Random-Effects) to better understand the difference between the two.

Now that I am actually ready to run my model I'm hoping someone with some experience in HLM might be able to provide some assistance.

A bunch of practices responded to a survey where we asked them some questions on a particular topic. We will sum their responses to these questions to get an overall score, which can range from 0 to 60.

The data is set up so that each row is a unique person and everyone has data pulled for the same time period: JANUARY 2016 represents month 1 for everyone and we pull their prior 12 months of data (so Jan 2015 - Dec 2015) as well as 12 month post data (so Jan 2016 - Dec 2016).

Every member belongs to only a single practice. All members have the full 24 months of follow-up time.

These are the variables that are associated with the practice (not a complete list). These should be the random effects, if I'm not mistaken.
  • survey score
  • number of doctors at the practice
  • practice ID

These are the variables that are associated with the individual member (not a complete list). These should be the fixed effects, if I'm not mistaken.
  • product (i.e. commercial, medicare, medicaid)
  • prior 12 month total cost
  • post 12 month total cost
  • number of chronic diseases

The goal is to evaluate if there are differences in POST 12 MONTH TOTAL COST between practices and to evaluate if the survey score plays any sort of role in the post cost (the belief is that the higher the score, the lower the costs).

I believe HLM is a good approach for this type of problem because the people are nested within a particular practice (similar to the classroom/student example I've seen in lots of articles).
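
For readers following along, the nesting described above (members within practices) corresponds to a two-level random-intercept model. A minimal sketch, where the notation is assumed for illustration and not taken from the post: i indexes members, j indexes practices, and only one member-level and one practice-level predictor are shown.

```latex
\begin{aligned}
\text{cost}_{ij} &= \beta_{0j} + \beta_{1}\,\text{prior}_{ij} + e_{ij}, & e_{ij} &\sim N(0,\sigma^{2}) \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{score}_{j} + u_{0j}, & u_{0j} &\sim N(0,\tau^{2})
\end{aligned}
```

In this notation, \gamma_{01} is the coefficient on the practice's survey score, and u_{0j} is the practice-specific deviation that a statement such as `random intercept / subject=practice_id` asks PROC MIXED to model.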

Right now, I have this as code to try and answer this question:
proc mixed data=survey noclprint noitprint covtest;
   class practice_id product;
   model post_cost = survey_score doctor_count
                     prior_cost product disease_count
         / ddfm=residual solution outp=survey_pred;
   random intercept / subject=practice_id;
run;

I'm not sure whether this is the correct use of PROC MIXED, and I'm mainly confused about what needs to go on the RANDOM line. Right now I just have the intercept, but I'm not sure whether the SURVEY_SCORE and DOCTOR_COUNT variables need to go there as well, since they are random effects. (When I put them there, the resulting output doesn't really seem to show parameter estimates or anything, which is where I am getting confused.)

Any help would be greatly appreciated - this is a very new methodology to me and I'm a visual learner who likes to see working examples (but am having difficulty finding more than just theory and discussion on this topic).


Omega Contributor
Seems correct, but yes, you do need to list the random effects on the RANDOM statement along with the intercept. I believe they go right before the "/".

Feel free to post your results output or log output.

I typically only run an MLM about every 1.5 years, so I usually have to try and remember all of the details myself.

P.S., what are your individual provider and practice sample sizes?
The overall number of members is somewhere around 90,000 (I am using PROC SORT first to help the PROC MIXED run faster). The number of members per practice ranges from 100 to 1,000+ and there are about 200 unique practices in total.

I have 8 variables which would be considered RANDOM EFFECTS (i.e. practice-level effects):
1. the practice's survey points score (which is ultimately my variable of interest)
2. the number of physicians at the practice
3. the number of nurse practitioners at the practice
4. the number of physician assistants at the practice
5. the total size of the practice (in terms of how many patients belong to that practice)
6. the number of pharmacists at the practice
7. the number of care managers at the practice
8. the number of behavioral health specialists at the practice

When I add these 8 variables to the RANDOM statement, it results in the following (I'm just showing the intercept and survey score to keep things concise):

random intercept survey_score physician_count np_count pa_count mbr_count rx_count cm_count bh_count / type=VC subject=practice_ID;

Effect      Level   Estimate   Standard Error   DF         t Value   Pr > |t|
Intercept           -408.01    278.28           9.10E+04   -1.47     0.1426
SURV_PTS            -9.0612    10.727           9.10E+04   -0.84     0.3983

... and so on, which would suggest that, all else being equal, each additional point on the practice's survey score would yield a decrease in cost of $9.06 per member per year (though this is not statistically significant, p-value = 0.3983).

I also get this COVARIANCE PARAMETER ESTIMATES output (for the variables in my RANDOM statement), which I don't quite understand.
Covariance Parameter Estimates

Cov Parm          Subject         Estimate   Standard Error   Z Value   Pr > Z
Intercept         FAC_SITE_NAME   0          .                .         .
physician_count   FAC_SITE_NAME   0          .                .         .
NP_count          FAC_SITE_NAME   20924      15496            1.35      0.0885
PA_count          FAC_SITE_NAME   1203.92    3224.92          0.37      0.3545
MBR_count         FAC_SITE_NAME   0          .                .         .
RX_count          FAC_SITE_NAME   0          .                .         .
CM_count          FAC_SITE_NAME   0          .                .         .
BH_count          FAC_SITE_NAME   0          .                .         .
Residual                          1.75E+08   822510           213.28    <.0001

So my total code would look something like this (it's still a bit condensed since, again, there are a lot of variables):

proc mixed data=survey;
   class practice_id mbr_product mbr_sex cancer_flag diabetes_flag;

   model post_12_month_cost = survey_pts
                              physician_count NP_count PA_count MBR_count RX_count CM_count BH_count
                              prior_12_month_cost mbr_product mbr_sex cancer_flag diabetes_flag
         / solution ddfm=residual;

   random intercept survey_pts physician_count NP_count PA_count MBR_count RX_count CM_count BH_count / type=VC subject=practice_id;
run;

If this all looks fine, that's great. I'm guessing the interpretation of the parameters would be the same as in any other type of regression (like how I described the $9.06 savings per member per year for each additional survey point above).

My questions:
1. Is what I did actually correct, at least from what anyone can tell?

2. Is the RANDOM line literally just for the effects considered random? I ask because in an article using students/classrooms, the authors used a pre-test score in the RANDOM line (which would be a fixed student effect) but did not put any random classroom effects in it.

3. What exactly is happening with all of the RANDOM effects in that line? I mean I guess I understand it's calculating different slopes for each value?

4. When I have a random effect that is categorical and add it to the RANDOM line, none of the parameter estimates seems to be output. Does this suggest that only continuous variables can be random effects (at least in this line of code)?


Omega Contributor
Yeah, I don't use proc mixed much at all. But the general structure seems fine. You have a lot of variables that seem like they may be expressing the same types of information (e.g., #PCPs, #PAs,..., etc.) and a lot of data, so in theory it should be able to hold them, but many times simplicity is best in modeling. You may be able to reduce the model saturation using -2 Log likelihood tests.
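
As a rough illustration of the -2 log likelihood comparison mentioned above, here is a minimal sketch in SAS. It assumes the dataset and variable names used earlier in the thread; the output dataset names (fit_full, fit_reduced, lrt) and the choice of which predictors to drop are hypothetical. METHOD=ML is used here because the two models differ in their fixed effects, and the test assumes the reduced model is nested in the full one.

```sas
/* Sketch only: fit_full, fit_reduced and lrt are hypothetical dataset names. */

/* Full model: includes the NP and PA counts as fixed effects. */
proc mixed data=survey method=ml;
   class practice_id mbr_product;
   model post_12_month_cost = survey_pts physician_count NP_count PA_count
                              prior_12_month_cost mbr_product / solution;
   random intercept / subject=practice_id;
   ods output FitStatistics=fit_full;    /* capture the fit statistics table */
run;

/* Reduced model: drops NP_count and PA_count (2 fewer parameters). */
proc mixed data=survey method=ml;
   class practice_id mbr_product;
   model post_12_month_cost = survey_pts physician_count
                              prior_12_month_cost mbr_product / solution;
   random intercept / subject=practice_id;
   ods output FitStatistics=fit_reduced;
run;

/* Likelihood ratio test: the difference in -2 log likelihood is compared
   to a chi-square with df = number of parameters dropped. */
data lrt;
   merge fit_full   (where=(Descr='-2 Log Likelihood') rename=(Value=m2ll_full))
         fit_reduced(where=(Descr='-2 Log Likelihood') rename=(Value=m2ll_reduced));
   lr_stat = m2ll_reduced - m2ll_full;
   df      = 2;                          /* NP_count and PA_count were dropped */
   p_value = 1 - probchi(lr_stat, df);
run;

proc print data=lrt;
   var m2ll_full m2ll_reduced lr_stat df p_value;
run;
```

A large p_value would suggest the dropped predictors add little to the fit, though whether to drop them is still a judgment call about what the model is meant to measure.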

Not sure what is up with your categorical variables; I'm not sure why you can't incorporate them.
Ok, well glad to know that the lack of output is not a normal thing.

Yes, I agree that the counts of different medical staff types would seem to be reporting the same thing, but I do technically need them. What the survey was trying to get at was how "mature" each practice is regarding their Advanced Medical Home status. In theory, these "mature" practices should have care managers, behavioral health specialists, pharmacists... all on staff and incorporated into how they treat their patients. So the reason I believe I truly do need each of these is that we are trying to measure how well they have all been incorporated into their practice (and if a practice has 0 of some of these professionals, I want my model to reflect that).

Actually, these counts of professionals were the categorical variables I was trying to include in my random statement with no success. I categorized each variable as 0=0 pharmacists, 1=1 pharmacist, 2=2+ pharmacists, for example. But since that was not working, I had to just leave them as the numeric total number of pharmacists.

Regarding the comment about not using PROC MIXED, is there another SAS procedure you use instead or do you use R or some other software altogether?

Thanks again for the help! Glad to know I wasn't too far off on my approach. I also found that I can just make my categorical variables 1/0 binary and then just not put them in the CLASS statement and get the same results - that way I can still put them in the RANDOM statement and it works fine (still not sure what it is about something being treated as categorical that my SAS does not like).
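
The 1/0 recoding described above could be done in a DATA step along these lines. This is a minimal sketch for the pharmacist count only; the output dataset name (survey_coded) and the indicator names (rx_1, rx_2plus) are hypothetical, and 0 pharmacists is assumed to be the reference group.

```sas
/* Sketch only: survey_coded, rx_1 and rx_2plus are hypothetical names. */
data survey_coded;
   set survey;

   /* Grouping from the thread: 0 pharmacists (reference), exactly 1, 2 or more.
      A comparison in parentheses evaluates to 1 when true and 0 when false. */
   rx_1     = (RX_count = 1);
   rx_2plus = (RX_count >= 2);

   /* A missing count would otherwise be coded 0/0 and look like the
      reference group, so set both indicators to missing instead. */
   if missing(RX_count) then call missing(rx_1, rx_2plus);
run;
```

Because rx_1 and rx_2plus are already numeric 1/0 variables, they can be listed on the MODEL statement without appearing in CLASS, which is the workaround described in the post.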