Multilevel analysis -SAS


Not a robit
I feel like I am on a run of using macros this week. I will see if I can find it in the next 5 minutes.

Side note, don't forget to turn on ODS graphics every now and then. Sometimes it surprises you by kicking out residual plots.


Fortran must die
I sent the lead author an email. She is on sabbatical until August 2017 lol. That looks like the macro.....

My real problem with the macro is I do not have access to PROC IML (and won't be getting it where I work) and it does not look like the macro works without it. I do not know if SAS added it to their stats packages.


Fortran must die
The following author is very good at explaining HLM in the context of SAS. But they continually refer to a second-level identifier variable in all their models, and even after reading the HLM literature I have no idea what that is :p

This is an example of what I mean.

The PROC MIXED syntax and relevant output for the two-level unconditional growth model is shown below. Note that for this example childid is our level-2 identification variable so we use it on the CLASS statement and the sub = option on the RANDOM statement. Reading is our criterion variable and we do not have any predictor variables in this model.
It is on page 11, about halfway down the page.

proc mixed covtest noclprint data = temp method=ml;
class childid;
model reading = /solution ddfm = SATTERTHWAITE;
random intercept /sub=childid type=vc;
run;


Not a robit
Yeah, I read that paper before. If I remember correctly there is also a video from the authors from that SAS User Group Conference.

I think the answer is so simple you can't see it. The whole purpose of MLM is to handle observations nested within a 2nd-level group. My typical example is a patient contributing more than one observation. If I neglected to acknowledge that the observations were not independent, I would be breaking a basic assumption. So I need to use MLM and control for the patient ID variance/covariance, and that ID is the second-level identifier. Other examples: students nested in classrooms, people nested in states, etc. Does that help you?


Fortran must die
Is the id something which identifies the 2nd level group (say school in my case) or something which identifies the individual, the first level variable? This is what confuses me. I understand why they have to do it, but not what it is identifying.

While I am at it, since I have not found this, do you know how you identify predictors for the second level variables - the ones listed in the random statement? These are the W in the typology some HLM authors use. For example what variables predict the intercept at the 2nd level.


Not a robit
Yes, it is the identifier at the 2nd level. For example, state (Iowa, Florida, Missouri, etc.), given that observations are nested in this variable.

Side note, I have read that, especially if using GLIMMIX or MIXED, it is best to sort by this variable before running the model. It can save processing time and reduce convergence failures.
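A minimal sketch of that sorting step, assuming a hypothetical dataset mydata with a level-2 identifier called state (neither name comes from this thread):

```sas
/* Hypothetical names: mydata (input dataset), state (level-2 identifier). */
/* Sort so that all observations from the same level-2 unit are contiguous. */
proc sort data=mydata out=mydata_sorted;
    by state;
run;
```

You would then point PROC MIXED or PROC GLIMMIX at mydata_sorted and use state as the SUBJECT= variable.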

Do you know the difference between random intercepts and random slopes? If not, do an internet image search on the topic. That will help you understand the difference and when to include a variable on the RANDOM statement.


Fortran must die
I have read that as well.

I am not sure what you mean by your last statement. I know the difference between a random intercept and random slope modeled at the 2nd level. And how SAS addresses that. I do not know how SAS handles predicting anything random at the second level.

For example say englishscores was a predictor at the first level which you were modeling at the 2nd level (school). And you had a second level predictor of englishscores, the percent of students in a school who spoke English (perscheng). How in the code would you indicate that this (perscheng) was a second level predictor?


Not a robit
Example per the Ubiquity paper you referenced:

PROC GLIMMIX DATA=religion method=quad(qpoints=10);
    CLASS country;
    MODEL ReligiousAttendance = Female|GINI College Urban Educ|Female Income Single Divorced Widowed / SOLUTION LINK=LOGIT DIST=B;
    RANDOM INT Female income / SUBJECT=Country;
RUN;

Variables listed on the RANDOM line have effects that vary at the 2nd level (across countries here), and you can see they seem to be testing cross-level interactions in the model as well.


Fortran must die
I apologize for my confusion. How does it know which are the 2nd-level variables and which are the variables you are using to predict the 2nd-level variables? For example, in the MODEL statement the predictors are to the right of the variable you are predicting.

In the HLM software, predictors of 2nd-level variables are shown as distinct from predictors at the first level. That follows the nomenclature of Raudenbush and Bryk, who show predictors at the different levels in separate equations. Probably not coincidentally, they wrote the HLM software. I have a feeling that is unique to their approach, and SAS treats all predictors the same regardless of what level they are on.


Not a robit
Yup, you list all predictors after the "=" (the RHS), and any of them also listed on the RANDOM line are treated as random effects that vary at the 2nd level. You can also have cross-level interactions, as seen in the syntax example above.
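To make that concrete for the englishscores/perscheng question, here is a rough sketch rather than a definitive specification. The dataset name students and the outcome variable outcome are hypothetical; englishscores, perscheng, and school are the names used earlier in the thread. It assumes perscheng has the same value for every student in a school.

```sas
/* Sketch only. Hypothetical names: students (dataset), outcome (level-1 DV). */
/* englishscores is a level-1 predictor; perscheng is constant within school, */
/* so it acts as a level-2 predictor even though the syntax does not say so.  */
proc mixed data=students covtest noclprint method=ml;
    class school;
    /* perscheng predicts the school intercept; englishscores*perscheng is   */
    /* the cross-level interaction (perscheng predicting the englishscores   */
    /* slope).                                                               */
    model outcome = englishscores perscheng englishscores*perscheng
          / solution ddfm=satterthwaite;
    /* Random intercept and random englishscores slope across schools. */
    random intercept englishscores / subject=school type=un;
run;
```

Nothing marks perscheng as "level 2" in the code. It sits on the MODEL statement like any other predictor, and what an HLM-style setup would call a predictor of the slope shows up here as the interaction term.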


Fortran must die
Personally I prefer the HLM approach because it makes it clear conceptually what you think is predicting a specific level-2 effect. But it's good to finally understand what SAS does.
I remember the R code seemed even more cryptic. I tried a model in R and it seemed that things on the right-hand side that came after a pipe ("|") were random effects. But that was a few years ago.


Fortran must die
One thing that I find confusing about multilevel approaches is that at times they seem to suggest that some variables influence the DV indirectly, through their impact on the group, which then influences the first-level DV. This is particularly obvious in, say, Raudenbush and Bryk, on whom my training in HLM was based. They have separate equations for 2nd-level and higher units (like school or hospital). It's true that they eventually combine these higher-level equations into an overall model, but at least in my classes that did not get a lot of emphasis.
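For reference, here is how the separate equations collapse into one, sketched for a single level-1 predictor X and a single level-2 predictor W (generic symbols in the Raudenbush and Bryk style, not taken from any model in this thread):

```latex
% Level 1 (individual i in group j)
Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}

% Level 2 (group j)
\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}
\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}

% Combined model, after substituting level 2 into level 1
Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij}
         + u_{0j} + u_{1j} X_{ij} + r_{ij}
```

In the combined form the "indirect" effect of W on the slope is just the interaction term W_j X_ij. The gamma terms are what go on the MODEL statement in SAS, and the u terms correspond to the RANDOM statement.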


Fortran must die
“Significance testing for variances in level 2 residual variance/covariance matrix G provides information about which level 1 slope coefficients are random. If the variance is not statistically significant you remove the random coefficient, …”

This is testing whether you should have a random component for a level-1 variable. How do you do this in SAS? That is, where do you find the results, and is there a specific test you need to specify?


Not a robit
There are two or three COV statement tests. Did you buy the book I recommended?

You can also use AICC comparisons along with -2 log-likelihood tests during model building.
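A rough sketch of the -2 log-likelihood comparison for one random slope, under stated assumptions: the names students, outcome, englishscores, and school are hypothetical, both models keep the same fixed effects, and the ODS table FitStatistics is assumed to carry the usual Descr and Value columns. The COVTEST option on the PROC MIXED statement additionally prints Wald Z tests in the "Covariance Parameter Estimates" table.

```sas
/* Sketch only. Hypothetical names: students, outcome, englishscores, school. */

/* Model A: random intercept only. */
proc mixed data=students method=reml covtest;
    class school;
    model outcome = englishscores / solution;
    random intercept / subject=school type=un;
    ods output FitStatistics=fit_a;
run;

/* Model B: adds a random slope for englishscores. With type=un this adds  */
/* two covariance parameters: the slope variance and the intercept-slope   */
/* covariance.                                                             */
proc mixed data=students method=reml covtest;
    class school;
    model outcome = englishscores / solution;
    random intercept englishscores / subject=school type=un;
    ods output FitStatistics=fit_b;
run;

/* Difference in -2 log-likelihood, referred to a chi-square with 2 df. */
data lrt;
    merge fit_a(rename=(Value=m2ll_a)) fit_b(rename=(Value=m2ll_b));
    if Descr =: '-2';                 /* keep only the -2 log-likelihood row */
    chisq   = m2ll_a - m2ll_b;
    df      = 2;
    p_naive = 1 - probchi(chisq, df);
run;

proc print data=lrt;
run;
```

Because a variance cannot go below zero, the null value sits on the boundary of the parameter space, so the naive chi-square p-value tends to be conservative. Treat it as a guide alongside AICC rather than a strict cutoff.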


Fortran must die
I am not sure which book you mean. I order them from the state library. Not being a fabulously wealthy medical researcher I don't buy anything. :p

I am reading Wang et al., Multilevel Models (which might be the book you recommended). They make a fascinating point. SAS does distinguish between level-1 and level-2 variables even though it's not in the code (as it would be, for example, with the HLM software). SAS automatically treats something as a level-2 variable (a predictor at level 2) if it varies across groups but remains constant within a group. That is one reason you have to specify a group and an individual identifier.
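If that is right, the level of a predictor is a property of the data rather than of the syntax, so it is worth checking before modeling. A minimal sketch, assuming a hypothetical dataset students with group identifier school and a candidate level-2 predictor perscheng:

```sas
/* Sketch only. Hypothetical names: students, school, perscheng. */
/* Lists any school where perscheng takes more than one value.   */
/* No rows returned means perscheng is constant within school,   */
/* i.e. it behaves as a level-2 predictor.                       */
proc sql;
    select school,
           count(distinct perscheng) as n_values
    from students
    group by school
    having count(distinct perscheng) > 1;
quit;
```

Any school that shows up in the output has within-group variation in perscheng, which would mean it is really a level-1 variable there (or the data need cleaning).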