Multilevel analysis

noetsi

No cake for spunky
The author linked below has an interesting discussion of diagnostics for an exploratory model, but some of his comments are unclear to me. The discussion is on pages 114 to 116.

The author says
The vertical spread between the lines indicates between classroom variation in terms of intercepts and is consistent with a random intercept model
What does "vertical spread" mean here? The slope of each line? The difference between the high and low points of each line? I don't understand this.

Similarly, on pages 115-116 the author says
The variability in the level of the classroom means differ between girls and boys; that is, the intercept variance may not be constant for gender. Most noticeable is the appearance of possibly two groups of boys—those with high segregation indices and those with lower values
I am not sure how he determines this. To me the difference is that at the top the slopes are downward, while at the bottom they are largely upward, which suggests the slope is the key. But I am not at all sure what the author is stressing here; it might also be a change in the vertical difference between the boy and girl ends of each line.

http://courses.education.illinois.edu/EdPsy587/GLM_GLMM_LMM.pdf
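A minimal sketch (plain Python, invented numbers, not the author's data) of one reading of "vertical spread": fit one line per classroom, and the lines sit at different heights because each classroom has its own intercept. The spread of the fitted intercepts is the between-classroom intercept variation; the slopes barely vary.

```python
import random
import statistics

random.seed(1)

# Invented data: 6 classrooms sharing one average line, but each classroom
# gets its own random intercept shift u_j.
n_classrooms, n_pupils = 6, 40
gamma00, gamma10, tau0, sigma = 10.0, 0.5, 3.0, 1.0  # assumed values

classroom_lines = []
for j in range(n_classrooms):
    u_j = random.gauss(0, tau0)  # classroom-specific intercept shift
    xs = [random.uniform(0, 10) for _ in range(n_pupils)]
    ys = [gamma00 + u_j + gamma10 * x + random.gauss(0, sigma) for x in xs]
    # ordinary least-squares line fitted within this classroom alone
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    classroom_lines.append((my - b * mx, b))  # (intercept, slope)

intercepts = [a for a, b in classroom_lines]
slopes = [b for a, b in classroom_lines]
# The lines sit at clearly different heights ("vertical spread" = intercept
# variation) while their tilts barely differ.
print("SD of intercepts:", round(statistics.stdev(intercepts), 2))
print("SD of slopes:", round(statistics.stdev(slopes), 2))
```

In a lattice or spaghetti plot of these six lines, the gaps between their heights are the vertical spread the author points to; stacked-but-parallel lines are exactly what a random intercept model implies.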

hlsmith

Less is more. Stay pure. Stay poor.
Noetsi,

Keep asking questions; I will do my best to mediocrely (sp?) answer them. Cross-level interaction terms are added by including a product term in the model statement, with one of the terms being a fixed effect and the other listed in the random-effects statement. I have only run a few formal MLM models, most with binary outcomes; see the link below for a simple example looking at resident radiation exposure (continuous).

So exposures are nested in physicians, and these physicians rotate through different services. Some services have a higher risk of radiation exposure (e.g., vascular, which uses fluoroscopy), and some physicians, due to their behaviors/practices, get exposed to more radiation (individual physicians are random effects). Now there is a multiplicative effect (interaction) when high-risk residents are on high-risk services. So the exposures are synergistic, more than just the two effects summed. I believe the interaction term is significant in the test statement, along with likely the -2LL, and you can probably output model predictions and see this there as well. Beyond predictions, I can't recall whether you can easily create a linear combination of terms from the combined model to get those estimates output directly.

http://www.jsurged.org/article/S1931-7204(16)30048-4/pdf#/article/S1931-7204(16)30048-4/fulltext
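A toy-number sketch of the "more than the two effects summed" point (all numbers invented, not from the paper):

```python
# Invented effects on exposure, on some arbitrary scale.
base = 1.0          # low-risk resident on a low-risk service
service_eff = 0.5   # extra exposure from a high-risk service alone
resident_eff = 0.3  # extra exposure from a high-risk resident alone
interaction = 0.4   # the synergy (interaction) term

def exposure(high_service, high_resident):
    # the combined-model mean: main effects plus the product term
    return (base + service_eff * high_service + resident_eff * high_resident
            + interaction * high_service * high_resident)

# What purely additive effects would predict for the joint case:
additive_expectation = exposure(1, 0) + exposure(0, 1) - exposure(0, 0)
print(round(exposure(1, 1), 2))        # 2.2: observed joint effect
print(round(additive_expectation, 2))  # 1.8: the two effects merely summed
```

The gap between the two printed numbers is the interaction term itself; with no interaction the joint effect would equal the additive expectation exactly.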

noetsi

No cake for spunky
I have decided to avoid binary outcomes, as the level of complexity in the discussions I have seen is much worse than for linear dependent variables. I ran into a good book on ML the other day which is unfinished; if you are interested I can send it to you. The author raised questions that made me realize I have just touched the surface of ML models (and how little my class taught me).

Thanks for the link.

This is the book I mentioned. I do not know if it was ever finished.
http://courses.education.illinois.edu/EdPsy587/GLM_GLMM_LMM.pdf


noetsi

No cake for spunky
I don't understand the logic here at all
On average no relationship between x_ij and y_ij may exist even though the effect of x_ij randomly differs over clusters. In such a case, the b for x_ij equals 0, but the explanatory variable x_ij should still be included in the model. For example, although gender is less important than ethnicity from a substantive point of view, the exploratory analysis indicated that there might be a random effect due to gender.
Why would you care whether something that has no average effect on Y has a random effect?
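One possible rationale (my reading, not the author's words): the average slope can be zero while every cluster has a strong effect, because positive and negative cluster slopes cancel. Toy numbers:

```python
import statistics

# Hypothetical cluster-specific slopes of y on x (invented numbers).
# Their average is 0, so the fixed effect b would be ~0 -- yet in every
# single cluster x predicts y, just in different directions.
cluster_slopes = [-1.4, -0.9, -0.5, 0.4, 1.0, 1.4]

print(round(statistics.mean(cluster_slopes), 2))   # 0.0: no *average* effect
print(min(abs(s) for s in cluster_slopes))         # 0.4: but no cluster is near 0
```

Dropping x because b = 0 would throw away all of that cluster-level variation, which is presumably why the author says x_ij should stay in the model.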

noetsi

No cake for spunky
Talking to spunky made me realize that this is what I really want to know. Our customers are nested inside units that provide services. I think, as do others, that units moderate service provision and other factors that influence customers. I need to model that. ML seemed ideal to do so, although the more I read about it the less I am sure of that.

I suspect that the groups generate different slopes for a given set of first-level predictors, but we have too many groups to analyze each slope separately. I am looking for a simpler way to do that; ML was my approach because it seems to be used for exactly this.

noetsi

No cake for spunky
If you have a statistically significant random effect for a given variable, say age, does that mean its impact on the DV varies by some grouping variable? And do you know which grouping variable it varies by? (I assume you have to have that group in the regression to test this, but nothing I have read addresses that point.)

If you know that a predictor's effect varies by group, is there any simple way to show the regression on that predictor (controlling for other predictors) at each level of the group? I have tens of groups, so a way to simplify this would be great.

noetsi

No cake for spunky
I know little about moderator variables, which seem like interactions to me but are apparently different somehow. The passage below, from the top of p. 13 in the link, goes to the heart of my confusion about multilevel variables.

The first equation under 2.4 states that the relationship, as expressed by the slope coefficient, between the popularity (Y) and the gender (X) of the pupil depends on the amount of experience of the teacher (Z). If the interaction coefficient is positive, the gender effect on popularity is larger with experienced teachers. Conversely, if it is negative, the gender effect on popularity is smaller with experienced teachers. Thus, the amount of experience of the teacher acts as a moderator variable for the relationship between popularity and gender....
http://joophox.net/mlbook2/Chapter2.pdf

What is the difference between a moderator effect and a cross-level interaction? To me they seem the same: the impact of the predictor on Y at the first level is influenced by the level of the 2nd-level predictor under both moderation and cross-level interaction. Just as importantly, it would seem that a 2nd-level variable might have limited direct effect on Y but significant impact on another predictor without being specified as an interaction effect. I am not sure how the combined equation handles this.
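As I read Hox's first equation under 2.4, moderation and a cross-level interaction are the same mechanism; the moderator simply lives at level 2. The classroom-specific slope is beta_1j = gamma10 + gamma11 * Z_j. With invented coefficients:

```python
# Invented coefficients for Hox's equation beta_1j = gamma10 + gamma11 * Z_j.
gamma10 = 0.8  # gender slope when teacher experience Z_j = 0
gamma11 = 0.2  # cross-level interaction: change in that slope per year of experience

def gender_slope(teacher_experience):
    # slope of popularity on pupil gender in classroom j
    return gamma10 + gamma11 * teacher_experience

print(round(gender_slope(2), 2))   # 1.2: gender effect with 2 years of experience
print(round(gender_slope(10), 2))  # 2.8: larger gender effect with 10 years
```

The positive gamma11 is exactly what "the gender effect on popularity is larger with experienced teachers" means; the combined equation gets this by carrying the product term gamma11 * X_ij * Z_j.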

hlsmith

Less is more. Stay pure. Stay poor.
Effect modification (a moderated relationship) is the same thing as interaction. It's more of a generic term in my opinion, since you can have additive or multiplicative interactions, and these can be antagonistic or synergistic.

noetsi

No cake for spunky
So the way to test for moderation is to specify a cross-level interaction? I keep thinking that random effects get at this, which is very different.

hlsmith

Less is more. Stay pure. Stay poor.
You have to take my recommendations with a grain of salt, but you can have first-level interactions, 2nd-level interactions, or cross-level interactions. You just need to list them in the model.

I get why you are confused about cross-level interactions, though these would consist of 1st- and 2nd-level variables interacting.

noetsi

No cake for spunky
I actually think I understand the interaction (although it is not clear to me that you can generate simple effects for a cross-level interaction as you can with a normal interaction, that is, the effect of one variable at a given level of another, which is recommended practice for interactions).

Some authors suggest you standardize the variables before you run a multilevel regression, and some suggest you convert the coefficients after you run it. The standardized coefficients won't differ much between the two approaches, but the variance components will, a lot. I was wondering which is considered best practice; the authors obviously disagree.
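For what it's worth, in a plain single-level regression the two routes give the identical coefficient. The sketch below (toy numbers) shows standardizing first versus converting the raw coefficient afterwards via b * sd_x / sd_y. In a multilevel model the fixed coefficients likewise stay close, but the variance components end up on different scales, which may be the discrepancy those authors are arguing about.

```python
import statistics

# Invented data, just to compare the two routes.
xs = [1.0, 2.0, 4.0, 7.0, 8.0]
ys = [2.1, 2.9, 5.2, 8.0, 9.3]

mx, my = statistics.mean(xs), statistics.mean(ys)
sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)

# Raw (unstandardized) OLS slope of y on x.
b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))

# Route 1: standardize the variables first, then regress.
zx = [(x - mx) / sd_x for x in xs]
zy = [(y - my) / sd_y for y in ys]
b_pre = sum(a * c for a, c in zip(zx, zy)) / sum(a * a for a in zx)

# Route 2: run the raw regression, then convert the coefficient afterwards.
b_post = b * sd_x / sd_y

print(round(b_pre, 4), round(b_post, 4))  # identical here
```

Note the equivalence is for the coefficient only: a variance component estimated on standardized data is reported in standardized-Y units, while one converted after the fact stays in raw units, so they are not directly comparable.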

noetsi

No cake for spunky
Plotting a regression line for each group is often recommended. However, I really want to run a multiple regression, since I have control variables that need to be in the model. Is there any way to visually get at a regression surface the way you would a line?

hlsmith

Less is more. Stay pure. Stay poor.
Didn't the Wang book use spaghetti plots if I remember right?
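One way to get a spaghetti plot that respects the control variables: residualize both the DV and the focal predictor on the controls first (the Frisch-Waugh idea), then fit and plot one line per group through the residuals; that per-group line is the control-adjusted relationship. A pure-Python sketch with invented data and a single control:

```python
import random
import statistics

random.seed(2)

def ols_slope(xs, ys):
    # slope of an ordinary least-squares line (intercept included implicitly)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Invented data: y depends on x, on a control c, and on the group
# (the true adjusted x-slope in group g is 0.5 + 0.3*g).
groups = {}
for g in range(4):
    xs = [random.uniform(0, 5) for _ in range(50)]
    cs = [random.uniform(0, 5) for _ in range(50)]
    ys = [1.0 + (0.5 + 0.3 * g) * x + 0.8 * c + random.gauss(0, 0.3)
          for x, c in zip(xs, cs)]
    groups[g] = (xs, cs, ys)

# Residualize y and x on the control within each group; the slope through
# the residuals equals the multiple-regression coefficient of x.
adjusted_slopes = {}
for g, (xs, cs, ys) in groups.items():
    ry = [y - ols_slope(cs, ys) * c for y, c in zip(ys, cs)]
    rx = [x - ols_slope(cs, xs) * c for x, c in zip(xs, cs)]
    adjusted_slopes[g] = ols_slope(rx, ry)
    print(g, round(adjusted_slopes[g], 2))
```

Plotting (rx, ry) with one fitted line per group gives a spaghetti plot of the adjusted relationships, which scales to tens of groups without running tens of separate full regressions by hand.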

noetsi

No cake for spunky
You want to use maximum likelihood if you can in multilevel models because it allows you to use the deviance to determine which model is better. But maximum likelihood is biased for variance parameters when the sample size is small. Specifically, the rule of thumb is that when N - q - 1 >= 50 it won't matter which approach you use. The problem is that while I know q is largely the number of level-2 predictors, I am unsure what N is. I assume it means the sample size, but the authors who mention it (Snijders and Bosker) do not define it (probably they assume it's obvious).

Does anyone know what N is?
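Whatever N is in their rule, the underlying reason is generic, and a toy simulation of the simplest case (a plain variance, no multilevel structure) shows it: the ML variance estimator divides by n and is biased low by the factor (n-1)/n, which only matters when the relevant n is small. (I can't tell from this whether their N counts observations or groups; the bias argument applies to whichever units the variance component is estimated from.)

```python
import random
import statistics

random.seed(0)

true_var = 4.0
n = 5  # a deliberately small sample, where the ML bias is worst
ml_est, corrected = [], []
for _ in range(20000):
    xs = [random.gauss(0, true_var ** 0.5) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    ml_est.append(ss / n)           # ML-style estimate: divides by n
    corrected.append(ss / (n - 1))  # REML-style correction: divides by n - 1

print(round(statistics.mean(ml_est), 2))     # ~3.2, i.e. 4 * (n-1)/n: biased low
print(round(statistics.mean(corrected), 2))  # ~4.0: unbiased
```

With n = 50 the same factor is 49/50, which is why the bias becomes ignorable once the relevant sample count is large.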

noetsi

No cake for spunky
In multilevel models with random slopes the observations are heteroscedastic because their variances depend on the explanatory variables. ... However, their residuals are assumed to be homoscedastic. ... In chapter 8 it was explained that the multilevel model can also represent models in which the level-1 residuals have variances depending on an explanatory variable, say X. Such a model can be specified by the technical device of giving this variable X a random slope at level 1.
I do not understand this; any comments would be appreciated. In particular, I do not understand whether you do or do not need to check multilevel models for heteroscedasticity.
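The quote's distinction can be put in formula form. With a random intercept and a random slope, the marginal variance of an observation is Var(y|x) = tau00 + 2*tau01*x + tau11*x^2 + sigma^2, so the observations' variance changes with x even though the level-1 residual variance sigma^2 is constant. A sketch with invented variance components:

```python
# Invented variance components for a random-intercept, random-slope model.
tau00 = 1.0   # variance of the random intercepts
tau01 = 0.1   # covariance of random intercepts and slopes
tau11 = 0.25  # variance of the random slopes
sigma2 = 0.5  # level-1 residual variance (constant: homoscedastic residuals)

def var_y(x):
    # marginal variance of y_ij given x_ij in the random-slope model
    return tau00 + 2 * tau01 * x + tau11 * x ** 2 + sigma2

for x in (0, 2, 4):
    print(x, round(var_y(x), 2))  # variance grows with x: heteroscedastic observations
```

So the observations are heteroscedastic by construction, while the homoscedasticity assumption you would check applies to the level-1 residuals e_ij, whose variance is the constant sigma2.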

Incidentally, Sir Fisher should be shot for using the term heteroscedastic - one of the hardest words to spell I have run into...

noetsi

No cake for spunky
I am really confused about how you inspect the residuals in multilevel models. Some suggest inspecting them separately, that is, at level one and at level two, and some suggest looking at the level-1 residuals separately for each group (which for me would be very time-consuming since I have 30-plus groups; I am also not sure what you do if some groups show unequal error variance and others do not).

Can you just inspect the residuals of the combined model? Similarly, I do not know how you check for violations of assumptions at the 2nd level, for things like non-linearity.

noetsi

No cake for spunky
Misspecifying the number of random effects and/or their covariance structure can also lead to biased point estimates when the outcome variable is discrete (Litière et al., 2007). Thus, when choosing to model clustered data with HLM, researchers with continuous outcomes and large sample sizes can be fairly confident that their results are robust to a misspecified covariance matrix or the exclusion of a random effect. However, with continuous outcomes with a small or moderate number of clusters, or with discrete outcomes, a violation of either assumption can adversely affect inference from model estimates.
When they say large sample sizes, do they mean the number of cases or the number of groups? I have seen both used to determine sample size. Is there a rule of thumb for what counts as large?

noetsi

No cake for spunky
Interpretation of this variation is easier when we consider the standard deviation, which is the square root of the variance. A useful characteristic of the standard deviation is that, with normally distributed observations, about 67 percent of the observations lie within one standard deviation above or below the mean. See 19 for how you calculate this. As an example, if the mean is .45 and the standard deviation .18, then 67 percent of the means lie between .45 - .18 and .45 + .18, and 95 percent lie between .45 - (2*.18) and .45 + (2*.18). The more precise value is 1.96 rather than 2 in the example above.
Where do you actually find this square root of the variance in the model output? I am not sure what they are using. This deals with analyzing how slopes vary across groups when a random effect is present.
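My understanding (hedged, since software differs) is that the output's covariance-parameter or random-effects table reports the slope *variance*; the standard deviation in the quote is just its square root, taken by hand. Redoing the book's arithmetic with those numbers:

```python
import math

# Numbers mirroring the book's example: the table reports the variance,
# you take the square root yourself.
slope_mean = 0.45    # the fixed (average) slope
slope_var = 0.0324   # variance of the random slope from the output table

sd = math.sqrt(slope_var)  # 0.18: "the square root of the variance"
one_sd = (round(slope_mean - sd, 2), round(slope_mean + sd, 2))
two_sd = (round(slope_mean - 1.96 * sd, 2), round(slope_mean + 1.96 * sd, 2))

print(one_sd)  # (0.27, 0.63): roughly the middle two-thirds of group slopes
print(two_sd)  # roughly the middle 95 percent
```

So the interval describes how far individual groups' slopes plausibly range around the average slope, assuming the random slopes are roughly normal.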

hlsmith

Less is more. Stay pure. Stay poor.
Just guessing, but the SEs of slopes are the stdev of the population.

I don't know the full context you are looking at, but the stdev of posterior densities (from Bayes) is analogous to the standard errors from frequentist approaches.