# Guidance on HLM/multilevel modeling

#### noetsi

##### Fortran must die
The problem with that, to me, is that nearly all data is nested in something, so all SEs would always be biased. So all OLS analysis would be incorrect (at least as far as CIs and tests are concerned). Also, unequal variance would always be part of OLS models. Obviously none of that is true.

What variables exist that cannot be conceptually nested in some other variable?

#### Dason

What exactly do you mean by all data is nested?

Sure, most data is nested in something. But if we only observe one group in that nest, then it doesn't matter and OLS works fine.

#### noetsi

##### Fortran must die
But how often would that actually occur outside a case study (which would be rare in quantitative methods)? For example, how often would you have just one school in your model, or just one of any group that a level-1 variable would nest in? It is not as simple as leaving out the groups (disaggregation). In most cases, leaving the higher-level (group) variable out of your model won't eliminate the problem (and indeed leaving out the grouping variable causes significant problems).

#### Dason

I guess I'm missing your point because it seems that you're arguing for using HLM instead of OLS.

#### noetsi

##### Fortran must die
I am arguing that if HLM proponents are right, OLS will always be invalid, because virtually all real-world data is nested and (according to them) will have significant errors as a result, whether or not the group variable is in the model. For example, you would expect unequal variance requiring WLS (as HLM uses). And clearly this is not the case: quite often OLS works fine and has none of these problems (the same problems would also exist in ANOVA).

#### trinker

##### ggplot2orBust
noetsi said:
One of the problems I have working with HLM is their argument that when data is nested, OLS will inherently have unequal error variance (heteroscedasticity) and observations will not be independent. Since all data is inherently nested in something, all OLS would be badly harmed by these conditions. And clearly that is not the case.

HLM, like many recent discoveries, has a habit of overselling itself and disparaging other methods that in fact work fine.
Let's say a raptor hypothetically wrote a paper, and that paper looked at state data for NEW State schools. And let's hypothetically say this raptor ran a model he knew on a random sample of schools from that database. This model was a hierarchical multiple regression (not to be confused with the "hierarchical" in HLM), chosen because this raptor was familiar with these tests, although he knew the data were likely structured and was somewhat concerned about biased estimates. So this handsome raptor used the Durbin-Watson test to check the independence of the error terms, though he knew the data were structured. The test was not significant, therefore indicating no autocorrelation. Would it be ethical to turn in the results as is, knowing the data were structured? Let's critique this raptor's actions in an honest way.

1. Is there likely to be nonindependence of errors with this method even though schools were randomly selected from the database?
2. If there is autocorrelation, are the estimators biased? I've seen people debate this both ways.
3. Is autocorrelation one of the signs of structure?
4. Is Durbin-Watson an appropriate technique for finding autocorrelation?
5. Is it shoddy statistics to publish these results knowing the data are structured, even if the test indicates otherwise and the possible problem is mentioned in the limitations?

I think the answers I get may not be what this raptor wanted to hear.

#### Jake

I second Gelman & Hill 2007. Pinheiro & Bates 2000 is also a useful reference, even though the syntax is all with the increasingly dated nlme package. Chapter 7 of Baayen 2008 (the author has posted a preprint of the book on his website) is also handy, albeit obviously less comprehensive than the others. The examples are from linguistics, but the upside of this is that he covers cross-classified data structures pretty extensively, all using the lme4 package. In my experience, cross-classified data (e.g., students crossed with, rather than nested under, classrooms) are far more common in experimental contexts than strictly hierarchical data, although the latter sometimes occur.

I recently ordered and just received Zuur et al. 2009, "Mixed effects models and extensions in ecology with R," but I haven't started digging into it yet, so I can't really say much else about it except to point out its existence.
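Jake's nested-versus-crossed distinction can be checked directly from the data. Here is a minimal sketch in Python, assuming each observation is a dict holding the two grouping labels. `is_nested` and the toy rows are hypothetical illustrations, not part of any package. A factor is nested in another when each of its levels appears under exactly one level of the other. Anything else is crossed or partially crossed.

```python
from collections import defaultdict


def is_nested(rows, lower, upper):
    """True if every level of `lower` occurs under exactly one level of `upper`."""
    parents = defaultdict(set)  # lower-level label -> set of upper-level labels seen with it
    for row in rows:
        parents[row[lower]].add(row[upper])
    return all(len(labels) == 1 for labels in parents.values())


# Hypothetical data: each student appears in only one classroom (nested).
nested_rows = [
    {"student": "s1", "classroom": "A"},
    {"student": "s2", "classroom": "A"},
    {"student": "s3", "classroom": "B"},
]
# Hypothetical data: each student is observed in several classrooms (crossed).
crossed_rows = [
    {"student": "s1", "classroom": "A"},
    {"student": "s1", "classroom": "B"},
    {"student": "s2", "classroom": "A"},
    {"student": "s2", "classroom": "B"},
]

print(is_nested(nested_rows, "student", "classroom"))   # True
print(is_nested(crossed_rows, "student", "classroom"))  # False
```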

#### noetsi

##### Fortran must die
I think that may all be true (I can't imagine what a handsome raptor looks like), but it does not really address my point. If HLM is correct in its assumptions, when can you ever validly run OLS? My answer would be: in practice, never, because all data is ultimately nested; all variables are ultimately nested.

To return to a more practical matter, I am working through a homework problem that asks what the critical value around the grand mean in a level-2 equation tells you about the existence of group differences (that is, whether they exist or not). The grand mean tells you about the intercept in the level-1 equation, but barring centering, does it really tell you whether the groups vary? I don't see how. The random term U in the level-2 equation, not the grand mean, tells you how groups vary.

Even more confusing to me: what is the difference (other than how they are calculated) between a CI and a plausible value range?
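For the simplest case, the two intervals can be written side by side. This is a minimal sketch, assuming a balanced unconditional random-intercept model whose grand mean γ00, between-group variance τ00, and within-group variance σ² have already been estimated. The function name and numbers are hypothetical.

- **CI:** uses the standard error of the estimated grand mean. For balanced data with J groups of size n, that is √((τ00 + σ²/n)/J).
- **Plausible value range:** in the Raudenbush & Bryk sense, uses √τ00 itself. It therefore describes the spread of the group means, not the uncertainty about their average.

```python
import math


def grand_mean_intervals(gamma00, tau00, sigma2, n_per_group, n_groups, z=1.96):
    """Return (95% CI for gamma00, 95% plausible value range for group means).

    Sketch only: assumes a balanced, unconditional random-intercept model
    with variance components already estimated.
    """
    se = math.sqrt((tau00 + sigma2 / n_per_group) / n_groups)  # SE of the grand mean
    ci = (gamma00 - z * se, gamma00 + z * se)
    half = z * math.sqrt(tau00)  # spread of the group means themselves
    pvr = (gamma00 - half, gamma00 + half)
    return ci, pvr


# Hypothetical estimates: 40 schools, 20 students each.
ci, pvr = grand_mean_intervals(gamma00=50, tau00=25, sigma2=100, n_per_group=20, n_groups=40)
print(ci)   # roughly (48.30, 51.70): uncertainty about the average school
print(pvr)  # roughly (40.2, 59.8): where about 95% of school means should fall

# With no between-school variance, the PVR collapses but the CI does not.
ci0, pvr0 = grand_mean_intervals(gamma00=50, tau00=0, sigma2=100, n_per_group=20, n_groups=40)
print(pvr0)  # (50.0, 50.0)
```

Setting τ00 = 0 collapses the plausible value range to a single point, while the CI still has nonzero width. This is one way to see that the CI of the grand mean, by itself, does not say whether groups vary.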

#### Dason

I think that may all be true (I can't imagine what a handsome raptor looks like), but it does not really address my point. If HLM is correct in its assumptions, when can you ever validly run OLS? My answer would be: in practice, never, because all data is ultimately nested; all variables are ultimately nested.
Even experimental data? Also note that even if we do have nesting we only really care if we have multiple responses from each group in the nest.

If you have observations from a whole bunch of schools but only have one response from each school then you don't need to care that the observations are nested within school because you only have one. So it seems like some of your argument is just a strawman.

I think HLM should be used when it's appropriate but I don't agree that it's needed always. Maybe it's appropriate for all of the data you use - but it certainly isn't necessary for most of the data I work with.
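Dason's one-response-per-school point has a standard quantitative form: Kish's design effect for equal-sized clusters, 1 + (m - 1)ρ, where m is the cluster size and ρ is the intra-class correlation. Below is a minimal sketch; the function names are hypothetical. Treat it as approximate: the formula applies to an overall mean, and for regression slopes the inflation also depends on how strongly the predictor itself clusters.

```python
def design_effect(cluster_size, icc):
    """Kish's design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_n(total_n, cluster_size, icc):
    """Sample size an independent (OLS-style) analysis would effectively have."""
    return total_n / design_effect(cluster_size, icc)


print(design_effect(1, 0.30))        # 1.0: one response per school, no inflation
print(design_effect(30, 0.05))       # about 2.45
print(effective_n(3000, 30, 0.05))   # about 1224 "independent" students out of 3000
```

With m = 1 the design effect is exactly 1 whatever ρ is, which is Dason's point. With 30 students per school, even an ICC of 0.05 more than doubles the variance of a naive estimate.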

#### Jake

Also note that even if we do have nesting we only really care if we have multiple responses from each group in the nest.

If you have observations from a whole bunch of schools but only have one response from each school then you don't need to care that the observations are nested within school because you only have one.
I think this point is key. From what I can tell, noetsi seems to be talking about data being nested on a purely conceptual level, that is, whether we could conceive of some superset under which our current, actual data are nested. This is perhaps philosophically interesting, because it says something about the kinds of arguments for generalization that we are warranted to make based on the data at hand, but it is not relevant to the potential statistical problems of ignoring nesting that is actually present in your dataset.

#### spunky

##### Doesn't actually exist
If HLM is correct in its assumptions, when can you ever validly run OLS?
actually, Gelman (2005) has a very simple solution to what you're arguing: always estimate random effects. the logic is quite simple: if the nesting effect you're talking about isn't that important, then HLM/MLM/mixed-effects regression gets "downgraded" to regular OLS regression again. but i think one thing that's important for you to realize, noetsi (and it's sort of building up in Dason's post about nestedness), is that whether something is or isn't nested doesn't impact your analysis as much as whether the nestedness creates an effect on the variance strong enough to cause problems with Type I error rates and whatnot. if there is not a significant amount of variance attributable to nestedness, things can be nested all you want and OLS regression will still work fine. you can translate that very easily into a children-in-schools kind of idea. if you're dealing with a district that's reasonably homogeneous, with more or less the same ethnic groups, the same SES, the same proportion of boys and girls, then chances are that the nestedness due to school (let's call it a "school effect", so to speak) isn't strong enough to warrant a mixed-regression approach. and there is a way to measure that: the intra-class correlation...

... which brings me to the point of our handsome raptor. i haven't yet read/seen whether there is any relationship between the Durbin-Watson test and the statistical significance of the intra-class correlation... there probably is, but if not, i think i'll take this up as a fun project to present at our next departmental colloquium. i think our handsome raptor is being very professional in acknowledging that regression has assumptions and that such assumptions should be tested. but assuming further that this particular raptor is proficient in R, i think it would take him around 27 seconds to run a quick mixed-effects regression with a random effect for the intercept and see whether there is in fact no "school effect". my intuition leans towards saying there shouldn't be one, given the evidence from the Durbin-Watson statistic. but then again, that only tests for one lag of autocorrelation, and i'm not sure how autocorrelation lags translate into variance components when it comes to testing for statistical significance.

ps- trinker, i'm on my school laptop right now, so i probably won't be able to send out stuff until i get back home, where i keep all my files on the big computer
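spunky's caveat that Durbin-Watson only looks at one lag can be made concrete. The statistic compares each residual with the row immediately before it, so its verdict about clustering depends on how the rows happen to be sorted. Below is a minimal sketch with hypothetical residuals. statsmodels also provides `statsmodels.stats.stattools.durbin_watson`; this version is written out so the formula is visible.

```python
def durbin_watson(residuals):
    """Sum of squared lag-1 differences divided by the residual sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals: three students from school A (positive school effect)
# and three from school B (negative school effect), in two different row orders.
sorted_by_school = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
interleaved = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(durbin_watson(sorted_by_school))  # about 0.67: looks positively autocorrelated
print(durbin_watson(interleaved))       # about 3.33: same residuals, opposite verdict
```

The results depend entirely on row order:

- **Sorted by school:** the residuals look strongly positively autocorrelated (DW well below 2).
- **Interleaved:** the same residuals look negatively autocorrelated (DW well above 2).
- **Random order:** they would tend to land nearer 2.

So a nonsignificant Durbin-Watson test on rows in arbitrary order says little about a school effect.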

#### spunky

##### Doesn't actually exist
ninja'ed by Dason & Jake! this has never happened to me b4! the horror! :S

#### noetsi

##### Fortran must die
I think the argument would probably be that if the random effects at level 2 (or higher, in more advanced models) are not statistically significant, or the ICC is below 5 percent, then you can use OLS. Having said that, the articles I have read on HLM don't seem to see many cases where OLS or fixed-effects ANOVA is valid. But they are trying to sell a method, of course.
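For balanced groups, the ICC in this rule of thumb can be estimated from a one-way random-effects ANOVA:

- τ̂00 = (MSB - MSW)/n
- σ̂² = MSW
- ICC = τ̂00/(τ̂00 + σ̂²), which simplifies to (MSB - MSW)/(MSB + (n - 1)MSW).

Below is a minimal sketch with hypothetical toy data; the function name is an assumption. Mixed-model software would normally estimate the variance components by (RE)ML instead. The F ratio MSB/MSW returned alongside the ICC is the classical test of whether the between-group variance is zero.

```python
from statistics import mean


def anova_icc(groups):
    """Return (ICC(1), F) from a one-way random-effects ANOVA with equal group sizes."""
    k = len(groups)     # number of groups (e.g., schools)
    n = len(groups[0])  # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ssb = n * sum((m - grand) ** 2 for m in group_means)
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (k * (n - 1))
    icc = (msb - msw) / (msb + (n - 1) * msw)
    return icc, msb / msw


# Hypothetical scores for three students in each of three schools.
schools = [[10, 12, 11], [20, 22, 21], [15, 17, 16]]
icc, f_ratio = anova_icc(schools)
print(icc, f_ratio)  # ICC = 74/77 (about 0.96), F = 75.0
```

The ANOVA estimator can come out negative when MSB < MSW. It is common to report such values as 0.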

#### spunky

##### Doesn't actually exist
The grand mean tells you about the intercept in the level-1 equation, but barring centering, does it really tell you whether the groups vary? I don't see how. The random term U in the level-2 equation, not the grand mean, tells you how groups vary.
noetsi, the u0j term at level 2 is each cluster's deviation from the grand mean (gamma-00). if there were no u0j residual, then the "grand mean" would be the mean for every single cluster, and the cluster-to-cluster differences would be zero. so i do see a point there as to why it's important to keep in mind what level-2 equations tell you about level-1 variables... and it also helps emphasize my point that Raudenbush & Bryk's notation is HIDEOUS when it comes to explaining this stuff... for starters, it has isolated methodologists in the social sciences from statisticians, because absolutely no one outside of psychology, education, sociology, etc. talks about "level 1, level 2, gamma-00, gamma-01" and all that crap. it's only after i took classes in both depts that i started to see the connections in what they're trying to do. and the problem is that, at least when i've TAed these courses, a lot of people end up with questions like yours, where it seems as if level-2 equations should only tell you about level-2 stuff, and it's difficult to emphasize that the linear model being fitted is (and will always be):

dependent variable = fixed effects + random effects + error
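Substituting the level-2 equations into the level-1 equation makes this decomposition concrete. Below is a sketch in Raudenbush & Bryk-style notation, assuming one level-1 predictor X with a random intercept and a random slope (i indexes students, j indexes schools). Here γ00 is the grand-mean intercept and u0j is school j's deviation from it. So whether groups vary is a question about Var(u0j) = τ00, not about γ00 itself.

```latex
\begin{aligned}
\text{Level 1:}\quad  & Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
\text{Level 2:}\quad  & \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j} \\
\text{Combined:}\quad & Y_{ij} =
  \underbrace{\gamma_{00} + \gamma_{10} X_{ij}}_{\text{fixed effects}}
  + \underbrace{u_{0j} + u_{1j} X_{ij}}_{\text{random effects}}
  + \underbrace{r_{ij}}_{\text{error}}
\end{aligned}
```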

#### spunky

##### Doesn't actually exist
But they are trying to sell a method of course.
well... Raudenbush & Bryk need to make \$ on that HLM software of theirs so i could see how they (and their minions) would unintentionally try to sway the discussion one way or another...

HLM is the structural equation modeling of the 21st century. when LISREL came out it was all about fitting SEM models to everything even if it didn't make sense... then the HLM software became available and **bam**, now everyone wants to jump on that bandwagon because it's pretty hot... i've always said that the next hot thing in the social sciences is going to be Bayesian networks, so i'm starting to look into those so i can be a la mode when the time comes...

#### Dason

Yeah, I usually don't know what noetsi is talking about when they talk about HLM stuff, because I definitely don't use that notation/terminology.

#### spunky

##### Doesn't actually exist
Yeah, I usually don't know what noetsi is talking about when they talk about HLM stuff, because I definitely don't use that notation/terminology.
it's horrible! completely unintuitive, and they change it whenever they want. but the thing is that the social sciences' version of lme4 or PROC MIXED is this software, HLM, which is heavy on the GUI-friendly side and is a lot like SPSS, so people in this province of knowledge jumped on that bandwagon immediately, to the point that whatever notation that software uses has become the ruling discourse in the social sciences...

....for instance, on that question noetsi asked you about "the grand mean" and "level 2 predictors" with a whole bunch of gammas, etc. all he was asking was for a reason as for why the intercept of the fixed effects changes if you add predictors with random effects... which is something i believe is more understandable to you... am i right?

#### noetsi

##### Fortran must die
As best I can tell, the notation they use is common in education (judging from other articles I have read). I don't want to get started on a pet peeve: the refusal of statisticians and those who use statistics to agree on notation.

I don't see how the CI of the grand mean (γ00 in R&B) tells you anything about whether groups vary. What tells you if they vary is whether u0j is significant. Is there any way that the CI of the grand mean tells you whether group differences are significant?

To use the notation brought up above...

for instance, on that question noetsi asked you about "the grand mean" and "level 2 predictors" with a whole bunch of gammas, etc. all he was asking was for a reason as for why the intercept of the fixed effects changes if you add predictors with random effects... which is something i believe is more understandable to you... am i right?
What I mean is whether the fixed-effect intercept in the equation that explains between-group variation (or its CI) can show that between-group variation is occurring. I don't see how it can, but the question suggests this is possible.

#### noetsi

##### Fortran must die
Yeah, I usually don't know what noetsi is talking about when they talk about HLM stuff, because I definitely don't use that notation/terminology.
Well, it could also be that I am lost substantively as well, although I did (somehow) get a 108 out of 110 on the first homework, better than people who understand it much better than I do. I never understand how that happens...

#### spunky

##### Doesn't actually exist
What tells you if they vary is whether u0j is significant. Is there any way that the CI of the grand mean tells you whether group differences are significant?
don't you need u0j to calculate those CIs? i'm not sure because, once again, that's another idiosyncrasy of R&B and i tend not to dwell on it too much...