# Understanding Multilevel Model

#### trinker

##### ggplot2orBust
I'm slowly working through multilevel models and have a pretty specific question/though checking.

Here's some R code:

Code:
library(lme4); data(Dyestuff)

mod <- lm(Yield ~ 1 , Dyestuff)
resid(mod)
This is a called a null model correct? This is because you're predicting the outcome based solely using the grand mean as the prediction.

I assume this is correct because I can achieve the same results by:

Code:
Dyestuff$Yield - mean(Dyestuff$Yield)
So now let's look at a similar model in the lme4 package.

Code:
fm1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)
resid(fm1)
My take on this is that it is nearly identical to:

Code:
fm2 <- lm(Yield ~ Batch, Dyestuff)
resid(fm2)
Is this true? The only difference is that the lme function is using REML? So basically this model is predicting based on the group mean as demonstrated by the code below?

Code:
with(Dyestuff, ave(Yield, Batch, FUN=function(x) x - mean(x)))
Again the only difference between the two fm1 and fm2 is that fm1 is using REML and fm2 is using OLS?

Can some smarter people just check/critique my thinking?

#### trinker

##### ggplot2orBust
I think I'm wrong on the reason that resid(fm1) != resid(fm2). It's not REML but the fact that more variability is being accounted for by the between group variance. Is this correct?

#### Jake

This is a called a null model correct?
Well, I mean, some people call it that (notably, Raudenbush & Bryk and Snijders & Bosker). I would just call it a regression with no predictors, or something like this. No need to introduce unnecessary new terminology.

Is this true? The only difference is that the lme function is using REML?
Close but not exactly. In technical terms, fm1 treats Batches as random effects, while fm2 treats Batches as fixed effects. Empirically you can observe that the two models differ in their predicted/fitted values (i.e., try inspecting fitted(fm1) and fitted(fm2)). fm2's predicted values are simply the mean Yield for each Batch. fm1's predicted values are very similar to the simple Batch means, but the predictions have been "shrunk" to some extent toward the grand mean (i.e., the intercept). These shrunken estimates from fm1 are called BLUPs ("Best Linear Unbiased Predictors").

#### trinker

##### ggplot2orBust
Thanks Jake, I'll run with this thread for a while...

I'm trying to work through Bate's book (p. 24). In chapter 1 he makes a plot of the estimates with prediciton intervals. I don't know how the prediction intervals are calculated (I assume the standard error, is this true)? If not how were the prediction intervals calculated:

Code:
fm1ML <- lmer(Yield ~ 1|Batch, Dyestuff)
dotplot(ranef(fm1ML, condVar = TRUE))

dat <- structure(list(Batch = structure(c(3L, 4L, 5L, 2L, 6L, 1L), .Label = c("F",
"D", "A", "B", "C", "E"), class = "factor"), X.Intercept. = c(-17.6069081921822,
0.391264626493306, 28.5623177339851, -23.0846129630835, 56.7333708414772,
-44.9954320466883)), .Names = c("Batch", "X.Intercept."), row.names = c(NA,
-6L), class = "data.frame")

se <- summary(fm1ML)\$coefficients

ggplot(dat, aes(x=X.Intercept., y=Batch)) + geom_point() +
geom_errorbarh(aes(xmin=X.Intercept.-se, xmax=X.Intercept.+se), height=0, size=1)
Here are the two plots:  Are the 2 plots equivalent (i.e. is the prediction interval = to coefficient - SE?)

#### noetsi

##### No cake for spunky
Go back to the R board trinker An earmark of Multi Level (ML) as used by the authors cited above is that they use nomeclature others don't for example hiearchical regression rather than ML. I agree with jake that calling it a null model because it has no predictors makes little sense.

#### spunky

##### Can't make spagetti
(i.e. is the prediction interval = to coefficient - SE?)
well, almost. from the ggplot code that you posed I can see he's setting the limits of the prediction on the interval [intercept estimate - standard error, intercept estimate + standard error]

#### noetsi

##### No cake for spunky
trinker I had an excellent course from an expert in HLM. When I get some more time, it is really busy right now with the legislature coming into session, I can pull them out (you mentioned you wanted to see them).

#### trinker

##### ggplot2orBust
Spunky I should have been ecplicit. I made the gap lot code. Bates uses the lattice but in doing so it's not explicit what is actually happening. The ggplot was my guess that made sense.

#### TheEcologist

##### Global Moderator
Just a tiny comment: It's definitely called a NULL model in many fields, and calling it a model without predictors could confuse more - depending on the field. Find out what normal for your field. Jake's spot on when it comes to shrinkage and BLUPS.

Now does Bates, in his book, explain how to fairly test between models differing in what is included as random effects? That is what I would like to know #### Jake

At least in the earlier edition of the text (Pinheiro & Bates, 2000), they mention that there are issues arising from the fact that the parameter value under the null hypothesis is on the border of the parameter space, and that technically there is a correction you can apply to the degrees of freedom of the LRT statistic to take account of this. But then they note that the commonly suggested correction (i.e., using an equal mixture of chi-squares with 2 different degrees of freedom) is not really quite right either, as verified by simulation. So their basic conclusion is that it is tricky to do this just right, no easily-implemented method seems to fix the problem exactly, but the simple, naive LRT is not too bad really (although slightly conservative), so they recommend just going with that.

#### TheEcologist

##### Global Moderator
At least in the earlier edition of the text (Pinheiro & Bates, 2000), they mention that there are issues arising from the fact that the parameter value under the null hypothesis is on the border of the parameter space, and that technically there is a correction you can apply to the degrees of freedom of the LRT statistic to take account of this. But then they note that the commonly suggested correction (i.e., using an equal mixture of chi-squares with 2 different degrees of freedom) is not really quite right either, as verified by simulation. So their basic conclusion is that it is tricky to do this just right, no easily-implemented method seems to fix the problem exactly, but the simple, naive LRT is not too bad really (although slightly conservative), so they recommend just going with that.
Great! I'll look into Pinhero and Bates 2000, I always thought they were the best resource for this. But this is very useful, thanks Jake.

#### Jake

If you have a copy of P&B they discuss this on pp. 83-87.

#### spunky

##### Can't make spagetti
At least in the earlier edition of the text (Pinheiro & Bates, 2000), they mention that there are issues arising from the fact that the parameter value under the null hypothesis is on the border of the parameter space, and that technically there is a correction you can apply to the degrees of freedom of the LRT statistic to take account of this. But then they note that the commonly suggested correction (i.e., using an equal mixture of chi-squares with 2 different degrees of freedom) is not really quite right either, as verified by simulation. So their basic conclusion is that it is tricky to do this just right, no easily-implemented method seems to fix the problem exactly, but the simple, naive LRT is not too bad really (although slightly conservative), so they recommend just going with that.
there is some bootstrapped likelihood-ratio test that some random spunky shared with you at some point to address this issue...

#justsayin

#### Jake

Yes, if it was really important that I get a reliable answer to this question in my own research (e.g., the question was of substantive theoretical interest), then I would use something like the bootstrap-based approach that you outlined. But most of the time, in the process of everyday data analysis and model comparison using mixed models, it's really not important that I have a reliable p-value for testing random effects. In fact I usually don't bother doing formal hypothesis tests on random effects at all. Usually I let my choice of the random effects structure be dictated by the design of the study and computational/convergence issues.

#### spunky

##### Can't make spagetti
In fact I usually don't bother doing formal hypothesis tests on random effects at all.
as weird as it sounds, i do this as well... at some point the hypothesis testing of the random effects is much more of an after thought and it's the actual covariance structure what one ends up interpreting.

#### noetsi

##### No cake for spunky
What is the value of a random effects if you don't know it is signficant? It could be just random error. If you are going to base random effects on theory why do an empirical test of them at all?

#### spunky

##### Can't make spagetti
What is the value of a random effects if you don't know it is signficant?
the blind worship of the p-value must die!

#### Dason

What is the value of a random effects if you don't know it is signficant? It could be just random error. If you are going to base random effects on theory why do an empirical test of them at all?
Jake mentioned that for the most part he doesn't do an empirical test. This is typically the approach I take as well by the way. In my case there I'm usually dealing with a designed experiment so things tend to be much cleaner in terms of deciding on a model.

#### noetsi

##### No cake for spunky
I think that comes close to heresy in many of the departments and literature I am familiar with.... 