Understanding Multilevel Models

trinker

ggplot2orBust
#1
I'm slowly working through multilevel models and have a pretty specific question/thought check.

Here's some R code:

Code:
library(lme4); data(Dyestuff)

# intercept-only model: the only prediction is the grand mean
mod <- lm(Yield ~ 1, Dyestuff)
resid(mod)
This is called a null model, correct? This is because you're predicting the outcome using only the grand mean as the prediction.

I assume this is correct because I can achieve the same results by:

Code:
Dyestuff$Yield - mean(Dyestuff$Yield)
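A quick way to verify they match (unname() just drops the names that resid() attaches):

Code:
all.equal(unname(resid(mod)), Dyestuff$Yield - mean(Dyestuff$Yield))  # TRUE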
So now let's look at a similar model in the lme4 package.

Code:
fm1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)
resid(fm1)
My take on this is that it is nearly identical to:

Code:
fm2 <- lm(Yield ~ Batch, Dyestuff)
resid(fm2)
Is this true? Is the only difference that lmer() is using REML? So basically this model is predicting based on the group means, as demonstrated by the code below?

Code:
with(Dyestuff, ave(Yield, Batch, FUN=function(x) x - mean(x)))
Again, is the only difference between fm1 and fm2 that fm1 is using REML and fm2 is using OLS?
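A quick check of the claims (using fm1 and fm2 from above):

Code:
# fm2's residuals are exactly the group-mean-centered values...
all.equal(unname(resid(fm2)),
    with(Dyestuff, ave(Yield, Batch, FUN = function(x) x - mean(x))))  # TRUE
# ...but fm1's residuals are not the same as fm2's
all.equal(unname(resid(fm1)), unname(resid(fm2)))  # reports differences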

Can some smarter people just check/critique my thinking?
 

trinker

ggplot2orBust
#2
I think I'm wrong about the reason that resid(fm1) != resid(fm2). It's not REML but the fact that more variability is being accounted for by the between-group variance. Is this correct?
 

Jake

Cookie Scientist
#3
This is called a null model, correct?
Well, I mean, some people call it that (notably, Raudenbush & Bryk and Snijders & Bosker). I would just call it a regression with no predictors, or something like this. No need to introduce unnecessary new terminology.

Is this true? Is the only difference that lmer() is using REML?
Close but not exactly. In technical terms, fm1 treats Batches as random effects, while fm2 treats Batches as fixed effects. Empirically you can observe that the two models differ in their predicted/fitted values (i.e., try inspecting fitted(fm1) and fitted(fm2)). fm2's predicted values are simply the mean Yield for each Batch. fm1's predicted values are very similar to the simple Batch means, but the predictions have been "shrunk" to some extent toward the grand mean (i.e., the intercept). These shrunken estimates from fm1 are called BLUPs ("Best Linear Unbiased Predictors").
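You can see the shrinkage numerically (a minimal sketch using fm1 and fm2 from above):

Code:
# fm2's fitted values are exactly the per-Batch means
tapply(fitted(fm2), Dyestuff$Batch, unique)
# fm1's fitted values are the BLUP-based predictions, pulled toward the intercept
tapply(fitted(fm1), Dyestuff$Batch, unique)
fixef(fm1)  # the grand-mean intercept the predictions are shrunk toward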
 

trinker

ggplot2orBust
#4
Thanks Jake, I'll run with this thread for a while...

I'm trying to work through Bates's book (p. 24). In chapter 1 he makes a plot of the estimates with prediction intervals. I don't know how the prediction intervals are calculated (I assume from the standard error; is this true?). If not, how were they calculated?

Code:
library(lme4); library(lattice); library(ggplot2)
data(Dyestuff)

fm1ML <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)
dotplot(ranef(fm1ML, condVar = TRUE))

dat <- structure(list(Batch = structure(c(3L, 4L, 5L, 2L, 6L, 1L), .Label = c("F", 
    "D", "A", "B", "C", "E"), class = "factor"), X.Intercept. = c(-17.6069081921822, 
    0.391264626493306, 28.5623177339851, -23.0846129630835, 56.7333708414772, 
    -44.9954320466883)), .Names = c("Batch", "X.Intercept."), row.names = c(NA, 
    -6L), class = "data.frame")

# standard error of the fixed intercept
se <- summary(fm1ML)$coefficients[2]

ggplot(dat, aes(x = X.Intercept., y = Batch)) + geom_point() +
    geom_errorbarh(aes(xmin = X.Intercept. - se, xmax = X.Intercept. + se), height = 0, size = 1)
Here are the two plots:

[images: the lattice dotplot of the random effects and my ggplot version]
Are the two plots equivalent (i.e., is the prediction interval equal to the coefficient ± SE)?
 

noetsi

Fortran must die
#5
Go back to the R board, trinker :p

An earmark of multilevel (ML) modeling as used by the authors cited above is that they use nomenclature others don't, for example "hierarchical regression" rather than ML. I agree with Jake that calling it a null model because it has no predictors makes little sense.
 

spunky

Super Moderator
#6
(i.e., is the prediction interval equal to the coefficient ± SE)?
well, almost. from the ggplot code that you posted I can see he's setting the limits of the prediction interval to [intercept estimate - standard error, intercept estimate + standard error]
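for what it's worth, the intervals in Bates's dotplot are based on the conditional variances of the random effects rather than the fixed-intercept SE. a sketch of how to pull them out, assuming a reasonably recent lme4:

Code:
re <- ranef(fm1ML, condVar = TRUE)
# the conditional variances are stored as an attribute on each grouping factor
sqrt(attr(re$Batch, "postVar")[1, 1, ])
# recent lme4 versions can also return them as a data frame (condsd column)
as.data.frame(re)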
 

noetsi

Fortran must die
#7
trinker, I had an excellent course from an expert in HLM. When I get some more time (it is really busy right now with the legislature coming into session), I can pull my notes out (you mentioned you wanted to see them).
 

trinker

ggplot2orBust
#8
Spunky, I should have been explicit. I made the ggplot code. Bates uses lattice, but in doing so it's not explicit what is actually happening. The ggplot was my guess at something that made sense.
 
#9
Just a tiny comment: it's definitely called a null model in many fields, and calling it a model without predictors could confuse more, depending on the field. Find out what's normal for your field. Jake's spot on when it comes to shrinkage and BLUPs.

Now does Bates, in his book, explain how to fairly test between models differing in what is included as random effects? That is what I would like to know :)
 

Jake

Cookie Scientist
#10
At least in the earlier edition of the text (Pinheiro & Bates, 2000), they mention that there are issues arising from the fact that the parameter value under the null hypothesis is on the border of the parameter space, and that technically there is a correction you can apply to the degrees of freedom of the LRT statistic to take account of this. But then they note that the commonly suggested correction (i.e., using an equal mixture of chi-squares with 2 different degrees of freedom) is not really quite right either, as verified by simulation. So their basic conclusion is that it is tricky to do this just right, no easily-implemented method seems to fix the problem exactly, but the simple, naive LRT is not too bad really (although slightly conservative), so they recommend just going with that.
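To make the naive LRT concrete, here is a minimal sketch (my own example using the sleepstudy data that ships with lme4, not a model from the book):

Code:
library(lme4); data(sleepstudy)

# reduced model: random intercepts only
fm_red  <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
# full model: adds a random slope for Days
fm_full <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# anova() refits both models with ML and reports the naive LRT; because
# the null value (a zero variance) lies on the boundary of the parameter
# space, this p-value is somewhat conservative
anova(fm_red, fm_full)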
 
#11
At least in the earlier edition of the text (Pinheiro & Bates, 2000), they mention that there are issues arising from the fact that the parameter value under the null hypothesis is on the border of the parameter space [...] so they recommend just going with that.
Great! I'll look into Pinheiro and Bates (2000); I always thought they were the best resource for this. But this is very useful, thanks Jake.
 

spunky

Super Moderator
#13
At least in the earlier edition of the text (Pinheiro & Bates, 2000), they mention that there are issues arising from the fact that the parameter value under the null hypothesis is on the border of the parameter space [...] so they recommend just going with that.
there is some bootstrapped likelihood-ratio test that some random spunky shared with you at some point to address this issue...

#justsayin
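For the curious, a hand-rolled version of that idea (a sketch only, reusing fm_red and fm_full from the LRT example above; not necessarily the exact test spunky shared):

Code:
# refit both models with ML so the log-likelihoods are comparable
fm_redML  <- refitML(fm_red)
fm_fullML <- refitML(fm_full)
obsLRT <- 2 * c(logLik(fm_fullML) - logLik(fm_redML))

# simulate responses under the reduced (null) model, refit both models
# to each simulated data set, and collect the null LRT distribution
nullLRT <- replicate(200, {
  y <- unlist(simulate(fm_redML))
  2 * c(logLik(refit(fm_fullML, y)) - logLik(refit(fm_redML, y)))
})

mean(nullLRT >= obsLRT)  # bootstrap p-value
Packages such as pbkrtest and RLRsim offer polished implementations of this kind of simulation-based test.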
 

Jake

Cookie Scientist
#14
Yes, if it was really important that I get a reliable answer to this question in my own research (e.g., the question was of substantive theoretical interest), then I would use something like the bootstrap-based approach that you outlined. But most of the time, in the process of everyday data analysis and model comparison using mixed models, it's really not important that I have a reliable p-value for testing random effects. In fact I usually don't bother doing formal hypothesis tests on random effects at all. Usually I let my choice of the random effects structure be dictated by the design of the study and computational/convergence issues.
 

spunky

Super Moderator
#15
In fact I usually don't bother doing formal hypothesis tests on random effects at all.
as weird as it sounds, i do this as well... at some point the hypothesis testing of the random effects is much more of an afterthought, and it's the actual covariance structure that one ends up interpreting.
 

noetsi

Fortran must die
#16
What is the value of a random effect if you don't know it is significant? It could be just random error. And if you are going to base random effects on theory, why do an empirical test of them at all?
 

Dason

Ambassador to the humans
#18
What is the value of a random effect if you don't know it is significant? It could be just random error. And if you are going to base random effects on theory, why do an empirical test of them at all?
Jake mentioned that for the most part he doesn't do an empirical test. This is typically the approach I take as well, by the way. In my case I'm usually dealing with a designed experiment, so things tend to be much cleaner in terms of deciding on a model.
 

Jake

Cookie Scientist
#20
Not at all. When analyzing data from designed experiments, it is an entirely conventional and well-accepted view that the design of the experiment itself should be the primary thing dictating which parameters appear in the model. Taking the opposite approach (trimming parameters from the model based on significance, etc.) is pretty much the exception rather than the rule.