Mixed-effects models with (g)lmer in R and model selection

Mixed-effects models are wonderful for analyzing data, but it is always a hassle to find the best model, i.e. the model with the lowest AIC, especially when the number of predictor variables is large.

At present, when trying to find the right model, I perform the following steps:

1) Start with a model containing all predictors. Assuming dependent variable X and predictors A, B, C, D, E, I start with: X~A+B+C+D+E

2) lmer warns that it has dropped columns/coefficients. These are variables that correlate perfectly with one of the other variables or with a combination of variables. summary() shows which columns have been dropped. Assuming predictor D has been dropped, I continue with this model: X~A+B+C+E

3) Subsequently I need to check whether there are variables (or groups of variables) that correlate strongly with each other. I included the function vif.mer (developed by Austin F. Frank and available at: https://raw.github.com/aufrank/R-hacks/master/mer-utils.R) in my script, and applying this function to my reduced model gives a vif value for each of the variables. When vif>5 for a predictor, it probably should be removed. If multiple variables have a vif>5, I first remove the predictor with the highest vif, then re-run lmer and vif.mer. If one or more predictors still have a vif>5, I again remove the predictor with the highest vif, and I repeat this until none of the remaining predictors has a vif>5. If I got a "Model failed to converge" warning in the larger model(s), this warning no longer appears in the 'cleaned' model.

4) Assume the following predictors have survived: A, B and E. Now I want to find the combination of predictors that gives the smallest AIC. For three predictors it is easy to try all combinations, but with 10 predictors, manually trying all combinations would be time-consuming. So I used the function fitLMER.fnc from the LMERConvenienceFunctions package. This function back-fits fixed effects, forward-fits random effects, and then re-back-fits fixed effects. I consider the model given by fitLMER.fnc to be the right one.
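
To make the screening in steps 2 and 3 concrete, here is a rough sketch in base R. It is not what lmer or vif.mer do internally: the data are simulated, the names dat, vif_numeric and drop_high_vif are made up for this illustration, and the VIF is the classical one for numeric predictors (the diagonal of the inverse correlation matrix) rather than the value returned by vif.mer.

```r
# Simulated stand-in data; A..E mirror the placeholder names used in the steps.
set.seed(1)
n <- 200
dat <- data.frame(A = rnorm(n), B = rnorm(n), E = rnorm(n))
dat$C <- dat$A + rnorm(n, sd = 0.1)   # C is nearly a copy of A (high VIF)
dat$D <- dat$A + 2 * dat$B            # D is an exact combination of A and B

# Step 2: build the fixed-effects design matrix and use a pivoted QR
# decomposition to find columns that are linearly dependent on earlier ones.
mm <- model.matrix(~ A + B + C + D + E, data = dat)
qrx <- qr(mm)
dropped <- colnames(mm)[qrx$pivot[seq_len(ncol(mm)) > qrx$rank]]

# Step 3: classical VIF for numeric predictors (assumption: no factors).
vif_numeric <- function(d) setNames(diag(solve(cor(d))), names(d))

# Remove the predictor with the highest VIF until all VIFs are <= cutoff.
drop_high_vif <- function(d, cutoff = 5) {
  repeat {
    v <- vif_numeric(d)
    if (ncol(d) < 2 || max(v) <= cutoff) return(names(d))
    d <- d[, names(d) != names(v)[which.max(v)], drop = FALSE]
  }
}

kept <- drop_high_vif(dat[, setdiff(c("A", "B", "C", "D", "E"), dropped)])
dropped   # redundant column(s) found in step 2
kept      # predictors surviving the VIF loop in step 3
```

Because this works on the predictors alone, it can be run before any model is fitted. With a fitted lmer model, the vif.mer values may differ somewhat, so this is only a first pass.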

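For step 4, trying all combinations does not have to be manual. The sketch below loops over every non-empty subset of the surviving predictors and tabulates the AIC. It assumes the lme4 package is installed, uses simulated data, and adds a hypothetical grouping factor called subject with a random intercept, since the formulas above leave out the random-effects part. The models are fitted with REML = FALSE because AIC comparisons between models with different fixed effects are normally based on ML fits. This is a brute-force alternative, not what fitLMER.fnc does.

```r
library(lme4)

# Simulated stand-in data: 30 subjects, 10 observations each.
set.seed(3)
n_subj <- 30
n_obs <- 10
dat <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_obs)),
  A = rnorm(n_subj * n_obs),
  B = rnorm(n_subj * n_obs),
  E = rnorm(n_subj * n_obs)
)
subj_eff <- rnorm(n_subj)  # random intercept per subject
dat$X <- 1 + 0.8 * dat$A + subj_eff[dat$subject] + rnorm(nrow(dat))

predictors <- c("A", "B", "E")

# All non-empty subsets of the predictors (2^3 - 1 = 7 here).
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one ML model per subset, each with the same random intercept.
fits <- lapply(subsets, function(vars) {
  f <- reformulate(c(vars, "(1 | subject)"), response = "X")
  lmer(f, data = dat, REML = FALSE)
})

aic_table <- data.frame(
  model = vapply(subsets, paste, character(1), collapse = " + "),
  AIC = vapply(fits, AIC, numeric(1))
)
aic_table[order(aic_table$AIC), ]  # lowest AIC first
```

The number of models doubles with every extra predictor (10 predictors give 1023 fits), so this only stays practical for a moderate number of predictors.
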
I am not an expert in mixed-effects models and have struggled with model selection. The procedure I described works for me, but I would really appreciate hearing whether it is sound, or whether there are better alternatives.


Do you ever have scenarios where you are testing a hypothesis, so you already know which variables you want in the model and have to control for, even if they fall outside the traditional 0.05 cutoff? Do you have to worry about interactions or cross-level interaction terms?

It seems like you could look at a variance/covariance matrix at the beginning to get an idea of which variables may be collinear.
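
As a rough illustration of that idea, the base R sketch below uses the correlation matrix (the covariance matrix rescaled to unit variances, which is easier to read) and lists the predictor pairs whose absolute correlation exceeds a threshold. The data are simulated, and the 0.8 threshold and the names preds and flagged are arbitrary choices for this example.

```r
# Simulated numeric predictors; C is built to be strongly correlated with A.
set.seed(4)
n <- 150
preds <- data.frame(A = rnorm(n), B = rnorm(n), E = rnorm(n))
preds$C <- preds$A + rnorm(n, sd = 0.2)

r <- cor(preds)

# Look only at the upper triangle so each pair is reported once.
high <- which(abs(r) > 0.8 & upper.tri(r), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(r)[high[, "row"]],
  var2 = colnames(r)[high[, "col"]],
  r = r[high]
)
flagged
```

Pairwise correlations will not reveal a predictor that is a combination of several others, so this complements a VIF check rather than replacing it.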