Accounting for random effects from sampled individuals in GLM

Hello, this is my first post but I am sure it won't be my last. I am working on my thesis and just sent the first round of analyses to my committee. I understand most of the feedback, except for one thing. The response was "You have issues with independence. Do you count every record of a hatchling as independent or should an individual be a random effects covariate in your model. Some autocorrelation thing should be done."
I am running a logistic generalized linear model using R that measures habitat selection of a hatchling animal. I used a random paired design, therefore every time a hatchling was located I measured a suite of variables at its location as well as a random location nearby. Locations the hatchlings were found at received a 1 and the random controls received a 0.
I suppose I counted every record of a hatchling as independent. Is this wrong going by the comment I received back? I sampled 66 individuals which resulted in some 700 locations. I read up on fixed and random effects to understand his comment. My habitat measurements are fixed variables, but I assume he is worried that individual is a random effect not being accounted for? Is there a sort of test for this? Thanks for any help everyone!!


New Member
If the RE variance is small, it will not contribute much to the model -- use that for starters. Can also try a LRT comparing model with vs. without RE.
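A minimal sketch of that with-versus-without comparison in R, assuming the lme4 package is installed. The data are simulated, and the names dat, id, canopy, and selected are hypothetical stand-ins rather than the thesis data. Because a variance cannot be negative, the chi-square p-value from this kind of test tends to be conservative, so treat it as a rough guide.

```r
library(lme4)

set.seed(1)
n_id  <- 66   # individuals (hypothetical, echoing the 66 hatchlings)
n_obs <- 10   # rows per individual (made up for illustration)

dat <- data.frame(id = factor(rep(seq_len(n_id), each = n_obs)))
dat$canopy <- rnorm(nrow(dat))

# Simulate a true among-individual intercept so the test has something to find
u <- rnorm(n_id, sd = 0.8)
dat$selected <- rbinom(nrow(dat), 1,
                       plogis(0.5 * dat$canopy + u[as.integer(dat$id)]))

# Model without and with a random intercept for individual
fit_glm  <- glm(selected ~ canopy, family = binomial, data = dat)
fit_glmm <- glmer(selected ~ canopy + (1 | id), family = binomial, data = dat)

# Estimated among-individual variance
print(VarCorr(fit_glmm))

# Likelihood ratio test on one extra parameter (the random-intercept variance)
lrt     <- max(0, as.numeric(2 * (logLik(fit_glmm) - logLik(fit_glm))))
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)
cat("LRT =", round(lrt, 2), " p =", signif(p_value, 3), "\n")
```

A variance estimate near zero together with a large p-value would suggest the individual effect adds little; the reverse would argue for keeping it.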

What is your response (dependent) variable?

I suspect, though, what they're trying to get out of you is a few sentences that sound intelligent regarding random effects. And if the quote above is exactly what a professor said, I don't think it'd take much to give them a "wow" factor, or it could be that I don't have enough info. What autocorrelation do they think might be needed? Maybe some kind of spatial autocorrelation? I'd only suggest something like that as part of a model-building process that's matured through several iterations already, unless it's very clear that it's needed. I guess by independence they're referring to the relationship between each subject hatchling and its paired control(s)?
Thank you so much for the response. You know, after doing some research I was kind of thinking it was just a question that wanted a response not an action.

My response variable is selected (binary) (1,0). I was thinking by independence he meant the effect each sampled hatchling has on the independent variables. The effect on the response variable wouldn't make sense in my mind because I radio tracked each hatchling, so it's not like finding one led to finding another. I was thinking he was seeing it like this: One hatchling favors a location characterized by herbaceous growth and open canopy while another favors woody growth and closed canopy. I suppose if that is the case, I would do some sort of plotting the model residuals vs. the individual hatchlings?

I honestly do not know what sort of autocorrelation he was talking about. The obvious thing is to email this guy to clarify, however I am in a terrible situation where I have a committee member that does not want to be bothered and honestly would get angry/annoyed if I replied back asking for clarification. That was the full quote as well unfortunately.

I'm not sure what RE stands for (sorry I am a newb to all of this). I compared the model with and without individual and it did not make much difference. My pseudo r2 improved a little, but that's to be expected adding more variables. I plotted individual against the model residuals and for the most part variance was the same.



Ambassador to the humans
It sounds to me like you should be using conditional logistic regression. I think what they were getting at is that you have essentially pairs of observations. The hatchling location and the location nearby. If the idea is to figure out "why did they choose this specific location instead of some other nearby location" then conditional logistic regression is the way to go.
Hmm interesting that this is the first time I have heard of conditional regression in all my research. Paired random design has become very popular in ecology research for modeling habitat selection of not so mobile animals like insects and turtles. The two big published studies of my study organism that used this model did not use conditional regression. Perhaps this is an issue of ecology adapting it from other fields. This does sound like the right analysis for my data. I'll continue looking into this, but in the meantime I am curious as to how a data sheet would be set up for analysis. Right now I have the variable fields as the header for my columns. Would I just add a field with an individual and time ID? For instance, under the field ID column I would have hatchling1-1, hatchling1-2, hatchling1-3, etc. for the hatchling ID and which location that was. And then have the same ID for the control it was paired with. So there would be two hatchling1-1's, but one would be scored with a 1 for selected and one scored with a 0 for control? Thank you guys so much for the great feedback this is wonderful!
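One possible layout for the paired data, sketched in R with simulated values and assuming the survival package is installed. The column names Set and Type mirror the model call posted later in this thread; the Gravel values and the 20 individuals by 5 locations design are made up for illustration.

```r
library(survival)

set.seed(42)
n_ind <- 20   # hypothetical number of hatchlings
n_loc <- 5    # hypothetical number of locations per hatchling

# One pair ID per hatchling-location, e.g. "hatchling1-1", "hatchling1-2", ...
key    <- expand.grid(loc = seq_len(n_loc), ind = seq_len(n_ind))
set_id <- paste0("hatchling", key$ind, "-", key$loc)

# Two rows per pair ID: the used location (Type = 1) and its random control (Type = 0)
dat <- data.frame(
  Set  = rep(set_id, each = 2),
  Type = rep(c(1, 0), times = length(set_id)),
  stringsAsFactors = FALSE
)

# Simulated covariate: used locations get slightly more gravel on average
dat$Gravel <- rnorm(nrow(dat), mean = 20 + 5 * dat$Type, sd = 10)

print(head(dat, 4))

# Conditional logistic regression, stratified on the pair ID
fit <- clogit(Type ~ Gravel + strata(Set), data = dat, method = "exact")
print(coef(fit))
```

Each Set value appears exactly twice, once with Type 1 and once with Type 0, which is the structure the strata() term relies on.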
As I re-read the studies I suppose they did use this technique they just replaced the word conditional with paired. All of this time I had interpreted that they were comparing all treatments vs. all controls and looking at the means, rather than subtracting the values between each pair. I suppose that's what they meant when they said it was "analogous to a paired t-test."
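The "analogous to a paired t-test" remark can be made concrete with a short derivation. The notation is an assumption for illustration: in pair s, x_{s1} is the covariate vector at the used location, x_{s0} the one at the random control, and alpha_s is anything constant within the pair (for example an individual-specific intercept).

```latex
% Conditional likelihood contribution of one 1:1 pair s,
% given that exactly one of the two locations was used
\begin{align*}
L_s(\beta)
  &= \frac{\exp(\alpha_s + \beta^\top x_{s1})}
          {\exp(\alpha_s + \beta^\top x_{s1}) + \exp(\alpha_s + \beta^\top x_{s0})} \\
  &= \frac{\exp(\beta^\top x_{s1})}
          {\exp(\beta^\top x_{s1}) + \exp(\beta^\top x_{s0})} \\
  &= \frac{1}{1 + \exp\!\left(-\beta^\top (x_{s1} - x_{s0})\right)}
\end{align*}
```

So only the within-pair difference x_{s1} - x_{s0} enters the likelihood, and the pair-level term alpha_s cancels, much as the pair mean cancels in a paired t-test.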
Oh ok, np! So many terms and abbreviations in statistics. The worst part for me starting out was interchangeable terms like independent/predictor/covariates or logistic/binomial and in my apparent case here conditional/paired.


New Member
RE isn't standard, I was just being lazy ...
also ... oh yeah, conditional logistic sounds good, sounds like a good suggestion to me, be sure to let us know how it goes
Will do. I figured out how to set up the excel file too. I can't believe I overlooked this. Thank you so much for pointing this out, I would have looked quite stupid next week when I meet with my committee. Now time to run this and try to interpret the results. I'll let you all know how it goes. Thanks again!!
Beautiful, I got it to run! The only annoying part was stepAIC wouldn't work so I had to do drop1() over and over. Here are the results of my best model. Now time to interpret it. Seems the hatchlings like habitat with lots of woody saplings, ground cover gravel, and a certain temperature. So glad I found this forum, thanks again!!
coxph(formula = Surv(rep(1, 708L), Type) ~ Temp_Selec + Woody_Sapl +
    Gravel + strata(Set), data = hatchlings, method = "exact")

  n= 708, number of events= 354

               coef exp(coef) se(coef)     z Pr(>|z|)
Temp_Selec 0.050649  1.051953 0.029757 1.702   0.0887 .
Woody_Sapl 0.017873  1.018034 0.009425 1.896   0.0579 .
Gravel     0.009015  1.009055 0.004550 1.981   0.0476 *
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

           exp(coef) exp(-coef) lower .95 upper .95
Temp_Selec     1.052     0.9506    0.9924     1.115
Woody_Sapl     1.018     0.9823    0.9994     1.037
Gravel         1.009     0.9910    1.0001     1.018

Rsquare= 0.014   (max possible= 0.5 )
Likelihood ratio test= 10  on 3 df,   p=0.01861
Wald test            = 9.37  on 3 df,   p=0.02473
Score (logrank) test = 9.78  on 3 df,   p=0.02056
Of course running this has provoked more questions. First, why does R produce an r2 value for a logistic model? I was using McFadden's r2 when I ran a glm to assess model fitness, however the function for the pseudo r2 will not work on the clogit() model. Second, can I no longer use categorical predictors? If the test works by subtracting each paired predictor it doesn't really make sense to subtract categorical values? Thank you again!
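A hedged sketch touching both questions, using simulated data and assuming the survival package. On the first: the printed Rsquare looks consistent with a likelihood-ratio based measure of the form 1 - exp(-LRT/n), since 1 - exp(-10/708) is about 0.014, though that reading is an inference from the numbers rather than something stated in the output. A McFadden-style value can be computed by hand from the two log-likelihoods stored in the fitted object. On the second: a factor is expanded into 0/1 indicator columns, and differences of indicators within a pair are well defined, so categorical predictors can still be included. The names dat, Cover, and mcfadden are hypothetical.

```r
library(survival)

set.seed(7)
n_pairs <- 150   # hypothetical number of used/control pairs

dat <- data.frame(
  Set  = rep(seq_len(n_pairs), each = 2),
  Type = rep(c(1, 0), times = n_pairs)
)
# Simulated continuous covariate with a mild preference effect
dat$Gravel <- rnorm(nrow(dat), mean = 20 + 4 * dat$Type, sd = 10)
# Simulated categorical covariate (no true effect)
dat$Cover <- factor(sample(c("bare", "herbaceous", "woody"),
                           nrow(dat), replace = TRUE))

fit <- clogit(Type ~ Gravel + Cover + strata(Set), data = dat, method = "exact")

# The factor enters as indicator columns, one per non-reference level
print(names(coef(fit)))

# fit$loglik holds two values: the log-likelihood at the initial (all-zero)
# coefficients and the log-likelihood at the fitted coefficients
mcfadden <- 1 - fit$loglik[2] / fit$loglik[1]
cat("McFadden-style pseudo R2:", round(mcfadden, 3), "\n")
```

With 1:1 pairs the null log-likelihood is just the number of pairs times log(0.5), so the pseudo R2 here measures improvement over a coin flip within each pair.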