# Thread: LMM for within and between subjects

1. I am a biologist wishing to examine sources of individual and temporal variation in stress in a primate population. We obtain our samples quite opportunistically, so individuals are sampled in various months over 10 years. I am attempting to use LMM (in SPSS11) in order to control for variable individual sampling and potential random individualistic effects.

I have some variables that describe traits of the female sampled (e.g., social rank), some variables that describe traits of the month in which she was sampled (e.g., food availability), and some variables that are a bit of both (e.g., her age or how much aggression she received). Ideally, I would like first to create a model that tests sources of individual variation and then, retaining those factors, test hypotheses about temporal variation.

My question is primarily about the first application: testing for intrinsic factors (e.g., age, dominance rank, reproductive status). The problem is that, unlike the examples I've seen in books, my level-2 predictors change over time. I want to understand how the models deal with within- versus between-subject variation.
1) I assume random effects in such an analysis would be Intercept|Female and Month (since for now I'm not looking at purely temporal predictors).
2) I have one variable that is constant for most (>90%) individuals. Since only 2 subjects have a change, I can't really examine within-subject effects; I only want to know how this variable affects between-subject variation. The model seems to be very sensitive to those two subjects' within-subject variation. Would it be more appropriate for me to assign them a single value (or a dummy identity for the two conditions, e.g., A1 and A2)? Or is there another way to deal with this problem?
3) Age is a variable that changes for most subjects but varies much more between subjects. Is the model considering both effects?
4) Is it stupid to take this two-stage approach? Or should I be considering individual and temporal effects together from the start? Please understand that the typical approach in my field is to throw 16 variables into a model, discard the non-significant ones and call it quits...this seems lousy for hypothesis testing.

MANY thanks!

2. Ooh, this is a question I can really sink my teeth into...:-) Although I have to honestly say this might be something that is too hard to do in writing. It's easy to misunderstand without lots of little questions.

But I really ought to go to sleep, and would probably think better tomorrow, so for now one answer and one question.

For individual age, it seems that it could become confounded with the time-varying covariates. Could you just use "Age at first observation" for each individual? Then use time or its covariates to basically cover the effect of ongoing time. Or is there something about specific ages (e.g., reaching sexual maturity) that is needed?

You say you have observations over 10 years. How many for each female (what's the range?). Are measurements taken in discrete time or continuous, and how often for a single female? i.e. do you measure all females (that you can find) once each month? Or could one be 4 days apart and another 20 days apart?

Do you have more than a full year's data on a single female? Are there seasonal variations in the months? What I'm getting at is that if you follow the same female for, say, 4 years, and some covariates or the DV vary a lot seasonally, you might have trouble treating time as both a linear variable (as in age) and a circular one (as in month 12 being followed by month 1). Actually, that might work....

The variable that is nearly constant--is it continuous or categorical?

Anyway, I'll check in tomorrow. Couldn't help myself there.

Karen

3. Originally Posted by TheAnalysisFactor
For individual age, it seems that it could become confounded with the time-varying covariates. Could you just use "Age at first observation" for each individual? Then use time or its covariates to basically cover the effect of ongoing time. Or is there something about specific ages (e.g., reaching sexual maturity) that is needed?
Thanks so much for your interest in helping. I could definitely use a single age for each female, if that seems like the best idea. Though, to be clear, the time-varying covariates wouldn't necessarily be changing in one direction over time. Just that in a given month there would be things like the amount of food available and the amount of aggression going on that would potentially affect an individual's stress levels in the month.

You say you have observations over 10 years. How many for each female (what's the range?). Are measurements taken in discrete time or continuous, and how often for a single female? i.e. do you measure all females (that you can find) once each month? Or could one be 4 days apart and another 20 days apart?
I've got the measures summarized by month. We're only able to measure some females in a given month and it's pretty opportunistic. Females may be sampled in 1 month or in 40 months. Each datapoint would be a female, her associated characteristics (age, rank, and reproductive state), characteristics of the study month (diet quality, overall aggression rates) that are common to all females, and perhaps things that are specific to her in a month (the amount of aggression she personally received).
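The layout described above (one row per female per sampled month) can be sketched as follows. This is only an illustration: the field names and every value are made up, not taken from the real dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    female_id: str              # subject identifier
    month: str                  # a specific study month, e.g. "1998-10"
    age_years: float            # female trait that changes over time
    rank: str                   # female trait
    repro_state: str            # female trait that may change over time
    diet_quality: float         # month trait, shared by all females that month
    group_aggression: float     # month trait, shared by all females that month
    aggression_received: float  # specific to this female in this month
    stress: float               # outcome


# Hypothetical rows: F01 is sampled in three months, F02 in only one.
rows = [
    Observation("F01", "1998-10", 12.0, "high", "cycling", 0.8, 3.0, 1.0, 5.2),
    Observation("F02", "1998-10", 20.5, "low", "lactating", 0.8, 3.0, 4.0, 7.1),
    Observation("F01", "1998-11", 12.1, "high", "cycling", 0.5, 6.0, 2.0, 6.0),
    Observation("F01", "1999-03", 12.4, "high", "pregnant", 0.9, 2.0, 0.0, 4.8),
]

# How unevenly are females sampled?
samples_per_female = Counter(r.female_id for r in rows)


def month_level_is_consistent(rows):
    """Check that month traits are identical for every female in a month."""
    seen = {}
    for r in rows:
        key = (r.diet_quality, r.group_aggression)
        if seen.setdefault(r.month, key) != key:
            return False
    return True
```

Counting rows per female makes the unbalanced sampling visible, and the consistency check guards against a month-level variable accidentally differing between females in the same month.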

Do you have more than a full year's data on a single female? Are there seasonal variations in the months? What I'm getting at is that if you follow the same female for, say, 4 years, and some covariates or the DV vary a lot seasonally, you might have trouble treating time as both a linear variable (as in age) and a circular one (as in month 12 being followed by month 1). Actually, that might work....
Yes, some females were present the whole 10 years (though we wouldn't get samples in all months). We don't expect strict seasonality...so "October" might not be important, but individual months (e.g., "October 1998") have particular characteristics of interest.

The variable that is nearly constant--is it continuous or categorical?
Categorical.

4. Analysis Factor -- Here is a fake scenario that is almost a direct analog for what I want to do with chimp data but has variables that could be more generally understood. The questions are more or less the same:

Let's say you want to study factors that impact happiness among women, using LMM in SPSS. You select a small gym and for 10 years you go in one day a month and interview 25 women as they leave. Because some women go to the gym more than others, they appear in your sample many times, while others maybe only once. Let's assume that women aren't more or less happy the longer they attend the gym.

You define 4 variables that you think might describe individual variation in happiness:
- Age
- Income category
- Parent's income category
- Whether single, dating, or married

You also define variables you think might affect variation over time:
- the US economic condition that particular month
- the amount of traffic in the city that month
- the number of times she was personally stuck in traffic that month

(1) The approach I would prefer to take is BEFORE modeling the temporal factors to first do an LMM of inter-individual factors, but with the knowledge that there is a lot of unexplained temporal variation going on and individuals' happiness scores may depend on when they were sampled. Would it be appropriate to enter the individual factors as fixed and then use both Intercept|ID and Month as random variables...noting here that Month is specific (Oct98, not Oct)? A friend and I are debating -- he says that Month would have to be a fixed effect because otherwise it only controls for within-individual variation. But, I think this is only if one enters RANDOM = (Intercept + Month)|ID rather than RANDOM = Intercept|ID and RANDOM = Month. Is that correct?

(2) Some individual effects are constant within a subject (parent's income category), some change (Age, dating status), and one only changes for a tiny fraction of individuals (Income category). It's the last one I'm particularly worried about. Should I keep this constant (by exclusion or a dummy variable) since the sample isn't really sufficient to model within-subject effects of income? The problem I am encountering is that "income" is strongly correlated with "happiness" but the only individual who changed "income" actually reversed the pattern. She completely changes the outcome of the model.

(3) Since age is autocorrelated within an individual, should it be kept constant (use the average for that woman) or is it better to describe her actual age at each datapoint?

(4) Let's say I were to create a model with an intercept-only random effect of woman and fixed effects of "Income" and "Number of times caught in traffic", and the traffic variable is significant. Because this variable changed both within and across individuals, am I right to think that the model is telling me about BOTH within- and between-subject variation?

THANKS!

5. Originally Posted by monkeygirl

(1) The approach I would prefer to take is BEFORE modeling the temporal factors to first do an LMM of inter-individual factors, but with the knowledge that there is a lot of unexplained temporal variation going on and individuals' happiness scores may depend on when they were sampled. Would it be appropriate to enter the individual factors as fixed and then use both Intercept|ID and Month as random variables...noting here that Month is specific (Oct98, not Oct)? A friend and I are debating -- he says that Month would have to be a fixed effect because otherwise it only controls for within-individual variation. But, I think this is only if one enters RANDOM = (Intercept + Month)|ID rather than RANDOM = Intercept|ID and RANDOM = Month. Is that correct?
I assume ID is the subject ID of each female?

Month is not random. If you want a random intercept model, you can specify it with either intercept or ID (I'm actually much more familiar with mixed models in SAS than SPSS, but I think they're quite similar).

In this model, you have a two-level nested design. Month is nested within female. It's a standard growth curve or repeated measures design, if I'm understanding it correctly.

With the random intercept model, the residual error will measure the month-to-month within individual random variation.
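Written out, a random intercept model of this kind might look like the following sketch. The symbols are assumptions introduced here for illustration (they are not from the thread): $y_{ti}$ is the outcome for female $i$ in month $t$, and $x_{ti}$ stands for a single fixed-effect predictor.

```latex
y_{ti} = \beta_0 + \beta_1 x_{ti} + u_i + e_{ti},
\qquad u_i \sim N(0, \tau^2),
\qquad e_{ti} \sim N(0, \sigma^2)
```

Here $u_i$ is the female-specific shift in the intercept, and $e_{ti}$ is the residual, which is where the month-to-month within-individual variation ends up.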

Originally Posted by monkeygirl
(2) Some individual effects are constant within a subject (parent's income category), some change (Age, dating status), and one only changes for a tiny fraction of individuals (Income category). It's the last one I'm particularly worried about. Should I keep this constant (by exclusion or a dummy variable) since the sample isn't really sufficient to model within-subject effects of income? The problem I am encountering is that "income" is strongly correlated with "happiness" but the only individual who changed "income" actually reversed the pattern. She completely changes the outcome of the model.
I honestly don't know. I'd have to see exactly what's going on--try out the data. That's a tricky one if it changes the whole model.

Originally Posted by monkeygirl
(3) Since age is autocorrelated within an individual, should it be kept constant (use the average for that woman) or is it better to describe her actual age at each datapoint?
This is what I meant. If you use her age at each datapoint, I believe it's confounded with month, once you know her age at any one point (since it's clear now that you want to think of months as linear). You could use her first age or average--they'll tell you the same thing.
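To see why age at each datapoint overlaps with time, here is a minimal sketch with made-up numbers. It splits age into age at first observation and months elapsed since then; the tuple layout and the function name `split_age` are assumptions for illustration only.

```python
# (female_id, months since study start, age in years): made-up values
records = [
    ("F01", 0, 12.0),
    ("F01", 12, 13.0),
    ("F01", 30, 14.5),
    ("F02", 6, 20.5),
    ("F02", 18, 21.5),
]


def split_age(records):
    """Return (female_id, age_at_first_obs, months_since_first_obs) per row."""
    first = {}
    # Visit each female's rows in time order so the first one seen is earliest.
    for fid, month, age in sorted(records, key=lambda r: (r[0], r[1])):
        first.setdefault(fid, (month, age))
    out = []
    for fid, month, age in records:
        first_month, first_age = first[fid]
        out.append((fid, first_age, month - first_month))
    return out


parts = split_age(records)

# Age at each datapoint is fully determined by the two components.
rebuilt = [first_age + elapsed / 12 for _, first_age, elapsed in parts]
```

Because the current age can be rebuilt exactly from the first age plus elapsed time, a model that already contains time gains no new information from the changing part of age.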

Originally Posted by monkeygirl
(4) Let's say I were to create a model with an intercept-only random effect of woman and fixed effects of "Income" and "Number of times caught in traffic", and the traffic variable is significant. Because this variable changed both within and across individuals, am I right to think that the model is telling me about BOTH within- and between-subject variation?
Yes.
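A single coefficient for a time-varying predictor generally blends the two sources of variation. If you ever want to look at them separately, one common option (not something discussed above, so treat it as a suggestion) is to split the predictor into each subject's mean and the deviation from that mean, and enter both as fixed effects. A minimal sketch with made-up traffic counts, where `split_within_between` is a hypothetical helper name:

```python
from collections import defaultdict
from statistics import mean

# (subject_id, times stuck in traffic that month): made-up values
obs = [("W1", 2), ("W1", 4), ("W1", 6), ("W2", 10), ("W2", 12)]


def split_within_between(obs):
    """Return (subject_id, subject_mean, deviation_from_mean) per row."""
    by_subject = defaultdict(list)
    for sid, x in obs:
        by_subject[sid].append(x)
    means = {sid: mean(xs) for sid, xs in by_subject.items()}
    return [(sid, means[sid], x - means[sid]) for sid, x in obs]


parts = split_within_between(obs)
```

The subject mean varies only between women, and the deviation varies only within a woman, so their coefficients can be read as between- and within-subject associations respectively.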

And sorry it took so long to respond. I had to think about it a bit.

Karen

6. Thanks so much, AnalysisFactor. Your answer is very clear. It sounds like maybe I was making things more complicated than they needed to be!

7. I just wanted to follow up with this.

I just came across a resource about crossed random effects and it made me think of this study.

When I was trying to figure out the full model (not the initial one we talked about, which is truly nested), I kept going back and forth on whether observations were nested within female or within month. I realize now that they are nested within both, and that month and female are crossed.
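As a rough sketch of what crossed means here, the model gets one random intercept per female and another per specific month. The notation is an assumption made for illustration: $y_{ij}$ is the observation on female $i$ in month $j$, and $\mathbf{x}_{ij}$ collects the fixed-effect predictors.

```latex
y_{ij} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_i + v_j + e_{ij},
\qquad u_i \sim N(0, \tau_u^2),
\qquad v_j \sim N(0, \tau_v^2),
\qquad e_{ij} \sim N(0, \sigma^2)
```

The term $u_i$ is shared by all observations on the same female and $v_j$ by all observations in the same month, which appears to be the structure monkeygirl had in mind with separate random terms for ID and Month.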

Dr. Lesa Hoffman, who teaches mixed models classes, has her lectures in mp3 on her web site. Go to http://psych.unl.edu/psycrs/945/index.html and find the lecture on Crossed Random Effects.

BTW, this is a great resource for anyone needing to learn mixed models. She has dozens of lectures.

Karen

8. Thank you. These lectures are indeed very informative!
