How to extract coefficients corresponding to data from each time in a mixed effect model in R?

jordanlau

New Member
I am not good at statistics and might have a naive question. Any help is appreciated.
Here is the simplified question: Patients received two different vaccines twice and we collected their blood after each vaccination to measure antibody levels in the blood. There are other variables but here I omitted them and just want to know how different vaccines affect the antibody levels. It seems that the longitudinal data mixed effect model is a proper analysis.
Here is the simulated data. 6 patients received Vaccines A or B for the first vaccine and all received Vaccine A as the second vaccine; after each vaccination they visited a hospital to measure blood antibody levels (2 data points at 2 Visits). It is already confirmed with studies that Vaccine A induced higher antibody levels than Vaccine B. So if using only Visit 1 data to do a linear regression, we can see Vaccine A induced significantly higher antibody levels compared to that with Vaccine B. As all patients received A as the second vaccine, if using only Visit 2 data, there is no significance. My question is, if we include all the data in a linear mixed effect model, how to extract coefficients corresponding to each visit data? I only know how to use the summary() to get the coefficients for combined both visits data, however, the significance for the Vaccine type doesn't seem to be right.

data_raw = data.frame(ID=c(1,2,3,4,5,6,1,2,3,4,5,6),
Antibody=c(50,60,70,30,40,35,101,102,102,102,101,103),
Visit=c(1,1,1,1,1,1,2,2,2,2,2,2),
Vaccine=c("A","A","A","B","B","B","A","A","A","A","A","A"),

VaccineChange=c(0,0,0,0,0,0,0,0,0,1,1,1))

Linear regression of Visit 1 data confirmed that Vaccine A induced higher antibody levels.

lm_Visit1=subset(data_raw, Visit ==1)
summary(lm(Antibody~Vaccine,lm_Visit1))

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.000 4.564 13.145 0.000193 ***

VaccineB -25.000 6.455 -3.873 0.017948 *

Linear regression of Visit 2 data indicated no significance as everyone got Vaccine A.

lm_Visit2=subset(data_raw, Visit ==2)
summary(lm(Antibody~VaccineChange,lm_Visit2))

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 101.6667 0.4714 215.7 2.77e-09 ***

VaccineChange 0.3333 0.6667 0.5 0.643

If using mixed model with both visits data, the vaccine variable is significant. Is it possible to extract coefficients corresponding to each visit? Just to show similar results to linear regression that Visit 1 showed significance but not Visit 2.

summary(lmer(Antibody~Vaccine+(1|ID),data_raw))

Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 87.889 6.457 10.000 13.610 8.87e-08 ***

VaccineB -52.889 12.915 10.000 -4.095 0.00216 **

Originally, I performed two linear regression with each visit data. But I was told that for visit 2 data I should use linear mixed model as I got both visit data to account for repeated measurement within patient differences. Other covariates are not included in this simulated model just for simplicity.

Many thanks,
Jordan

Last edited:

hlsmith

Less is more. Stay pure. Stay poor.
in your data, it looks like everyone received A as the second dose?

did you control for time between vaccines? did you control for time until blood draw for antibodies after vaccine? i imagine the times could vary between subjects. did you look at the right antibody (capsid) or could natural immunity be acquired between vaccines and before study unbeknownst to the researcher?

If you did not have a baseline taken at the second dose time period, how would you know the effect, could have both started at the same level or they could have started at different levels by waned at different rates. So if I had a weight loss program and had people change diet then had them exercise but only collected data say 14 days after each intervention, I don't know what is happening in the interim. Ideally, I measure baseline weight, post intervention weight, second baseline weight if second intervention is not started immediately after the first intervention, and post second intervention weight. If I don't have all those values, i could be missing something important. Also, if I didn't randomize the subjects a confounding effect may be unaccounted for, right? Can you describe your study i a little more detail.

jordanlau

New Member
in your data, it looks like everyone received A as the second dose?

did you control for time between vaccines? did you control for time until blood draw for antibodies after vaccine? i imagine the times could vary between subjects. did you look at the right antibody (capsid) or could natural immunity be acquired between vaccines and before study unbeknownst to the researcher?

If you did not have a baseline taken at the second dose time period, how would you know the effect, could have both started at the same level or they could have started at different levels by waned at different rates. So if I had a weight loss program and had people change diet then had them exercise but only collected data say 14 days after each intervention, I don't know what is happening in the interim. Ideally, I measure baseline weight, post intervention weight, second baseline weight if second intervention is not started immediately after the first intervention, and post second intervention weight. If I don't have all those values, i could be missing something important. Also, if I didn't randomize the subjects a confounding effect may be unaccounted for, right? Can you describe your study i a little more detail.

Yes, everyone received A as the second dose.

These questions are brilliant. In this study trial, we can't make the time interval between vaccination, blood collection the same, however, we collected the data and add these as covariates in the model. For the simplicity of the question here, I removed all other variables from the model. We used virus to measure the neutralizing antibodies.

Yes, the baseline levels are very important to understand the waning of antibody levels. However, since it requires patients to do more visits and with a huge population studies, this is not always feasible. Also, the decay of antibodies over time was reported in literature so we did not focus on this part. In addition, there may be many factors affecting antibody levels in ordinary people, however, infection or vaccination are the most decisive factors. These people got 1st vaccine, after similar intervals, we collected blood, then similar intervals, they got 2nd vaccine and 2nd sampling. We collected all the interval data, together with their demographics, and put them into a multivariable model. We try to understand the antibody levels after each vaccination.

We already know that Vaccine A induces higher antibody levels than Vaccine B (use only Visit 1 data). For the second visit, all people received Vaccine A. I used data from Visit 2 to perform a linear regression, results showed that there is no difference in antibody levels since everyone received Vaccine A, regardless of what vaccine (A or B) they received at Visit 1. However, a statistician told me that, since I already had both visits data, using only Visit 2 data is not correct. I should use both visits data to do mixed effect models to account for within patient variance. I was told it is possible to extract variables' coefficients corresponding to each visit in a mixed effect model, but I could not find how to do it.

So I performed linear mixed effect model with all the data. However, I don't know how to show the same results as what I got from linear regression with Visit 2 data -- which showed no difference in antibody levels since everyone got Vaccine A. Combining all data, the vaccine type variable will be significant in the model as data from Visit 1 is significantly different since some people got Vaccine B with lower antibody levels.

Thanks again for your help. Please let me know if you need more clarification.

hlsmith

Less is more. Stay pure. Stay poor.
Just curious how many patients you have data for?

So you want the average difference in antibody levels between vaccines at visit 1 and difference at time two based on original vaccine status, correct? emmeans package helps to get estimates and contrasts out of regressions in R. I have not used it much.

I haven't done many of these designs, even though they are common. Everyone getting A in second period makes it a little tricky, since the model doesn't know you want to carry the classifications forward.

When you add visit you get:

Fixed effects:
Estimate Std. Error t value
(Intercept) 18.179 6.438 2.824
VaccineB -25.012 4.325 -5.784
Visit 41.827 3.702 11.299

which would be: first visit A = 18, B= 18 - 25; second visit A = 18 + 41; B = 18 + 41 - 25, which doesn't make sense. You can rewrite dataframe to make it AAABBBAAABBB, so it gets the second value comparison correct, but you likely also need an interaction term. Pretty busy right now, so let me know if you make progress - but currently you are missing visit in your model.

hlsmith

Less is more. Stay pure. Stay poor.
Code:
data_raw = data.frame(ID=c(1,2,3,4,5,6,1,2,3,4,5,6),
Antibody=c(50,60,70,30,40,35,101,102,102,102,101,103),
Visit=c("T1","T1","T1","T1","T1","T1","T2","T2","T2","T2","T2","T2"),
Vaccine=c("A","A","A","B","B","B","A","A","A","B","B","B"),
VaccineChange=c(0,0,0,0,0,0,0,0,0,1,1,1))

summary(lmer(Antibody~Vaccine*Visit + (1|ID),data_raw))
I am sure it is more complicated than this, but this gets you:
T1 w/ A: 60
T1 w/ B: 60 -25 or difference at t1 = -25
T2 w/ A: 60 + 42
T2 w/ B: (60 + 42 - 25) + 25 or difference: (60 + 42) - ((60 + 42 - 25) + 25)

hlsmith

Less is more. Stay pure. Stay poor.
P.S., I would have to see your full manuscript, but intuitively these results would only make sense if you have a large gap between the two vaccinations where the IgGs plummet in both groups. So results would be conditional or only generalizable to others with a similar gap in vaccine timing. But we couldn't make conclusions on populations with a shorter time gap. Also, how was the first vaccine assigned. If self-selection or based on when it was released - you could have confounding (see negative control papers from Eric Tchetgen Tchetgen on healthy behavior seekers) or the targeted organism may not be the same. Just general comments. Lastly, with MLM models, you can play around a bit to find the best covariance structure for the model. Lastly, if you have a huge data set, you could have a random holdout set to test the models on.

jordanlau

New Member
P.S., I would have to see your full manuscript, but intuitively these results would only make sense if you have a large gap between the two vaccinations where the IgGs plummet in both groups. So results would be conditional or only generalizable to others with a similar gap in vaccine timing. But we couldn't make conclusions on populations with a shorter time gap. Also, how was the first vaccine assigned. If self-selection or based on when it was released - you could have confounding (see negative control papers from Eric Tchetgen Tchetgen on healthy behavior seekers) or the targeted organism may not be the same. Just general comments. Lastly, with MLM models, you can play around a bit to find the best covariance structure for the model. Lastly, if you have a huge data set, you could have a random holdout set to test the models on.
Thank you so much for taking the time to help me!

I have over 300 patients in the study. The gap after vaccinations should be long enough - several months, so the antibodies decay quite a lot.

The complete formula is like Antibody~Group+Treatment+Age+Sex+Ethnicity+Infection+Visit+Vaccine, in which I put many variables.

Vaccines are assigned due to their availability. Your comment is valuable. I haven't thought about this confounding factor. I will read the literature. Not sure about MLM models and random holdout set... I haven't got experience in machine learning. Yes, I should learn some ML in my free time.

I added Visit into the model

Code:
data_raw = data.frame(ID=c(1,2,3,4,5,6,1,2,3,4,5,6),
Antibody=c(50,60,70,30,40,35,101,102,102,102,101,103),
Visit=c("T1","T1","T1","T1","T1","T1","T2","T2","T2","T2","T2","T2"),
Vaccine=c("A","A","A","B","B","B","A","A","A","A","A","A"),
VaccineChange=c(0,0,0,0,0,0,0,0,0,1,1,1))

summary(lmer(Antibody~Vaccine+Visit+(1|ID),data_raw))

Fixed effects:
Estimate Std. Error      df t value Pr(>|t|)
(Intercept)   18.179      6.438   7.602   2.824 0.023544 *
VaccineB     -25.012      4.325   8.987  -5.784 0.000266 ***
Visit         41.827      3.702   6.380  11.299 1.87e-05 ***

summary(lmer(Antibody~VaccineChange+Visit+(1|ID),data_raw))

Fixed effects:
Estimate Std. Error     df t value Pr(>|t|)
(Intercept)      3.632      7.982  8.657   0.455  0.66031
VaccineChange   20.930      6.357  4.008   3.292  0.03006 *
Visit           43.868      4.602  3.587   9.532  0.00113 **

The results make sense. But I want to show that after changing vaccine everyone got Vaccine A, there were not differences in antibody levels among patients, just like what linear regression of Visit 2 data showed in my original post. A statician told me that linear regression is not appropriate, should use mixed effect model, and the coefficients related to each visit can be extracted from mixed effect model, but I don't know how to extract it.

hlsmith

Less is more. Stay pure. Stay poor.

Code:
data_raw = data.frame(ID=c(1,2,3,4,5,6,1,2,3,4,5,6),
Antibody=c(50,60,70,30,40,35,101,102,102,102,101,103),
Visit=c("T1","T1","T1","T1","T1","T1","T2","T2","T2","T2","T2","T2"),
Vaccine=c("A","A","A","B","B","B","A","A","A","B","B","B"),
VaccineChange=c(0,0,0,0,0,0,0,0,0,1,1,1))
library(lme)
lmer_fit = lmer(Antibody~Vaccine*Visit + (1|ID),data_raw)
summary(lmer_fit)
library(emmeans)
emmeans::emmip(lmer_fit, Vaccine ~ Visit)  #Vaccine classifications wrong but  it shows you the effects
emmeans::emmeans(lmer_fit, pairwise ~ Vaccine | Visit)
Difference at T1: 25 (SE: 4.6)
Difference at T2: -0.3 (SE: 4.6)

Review it and think about it for awhile and see if you can understand what is going on. After contemplating, post any questions and I can clarify.

hlsmith

Less is more. Stay pure. Stay poor.
Renaming Vaccine to Initial_Vacine or group may help to digest this.

Add some whisker on the plot would help. Also, if the samples sizes aren't the same size, non-overlapping whiskers is not the same thing as the differences whiskers excluded the null value (i.e., "0").

jordanlau

New Member

Code:
data_raw = data.frame(ID=c(1,2,3,4,5,6,1,2,3,4,5,6),
Antibody=c(50,60,70,30,40,35,101,102,102,102,101,103),
Visit=c("T1","T1","T1","T1","T1","T1","T2","T2","T2","T2","T2","T2"),
Vaccine=c("A","A","A","B","B","B","A","A","A","B","B","B"),
VaccineChange=c(0,0,0,0,0,0,0,0,0,1,1,1))
library(lme)
lmer_fit = lmer(Antibody~Vaccine*Visit + (1|ID),data_raw)
summary(lmer_fit)
library(emmeans)
emmeans::emmip(lmer_fit, Vaccine ~ Visit)  #Vaccine classifications wrong but  it shows you the effects
emmeans::emmeans(lmer_fit, pairwise ~ Vaccine | Visit)
Difference at T1: 25 (SE: 4.6)
Difference at T2: -0.3 (SE: 4.6)

Review it and think about it for awhile and see if you can understand what is going on. After contemplating, post any questions and I can clarify.
I really appreciate your time! The idea to include Initial Vaccine as the variable is intelligent! This solved my question to test how the changing vaccine affects antibody levels.

Thanks again and all the best!
Jordan

hlsmith

Less is more. Stay pure. Stay poor.
Glad to hear! Let us know if you have any more questions.