issue with dependence of observations across models

#1
Dear all,

I have a question regarding an intervention design and statistical analysis related to it.

My intervention design has 2 cohorts: a pre-intervention cohort and a post-intervention cohort. Each cohort has around 1000 people, but around 70% of the people in the pre-intervention cohort will also be in the post-intervention cohort. In other words, some people are only in the pre-intervention cohort, some new people are only in the post-intervention cohort, and some are in both periods. The outcome measure is binary, and I am going to run 2 logistic regressions and compare the changes in the predictors. I am aware of the argument that comparison of logit coefficients across groups can be misleading if underlying variance heterogeneity is present (e.g., Allison 1999).

My question is: apart from variance heterogeneity, is there an issue with the dependence of observations across models, as some people are in both the pre and post periods, and if so, how can I address it from a statistical standpoint? Any suggestions or references?

Many thanks.

James
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Yeah, this is inappropriate, since observations are correlated for only some people and you aren't controlling for this. Thus you are not addressing a component of variability; neglecting it risks SEs that are too narrow and possibly type I errors. My weight is correlated with my weight from last week; they aren't independent. Are there people in the first period who are lost to follow-up and not in the second period, as well as people in the post-period who weren't in the pre-period?
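To make the "too narrow SEs" point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the person-level propensities (0.1 or 0.9), the 700/300/300 split, the seed, and the choice of a pooled outcome proportion as the quantity of interest. It only covers that pooled proportion; other estimands may behave differently.

```python
import random
import statistics

def simulate_pooled_proportion(rng, n_both=700, n_only=300):
    """One simulated study: pooled outcome proportion and its naive SE.

    Hypothetical setup: each person has a stable propensity (0.1 or 0.9),
    so two observations on the same person are positively correlated.
    """
    outcomes = []
    # People in both periods contribute two correlated observations.
    for _ in range(n_both):
        p = rng.choice((0.1, 0.9))
        outcomes.append(1 if rng.random() < p else 0)
        outcomes.append(1 if rng.random() < p else 0)
    # Pre-only and post-only people contribute one observation each.
    for _ in range(2 * n_only):
        p = rng.choice((0.1, 0.9))
        outcomes.append(1 if rng.random() < p else 0)
    n = len(outcomes)
    p_hat = sum(outcomes) / n
    # Naive SE treats all n observations as independent.
    naive_se = (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat, naive_se

rng = random.Random(2024)
results = [simulate_pooled_proportion(rng) for _ in range(400)]
empirical_sd = statistics.stdev(p for p, _ in results)
mean_naive_se = statistics.mean(se for _, se in results)
print(f"empirical SD of estimate: {empirical_sd:.4f}")
print(f"average naive SE:         {mean_naive_se:.4f}")
```

Under these assumptions the spread of the estimate across replicates is wider than the naive SE suggests, because the repeated people carry less independent information than the raw observation count implies.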
 
#3
To answer your question: yes, there are people in the first period who are lost to follow-up and not in the second period, as well as people in the post-period who weren't in the pre-period, and there are people in both periods.

Thanks.
 

hlsmith

#4
What happens if you just use those in both cohorts? What are the biases introduced? Can you identify the people in both sets and link them by an identifier?
 
#5
Yes, I can identify those in both sets, as each person has an ID. My question is like yours: what bias would be introduced if I use only those in both cohorts? Or, for independence of the samples, should the post-period model use only those who are not in the pre-period?
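For what it's worth, splitting people by ID into the three groups discussed here could look something like the sketch below. The function name and the toy IDs are made up for illustration; the real data would supply the ID lists.

```python
def split_by_membership(pre_ids, post_ids):
    """Partition person IDs into pre-only, both, and post-only groups."""
    pre, post = set(pre_ids), set(post_ids)
    return {
        "pre_only": pre - post,   # lost before the post period
        "both": pre & post,       # observed in both periods
        "post_only": post - pre,  # new in the post period
    }

# Toy IDs, purely illustrative.
groups = split_by_membership(pre_ids=[1, 2, 3, 4], post_ids=[3, 4, 5])
for name, ids in groups.items():
    print(name, sorted(ids))
```

Having the three groups labelled makes it possible to compare them on baseline characteristics before deciding which subset, if any, to analyse.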

Thanks.
 

hlsmith

#6
Well, one issue if you use all the data, which I wouldn't do, is that you could have no difference among those in both groups, but everyone who isn't in the post cohort has one outcome and the new people in the post cohort have the complementary outcome, showing an effect that doesn't exist. You could have any variation of this. The best-case scenario is that inclusion status/missingness was random, which I doubt, and the results are biased toward the null.
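A small worked example of that extreme scenario, with made-up numbers: 700 people in both periods with the same 30% outcome rate each time, 300 pre-only people who all have outcome 0, and 300 post-only people who all have outcome 1. Only the rough 70% overlap comes from the thread; the rates are assumptions chosen to make the point.

```python
def outcome_rate(groups):
    """groups: list of (n_people, n_with_outcome) pairs."""
    total = sum(n for n, _ in groups)
    events = sum(k for _, k in groups)
    return events / total

# Hypothetical counts: (people, people with the outcome)
both = (700, 210)       # 30% in the pre period AND 30% in the post period
pre_only = (300, 0)     # everyone who leaves has outcome 0
post_only = (300, 300)  # everyone who is new has outcome 1

pre_rate = outcome_rate([both, pre_only])    # 210 / 1000
post_rate = outcome_rate([both, post_only])  # 510 / 1000
print(f"pre: {pre_rate:.2f}, post: {post_rate:.2f}")
```

The cohort-level rate moves from 0.21 to 0.51 even though nobody observed in both periods changed at all; the whole "effect" comes from who left and who arrived.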

Just using those in both opens the door for some type of selection bias.

Can you describe the project in more detail so we can better understand the risks?
 
#7
Additional information: the nearly 1000 people are all inmates in a prison, not randomly selected. The purpose of the study was to examine whether there is any change in risk factors between pre and post.

Thanks.