# question about survival analysis and late entry data

#### sunny3333

##### New Member
I have data from a large, ongoing national longitudinal survey. The original cohort consists of people aged 51-61 in 1992. Since the survey began, a new cohort of respondents aged 51-61 has been added every 6 years: one in 1998 (people aged 51-61 in 1998) and another in 2004 (people aged 51-61 in 2004). All cohorts are followed up to 2014. My outcome of interest is the onset of a chronic disease, and my risk factor is a baseline characteristic (baseline being 1992, 1998, and 2004 for the three cohorts, respectively). I'll be doing a survival analysis using age as the time scale with left truncation, which much of the literature suggests is appropriate for longitudinal health survey data.
Now my questions are: Can I use all three cohorts in my analysis? If so, how do I handle the fact that the respondents entered the study at different times? If I want to do propensity score matching on my risk factor, how do I handle the three different cohorts?
Thank you very much for your help!
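To make the setup concrete, here is a minimal sketch of how records from the three cohorts could be put on a common age scale. The `birth_year` and `onset_year` inputs, the `Record` layout, and the example people are all hypothetical (none of them come from the survey), and the sketch assumes whole-year ages and ignores death or dropout before 2014.

```python
from dataclasses import dataclass
from typing import Optional

FOLLOW_UP_END = 2014  # last follow-up year stated in the post


@dataclass
class Record:
    cohort: int     # baseline year of the cohort: 1992, 1998 or 2004
    entry_age: int  # age at baseline, i.e. the left-truncation time
    exit_age: int   # age at disease onset, or at end of follow-up
    event: int      # 1 = onset observed, 0 = censored at end of follow-up


def to_age_scale(birth_year: int, baseline_year: int,
                 onset_year: Optional[int] = None) -> Record:
    """Convert calendar years into an (entry_age, exit_age, event) record."""
    entry_age = baseline_year - birth_year
    if onset_year is not None and onset_year <= FOLLOW_UP_END:
        return Record(baseline_year, entry_age, onset_year - birth_year, 1)
    # No onset seen by 2014: censor at the age reached in the last year.
    return Record(baseline_year, entry_age, FOLLOW_UP_END - birth_year, 0)


# Hypothetical respondents, one per cohort.
records = [
    to_age_scale(birth_year=1935, baseline_year=1992, onset_year=2001),
    to_age_scale(birth_year=1942, baseline_year=1998, onset_year=2002),
    to_age_scale(birth_year=1951, baseline_year=2004),  # never had the event
]
```

Once every respondent has an entry age and an exit age, the calendar year of entry no longer defines the time axis; it survives only as the `cohort` label, which can then be used as a covariate or stratum.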

#### sunny3333

##### New Member
Thanks, Miner, for your response. I'm not sure I should treat my data as left-censored, though. I was going to treat it as left-truncated. My data look something like the following:
cohort 1 person 1: entered study in year 1992 at age 52, was followed till year 2010 at age 70 when they developed the event of interest, risk factor measured at baseline survey (1992).
...
cohort 2 person 1: entered study in year 1998 at age 56, was followed till year 2002 at age 60 when they developed the event of interest, risk factor measured at baseline survey (1998).
...
cohort 3 person 1: entered study in year 2004 at age 53, was followed till year 2008 at age 57 when they developed the event of interest, risk factor measured at baseline survey (2004).
...
Now I would like to use a Cox model to examine the effect of the risk factor on the incidence of the event, using age as the time scale. So for the three people above, the (entry age, exit age) intervals are (52, 70), (56, 60), (53, 57).
Does this approach make sense? How should I handle the cohort indicator? Should I stratify on it? If I want to do propensity score matching on my risk factor, how do I handle the three different cohorts? Thanks for your advice.
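As an illustration of what left truncation does on the age scale, the sketch below builds the risk sets for the three intervals listed above. It assumes the usual (entry, exit] convention, where a person is at risk at age t only if entry age < t <= exit age; the `risk_set` helper and the tuple layout are illustrative names, not part of any library.

```python
# (cohort, entry_age, exit_age, event) for the three people in the post.
subjects = [
    (1992, 52, 70, 1),
    (1998, 56, 60, 1),
    (2004, 53, 57, 1),
]


def risk_set(subjects, age, cohort=None):
    """Return indices of subjects under observation at `age`.

    Left truncation: a subject contributes only while
    entry_age < age <= exit_age. Passing `cohort` restricts the
    risk set to one stratum, as a cohort-stratified model would.
    """
    return [
        i
        for i, (c, entry, exit_, _event) in enumerate(subjects)
        if entry < age <= exit_ and (cohort is None or c == cohort)
    ]


# Distinct ages at which an event occurs, in increasing order.
event_ages = sorted({exit_ for _c, _entry, exit_, event in subjects if event})

# Pooled risk sets (cohort used as a covariate or ignored).
pooled = {age: risk_set(subjects, age) for age in event_ages}

# Stratified risk sets: each event is compared only within its own cohort.
stratified = {
    (c, exit_): risk_set(subjects, exit_, cohort=c)
    for c, _entry, exit_, event in subjects
    if event
}
```

In the pooled version, the person who entered at age 56 is not counted at age 55 even though they were alive then, which is the delayed-entry adjustment. In the stratified version each person here is alone in their risk set, simply because this toy data set has one person per cohort.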

#### Miner

##### TS Contributor
I'm coming from a different field, namely industrial statistics, so I am not familiar with the conventions in your field. However, survival analysis is an adaptation of reliability analysis. In reliability, we have an analogous issue with products failing in the field. New product ships every day/week/month, so each shipment is analogous to a new cohort entering the study. When I analyze this scenario, I use arbitrary censoring and include all cohorts.