I am analysing time-to-event data using Cox regression (SPSS), as there are a number of covariates to be included.

Previously, the end of the study was 2009. Recently, some participants who were recorded as not having had the event by 2009 were found to have had the event by the end of 2013. So now the cases who had the event have an end-of-study time of 2013, but all other participants remain at the 2009 end of study. There had, of course, been all the usual loss to follow-up etc. in the dataset with the end-of-2009 timepoint, i.e. not everyone made it to the end of 2009, for various reasons.

The investigators want to include these new cases, but I don't, because their inclusion in the dataset post 2009 is dependent on their status; for all the others, their status remains as it was at the end of 2009. The investigators say they know that the other participants will be disease free (because they would otherwise have been reported to the participating clinics), but surely we need the most recent follow-up data and date for them too. I am very concerned that the new cases, who were detected after the previous end-of-study time, will bias the results, as we don't have similar post-2009 data for the others.

This is the way it is with survival analysis: there can often be cases where the event under study occurs after the end of the study, but the goalposts can't keep moving as more cases are detected unless all other participants are subsequently followed up as well (even though some will be lost to follow-up). Am I talking rubbish or not?

This is a good example of the adage "Just because you can do something doesn't mean that you should." You are correct in your concern that inclusion of these additional data without including the rest of the sample could potentially bias the results. The ideal would be to include all of the new data, including those that are confirmed disease free.
I'm also interested in this question, so just out of curiosity: I guess you did consider censoring all the other (supposedly disease-free) participants at 2009?


Yes, I used 2009 as the latest date for censoring, so those with the disease and those who are disease free now have the same end-of-study time, even though those with the disease were followed up until 2013. Those diagnosed with the disease between 2009 and 2013 had to be excluded.
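
For readers who want to see the rule spelled out, here is a minimal sketch of cutting follow-up at a fixed cutoff. It is one reading of the approach above, not the poster's actual SPSS procedure: it assumes that "excluded" means post-2009 events are disregarded and the person is censored at the cutoff, rather than dropped from the dataset. The function name, argument names, and the exact cutoff date of 31 December 2009 are all hypothetical.

```python
from datetime import date

# Assumed end-of-study date; the thread only says "end of 2009".
CUTOFF = date(2009, 12, 31)


def censor_at_cutoff(entry, last_seen, had_event, cutoff=CUTOFF):
    """Return (follow-up time in days, event indicator) truncated at cutoff.

    entry     -- date the participant entered the study
    last_seen -- date of the event, or of last contact if no event
    had_event -- True if last_seen is an event date
    """
    if last_seen > cutoff:
        # Anything observed after the cutoff is ignored: the participant
        # is treated as event-free and censored at the cutoff.
        return (cutoff - entry).days, 0
    # Events and losses to follow-up before the cutoff are kept as recorded.
    return (last_seen - entry).days, int(had_event)


# A participant diagnosed in 2012 contributes censored time up to 2009.
print(censor_at_cutoff(date(2005, 1, 1), date(2012, 6, 1), True))
# A participant diagnosed in 2007 is kept as an event.
print(censor_at_cutoff(date(2005, 1, 1), date(2007, 1, 1), True))
```

Applied to every participant, this gives everyone the same rule for when follow-up stops, regardless of what happened to them afterwards.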

This is because including them would have violated the assumption of non-informative censoring: the disease-free group's follow-up would have ended at 2009 BECAUSE they were disease free, while the disease group's follow-up would have run to 2013 BECAUSE they had the disease. So I just had to go with the data as of the end of 2009.
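
To make the direction of the bias concrete, here is a small simulation sketch. It is not a Cox model: it assumes a single group with a constant (exponential) hazard and estimates that hazard as events divided by person-time. The true rate, the cutoff of 5 years, the extended window of 9 years, and all names are hypothetical choices for illustration only.

```python
import random


def estimated_rates(n=20000, true_rate=0.05, cutoff=5.0, extended=9.0, seed=1):
    """Compare two follow-up schemes on simulated exponential event times.

    Scheme A: everyone is censored at `cutoff`.
    Scheme B: events between `cutoff` and `extended` are added, but
              event-free participants stay censored at `cutoff`.
    Returns (rate_a, rate_b), each computed as events / person-time.
    """
    rng = random.Random(seed)
    events_a = time_a = 0.0
    events_b = time_b = 0.0
    for _ in range(n):
        t = rng.expovariate(true_rate)  # true event time
        if t <= cutoff:
            # Event before the cutoff: counted identically in both schemes.
            events_a += 1
            time_a += t
            events_b += 1
            time_b += t
        else:
            # Scheme A censors at the cutoff no matter what happens later.
            time_a += cutoff
            if t <= extended:
                # Scheme B lets late events in with their full follow-up...
                events_b += 1
                time_b += t
            else:
                # ...while event-free participants still stop at the cutoff.
                time_b += cutoff
    return events_a / time_a, events_b / time_b


rate_a, rate_b = estimated_rates()
print(f"consistent cutoff: {rate_a:.4f}, status-dependent follow-up: {rate_b:.4f}")
```

Under these assumptions, scheme A should land close to the true rate, while scheme B overstates it, because the extra follow-up time is only ever credited to people who went on to have the event.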


Recently, some participants who were recorded as not having had the event by 2009 were found to have had the event by the end of 2013.
So wait, just to be clear: are these people who had the event between 2009 and 2013, or did they actually have the event before the 2009 cut-off without you knowing it until later? I thought I understood the first time I read it, but on a second look it's not clear when their events occurred.

If the events occurred before 2009 and you didn't know by study end just because of the time lag in data recording, that's a subject-domain problem that many people run into (if that's any consolation), and it is indeed unfortunate. If they happened between 2009 and 2013, I agree with you and wouldn't want to include them either, except possibly, as was mentioned, if censoring could account for it somehow.


Hi, sorry for any confusion. Yes, there were some people who had the event AFTER 2009, i.e. after the previous end-of-study date. These are the ones I agonised about and have now told the researchers must be excluded.

All people who had the event before the 2009 end of study are included.