Am I introducing a selection bias in my prognostic biomarker study?

Hi all,

I'd be glad to get advice from someone who's better than me at biostatistics, especially regarding selection bias in survival studies.

I have 300 patients with a specific disease, an available biopsy at study inclusion, and prospective follow-up.
I want to study whether the expression of 3 biomarkers is an independent predictor of survival, i.e., independent of the already known prognostic factors in this disease, namely the age of the patient and the stage of the disease.
I plan on fitting a Cox proportional hazards model for survival, with the following explanatory variables: age of the patient, stage of the disease, and my 3 biomarkers as measured in the patient's biopsy.
I would need around 50 events (10 events per variable), as I have 5 variables in the model.
Now, I can't study the whole cohort of patients (it's too expensive). I can only study 100 patients. If I randomly select 100 patients from that cohort, I would have around 30 events, which is not enough to fit the model.
So I want to preferentially select patients with an event, that is, patients who have died. I would select 50 patients who died and 50 patients who are still alive, with comparable follow-up times.
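
To make the arithmetic behind this plan explicit, here is a minimal sketch in Python. The 30% event rate is not measured; it is inferred from my estimate of "around 30 events" in 100 randomly selected patients, and all variable names are just for illustration.

```python
# Figures taken from the description above; the event rate is an assumption
# inferred from "around 30 events in 100 randomly selected patients".
n_cohort = 300             # patients with a biopsy and follow-up
n_affordable = 100         # patients I can afford to assay
n_variables = 5            # age, stage, 3 biomarkers
events_per_variable = 10   # rule of thumb used above
event_rate = 0.30          # assumed proportion of patients who died

events_needed = n_variables * events_per_variable
events_random_sample = round(n_affordable * event_rate)
events_in_cohort = round(n_cohort * event_rate)

print("events needed for the model:", events_needed)
print("events expected in a random sample of 100:", events_random_sample)
print("events expected in the whole cohort:", events_in_cohort)
```

Under these assumptions the full cohort would contain enough deaths to reach 50 events, which is why I am considering sampling on the outcome rather than at random.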

Is the methodology correct?
Am I introducing a bias in this study that would make my results irreproducible?
(If so, I would be happy to find a solution!)

Thanks for your help !!


Omega Contributor
So say you randomly select from the deceased patients: could a competing event precede the event of interest during the study follow-up period?


Omega Contributor
That probably affects things. I would look up the use of a case-control study design when performing survival analysis with competing events. See what others have done.

I have the feeling that you can fit proportional hazards models to case-control data and the estimates are fine, though you probably need to correct the model intercept, since you are forcing a false prevalence of the outcome (say 50%) on the model. Your issue would be that you also have to address competing events. I have not done that personally, but I believe that if there is a competing event, you have to let the model know, and their follow-up time gets weighted differently.
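
To illustrate the "false prevalence" point, here is a rough sketch of one common way to account for outcome-dependent sampling: weighting each sampled patient by the inverse of their sampling probability. This is an assumption on my part, not something I have validated for this design. The cohort counts (about 90 deaths and 210 survivors) are inferred from the roughly 30% event rate in the question, the function name is hypothetical, and matching survivors on follow-up time would make the true sampling probabilities more complicated than this.

```python
def sampling_weights(n_cases_cohort, n_controls_cohort,
                     n_cases_sampled, n_controls_sampled):
    """Return (case_weight, control_weight) as inverse sampling fractions."""
    case_weight = n_cases_cohort / n_cases_sampled
    control_weight = n_controls_cohort / n_controls_sampled
    return case_weight, control_weight


# Assumed cohort composition: 90 deaths, 210 survivors; 50 sampled from each.
w_case, w_control = sampling_weights(90, 210, 50, 50)

# Each sampled patient "stands in" for this many cohort patients.
print("weight per sampled death:", w_case)
print("weight per sampled survivor:", w_control)

# Sanity check: the weighted sample should reproduce the cohort prevalence
# rather than the artificial 50% created by the sampling.
weighted_deaths = 50 * w_case
weighted_survivors = 50 * w_control
weighted_prevalence = weighted_deaths / (weighted_deaths + weighted_survivors)
print("weighted prevalence of death:", round(weighted_prevalence, 2))
```

Weights like these would then be supplied to a weighted Cox fit with robust standard errors. Whether that is adequate here, especially with competing events, is exactly what I would check in the literature on case-control and case-cohort designs.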