Logistic Regression vs survival analyses in large cohort studies

#1
Hello,

I was wondering if anyone could give me some advice regarding the right statistical approach for a large cohort.

I have a large prospective cohort of ~1.5 million lines of data. It contains people who were assessed as eligible for an intervention and who then either received the intervention, waited for it, or did not receive it. These people could then deteriorate, require permanent care, and/or die at different time points. The cohort spans approximately 8 years. Obviously, people who entered at the start of the study had a higher chance of dying before the end of the study than people who entered later.

I have gotten lost reading about logistic regression and survival analysis, and I honestly don't know which to use. I am familiar with SPSS, but I don't have a stats background (I'm in medicine), and the idea of trying to use SAS kind of scares me. Some of my colleagues have suggested that I might need SAS to analyse this study properly. If anyone knows of papers with similar statistical analyses, I would be grateful if you could point me to them; my field does not have studies of this nature, and I'm struggling to find something to compare against.

Anyway, any advice would be greatly appreciated. I am developing a research proposal around this.
 
#2
If you have recorded the days on which participants died or dropped out of the study, then survival analysis is the place to start. SPSS has basic survival analysis functionality. The best implementations of survival analysis are in Stata and R, in my opinion. In an ideal world (if you had statistical training), you would model deaths as counting processes with stochastic intensities.
 
#3
Someone suggested that I might be able to analyse the cohort year by year, to counteract the fact that people entering the study near the end have less chance of experiencing the event. Is this a potential option?

Would there be anything wrong with using SPSS and its basic survival analysis in this situation? I want to use the "right" approach. I can teach myself. However, while I am very good at medicine, I know my brain doesn't naturally lend itself to coding or advanced mathematics.


Also, at the start of the study all participants are eligible for the intervention. However, they either 1) get the intervention, 2) do not get the intervention, or 3) have to wait for the intervention. Would there be a way to incorporate the "wait time" into the model? Or would it be better to just turn it into a categorical variable with 3 categories?
 
#4
Survival analysis will address all these questions if you study it carefully. This includes participants falling into different categories and some of them receiving the intervention after a delay.
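One common way to handle a delayed intervention is to treat it as a time-varying covariate: each person's follow-up is split into (start, stop] intervals, untreated while waiting and treated afterwards. Below is a minimal sketch in plain Python. The names `Interval` and `split_follow_up` and the toy numbers are made up for illustration; they are not from any particular package or from your data.

```python
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    """One row of follow-up in (start, stop] form (hypothetical layout)."""
    person_id: int
    start: float   # years since eligibility assessment
    stop: float
    treated: int   # 1 once the intervention has started, else 0
    event: int     # 1 if the outcome occurred at `stop`, else 0


def split_follow_up(person_id: int, follow_up: float, event: int,
                    wait_time: Optional[float] = None) -> List[Interval]:
    """Split one person's follow-up at the moment the intervention starts.

    wait_time is None for people who never received the intervention.
    """
    # Never treated (or treatment would start after follow-up ends):
    # the whole period counts as untreated time.
    if wait_time is None or wait_time >= follow_up:
        return [Interval(person_id, 0.0, follow_up, 0, event)]

    rows = []
    if wait_time > 0:
        # Waiting period: untreated, and by construction no event yet.
        rows.append(Interval(person_id, 0.0, wait_time, 0, 0))
    # Treated period: carries the event indicator, if any.
    rows.append(Interval(person_id, wait_time, follow_up, 1, event))
    return rows


# Toy example: waited 1.5 years, then died at 4 years.
rows = split_follow_up(person_id=1, follow_up=4.0, event=1, wait_time=1.5)
for row in rows:
    print(row)
```

Rows in this shape are what time-varying Cox models generally expect. Counting the waiting period as untreated time also avoids crediting the intervention with survival that happened before it started.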

It is not good to bin the data into yearly cohorts. By applying such a crude discretization you are ignoring a lot of information. Information is money. You do need to study survival analysis if you want to do this properly. Survival analysis is the simplest acceptable collection of methods in your case.
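To see why binning is unnecessary, here is a rough sketch of the Kaplan-Meier estimator in plain Python. People who enter late and are still alive when the study ends are simply censored at their last follow-up time, and they still contribute to the at-risk counts until then. The function name and the toy data are invented for illustration.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float],
                 events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    durations: follow-up time per person
    events:    1 if the person died at that time, 0 if censored
    """
    # Only times at which at least one death occurred change the estimate.
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data (years of follow-up; 0 = censored, e.g. study ended).
durations = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 0, 1, 1, 0, 0]
for time, surv in kaplan_meier(durations, events):
    print(f"t={time:.0f}  S(t)={surv:.3f}")
```

In practice you would use the built-in routine in SPSS, Stata or R rather than code like this; the sketch is only meant to show how censored follow-up time is used instead of being thrown away.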
 

hlsmith

Omega Contributor
#6
So you have data for 1.5 M people?

You have people who are eligible for the intervention but don't take it. Are any of their characteristics also associated with risk of the outcome? That would be a sort of flipped confounding by indication.

I usually recommend propensity scores, but with as many people as you have they probably aren't needed unless you have time-varying confounders.

How do you know someone would have taken the intervention but had to wait?

What is the outcome, and do you have competing events, say a person dies, which prevents them from experiencing the outcome of interest?
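To make the competing-events point concrete, here is a small sketch of a cumulative incidence calculation (the Aalen-Johansen form) in plain Python. The cause codes (0 = censored, 1 = event of interest, 2 = competing death) and the toy data are assumptions for illustration only.

```python
from typing import List, Tuple


def cumulative_incidence(durations: List[float], causes: List[int],
                         cause_of_interest: int) -> List[Tuple[float, float]]:
    """Cumulative incidence of one cause in the presence of competing causes.

    causes: 0 = censored, any other integer = type of event observed.
    """
    times = sorted({t for t, c in zip(durations, causes) if c != 0})
    overall_survival = 1.0  # probability of no event of any type so far
    incidence = 0.0
    curve = []
    for t in times:
        at_risk = sum(1 for d in durations if d >= t)
        n_cause = sum(1 for d, c in zip(durations, causes)
                      if d == t and c == cause_of_interest)
        n_any = sum(1 for d, c in zip(durations, causes)
                    if d == t and c != 0)
        # Must be event-free just before t, then have this cause at t.
        incidence += overall_survival * n_cause / at_risk
        overall_survival *= 1.0 - n_any / at_risk
        curve.append((t, incidence))
    return curve


# Toy data: 1 = outcome of interest, 2 = death first, 0 = censored.
durations = [1.0, 2.0, 3.0, 4.0, 5.0]
causes = [1, 2, 0, 1, 0]
print(cumulative_incidence(durations, causes, cause_of_interest=1))
print(cumulative_incidence(durations, causes, cause_of_interest=2))
```

Treating the competing deaths as ordinary censoring in a Kaplan-Meier curve would tend to overstate the incidence of the event of interest, which is why the two causes are tracked together here.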

Most programs are fine for survival analysis.
 
#7
Essentially, I am applying to access data from a couple of linked government databases. I don't have the data yet, but the 1.5 million figure is an estimate from the people I am liaising with.

The cohort I am requesting is people who applied to the government for home care services for the elderly and were found to be eligible. It's a government program and doesn't have contraindications per se to its implementation.

It is common to have a wait time before receiving the program, though, owing to its popularity, funding and the ageing of the population. Some people may be found eligible but refuse those services or not request them, but these are not contraindications to the program.

Outcomes of interest are nursing home respite use, nursing home placement and death. A participant may experience more than one of these outcomes.
 
#8
Can I also clarify: does using wait time for the intervention make this a multilevel survival analysis?

Would it be better to do 2 survival analyses, one looking at the impact of the intervention and one looking at the impact of wait time?
 

hlsmith

Omega Contributor
#9
Multilevel means you have patients clustered in some group, say classroom, town, etc. I fail to see groups in times unless you create groups, which would result in a loss of information.