Comparing two survival curves

#1
Hi

I have two survival curves/functions given by two sets of data points

(t1, P(T>t1)), (t2, P(T>t2)), ..., (tn, P(T>tn))

(t1, P*(T>t1)), (t2, P*(T>t2)), ..., (tn, P*(T>tn))

where t1, t2, ..., tn are the time points and, for each time point, I have an estimated probability of surviving beyond that time.

The first survival curve, P(T>t), is estimated using a Kaplan-Meier estimator for multiple subjects/observations.
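For reference, here is a minimal numpy sketch of the Kaplan-Meier product-limit estimator together with Greenwood standard errors, which are one common way to get a standard error for each estimated P(T>t). Whether Greenwood's formula is what was used here is not stated; `kaplan_meier` is just an illustrative name, and the data are assumed to be right-censored with a 0/1 failure indicator.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier estimate of S(t) = P(T > t) with Greenwood standard errors.

    time  : observed times (failure or censoring)
    event : 1 if the failure was observed, 0 if censored
    Returns the distinct failure times, S(t) just after each one, and its SE.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    fail_times = np.unique(time[event == 1])       # distinct observed failure times
    surv, var_sum = 1.0, 0.0
    s_hat, se_hat = [], []
    for t in fail_times:
        n_at_risk = np.sum(time >= t)               # still under observation just before t
        d = np.sum((time == t) & (event == 1))      # failures at t
        surv *= 1.0 - d / n_at_risk                 # product-limit step
        if n_at_risk > d:                           # Greenwood term undefined once S hits 0
            var_sum += d / (n_at_risk * (n_at_risk - d))
        s_hat.append(surv)
        se_hat.append(surv * np.sqrt(var_sum))      # Greenwood: Var(S) = S^2 * sum d/(n(n-d))
    return fail_times, np.array(s_hat), np.array(se_hat)

# Small made-up example: failures at 1, 2, 3, 5; censored at 2 and 4
t, s, se = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

Subjects censored at a failure time are kept in that time's risk set, which is the usual Kaplan-Meier convention.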

The second survival curve, P*(T>t), is estimated using a machine learning method and models survival for a single subject.

I now need to compare the two survival curves and determine whether there is a significant difference between them. My situation is unusual because I am comparing a survival curve for a group of subjects with a survival curve for a single subject. Traditionally, you would have two survival curves for different groups (e.g. male and female) and could check whether they differ using a log-rank test.

I'm quite stuck so any ideas/references would be highly appreciated.

Thanks in advance.
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Can you clarify the one-person model? How do you get a curve for one person? Can you provide a link to the method used, so we can understand it better? I would start by plotting them both. Are there standard errors you could use to calculate confidence intervals?
 
#3
Hi
The method I used is from the paper "Survival analysis as a classification problem" by Chenyang Zhong and Robert Tibshirani, sections 2.1-2.3:
https://arxiv.org/abs/1909.11171

They use an approach called stacking, in which observations are grouped according to whether or not they failed at each uncensored observation's failure time. A binary classifier is then fitted to the resulting "stacked" dataset, and its output can be used to construct survival curves.
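A rough numpy sketch of the stacking idea as described above, assuming right-censored data; this is not the authors' code. Each distinct failure time t_k contributes one block containing the subjects still at risk (time >= t_k), labelled 1 if they fail at t_k and 0 otherwise. In this sketch a classifier's predicted probability of label 1 in block k is treated as a discrete hazard h_k(x), and survival is the running product of (1 - h_k(x)). The classifier itself is replaced here by the per-block label proportion (no covariates); in practice the classifier would typically also be given the block index or t_k as a feature alongside x. `stack` and `survival_from_hazards` are hypothetical names.

```python
import numpy as np

def stack(time, event, X):
    """Build a stacked dataset: one block of rows per distinct failure time t_k.

    Each block holds the subjects at risk at t_k (time >= t_k); the label is 1
    if that subject failed at t_k and 0 otherwise.
    Returns the failure times and, for every stacked row, its block index,
    covariates and label.
    """
    time, event, X = np.asarray(time, float), np.asarray(event, int), np.asarray(X, float)
    fail_times = np.unique(time[event == 1])
    blocks, rows, labels = [], [], []
    for k, t in enumerate(fail_times):
        at_risk = np.flatnonzero(time >= t)
        blocks.append(np.full(at_risk.size, k))
        rows.append(X[at_risk])
        labels.append(((time[at_risk] == t) & (event[at_risk] == 1)).astype(int))
    return fail_times, np.concatenate(blocks), np.vstack(rows), np.concatenate(labels)

def survival_from_hazards(hazards):
    """S(t_k | x) = prod_{j <= k} (1 - h_j(x)) for predicted hazards h_j(x)."""
    return np.cumprod(1.0 - np.asarray(hazards, float))

# Made-up data: failures at 1, 2, 3, 5; censored at 2 and 4; no real covariates
time = [1, 2, 2, 3, 4, 5]
event = [1, 1, 0, 1, 0, 1]
fail_times, block, Xs, y = stack(time, event, np.zeros((len(time), 1)))

# Stand-in "classifier": the fraction of label-1 rows in each block (d_k / n_k)
h = np.array([y[block == j].mean() for j in range(fail_times.size)])
S = survival_from_hazards(h)
```

With no covariates and a separate rate per block, the fitted hazard is d_k / n_k, so the product reproduces the Kaplan-Meier estimate exactly; adding covariates to the classifier is what turns this into a subject-specific curve.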

I have plotted both survival curves and they are very similar, so visually there does not seem to be a significant difference between the two. However, I would like to reach this conclusion through more rigorous means, i.e. a significance test.

Yes, I have computed standard errors for every estimated survival probability, for both methods.
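Given standard errors for both curves, one crude option is a pointwise z-test at each time point. This is a minimal sketch assuming the two estimates are independent and approximately normal. That assumption is doubtful if the machine learning model was trained on the same subjects as the Kaplan-Meier curve, and the standard errors describe estimation uncertainty only, not how far a single subject may legitimately sit from the group curve. `pointwise_z_test` and `bonferroni` are hypothetical helper names, and Bonferroni is just one (conservative) way to handle testing at many time points.

```python
import numpy as np
from math import erfc, sqrt

def pointwise_z_test(s1, se1, s2, se2):
    """Two-sided z-test of S1(t_j) = S2(t_j) at each time point t_j.

    ASSUMES the two estimates are independent and approximately normal,
    so Var(S1 - S2) = se1^2 + se2^2.
    """
    s1, se1, s2, se2 = (np.asarray(a, float) for a in (s1, se1, s2, se2))
    z = (s1 - s2) / np.sqrt(se1**2 + se2**2)
    p = np.array([erfc(abs(zj) / sqrt(2)) for zj in z])  # two-sided normal p-value
    return z, p

def bonferroni(p):
    """Bonferroni-adjusted p-values for testing at len(p) time points."""
    p = np.asarray(p, float)
    return np.minimum(1.0, p * p.size)

# Illustrative numbers only: two time points
z, p = pointwise_z_test([0.70, 0.50], [0.03, 0.03], [0.66, 0.30], [0.04, 0.04])
p_adj = bonferroni(p)
```

Because neighbouring time points are strongly correlated, Bonferroni will usually be quite conservative here.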
 

hlsmith

#4
I haven't looked at the link yet, but how about plotting the calibration curves for the two approaches with confidence intervals?
 
#5
Sort of an interesting link; I hadn't realized that relationship. To your question: it seems the answer to whether there is a significant difference between the two would be 'no' if both methods are consistent estimators of the survival curve, and 'yes' otherwise. I think there may be a fundamental problem in constructing the kind of statistical test you want, one that compares the estimates you have in hand, but I can't put my finger on it. It's like if I gave you a median (non-parametric, like the log-rank test) and a mean and asked you whether they were significantly different. Wouldn't it boil down to whether normality/symmetry assumptions were met, or something similar?

I think one approach would be to refit each method on repeated samples of the patients and look at the correlation between their estimates.
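One way to do that resampling is a bootstrap over patients. This is a minimal sketch under stated assumptions: `fit_a` and `fit_b` are hypothetical placeholders for "refit method A/B and return S(t) on a fixed time grid". In this setting one would be the Kaplan-Meier fit and the other the stacked model refitted on each resample and evaluated for the same subject's covariates. Here a simple exponential model stands in for the machine learning method, and the data are simulated for illustration only. Besides the per-time correlation, the percentile interval of S_a - S_b from the same resamples reflects the dependence between the two estimates, which a test assuming independent estimates would ignore.

```python
import numpy as np

def km_on_grid(time, event, grid):
    """Kaplan-Meier S(t) evaluated at each point of a fixed time grid."""
    fail_times = np.unique(time[event == 1])
    if fail_times.size == 0:                       # no failures in this resample
        return np.ones(len(grid))
    steps = [1.0 - np.sum((time == t) & (event == 1)) / np.sum(time >= t)
             for t in fail_times]
    cum = np.cumprod(steps)
    idx = np.searchsorted(fail_times, grid, side="right") - 1  # last failure <= g
    return np.where(idx >= 0, cum[np.maximum(idx, 0)], 1.0)    # S = 1 before 1st failure

def exponential_on_grid(time, event, grid):
    """Stand-in second method: exponential model, rate = events / total follow-up."""
    rate = event.sum() / time.sum()
    return np.exp(-rate * np.asarray(grid, float))

def bootstrap_estimates(time, event, fit_a, fit_b, n_boot=200, seed=0):
    """Refit two estimators on bootstrap resamples of the patients."""
    rng = np.random.default_rng(seed)
    time, event = np.asarray(time, float), np.asarray(event, int)
    a, b = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, time.size, size=time.size)  # resample with replacement
        a.append(fit_a(time[idx], event[idx]))
        b.append(fit_b(time[idx], event[idx]))
    return np.array(a), np.array(b)

# Simulated data, for illustration only
rng = np.random.default_rng(1)
time = rng.exponential(10.0, 150)
event = (rng.uniform(size=150) < 0.8).astype(int)
grid = np.array([2.0, 5.0, 10.0])

A, B = bootstrap_estimates(time, event,
                           lambda t, e: km_on_grid(t, e, grid),
                           lambda t, e: exponential_on_grid(t, e, grid))
corr = [np.corrcoef(A[:, j], B[:, j])[0, 1] for j in range(grid.size)]
diff_ci = np.percentile(A - B, [2.5, 97.5], axis=0)  # 95% percentile interval of S_a - S_b
```

An interval for S_a - S_b that excludes 0 at some grid point would suggest the two methods disagree there, subject to the usual bootstrap caveats.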