Why not use Kaplan Meier?

hlsmith

Omega Contributor
#2
I don't have time to review the paper right now, but it seemed super interesting. Is this examining methods for longitudinal survival data (repeated exposure data)? If so, I may try to read it next week.


A general comment: KM is usually used for an initial exploratory analysis. It can't handle multiple covariates, though, so proportional hazards regression (Cox regression, a.k.a. survival regression) is used next. Then, for parametric survival models (the Cox model itself leaves the baseline hazard unspecified), different distributions are explored to find the best fit for the data. I am guessing, as I did not read the paper, that this is what may be happening!
 
#4
Thanks a lot for your answers. The full paper is here:
http://dmkd.cs.vt.edu/papers/TKDE17.pdf

If I understand it right, they use Kaplan-Meier because:
"For each censored instance, we estimate the probability of event and probability of censoring using Kaplan-Meier estimator and give a new class label based on these probability values. This approach assumes that the censoring time is independent of the event time and all the attributes X."
They also say that:
"Unlike other methods that handle censored data, this approach can simply solve the uncertainty with such censored data by labelling them as event or event-free based on the consistent Kaplan-Meier estimator."
But in my case censoring DEPENDS on other attributes, and I couldn't find an example of doing this kind of labelling with any other method.
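
To make the quoted labelling step concrete, here is a minimal sketch in plain Python of one possible reading of it. The decision rule (label a censored instance as "event" when its estimated event probability exceeds its estimated censoring probability) is my assumption, not something confirmed against the paper or its R code, and the function names are made up for illustration. It also relies on the same independence assumption the paper states, so it does not solve the covariate-dependent censoring problem.

```python
def kaplan_meier(times, flags):
    """Return a step function S(t) = P(T > t) estimated by Kaplan-Meier.

    flags[i] is 1 if the occurrence of interest was observed at times[i],
    and 0 if that instance only tells us the occurrence came later.
    """
    steps = []
    surv = 1.0
    for t in sorted({u for u, f in zip(times, flags) if f == 1}):
        at_risk = sum(1 for u in times if u >= t)
        occurred = sum(1 for u, f in zip(times, flags) if u == t and f == 1)
        surv *= 1.0 - occurred / at_risk
        steps.append((t, surv))

    def S(t):
        value = 1.0
        for u, s in steps:
            if u > t:
                break
            value = s
        return value

    return S


def relabel_censored(times, events):
    """Assumed rule: a censored instance becomes 1 (event) if its estimated
    event probability is larger than its estimated censoring probability."""
    event_surv = kaplan_meier(times, events)
    # Swap the roles of event and censoring to estimate the censoring curve.
    cens_surv = kaplan_meier(times, [1 - e for e in events])
    labels = []
    for t, e in zip(times, events):
        if e == 1:
            labels.append(1)  # observed events keep their label
        else:
            p_event = 1.0 - event_surv(t)
            p_censor = 1.0 - cens_surv(t)
            labels.append(1 if p_event > p_censor else 0)
    return labels


# Toy data, invented for illustration: events == 0 means censored.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 1, 0, 1, 0]
toy_labels = relabel_censored(toy_times, toy_events)
```

Both curves here are computed from the whole sample and ignore the attributes X, which is exactly why the approach needs censoring to be independent of them.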
 

hlsmith

Omega Contributor
#5
Labeling them as event or censored is just finding the probability of the outcome. You can likely just use proportional hazards. What do you want to do? Tell us about your study. This is a high-level approach paper, perhaps not in line with your objectives. Have you used survival analysis before?

Thx
 
#6
Unfortunately, I haven't used survival analysis before. I know R and Weka, and that's why I was assigned this paper, which I have to be able to explain. I think my boss thinks it might be useful for us.
The more I read it, the more confused I get.
It does 3 main things as I understand:
1. Create a 50% dataset and a 100% dataset, and use Kaplan-Meier with one of them in order to label all instances. (I thought it would be the 50% dataset, but the R code at line 132 reads the 100% dataset and calculates Kaplan-Meier with it:
https://github.com/MLSurvival/ESP/blob/master/ESP_TKDE2016/TKDE_code.R
How is that???)

2. Run Weka with a known classifier (for example naive Bayes) in order to classify the 50% dataset (or the 100% dataset???). It should be the 100% dataset if it is already labelled with Kaplan-Meier, but I'm not sure.

3. Fit a distribution (for example exponential or Weibull) to calculate probabilities, which are then used to calculate performance measures with the ROCR package (for example prediction and performance, which are used to calculate AUC, a measure of the quality of the classifier).
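
To check my own understanding of step 3, here is a small plain-Python sketch of the two ingredients as I read them: fitting an exponential distribution to censored times, and computing AUC from predicted scores. This is not taken from TKDE_code.R or from ROCR. The function names, the toy data and the classifier scores are all invented for illustration, and the Weibull case is left out because it needs an iterative fit.

```python
import math


def fit_exponential_rate(times, events):
    """Maximum-likelihood rate of an exponential model with right censoring:
    number of observed events divided by total follow-up time."""
    return sum(events) / sum(times)


def event_probability(rate, horizon):
    """P(T <= horizon) under an exponential model with the given rate."""
    return 1.0 - math.exp(-rate * horizon)


def auc(labels, scores):
    """Probability that a randomly chosen positive instance gets a higher
    score than a randomly chosen negative one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: events == 0 means censored.
toy_rate = fit_exponential_rate([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
toy_prob = event_probability(toy_rate, horizon=5)

# Hypothetical classifier scores for four labelled instances.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so it measures how well the scores order the instances, not whether the probabilities themselves are well calibrated.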