Obtaining the AUROC in a test subset with a time-dependent (survival) ROC curve from a Cox model

#1
I have survival times and fixed baseline values of biomarkers and risk factors (I actually have them measured longitudinally, but want to deal with the baseline values first).

I'd like to (ideally using SAS PROC PHREG) fit my Cox PH model with the baseline variables in a training subset (2/3 of the data) and assess their predictive value by the area under a time-dependent ROC curve (as developed by Dr Heagerty) in the separate test subset (the remaining 1/3). My belief is that I just need to output the linear predictor (or predictions) from the Cox model fitted to the training data, and then feed it as a single variable into a Cox model on the test data that outputs the AUROC of the time-dependent ROC curve(s).

I wondered if this is a valid approach? Many thanks.
 

hlsmith

Not a robit
#2
Can you add more details? So you are doing a random 66%/33% split into a training and a test set, then building a survival model on the training set using a set of fixed (baseline) covariates? Next you want to score the holdout set and use AUROC as a metric for accuracy? Are you planning on fitting a bunch of models to the training set and tweaking them based on cross-validation or the scored holdout set? Can you provide a link to Dr. Heagerty's approach?

In SAS I believe there is a procedure, PROC PLM or something like it, for scoring holdout sets.
 
#3
Hi hlsmith, thanks for your help again.
You're right in your interpretation of my post.
I wasn't planning much (if any) variable selection or model tuning, and therefore no cross-validation in the training set. I will really just be fitting established risk factors, basic demographics (age, gender) that I want to adjust for no matter what, and a biomarker.
Links to Dr Heagerty's approach are https://www.ncbi.nlm.nih.gov/pubmed/15737082 and https://support.sas.com/resources/papers/proceedings17/SAS0462-2017.pdf
 

hlsmith

#4
The second link is interesting. I will acknowledge that I have not done that before. I may read the SAS paper tomorrow and get back to you. I am also running a PROC PHREG model on Wednesday, so I will try the approach on it and see what comments I may have.

Thanks.
 

hlsmith

#6
@statlimp - thanks for sharing the links. I read the SUG paper. I don't see anything wrong with your approach, provided the data split doesn't reduce your sample size so much that you lack sufficient power in the training phase. I think there may be a trick in SAS where you keep the validation data in your data set but set the outcome to missing. Then you fit your model, and when you tell SAS to output predictions, it will output them for the observations with the missing outcome too, but won't use those observations in the model-building process because of listwise deletion. Then you can pull out those predictions and see how well the model predicted survival.
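
For anyone who wants to see the mechanics outside SAS, here is a rough Python sketch of the workflow discussed in this thread: blank the outcome for a random third of the rows, fit the Cox model on the rest, compute the linear predictor for every row, and evaluate a time-dependent AUC on the held-out rows only. Everything in it is an assumption for illustration: the simulated data, the helper names `fit_cox` and `cumulative_dynamic_auc`, and the evaluation time `t=1.0`. The AUC here is a naive cumulative/dynamic estimate, not Dr Heagerty's estimator; it is only reasonable in this example because the simulated censoring is purely administrative and happens after `t`. With random censoring you would want one of the weighted estimators described in the linked papers.

```python
import numpy as np

def fit_cox(time, event, X, n_iter=50, tol=1e-8):
    """Hypothetical helper: Newton-Raphson on the Cox partial likelihood
    (Breslow form for ties). Returns the coefficient vector."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        w = np.exp(X @ beta)
        grad = np.zeros(p)
        info = np.zeros((p, p))
        for i in np.flatnonzero(event):
            risk = time >= time[i]              # subjects still at risk at time[i]
            wr, Xr = w[risk], X[risk]
            s0 = wr.sum()
            xbar = (wr @ Xr) / s0               # risk-weighted covariate mean
            grad += X[i] - xbar
            info += (Xr * wr[:, None]).T @ Xr / s0 - np.outer(xbar, xbar)
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def cumulative_dynamic_auc(time, event, score, t):
    """Naive AUC at time t: cases had the event by t, controls survive past t.
    Ignores subjects censored before t, so it assumes there are none."""
    cases = (event == 1) & (time <= t)
    controls = time > t
    diff = score[cases][:, None] - score[controls][None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))

# Simulated data (assumption): two baseline covariates, exponential survival
# times, administrative censoring at tau.
rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 2))
true_beta = np.array([1.0, 0.5])
latent = rng.exponential(scale=1.0 / (0.5 * np.exp(X @ true_beta)))
tau = 3.0
time = np.minimum(latent, tau)
event = (latent <= tau).astype(float)

# Blank the outcome for a random third of the rows (the holdout set).
holdout = np.zeros(n, dtype=bool)
holdout[rng.permutation(n)[: n // 3]] = True
event_for_fit = np.where(holdout, np.nan, event)

# Only rows with an observed outcome are used for fitting.
used = ~np.isnan(event_for_fit)
beta_hat = fit_cox(time[used], event_for_fit[used], X[used])

# The linear predictor needs only covariates, so it exists for every row.
lp = X @ beta_hat

# Evaluate on the holdout rows with their true outcomes restored.
auc_test = cumulative_dynamic_auc(time[holdout], event[holdout], lp[holdout], t=1.0)
print("coefficients:", beta_hat)
print("holdout AUC at t=1.0:", auc_test)
```

The point of the sketch is that the holdout rows never enter the likelihood, yet they still get a linear predictor, which is the single variable the AUC is then computed from.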