Model Selection in Survival Analysis


I'm having trouble with parametric model selection in survival analysis. I found this sequence of steps in a presentation.

1) Fit AFT models including all covariates, based on the log-normal, log-logistic, Weibull and generalized gamma distributions for Y (four models in total). Use LR tests/AIC to determine your initial model.

2) Do backward model selection to identify your final model

3) Conduct residual analysis

4) If it fits, write the fitted final model and interpret the model/describe the effects of covariates.
Is this an appropriate procedure? I have doubts about steps 2) and 3).

I fitted four models without any interaction terms. The LRT and AIC results indicated that the log-normal model is the best one (the exponential and Weibull models were rejected by the LRT), with a log-likelihood very close to that of the generalized gamma.

When should I eliminate a variable from the model?

What type of residuals should I check?
The steps in this list each serve a different purpose, and all of them are necessary:

point (1) aims to select the best type of model among the candidate distributions (log-normal, ...). That is what you have already done. For this you usually include all possible covariates, since you do not have to be parsimonious with variables at this point (you are not yet interested in p-values, which grow with an increasing number of covariates).
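To make the AIC/LRT comparison in point (1) concrete, here is a minimal sketch. It assumes you already have the maximised log-likelihood of each fitted model from your software. The log-likelihood values and the number of covariates `P` below are made-up placeholders, not results from any data set. The LRT is only used for models nested in the generalized gamma (Weibull and log-normal, one restricted parameter each); the log-logistic is not nested in it, so it is compared by AIC only.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: smaller is better."""
    return 2 * n_params - 2 * log_lik

def lrt_1df(ll_nested, ll_full):
    """Likelihood-ratio test with 1 degree of freedom.

    Returns (statistic, p-value). For chi-square with 1 df the
    survival function is P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (ll_full - ll_nested), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Placeholder inputs (assumptions, replace with your own fits).
P = 3  # number of covariates
fits = {
    # name: (maximised log-likelihood, number of estimated parameters)
    "weibull":     (-512.4, P + 2),  # intercept + P slopes + scale
    "lognormal":   (-503.1, P + 2),
    "loglogistic": (-504.0, P + 2),
    "gen_gamma":   (-502.9, P + 3),  # one extra shape parameter
}

aic_table = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best_by_aic = min(aic_table, key=aic_table.get)

ll_full = fits["gen_gamma"][0]
stat_ln, p_ln = lrt_1df(fits["lognormal"][0], ll_full)
stat_wb, p_wb = lrt_1df(fits["weibull"][0], ll_full)

for name, value in sorted(aic_table.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} AIC = {value:.1f}")
print(f"log-normal vs generalized gamma: LR = {stat_ln:.2f}, p = {p_ln:.3f}")
print(f"Weibull    vs generalized gamma: LR = {stat_wb:.2f}, p = {p_wb:.5f}")
```

A large p-value for a nested model means the generalized gamma does not fit significantly better, so the simpler model can be kept.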

point (2) aims to select the best set of possible covariates. This is necessary in order to eliminate covariates which, simply speaking, have nothing to do with the data and only increase the number of parameters that have to be estimated. Whether backward selection is the best way to do this is debatable; it would probably be better to simply compare all models which make sense (if you do not have too many possible covariates).

point (3) aims at model validation. Each regression model implies several assumptions, such as linearity, normality, homogeneity of variance, ... You have to look carefully at which assumptions your models need. You usually use plots of the model residuals to check these assumptions (e.g. plotting residuals against each covariate reveals whether homogeneity is violated).
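One type of residual that is commonly used for parametric survival models is the Cox-Snell residual, r = -log S(t | x). If the model fits, these residuals behave like a (censored) sample from a unit exponential distribution, so their Nelson-Aalen cumulative hazard plotted against the residuals should lie close to the 45-degree line. The sketch below assumes a log-normal AFT model; the times, event indicators, linear predictors and `sigma` are invented inputs that you would take from your own fit, and ties between residuals are not handled.

```python
import math

def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((log t - mu) / sigma) for a log-normal AFT model."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def cox_snell_residuals(times, linear_predictors, sigma):
    """Cox-Snell residuals r_i = -log S(t_i | x_i)."""
    return [-math.log(lognormal_survival(t, mu, sigma))
            for t, mu in zip(times, linear_predictors)]

def nelson_aalen(residuals, events):
    """Nelson-Aalen cumulative hazard of the residuals.

    Returns (residual, cumulative hazard) pairs at each observed event;
    censored residuals only reduce the risk set. Assumes no ties.
    """
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    at_risk = len(residuals)
    cum_hazard = 0.0
    points = []
    for i in order:
        if events[i]:
            cum_hazard += 1.0 / at_risk
            points.append((residuals[i], cum_hazard))
        at_risk -= 1
    return points

# Invented inputs (assumptions): observed times, event indicators
# (1 = event, 0 = censored), fitted linear predictors x'beta, and scale.
times = [2.1, 5.4, 3.3, 8.0, 1.2, 6.7]
events = [1, 0, 1, 1, 1, 0]
linear_predictors = [1.0, 1.6, 1.2, 1.9, 0.8, 1.5]
sigma = 0.9

residuals = cox_snell_residuals(times, linear_predictors, sigma)
for r, h in nelson_aalen(residuals, events):
    print(f"residual = {r:.3f}  cumulative hazard = {h:.3f}")
```

Plotting the printed pairs and comparing them with the line y = x gives an overall check of the chosen distribution; plotting the residuals against each covariate is a complementary check of the kind described in point (3).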

point (4) is finally what you really want to do: interpret your regression parameters, especially their signs, effect sizes and significance.
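For the interpretation in point (4), it may help to remember that in an AFT model log T = x'beta + sigma * error, so exp(beta_j) is a time ratio: the factor by which survival time is multiplied per unit increase of covariate j. A small sketch follows, with a hypothetical coefficient and standard error (not taken from any real fit) and a Wald-type 95% interval.

```python
import math

def time_ratio(beta, std_err, z=1.96):
    """Time ratio exp(beta) with a Wald confidence interval.

    A ratio above 1 means longer survival times per unit increase of
    the covariate, a ratio below 1 means shorter survival times.
    """
    return (math.exp(beta),
            math.exp(beta - z * std_err),
            math.exp(beta + z * std_err))

# Hypothetical estimate for one covariate (assumption, not real output).
beta_hat, se_hat = 0.25, 0.10

ratio, lower, upper = time_ratio(beta_hat, se_hat)
print(f"time ratio = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

If the interval excludes 1, the effect is significant at roughly the 5% level, which matches the sign and significance reading of the coefficient itself.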