Several concerns about Cox regression analysis models


I'm working on a retrospective/prospective cohort study of a very rare disease (about 5 patients a year present at my institution). Over several years I have gathered a sample of 30 patients with complete data.

I measured parameter A in my patients and noticed that it significantly affects survival. There is also an established international prognostic scoring system that significantly affects survival in my sample (let's call this score parameter B). The two parameters are strongly correlated (p<0.001, r2=0.37). When I perform Cox regression analysis, I get a significant overall model fit (p=0.025), but neither parameter A nor parameter B predicts survival independently of the other in my small sample (p=0.098 and p=0.621, respectively). I believe this is due to the high degree of correlation, and that they actually represent the same thing: disease activity.
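One rough way to gauge how much the A/B correlation could inflate the uncertainty of each coefficient is the variance inflation factor (VIF). A minimal sketch, assuming the reported r2 = 0.37 is the squared Pearson correlation between A and B (if it is r itself, square it first). VIF is defined for linear regression, so for a Cox model it is only an approximate diagnostic.

```python
from math import sqrt

def vif_two_predictors(r_squared: float) -> float:
    """VIF for each of two predictors with squared correlation r_squared:
    VIF = 1 / (1 - r^2)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

vif = vif_two_predictors(0.37)
# Standard errors are inflated by roughly sqrt(VIF) relative to uncorrelated predictors.
print(f"VIF = {vif:.2f}, SE inflation = {sqrt(vif):.2f}")  # VIF = 1.59, SE inflation = 1.26
```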

When I include a third measured parameter in the regression model (let's call it parameter C; it has been shown to affect survival in previously published papers, but does not affect survival in my small sample, with univariate p>0.05), my multivariate model on 30 patients returns a significant effect for both parameter A (p=0.01) and parameter C (p=0.033). Parameter B (p=0.82), the currently recognized prognostic score, does not predict survival independently in a model that includes parameters A and C.

Is this a legitimate analysis, given that I have a small number of patients and I included a parameter that was insignificant in univariate analysis?

Thank you in advance

P.S. All parameters are numerical variables. I checked the significance of each univariate effect by entering each parameter separately, without the others, into a Cox regression model.


Omega Contributor
30 patients total, correct? How many had the outcome? If the model can support that many covariates, then you can include them. Are any of them categorical variables that will be dummy coded and entered as more than one variable?

You can definitely keep a non-significant variable in the model if it is clinically relevant.

You may want to look into an exact or penalized model, perhaps with Firth's correction, since you have a very small sample.
Thank you for your quick response.
I have a total of 30 patients and 12 events (deaths). I have not included any categorical variables in the models.
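Whether a model "can support that many covariates" is often judged with the events-per-variable (EPV) rule of thumb. A commonly cited heuristic is roughly 10 events per covariate; it is a guideline, not a hard limit. A minimal sketch applying it to the 12 deaths reported here:

```python
def events_per_variable(n_events: int, n_covariates: int) -> float:
    """Events per covariate (EPV) in a survival model."""
    if n_covariates < 1:
        raise ValueError("need at least one covariate")
    return n_events / n_covariates

N_EVENTS = 12  # deaths reported in this thread
EPV_HEURISTIC = 10  # common rule of thumb, not a strict threshold

for k in (1, 2, 3):
    epv = events_per_variable(N_EVENTS, k)
    status = "meets" if epv >= EPV_HEURISTIC else "is below"
    print(f"{k} covariate(s): EPV = {epv:.1f}, {status} the ~10 EPV heuristic")
```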

I am concerned about the conclusion I can draw from this analysis, i.e. whether parameter A can predict survival independently of parameter B.
In the first case the answer is no: the two are too correlated to be independent predictors of survival, at least in my sample of patients, and further studies with larger cohorts will be needed to establish the real role of parameter A in this disease.

But if I include parameter C, the story changes: parameter A predicts survival independently (and even weakens parameter B's ability to predict survival). I noticed there are several possible variables C1, C2, C3... that are insignificant in univariate analysis but known to affect survival, and that render parameter A significant and parameter B insignificant. C1, C2 and C3 also stay insignificant in multivariate analysis, except in the one case mentioned earlier. It is a bit confusing that adding a variable that is insignificant both before and after multivariate analysis makes parameter A more important. I have no valid argument for including C1 rather than C2 in the model. The situation described in my previous post is one of 6 possible three-variable models I created and tested, and it is the one that gave a significant result for both parameter A and parameter C (the others were significant for parameter A only).
Is it acceptable to test (and report as evidence) a three-variable model that includes variable C, which is insignificant both before and after adjustment but has previously recognized prognostic significance, when it is such a big game changer for parameter A?


Fortran must die
Beyond power issues, I think you would have a problem generalizing to a larger population with so few cases. At the least, you should draw attention to contextual factors that show when your analysis could reasonably be applied and when it could not. One reason you might not be able to separate out your effects is power: multicollinearity commonly occurs when you have too few cases (or at least, more cases is often suggested as the best remedy for this problem). Are the two effects inherently too correlated, or could you separate them if you had more cases? If they are theoretically correlated (not just a consequence of too little data), why not create an index or measure that collapses them into a single factor?
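If A and B really measure the same thing (disease activity), one simple way to collapse them into a single factor is to standardize each and average the z-scores, then enter that one score in the Cox model instead of A and B separately. A minimal sketch under that assumption: equal weighting is an arbitrary choice (a first principal component is a common alternative), and the five patients' values below are made up for illustration.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize a list to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite_index(a_values, b_values):
    """Equal-weight average of z-scored A and B, one score per patient."""
    za, zb = zscores(a_values), zscores(b_values)
    return [(x + y) / 2 for x, y in zip(za, zb)]

# Hypothetical values for five patients (not data from this thread).
a = [1.2, 3.4, 2.2, 5.1, 4.0]
b = [2, 5, 3, 6, 6]
print([round(v, 2) for v in composite_index(a, b)])
```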

If you can show there is a theoretical basis for a variable, keeping it in your model should be fine. The literature commonly reports variables that turn out to be statistically insignificant.
After a night of thinking, I concluded that the truth is somewhere in between: whether parameter A can predict survival independently of parameter B is inconclusive at the moment, based on this small sample of patients (a significant overall model, but borderline significance for parameters A and B when analyzed together). Further studies are needed to establish the role of parameter A in this disease.
If I understand correctly, lack of power limits more complex multivariate analysis on such a small sample. Although I'm not familiar with methods for checking how much power my analyses with different numbers of variables have, I intuitively understand that three variables may strain the conclusions too much (especially when the added variables are insignificant both before and after the analysis, yet simultaneously turn the tide in favour of parameter A). Therefore I feel it is more correct not to fish for significant results, but to analyze only the two parameters of interest (A and B) together.

Thank you both for your assistance; please correct my conclusion if I'm wrong.


Fortran must die
I suggest looking at something like G*Power. If you tell it the method you want to use and other basic information, such as sample size, it will tell you the power. You can play around with different sample sizes to see how this changes the power.
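For a Cox model specifically, a commonly used closed-form approximation (Hsieh and Lavori, 2000) for a continuous covariate gives the required number of events as D = (z_{1-α/2} + z_{1-β})² / ((1 - R²) σ² (log HR)²), where R² is the squared multiple correlation of the covariate with the other covariates. Note that power depends on the number of events, not patients. A minimal sketch of that approximation solved for power; the hazard ratio of 2.0 per SD is an arbitrary illustrative assumption, not an estimate from this study, and R² = 0.37 stands in for the A/B correlation.

```python
from math import log, sqrt
from statistics import NormalDist

def cox_power_continuous(n_events, hr_per_sd, r_squared_other=0.0, alpha=0.05):
    """Approximate power of a two-sided test for one continuous covariate
    in a Cox model (Hsieh & Lavori approximation).

    n_events        -- number of events (deaths), not patients
    hr_per_sd       -- assumed hazard ratio per 1 SD increase of the covariate
    r_squared_other -- R^2 from regressing this covariate on the other covariates
    """
    if not 0.0 <= r_squared_other < 1.0:
        raise ValueError("r_squared_other must be in [0, 1)")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Per-SD scaling means the covariate variance is 1 and drops out;
    # correlation with other covariates shrinks the effective information.
    signal = sqrt(n_events * (1 - r_squared_other)) * abs(log(hr_per_sd))
    return std_normal.cdf(signal - z_alpha)

# Hypothetical effect size (HR = 2 per SD) with the 12 events in this thread.
print(f"A alone:          {cox_power_continuous(12, 2.0):.2f}")
print(f"A adjusted for B: {cox_power_continuous(12, 2.0, r_squared_other=0.37):.2f}")
```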
Please, one more question: if two parameters significantly predict survival in univariate analysis, and one of them remains significant while the other becomes insignificant in multivariate analysis, can you actually say that the first predicts survival independently of the second?

It is obvious that they depend on each other (they have a substantial degree of correlation), and the first is a bit stronger, so it steals the significance when they are compared together. Is it legitimate to call the first one an independent predictor of survival?


Fortran must die
I forget whether Cox regression has the same assumptions as ordinary regression; it's been too long since I worked with it, but in terms of slopes I think so. If so, then if one is a significant predictor in the regression model, it predicts independently of the other. Regression usually involves predicting the unique variance in Y explained by a given X ("explained" is likely not the right word, since we are talking about correlation, not causation, but it works).


Omega Contributor
Sorry, I don't have time to read the old posts, so hopefully I am not being redundant!

Well, individually each is a significant predictor, though when both are entered into the model at the same time, one of them no longer meets your threshold of significance. Have you examined them together outside of the model and really looked at their association or dependence? Have you checked whether they interact in the model?
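Checking for an interaction usually means adding an A×B product term alongside A and B. Centering both variables first makes the main-effect coefficients easier to interpret and reduces the correlation between the product and its components. A minimal sketch for building that term; the values are hypothetical, and the Cox fit itself would be done in your statistics package. With only 12 events, a fourth term lowers the events per variable further, so any result would be exploratory at best.

```python
from statistics import mean

def centered_interaction(a_values, b_values):
    """Product of mean-centered A and B, to be entered alongside A and B."""
    ma, mb = mean(a_values), mean(b_values)
    return [(x - ma) * (y - mb) for x, y in zip(a_values, b_values)]

# Hypothetical values for three patients.
print(centered_interaction([1, 2, 3], [2, 4, 6]))
```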

It is your model; you can do what you want. If you think it is still a good predictor but the model may be underpowered, you can leave it in. I believe you can run the model with and without it and conduct a likelihood ratio test to see whether it significantly adds to the model.
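The likelihood ratio test compares the (partial) log-likelihoods of two nested Cox models: LR = 2 × (logL_full - logL_reduced), referred to a chi-square distribution with degrees of freedom equal to the number of added parameters. Many packages report either the log partial likelihood or -2 log likelihood for each fit; with the latter, LR is simply the difference of the two -2LL values. A minimal sketch for one added parameter; the log-likelihood values in the usage line are hypothetical.

```python
from math import erfc, sqrt

def lr_test_one_df(loglik_reduced: float, loglik_full: float) -> tuple[float, float]:
    """Likelihood ratio test for two nested models differing by one parameter.
    Returns (LR statistic, p-value) from the chi-square(1) distribution."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("full model log-likelihood should not be below the reduced model's")
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log partial likelihoods, e.g. model A + C vs. model A + B + C.
stat, p = lr_test_one_df(loglik_reduced=-30.2, loglik_full=-28.1)
print(f"LR = {stat:.2f}, p = {p:.3f}")
```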