
Thread: Regression analysis with multicollinearity

  1. #1

    Regression analysis with multicollinearity




    I am using JMP software for the analysis.

    I am looking at prognostic factors that influence survival of patients with a specific type of cancer.

    Background: younger patients, radiation, chemo, and complete resection (vs. partial resection) of the tumor are known positive prognostic factors.

    I am trying to determine if my identified prognostic factor (not any of the above) increases survival in these patients. The only problem is that the patients with my prognostic factor tend to be younger, and more likely to undergo radiation, chemo, and complete resection.

    I have tried stepwise analysis, but it shows only age and my prognostic factor to be significant. I wanted it to include the other known prognostic factors as well.

    Interestingly, I noticed that when I make my dependent variable "survival in days" ordinal instead of continuous, I get the model to show all of these things to be significant.

    I am wondering if there's another way I could validate my prognostic factor besides stepwise analysis or if there's a way I could modify my stepwise analysis to get more things to be significant.

    I am also wondering if it is kosher to have survival in days as an ordinal rather than continuous variable.

    Any help would be greatly appreciated!

  2. #2
    CowboyBear, TS Contributor

    Re: Regression analysis with multicollinearity


    Hi there,

    1) Stepwise regression is a critically flawed method. You really shouldn't ever use it. See http://onlinelibrary.wiley.com/doi/1...6.01141.x/full

    "or if there's a way I could modify my stepwise analysis to get more things to be significant."
    2) Try to avoid adjusting your data and methods until you get the most significant results. This is called "p-hacking" or data dredging, and it's a questionable (i.e., unethical) scientific practice (even though I'm sure your intentions are good). Each extra analysis you try inflates the chance of a false positive, so the p-values you end up reporting no longer mean what they claim to.

    You could select predictors based on pre-existing/theoretical knowledge. Or if you really want to select predictors based on the data at hand, you could try something a bit more principled like Bayesian variable selection.

    "I am also wondering if it is kosher to have survival in days as an ordinal rather than continuous variable."
    Number of days is a ratio variable, so I can't see any good reason to treat it as ordinal.

    "The only problem is that the patients with my prognostic factor tend to be younger, and more likely to undergo radiation, chemo, and complete resection."
    So you could control for these potential confounds by including them in the model alongside your prognostic factor. I wouldn't worry too much about multicollinearity unless there are very strong correlations between these variables. Some correlation between the predictors really isn't a major issue: it won't make the analysis untrustworthy in and of itself, it'll just make the standard errors of the affected coefficients bigger.
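
    If you want to put a number on "how much bigger", one common diagnostic is the variance inflation factor (VIF): for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The square root of the VIF is roughly the factor by which that coefficient's standard error is inflated relative to uncorrelated predictors. Here is a minimal sketch in Python with NumPy, not JMP. The data are simulated and the variable names (age, factor, chemo) are hypothetical stand-ins, not your data.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = patients)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus all other predictors.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated, purely illustrative data (an assumption, not the poster's data).
rng = np.random.default_rng(0)
n = 500
age = rng.normal(60, 10, n)
# Hypothetical prognostic factor that is moderately related to age.
factor = -0.05 * (age - 60) + rng.normal(0, 1, n)
# Hypothetical treatment score that is unrelated to the other two.
chemo = rng.normal(0, 1, n)

X = np.column_stack([age, factor, chemo])
vifs = vif(X)
se_inflation = np.sqrt(vifs)  # approximate multiplier on each standard error

for name, v, s in zip(["age", "factor", "chemo"], vifs, se_inflation):
    print(f"{name}: VIF = {v:.2f}, SE inflated by about {s:.2f}x")
```

    With a moderate correlation like the one simulated here, the VIFs stay close to 1, which is the situation I described above. As a rough rule of thumb, people usually start to worry only when a VIF gets up around 5 or 10, but those cutoffs are conventions rather than hard limits.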
