
Thread: best modeling approach?

  1. #1
    Points: 3,522, Level: 37
    Level completed: 15%, Points required for next Level: 128

    Posts
    9
    Thanks
    3
    Thanked 0 Times in 0 Posts

    best modeling approach?




    I have a data set with about 5,000 patients and am trying to build a risk prediction model for a binary outcome that has no competing risks (alive / dead). I would like to use an automatic selection method, and that is what I need advice on: LASSO, ridge, decision tree, random forest? If someone wants to suggest stepwise, I'd be interested in that too.

    - The goal is not to understand biology (e.g., test a hypothesis and adjust for confounders) but rather to create a risk prediction model.
    - It is longitudinal data.
    - Events are somewhat sparse (~5-10% of the population will have the event).
    - There are probably 40-50 candidate variables, which will likely satisfy the proportional hazards assumption.

    Any advice on which approach to use and how to go about it? Also, how would you validate the suggested approach?

    Really appreciate any practical guidance here.
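
    The numbers in this post can be turned into a rough events-per-variable (EPV) check. The sketch below only does the arithmetic on the figures quoted above (about 5,000 patients, a 5-10% event rate, 40-50 candidate variables); the function name is hypothetical, and the often-quoted rule of thumb of roughly 10 events per candidate variable is an assumption brought in for context, not something stated in the thread.

```python
def events_per_variable(n_patients, event_rate, n_candidates):
    """Expected number of events divided by the number of candidate predictors."""
    expected_events = n_patients * event_rate
    return expected_events / n_candidates


# Figures taken from the post: ~5,000 patients, 5-10% events, 40-50 candidates.
for event_rate in (0.05, 0.10):
    for n_candidates in (40, 50):
        epv = events_per_variable(5000, event_rate, n_candidates)
        print(f"event rate {event_rate:.0%}, {n_candidates} candidates: EPV = {epv:.1f}")
```

    With these inputs the EPV ranges from about 5 to about 12.5, which is one reason penalization or another form of shrinkage is worth considering alongside variable selection.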

  2. #2
    Omega Contributor
    Points: 38,423, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,005
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: best modeling approach?

    What do you plan to do with the results?

    If you just want a program to do everything for you, the randomForestSRC package in R seems like an option. You might want to hold out a random sample to validate it on. I'm not familiar with a stepwise lasso, but that may be a better approach for you.
    Stop cowardice, ban guns!
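
    The hold-out suggestion above could be sketched as follows. This is a minimal illustration in plain Python rather than R, and it makes two assumptions that are not in the post: the split is stratified by outcome so the sparse event rate is preserved in both parts, and 30% of patients are held out. The function name and the made-up outcome vector (5,000 patients, 8% events) are hypothetical.

```python
import random


def stratified_holdout(events, holdout_fraction=0.3, seed=0):
    """Split row indices into (train, holdout), sampling within each outcome class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, holdout_idx = [], []
    for label in (0, 1):
        idx = [i for i, e in enumerate(events) if e == label]
        rng.shuffle(idx)
        n_holdout = round(len(idx) * holdout_fraction)
        holdout_idx.extend(idx[:n_holdout])
        train_idx.extend(idx[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)


# Made-up outcome vector for illustration: 400 events among 5,000 patients.
events = [1] * 400 + [0] * 4600
train_idx, holdout_idx = stratified_holdout(events)

train_events = sum(events[i] for i in train_idx)
holdout_events = sum(events[i] for i in holdout_idx)
print(len(train_idx), train_events)
print(len(holdout_idx), holdout_events)
```

    The model would be fit on the training indices only and then scored once on the held-out patients. With few events, resampling schemes such as cross-validation or the bootstrap are common alternatives to a single split.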

  3. #3

    Re: best modeling approach?

    Basically the goal is to predict risk of death after an intervention.

    Thank you for the pointer to the package; I'll check it out. It basically sounds like you are recommending random forest modeling.

    Thanks again. Any other ideas appreciated.

  4. #4
    hlsmith

    Re: best modeling approach?

    Well, I was also trying to hint at my dislike for automated approaches. They do not take into account actual knowledge of the data context (e.g., dichotomizing age, an originally continuous predictor, and losing information), and they run the risk of overfitting the data. If you only have about 40 real predictors, it may be worth spending a day or two doing this with a traditional survival model approach as well.


    My hesitation comes back to how results may actually be utilized in the future.
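
    The information loss from dichotomizing a continuous predictor can be illustrated with a toy calculation. The sketch below uses ten made-up patients and a hypothetical cutoff of 65 years, and compares the concordance (C-statistic) of raw age against the indicator "age >= 65" as a risk score for a binary outcome. The data, the cutoff, and the helper name c_statistic are all assumptions for illustration; a survival analysis would use a censoring-aware concordance instead.

```python
def c_statistic(scores, events):
    """Share of (event, non-event) pairs in which the event case has the
    higher score; tied scores count as half a concordant pair."""
    concordant = 0.0
    n_pairs = 0
    for score_i, event_i in zip(scores, events):
        if event_i != 1:
            continue
        for score_j, event_j in zip(scores, events):
            if event_j != 0:
                continue
            n_pairs += 1
            if score_i > score_j:
                concordant += 1.0
            elif score_i == score_j:
                concordant += 0.5
    return concordant / n_pairs


# Made-up data for illustration only: age in years and death indicator.
ages = [42, 48, 55, 61, 66, 70, 74, 79, 83, 88]
died = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

c_continuous = c_statistic(ages, died)
c_dichotomized = c_statistic([1 if a >= 65 else 0 for a in ages], died)
print(round(c_continuous, 3), round(c_dichotomized, 3))
```

    On this toy data the continuous score gets 20 of 24 pairs right (about 0.83), while the dichotomized score gets the equivalent of 15 of 24 (about 0.63), because every pair on the same side of the cutoff becomes a tie.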

  5. The Following User Says Thank You to hlsmith For This Useful Post:

    viostorm (03-16-2016)

  6. #5

    Re: best modeling approach?

    Originally posted by hlsmith:
    Well, I was also trying to hint at my dislike for automated approaches. They do not take into account actual knowledge of the data context (e.g., dichotomizing age, an originally continuous predictor, and losing information), and they run the risk of overfitting the data. If you only have about 40 real predictors, it may be worth spending a day or two doing this with a traditional survival model approach as well.

    My hesitation comes back to how results may actually be utilized in the future.

    I agree that automatic methods can certainly be problematic; however, this seems like a reasonable problem for them. One consideration is that no one will use a risk algorithm that requires 40 variables; it just isn't practical. So a parsimonious model really has value from a practical standpoint.

    So after quite a bit of research, I think Bayesian model averaging (https://cran.r-project.org/web/packages/BMA/BMA.pdf) may be the best option here. From what I have read online, it also seems to select the same variables as LASSO.

    Thanks again. I may try random forest modeling; I don't have a ton of experience with it, but it certainly is cool.
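
    For intuition on how Bayesian model averaging can yield a short variable list, here is a minimal sketch of the BIC approximation to posterior model probabilities, which, as far as I know, is the approach behind the bic.* functions in the linked BMA package. It is written in plain Python rather than R, assumes equal prior probability for every candidate model, and uses made-up models and BIC values; the function and variable names are hypothetical.

```python
import math


def bic_model_weights(bics):
    """Approximate posterior model probabilities from BIC values,
    assuming every candidate model has the same prior probability."""
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def inclusion_probabilities(models, weights):
    """Sum the weights of all models that contain each variable."""
    probs = {}
    for variables, weight in zip(models, weights):
        for name in variables:
            probs[name] = probs.get(name, 0.0) + weight
    return probs


# Hypothetical candidate models and BIC values, made up for illustration.
models = [("age", "creatinine"), ("age",), ("age", "creatinine", "diabetes")]
bics = [1000.0, 1002.0, 1004.0]

weights = bic_model_weights(bics)
inclusion = inclusion_probabilities(models, weights)
for name, prob in sorted(inclusion.items(), key=lambda item: -item[1]):
    print(f"{name}: {prob:.2f}")
```

    Variables with a high posterior inclusion probability are the natural candidates for a parsimonious risk score, while the averaged predictions still reflect uncertainty about which model is right.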

  7. #6
    hlsmith

    Re: best modeling approach?


    So you found a BMA model for survival? Which program is it, and what does it compare models on, accuracy?

    Thanks for the update!
