Thread: glm for risk ratio, endless loop

  1. #1
    Points: 29, Level: 1
    Level completed: 58%, Points required for next Level: 21

    Posts
    4
    Thanks
    5
    Thanked 0 Times in 0 Posts

    glm for risk ratio, endless loop



    Hi all,

    Thanks in advance for helping me with my problem.

    Can someone please let me know how I can stop my GLM command from stalling?

    The following command works great:
    glm hasdied age sex hypertension smoker, fam(bin) link(log) nolog eform

    It produces a risk ratio. However, whenever I add one more variable, it loops endlessly and never returns a value:
    glm hasdied age sex hypertension smoker apachescore, fam(bin) link(log) nolog eform

    Does anyone know why and how I can fix it? None of the variables have any missing data.

    Cheers,

    Pete

  2. #2
    TS Contributor
    Points: 4,629, Level: 43
    Level completed: 40%, Points required for next Level: 121
    jpkelley's Avatar
    Location
    Vancouver, BC, Canada
    Posts
    439
    Thanks
    17
    Thanked 88 Times in 83 Posts

    Re: glm for risk ratio, endless loop

    Sorry, I'm an R user. Is this SAS or STATA code for the GLM?

    **Edit: I'm sorry to everyone here for apologizing for being an R user. I have no regrets about my affair with R, that wicked mistress.
    Last edited by jpkelley; 10-28-2012 at 08:16 PM. Reason: regret

  3. The Following User Says Thank You to jpkelley For This Useful Post:

    petedigital (10-30-2012)

  4. #3

    Re: glm for risk ratio, endless loop

    Thanks for the reply. I'm using STATA. I'd be happy to try it in R if you could show me the commands.

    Quote Originally Posted by jpkelley View Post
    Sorry, I'm an R user. Is this SAS or STATA code for the GLM?

    **Edit: I'm sorry to everyone here for apologizing for being an R user. I have no regrets about my affair with R, that wicked mistress.

  5. #4
    Banned
    Points: 3,519, Level: 37
    Level completed: 13%, Points required for next Level: 131
    GretaGarbo's Avatar
    Posts
    419
    Thanks
    128
    Thanked 139 Times in 122 Posts

    Re: glm for risk ratio, endless loop

    Are you using a log link? For a binomial distribution, the standard link is the logit link. Another common link is the probit link.

    I also wondered whether there could be multicollinearity. Check whether apachescore is linearly dependent on the other variables. (What is that, by the way?)

    Try running it as a linear regression model. That is inappropriate for a binary outcome, but the estimates are computed in one step, with no iterations.

  6. The Following User Says Thank You to GretaGarbo For This Useful Post:

    petedigital (10-30-2012)

  7. #5

    Re: glm for risk ratio, endless loop

    Thanks so much for the reply, GretaGarbo.

    I'm using a log link as it's the only one that gives a risk ratio (all the others seem to give an odds ratio). I'd really like to keep the risk ratio. Incidentally, it DOES work (both with and without apachescore) with the logit link. Why would that be?

    To check for collinearity, I've output a correlation table.

                 |     hasdied         age        male    hypertsn      smoker apachescore
    -------------+------------------------------------------------------------------------
         hasdied |      1.0000
             age |      0.0245      1.0000
            male |     -0.0077      0.0315      1.0000
      hyperten~n |      0.0195      0.3789      0.0254      1.0000
          smoker |     -0.0269     -0.1831      0.0406     -0.0888      1.0000
     apachescore |      0.3728      0.3089     -0.0236      0.0753     -0.0799      1.0000

    It doesn't look like there is any huge association between the dependent variable (hasdied) and any of the independent variables. Also, the correlations are reflected in the risk ratio table (i.e. being male is associated with slightly fewer hasdied events, even if not statistically significant).

    APACHE is a score that represents how sick a patient is. It is made up of many variables, one of which is age.

    I've tried a vanilla regression (regress hasdied age male hypertension smoker apachescore, beta). It works fine.

    I still don't understand why it works without apachescore, but as soon as I add it, it endlessly loops. Why is it doing it and how can I get it to work?
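
    One possible explanation, offered here as an assumption rather than a diagnosis of this dataset: with a log link the fitted risk is exp(xb), and nothing keeps that below 1, whereas the inverse of the logit link always lies strictly between 0 and 1. A strong predictor with a wide range, such as a severity score, can push exp(xb) towards or past 1 for the sickest patients, where the binomial likelihood is not defined, so the iterations may never settle. The Python sketch below uses made-up linear-predictor values purely to contrast the two inverse links.

```python
import math

def inverse_log_link(eta):
    """Fitted risk under a log link: exp(eta). Nothing keeps it below 1."""
    return math.exp(eta)

def inverse_logit_link(eta):
    """Fitted risk under a logit link: always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical linear-predictor values (not estimated from the thread's data).
etas = [-3.0, -1.0, -0.1, 0.2, 1.0]

for eta in etas:
    p_log = inverse_log_link(eta)
    p_logit = inverse_logit_link(eta)
    status = "not a valid probability" if p_log >= 1.0 else "ok"
    print(f"eta={eta:5.1f}  log link: {p_log:6.3f} ({status})  logit link: {p_logit:6.3f}")
```

    If this is what is happening, it would also be consistent with the logit model converging on the same variables, since the logit link cannot produce a fitted risk outside (0, 1).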

  8. #6
    Test of Gnomality
    Points: 8,289, Level: 61
    Level completed: 47%, Points required for next Level: 161
    hlsmith's Avatar
    Posts
    1,512
    Thanks
    98
    Thanked 255 Times in 248 Posts

    Re: glm for risk ratio, endless loop

    Use the basic linear regression, but request the VIF or tolerance to check for collinearity. I remember reading that "it's quite possible to have data in which no pair of variables has a high correlation, but several variables together may be highly interdependent", which makes correlation matrices an unreliable check [book source: Logistic Regression Using SAS].
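
    For reference, both diagnostics come from the R2 obtained by regressing one predictor on all the other predictors: tolerance = 1 - R2, and VIF = 1 / tolerance. A minimal Python sketch of that arithmetic follows; the R2 values are illustrative only and are not taken from this dataset.

```python
def tolerance(r_squared):
    """Tolerance of a predictor, given the R^2 from regressing it on the other predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must be in [0, 1); R^2 = 1 means exact collinearity")
    return 1.0 - r_squared

def vif(r_squared):
    """Variance inflation factor: the reciprocal of the tolerance."""
    return 1.0 / tolerance(r_squared)

# Illustrative R^2 values only.
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(f"R^2={r2:.2f}  tolerance={tolerance(r2):.2f}  VIF={vif(r2):.1f}")
```

    Because the auxiliary regression uses all the other predictors at once, it can pick up joint dependence that a pairwise correlation table cannot show.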

  9. #7
    hlsmith

    Re: glm for risk ratio, endless loop

    Also, since it seems you may be using retrospective healthcare data, I thought I should check with you: with retrospective data you use odds ratios (since you cannot calculate incidence), and with prospective data you use relative risk. You may already know this, but from your description it seemed the data could be retrospective.
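
    To make the two measures concrete, here is a small Python sketch that computes both from a 2x2 table. The counts are hypothetical. A risk ratio needs the risk in each exposure group, which is why it cannot be estimated directly when cases and controls are sampled in fixed numbers (a case-control design); the odds ratio does not have that limitation.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 table.

    a: exposed, event      b: exposed, no event
    c: unexposed, event    d: unexposed, no event
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)

# Hypothetical counts: 30/100 exposed and 10/100 unexposed patients had the event.
a, b, c, d = 30, 70, 10, 90
print(f"risk ratio = {risk_ratio(a, b, c, d):.2f}")
print(f"odds ratio = {odds_ratio(a, b, c, d):.2f}")
```

    With an outcome this common, the odds ratio is noticeably further from 1 than the risk ratio; the two only agree closely when the outcome is rare.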

  10. The Following User Says Thank You to hlsmith For This Useful Post:

    petedigital (10-31-2012)

  11. #8
    GretaGarbo

    Re: glm for risk ratio, endless loop

    Try to build a model that fits the data. Your model is supposed to describe reality. If it fits, it might describe reality. If it does not fit, it has nothing to do with reality.

    Check with various measures of goodness-of-fit. I don't know much about Stata. Suggestions from other users would be welcome.

    There can be multicollinearity even when the variables are pairwise uncorrelated (just as hlsmith said above). Remember the “dummy variable trap”, where each group is coded as its own dummy variable and the dummies together add up to the intercept column. If apachescore is constructed from some of the other explanatory variables, then it will to some extent be collinear with them. hlsmith's suggestion above is better, but a very crude check is to run a regression with apachescore as the dependent variable and the others as explanatory variables. If the R2 is high, there is multicollinearity.
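
    The dummy variable trap mentioned above can be illustrated with a short Python sketch on hypothetical data: three group dummies whose pairwise correlations are only -0.5, yet which always sum to 1 and are therefore exactly collinear with an intercept.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: 12 observations in three equally sized groups.
groups = [0, 1, 2] * 4
d0 = [1 if g == 0 else 0 for g in groups]
d1 = [1 if g == 1 else 0 for g in groups]
d2 = [1 if g == 2 else 0 for g in groups]

# No pairwise correlation is anywhere near +1 or -1.
print("corr(d0, d1) =", round(pearson(d0, d1), 3))
print("corr(d0, d2) =", round(pearson(d0, d2), 3))
print("corr(d1, d2) =", round(pearson(d1, d2), 3))

# Yet the three dummies add up to 1 in every row, i.e. to the intercept column.
row_sums = [x + y + z for x, y, z in zip(d0, d1, d2)]
print("row sums:", row_sums)
```

    A correlation table would pass these variables, while a regression of any one dummy on the other two (plus a constant) would fit perfectly.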

    (About “apachescore”: Isn't Apache an indian people – a nation - in southern USA and Mexico? I realise that (I looked it up) that it is an abbreviation of some kind of disease. But if the disease had been the name of your country? “He is really sick his canada-score/america-score/mexico-score is 3.2”. Wouldn't that be an insult? Do the Apaches like the name of that disease? Sorry, I don't mean to be impolite but I am just trying to friendly tell you my reaction as an outsider to that name. )

    If the model does not even converge there is probably a severe problem with that model.

    If the logit model, log(p/(1-p)), fits, then maybe you can derive log(p) from it, since log(p/(1-p)) = log(p) - log(1-p), by inserting the model estimates into that formula. It would be interesting to hear others' suggestions about what is meaningful and customary in this case.
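
    One way to read this suggestion, sketched in Python under stated assumptions: take the logit model's linear predictor, turn it back into a predicted risk p, and form the ratio of two predicted risks. The coefficients and the variable names (an exposure indicator and a severity score) are hypothetical placeholders, not estimates from this thread.

```python
import math

def predicted_risk(intercept, coefs, x):
    """Inverse-logit of the linear predictor: the model's predicted risk."""
    eta = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical logit coefficients for (exposed, severity_score); not from the thread.
INTERCEPT = -4.0
COEFS = [0.7, 0.1]

def exposure_risk_ratio(score):
    """Ratio of predicted risks, exposed vs unexposed, at a given severity score."""
    p_unexposed = predicted_risk(INTERCEPT, COEFS, [0, score])
    p_exposed = predicted_risk(INTERCEPT, COEFS, [1, score])
    return p_exposed / p_unexposed

odds_ratio = math.exp(COEFS[0])  # the same at every score
for score in (10, 30):
    print(f"score={score}: OR={odds_ratio:.2f}, RR={exposure_risk_ratio(score):.2f}")
```

    Note that the risk ratio obtained this way changes with the values chosen for the other covariates, whereas the odds ratio from the logit model does not, so any reported figure needs to say which covariate values it refers to.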

    To use a model that does not fit, just because the software spits out something is not so meaningful. It is more meaningful to use something that fits and derive something more complicated.

  12. The Following User Says Thank You to GretaGarbo For This Useful Post:

    petedigital (10-31-2012)

  13. #9
    jpkelley

    Re: glm for risk ratio, endless loop

    To use a model that does not fit, just because the software spits out something is not so meaningful.
    This is definitely true. It's good that somebody said this explicitly. An all-important first step is to understand (and resign yourself to) the bounds of your data. The logit or probit link functions are definitely what you want. I would avoid playing with other models unless the sole purpose is to diagnose some problem specific to the program you're using. In any case, don't worry about the odds ratio; you can easily convert the odds ratio to a risk ratio for easier interpretation. I would go this route.

    I would suggest the R route, but some people have a difficult time handling their new R-based lifestyle.

  14. The Following User Says Thank You to jpkelley For This Useful Post:

    petedigital (10-31-2012)

  15. #10

    Re: glm for risk ratio, endless loop


    Thanks so much for all the help so far. I feel I'm getting a better understanding of everything.

    Re: collinearity. I don't think there is any significant collinearity. I've checked the correlation matrix and there are no high correlations (all < 0.8). I've also checked the tolerance and variance inflation factor. All VIFs are ~1 (i.e. definitely < 10). See output below:

    . regress hasdied age male hypertension smoker apachescore, beta

    ------------------------------------------------------------------
           hasdied |      Coef.  Std. Err.       t   P>|t|        Beta
    ---------------+--------------------------------------------------
               age |  -.0019705   .0002108   -9.35   0.000   -.1150722
              male |   .0029216   .0060244    0.48   0.628    .0052142
    phhypertension |   .0167842   .0062177    2.70   0.007    .0313368
          phsmoker |  -.0091626   .0076796   -1.19   0.233   -.0130405
       apachescore |   .0151896   .0004238   35.85   0.000     .405101
             _cons |  -.0622508   .0132751   -4.69   0.000           .
    ------------------------------------------------------------------

    . vif

        Variable |       VIF       1/VIF
    -------------+----------------------
             age |      1.32    0.759395
      hyperten~n |      1.17    0.853888
     apachescore |      1.11    0.900976
          smoker |      1.04    0.963262
            male |      1.00    0.995442
    -------------+----------------------
        Mean VIF |      1.13


    Re: fit of model. I think it's a good fit as it makes sense clinically plus the Hosmer and Lemeshow = ~0.9 and R2 = ~0.25. AUROC of the model = 0.77. I also tried using apachescore as my dependent and all the rest as independent. The R2 remained similar, i.e. ~0.25.

    Re: risk ratios. Yes, I agree - odds ratios should be fine for a retrospective study. Unfortunately, my supervisor wants risk ratios.

    Re: APACHE score. Yes, you're right. I'd never thought of that before but I suppose some might be offended - although it's an acronym and nothing to do with Native American Indians. I didn't make it up. It's a standard scoring system for hospitals all over the world. Perhaps I could call it, sickness score?

    I'm still struggling with:
    - Can I rule out collinearity given my results above?
    - Why can't I get the model to work with apachescore? (Even more puzzling is when I was experimenting, the following works: "glm hasdied apachescore age, fam(bin) link(log) nolog eform" but the following does NOT "glm hasdied apachescore age sex, fam(bin) link(log) nolog eform")
    - Does this suggest something wrong with my model, even though it seems to fit well? Or does it only point to a problem with adding apache score as a variable since it works well without it?
    - Why does logit converge without a problem, while the failure occurs only when I try to get risk ratios using the log link?
    - Should I just give up, use logit, then convert those odds ratios? How do I convert?
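
    On the last question, one commonly cited approximation (Zhang and Yu, JAMA 1998) converts an odds ratio to a risk ratio using the risk in the unexposed (reference) group, p0: RR = OR / (1 - p0 + p0 * OR). A hedged Python sketch follows; the input values are illustrative, not taken from this thread. The formula is exact for a simple 2x2 table, but it is only approximate when applied to covariate-adjusted odds ratios, so treat the result as a rough guide rather than a substitute for a directly estimated risk ratio.

```python
def odds_ratio_to_risk_ratio(odds_ratio, baseline_risk):
    """Approximate risk ratio from an odds ratio and the risk in the unexposed group (p0).

    Uses RR = OR / (1 - p0 + p0 * OR).
    """
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)

# Illustrative values: the same odds ratio at a rare and at a common baseline risk.
for p0 in (0.01, 0.20, 0.50):
    rr = odds_ratio_to_risk_ratio(2.0, p0)
    print(f"OR=2.00, baseline risk={p0:.2f} -> approximate RR={rr:.2f}")
```

    The output shows why the conversion matters more for common outcomes: at a low baseline risk the risk ratio is close to the odds ratio, and the gap widens as the baseline risk grows.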
