# Thread: Logistic regression logit or cloglog?

1. ## Logistic regression logit or cloglog?

Hi folks,

I'm currently fitting a logistic regression and I'm not sure which link function I should use. I have a binary response variable (Dead/Alive) and ten potential explanatory variables. I fitted a full model with all variables and used a stepwise selection procedure (step in R + drop1) to find out which predictors are significant.
I first used a logit link, because that's the way I learned it. I found two significant predictors, but the overall fit still wasn't good. Then I tried the complementary log-log link, and after model selection there are three significant predictors and the fit is a bit better. Now my question:
Which model should I use? Is it actually valid to use the cloglog link in my case?
And if I should use the cloglog model, how do I interpret the output? Is it the same as in logit models? I couldn't find any detailed interpretation of a cloglog model.

You can find my two final model outputs below.

I would be very pleased if somebody could help me.

Cheers, breez

################Logit##############

glm(formula = Dead ~ Temp + Blood, family = binomial, data = a2f)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7571  -0.8384  -0.5626   0.7597   1.9604

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -4.5723     1.5344  -2.980 0.002884 **
Temp          0.2371     0.1005   2.359 0.018330 *
Blood         1.9598     0.5040   3.888 0.000101 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 176.20 on 136 degrees of freedom
Residual deviance: 148.66 on 134 degrees of freedom
(2 observations deleted due to missingness)
AIC: 154.66

Number of Fisher Scoring iterations: 4

drop1(opt3, test="Chisq")

Single term deletions

Model:
       Df Deviance    AIC     LRT Pr(>Chi)
<none>      148.66 154.66
Temp    1   155.67 159.67  7.0164 0.008077 **
Blood   1   165.79 169.79 17.1359 3.48e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

PseudoRsqu
[1] 15.62881

#############Cloglog#############

Call:
glm(formula = Dead ~ Temp + Blood + pos, family = binomial(link = "cloglog"),
data = a2f)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0203  -0.7112  -0.4736   0.5276   2.2022

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.52764    1.36687  -3.312 0.000925 ***
Temp         0.19745    0.08479   2.329 0.019881 *
Blood        1.33987    0.35672   3.756 0.000173 ***
posA2        0.81838    0.79122   1.034 0.300983
posD         1.03977    0.90905   1.144 0.252704
posM         0.81302    0.62389   1.303 0.192525
posS        -0.19103    0.66549  -0.287 0.774073
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 176.20 on 136 degrees of freedom
Residual deviance: 138.39 on 130 degrees of freedom
(2 observations deleted due to missingness)
AIC: 152.39

Number of Fisher Scoring iterations: 7

drop1(clog2, test="Chisq")

Single term deletions

Model:
Dead ~ Temp + Blood + pos
       Df Deviance    AIC     LRT Pr(>Chi)
<none>      138.39 152.39
Temp    1   145.54 157.54  7.1500 0.007496 **
Blood   1   151.75 163.75 13.3604 0.000257 ***
pos     4   148.21 154.21  9.8119 0.043718 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

PseudoRsqu
[1] 24.05507
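
The two outputs above differ in both the link function and the predictor set (the cloglog model also contains `pos`), so their AIC values (154.66 vs. 152.39) do not isolate the effect of the link. One way to separate the two is to fit the same formula under both links and compare. The following is only a sketch: the data frame `d` is simulated because `a2f` is not available here, the variable names merely mirror the post, and treating `Blood` as a 0/1 variable is an assumption.

```r
# Simulated stand-in for the real data; all values are invented for illustration.
set.seed(1)
n <- 137
d <- data.frame(
  Temp  = rnorm(n, mean = 15, sd = 3),   # hypothetical temperature values
  Blood = rbinom(n, 1, 0.4)              # assumed binary; the real coding is unknown
)
eta <- -4.5 + 0.2 * d$Temp + 1.5 * d$Blood
d$Dead <- rbinom(n, 1, 1 - exp(-exp(eta)))  # response generated under a cloglog model

# Same response and predictors in both fits; only the link differs.
fit_logit   <- glm(Dead ~ Temp + Blood, family = binomial(link = "logit"),   data = d)
fit_cloglog <- glm(Dead ~ Temp + Blood, family = binomial(link = "cloglog"), data = d)

# Both models have the same number of parameters, so the AIC difference
# equals the difference in residual deviance between the two links.
AIC(fit_logit, fit_cloglog)
```

With the real data, replacing `d` by `a2f` and using one common formula for both fits would give a like-for-like comparison. A small AIC difference suggests the data cannot really distinguish the links, in which case the choice can rest on which interpretation (odds ratios or hazard ratios) suits the question.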

2. ## Re: Logistic regression logit or cloglog?

breez86,

I stumbled across this today (see page 2030) and remembered your post. Not sure whether it will help you.

http://www.math.wpi.edu/saspdf/stat/chap39.pdf

4. ## Re: Logistic regression logit or cloglog?

Hi hlsmith,

Thank you very much for the link! It already helps a little.

However, maybe somebody can tell me if I'm correct with my interpretation of β:

For example, the estimate of β for Temp is 0.19745.

Therefore 1-(EXP(-EXP(-0.19745))) = 0.559929871. Does that mean the hazard or relative risk of dying increases by about 56% per unit increase in temperature, if all other variables are held constant?

breez86
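
For reference, a short derivation of what a cloglog coefficient measures, assuming the usual proportional-hazards reading of the link, where −log(1 − p) is treated as a cumulative hazard. The symbols p, η and x are notation introduced here, not taken from the posts.

```latex
% Complementary log-log model with linear predictor \eta
\log\bigl\{-\log(1 - p)\bigr\} = \eta = \beta_0 + \beta_{\mathrm{Temp}}\, x + \dots
\quad\Longrightarrow\quad
-\log(1 - p) = \exp(\eta)

% Effect of a one-unit increase in x, all other terms held fixed
\frac{-\log\bigl(1 - p(x + 1)\bigr)}{-\log\bigl(1 - p(x)\bigr)}
  = \frac{\exp(\eta + \beta_{\mathrm{Temp}})}{\exp(\eta)}
  = \exp(\beta_{\mathrm{Temp}})

% With the estimate from the cloglog output
\exp(0.19745) \approx 1.218
```

Under that reading, exp(β) acts as a hazard ratio: a one-unit increase in Temp multiplies the hazard by about 1.22, an increase of roughly 22%. The expression 1 − exp(−exp(−β)) applies the inverse link to −β alone, which yields a probability at one particular value of the linear predictor and not a per-unit effect.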

5. ## Re: Logistic regression logit or cloglog?

Hi everyone!
I arrived here because I read (Chan, Y.H. (2005). Multinomial logistic regression. Singapore Medical Journal, 46(6), 259-270) that in ordinal regression it is recommended to use the complementary log-log link function when higher categories are more probable. However, the author does not discuss how to interpret the model's results and only says that "there is no direct interpretation of the estimates due to the complicated nature of the link".
Therefore it would be very useful for me to know whether breez86's interpretation is correct.
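
A quick numerical check of the interpretation is possible in base R. The sketch below assumes the binary cloglog model from the first post, where p = 1 − exp(−exp(η)); the baseline value `eta0` is a hypothetical number chosen only for illustration, and `inv_cloglog` is a helper defined here, not an R built-in.

```r
# Inverse of the complementary log-log link: p = 1 - exp(-exp(eta))
inv_cloglog <- function(eta) 1 - exp(-exp(eta))

beta_temp <- 0.19745  # Temp estimate from the cloglog output in the first post
eta0 <- -1            # hypothetical baseline linear predictor (illustration only)

p0 <- inv_cloglog(eta0)              # probability at baseline
p1 <- inv_cloglog(eta0 + beta_temp)  # probability after a one-unit increase in Temp

# Ratio of the cumulative hazards -log(1 - p) before and after the increase
hazard_ratio <- log(1 - p1) / log(1 - p0)

hazard_ratio    # about 1.218
exp(beta_temp)  # same value: exp(beta) is the hazard ratio

# The ratio of the probabilities themselves depends on the baseline,
# so it is not a fixed per-unit effect.
p1 / p0
```

The hazard ratio stays the same whatever value `eta0` takes, while `p1 / p0` changes with it. This suggests that exp(β), not 1 − exp(−exp(−β)), is the quantity to report for a binary cloglog model.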