Interpreting coefficients from GLM output

schwartzaw

Guest
#1
Hi,

I'm trying to educate myself further on using GLM and to do so I figured it'd be easiest to conduct a simple experiment where I knew the outcome so that I could see what it looked like when I ran the GLM.

To do so, I made a simple table up, consisting of a single IV and DV.

Code:
Obs	Prop	Disease
1	B	1
2	A	0
3	B	1
.
.
.
499	B	1
The basic idea is that a person with property "A" has certain odds of having a disease, and a person with property "B" has much higher odds of having it.

You get a table that looks like this from my made up data:

Code:
Row Labels	0	1	Grand Total
A	192	52	244
B	148	107	255
Grand Total	340	159	499
Because of the way I set it up, someone with property "A" has a 21% chance (52/244) of having the imaginary disease, while someone with property "B" has a 42% chance (107/255) of having it, or about 2x.
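A minimal R sketch of that arithmetic, using only the counts in the table above. The object names `tab`, `risk` and `risk_ratio` are illustrative and do not come from the original post.

```r
# Counts taken from the table above (rows: Prop, columns: Disease)
tab <- matrix(c(192,  52,
                148, 107),
              nrow = 2, byrow = TRUE,
              dimnames = list(Prop = c("A", "B"), Disease = c("0", "1")))

# Proportion with the disease within each property group
risk <- tab[, "1"] / rowSums(tab)

# Ratio of the two proportions (B relative to A)
risk_ratio <- unname(risk["B"] / risk["A"])

print(round(risk, 3))
print(round(risk_ratio, 2))
```

The ratio printed here is a ratio of probabilities, which is the "about 2x" built into the made-up data.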

When I run the GLM, this is the result I get:

Code:
Call:
glm(formula = Disease ~ Prop, family = "binomial", data = l)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.0431  -1.0431  -0.6924   1.3179   1.7584  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.3063     0.1563  -8.356  < 2e-16 ***
PropB         0.9819     0.2013   4.876 1.08e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 624.59  on 498  degrees of freedom
Residual deviance: 599.69  on 497  degrees of freedom
AIC: 603.69

Number of Fisher Scoring iterations: 4

> exp(coef(lr))
(Intercept)       PropB 
  0.2708333   2.6694387 
>
Based on what I read online, this means someone with Property B is 2.7x more likely to have the disease than someone with Property A, right? But I know, because I set it up that way, that the effect is 2x, not close to 3x.

So, my question: what am I thinking about wrong here that causes me to misinterpret the coefficient?

Thanks!
 

JesperHP

TS Contributor
#4
\( Pr(Y=1) = \frac{\exp(\beta_0 + \beta_1 x)}{1+\exp(\beta_0 + \beta_1 x)}\)


Code:
b1=-1.3063   # intercept
b2=0.9819    # coefficient of dummy x, which takes value 1 if Prop is B, else 0

x=1
pB=exp(b1+b2*x)/(1+exp(b1+b2*x))   # Pr(Y=1 | B)
x=0
pA=exp(b1+b2*x)/(1+exp(b1+b2*x))   # Pr(Y=1 | A), which is = exp(b1)/(1+exp(b1))

pB/pA
# 1.97 (rounded)
 

hlsmith

Omega Contributor
#5
I get 2.6694387 when calculating it by hand using the contingency table values in the second code box in your first post.

This would mean the exp(beta) and the raw data give the exact same answer.
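
A hedged sketch of that check in R. The data frame `d` is rebuilt from the table counts rather than taken from the original data set `l`, and the names `d`, `fit`, `or_by_hand` and `or_from_glm` are assumptions made for illustration.

```r
# Rebuild one row per person from the table counts (assumed layout:
# 192 A/0, 52 A/1, 148 B/0, 107 B/1)
n <- c(192, 52, 148, 107)
d <- data.frame(
  Prop    = factor(rep(c("A", "A", "B", "B"), times = n)),
  Disease = rep(c(0, 1, 0, 1), times = n)
)

fit <- glm(Disease ~ Prop, family = binomial, data = d)

# Odds ratio by hand: odds of disease in B divided by odds of disease in A
or_by_hand <- (107 / 148) / (52 / 192)

# Odds ratio from the model: exponentiated PropB coefficient
or_from_glm <- unname(exp(coef(fit)["PropB"]))

print(c(by_hand = or_by_hand, from_glm = or_from_glm))
```

With a single two-level predictor the model reproduces the group proportions, so the two numbers should agree up to the convergence tolerance of the fit.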
 

JesperHP

TS Contributor
#6
hlsmith said:
I get 2.6694387 when calculating it by hand using the contingency table values in the second code box in your first post.

This would mean the exp(beta) and the raw data give the exact same answer.

What are you trying to calculate? And how are you doing it? ...


I would do it like this

Code:
Row Labels      0     1     Grand Total
A             192    52     244
B             148   107     255
Grand Total   340   159     499
Centering the columns so the table is more readable (hopefully :) )...

Pr(Y=1|B) = 107/255 = \( \frac{\exp(\beta_0 + \beta_1)}{1+\exp(\beta_0 + \beta_1)} \) ≈ 42%
Pr(Y=1|A) = 52/244 = \( \frac{\exp(\beta_0)}{1+\exp(\beta_0)} \) ≈ 21%

Pr(Y=1|B)/Pr(Y=1|A) ≈ 1.97, or about 2
 

hlsmith

Omega Contributor
#7
I was calculating the odds ratio: 107/148 over 52/192, with A and B being the exposure and 1 being the outcome of interest.

That is the way the model is set up.
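
A short derivation of why the two numbers differ, using the formula from reply #4 with x = 1 for B and x = 0 for A. The shorthand \( p(x) \) for \( Pr(Y=1) \) at a given x is introduced here only for convenience.

```latex
\begin{align*}
p(x) &= \frac{\exp(\beta_0 + \beta_1 x)}{1+\exp(\beta_0 + \beta_1 x)}
  && \text{(probability)} \\
\frac{p(x)}{1-p(x)} &= \exp(\beta_0 + \beta_1 x)
  && \text{(odds)} \\
\frac{p(1)/(1-p(1))}{p(0)/(1-p(0))} &= \frac{\exp(\beta_0 + \beta_1)}{\exp(\beta_0)} = \exp(\beta_1)
  && \text{(odds ratio)} \\
\frac{p(1)}{p(0)} &= \exp(\beta_1)\,\frac{1+\exp(\beta_0)}{1+\exp(\beta_0 + \beta_1)}
  && \text{(ratio of probabilities)}
\end{align*}
```

With the estimates from the output, \( \exp(\beta_1) \approx 2.67 \) is the odds ratio, while the ratio of probabilities works out to about 1.97 because of the extra factor involving \( \beta_0 \).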
 
schwartzaw

Guest
#9
Thanks for all the responses! I think I need an "explain it like I'm 5". JesperHP's result of 2, the ratio of probability as hlsmith called it, is what I was expecting the coefficient to be. Is that not what the coefficient means? Am I misunderstanding the concept of odds ratio perhaps?
 

JesperHP

TS Contributor
#10
schwartzaw said:
Is that not what the coefficient means?

No, that is not what the coefficient means. Look at the formulas I used to calculate the probabilities entering the ratio I understood you to be asking for: they are functions of both coefficients, not of one coefficient by itself. They are also functions of the independent variable. The only reason you cannot see this explicitly in the formulas is that the independent variable takes only the values 0 and 1 (see the formula in reply #4, where x enters explicitly; insert x=0 and x=1 to get the formulas in my reply #6).
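
To illustrate that point, here is a small R sketch that holds the PropB coefficient fixed and changes only the intercept. The helper names `inv_logit` and `prob_ratio` and the alternative intercepts -5 and 1 are hypothetical values chosen for illustration, not estimates from the thread.

```r
# Inverse logit: turns a linear predictor into a probability
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

# Ratio of probabilities Pr(Y=1 | x=1) / Pr(Y=1 | x=0)
prob_ratio <- function(beta0, beta1) {
  inv_logit(beta0 + beta1) / inv_logit(beta0)
}

beta1 <- 0.9819                           # PropB coefficient from the GLM output

pr_fitted <- prob_ratio(-1.3063, beta1)   # intercept from the GLM output
pr_rare   <- prob_ratio(-5, beta1)        # hypothetical: much rarer disease
pr_common <- prob_ratio(1, beta1)         # hypothetical: much more common disease

# The ratio of probabilities moves with the intercept ...
print(round(c(fitted = pr_fitted, rare = pr_rare, common = pr_common), 2))

# ... while the odds ratio exp(beta1) is the same in all three cases
print(round(exp(beta1), 2))
```

When the disease is rare the ratio of probabilities approaches the odds ratio, and when it is common the two drift further apart, which is why the coefficient alone cannot give the "2x".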