log linear model with zero entries



giordano
11-22-2010, 12:08 PM
Hi,

I have a log-linear model with 4 variables (3 dichotomous and 1 ordinal). Some entries (let's say 5 of 2*2*2*4=32) are zero. These are random (sampling) zeros: they are zero because no counts happened to fall in those cells, not because the cells are structurally impossible. I would like to know how to cope with zero entries in a log-linear model.

Additionally: is there a function in R (maybe glm) with an option to cope with zero entries?

Thanks for any hint.

giordano

Masteras
11-23-2010, 05:42 AM
yes, it handles zeros. I just played with it.

giordano
11-23-2010, 10:29 AM
Thanks for the reply. Yes, it runs, but I think R always returns something. To make clear what I want to know, I set up a simple example. I have a 2x2 contingency table and one cell has no counts. The reason is not structural but stochastic: another sample from the same population would fill all cells.

Here is the code in R:

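# Three observed cells of the 2x2 table (x1, x2) with their counts n
xdf0 <- data.frame(x1=c(0,1,1),x2=c(0,0,1),n=c(52,202,418))
# Add the fourth cell (x1=0, x2=1) with a count of zero
(xdf1 <- rbind(xdf0,c(0,1,0)))
# Counts with 0.5 (n2) and with 1 (n3) added to every cell
(xdf1$n2<-xdf1$n+0.5)
xdf1$n3<-xdf1$n+1
xdf1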


> xdf1
  x1 x2   n    n2  n3
1  0  0  52  52.5  53
2  1  0 202 202.5 203
3  1  1 418 418.5 419
4  0  1   0   0.5   1


--------------------------------------------
Case 1: without row 4, column n
--------------------------------------------
summary(glm(n~x1*x2, data=xdf0, family=poisson))
Output:
**************
Coefficients: (1 not defined because of singularities)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.95124    0.13867  28.493   <2e-16 ***
x1           1.35702    0.15550   8.727   <2e-16 ***
x2           0.72721    0.08569   8.487   <2e-16 ***
x1:x2             NA         NA      NA       NA

Null deviance: 3.2788e+02 on 2 degrees of freedom
Residual deviance: -5.1514e-14 on 0 degrees of freedom
AIC: 26.813
**************
=> This makes sense.


--------------------------------------------
Case 2: with row 4 and value 0 (column n)
--------------------------------------------
summary(glm(n~x1*x2, data=xdf1, family=poisson))
Output:
**************
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)     3.9512     0.1387  28.493   <2e-16 ***
x1              1.3570     0.1555   8.727   <2e-16 ***
x2            -26.2538 42247.1657  -0.001        1
x1:x2          26.9810 42247.1657   0.001        1
---
Null deviance: 7.1452e+02 on 3 degrees of freedom
Residual deviance: 4.1226e-10 on 0 degrees of freedom
AIC: 28.813
**************
=> This makes sense too. The estimate for x1 is the same as in case 1. x2 and x1:x2 are .....
What is the name for this kind of estimate with huge standard errors? Singularities?
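
Note on Case 2: with a log link, the fitted mean for the empty cell can always be pushed a little closer to zero, so the likelihood has no finite maximum for x2 and x1:x2. This is usually described as the maximum likelihood estimate not existing (an infinite or boundary estimate), and the huge standard errors reflect the flat likelihood. What glm prints is simply where the iteration stopped. A minimal sketch, assuming the same four cells as above; the iteration caps 5, 10 and 25 are arbitrary choices for illustration:

```r
# The 2x2 table from the post, including the sampling zero at (x1=0, x2=1)
xdf1 <- data.frame(x1 = c(0, 1, 1, 0),
                   x2 = c(0, 0, 1, 1),
                   n  = c(52, 202, 418, 0))

# Refit the saturated model with a given cap on the number of IRLS iterations
# and return the x2 coefficient (non-convergence warnings are silenced)
x2_after <- function(maxit) {
  fit <- suppressWarnings(
    glm(n ~ x1 * x2, data = xdf1, family = poisson,
        control = glm.control(maxit = maxit))
  )
  unname(coef(fit)["x2"])
}

# The estimate keeps moving towards -Inf as more iterations are allowed
drift <- sapply(c(5, 10, 25), x2_after)
print(drift)
```

The value -26.25 in the output above is therefore a stopping point of the algorithm, not an estimate of anything.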

--------------------------------------------
Case 3: adding 0.5 to n (column n2)
--------------------------------------------

summary(glm(n2~x1*x2, data=xdf1, family=poisson))
Output:
**************
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.9608     0.1380  28.699  < 2e-16 ***
x1            1.3499     0.1549   8.716  < 2e-16 ***
x2           -4.6540     1.4209  -3.275 0.001056 **
x1:x2         5.3799     1.4235   3.779 0.000157 ***
---

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 7.0763e+02 on 3 degrees of freedom
Residual deviance: 9.3259e-15 on 0 degrees of freedom
AIC: Inf

Warning messages:
1: In dpois(y, mu, log = TRUE) : non-integer x = 52.500000
2: In dpois(y, mu, log = TRUE) : non-integer x = 202.500000
3: In dpois(y, mu, log = TRUE) : non-integer x = 418.500000
4: In dpois(y, mu, log = TRUE) : non-integer x = 0.500000
**************

=> All estimates make sense. But AIC is Inf (why?) and there are warning messages, which is not nice.
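
Note on the AIC in Case 3: the poisson family computes its AIC from dpois(), and dpois() assigns probability zero (log-density -Inf) to a non-integer count, which gives both the warnings and AIC = Inf. The sketch below shows this, and also tries the quasipoisson family as one possible way to get the same coefficients without the warnings. Using quasipoisson here is a suggestion of this note, not something proposed in the thread, and it reports no AIC at all.

```r
# dpois() treats a non-integer count as impossible: log-density is -Inf
ll_half  <- suppressWarnings(dpois(0.5, lambda = 1, log = TRUE))
ll_whole <- dpois(1, lambda = 1, log = TRUE)
print(c(ll_half, ll_whole))

# Counts from the post with 0.5 added to every cell
xdf1 <- data.frame(x1 = c(0, 1, 1, 0),
                   x2 = c(0, 0, 1, 1),
                   n2 = c(52, 202, 418, 0) + 0.5)

# Poisson fit: warnings about non-integer counts, AIC is Inf
fit_p  <- suppressWarnings(glm(n2 ~ x1 * x2, data = xdf1, family = poisson))
# Quasi-Poisson fit: same estimating equations, AIC is reported as NA
fit_qp <- glm(n2 ~ x1 * x2, data = xdf1, family = quasipoisson)

print(c(poisson = fit_p$aic, quasipoisson = fit_qp$aic))
print(cbind(poisson = coef(fit_p), quasipoisson = coef(fit_qp)))
```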

--------------------------------------------
Case 4: adding 1.0 to n (column n3)
--------------------------------------------
summary(glm(n3~x1*x2, data=xdf1, family=poisson))
Output:
**************
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.9703     0.1374  28.904  < 2e-16 ***
x1            1.3429     0.1543   8.706  < 2e-16 ***
x2           -3.9703     1.0094  -3.933 8.38e-05 ***
x1:x2         4.6950     1.0130   4.635 3.57e-06 ***
---

Null deviance: 7.0213e+02 on 3 degrees of freedom
Residual deviance: 2.2204e-15 on 0 degrees of freedom
AIC: 30.839
****************
=> This looks nice. But what is the pitfall?
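
Note on the pitfall: in a saturated 2x2 model the x1:x2 coefficient is just the log odds ratio of the (adjusted) cell counts, so with an empty cell the answer is driven almost entirely by the constant that was added. A small sketch using the counts above; the constant 0.1 is an extra, arbitrary value added here only for comparison:

```r
# Cell counts from the post, named n<x1><x2>
n00 <- 52; n10 <- 202; n11 <- 418; n01 <- 0

# Log odds ratio after adding a constant k to every cell
log_or <- function(k) {
  log((n00 + k) * (n11 + k) / ((n01 + k) * (n10 + k)))
}

k <- c(0.1, 0.5, 1)
res <- data.frame(k = k, log_or = log_or(k), or = exp(log_or(k)))
print(res)
```

For k = 0.5 and k = 1 this reproduces the x1:x2 estimates of Case 3 and Case 4 (5.3799 and 4.6950), which correspond to odds ratios of roughly 217 and 109. The data alone only say that the odds ratio is large.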

Despite the warnings and AIC = Inf, I would choose case 3 as the model for this data.
Does anybody have experience with that?
giordano

Masteras
11-23-2010, 10:53 AM
Wow, I did not read it all, but one case was enough. First of all: Y~x1+x2 only, no interaction. Then, since it is a simple 2x2 table, why not do a chi-square test (which tests the same hypothesis as the deviance of the glm I just described) and see?
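
A sketch of that comparison, assuming the table from the earlier post. The deviance of the main-effects model is the likelihood-ratio statistic G^2, while chisq.test reports the Pearson statistic X^2. The two test the same independence hypothesis but are not numerically identical; the Pearson statistic can be recovered from the glm through its Pearson residuals.

```r
xdf1 <- data.frame(x1 = c(0, 1, 1, 0),
                   x2 = c(0, 0, 1, 1),
                   n  = c(52, 202, 418, 0))

# Independence model: main effects only, no interaction
fit_ind <- glm(n ~ x1 + x2, data = xdf1, family = poisson)

g2     <- deviance(fit_ind)                             # likelihood-ratio G^2
x2_glm <- sum(residuals(fit_ind, type = "pearson")^2)   # Pearson X^2 from the glm

# Pearson chi-square test on the same table, without continuity correction
tab     <- xtabs(n ~ x1 + x2, data = xdf1)
x2_test <- unname(chisq.test(tab, correct = FALSE)$statistic)

print(c(G2 = g2, X2_glm = x2_glm, X2_test = x2_test))
```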

giordano
11-23-2010, 11:42 AM
The chi-square test would show me whether there is an association (if P<0.001). But I would like to compute an association measure (something like the odds ratio, the exponential of the x1:x2 coefficient). This is not possible if there is a zero. I thought a log-linear model could cope with this.

Masteras
11-23-2010, 02:52 PM
You can still use the odds ratio. You make an adjustment by adding 0.5 somewhere in the formula; I do not remember where right now. Search Google for that.
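
The adjustment being recalled here is most likely the one that adds 0.5 to every cell before forming the odds ratio, often called the Haldane-Anscombe correction. A minimal sketch with the counts from this thread; the Wald interval on the log scale is a common companion to it, but it is an assumption of this sketch rather than something stated above:

```r
# 2x2 counts from the thread, named n<x1><x2>, with 0.5 added to every cell
adj <- c(n00 = 52, n01 = 0, n10 = 202, n11 = 418) + 0.5

# Adjusted odds ratio and the usual standard error of its logarithm
or_adj <- unname(adj["n00"] * adj["n11"] / (adj["n01"] * adj["n10"]))
se_log <- sqrt(sum(1 / adj))

# Wald 95% interval, computed on the log scale and transformed back
ci <- exp(log(or_adj) + c(-1, 1) * qnorm(0.975) * se_log)

print(c(or = or_adj, log_or = log(or_adj), se = se_log))
print(ci)
```

The log odds ratio and its standard error agree with the x1:x2 row of Case 3 (5.3799 and 1.4235), so fitting the saturated Poisson model to n + 0.5 and applying this formula are the same calculation.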

giordano
11-24-2010, 01:07 AM
Thanks for the reply. I did it in case 3 (+0.5) and case 4 (+1). Somehow, I feel uncomfortable.

Masteras
11-24-2010, 06:14 AM
I made a mistake: if you have a zero, you use Fisher's exact test and not chi-square. For sure. As for the adjustment, I did not find it, but you do something with 0.5.
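
A sketch of Fisher's exact test on the same table, using fisher.test from base R. It needs no added constant: for a 2x2 table it returns an exact p-value, a conditional maximum likelihood estimate of the odds ratio, and an exact confidence interval. With an empty cell the point estimate and the upper limit are infinite, but the lower limit is still informative.

```r
# Rows: x1 = 0, 1; columns: x2 = 0, 1 (counts from the thread, filled by column)
tab <- matrix(c(52, 202, 0, 418), nrow = 2,
              dimnames = list(x1 = c("0", "1"), x2 = c("0", "1")))

ft <- fisher.test(tab)
print(ft$p.value)    # exact two-sided p-value
print(ft$estimate)   # conditional ML estimate of the odds ratio
print(ft$conf.int)   # exact 95% confidence interval
```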

giordano
11-24-2010, 06:58 PM
Fisher's exact test is OK for a simple contingency table (2 dimensions). But what if you have three or more variables, which need to be handled by a log-linear model? Maybe, as you suggest, I should add 0.5.
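
For tables with more than two variables, a sampling zero is only a problem for models that try to reproduce the empty cell exactly. A model without the highest-order interaction fits the lower-order margins, and if those are all positive the estimates can stay finite without adding anything. A minimal sketch on a 2x2x2 table; the counts and the variable names a, b, c are invented for illustration, and whether finite estimates exist in a real table depends on the pattern of zeros and on the model:

```r
# Hypothetical 2x2x2 counts (invented) with one sampling zero at a=1, b=1, c=0
d3 <- expand.grid(a = 0:1, b = 0:1, c = 0:1)
d3$n <- c(12, 7, 9, 0, 15, 11, 8, 6)

# All two-way interactions, no three-way term
fit2 <- glm(n ~ (a + b + c)^2, data = d3, family = poisson)

print(round(coef(summary(fit2)), 3))   # finite estimates and standard errors
print(round(fitted(fit2), 2))          # fitted count for the empty cell is positive
```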

Masteras
11-25-2010, 03:45 AM
In that case you can do an approximate Fisher's test. It is done in SPSS. It is a simulation technique. For log-linear models R does the adjustment with the function glm.