Log-linear model with zero entries

#1
Hi,

I have a log-linear model with 4 variables (3 dichotomous and 1 ordinal). Some entries (let's say 5 of 2*2*2*4=32) are zero. These are random (sampling) zeros: they are zero because no counts happened to be observed, not because the cell is impossible (no structural zeros). I would like to know how to cope with zero entries in a log-linear model.

Additionally: is there a function in R (maybe glm) with an option to cope with zero entries?

Thanks for any hint.

giordano
 
#3
Thanks for the reply. Yes, it works, but I think that R always works. So, to make clear what I want to know, I set up a simple example. I have a 2x2 contingency table and one cell has no observations. The reason is not structural but stochastic: another sample from the same population would fill all cells.

Here is the code in R:

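# 2x2 table without the empty cell (x1=0, x2=1)
xdf0 <- data.frame(x1=c(0,1,1),x2=c(0,0,1),n=c(52,202,418))
(xdf1 <- rbind(xdf0,c(0,1,0)))   # add the empty cell with count 0
(xdf1$n2<-xdf1$n+0.5)            # counts + 0.5
xdf1$n3<-xdf1$n+1                # counts + 1
xdf1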


> xdf1
  x1 x2   n    n2  n3
1  0  0  52  52.5  53
2  1  0 202 202.5 203
3  1  1 418 418.5 419
4  0  1   0   0.5   1


--------------------------------------------
Case 1: without row 4, column n
--------------------------------------------
summary(glm(n~x1*x2, data=xdf0, family=poisson))
Output:
**************
Coefficients: (1 not defined because of singularities)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.95124    0.13867  28.493   <2e-16 ***
x1           1.35702    0.15550   8.727   <2e-16 ***
x2           0.72721    0.08569   8.487   <2e-16 ***
x1:x2             NA         NA      NA       NA

Null deviance: 3.2788e+02 on 2 degrees of freedom
Residual deviance: -5.1514e-14 on 0 degrees of freedom
AIC: 26.813
**************
=> This makes sense.


--------------------------------------------
Case 2: with row 4 and value 0 (column n)
--------------------------------------------
summary(glm(n~x1*x2, data=xdf1, family=poisson))
Output:
**************
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.9512     0.1387  28.493   <2e-16 ***
x1            1.3570     0.1555   8.727   <2e-16 ***
x2          -26.2538 42247.1657  -0.001        1
x1:x2        26.9810 42247.1657   0.001        1
---
Null deviance: 7.1452e+02 on 3 degrees of freedom
Residual deviance: 4.1226e-10 on 0 degrees of freedom
AIC: 28.813
**************
=> This makes sense too. The estimate for x1 is the same as in case 1. x2 and x1:x2 are .....
What is the name for this kind of estimate with huge standard errors? Singularities?

--------------------------------------------
Case 3: adding 0.5 to n (column n2)
--------------------------------------------

summary(glm(n2~x1*x2, data=xdf1, family=poisson))
Output:
**************
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.9608     0.1380  28.699  < 2e-16 ***
x1            1.3499     0.1549   8.716  < 2e-16 ***
x2           -4.6540     1.4209  -3.275 0.001056 **
x1:x2         5.3799     1.4235   3.779 0.000157 ***
---

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 7.0763e+02 on 3 degrees of freedom
Residual deviance: 9.3259e-15 on 0 degrees of freedom
AIC: Inf

Warning messages:
1: In dpois(y, mu, log = TRUE) : non-integer x = 52.500000
2: In dpois(y, mu, log = TRUE) : non-integer x = 202.500000
3: In dpois(y, mu, log = TRUE) : non-integer x = 418.500000
4: In dpois(y, mu, log = TRUE) : non-integer x = 0.500000
**************

=> All estimates make sense. But the AIC is Inf (why?) and there are warning messages, which is not nice.
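
A minimal sketch of where the Inf could come from, assuming (as the warnings above suggest) that the AIC is built from `dpois()` log-densities: `dpois()` gives probability 0, i.e. log-density -Inf, to a non-integer count, so -2 times the log-likelihood is infinite. The coefficient estimates still come out finite, as in the output above.

```r
# Same data as case 3: counts with 0.5 added to every cell
xdf1 <- data.frame(x1 = c(0, 1, 1, 0), x2 = c(0, 0, 1, 1),
                   n2 = c(52.5, 202.5, 418.5, 0.5))

# Log-density of a non-integer "count" under a Poisson distribution
ll_half <- suppressWarnings(dpois(0.5, lambda = 0.5, log = TRUE))
ll_half   # -Inf

# The fit converges; only the likelihood-based AIC breaks
fit <- suppressWarnings(glm(n2 ~ x1 * x2, data = xdf1, family = poisson))
coef(fit)
AIC(fit)  # Inf
```

So the Inf says nothing about the quality of the fit; it only reflects that the shifted counts are no longer valid Poisson observations.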

--------------------------------------------
Case 4: adding 1.0 to n (column n3)
--------------------------------------------
summary(glm(n3~x1*x2, data=xdf1, family=poisson))
Output:
**************
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.9703     0.1374  28.904  < 2e-16 ***
x1            1.3429     0.1543   8.706  < 2e-16 ***
x2           -3.9703     1.0094  -3.933 8.38e-05 ***
x1:x2         4.6950     1.0130   4.635 3.57e-06 ***
---

Null deviance: 7.0213e+02 on 3 degrees of freedom
Residual deviance: 2.2204e-15 on 0 degrees of freedom
AIC: 30.839
****************
=> This looks nice. But what is the pitfall?

Despite the warnings and AIC = Inf, I would choose case 3 as the model for this data.
Does anybody have experience with that?
giordano
 

Masteras

TS Contributor
#4
Wow, I did not read everything, but one case was enough. First of all, fit Y ~ x1 + x2 only, with no interaction. Then, since it is a simple 2x2 table, why not do a chi-square test (its value will be close to the deviance of the glm I just described, which is the likelihood-ratio statistic) and see?
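
To try this suggestion on the 2x2 data from post #3, a minimal sketch could fit the independence model and put its residual deviance next to Pearson's statistic. The deviance is the likelihood-ratio statistic G^2, which is usually close to Pearson's X^2 but not identical to it. The object names `tab`, `fit_ind`, `G2` and `X2` are only illustrative.

```r
tab <- data.frame(x1 = c(0, 1, 1, 0), x2 = c(0, 0, 1, 1),
                  n = c(52, 202, 418, 0))

# Independence model: main effects only, no interaction
fit_ind <- glm(n ~ x1 + x2, data = tab, family = poisson)
G2 <- deviance(fit_ind)   # likelihood-ratio statistic on 1 df

# Pearson chi-square on the same table, without continuity correction
X2 <- unname(chisq.test(xtabs(n ~ x1 + x2, data = tab),
                        correct = FALSE)$statistic)

c(G2 = G2, X2 = X2)
pchisq(c(G2, X2), df = 1, lower.tail = FALSE)   # p-values for both statistics
```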
 
#5
The chi-square test would only show me whether there is an association (e.g. if P<0.001). But I would like to compute an association measure (something like the odds ratio, the exponential of the x1:x2 coefficient). This is not possible if there is a zero. I thought a log-linear model could cope with this.
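
As a small sketch of the link between the interaction coefficient and the odds ratio, using the counts from post #3: with the raw counts the cross-product ratio has a zero in the denominator, while with 0.5 added to every cell (column n2) it is finite and its log matches the x1:x2 estimate from case 3. The variable names are only illustrative.

```r
# Cell counts from post #3, named nXY with X = x1 and Y = x2
n00 <- 52; n10 <- 202; n11 <- 418; n01 <- 0

# Cross-product (odds) ratio on the raw counts: zero in the denominator
or_raw <- (n00 * n11) / (n10 * n01)

# Same ratio after adding 0.5 to every cell (column n2)
or_half <- ((n00 + 0.5) * (n11 + 0.5)) / ((n10 + 0.5) * (n01 + 0.5))

or_raw         # Inf
or_half        # 217
log(or_half)   # about 5.38, the x1:x2 estimate in case 3
```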
 

Masteras

TS Contributor
#8
I made a mistake: if you have one zero, you should use Fisher's exact test and not chi-square, for sure. As for the adjustment, I did not find it, but you do something with 0.5.
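
For the 2x2 table from post #3, Fisher's exact test could be run as sketched below. The p-value is not a problem with an empty cell, but the odds ratio estimate that `fisher.test()` reports still sits on the boundary (0 or Inf), so it does not give a finite association measure. The object names are only illustrative.

```r
tab <- data.frame(x1 = c(0, 1, 1, 0), x2 = c(0, 0, 1, 1),
                  n = c(52, 202, 418, 0))

# Cross-tabulate the counts into a 2x2 matrix (rows x1, columns x2)
m <- xtabs(n ~ x1 + x2, data = tab)

ft <- fisher.test(m)
ft$p.value    # exact p-value
ft$estimate   # conditional ML odds ratio, on the boundary here
```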
 
#9
Fisher's exact test is OK for a simple contingency table (2 dimensions). But what if you have three or more variables, which need to be handled by a log-linear model? Maybe, as you suggest, I should add 0.5.
 

Masteras

TS Contributor
#10
In that case you can do an approximate Fisher's test, as done in SPSS. It is a simulation technique. For log-linear models, R does the adjustment with the function glm.
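
In R, a simulation-based version of Fisher's test is available for two-way tables larger than 2x2 through the `simulate.p.value` argument of `fisher.test()`. The sketch below uses hypothetical 2x4 counts (not data from this thread), and it does not cover tables with three or more variables.

```r
set.seed(1)  # make the Monte Carlo p-value reproducible

# Hypothetical 2x4 counts with two sampling zeros (illustration only)
tab24 <- matrix(c(12, 0, 7, 3,
                   5, 9, 0, 4), nrow = 2, byrow = TRUE)

# Monte Carlo version of Fisher's exact test with B simulated tables
ft_sim <- fisher.test(tab24, simulate.p.value = TRUE, B = 10000)
ft_sim$p.value
```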