Thanks for the reply. Yes, it works, but I think that R always works. To make clear what I want to know, I have set up a simple example. I have a 2x2 contingency table in which one cell has no observations. The reason is not structural but stochastic: another sample from the same population would fill all cells.
Here is the code in R:
xdf0 <- data.frame(x1=c(0,1,1),x2=c(0,0,1),n=c(52,202,418))
(xdf1 <- rbind(xdf0,c(0,1,0)))
(xdf1$n2<-xdf1$n+0.5)
xdf1$n3<-xdf1$n+1
xdf1
> xdf1
  x1 x2   n    n2  n3
1  0  0  52  52.5  53
2  1  0 202 202.5 203
3  1  1 418 418.5 419
4  0  1   0   0.5   1
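To see the empty cell directly, the long-format data can be cross-tabulated. This is only a small sketch; the object name `tab` is my own and not part of the code above.

```r
# Rebuild the data from the post and view it as a 2x2 table.
xdf0 <- data.frame(x1 = c(0, 1, 1), x2 = c(0, 0, 1), n = c(52, 202, 418))
xdf1 <- rbind(xdf0, c(0, 1, 0))   # add the empty cell x1 = 0, x2 = 1

# 'tab' is a hypothetical helper name: rows are x1, columns are x2.
tab <- xtabs(n ~ x1 + x2, data = xdf1)
print(tab)
```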
--------------------------------------------
Case 1: without row 4, column n
--------------------------------------------
summary(glm(n~x1*x2, data=xdf0, family=poisson))
Output:
**************
Coefficients: (1 not defined because of singularities)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.95124    0.13867  28.493   <2e-16 ***
x1           1.35702    0.15550   8.727   <2e-16 ***
x2           0.72721    0.08569   8.487   <2e-16 ***
x1:x2             NA         NA      NA       NA
Null deviance: 3.2788e+02 on 2 degrees of freedom
Residual deviance: -5.1514e-14 on 0 degrees of freedom
AIC: 26.813
**************
=> This makes sense.
--------------------------------------------
Case 2: with row 4 and value 0 (column n)
--------------------------------------------
summary(glm(n~x1*x2, data=xdf1, family=poisson))
Output:
**************
Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      3.9512     0.1387  28.493   <2e-16 ***
x1               1.3570     0.1555   8.727   <2e-16 ***
x2             -26.2538 42247.1657  -0.001        1
x1:x2           26.9810 42247.1657   0.001        1
---
Null deviance: 7.1452e+02 on 3 degrees of freedom
Residual deviance: 4.1226e-10 on 0 degrees of freedom
AIC: 28.813
**************
=> This makes sense too. The estimate for x1 is the same as in case 1. x2 and x1:x2 are .....
What is the name for this kind of estimate with huge standard errors? Singularities?
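One way to look at case 2 more closely, as a sketch only: the fitted count for the empty cell should be essentially 0, and although x2 and x1:x2 are individually huge, their sum should reproduce the x2 estimate of case 1, i.e. log(418/202). The names `fit2` and `b` are my own.

```r
# Case 2 data: the empty cell is entered with a count of 0.
xdf1 <- data.frame(x1 = c(0, 1, 1, 0),
                   x2 = c(0, 0, 1, 1),
                   n  = c(52, 202, 418, 0))

# suppressWarnings() is only a precaution; the fit itself is the one from case 2.
fit2 <- suppressWarnings(glm(n ~ x1 * x2, data = xdf1, family = poisson))

# Fitted counts: the fourth cell is pushed towards 0, which is why
# the x2 estimate runs off towards minus infinity.
print(fitted(fit2))

# The two unstable coefficients cancel: their sum is well determined.
b <- coef(fit2)
print(b[["x2"]] + b[["x1:x2"]])
print(log(418 / 202))
```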
--------------------------------------------
Case 3: adding 0.5 to n (column n2)
--------------------------------------------
summary(glm(n2~x1*x2, data=xdf1, family=poisson))
Output:
**************
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.9608     0.1380  28.699  < 2e-16 ***
x1            1.3499     0.1549   8.716  < 2e-16 ***
x2           -4.6540     1.4209  -3.275 0.001056 **
x1:x2         5.3799     1.4235   3.779 0.000157 ***
---
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 7.0763e+02 on 3 degrees of freedom
Residual deviance: 9.3259e-15 on 0 degrees of freedom
AIC: Inf
Warning messages:
1: In dpois(y, mu, log = TRUE) : non-integer x = 52.500000
2: In dpois(y, mu, log = TRUE) : non-integer x = 202.500000
3: In dpois(y, mu, log = TRUE) : non-integer x = 418.500000
4: In dpois(y, mu, log = TRUE) : non-integer x = 0.500000
**************
=> All estimates make sense. But the AIC is Inf (why?) and there are warning messages, which is not nice.
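The warnings point at dpois(), so a small check of what dpois() does with a non-integer value may explain the Inf. This is a sketch under the assumption that the poisson family computes its AIC from the dpois() log-density, which is what the warning text suggests; `ll` and `fit3` are my own names.

```r
# dpois() gives probability 0 to a non-integer count, so its log is -Inf
# (plus the "non-integer x" warning, suppressed here).
ll <- suppressWarnings(dpois(0.5, lambda = 1, log = TRUE))
print(ll)

# Case 3 data: 0.5 added to every cell.
xdf1 <- data.frame(x1 = c(0, 1, 1, 0),
                   x2 = c(0, 0, 1, 1),
                   n2 = c(52.5, 202.5, 418.5, 0.5))
fit3 <- suppressWarnings(glm(n2 ~ x1 * x2, data = xdf1, family = poisson))

# The coefficients are unaffected; only the AIC component becomes Inf.
print(coef(fit3))
print(fit3$aic)
```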
--------------------------------------------
Case 4: adding 1.0 to n (column n3)
--------------------------------------------
summary(glm(n3~x1*x2, data=xdf1, family=poisson))
Output:
**************
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.9703     0.1374  28.904  < 2e-16 ***
x1            1.3429     0.1543   8.706  < 2e-16 ***
x2           -3.9703     1.0094  -3.933 8.38e-05 ***
x1:x2         4.6950     1.0130   4.635 3.57e-06 ***
---
Null deviance: 7.0213e+02 on 3 degrees of freedom
Residual deviance: 2.2204e-15 on 0 degrees of freedom
AIC: 30.839
****************
=> This looks nice. But what is the pitfall?
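One thing worth checking here, as a sketch: the model is saturated (0 residual degrees of freedom), so each coefficient is just a difference of log cell counts. The x2 estimate is then log((0 + c) / (52 + c)) for an added constant c, which means it is determined by the constant I choose rather than by the data. The names `shift`, `x2_hat` and `fit4` are my own.

```r
# x2 estimate implied by adding a constant to every cell:
# log(count in cell x1=0,x2=1) - log(count in cell x1=0,x2=0).
shift  <- c(0.5, 1)
x2_hat <- log((0 + shift) / (52 + shift))
print(x2_hat)   # compare with the x2 rows of case 3 and case 4

# Cross-check against the case 4 fit (1 added to every cell).
xdf1 <- data.frame(x1 = c(0, 1, 1, 0),
                   x2 = c(0, 0, 1, 1),
                   n3 = c(53, 203, 419, 1))
fit4 <- glm(n3 ~ x1 * x2, data = xdf1, family = poisson)
print(coef(fit4)[["x2"]])
```

So the difference between -4.6540 (case 3) and -3.9703 (case 4) comes purely from the choice of 0.5 versus 1.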
Despite the warnings and AIC = Inf, I would choose case 3 as the model for these data.
Does anybody have experience with this?
giordano