Hi there,

I have this (fake) example:

case yes no
a 25 1
b 30 0
c 0 20

I want to compare the percentage of "yes" across the three cases. I fitted a logistic regression (logit link function) in R:
Code: 
> mod=glm(cbind(yes,no)~case,family=binomial)
> summary(mod)

Call:
glm(formula = cbind(yes, no) ~ case, family = binomial)

Deviance Residuals: 
[1]  0  0  0

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept)     3.219      1.020   3.156   0.0016 **
caseb          22.918  52455.363   0.000   0.9997   
casec         -28.971  52998.328  -0.001   0.9996   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 8.1118e+01  on 2  degrees of freedom
Residual deviance: 5.2935e-10  on 0  degrees of freedom
AIC: 7.961

Number of Fisher Scoring iterations: 22
Of course, since some of the observed proportions are exactly 0.0 or 1.0, the (absolute values of the) corresponding logits are huge, which explains the estimated parameters and especially their SEs.
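To make this point concrete, the empirical logits can be computed directly from the table. This is only a sketch: the data frame `d` and the names `p` and `emp_logit` are introduced here for illustration and are not part of the session above.

```r
# Data from the (fake) example; `d` is an illustrative name
d <- data.frame(case = factor(c("a", "b", "c")),
                yes  = c(25, 30, 0),
                no   = c(1, 0, 20))

# Observed proportion of "yes" in each case
p <- d$yes / (d$yes + d$no)

# Empirical logit, log(p / (1 - p)); qlogis() is the logit function in base R
emp_logit <- qlogis(p)
names(emp_logit) <- d$case

# Case a gives log(25/1), matching the intercept; b is Inf and c is -Inf
print(emp_logit)
```

The fitting algorithm cannot reach an infinite estimate, so it stops at a large finite value where the likelihood is almost flat, and the Wald standard errors blow up accordingly.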

Then, unsurprisingly, if I run a multiple comparison procedure (e.g., using the function glht() of the R package multcomp), I get this:

Code: 
> summary(glht(mod, linfct = mcp(case = "Tukey")))

         Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: glm(formula = cbind(yes, no) ~ case, family = binomial)

Linear Hypotheses:
           Estimate Std. Error z value Pr(>|z|)
b - a == 0    22.92   52455.36   0.000        1
c - a == 0   -28.97   52998.33  -0.001        1
c - b == 0   -51.89   74568.01  -0.001        1
(Adjusted p values reported -- single-step method)
And there is no significant difference at all. However, the likelihood ratio test of the global model is 81.1 on 2 df and is highly significant. So there is a problem here (cases "b" and "c" cannot be anything but different!). The huge SEs of the parameters are certainly responsible for this.
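For reference, the contrast between the Wald tests and the likelihood ratio test can be reproduced from the deviances alone. The following is a rough sketch rather than a recommended procedure: the helper `lr_test()` and the data frame `d` are hypothetical names, and the pairwise b versus c comparison is simply the same likelihood ratio test refitted on those two rows, with no multiplicity adjustment.

```r
# Data from the (fake) example; `d` is an illustrative name
d <- data.frame(case = factor(c("a", "b", "c")),
                yes  = c(25, 30, 0),
                no   = c(1, 0, 20))

# Hypothetical helper: likelihood ratio test of `case` against an
# intercept-only model, taken from the null and residual deviances.
# glm() may warn that fitted probabilities of 0 or 1 occurred.
lr_test <- function(dat) {
  fit  <- glm(cbind(yes, no) ~ case, family = binomial, data = dat)
  stat <- fit$null.deviance - fit$deviance      # LR chi-square statistic
  df   <- fit$df.null - fit$df.residual         # number of parameters tested
  c(stat = stat, df = df, p = pchisq(stat, df, lower.tail = FALSE))
}

# Global test over the three cases (the 81.1 on 2 df quoted above)
global <- lr_test(d)

# Same test restricted to cases b and c (unadjusted)
b_vs_c <- lr_test(droplevels(subset(d, case %in% c("b", "c"))))

print(rbind(global, b_vs_c))
```

Unlike the Wald z values, these statistics do not rely on the standard errors, which is why they are not affected by the flat likelihood at the boundary.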

How can I solve this?

More generally, the question is: how can I compare percentages with a logistic regression when some cases are at p=0.0 and/or p=1.0?

Thanks in advance for any help on this.

Cheers, Eric.