missing data in regression model

Hi guys,

I'm fitting a regression model using the glm function, but I noticed that one level of my factor (Agriculture in Landuse) is always missing from the model output.

I attached the data I used and below is the code I use.
My goal is to explain how well the combination of Dry and Landuse (which has three levels: Agriculture, Pastoral and Protected) explains y (the distribution of a certain animal).

I followed a similar example in the Crawley R book, so I know that the model should give me a result for the different types of Landuse and for the combination of Landuse and Dry, but as you can see, both LanduseAgriculture and Dry:LanduseAgriculture are missing!

I read somewhere that Agriculture might be absorbed in the intercept parameter but this doesn't solve my problem because now I don't have a p-value for two parameters.

Does anyone know why I don't get a result for Agriculture and Dry:Agriculture, and what I can do about it?


> y<-cbind(WD_Y, WD_N)
> pWD<-split(WD,Landuse)
> pDry<-split(Dry,Landuse)
> model<-glm(y~Dry*Landuse, binomial)
> summary(model)

glm(formula = y ~ Dry * Landuse, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max
-3.5847 -1.2198 -0.9366 0.3011 5.6034

                      Estimate Std. Error   z value Pr(>|z|)
(Intercept)          -21.57864 1964.72039    -0.011    0.991
Dry                    0.07732 9686.42731  7.98e-06    1.000
LandusePastoral       17.29671 1964.72039     0.009    0.993
LanduseProtected      20.44246 1964.72039     0.010    0.992
Dry:LandusePastoral    0.78610 9686.42732  8.12e-05    1.000
Dry:LanduseProtected  -4.46714 9686.42733 -4.61e-04    1.000

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1001.17 on 244 degrees of freedom
Residual deviance: 566.87 on 239 degrees of freedom
AIC: 833.96


Ambassador to the humans
I already explained why the parameters aren't there. Why exactly do you want to test those specific parameters? What question are you trying to answer by testing them? That's the more relevant issue, and if the effect you want to test is estimable then we can figure it out.
Because I want to know the effect of each type of land use (Agriculture, Pastoral and Protected) on y (the distribution of a certain animal), and I also want to know the effect of each land use type in combination with Dry (which tells me something about the greenness of the vegetation).

So it is not very useful that both of these parameters are now grouped together in the intercept...

Mike White

TS Contributor
Could it be that Agriculture is not significant in the model, so it is not included in the output? The number of records for Agriculture is lower than for the other Landuse types, and the box plots and variance of Dry ~ Landuse show that Agriculture has a much lower variance for Dry than the other Landuse types.

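table(Landuse)   # number of records per land use (command inferred from the output)
#Agriculture    Pastoral   Protected 
#         36         160          49 

lapply(pDry, var)   # variance of Dry within each land use, in factor-level order
#Agriculture: [1] 0.005513301
#Pastoral:    [1] 0.08484931
#Protected:   [1] 0.05503401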


Ambassador to the humans
You still have access to any of the means you should care about. It's just that the Agriculture and Dry:Agriculture parameters don't need to be estimated, because that would be redundant. To estimate a mean at a given level of Dry, fill in the following parameters and plug in the level of Dry you care about for x:

To estimate the mean for Agriculture at a given level of dry: \( \mu + Dry*x \)

To estimate the mean for Pastoral at a given level of dry: \( \mu + Pastoral + x*(Dry + Dry:pastoral) \)

To estimate the mean for Protected at a given level of dry: \( \mu + Protected + x*(Dry + Dry:protected) \)

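where \( \mu \) is the intercept. Because this is a binomial model with the default logit link, these expressions are on the log-odds scale; apply the inverse logit to turn them into estimated proportions.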
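
As a rough sketch, the three expressions above can be evaluated in R using the coefficients copied from the summary output in the question. The helper name `linear_predictor` and the value `x = 0.5` are assumptions made only for illustration. `plogis()` is the inverse logit, so it converts the log-odds into estimated proportions.

```r
# Coefficients copied from the summary(model) output in the question.
b <- c("(Intercept)"          = -21.57864,
       "Dry"                  =   0.07732,
       "LandusePastoral"      =  17.29671,
       "LanduseProtected"     =  20.44246,
       "Dry:LandusePastoral"  =   0.78610,
       "Dry:LanduseProtected" =  -4.46714)

# Hypothetical helper: linear predictor (logit scale) for one land use at Dry = x.
linear_predictor <- function(landuse, x) {
  # Agriculture is the reference level: intercept plus the Dry slope only.
  eta <- b[["(Intercept)"]] + b[["Dry"]] * x
  if (landuse != "Agriculture") {
    # Other levels add their own offset and their own slope adjustment.
    eta <- eta + b[[paste0("Landuse", landuse)]] +
      b[[paste0("Dry:Landuse", landuse)]] * x
  }
  eta
}

x <- 0.5  # example value of Dry, chosen arbitrarily
eta <- sapply(c("Agriculture", "Pastoral", "Protected"), linear_predictor, x = x)
eta          # log-odds for each land use at Dry = x
plogis(eta)  # the same values as estimated proportions
```

With a fitted model object, `predict(model, newdata = ..., type = "response")` should give the same quantities without copying coefficients by hand.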

Note that the pastoral and protected parameters aren't necessarily what you might think they are. What the Pastoral parameter represents is the estimated difference between the agriculture mean and pastoral mean when Dry=0.
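
A small sketch on simulated data (not the poster's data; the column names `yes`, `no` and `Landuse2` and the generating values are made up for illustration) may help show both points: the Pastoral coefficient is the Pastoral-minus-Agriculture difference in log-odds at Dry = 0, and choosing another reference level with `relevel()` only reparameterises the model, so Agriculture gets its own rows while the fit stays the same.

```r
set.seed(1)  # simulated example data
d <- data.frame(
  Landuse = factor(rep(c("Agriculture", "Pastoral", "Protected"), each = 40)),
  Dry     = runif(120)
)
# Hypothetical true log-odds, used only to generate example counts.
eta_true <- c(-1, 0, 1)[as.integer(d$Landuse)] + 0.5 * d$Dry
d$yes <- rbinom(120, size = 10, prob = plogis(eta_true))
d$no  <- 10 - d$yes

# Default treatment contrasts: Agriculture (first level) is the reference.
m1 <- glm(cbind(yes, no) ~ Dry * Landuse, family = binomial, data = d)

# Predicted log-odds for Agriculture and Pastoral at Dry = 0.
nd <- data.frame(
  Landuse = factor(c("Agriculture", "Pastoral"), levels = levels(d$Landuse)),
  Dry     = 0
)
p0 <- predict(m1, newdata = nd, type = "link")
diff_at_zero <- unname(p0[2] - p0[1])
diff_at_zero                      # difference in log-odds at Dry = 0
coef(m1)[["LandusePastoral"]]     # the Pastoral coefficient

# Refit with Protected as the reference level instead.
d$Landuse2 <- relevel(d$Landuse, ref = "Protected")
m2 <- glm(cbind(yes, no) ~ Dry * Landuse2, family = binomial, data = d)
names(coef(m2))                   # now includes Agriculture terms
all.equal(fitted(m1), fitted(m2), tolerance = 1e-6)
```

If the question is whether Landuse or the Dry:Landuse interaction matters overall, a likelihood-ratio comparison such as `anova(m1, test = "Chisq")` tests each term as a whole rather than one level at a time.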