# Confidence intervals for a binomial distribution

#### mogetons

##### New Member
Hi everyone,
In my thesis data set, I'm trying to sort out which years were poor and which were good based on an assortment of breeding bird metrics. The one I'm currently looking at is clutch size, where I've coded 1- and 2-egg clutches (this species never lays more) as a binary response. I ran a glm that looked like this, in brief:

model1 <- glm(Clsz ~ Year, family = binomial)  # Clsz coded 0/1; Year must be a factor

I then ran Tukey's comparisons from the multcomp package, and asked for confidence intervals:

library(multcomp)
BTWNyrs <- glht(model1, linfct = mcp(Year = "Tukey"))
BTWNyrs_ci <- confint(BTWNyrs)
BTWNyrs_ci
par(mfrow = c(1, 1))
plot(BTWNyrs_ci)

Two problems I'm having. First, though I understand the concept of confidence intervals, I don't understand what this output is showing me. It seems to give some sort of interval for each possible pairing of years.
View attachment 1344
How can I interpret this?

In Sigmaplot, I've made a bar chart with the means of each year, and error bars showing confidence intervals. If this was a normal distribution, these would be ok, right? However, because the data are binomial, I'm assuming these confidence intervals are not correct?
View attachment 1345

Is there a way of generating confidence intervals for each year that I can save and use to create a bar chart like the Sigmaplot one? I'd like this both for comparing non-overlapping CIs to get a sense of what's what, but also for including in my thesis.

Thanks for any help!!
Mog
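For the per-year intervals asked about above, here is a minimal sketch in base R, not a definitive answer. It assumes one row per nest with `Clsz` coded 1 for a 2-egg clutch and 0 for a 1-egg clutch. The data frame `nests`, the years, and the proportions are all made up for illustration. It uses exact (Clopper-Pearson) binomial intervals from `binom.test`, which stay inside 0 to 1, unlike the symmetric normal-theory error bars.

```r
# Hypothetical data: one row per nest, Clsz = 1 for a 2-egg clutch, 0 for a 1-egg clutch.
set.seed(1)
n_per_year <- c(30, 25, 40, 35)
nests <- data.frame(
  Year = factor(rep(2005:2008, times = n_per_year)),
  Clsz = rbinom(sum(n_per_year), size = 1,
                prob = rep(c(0.3, 0.5, 0.7, 0.6), times = n_per_year))
)

# Number of 2-egg clutches and number of nests in each year.
two_egg <- tapply(nests$Clsz, nests$Year, sum)
n_nests <- tapply(nests$Clsz, nests$Year, length)

# Exact 95% interval for each year's proportion (one row per year after t()).
ci <- t(mapply(function(x, n) binom.test(x, n)$conf.int, two_egg, n_nests))

year_ci <- data.frame(
  Year  = names(two_egg),
  prop  = as.vector(two_egg / n_nests),
  lower = ci[, 1],
  upper = ci[, 2]
)
print(year_ci)

# Bar chart with the intervals as error bars.
mid <- barplot(year_ci$prop, names.arg = year_ci$Year, ylim = c(0, 1),
               ylab = "Proportion of 2-egg clutches")
arrows(mid, year_ci$lower, mid, year_ci$upper, angle = 90, code = 3, length = 0.05)
```

The `year_ci` data frame can be saved with `write.csv` and plotted elsewhere. Note that overlapping intervals do not by themselves show that two years are not significantly different, so the pairwise comparisons are still the better tool for that question.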

#### bryangoodrich

##### Probably A Mammal
I'm not familiar with generalized linear models, but you say you are trying to sort out which years were, say, 0 = poor, 1 = good, based on (i.e., dependent on) the clutch size. Shouldn't your response be Year and your independent (predictor) variable be Clsz? Then you want Year ~ Clsz, not what you have listed.

I've only used the TukeyHSD function before, so I don't know what exactly that output is showing. However, the Tukey comparison aims at showing whether there are differences between factor level means among the various pairwise comparisons you can make. For instance, if you have a factor with four levels, 1, 2, 3, and 4, then Tukey will output comparisons for 2-1, 3-1, 4-1, 3-2, 4-2, 4-3. For each of those comparisons it tests whether the two means are significantly different. If the interval does not contain 0, then we can conclude there is a statistically significant difference between those compared means. For instance, if the contrast (difference) between 2 and 1, call it L, has an interval (-3, -1), then 0 is not included. We can say that L is unlikely (at our significance level) to be 0, and the means of 2 and 1 are different. Otherwise, we conclude they are not statistically significantly different. This is sort of like a t-test for the difference of two sample means, but for factor level means. The standard errors and confidence limits will be different. Hope that helps.
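The "does the interval contain 0" check can be sketched with base R's `TukeyHSD`. This is an illustration on simulated, normally distributed data with four hypothetical factor levels, not the clutch-size model from the original post. The names `d`, `level`, and `y` are made up.

```r
set.seed(42)
# Hypothetical response with four factor levels; level 4 is simulated with a higher mean.
d <- data.frame(
  level = factor(rep(1:4, each = 20)),
  y     = rnorm(80, mean = rep(c(10, 10, 10, 13), each = 20), sd = 1)
)

fit <- aov(y ~ level, data = d)
tk  <- TukeyHSD(fit, "level")$level   # matrix with columns diff, lwr, upr, p adj

# A pair differs at the family-wise 5% level when its interval excludes 0.
excludes_zero <- tk[, "lwr"] > 0 | tk[, "upr"] < 0
print(cbind(round(tk, 3), excludes_zero))
```

Each row is one pairwise contrast (2-1, 3-1, and so on), and the `excludes_zero` column flags the same pairs as an adjusted p-value below 0.05. For a binomial glm fitted through `glht`, the same reading applies, except that the differences are on the log-odds scale.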

#### mogetons

##### New Member
Thanks for the reply!

> I'm not familiar with generalized linear models, but you say you are trying to sort out which years were, say, 0 = poor, 1 = good, based on (i.e., dependent on) the clutch size. Shouldn't your response be Year and your independent (predictor) variable be Clsz? Then you want Year ~ Clsz, not what you have listed.

Right you are. That makes a lot more sense.

> I've only used the TukeyHSD function before, so I don't know what exactly that output is showing. However, the Tukey comparison aims at showing whether there are differences between factor level means among the various pairwise comparisons you can make. For instance, if you have a factor with four levels, 1, 2, 3, and 4, then Tukey will output comparisons for 2-1, 3-1, 4-1, 3-2, 4-2, 4-3. For each of those comparisons it tests whether the two means are significantly different. If the interval does not contain 0, then we can conclude there is a statistically significant difference between those compared means. For instance, if the contrast (difference) between 2 and 1, call it L, has an interval (-3, -1), then 0 is not included. We can say that L is unlikely (at our significance level) to be 0, and the means of 2 and 1 are different. Otherwise, we conclude they are not statistically significantly different. This is sort of like a t-test for the difference of two sample means, but for factor level means. The standard errors and confidence limits will be different. Hope that helps.

Yes, it does help, thanks! I knew what the Tukey tests were doing, I just had NO idea what the graphic was telling me... but checking whether an interval includes zero is great to know. It makes it very easy to tell at a glance which pairs are significantly different and which aren't.
Thanks again!