Chi-squared test


New Member
I think I'm right in thinking I should use a chi-squared contingency table test here because the data is nominal, but it seems to over-simplify my results and doesn't really test my hypothesis.

Basically, for my final-year biology project I artificially pollinated some flowers under three different conditions. I want to find out if there is a significant difference between the frequencies of successfully fertilized flowers under the different conditions. The null hypothesis is that there is no difference at all; a 100% success rate in all flowers.

My observed value table looks like this:
                 Successful    Unsuccessful
Condition A          11             37
Condition B          44              4
Condition C          45              3

So I guess my expected value table would be:
                 Successful    Unsuccessful
Condition A          48              0
Condition B          48              0
Condition C          48              0

But I read somewhere that expected values can never be less than 1?

So I used the normal way of calculating expected values (as Minitab does) and I got a significant result...
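For reference, the "normal way" takes the expected values from the row and column totals rather than assuming 100% success: expected = row total x column total / grand total. A minimal sketch in Python, assuming the first column of the observed table is the successful count (the variable names are only for illustration, and this is not necessarily exactly what Minitab does internally):

```python
import math

# Observed counts: (successful, unsuccessful) for each condition.
# Assumption: the first column of the observed table is "successful".
observed = {"A": (11, 37), "B": (44, 4), "C": (45, 3)}

row_totals = {k: sum(v) for k, v in observed.items()}
col_totals = [sum(v[j] for v in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

# Expected count if every condition had the same underlying success rate.
expected = {
    k: tuple(row_totals[k] * col_totals[j] / grand_total for j in range(2))
    for k in observed
}

# Pearson chi-squared statistic, summed over all six cells.
chi2 = sum(
    (observed[k][j] - expected[k][j]) ** 2 / expected[k][j]
    for k in observed
    for j in range(2)
)

# Degrees of freedom = (rows - 1) * (columns - 1) = 2.
# For exactly 2 degrees of freedom the upper-tail probability is exp(-chi2 / 2).
df = (len(observed) - 1) * (2 - 1)
p_value = math.exp(-chi2 / 2)

print(expected["A"])
print(round(chi2, 2), df, p_value)
```

Calculated this way, every expected count is well above 1 (about 33.3 successful and 14.7 unsuccessful in each row), so the rule about small expected values is not an issue for the overall test.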

but clearly the frequencies in condition B and condition C are NOT significantly different from each other, and the overall result is being swayed by condition A.

So (yup, there's more!) I was advised to break the test down and do A vs. B, B vs. C, and A vs. C, but I'm aware that this is a pretty crude way to carry out the test. Plus I'm still not sure it's really gonna tell me what I want to know...

Is there some way I could find out if the frequencies are significantly different from each other?

ANY help would be much appreciated!



New Member

I would definitely agree with your intuition.

The chi-square test only tells you whether there is evidence that the 3 distributions are not all the same; it does not tell you which ones differ. And your insight clearly came to the right conclusion: conditions B and C are not different, but condition A is significantly different from conditions B and C. The percentages successful, by the way, are 23%, 92%, and 94%. Summarizing it this way makes it a little clearer just how different the 3 groups are (although the sample size is lost when you summarize it this way).
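A quick sketch of that summary in Python, keeping the sample size next to each percentage (the counts come from the observed table in the first post, assuming its first column is the successful count; the names are only illustrative):

```python
# (successful, unsuccessful) counts per condition, from the observed table.
observed = {"A": (11, 37), "B": (44, 4), "C": (45, 3)}

summary = {}
for condition, (success, failure) in observed.items():
    n = success + failure
    # Store the rounded percentage together with the sample size.
    summary[condition] = (round(100 * success / n), n)
    print(f"Condition {condition}: {summary[condition][0]}% successful (n = {n})")
```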

When you write your report it would be good to state the significance level of the difference, i.e. the p-value from the chi-square test.

It is almost just academic (oops, I forgot, you are in the academy :D ) to do any more analysis, but some sort of multiple comparison could be done: Tukey's test, etc. Do this with a statistics software package or look up the formulas in a statistics textbook. You could do the pairwise analysis you are thinking about, but (that's why there is statistics) you won't know the exact significance level without using a method such as Tukey's. The concept is: if you do enough comparisons, you are bound to find some that look significant when they really are not.

So if you want to be picking roses every day, use method B or C, but if you are a little lazier, use method A :) .


New Member
Thanks very much for the reply,

I've looked into the Tukey test, but I always thought it had to be carried out on means? My data is in frequencies of successful and unsuccessful pollination, so there isn't a mean...

I've now tried the pairwise analysis using the chi-squared tests I mentioned before. I understand what you're saying about how using three tests is going to increase the probability of getting a significant result... Could I overcome this by only declaring a result significant if the P value is, say, <0.01 rather than the usual <0.05?
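The stricter-threshold idea is usually called a Bonferroni correction: divide the usual 0.05 by the number of comparisons (3 here, giving about 0.0167) and only call a pair significant if its P value is below that. A rough sketch in Python, assuming a plain Pearson chi-squared test on each 2x2 table with no continuity correction (so the P values may differ a little from what a package like Minitab reports); the function name is made up for illustration:

```python
import math
from itertools import combinations

# (successful, unsuccessful) counts per condition, from the observed table.
observed = {"A": (11, 37), "B": (44, 4), "C": (45, 3)}

def chi2_2x2(row1, row2):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table."""
    a, b = row1
    c, d = row2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

pairs = list(combinations(observed, 2))  # A-B, A-C, B-C
alpha = 0.05 / len(pairs)  # Bonferroni-adjusted threshold

results = {}
for x, y in pairs:
    chi2, p = chi2_2x2(observed[x], observed[y])
    results[(x, y)] = (chi2, p, p < alpha)
    print(f"{x} vs {y}: chi2 = {chi2:.2f}, p = {p:.3g}, significant: {p < alpha}")
```

For B vs C the expected counts in the unsuccessful column are small (3.5 each), so an exact test such as Fisher's might be a safer choice for that pair.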

I'll be a lot happier once this report is done and dusted!


New Member
Yes, I have now reviewed the Tukey test; you are right, it needs to be done on means.

You said that you have now done the chi-square test on pairs, so you must have seen just how significant A versus B is, for example. I did the calculation and the p-value went off my chart: the probability is less than 0.002, i.e. highly significant (people often treat anything below 0.05 as significantly different). So your approach is very good. Even though you don't know exactly how to adjust the p-values for multiple comparisons, you know that A vs B and A vs C are still going to be highly significant; of course, this seems quite obvious from the data anyway, as I mentioned before. Also, I am sure you will find that B vs C is not significantly different.

By the way, your data seems quite nice; it is so easy to get lousy data and you spend page after page speculating on this and speculating on that; calculating it this way and calculating it that way.

I hope in your paper you are going to offer an explanation of why condition A is so much worse.

Best of success on your paper.