Chi square test of independence - test of difference or association?

#1
Hi everyone,

I understand chi square tests of independence, and have read the FAQ on chi square tests. What confuses me somewhat is the use of chi square in some instances.

I have always understood a chi square test of independence as being a test of an association -- whether that association is a significant one or not. Or in other words, whether the two variables are found to be independent (i.e. not associated) or dependent (i.e. associated).

And I have always understood that one- and two-sample hypothesis testing involves testing a difference, using either a Z-test or a t-test (depending on the sample size and whether sigma is known). That is, we are testing whether there is a statistically significant difference between a sample value and a population value (one-sample test), or whether there is a statistically significant difference between two sample values (two-sample test).

My confusion arises in that some sources seem to suggest that Chi square tests of independence can allow us to test for a significant difference. For example, let's say we're looking at the following two variables: gender (male or female) and voting preference (democrat or republican). And we want to know if there is a statistically significant difference between the number of women vs. men who vote democrat. I would think you would do a two-sample hypothesis test with sample proportions (males as one sample, females as a second sample), and test for a significant difference in the proportion of men vs women who vote democrat.

But some sources I've read seem to suggest a Chi square test could be done, and we could have a 2x2 bivariate table which includes gender (male or female) and voting preference (democrat or republican). But doesn't chi square test for a significant association between the two variables, not for a difference? Or can we conclude that anyway: that a significant association suggests a difference between males and females? I know that chi square tests for the difference between observed vs. expected frequencies - an indirect test of the association between the variables.

Wouldn't it be more apt to do a two-sample hypothesis test? In particular, if we were specifying a direction, and wanted to know if women are more likely to vote democrat, then we would have to do a two-sample hypothesis test, no?

Textbooks I read characterize the chi square test of independence as a test for an association.

I hope I am making some sense. If anyone can help, I would greatly appreciate it.

Thanks,
Frodo
 

gianmarco

TS Contributor
#2
Hello,
I think you got the picture right, and that you are just missing a nuance of the same issue.
As you correctly state, the Chi-sq test allows you to formally assess whether there is a significant association between two categorical variables, say gender and party (to keep with your example). If the test returns a significant p-value, you can say that the two variables are not independent from one another, AND THEREFORE party preference is distributed differently across gender (i.e., among male and female voters). I see no 'contradiction'.

Hope this helps
Gm
 
#3
gianmarco said:
Hello,
I think you got the picture right, and that you are just missing a nuance of the same issue.
As you correctly state, the Chi-sq test allows you to formally assess whether there is a significant association between two categorical variables, say gender and party (to keep with your example). If the test returns a significant p-value, you can say that the two variables are not independent from one another, AND THEREFORE party preference is distributed differently across gender (i.e., among male and female voters). I see no 'contradiction'.

Hope this helps
Gm
Hi Gm,

That helps considerably, thank you. Would it be correct to say, however, that simply doing a two-sample hypothesis test with proportions is the more appropriate statistical test for answering the original question of whether there is a difference in the number of women vs. men whose voting preference is democrat?

In particular, if the question specified a direction to the test, and wanted to know if women were MORE likely to vote democrat (or if men were LESS likely to vote democrat), it would seem to me that we have no choice but to do a two-sample hypothesis test.

Would it also be safe to argue that, since Chi square is only an indirect test of association (as it is testing for the difference between expected versus observed frequencies), a two-sample hypothesis test is a more direct, and thus more appropriate, test in this case?

Thanks again for your help. I appreciate it.

Best,
Frodo
 

Karabiner

TS Contributor
#4
Frodo said:
In particular, if the question specified a direction to the test, and wanted to know if women were MORE likely to vote democrat (or if men were LESS likely to vote democrat), it would seem to me that we have no choice but to do a two-sample hypothesis test.
What do you mean by this? A 2x2 table with a Chi² test is a two-sample test.
To know the direction of the effect, you simply inspect the proportion of democrat voters within the female group, compared to the proportion within the male group.

With kind regards

K.
 
#5
Karabiner said:
What do you mean by this? A 2x2 table with a Chi² test is a two-sample test. To know the direction of the effect, you simply inspect the proportion of democrat voters within the female group, compared to the proportion within the male group.

With kind regards

K.
Hi K.,

Thanks for your answer. What I meant by my response was that in doing a two-sample hypothesis test, we can actually specify a direction within the test itself. We do this by placing all of our critical region on one side of the sampling distribution. Our alternative hypothesis, rather than simply stating that there is a difference in the proportion of men vs. women who support democrats, would state that we expect the population of women to have a higher proportion of democrat voters. This is what I have learned as a one-tailed test.

From my understanding, we couldn't do this with Chi Square, and as you mention, we would have to instead inspect the proportions or percentages of democrat voters within the female group compared to the male group.

Am I wrong in this? I could still be misunderstanding the purpose and utility of Chi Square. Thanks for your help.

Best,
Frodo
 

gianmarco

TS Contributor
#6
Hello,
I still do not totally get your concern with the chi-sq test.

What the chi-sq test actually tells you is whether or not there is a significant association between two cross-tabulated categorical variables. So, to keep with your example, our research question would be: is there any association between GENDER and VOTING for a given political party? Of course, and by extension, if there is a significant dependence, being in one of the two levels of GENDER (e.g., being male) implies tending to be in one of the two voting categories (e.g., tending to vote republican). This of course implies that, should a dependence exist, the proportion of males among republican voters would not be the same as the proportion of males among democratic voters.

That said, when you analyze a contingency table you may want:
1) to assess if a dependency exists
2) to measure the size of that dependency
3) to understand the "direction" of the association between levels of the two categorical variables being compared.

(1) is accomplished via the chi-sq test, which does not tell you how "strong" the dependence is (i.e., the "correlation" between the two variables);
(2) is accomplished using different association coefficients; there are a number of them available, depending on the size of the table and on other considerations;
(3) is accomplished via different approaches: one could be comparing percentages (which seems to be the method you would prefer); yet another one (which, in my opinion, better fits the logic of the chi-square test) is analyzing the table of standardized residuals. The residual (for each cell of the table) is the difference between the observed count and the count you would expect under the hypothesis of independence. The residuals are standardized in order to have mean 0 and SD 1. A residual whose absolute value is larger than 1.96 indicates that that cell significantly deviates from the Null Hypothesis. The sign accompanying each residual indicates the direction of that "deviation": let's assume that the standardized residual for the cross-tabulation of MALE vs REPUBLICAN is +2.00; this would indicate that there is a "positive" association between MALE voters and the REPUBLICAN party, that is, there is a larger-than-expected frequency of males among republican voters. In other words, males tend to vote for the republican party more frequently. In that situation, you may find that the standardized residual for FEMALE is -2.10, indicating that females tend to vote for republicans less frequently.
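To make (3) concrete, here is a small sketch in Python (with made-up gender-by-party counts, using numpy and scipy; treat it as an illustration, not the only way to do it):
Code:
import numpy as np
from scipy.stats import chi2_contingency

#                 democrat  republican
table = np.array([[320, 180],    # female
                  [260, 240]])   # male

chi2, p, dof, expected = chi2_contingency(table, correction=False)

# Standardized (Pearson) residuals: (observed - expected) / sqrt(expected)
std_resid = (table - expected) / np.sqrt(expected)
print(std_resid)

# Adjusted standardized residuals (what many packages report); cells whose
# absolute value exceeds roughly 1.96 deviate notably from independence
n = table.sum()
row = table.sum(axis=1, keepdims=True)
col = table.sum(axis=0, keepdims=True)
adj_resid = (table - expected) / np.sqrt(expected * (1 - row / n) * (1 - col / n))
print(adj_resid)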


Hope this helps.

You may want to give a read to a nice (old) little book:
Reynolds, "Analysis of Nominal Data", SAGE University Paper 7, 1984
 
#7
Hi Gianmarco,

Thank you very much for your post and for taking the time to respond and to write out your explanation, it is greatly appreciated.

I suppose I should start by saying that I'm pretty sure I understand what you are saying: that if we are looking at the association between two variables, such as GENDER and VOTING in this case, we might want to know (1) whether the association exists, (2) the strength (or "size", as you say) of the association, and (3) the "direction" of the association. However, I would add that, since we're dealing with nominal-level variables, we can't really speak about "direction", only pattern, as we can't rank or order the scores or categories of nominal-level variables.

I also understand that Chi Square would only test the association for significance - that is, whether it exists in the population. I know there are different options for testing strength (the Phi coefficient for a 2x2 table in this case, or Lambda). And I thank you for explaining some other options, aside from comparing percentages, for looking at the pattern in the data - this was helpful.

My concern from my last post specifically pertained to the original question, which asks us to test for a statistically significant "difference" in the proportion of males vs. females who vote democrat (or republican, or whatever we may be interested in). If that is the question we are trying to answer, why bother with a Chi Square test? Why not just do a two-sample hypothesis test (to test the significance of the difference between the two sample proportions)? Isn't that a whole lot easier, and doesn't it answer our question more directly?

The test statistic for our significance test would be, for large samples, Z(obtained) = (Ps1 - Ps2) / (standard deviation of the sampling distribution of the difference in sample proportions)

with

Ps1 = the sample proportion for men
Ps2 = the sample proportion for women

Then, if the test statistic falls into the critical region (or we obtain a significant p-value), we can say that there is a statistically significant difference.
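For instance, with made-up counts (and using statsmodels in Python, which I believe implements this same pooled z test), the calculation would look something like this:
Code:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

dem_votes = np.array([320, 260])   # democrat voters: women, men (made-up numbers)
n_obs = np.array([500, 500])       # sample sizes: women, men

# Two-sided test of H0: the two population proportions are equal
z, p = proportions_ztest(dem_votes, n_obs)
print(z, p)

# One-sided (one-tailed) test of H1: the proportion for women is larger
z1, p1 = proportions_ztest(dem_votes, n_obs, alternative='larger')
print(z1, p1)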

If I'm still not being clear enough, I am truly sorry. I thank you all for your help and patience.

Best,
Frodo
 

gianmarco

TS Contributor
#8
Hello,
I think we are spiralling around the same issue over and over again.
Bottom line:
the test of proportions and the chi-sq test actually address two different questions. Pick whichever is more suitable to your research question.

By the way:
However, I would add that, since we're dealing with nominal-level variables, we can't really speak about "direction", only pattern, as we can't rank or order the scores or categories of nominal-level variables.
I was referring to the direction of the difference between observed and expected counts, in the context of standardized residuals. To my mind, positive vs. negative values do indicate a difference in "direction".


Best
Gm
 
#9
gianmarco said:
Hello,
I think we are spiralling around the same issue over and over again.
Bottom line:
the test of proportions and the chi-sq test actually address two different questions. Pick whichever is more suitable to your research question.
Hi Gm,

This is precisely what I was trying to get at -- that the two-sample test of proportions is the test, in my mind, more suitable for the research question (i.e. wanting to know if there is a statistically significant difference between males vs. females who vote democrat [or republican]).

Because, as you mentioned, a Chi Square test of independence addresses a different question. I have always thought (perhaps incorrectly) that the Chi Square test is one that tests for whether there is a significant association (i.e. dependence) between two variables (NOT whether there is a significant difference between two samples).

This brings me back to my original post/question that started this thread -- confusion around what Chi Square actually tests.

The wonderful help in this thread seems to suggest that Chi Square and the two-sample test of proportions would be interchangeable in this case, and so I'm still left somewhat confused. Maybe someone would be so kind as to briefly make the difference plainly clear.

For example, earlier you mentioned:

I think you got the picture right, and that you are just missing a nuance of the same issue.
As you correctly state, the Chi-sq test allows you to formally assess whether there is a significant association between two categorical variables, say gender and party (to keep with your example). If the test returns a significant p-value, you can say that the two variables are not independent from one another, AND THEREFORE party preference is distributed differently across gender (i.e., among male and female voters). I see no 'contradiction'.
Does this mean, then, that 1) Chi square could be used here to test for a significant difference in party preference for men vs. women, but that 2) a test of difference in proportions would be more suitable?

One day I will fully understand Chi Square. Thank you again for your help, Gm. I am sorry if I have been frustratingly dense. Please know that I truly appreciate it, and wouldn't blame you if you didn't want to help any further.

Best,
Frodo
 
#10
Maybe this is how it is:

If my research question were: "Is there a significant relationship/association between GENDER (male or female) and VOTING PREFERENCE (democrat or republican)?"

... then I would conduct a Chi Square test of independence, to test whether the two variables are statistically significantly associated/related/dependent. But, should a dependence/association exist, this consequently does tell me that the proportion of males voting democrat/republican is not the same as the proportion of females voting democrat/republican, and therefore there is a difference in voting by gender (men and women are significantly different in terms of voting preference). Right?

If my research question were: "Is there a significant difference between the proportion of MALES versus the proportion of FEMALES who vote Democrat [or Republican]?"

... then I would conduct a two-sample test for a difference in proportions, in order to test whether there is a statistically significant difference in the proportion of men vs. women who vote for a particular party. This directly tests for a significant difference (right? Or could I have done a Chi Square test too?)

Does this mean, then, that the two tests are interchangeable (or sometimes interchangeable: interchangeable with the second research question, but not the first)? I'm not sure why this is so hard for me.
 
#11
I think I found some clarity from yet another stats textbook I got my hands on. From the text:


  • "In fact, the chi-squared test of independence is equivalent to a test for equality of two population proportions. Section 7.2 presented a z test statistic for this, based on dividing the difference of sample proportions by its standard error ... The chi-squared statistic relates to this z statistic by X^2 = z^2."

    "For a 2x2 table, why should we ever do a z test if we can get the same result with chi-squared? An advantage of the z test is that it also applies with one-sided alternative hypotheses ... The direction of the effect is lost in squaring z and using X^2."

This last point is the one I was trying to ask about earlier when I mentioned that in doing a two-sample hypothesis test, we can actually specify a direction within the test itself. We can't do that with chi square. Thus, for example, if I wanted to know if women are MORE LIKELY to vote democrat than men (a one-tail or one-sided test), a two-sample z test helps me do this.

The textbook goes on to say that we need chi-squared for larger tables than 2x2, as we then have more than one comparison: "we could use a z statistic for each comparison, but not a single z statistic for the overall test of independence".
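Just to convince myself, here is a quick numerical sketch of that X^2 = z^2 point (made-up counts, Python with scipy/statsmodels; I am assuming the z test uses the pooled standard error, as in my textbook):
Code:
import numpy as np
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

#                 democrat  republican
table = np.array([[320, 180],    # women
                  [260, 240]])   # men

chi2, p_chi2, dof, expected = chi2_contingency(table, correction=False)
z, p_z = proportions_ztest(table[:, 0], table.sum(axis=1))

print(chi2, z**2)    # these should agree: X^2 = z^2
print(p_chi2, p_z)   # and so should the two-sided p-values

# The one-sided alternative is only available on the z side:
z1, p_one = proportions_ztest(table[:, 0], table.sum(axis=1), alternative='larger')
print(p_one)         # roughly half the two-sided p-value when z > 0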
 

rogojel

TS Contributor
#12
gianmarco said:
That said, when you analyze a contingency table you may want:
1) to assess if a dependency exists
2) to measure the size of that dependency
3) to understand the "direction" of the association between levels of the two categorical variables being compared.

You may want to give a read to a nice (old) little book:
Reynolds, "Analysis of Nominal Data", SAGE University Paper 7, 1984
Hi GM,
I think (as opposed to know) that all the points above could easily be achieved by using a logistic regression with discrete factors only. What do you think?
 

rogojel

TS Contributor
#13
Quick update: I tried this out with a dataset and it actually works very well: using logistic regression I get all the answers to the above questions plus, as a bonus, the possibility of modelling and predicting probabilities for each value of the factor. So, to me, the question is now: why would anyone use chi-squared at all?
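For instance, a minimal sketch (hypothetical counts, fitted with statsmodels in Python rather than the software I actually used) would be something like:
Code:
import numpy as np
import statsmodels.api as sm

# Counts of (outcome = yes, outcome = no) for the two levels of a single factor
counts = np.array([[320, 180],    # level A
                   [260, 240]])   # level B
design = sm.add_constant(np.array([0.0, 1.0]))  # indicator: 1 = level B

fit = sm.GLM(counts, design, family=sm.families.Binomial()).fit()
print(fit.summary())        # coefficient and p-value: strength and direction
print(fit.predict(design))  # predicted probability of "yes" for each factor level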

regards
 
#14
rogojel said:
Quick update: I tried this out with a dataset and it actually works very well: using logistic regression I get all the answers to the above questions plus, as a bonus, the possibility of modelling and predicting probabilities for each value of the factor. So, to me, the question is now: why would anyone use chi-squared at all?

regards
Hi rogo,

Thank you so much for your post.

My answer would probably be: because chi square is easier and will answer my research question. If, instead, my research question involved wanting to model the determinants of and predict the likelihood of an outcome (i.e. to make predictions), then logistic regression sounds appropriate. But I'm not sure why I would want to bother with that otherwise.

And certainly at my level of statistics, where we are focused on bivariate association, chi square would be highlighted as the most appropriate technique, because we are dealing with two nominal/categorical variables. We are taught to use the test most appropriate for the level of measurement. Of course, if my question is simply the one looking to test for a difference between males vs. females, then to be honest, if I wasn't on here and wasn't getting any help, I would have just done a two-sample test of difference of proportions. I wouldn't have even done chi square, because the question is just asking to test for a difference -- nothing else.

Best,
Frodo
 

rogojel

TS Contributor
#15
hi,
I understand your point, but if you go for a more advanced technique because you are interested in the strength of the association and the direction of the effect, then you might just as well pick a technique that gives you the answers in a very understandable form. To me the easiest interpretation is something like "the probability of an effect is x for category A and y for category B".

regards
 
#16
rogojel said:
hi,
I understand your point, but if you go for a more advanced technique because you are interested in the strength of the association and the direction of the effect, then you might just as well pick a technique that gives you the answers in a very understandable form. To me the easiest interpretation is something like "the probability of an effect is x for category A and y for category B".

regards
Hi rogo,

That makes a lot of sense, thank you. If I were at a more advanced statistical level, then I'm certain you're correct.

At my level, we are focused on bivariate associations. We learn about significance tests like one-sample and two-sample hypothesis testing, chi square, and ANOVA. And we learn about measures of association that allow us to test for the strength/direction of associations, like Phi, Cramer's V, and Lambda (all for nominal-level variables); Gamma and Spearman's Rho (for ordinal-level variables); and Pearson's r (for interval/ratio variables). The caveat for measures of association is that at the nominal level we can't talk about direction, only pattern - the values of Phi, Cramer's V, or Lambda do not give us a "direction". So, we are taught to calculate the percentages and describe the table.

You can see, probably, how we are taught to use the test most appropriate for the level of measurement and for the question at hand. (This is probably why I am divorced from the reality of how statistics are actually used in the research world.)

Thus, if my question is simply: "is there a statistically significant difference between the number of males versus females who vote democrat [or republican]?", my first instinct would be to simply do a two-sample test of difference in proportions. That answers the question, and is easiest for me to calculate. But I have learned I could use chi square instead, or as you mention, even logistic regression, if I'm interested in strength and direction as well. I'm going to attempt to learn more about logistic regression moving forward.

Thanks so much for your insight. I truly appreciate it. I am learning a lot.

Best,
Frodo
 
#17
As I understand it there are three alternatives:

- to test if there is a significant difference in the proportions (of being "democrat" for females and males).
- to do a chi-squared test on a 2x2 table.
- to test a logit model (or probit)

I believe (but I am not sure) that they will give exactly the same p-value. (So they are essentially the same model in a different disguise.) (Sorry, I am too lazy to check that at the moment. :) Please correct me!)

Anyway, I think they will give very similar results. I would use the proportions, since I think it is easier and most people have no difficulty understanding it.

But maybe there is a need for a multiple model. Often there are several explanatory variables. Then I would use a multiple logit model (also called multiple logistic regression).
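A rough way to check this in Python (made-up counts, using scipy and statsmodels; I have not verified whether the p-values are exactly equal or just very close):
Code:
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

#                 democrat  other
table = np.array([[320, 180],   # female
                  [260, 240]])  # male

# 1) two-sample test of proportions
z, p_z = proportions_ztest(table[:, 0], table.sum(axis=1))

# 2) chi-squared test on the 2x2 table (no continuity correction)
chi2, p_chi2, dof, expected = chi2_contingency(table, correction=False)

# 3) logit model with a group indicator (Wald test of the coefficient)
fit = sm.GLM(table, sm.add_constant(np.array([0.0, 1.0])),
             family=sm.families.Binomial()).fit()
p_logit = fit.pvalues[1]

print(p_z, p_chi2, p_logit)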
 

rogojel

TS Contributor
#18
Hi Greta,
yes, you will get very similar p-values, but with the logistic regression you also get the strength of the association, the direction, and an estimate of the probability of occurrence, even if you only have one x with several levels.

regards
 
#19
Greta said:
As I understand it there are three alternatives:

- to test if there is a significant difference in the proportions (of being "democrat" for females and males).
- to do a chi-squared test on a 2x2 table.
- to test a logit model (or probit)

I believe (but I am not sure) that they will give exactly the same p-value. (So they are essentially the same model in a different disguise.) (Sorry, I am too lazy to check that at the moment. :) Please correct me!)

Anyway, I think they will give very similar results. I would use the proportions, since I think it is easier and most people have no difficulty understanding it.

But maybe there is a need for a multiple model. Often there are several explanatory variables. Then I would use a multiple logit model (also called multiple logistic regression).
Hi Greta,

Thank you very much for this. This is a great summation and very helpful.

Best,
Frodo
 

gianmarco

TS Contributor
#20
Just a quick example:
let's assume we have a 2x2 table, cross-tabulating the medicine taken (aspirin vs placebo) and the presence (yes) or absence (no) of heart attack (I took this example from the web).

See the table below:
Code:
               no     yes    All

aspirin     10933     104  11037
            5.001  -5.001      *

placebo     10845     189  11034
           -5.001   5.001      *

All         21778     293  22071
a) we wish to see if there is an association between taking a specific medicine and experiencing heart attacks. We use the chi-sq test. It returns a p-value which is well below 0.05, pointing to a significant association, that is, a significant deviation of the data from the hypothesis of independence between the two categorical variables. Of course, we wish to know which level of "medicine taken" is associated with which level of "heart attack". In the above table (under the counts), adjusted standardized residuals are reported. As you can see, by inspecting the sign of the values, there is a higher-than-expected frequency of heart attacks among those who take the placebo, while there is a higher-than-expected frequency of no heart attack among those who take aspirin (the same picture arises if we take into account the negative residuals: e.g., there is a less-than-expected frequency of heart attacks among those who take aspirin).
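If you want to reproduce these numbers, here is a rough sketch in Python (the table above came from other software, so take this just as the same computation redone with numpy/scipy):
Code:
import numpy as np
from scipy.stats import chi2_contingency

#                    no    yes
table = np.array([[10933,  104],    # aspirin
                  [10845,  189]])   # placebo

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(chi2, p)   # the p-value is well below 0.05

# Adjusted standardized residuals; they should come out around +/-5.0,
# as in the table above
n = table.sum()
row = table.sum(axis=1, keepdims=True)
col = table.sum(axis=0, keepdims=True)
adj = (table - expected) / np.sqrt(expected * (1 - row / n) * (1 - col / n))
print(adj)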

b) we may wish to know what the odds are of experiencing a heart attack if, for instance, we take the placebo; we could use:
-odds ratio calculated on the 2x2 table: (10933*189)/(104*10845)= 1.83

-binary logistic regression, with heart attack as the dependent variable and medicine taken as the predictor.
Code:
                                                Odds     95% CI
Predictor      Coef    SE Coef       Z      P  Ratio  Lower  Upper
Constant   -4.65515  0.0985233  -47.25  0.000
group
 placebo   0.605438   0.122842    4.93  0.000   1.83   1.44   2.33
The estimated coefficient for the placebo is 0.605 (p-value below 0.01), which corresponds to an odds ratio of 1.83 (of course, the same as the preceding one).
Interpretation: taking the placebo increases the odds of experiencing a heart attack by a factor of 1.83.
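The same fit can be reproduced, for instance, with statsmodels in Python (just a sketch, not the software that produced the output above):
Code:
import numpy as np
import statsmodels.api as sm

# Counts of (heart attack = yes, heart attack = no) per group
counts = np.array([[104, 10933],    # aspirin
                   [189, 10845]])   # placebo
design = sm.add_constant(np.array([0.0, 1.0]))  # indicator: 1 = placebo

fit = sm.GLM(counts, design, family=sm.families.Binomial()).fit()
print(fit.params)             # intercept around -4.655, placebo coefficient around 0.605
print(np.exp(fit.params[1]))  # odds ratio around 1.83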

Hope this helps
Best
Gm