
Thread: Chi square test of independence - test of difference or association?

  1. #1
    Points: 833, Level: 15
    Level completed: 33%, Points required for next Level: 67

    Posts
    20
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Hi everyone,

    I understand chi square tests of independence, and have read the FAQ on chi square tests. What confuses me somewhat is the use of chi square in some instances.

    I have always understood a chi square test of independence as being a test of an association -- whether that association is a significant one or not. Or in other words, whether the two variables are found to be independent (i.e. not associated) or dependent (i.e. associated).

    And I have always understood that one- and two-sample hypothesis testing involves testing a difference, using either a Z-test or a t-test (depending on the sample size and whether sigma is known or unknown). That is, we test whether there is a statistically significant difference between a sample value and a population value (one-sample test), or between two sample values (two-sample test).

    My confusion arises in that some sources seem to suggest that Chi square tests of independence can allow us to test for a significant difference. For example, let's say we're looking at the following two variables: gender (male or female) and voting preference (democrat or republican). And we want to know if there is a statistically significant difference between the number of women vs. men who vote democrat. I would think you would do a two-sample hypothesis test with sample proportions (males as one sample, females as a second sample), and test for a significant difference in the proportion of men vs women who vote democrat.

    But some sources I've read seem to suggest a Chi square test could be done on a 2x2 bivariate table that cross-tabulates gender (male or female) and voting preference (democrat or republican). But doesn't chi square test for a significant association between the two variables, not for a difference? Or can we conclude that anyway, that a significant association suggests a difference between males and females? I know that chi square tests for the difference between observed vs. expected frequencies - an indirect test of the association between the variables.
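
    To make the "observed vs. expected frequencies" idea concrete, here is a minimal Python sketch. The counts are hypothetical (they do not come from any real survey), and the function names are only illustrative. Expected counts are what the table would look like if gender and voting preference were independent.

```python
# Hypothetical 2x2 table of counts (rows: men, women; columns: Democrat, Republican).
observed = [[40, 60],
            [55, 45]]

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

expected = expected_counts(observed)
x2 = chi_square(observed)
print(expected)
print(round(x2, 3))
```

    The further the observed counts sit from the expected ones, the larger the statistic, which is why a large value counts as evidence against independence.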

    Wouldn't it be more apt to do a two-sample hypothesis test? Particularly, if we were specifying a direction, and we wanted to know if women are more likely to vote democrat, then we would have to do a two-sample hypothesis test, no?

    Textbooks I read characterize the chi square test of independence as a test for an association.

    I hope I am making sense. If anyone can help, I would be greatly appreciative.

    Thanks,
    Frodo

  2. #2
    TS Contributor
    Points: 40,089, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Downloads
    gianmarco's Avatar
    Location
    Italy
    Posts
    1,367
    Thanks
    232
    Thanked 301 Times in 225 Posts

    Hello,
    I think you got the picture right, and that you are just missing a nuance of the same issue.
    As you correctly state, the chi-sq test allows you to formally assess whether there is a significant association between two categorical variables, say gender and party (to keep with your example). If the test returns a significant p-value, you can say that the two variables are not independent of one another, AND THEREFORE party preference is distributed differently across gender (i.e., among male and female voters). I see no 'contradiction'.

    Hope this helps
    Gm
    http://cainarchaeology.weebly.com/

  4. #3

    Quote Originally Posted by gianmarco View Post
    Hello,
    I think you got the picture right, and that you are just missing a nuance of the same issue.
    As you correctly state, the chi-sq test allows you to formally assess whether there is a significant association between two categorical variables, say gender and party (to keep with your example). If the test returns a significant p-value, you can say that the two variables are not independent of one another, AND THEREFORE party preference is distributed differently across gender (i.e., among male and female voters). I see no 'contradiction'.

    Hope this helps
    Gm
    Hi Gm,

    That helps considerably, thank you. Would it be correct to say, however, that simply doing a two-sample hypothesis test with proportions is the more appropriate statistical test to do in order to answer the original question of whether there is a difference in the number of women vs. men whose voting preference is democrat?

    In particular, if the question specified a direction to the test, and wanted to know if women were MORE likely to vote democrat (or if men were LESS likely to vote democrat), it would seem to me that we have no choice but to do a two-sample hypothesis test.

    Would it also be safe to argue that since Chi square is only an indirect test of association (as it is testing for the difference between expected versus observed frequencies), that a two-sample hypothesis test is a more direct, and thus more appropriate, test in this case?

    Thanks again for your help. I appreciate it.

    Best,
    Frodo

  5. #4
    TS Contributor
    Points: 17,742, Level: 84
    Level completed: 79%, Points required for next Level: 108
    Karabiner's Avatar
    Location
    FC Schalke 04, Germany
    Posts
    2,539
    Thanks
    56
    Thanked 639 Times in 601 Posts

    In particular, if the question specified a direction to the test, and wanted to know if women were MORE likely to vote democrat (or if men were LESS likely to vote democrat), it would seem to me that we have no choice but to do a two-sample hypothesis test.
    What do you mean by this? A 2x2 table with Chi² test is a two-sample-test.
    To know the direction of the effect, you simply inspect the proportions of
    democrat voters within the female group, compared to the proportion within
    the male group.

    With kind regards

    K.

  6. #5

    Quote Originally Posted by Karabiner View Post
    What do you mean by this? A 2x2 table with Chi² test is a two-sample-test.
    To know the direction of the effect, you simply inspect the proportions of
    democrat voters within the female group, compared to the proportion within
    the male group.

    With kind regards

    K.
    Hi K.,

    Thanks for your answer. What I meant by my response was that in doing a two-sample hypothesis test, we can actually specify a direction within the test itself. We do this by placing all of our critical region on one side of the sampling distribution. Our alternative hypothesis, rather than simply stating that there is a difference in the proportion of men vs. women who support democrats, would state that we expect the population of women to have a higher proportion of those who will vote democrat. This is what I have learned as a one-tail test.

    From my understanding, we couldn't do this with Chi Square, and as you mention, we would have to instead inspect the proportions or percentages of democrat voters within the female group compared to the male group.

    Am I wrong in this? I could still be misunderstanding the purpose and utility of Chi Square. Thanks for your help.

    Best,
    Frodo

  7. #6
    gianmarco

    Hello,
    I still do not totally get your concern with the chi-sq test.

    What the chi-sq test actually tells you is whether or not there is a significant association between two cross-tabulated categorical variables. So, to keep with your example, our research question would be: is there any association between GENDER and VOTING for a given political party? By extension, if there is a significant dependence, being in one of the two levels of GENDER (e.g., being male) implies tending to be in one of the two voting categories (e.g., voting Republican). This in turn implies that, should a dependence exist, the proportion of males among Republican voters would not be the same as the proportion of males among Democratic voters.

    That said, when you analyze a contingency table you may want:
    1) to assess if a dependency exists
    2) to measure the size of that dependency
    3) to understand the "direction" of the association between levels of the two categorical variables being compared.

    (1) is accomplished via the chi-sq test, which does not tell you how "strong" the dependence is (i.e., the "correlation" between the two variables);
    (2) is accomplished using association coefficients; a number of them are available, and the choice depends on the size of the table and on other considerations;
    (3) is accomplished via different approaches: one is comparing percentages (which seems to be the method you would prefer); another (which, in my opinion, is a better fit to the logic of the chi-square test) is analyzing the table of standardized residuals. The residual (for each cell of the table) is the difference between the observed count and the count you would expect under the hypothesis of independence. The residuals are standardized in order to have mean 0 and SD 1. A residual whose absolute value is larger than 1.96 indicates that that cell significantly deviates from the Null Hypothesis. The sign accompanying each residual indicates the direction of that "deviation": let's assume that the standardized residual related to the cross-tabulation of MALE vs REPUBLICAN is +2.00; this would indicate that there is a "positive" association between MALE voters and REPUBLICAN party, that is, there is a larger than expected frequency of males among republican voters. In other words, males tend to vote for the republican party more frequently. In that situation, you may find out that the standardized residual for FEMALE would be -2.10, indicating that females tend to vote less frequently for republican.
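
    The residual analysis in point (3) can be sketched in a few lines of Python. This is only an illustration: the counts are hypothetical, and it assumes that "standardized residuals" means the adjusted standardized residuals (the version that is approximately standard normal under independence, which is what the 1.96 cutoff suggests). The post does not say which variant is intended.

```python
import math

# Hypothetical counts (rows: male, female; columns: Democrat, Republican).
observed = [[40, 60],
            [55, 45]]

def adjusted_residuals(table):
    """Adjusted standardized residual for every cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            # Variance term that makes the residual roughly N(0, 1) under independence.
            var = exp * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((obs - exp) / math.sqrt(var))
        result.append(out_row)
    return result

residuals = adjusted_residuals(observed)
for label, row in zip(["male", "female"], residuals):
    print(label, [round(r, 2) for r in row])
```

    With these made-up counts the male/Republican cell is positive and the female/Republican cell is negative, which is the kind of sign pattern described above. Note that in a 2x2 table the adjusted residuals all share the same absolute value; they only start to differ in magnitude from cell to cell in larger tables.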


    Hope this helps.

    You may want to give a read to a nice (old) little book:
    Reynolds, "Analysis of Nominal Data", SAGE University Paper 7, 1984

  9. #7

    Hi Gianmarco,

    Thank you very much for your post and for taking the time to respond and to write out your explanation, it is greatly appreciated.

    I suppose I should start by saying that I'm pretty sure I understand what you are saying: that if we are looking at the association between two variables, such as GENDER and VOTING in this case, we might want to know about (1) whether the association exists, (2) the strength (or "size" as you say) of the association, and (3) the "direction" of the association. However, I would add that since we're dealing with nominal level variables, we can't really speak about "direction", only pattern, as we can't rank or order the scores or categories of nominal level variables.

    I also understand that Chi Square would only test the association for significance - that is, that it exists in the population. I know there are some different options for testing strength (Phi coefficient for 2x2 table in this case, or Lambda). And I thank you for explaining some other options aside from comparing percentages in order to look at the pattern in the data - this was helpful.
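
    As a small illustration of the strength measure mentioned above, here is a sketch of the Phi coefficient for a 2x2 table in Python. The counts are hypothetical and the function name is just for this example. For a 2x2 table, Phi is tied to the chi-square statistic through X^2 = n * phi^2.

```python
import math

def phi_coefficient(a, b, c, d):
    """Signed phi for the 2x2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Hypothetical counts: men (40 Democrat, 60 Republican), women (55 Democrat, 45 Republican).
phi = phi_coefficient(40, 60, 55, 45)
n = 40 + 60 + 55 + 45
x2_from_phi = n * phi ** 2  # for a 2x2 table, X^2 = n * phi^2
print(round(phi, 3), round(x2_from_phi, 3))
```

    The sign of Phi depends only on how the rows and columns are ordered, so for nominal variables it is usually the absolute value that gets reported as the strength of the association.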

    My concern from my last post specifically pertained to the original question, which is asking us to test for a statistically significant "difference" in the proportion of males vs. females who vote democrat (or republican, or whatever we may be interested in). If that is the question we are trying to answer, why bother with a Chi Square test? Why not just do a two-sample hypothesis test (to test for the significance of the difference between the two sample proportions)? Isn't that a whole lot easier, and more directly answers our question?

    The test statistic formula for our significance test would be, for large samples: Z(obtained) = (Ps1 - Ps2) / (standard deviation of the sampling distribution of the difference in sample proportions)

    with

    Ps1 = the sample proportion for men
    Ps2 = the sample proportion for women

    Then, if the test statistic falls into the critical region, or we obtain a significant p-value, we can say that there is a statistically significant difference.
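
    For readers who want to see the calculation, the test statistic described above might be sketched in Python as follows. Two assumptions are made here that the post does not spell out: the standard error uses the pooled proportion (the usual choice under the null hypothesis of equal proportions), and the counts are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 == p2, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 40 of 100 men and 55 of 100 women vote Democrat.
z = two_proportion_z(40, 100, 55, 100)
p = two_sided_p(z)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

    Here Ps1 is the sample proportion for men and Ps2 the one for women, so a negative z means the proportion is lower among men.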

    If I'm still not being clear enough, I am truly sorry. I thank you all for your help and patience.

    Best,
    Frodo

  10. #8
    gianmarco

    Hello,
    I think we are spiralling around the same issue over and over again.
    Bottom line:
    the test of proportions and the chi-sq test actually address two different questions. Pick whichever is more suitable to your research question.

    By the way:
    Quote Originally Posted by Frodo/Sociology View Post
    However, I would add, that since we're dealing with nominal level variables, we can't really speak about "direction", only pattern, as we can't rank or order the scores or categories of nominal level variables.
    I was referring to the direction of the difference between observed and expected counts, in the context of standardized residuals. To my mind, positive vs. negative values do indicate a difference in "direction".


    Best
    Gm

  12. #9

    Quote Originally Posted by gianmarco View Post
    Hello,
    I think we are spiralling around the same issue over and over again.
    Bottom line:
    the test of proportions and the chi-sq test actually address two different questions. Pick whichever is more suitable to your research question.
    Hi Gm,

    This is precisely what I was trying to get at -- that the two-sample test of proportions is the test, in my mind, more suitable for the research question (i.e. wanting to know if there is a statistically significant difference between males vs. females who vote democrat [or republican]).

    Because, as you mentioned, a Chi Square test of independence addresses a different question. I have always thought (perhaps incorrectly) that the Chi Square test is one that tests for whether there is a significant association (i.e. dependence) between two variables (NOT whether there is a significant difference between two samples).

    This brings me back to my original post/question that started this thread -- confusion around what Chi Square actually tests.

    The wonderful help in this thread has seemed to suggest that Chi Square and two-sample test of proportions would be interchangeable in this case, and so I'm still left somewhat confused. Maybe someone would be so kind as to briefly make plainly clear the difference.

    For example, earlier you mentioned:

    I think you got the picture right, and that you are just missing a nuance of the same issue.
    As you correctly state, the chi-sq test allows you to formally assess whether there is a significant association between two categorical variables, say gender and party (to keep with your example). If the test returns a significant p-value, you can say that the two variables are not independent of one another, AND THEREFORE party preference is distributed differently across gender (i.e., among male and female voters). I see no 'contradiction'.
    Does this mean then: 1) Chi square could be used here to test for a significant difference in party-preference for men vs women, but that 2) a test of difference in proportions would be more suitable?

    One day I will fully understand Chi Square. Thank you again for your help, Gm. I am sorry if I have been frustratingly dense. Please know that I truly appreciate it, and wouldn't blame you if you didn't want to help any further.

    Best,
    Frodo

  13. #10

    Maybe this is how it is:

    If my research question were: "Is there a significant relationship/association between GENDER (male or female) and VOTING PREFERENCE (democrat or republican)?"

    ... then I would conduct a Chi Square test of independence, to test whether the two variables are statistically significantly associated/related/dependent. But, should a dependence/association exist, this consequently does tell me that the proportion of males voting democrat/republican is not the same as the proportion of females voting democrat/republican, and so therefore there is a difference in voting by gender (men and women are significantly different in terms of voting preference). Right?

    If my research question were: "Is there a significant difference between the proportion of MALES versus the proportion of FEMALES who vote Democrat [or Republican]?"

    ... then I would conduct a two-sample test for difference in proportions, in order to test whether there is a statistically significant difference in the proportion of men vs. women who vote for a particular party. This directly tests for a significant difference (right? Or could I have done a Chi Square test too?)

    Does this mean then the two tests are interchangeable (or sometimes interchangeable; interchangeable with the second research question, but not the first)? I'm not sure why this is so hard for me.
    Last edited by Frodo/Sociology; 07-01-2016 at 08:03 PM.

  14. #11

    I think I found some clarity from yet another stats textbook I got my hands on. From the text:
    • "In fact, the chi-squared test of independence is equivalent to a test for equality of two population proportions. Section 7.2 presented a z test statistic for this, based on dividing the difference of sample proportions by its standard error ... The chi-squared statistic relates to this z statistic by X^2 = z^2."

      "For a 2x2 table, why should we ever do a z test if we can get the same result with chi-squared? An advantage of the z test is that it also applies with one-sided alternative hypotheses ... The direction of the effect is lost in squaring z and using X^2."

    This last point is the one I was trying to ask about earlier when I mentioned that in doing a two-sample hypothesis test, we can actually specify a direction within the test itself. We can't do that with chi square. Thus, for example, if I wanted to know if women are MORE LIKELY to vote democrat than men (a one-tail or one-sided test), a two-sample z test helps me do this.

    The textbook goes on to say that we need chi-squared for larger tables than 2x2, as we then have more than one comparison: "we could use a z statistic for each comparison, but not a single z statistic for the overall test of independence".
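
    The relationship quoted from the textbook is easy to check numerically. The sketch below uses hypothetical counts and illustrative function names; it assumes the z statistic uses the pooled standard error and the chi-square statistic is computed without a continuity correction. It also shows the one-sided p-value that the z test provides and the chi-square test does not.

```python
import math

def z_stat(x1, n1, x2, n2):
    """Two-sample z statistic for proportions with pooled standard error."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical counts: women (55 Democrat, 45 Republican), men (40 Democrat, 60 Republican).
# Women are listed first so that a positive z means a higher Democrat share among women.
z = z_stat(55, 100, 40, 100)
x2 = chi_square_2x2(55, 45, 40, 60)

p_chi = math.erfc(math.sqrt(x2 / 2))      # chi-square p-value with 1 degree of freedom
p_two_sided = 2 * normal_upper_tail(abs(z))
p_one_sided = normal_upper_tail(z)         # H1: women are MORE likely to vote Democrat

print(round(z ** 2, 4), round(x2, 4))
print(round(p_chi, 4), round(p_two_sided, 4), round(p_one_sided, 4))
```

    The chi-square p-value matches the two-sided z p-value, and the one-sided p-value is half of it when the observed difference is in the hypothesized direction. Squaring z discards the sign, which is why the direction has to be read off the proportions when only X^2 is reported.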

  15. #12
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Quote Originally Posted by gianmarco View Post

    That said, when you analyze a contingency table you may want:
    1) to assess if a dependency exists
    2) to measure the size of that dependency
    3) to understand the "direction" of the association between levels of the two categorical variables being compared.


    You may want to give a read to a nice (old) little book:
    Reynolds, "Analysis of Nominal Data", SAGE University Paper 7, 1984
    Hi GM,
    I believe, though I am not certain, that all the points above could easily be achieved by using a logistic regression with discrete factors only. What do you think?

  16. #13
    rogojel

    Quick update: I tried this out with a dataset and it actually works very well: using logistic regression I get all the answers to the above questions plus, as a bonus, the possibility to model and predict probabilities for each value of the factor. So, to me, the question is now why would anyone use chi-squared at all?
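
    The dataset and software behind this update are not shown, so the following is only a sketch of the idea with hypothetical counts. It relies on the fact that a logistic regression with a single binary factor reproduces the observed proportions exactly, so its coefficients can be written in closed form without an iterative fitting routine.

```python
import math

# Hypothetical counts of Democrat / Republican voters by gender.
men_dem, men_rep = 40, 60
women_dem, women_rep = 55, 45

# Model: logit P(Democrat) = b0 + b1 * female, with female coded 0/1.
# With one binary predictor the maximum-likelihood fit reproduces the
# observed proportions, so the coefficients have a closed form.
b0 = math.log(men_dem / men_rep)                               # log-odds for men
b1 = math.log((women_dem / women_rep) / (men_dem / men_rep))   # log odds ratio

def inverse_logit(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

p_men = inverse_logit(b0)
p_women = inverse_logit(b0 + b1)

# Wald test for b1, using the large-sample standard error of a log odds ratio.
se_b1 = math.sqrt(1 / men_dem + 1 / men_rep + 1 / women_dem + 1 / women_rep)
wald_z = b1 / se_b1

print(round(b1, 3), round(p_men, 2), round(p_women, 2), round(wald_z, 2))
```

    The sign of b1 gives the direction, its size (or the odds ratio exp(b1)) gives the strength, and the Wald z gives a significance test that is close to, but not identical to, the z from the two-proportion test.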

    regards

  17. #14

    Quote Originally Posted by rogojel View Post
    Quick update: I tried this out with a dataset and it actually works very well: using logistic regression I get all the answers to the above questions plus, as a bonus, the possibility to model and predict probabilities for each value of the factor. So, to me, the question is now why would anyone use chi-squared at all?

    regards
    Hi rogo,

    Thank you so much for your post.

    My answer would probably be: because chi square is easier and will answer my research question. If, instead, my research question involved wanting to model the determinants of and predict the likelihood of an outcome (i.e. to make predictions), then logistic regression sounds appropriate. But I'm not sure why I would want to bother with that otherwise.

    And certainly at my level of statistics, where we are focused on bivariate association, chi square would be highlighted as the most appropriate technique, because we are dealing with two nominal/categorical variables. We are taught to use the test most appropriate for the level of measurement. Of course, if my question is simply the one looking to test for a difference between males vs. females, then to be honest, if I wasn't on here and wasn't getting any help, I would have just done a two-sample test of difference of proportions. I wouldn't have even done chi square, because the question is just asking to test for a difference -- nothing else.

    Best,
    Frodo

  18. #15
    rogojel


    hi,
    I understand your point, but if you go for a more advanced technique because you are interested in the strength of the association and the direction of the effect, then you might just as well pick a technique that gives you the answers in a very understandable form. To me the easiest interpretation is something like "the probability of an effect is x for category A and y for category B".

    regards
