Re: Chi square test of independence - test of difference or association?

Originally Posted by rogojel

hi,
I understand your point, but, if you go for a more advanced technique because you are interested in the strength of the association and the direction of the effect then you might just as well pick a technique that gives you the answers in a very understandable form. To me the easiest interpretation is something like "the probability of an effect is x for category A and y for category B" .

regards

Hi rogo,

That makes a lot of sense, thank you. If I were at a more advanced statistical level, then I'm certain you're correct.

At my level, we are focused on bivariate associations. We learn about significance tests like one-sample and two-sample hypothesis testing, chi square, and ANOVA. And we learn about measures of association that allow us to test for the strength/direction of associations, like Phi, Cramer's V, and Lambda (all for nominal-level variables); Gamma and Spearman's Rho (for ordinal-level variables); and Pearson's r (for interval/ratio variables). The caveat for measures of association is that at the nominal level we can't talk about direction, only pattern - the values of Phi, Cramer's V, or Lambda do not give us a "direction". So we are taught to calculate the percentages and describe the table.
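As an aside, for a 2x2 table Phi (which coincides with Cramer's V there) follows directly from the chi-square statistic as sqrt(chi2/N). A minimal pure-Python sketch; the function names and the 60/100 vs 45/100 vote counts are purely illustrative, not from any real data:

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for obs, r, c_ in ((a, row1, col1), (b, row1, col2),
                       (c, row2, col1), (d, row2, col2)):
        exp = r * c_ / n          # expected count under independence
        chi2 += (obs - exp) ** 2 / exp
    return chi2

def phi_coefficient(a, b, c, d):
    """Phi = sqrt(chi2 / N); for a 2x2 table this equals Cramer's V."""
    n = a + b + c + d
    return math.sqrt(chi_square_2x2(a, b, c, d) / n)

# Hypothetical counts: 60 of 100 males and 45 of 100 females vote democrat.
phi = phi_coefficient(60, 40, 45, 55)
```

Note that Phi reports only the strength of the association; as said above, at the nominal level the "direction" has to come from inspecting the percentages in the table.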

You can see, probably, how we are taught to use the test most appropriate for the level of measurement and for the question at hand. (This is probably why I am divorced from the reality of how statistics are actually used in the research world.)

Thus, if my question is simply: "is there a statistically significant difference between the number of males versus females who vote democrat [or republican]?", my first instinct would be to simply do a two-sample test of difference in proportions. That answers the question, and is easiest for me to calculate. But I have learned I could use chi square instead, or as you mention, even logistic regression, if I'm interested in strength and direction as well. I'm going to attempt to learn more about logistic regression moving forward.

Thanks so much for your insight. I truly appreciate it. I am learning a lot.

Re: Chi square test of independence - test of difference or association?

As I understand it there are three alternatives:

- to test if there is a significant difference in the proportions (of being "democrat" for females and males).
- to do a chi-squared test on a 2x2 table.
- to test a logit model (or probit)

I believe (but I am not sure) that they will give exactly the same p-value. (So they are essentially the same model in different disguises.) (Sorry, I am too lazy to check that at the moment. Please correct me!)

Anyway, I think they will give very similar results. I would use the proportions, since I think it is easier and that most people have no difficulty in understanding that.

But maybe there is a need for a multiple model. Often there are several explanatory variables. Then I would use a multiple logit model (also called multiple logistic regression).
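The hunch about identical p-values can actually be checked by hand: the pooled two-proportion z statistic, squared, equals the Pearson chi-square statistic on the corresponding 2x2 table (without continuity correction), so the two-sided p-values coincide. A pure-Python sketch with made-up vote counts (60/100 "democrat" among males, 45/100 among females; the numbers are illustrative only):

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for a difference in proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi2_2x2(x1, n1, x2, n2):
    """Pearson chi-square (no continuity correction) on the 2x2 table
    [[x1, n1 - x1], [x2, n2 - x2]]."""
    n = n1 + n2
    chi2 = 0.0
    for obs, row, col in ((x1, n1, x1 + x2), (n1 - x1, n1, n - x1 - x2),
                          (x2, n2, x1 + x2), (n2 - x2, n2, n - x1 - x2)):
        exp = row * col / n
        chi2 += (obs - exp) ** 2 / exp
    return chi2

z = two_prop_z(60, 100, 45, 100)
chi2 = chi2_2x2(60, 100, 45, 100)
# z squared equals the chi-square statistic, so the p-values coincide.
```

The logit model is not numerically identical to these two (its test of the slope is a Wald or likelihood-ratio test), but for reasonable sample sizes it gives a very similar p-value.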

Re: Chi square test of independence - test of difference or association?

Hi Greta,
yes, you will get very similar p-values, but with logistic regression you also get the strength of the association, the direction, and an estimate of the probability of occurrence, even if you only have one x with several levels.

Re: Chi square test of independence - test of difference or association?

Originally Posted by GretaGarbo

As I understand it there are three alternatives:

- to test if there is a significant difference in the proportions (of being "democrat" for females and males).
- to do a chi-squared test on a 2x2 table.
- to test a logit model (or probit)

I believe (but I am not sure) that they will give exactly the same p-value. (So they are essentially the same model in different disguises.) (Sorry, I am too lazy to check that at the moment. Please correct me!)

Anyway, I think they will give very similar results. I would use the proportions, since I think it is easier and that most people have no difficulty in understanding that.

But maybe there is a need for a multiple model. Often there are several explanatory variables. Then I would use a multiple logit model (also called multiple logistic regression).

Hi Greta,

Thank you very much for this. This is a great summation and very helpful.

Re: Chi square test of independence - test of difference or association?

Just a quick example:
let's assume we have a 2x2 table, cross-tabulating the medicine taken (aspirin vs placebo) against the presence (yes) or absence (no) of heart attack (I took this example from the web).

See the table below:

Code:

             no      yes      All
aspirin   10933      104    11037
          5.001   -5.001        *
placebo   10845      189    11034
         -5.001    5.001        *
All       21778      293    22071

a) we wish to see if there is an association between taking a specific medicine and experiencing heart attacks. We use the chi-square test. It returns a p-value well below 0.05, pointing to a significant association, that is, a significant deviation of the data from the hypothesis of independence between the two categorical variables. Of course, we also wish to know which level of "medicine" correlates with which level of "heart attack". In the table above (under the counts), the adjusted standardized residuals are reported. As you can see by inspecting the signs of the values, there is a higher-than-expected frequency of heart attacks among those who take the placebo, while there is a higher-than-expected frequency of no heart attack among those who take aspirin (the same picture arises if we take into account the negative residuals: e.g., there is a lower-than-expected frequency of heart attacks among those who take aspirin).
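The adjusted standardized residuals shown in the table can be reproduced by hand: for each cell, (observed - expected) divided by sqrt(expected * (1 - row total/N) * (1 - column total/N)). A pure-Python sketch using the counts from the table above (the helper name is mine):

```python
import math

# Observed 2x2 table: rows = (aspirin, placebo), cols = (no, yes).
table = [[10933, 104],
         [10845, 189]]

n = sum(map(sum, table))
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]

def adj_std_residual(i, j):
    """Adjusted standardized residual:
    (O - E) / sqrt(E * (1 - p_row) * (1 - p_col))."""
    exp = row_tot[i] * col_tot[j] / n
    denom = math.sqrt(exp * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
    return (table[i][j] - exp) / denom

residuals = [[adj_std_residual(i, j) for j in range(2)] for i in range(2)]
```

In a 2x2 table all four residuals have the same magnitude (here about 5.00), differing only in sign, which is why the printed table shows the same value with alternating signs.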

b) we may wish to know the odds of experiencing a heart attack if, for instance, we take the placebo; we could use:
-the odds ratio calculated on the 2x2 table: (10933*189)/(104*10845) = 1.83

-binary logistic regression, with heart attack as dependent variable, and medicine assumption as predictor.

Code:

                                             Odds       95% CI
Predictor      Coef      SE Coef        Z      P    Ratio  Lower  Upper
Constant   -4.65515    0.0985233   -47.25  0.000
group
  placebo   0.605438    0.122842     4.93  0.000     1.83   1.44   2.33

The estimated coefficient for the placebo is 0.605 (p-value below 0.01), which corresponds to an odds ratio of 1.83 (of course, the same as the one calculated directly from the table).
Interpretation: taking the placebo increases the odds of experiencing a heart attack by a factor of 1.83.
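The odds ratio and a Wald 95% confidence interval can also be recovered by hand from the 2x2 counts, matching the logistic regression output above. A pure-Python sketch (the standard error of the log odds ratio is sqrt(1/a + 1/b + 1/c + 1/d), which is the same quantity as the SE of the placebo coefficient):

```python
import math

# Counts from the 2x2 table: rows = (aspirin, placebo), cols = (no, yes).
a, b = 10933, 104    # aspirin: no / yes
c, d = 10845, 189    # placebo: no / yes

odds_ratio = (a * d) / (b * c)

# Wald 95% CI: exponentiate log(OR) +/- 1.96 * SE(log OR).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
```

With these counts the computation reproduces the printed output: OR = 1.83 with 95% CI (1.44, 2.33), and SE(log OR) = 0.1228, the SE of the placebo coefficient.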

Re: Chi square test of independence - test of difference or association?

hi,
you could also use the "predict" function to get the probabilities of getting a heart attack in both cases. This is probably easier to communicate than odds ratios.
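This can be checked by hand as well: applying the inverse logit to the fitted coefficients returns the predicted probabilities. A pure-Python sketch, with the coefficients copied from the regression output above; since there is a single categorical predictor, the model is saturated and the predictions simply equal the observed sample proportions:

```python
import math

def invlogit(x):
    """Inverse logit: converts log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Coefficients from the fitted model (aspirin is the reference level).
b0 = -4.65515        # intercept: log-odds of heart attack for aspirin
b1 = 0.605438        # placebo effect on the log-odds scale

p_aspirin = invlogit(b0)        # predicted P(heart attack | aspirin)
p_placebo = invlogit(b0 + b1)   # predicted P(heart attack | placebo)
```

Probabilities of roughly 0.9% (aspirin) versus 1.7% (placebo) are indeed easier to communicate than an odds ratio of 1.83.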