# Comparing two groups with categorical data: need help choosing the correct statistical test to assess significance

#### Ffrozenffeelings

##### New Member
Hello everyone,

I am currently doing a research project and am unsure which test I should use to assess statistical significance. I am trying to assess whether certain findings on a CT scan appear more frequently in a specific group of patients (those presenting with chest pain) than in a control group (those not presenting with chest pain). This could then be used to support my hypothesis that patients with chest pain but no heart disease may have something else wrong (visible on CT) that could explain their symptoms.
The situation is as follows:

In my sample population (n ~ 200), I have gone through each patient's CT scan and used a checklist to record whether they have certain findings (e.g. Finding A yes/no, Finding B yes/no, Finding C yes/no). I have done this for the full set and obtained the total count for each finding.

I wish to repeat the procedure on a control population (n ~ 200), looking for the same findings (using the same yes/no checklist) on each patient's scan.

Then I would like to determine whether there is a significant difference between the two groups for each finding (i.e. does my sample population show Findings A, B, C, ... more often than my control population?).
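Each comparison works on a 2×2 table per finding, built by tallying the per-patient yes/no checklists. The sketch below shows that tally. It assumes each patient's checklist is stored as a dict mapping a finding name to True/False. The structure, the function names and the records are hypothetical, not real data.

```python
from typing import Dict, List, Tuple

# A 2x2 table stored as ((cases_yes, cases_no), (controls_yes, controls_no)).
Table = Tuple[Tuple[int, int], Tuple[int, int]]


def tally(patients: List[Dict[str, bool]], finding: str) -> Tuple[int, int]:
    """Count how many patients in one group do / do not have a finding."""
    present = sum(1 for p in patients if p[finding])
    return present, len(patients) - present


def build_tables(cases: List[Dict[str, bool]],
                 controls: List[Dict[str, bool]],
                 findings: List[str]) -> Dict[str, Table]:
    """Build one 2x2 contingency table per checklist finding."""
    return {f: (tally(cases, f), tally(controls, f)) for f in findings}


# Hypothetical checklists for a handful of patients (not real data).
cases = [{"A": True, "B": False}, {"A": True, "B": True}, {"A": False, "B": False}]
controls = [{"A": False, "B": False}, {"A": True, "B": False}]

tables = build_tables(cases, controls, ["A", "B"])
print(tables["A"])  # ((2, 1), (1, 1))
```

With 36 findings, the same call simply receives all 36 names and returns 36 tables.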

However, I am unsure which statistical test I should use. Could I run a Fisher's exact test for one finding (see the attached picture of an example table in Prism) to assess significance, and then repeat the process for every finding? Or should I approach it differently? There are 36 findings on my checklist in total.
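The block below is a pure-Python sketch of a two-sided Fisher's exact test for one finding, with a loop that repeats it once per finding (the approach asked about above). It avoids any statistics library so that it is self-contained. It uses one common two-sided definition: sum the probabilities of all tables that are no more likely than the observed one. Other software, Prism included, may use a different two-sided convention, so treat this as an illustration rather than a replica. The function name, the tie tolerance and the counts in the loop are assumptions for illustration, not real data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (cases, controls); columns are finding present / absent.
    With all margins fixed, the top-left count follows a hypergeometric
    distribution; the p-value sums the probabilities of every possible table
    that is no more likely than the observed one.
    """
    row1 = a + b      # number of cases
    col1 = a + c      # number of patients with the finding (both groups)
    n = a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance (an assumption) so floating-point ties count.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical counts (not real data):
# finding -> (cases_yes, cases_no, controls_yes, controls_no)
counts = {"Finding A": (30, 170, 12, 188), "Finding B": (8, 192, 7, 193)}
for name, (a, b, c, d) in counts.items():
    print(name, round(fisher_exact_two_sided(a, b, c, d), 4))
```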

Additionally, should I run a normality test on my data, or can I assume it is parametric?

Would appreciate any advice! Thanks for your time.

#### Omerikooo

##### New Member
Hello,

I think you should run a separate chi-square test (or Fisher's exact test if needed) for each outcome, for example Finding A present/absent vs. cases/controls. This will simply show whether the frequency of each finding differs between cases and controls. BUT! I wouldn't stop there.

The predictive value of these findings can also be important in the medical field. For example, pain relief with nitroglycerin increases the likelihood that chest pain is due to angina. Suppose you run a chi-square test of pain relief with nitroglycerin (present/absent) against the cause of chest pain (cardiac/non-cardiac). You will only see a difference in the frequency of this finding, for example: "pain relief with nitroglycerin was more frequent in the cardiac chest pain group."

So you should definitely run some logistic regression analyses (to obtain odds ratios) and maybe (not a must) further classification algorithms (e.g. decision trees). This would give something like "Pain relief with nitroglycerin increases the odds that the pain is of cardiac origin by 80% (OR = 1.80)."

#### Ffrozenffeelings

##### New Member
Thank you, Omerikooo, for your prompt response.

By "repeated chi-square tests", do you mean running a separate chi-square test for each finding, or is that something different? I thought a chi-squared test needed expected vs. observed values, so it wouldn't be applicable in my case? Also, isn't Fisher's exact test better than the chi-squared test, which is more of an approximation?

I like your idea, but due to time constraints I can only compare findings in my population vs. controls. Do you think it would be worth calculating odds ratios when I run the chi-squared/Fisher's test anyway? Then in the results write-up I could say "Patients with chest pain have 30% higher odds of having HH than those without the symptom (OR = 1.3)", even though my study is a retrospective study, not a retrospective case-control study.
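An odds ratio can be read straight off a 2×2 table, and the short sketch below shows the arithmetic. It also shows why an OR of, say, 1.75 means "75% higher odds" rather than "75% more likely". The counts and the function name are hypothetical. The formula fails when either off-diagonal cell is zero.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: cases (chest pain), controls. Columns: finding present, absent.
    Odds in cases = a / b, odds in controls = c / d, so OR = (a * d) / (b * c).
    Raises ZeroDivisionError if b or c is zero.
    """
    return (a * d) / (b * c)


# Hypothetical counts (not real data): finding in 40/200 cases, 25/200 controls.
a, b, c, d = 40, 160, 25, 175
ratio = odds_ratio(a, b, c, d)
print(f"OR = {ratio:.2f}: the odds are {100 * (ratio - 1):.0f}% higher in cases")
# -> OR = 1.75: the odds are 75% higher in cases

# The proportions themselves are 20% vs 12.5% (a ratio of 1.6, not 1.75),
# which is why an OR describes a change in odds, not "x% more likely".
print(a / (a + b), c / (c + d))
```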

#### Omerikooo

##### New Member
Yes, I mean a chi-square test for every finding.

Your data are adequate for a chi-square test. The expected values are calculated within the chi-square formula from your contingency table. So no, you don't need to provide separate expected data; the chi-square test takes care of it.
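The expected values really do come from the table itself: each one is the row total × column total / n. The minimal pure-Python sketch below shows this for the Pearson chi-square test on one 2×2 table. It applies no continuity correction, which is an assumption; some software offers or defaults to Yates' correction. The function name and counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value, expected_counts). Expected counts are
    derived from the table's own margins: row total * column total / n.
    """
    obs = [[a, b], [c, d]]
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    stat = sum((obs[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # A 2x2 table has 1 degree of freedom; the chi-square(1) upper tail
    # probability equals erfc(sqrt(stat / 2)).
    p = erfc(sqrt(stat / 2))
    return stat, p, expected


# Hypothetical counts (not real data).
stat, p, expected = chi_square_2x2(40, 160, 25, 175)
print(expected)  # [[32.5, 167.5], [32.5, 167.5]]
print(round(stat, 3), round(p, 4))
```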

I'm not aware of either test being superior to the other. I use Fisher's test when more than 20% of the cells have expected values below 5. In my research I report Fisher and chi-square results in the same way. A deeper explanation of this from anyone would be great.
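The rule of thumb above can be written as a small check. In a 2×2 table, "more than 20% of cells" means at least one of the four expected counts is below 5. This is a sketch of the stated rule only. The function name and the returned labels are hypothetical.

```python
def choose_test(a: int, b: int, c: int, d: int) -> str:
    """Apply the expected-count rule of thumb to the 2x2 table [[a, b], [c, d]].

    Returns "fisher" if more than 20% of cells have an expected count below 5,
    otherwise "chi-square".
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    expected = [r * k / n for r in rows for k in cols]
    small = sum(1 for e in expected if e < 5)
    return "fisher" if small / len(expected) > 0.2 else "chi-square"


# Hypothetical counts (not real data).
print(choose_test(40, 160, 25, 175))  # chi-square (smallest expected count is 32.5)
print(choose_test(2, 198, 6, 194))    # fisher (two expected counts of 4)
```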

I always like to see the predictive value of my findings, so I routinely run logistic regression analyses. You are right that the retrospective nature of your study limits the use of logistic regression, but I would do it anyway and let the editors decide.

#### Ffrozenffeelings

##### New Member
Thanks for the help! I think I read that Fisher's is an exact test used when the sample size is small (which I believe mine is?) and is better for calculating odds ratios, so I think it may be more appropriate. But I'm open to a better explanation, as my statistical knowledge is limited!

Is it acceptable to simply test for a difference between the groups for each finding and express it as an OR? Contrary to my earlier post, this would be a case-control study, albeit one where the control population may not be matched. I ask because I may not be able to fit a logistic regression model.

#### Omerikooo

##### New Member
To give odds ratios, you should use logistic regression. With a chi-square or Fisher's test you can only report a significant difference in the frequency of your findings between the groups.

Chi-square and Fisher's tests don't give odds ratios.

#### Ffrozenffeelings

##### New Member
Apologies, I am slightly confused. My statistical package can calculate odds ratios as part of Fisher's/chi-squared tests; are these values incorrect? I have attached examples with random data to show what I mean.

#### Omerikooo

##### New Member
Okay, I'm not sure what that really means.

An odds ratio can be calculated from a contingency table like the one in your second photo, but I'm also sure it can be calculated via logistic regression. There should be a relationship between the two, but I don't really know what it is.
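For a single yes/no predictor, the two are the same number: a logistic regression of the finding on group membership gives exp(b1) = (a·d)/(b·c), the cross-product OR from the table. Because the OR is symmetric, the result is also the same if the roles are swapped (group as outcome, finding as predictor). The minimal pure-Python Newton-Raphson fit below demonstrates this. It assumes no zero cells, and the function name and counts are hypothetical. A real analysis would use a statistics package; logistic regression becomes genuinely different from the 2×2 calculation only once further predictors are added.

```python
from math import exp


def logistic_or(a: int, b: int, c: int, d: int, iters: int = 25) -> float:
    """Fit logit P(finding) = b0 + b1 * is_case by Newton-Raphson; return exp(b1).

    Table [[a, b], [c, d]]: rows cases / controls, columns finding present / absent.
    Data are grouped: each group contributes (is_case, n_with_finding, n_patients).
    Assumes no zero cells (otherwise the estimate diverges).
    """
    groups = [(1.0, a, a + b), (0.0, c, c + d)]
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, m in groups:
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            w = m * p * (1.0 - p)
            g0 += y - m * p
            g1 += (y - m * p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += information^{-1} @ score (2x2 inverse written out).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return exp(b1)


a, b, c, d = 40, 160, 25, 175  # hypothetical counts (not real data)
print(logistic_or(a, b, c, d), (a * d) / (b * c))  # both approximately 1.75
```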

I will try to do some research and may come to an understanding. Which software are you using?

#### Ffrozenffeelings

##### New Member
Thank you for the reply. I am using GraphPad Prism.