# Sample size and best test to use for nominal categorical data

#### Peita77

##### New Member
Hi, I have some data on screening rates amongst presentations to an emergency department.
The data is divided into a number of different sets of categories, and I want to test for statistically significant differences between the categories in each set.
For example, one set is after-hours versus normal-hours presentations.

Another set is reason for presentation. There are about 5, say runny nose, cough, chest pain, injury, depression.

How do I work out the minimum sample size, and
which test should I use?

I originally thought chi-squared, but now I am not sure, given that the sample size in each category will be different.

Much appreciated - long time since I did stats!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Your writing is hard to read. This sounds like a logistic regression problem. To power it, you may think about the smallest difference that you would like to discern, and figure out how many people would be needed given your assumption in order to reject your null hypothesis.

However, I will note you seem to be looking at a lot of comparisons or covariates, so your risk for false positives would be high if you don't address this issue - perhaps via your set level of significance.
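
To make the "smallest difference" idea concrete, here is a minimal Python sketch of the usual normal-approximation sample size for comparing two proportions. The function name `n_per_group`, the example screening rates (50% vs 60%), the 80% power, and the three-comparison Bonferroni adjustment are illustrative assumptions, not figures from this thread.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect p1 vs p2 (two-sided test).

    Standard normal-approximation formula for two independent proportions;
    it assumes equal group sizes and is only a planning estimate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                          # average rate under the null
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical example: smallest difference worth detecting is 50% vs 60%.
print(n_per_group(0.50, 0.60))

# With several comparisons planned, one simple (conservative) option is a
# Bonferroni-adjusted alpha, here assuming 3 comparisons.
print(n_per_group(0.50, 0.60, alpha=0.05 / 3))
```

The smaller the difference you want to detect, or the stricter the alpha, the more presentations you need in each group.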

#### Peita77

##### New Member
Okay, so I am not looking for the amount they contribute, as in I don’t care how much being in one group predicts whether you get screened.

I simply want to know if there is a statistical difference between groups - why can’t I just pick the smallest frequency group and the largest frequency group and do chi-squared?

Surely when there are only two groups, like after hours or normal hours, I can do a chi-squared?

This needs to be simple - it’s just an audit

#### Peita77

##### New Member
And how do I do that? How do I work out the smallest difference and relate it to sample size for categorical data?

#### Karabiner

##### TS Contributor
So you want to compare hours (2 levels) with reason (5 levels)?
Is your focus on the global test (2*5 table) or on the pairwise comparisons
(2*2)? In any case, you can do a Chi square test on this. For a 2*2
table you will find sample size calculators on the internet. For a 2*5
you could consider using freeware like G*Power by Faul & Erdfelder.
I guess it would be reasonable to perform a global test before
pairwise comparisons.

With kind regards

Karabiner
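
As a rough illustration of a global chi-squared test on a table of this shape, here is a minimal Python sketch using only the standard library. The counts are made up (five hypothetical reasons for presentation, each split into screened / not screened), and the helper names `chi2_sf` and `chi2_independence` are my own. Note that unequal group sizes are not a problem for the test: the expected counts are computed from the row and column totals.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-squared distribution.

    Uses the series expansion of the regularized lower incomplete gamma
    function; adequate for the moderate statistics seen in small tables.
    """
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: one row per reason, columns = [screened, not screened].
table = [
    [30, 70],
    [45, 55],
    [20, 40],
    [60, 90],
    [25, 25],
]
stat, df, p = chi2_independence(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The usual rule of thumb is that the approximation gets unreliable when expected counts fall below about 5, so very small categories may need to be merged or handled with an exact test.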

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I have attempted to reread your above posts - not an easy task. So you want to compare the presenting complaints in patients during (I guess) hours where general clinics are open versus hours they are closed. I am guessing you are ignoring urgent care clinics in this framing. Thus, your results will tell you if patients presenting after hours to the ED are different from those during normal business operating hours.

I get you likely just want some quick and dirty number, but if you are not strategic with the set-up of the project, you will likely get erroneous results, say if you don't consider the left-without-being-seen rates across periods, seasonality (respiratory season), impact of COVID-19, etc. Focus on 'significance' is usually flawed. You should pick your top presenting conditions and compare the rate differences across the period for each one. If you just run a chi-sq, what do you really get out of it? Say you have a large tertiary center: chi-sq may say a 3% change is significant. However, is that really clinically significant or of practical importance? Probably not.

Also, do you want to know 'sample size' in order to power a chi-sq test? It seems you are vague about this. Just procure two representative samples and compare rates.

#### Peita77

##### New Member
It’s all standardised; those variables remain constant. What I want is to determine if there is one group of patients more likely to be screened for domestic violence than another, due to stigma of illness.
I.e. are people with a diagnosis of substance use disorder more likely to be screened than those with depression?
There are about five diagnoses.

Then, are people who present after hours more likely to be screened than those who present in normal hours?

It points to reasons people may not be screened in relation to time of presentation, primary diagnosis,
or reason for presentation - there are three different groupings.

#### Peita77

##### New Member
No, I am testing rates of screening for domestic violence in mental health presentations to the emergency department. I am testing for things like: are people who present after hours more likely to be screened than those presenting in normal hours? Amongst reasons for presentation (there are about 5), are those presenting with behavioural disturbance more likely to be screened than those presenting with suicidal ideation? And does mental health diagnosis impact the chance of being screened, so, say, are those with depression more likely to be screened than those with substance use?

Basically it’s looking at operational reasons that may impact screening, and/or stigma or ease of assessment of one group, in that there’s more stigma around people with substance use, so they may be less likely to be screened than those with depression. Obviously stigma is one theory as to why, as well as others.

#### Peita77

##### New Member
If I found a global difference then I would do pairs starting with smallest and largest frequency

#### Karabiner

##### TS Contributor
> This needs to be simple - it’s just an audit

Agreed. So why do you want to perform a statistical test of significance at all?

Anyway, if you found a global difference, then start your pairwise comparisons with the largest difference,
as you suggested.

With kind regards

Karabiner
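
For the pairwise step, one possible sketch in Python is below. It uses a pooled two-proportion z-test, which for a 2*2 table gives the same p-value as a chi-squared test without continuity correction, and it applies a Bonferroni-adjusted threshold for the number of pairs. The group names and counts are invented for illustration, and the function names are my own.

```python
import math
from itertools import combinations
from statistics import NormalDist


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value for H0: equal proportions (pooled z-test).

    Assumes the pooled rate is strictly between 0 and 1.
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def pairwise_comparisons(groups, alpha=0.05):
    """Compare every pair of groups; flag pairs below the Bonferroni threshold.

    `groups` maps a group name to (number screened, number of presentations).
    Results are sorted so the smallest p-value comes first.
    """
    pairs = list(combinations(groups, 2))
    threshold = alpha / len(pairs)
    results = []
    for a, b in pairs:
        p = two_proportion_p(*groups[a], *groups[b])
        results.append((a, b, p, p < threshold))
    return sorted(results, key=lambda r: r[2])


# Hypothetical counts: (screened, total presentations) per diagnosis group.
groups = {
    "depression": (60, 100),
    "substance use": (35, 90),
    "psychosis": (40, 80),
}
for a, b, p, flagged in pairwise_comparisons(groups):
    print(f"{a} vs {b}: p = {p:.4f}, significant after Bonferroni: {flagged}")
```

Bonferroni is conservative; it is used here only because it is the simplest adjustment to explain in an audit report.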

#### Peita77

##### New Member
As I said, to see if there is any difference between groups, to inform why people are or aren’t getting screened in relation to those specific categories. I.e. are people who come after hours more likely to get screened than normal hours? Yes - great - we know an area to investigate next.

Are people with mood disorders more likely to be screened than substance users? Great - what is it about this group, is it stigma? Another area to look at. If no groups are different, then it’s not a contributing factor.

basically to work out potential contributing factors as to likelihood of someone being screened

#### Peita77

##### New Member
Because I used to do stats in engineering twenty years ago, so I was not going to miss the opportunity to extract data that could inform practice while I was trudging through files - it’s to direct further research and potentially inform intervention. The question is: is chi-squared okay? I think it is.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Rate differences all the way, if you are just doing bivariate comparisons. What will you do with chi-square results? So you have a p value or standardized residual, what does that mean clinically, nothing?

#### Peita77

##### New Member
Yes, it does mean something - it’s to see if one group is getting screened more than another. We see that a lot in mental health. Then in the audit cycle you can use that for your intervention: educate and put in processes to ensure people with that diagnosis or presentation aren’t being missed.

#### Peita77

##### New Member
What do you mean by rate differences? I just want to know whether there is a difference between groups, that’s all. There is a lot that can be done with that, as well as informing future research.

#### Peita77

##### New Member
By the way, it’s not screening for a disease - it’s different - it’s screening for domestic violence. I want to know the chance of screening, not anything about outcome. Outcome is not the point; this is quality assurance, which is about improving the process, which is compared to a gold standard of 100% screening rate, so all are supposed to be screened at the same rate.

#### Peita77

##### New Member
All I want to know is: is there a difference between groups in the rate of mandatory screening?

#### Peita77

##### New Member
No, not rate differences. I just looked.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
> All I want to know is is there a difference between groups in the rate of mandatory screening

You say it yourself, rate difference. You calculate the rate in one group and then the other group, subtract the two rates, and add confidence intervals. If the confidence interval excludes '0', the null value, then screening rates are different. Boom.

I would like to hear what you plan to do with the output of a chi-sq test; all you get is a p-value. A p-value does not tell you the magnitude of the difference or direction - it is clinically useless and archaic compared to a rate difference, which tells you the actual difference in rates between groups along with whether, given the set alpha level, you can reject the null of no difference. And if desired you can get rate differences from multiple regression models where you adjust for other variables. Take it from someone who has worked in a hospital/academic institution for over 20 years doing healthcare research - rate difference all the way!
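
The calculation described above can be sketched in a few lines of Python. This uses the simple Wald (normal-approximation) interval, which is my assumption about the kind of interval meant; it can behave poorly with small counts or rates near 0% or 100%. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def rate_difference(x1, n1, x2, n2, level=0.95):
    """Difference in screening rates (group 1 minus group 2) with a Wald CI.

    x1, x2 are the numbers screened; n1, n2 are the numbers of presentations.
    Returns (difference, lower bound, upper bound).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: 120 of 200 screened after hours, 150 of 300 in normal hours.
diff, lower, upper = rate_difference(120, 200, 150, 300)
print(f"rate difference = {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
if lower > 0 or upper < 0:
    print("Interval excludes 0: the screening rates differ at this alpha level.")
else:
    print("Interval includes 0: no clear difference at this alpha level.")
```

Unlike a bare p-value, the output shows both the direction and the size of the gap, which is what an audit report usually needs.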

#### Peita77

##### New Member