Uneven groups in outcome

#1
I am trying to determine the relationship between basic demographics (age, race/ethnicity, gender, etc.) and hospital-acquired infections in 2017.

My outcome (Infections - Yes/No) came out to Yes = 20, and No = 417.

Are there any approaches to this dataset, or am I screwed? Thanks!
 

hlsmith

Omega Contributor
#2
Depends on your agenda. I am guessing you weren't expecting so few infections. Good for you, then. What are you trying to do overall? Before I give advice, can you collect more data across a larger time frame to increase the sample size? The imbalance isn't the issue so much as 20 being a small number of events. You could have the same imbalance in a sample 5x as large, and then you would have more modeling options. We will wait to hear back what your goals are with this dataset.
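
To make the "small number of events" point concrete, here is a rough sketch rather than a recommended analysis. It compares the observed 20/437 with a hypothetical sample 5x as large at the same infection rate (100/2185, made-up numbers for illustration only). It prints a Wilson 95% interval for the infection rate and a crude parameter budget based on the common "about 10 events per parameter" rule of thumb for logistic regression. That rule is an assumption brought in here, not something established in this thread, and the function names are illustrative.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def rough_parameter_budget(events, non_events, events_per_param=10):
    """Crude rule of thumb (an assumption): ~10 of the rarer outcome per model parameter."""
    return min(events, non_events) // events_per_param

# Observed data from the first post: 20 infections, 417 without.
observed_ci = wilson_interval(20, 437)
observed_budget = rough_parameter_budget(20, 417)

# Hypothetical sample 5x as large with the same imbalance (illustration only).
scaled_ci = wilson_interval(100, 2185)
scaled_budget = rough_parameter_budget(100, 2085)

for label, ci, budget in [
    ("observed (20/437)", observed_ci, observed_budget),
    ("hypothetical 5x (100/2185)", scaled_ci, scaled_budget),
]:
    print(f"{label}: 95% CI {ci[0]:.3f} to {ci[1]:.3f}, "
          f"roughly {budget} model parameters")
```

The rate is identical in both cases, but the interval is much tighter and the parameter budget is larger in the bigger sample. Keep in mind that a multi-level variable such as race/ethnicity uses one parameter per non-reference category, so it can consume that budget quickly.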

Thanks.
 
#3
Thanks for the response! Mainly, I'm trying to see if there are any racial/ethnic disparities in healthcare-associated infections. The other demographics are a bonus in terms of disparity/equity.

I can collect more data from 2018 if it helps. What I am doing is pulling data from Hematology/Oncology: patients who received an intervention involving their central venous catheter during their admission. Afterwards, I check whether an infection occurred during that admission.