Uneven groups in outcome

I am trying to determine the relationship between basic demographics (age, race/ethnicity, gender, etc.) and hospital-acquired infections in 2017.

My outcome (Infections - Yes/No) came out to Yes = 20, and No = 417.

Are there any approaches to this dataset, or am I screwed? Thanks!


Omega Contributor
Depends on your agenda. I am guessing you weren't expecting so few infections. Good for you then. What are you trying to do overall? Before I give advice, can you collect more data across a larger time frame to increase sample size? The imbalance isn't the issue as much as 20 being a small number. You could have the same imbalance in a sample 5x as large, and then you would have more modeling options. We will wait to hear back on what your goals are with this dataset.
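
To put a rough number on why the count of 20 matters more than the imbalance, here is a minimal Python sketch. It assumes the often-quoted "about 10 events per variable" rule of thumb for logistic regression; that threshold is a convention that gets debated, not a hard limit, and the function name `max_predictors` is just made up for illustration.

```python
def max_predictors(n_events, n_nonevents, events_per_variable=10):
    """Rough ceiling on model parameters under an events-per-variable rule.

    The default of 10 is an assumed rule of thumb, not a strict requirement.
    The rarer outcome class is what limits the model.
    """
    limiting_count = min(n_events, n_nonevents)
    return limiting_count // events_per_variable


# Counts from the question: Yes = 20, No = 417
infections, no_infections = 20, 417
total = infections + no_infections

print(f"Infection rate: {infections / total:.1%}")          # 4.6%
print(max_predictors(infections, no_infections))            # 2
# Same imbalance, sample 5x as large
print(max_predictors(5 * infections, 5 * no_infections))    # 10
```

Keep in mind that a categorical variable with k levels uses k - 1 parameters, so race/ethnicity alone could use up that budget of 2 at the current sample size.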

Thanks for the response! Mainly, I'm trying to see if there are any racial/ethnic disparities in healthcare-associated infections. The other demographics are a bonus in terms of disparity/equity.

I can collect more data from 2018 if it helps. What I am doing is pulling data on Hematology/Oncology patients who received an intervention with their central venous catheter during an admission. Afterwards, I look to see if an infection happened during that admission.