I have a large database with more than a million records. I would like to study the impact of several predictor variables (age, race, etc.) on one binary outcome variable (yes or no). My issue is that the distribution of this outcome variable is very unbalanced: 97% yes and 3% no. Is this a problem?
Thank you for your help!