Low-frequency independent variable in logistic regression

#1
I'm running a logistic regression and I think there's a variable that I'm not treating properly.

I'm modeling whether there was a traffic accident (1) or no accident (0) on the highway on a given day. I have a binary rain variable: precipitation (1) or no precipitation (0).

I have only 30 days with a value of 1 for that variable, but on 28 of those 30 days there was an accident. My coefficient for that variable is very low, though. I think this is because 90% of the time I have a 0 for weather, and there are still many accidents on those days. The regression is showing a weak association because I have 9 times as many no-weather days, on which there may or may not be an accident.

I'm guessing that I have to treat this variable differently since it occurs so rarely.
Should I fit a model of accident (1/0) on the rain variable alone, using only the days when it rained, and then manually plug that coefficient into my regression with multiple variables?
 

hlsmith

Omega Contributor
#2
How many observations do you have?


You might look into using the Firth correction or exact logistic regression.
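
For anyone who wants to see roughly what the Firth correction does, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not a replacement for a tested package. The function name `firth_logit` is made up for this example, and the day counts are hypothetical, loosely modeled on the numbers in post #1 (30 rain days with 28 accidents, plus an assumed 270 no-rain days). The idea is to add a penalty term to the score equations so the estimates stay finite even when the data are sparse or separated.

```python
import numpy as np

def firth_logit(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression via Fisher scoring (sketch).

    X must already contain an intercept column. Returns the coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                        # logistic weights
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score: the usual score plus the penalty term
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data: 30 rain days (28 with an accident) and an
# ASSUMED 270 no-rain days (112 with an accident).
rain = np.r_[np.ones(30), np.zeros(270)]
accident = np.r_[np.ones(28), np.zeros(2), np.ones(112), np.zeros(158)]
X = np.column_stack([np.ones_like(rain), rain])

beta = firth_logit(X, accident)
print("intercept:", beta[0], "rain coefficient:", beta[1])
```

With a single binary predictor, this penalty works out to the same thing as adding 0.5 to each cell of the 2x2 table before taking log odds, which is why the rain coefficient is pulled slightly toward zero but never blows up.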


P.S., sweet account name!
 

hlsmith

Omega Contributor
#4
What program are you using? Some will spit out a warning if your data are too sparse for the model to converge, something like "quasi-complete separation."
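
If the software gives no warning, a quick manual check is to cross-tabulate the predictor against the outcome and look for empty cells. The sketch below does that for one binary predictor. The helper name `separation_check` and the no-rain day counts are assumptions made for illustration. It only covers the one-predictor case, since in a multivariable model separation can also come from combinations of variables.

```python
from collections import Counter

def separation_check(x, y):
    """Classify separation for a binary predictor x and binary outcome y.

    Returns "none", "quasi-complete", "complete", or "degenerate"
    (the last when x or y takes only one value).
    """
    cells = Counter(zip(x, y))
    empty = [(xv, yv) for xv in (0, 1) for yv in (0, 1)
             if cells[(xv, yv)] == 0]
    if not empty:
        return "none"
    if len(empty) == 1:
        return "quasi-complete"
    if len(empty) == 2:
        (x1, y1), (x2, y2) = empty
        # Two empty cells on a diagonal mean x predicts y perfectly.
        if x1 != x2 and y1 != y2:
            return "complete"
    return "degenerate"

# Counts from post #1 for rain days; the no-rain counts are ASSUMED.
rain = [1] * 30 + [0] * 270
accident = [1] * 28 + [0] * 2 + [1] * 120 + [0] * 150
status = separation_check(rain, accident)
print(status)

# Hypothetical variant: every one of the 30 rain days has an accident.
accident_all = [1] * 30 + [1] * 120 + [0] * 150
status_all = separation_check(rain, accident_all)
print(status_all)
```

With 2 rain days that had no accident, no cell is empty, so a separation warning would not be expected for this variable on its own.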
 
#5
Hi, I am using Stata, but will likely transition to SAS EG. I haven't noticed any warning.

Having 112 accidents on days with a 0 for weather seems to be negating the fact that on 28 of the 30 days when weather is 1, accident is also 1. Maybe it would help if this weren't a binary variable and were instead the amount of rain/snow?
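
One way to sanity-check the coefficient is to work out the odds ratio by hand from the 2x2 table. In a logistic regression with only the rain indicator (plus an intercept), the maximum-likelihood coefficient equals the log of this sample odds ratio. The sketch below uses the counts quoted in the thread, with one assumption: the number of no-rain days is taken to be 270, i.e. 9 times the 30 rain days.

```python
import math

# Counts quoted in the thread
rain_accident = 28
rain_no_accident = 2
dry_accident = 112

# ASSUMPTION: 270 no-rain days (9 x the 30 rain days)
dry_days = 270
dry_no_accident = dry_days - dry_accident

odds_rain = rain_accident / rain_no_accident   # odds of an accident on rain days
odds_dry = dry_accident / dry_no_accident      # odds of an accident on dry days

odds_ratio = odds_rain / odds_dry              # 19.75 under the assumed counts
log_odds_ratio = math.log(odds_ratio)          # univariate logit coefficient

# Large-sample standard error of the log odds ratio (Woolf's formula)
se = math.sqrt(1 / rain_accident + 1 / rain_no_accident
               + 1 / dry_accident + 1 / dry_no_accident)

print(f"odds ratio = {odds_ratio:.2f}")
print(f"log odds ratio = {log_odds_ratio:.2f}, SE = {se:.2f}")
```

Under these assumed counts the rain-only coefficient would be about 2.98 on the log-odds scale, so the no-rain days set the baseline odds rather than diluting the effect. If the coefficient in the full model is much smaller, it may be worth checking how the other predictors relate to rain. The standard error is dominated by the 1/2 term from the two rain days without an accident, which is where the low frequency really shows up.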