Large logit coefficient - what it means

#1
I am constructing a logit model with 6 explanatory variables, trying to predict outcomes of hockey matches using odds (the variable I am referring to below) and other variables.
I'm just looking for some information on why a logit model may produce a very large coefficient (13.3186), how one should deal with it, and what it means. It was also statistically significant: std. error 5.48536, Wald 5.892. All other variables were insignificant.
I realise I have given a minimal amount of info here, but if anyone could point me in the right direction, or help, it would be great. I have exhausted other sources trying to answer this. Thanks.
 

noetsi

Fortran must die
#2
One reason it can be large is that the ML estimates don't exist, that is, the iteration failed. However, you should get a warning if this occurred.
 
#3
Thanks for the reply. I didn't get a warning to that effect; the iterations completed. I'm wondering whether the variable in question was too strongly correlated with the dependent relative to the others, which were all insignificant.
 

noetsi

Fortran must die
#4
One reason you get large coefficients, I now know, is partial or complete data separation. But as noted, you will get an error message in most software. Another reason might be the unit of the IV. For example, if you measure change in income as the DV and your unit for the IV is decades, you will get a huge coefficient. Obviously that is a silly example, but you might look at what the unit of your IV is.

If the IV in question was the only major driver, it may be that it alone leads to a significant change.
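To illustrate the unit point, here is a minimal sketch in Python/statsmodels (not the poster's SPSS setup, and the data are made up): rescaling the predictor rescales the coefficient in the opposite direction (multiplying x by 100 divides the slope by 100) while leaving the fit itself unchanged.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: binary outcome driven by one predictor on a narrow 0-1 scale.
rng = np.random.default_rng(0)
x = rng.uniform(0.4, 0.7, size=200)
p = 1 / (1 + np.exp(-(-5.0 + 9.0 * x)))      # assumed true model
y = rng.binomial(1, p)

fit_unit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)        # slope per 1.0 of probability
fit_pct = sm.Logit(y, sm.add_constant(x * 100)).fit(disp=0)   # slope per percentage point

print(fit_unit.params[1], fit_pct.params[1])   # second slope is 1/100 of the first
print(fit_unit.llf, fit_pct.llf)               # identical log-likelihoods
```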
 

hlsmith

Less is more. Stay pure. Stay poor.
#5
Key question: when you look back at the data, does the coefficient make sense and is it just surprising?

Does it make sense in that the value could actually be achieved? Is the predictor in question continuous? If not, can you create an unadjusted 2 x 2 table and see whether the outcome always or never occurs at certain levels of it? If it is continuous, calculate the mean or median of the variable for the two groups (stratified by the dependent variable). Do these data seem appropriate given the generated coefficient?

Lastly, what is your odds ratio looking like? Is the confidence interval about right, or does it reach toward infinity?

Also, what is the model fit like, how many observations do you have in each group of the dependent variable, and how many IVs are you using?
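A rough sketch of those checks in Python/pandas (the thread is in SPSS; the column names `outcome` and `implied_prob` here are hypothetical):

```python
import pandas as pd

# Hypothetical frame: binary outcome and the bookmaker-implied probability.
df = pd.DataFrame({
    "outcome":      [0, 1, 0, 1, 1, 0, 1, 0],
    "implied_prob": [0.45, 0.60, 0.48, 0.62, 0.58, 0.50, 0.64, 0.43],
})

# Continuous predictor: compare its distribution across the two outcome groups.
print(df.groupby("outcome")["implied_prob"].agg(["mean", "median", "min", "max"]))

# Categorical version: an unadjusted cross-tab to spot cells where the outcome
# always or never occurs (a sign of separation).
df["prob_band"] = pd.cut(df["implied_prob"], bins=[0.0, 0.55, 1.0], labels=["low", "high"])
print(pd.crosstab(df["prob_band"], df["outcome"]))
```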
 
#7
Would SPSS give an error message if separation occurred? The unit of this independent variable is the probability implied by the bookmaker's odds, i.e. decimal odds of 5 imply a probability of 0.2. It is in this form (0.2) that the independent variable has been used. But I see what you mean with that example.
 

hlsmith

Less is more. Stay pure. Stay poor.
#8
So do you think the beta coefficient is incorrect? If this is logistic regression, does the Hosmer-Lemeshow test show appropriate fit? Have you stratified the probabilities by dependent group, and is there a big difference that explains your results?
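For anyone without SPSS handy, a sketch of the Hosmer-Lemeshow statistic in Python (not SPSS's implementation, just the textbook decile version): sort by predicted probability, split into g groups, and compare observed with expected event counts via a chi-square on g - 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p_hat, g=10):
    """Textbook Hosmer-Lemeshow goodness-of-fit test for a fitted logistic model."""
    y, p_hat = np.asarray(y, dtype=float), np.asarray(p_hat, dtype=float)
    order = np.argsort(p_hat)
    y, p_hat = y[order], p_hat[order]
    chi2 = 0.0
    for idx in np.array_split(np.arange(len(y)), g):    # ~equal-size risk deciles
        obs1, exp1 = y[idx].sum(), p_hat[idx].sum()      # observed / expected events
        obs0, exp0 = len(idx) - obs1, len(idx) - exp1    # observed / expected non-events
        chi2 += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return chi2, stats.chi2.sf(chi2, df=g - 2)           # statistic, p-value
```

A large p-value (like the 0.774 reported in the reply below) means no evidence of lack of fit; it does not by itself validate the size of any single coefficient.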
 
#9
Sorry I could only give a quick reply earlier, I had to go somewhere just after I read the replies!
The coefficient has the correct sign. In the context of the model I would expect a greater probability of a 'one' in the dependent as this variable increases. But, yeah, the magnitude is baffling.
Well, exp(13.3186) = 605863.43, with confidence interval: lower 12.982, upper 28276381559.
Hosmer-Lemeshow shows sig. = 0.774, indicating a good fit. Haven't stratified it yet; I'll have a go soon.
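The odds ratio and Wald interval above can be reproduced (to rounding; SPSS carries more decimals than the printed 13.3186) straight from the coefficient and its standard error, which is a useful sanity check on huge odds ratios:

```python
import numpy as np

beta, se = 13.3186, 5.48536    # coefficient and std. error from the posts above
z = 1.96                       # 95% Wald interval

print(np.exp(beta))            # odds ratio, roughly 6.1e5
print(np.exp(beta - z * se))   # lower limit, roughly 13
print(np.exp(beta + z * se))   # upper limit, roughly 2.8e10
```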
 

Dason

Ambassador to the humans
#10
Can you tell us more about the predictor that gives you this beta? What is the possible range of values it takes on?
 

noetsi

Fortran must die
#11
Would SPSS give an error message if separation occurred?
Yes, it gives the following:

Warnings
|-----------------------------------------------------------------------------------------|
|The parameter covariance matrix cannot be computed. Remaining statistics will be omitted.|

This may be useful on this issue (I ran into major separation and quasi-separation issues recently, so I have been working through it). From what you have said, you don't have this problem, in that SPSS, unlike SAS, will not even generate estimates when it occurs.

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/complete_separation_logit_models.htm
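For reference, a toy illustration of complete separation (in Python/statsmodels rather than SPSS or SAS, with made-up data): once the predictor splits the outcome perfectly, the MLE does not exist, and depending on the version the software either refuses to return results or reports an absurdly large, unstable coefficient.

```python
import numpy as np
import statsmodels.api as sm

# Complete separation: y is 1 exactly when x > 0.5, with no overlap between groups.
x = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
y = (x > 0.5).astype(int)

try:
    fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    print(fit.params)       # if it returns at all, the slope is huge and the SE explodes
except Exception as err:    # some statsmodels versions raise PerfectSeparationError here
    print(err)
```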
 
#12
No problem. The probabilities range from a minimum of 0.418 to a maximum of 0.657, with a mean of 0.55, across 100 observations. Stratified by the dependent (0 and 1), the average is 0.537 when the dependent is 0 and 0.562 when it is 1.
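Given that range, one back-of-the-envelope reading of the coefficient (my calculation, using only the numbers in this thread): 13.3186 is the log-odds change per whole unit of implied probability, but the predictor only spans about 0.24 units, so the effect across its entire observed range is far more modest than the headline number suggests.

```python
import numpy as np

beta = 13.3186               # reported coefficient, per 1.0 of implied probability
obs_range = 0.657 - 0.418    # observed spread of the predictor (from this post)

shift = beta * obs_range
print(shift)                 # about 3.2 on the log-odds scale, lowest to highest
print(np.exp(shift))         # odds ratio of roughly 24 across the observed range
```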