Constant coefficient in logit model


I would like to check my reasoning. Am I right in saying that the constant (intercept) coefficient in a logit model MUST BE close to zero?

Because that would mean there is a 50% probability that Y = 1 (where 1 means, for example, the failure of a company). When I don't have any other variables, only the constant, it is like predicting a company's failure "with closed eyes".

Is this a confirmation that my model is correct? Is my argument right? It sounds logical to me.

Thank you very much!


probability in logit: p = e^(y*) / (1 + e^(y*))

where y* = 0
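As a quick numeric check of this formula, here is a minimal Python sketch (the function name `inverse_logit` is my own choice):

```python
import math

def inverse_logit(y_star):
    """Convert a log-odds value y* into a probability p."""
    return math.exp(y_star) / (1 + math.exp(y_star))

# With only an intercept of exactly zero, the model predicts 50/50:
print(inverse_logit(0.0))  # 0.5
```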


Omega Contributor
I guess you may need to rephrase your questions.

You are correct that a constant term of zero in an empty model means it could go either way. But the probability derived from the constant in an empty model actually represents the proportion of 1s in the binary Y variable. So it implies no discrimination only if there are 50% 1s and 50% 0s. Does that make sense to you? That is what I was alluding to. The value does not tell you anything about the fit or appropriateness of your proposed model. Side note: an empty model in multilevel modeling can tell you about the appropriateness of controlling for clusters, but I don't think that is what you are writing about.
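To illustrate the point about the empty model, here is a hypothetical Python sketch (the 30-out-of-100 data are made up): the maximum-likelihood intercept of an intercept-only logit model is just the log odds of the sample proportion of 1s, so converting it back recovers that proportion, not 50%.

```python
import math

# Hypothetical binary outcome: 30 failures (1) out of 100 companies
y = [1] * 30 + [0] * 70

p_hat = sum(y) / len(y)                    # proportion of 1s = 0.30
intercept = math.log(p_hat / (1 - p_hat))  # MLE intercept of the empty model

# Converting the intercept back to a probability recovers 0.30, not 0.50:
p_back = math.exp(intercept) / (1 + math.exp(intercept))
print(round(intercept, 4))  # -0.8473
print(round(p_back, 2))     # 0.3
```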

I'm not sure what your y* = 0 means, other than that you may be trying to predict the "0" group.

If you have more questions, please post them. :)
It's not an empty model. It's a model with 5 variables (I posted a picture here).

What does my y* mean? It's the left side of the equation:

y* = β0 + β1*X1 + β2*X2 + ... + βn*Xn

where y* = ln(p / (1-p))

so ln(p / (1-p)) = β0 + β1*X1 + β2*X2 + ... + βn*Xn (this is the logit model)

and then p = e^(y*) / (1 + e^(y*))

and when the constant in the model has coefficient 0.1042 (with X1 to Xn set to zero), then

ln(p / (1-p)) = 0.1042
p = e^0.1042 / (1 + e^0.1042) = 0.526

This means that when I don't have any discriminant variables (X1...Xn), the probability that an observation will be 1 (the company will fail) is 52.6%. And that would be correct, because without any discriminant variables you cannot predict a company's status.

However, in the second example (in the attachment) you cannot say that without the discriminant variables (X1...Xn) there is a roughly 50% probability that an observation will be 1, because the constant coefficient is -2.8561,

and then p = e^(-2.8561) / (1 + e^(-2.8561)) = 0.054

So without the discriminant variables there is a 5.4% probability that an observation will be 1, and this is not correct, is it? Because my horse sense "tells me" that when I don't have any discriminant variables there must always be a 50% probability that an observation will be 1 (or a 50% probability that it will be 0).
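These two intercept-to-probability conversions can be checked numerically. A minimal Python sketch (only the intercept values 0.1042 and -2.8561 come from the thread; the function name is mine):

```python
import math

def intercept_probability(b0):
    """Predicted P(Y = 1) when all predictors X1..Xn are zero."""
    return math.exp(b0) / (1 + math.exp(b0))

print(round(intercept_probability(0.1042), 3))   # 0.526
print(round(intercept_probability(-2.8561), 3))  # 0.054
```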

So can I generally say that in a logit model the constant coefficient MUST always be close to zero? (Regardless of any other output, such as "correctly predicted cases".)


Omega Contributor
So you are just saying that you always want the intercept to equal 50% probability, meaning no discrimination. You write a lot, but I am not getting what you want us to support or confirm. The intercept could be 50%, but that still wouldn't mean the predictors have any significance. It is also pretty difficult to imagine an intercept corresponding to 50% probability in a model with terms, because many predictors are not perfect in their delineation of the y term. The intercept just represents the probability for the subgroup with all predictors at zero. So say you had binary variables for sex, young/old, white/black, etc., for predicting unplanned pregnancy; then the intercept would be the log odds of unplanned pregnancy for, say, a young white female. You also run into the possibility of low power if there are few persons in this subgroup (the curse of high dimensionality).

Yes, y* means y-hat in your post.
Maybe I have a problem expressing my ideas in English. I thought that the intercept coefficient in a logit model must be close to zero. So does it not matter what number the coefficient is? Is it all right when the intercept coefficient in a logit model is, for example, -2.8561 (as in the example above)?


Omega Contributor
The logistic regression model gives you log odds. For simplicity's sake I am just going to talk about binary predictors. The log odds or odds from the model are all relative; more specifically, they have a reference group.

So if I were trying to estimate the odds ratio of cancer in asbestos-exposed versus tobacco-exposed individuals, exponentiating the asbestos beta coefficient would give you the odds of cancer for those exposed to asbestos relative to the odds for those exposed to tobacco. This is a ratio of asbestos over tobacco. The tobacco odds of cancer are not zero. The intercept tells us what the odds are for cancer in the tobacco group. You want the tobacco odds to be "0", but you are still at risk in the reference group. Does this make sense? You have many of the fundamentals figured out but seem hung up on this part.

This next part is completely made up, but say 5 out of 100 smokers get cancer. Then the probability of cancer in the reference group is 0.05, and the intercept is the natural log of the corresponding odds: ln(0.05/0.95) = -2.94. This is the intercept in a model with just exposure in it (asbestos versus tobacco). Now, exponentiating the beta coefficient for asbestos gives how many times greater the odds are for asbestos than for tobacco. So if the coefficient is above zero, you are at greater risk in the asbestos group than in the tobacco group, but the tobacco group is still at risk if the intercept is statistically significant.
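That made-up example can be worked through in a short Python sketch (the 5-out-of-100 tobacco figure is from the post; the 20-out-of-100 asbestos figure is my own assumption for illustration):

```python
import math

# Made-up counts: tobacco is the reference group
tobacco_cases, tobacco_n = 5, 100     # from the post
asbestos_cases, asbestos_n = 20, 100  # my own assumption

p_tob = tobacco_cases / tobacco_n
p_asb = asbestos_cases / asbestos_n

odds_tob = p_tob / (1 - p_tob)
odds_asb = p_asb / (1 - p_asb)

intercept = math.log(odds_tob)                 # log odds in the reference group
beta_asbestos = math.log(odds_asb / odds_tob)  # log of the odds ratio

print(round(intercept, 2))                 # -2.94, matching the post
print(round(math.exp(beta_asbestos), 2))   # 4.75, the odds ratio
```

Even though the asbestos odds ratio is well above 1, the intercept of -2.94 shows the reference group still carries a nonzero baseline risk.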