# Exclude intercept in logistic regression?

#### jockem

##### New Member
I would like some help on the subject whether to include the intercept term or not in a logistic regression.
I have read that some people say that you should never exclude the intercept since this will cause bias in your estimates.
Even so I just can’t figure out whether to include it or not, any input in this matter will be gladly appreciated.

I am modelling switching of some sort, so my Y is coded 1 (switching occurs) or 0 (no switch)

My IVs, X, are: value, gender, age, return.

- Value is in $ terms
- Gender is binary
- Age is in years
- Return is expressed as a ratio, so return = 0.85 means a -15% return and 1.15 a +15% return

Estimates from the model with intercept:

- Intercept: -70.2337
- Value: 0.0000
- Gender: 0.1140
- Age: 0.0073
- Return: 67.2139

Estimates from the model without intercept:

- Value: 0.0000
- Gender: 0.1126
- Age: 0.0061
- Return: -2.9409

Note the different estimates for the return variable. If I understand correctly, the intercept estimate displays the log odds of switching when all the other IVs are zero, in this case ($0, female, 0 years old, -100% return). Right?

So for the model with intercept, P(switch) = exp(-70.2)/(1 + exp(-70.2)) ≈ 0.
Without the intercept, P(switch) = exp(0)/(1 + exp(0)) = 0.5.
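Those two probabilities are easy to check numerically; a quick sketch in Python, plugging in the intercept estimates quoted above:

```python
import math

def inv_logit(eta):
    """Convert log odds to a probability."""
    return math.exp(eta) / (1 + math.exp(eta))

# Model with intercept: the log odds at X = (0, 0, 0, 0) is the intercept itself
p_with = inv_logit(-70.2337)

# Model without intercept: the log odds at X = (0, 0, 0, 0) is forced to 0
p_without = inv_logit(0.0)

print(p_with)     # essentially 0 (on the order of 1e-31)
print(p_without)  # 0.5
```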

In any case, a person with attributes like that is unrealistic. In my data, value > 0, age > 0 and return > 0 for the entire sample.

My intuition says that a person with a return of -100% is more likely to switch than a probability of 0 (the P(switch) with intercept) would suggest, so the probability of 0.5 (the P(switch) without intercept) seems more accurate to me, even though theory states that you should never exclude the intercept term.

Any suggestions?

And I can't figure out why the return estimate is large and positive in the first regression but small and negative in the second. How can the intercept affect the slope coefficient for this variable?

#### Dason

##### Ambassador to the humans
This is just an illustration using simple linear regression, but the same type of idea carries over to logistic regression. The intercept allows us to capture what's actually going on in the data. Without it, the sign of our slope might change, and the slope no longer means what you think it should.
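A minimal version of that illustration in Python, with made-up data chosen so the effect is obvious: points with a clearly negative trend, fit once with an intercept and once through the origin.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 10.0 - x  # a perfectly negative trend with a large positive intercept

# Fit y = b0 + b1*x by least squares
X_with = np.column_stack([np.ones_like(x), x])
b0, b1_with = np.linalg.lstsq(X_with, y, rcond=None)[0]

# Force the line through the origin: y = b1*x
b1_without = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(b1_with)     # -1.0: the true slope
print(b1_without)  # about +1.73: the sign has flipped
```

The through-origin fit has to pass through (0, 0), so to chase the cloud of points sitting up around y ≈ 5-9 it tilts upward, even though the trend in the data is downward.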

#### noetsi

##### Fortran must die
An intercept is also critical for analyzing dummy variables, since the intercept reflects the value of the reference level.

#### Jake

I don't know, Dason. In general I agree that it usually works out better to model the intercept. But I'm not completely satisfied with the argument implied by your example. What the example basically illustrates is something like: if the best-fitting line for a model that estimates the intercept looks very different from the best-fitting line for a model that just forces the intercept to be 0, then the latter model will do a much worse job describing the data than the former.

But that seems tautological; obviously if the data really is best described by a line with an intercept far different from 0, then a line with an intercept of 0 won't fit the data well. However, for other datasets where the model predictions do not differ much between these two types of models, it's not at all clear that we should still be obligated to model the intercept.

What it all really comes down to for me is the issue of bias vs. variance. We would want to model the intercept for datasets like the one you gave for exactly the same reason that we would want to include a quadratic term when the data looks roughly parabolic. But we don't conclude from the latter situation that we ought to always include polynomial terms in our models. Rather, we make these decisions on more of a case by case basis, relying partly on what the dataset at hand looks like and partly on what theory/experience says is a likely functional form of the data.

I don't see why we should approach the intercept any differently. We may in some cases have sensible reasons for believing that the dependent variable ought to be 0 when the predictors are 0.

#### Dason

##### Ambassador to the humans
I'm not sure. I haven't put as much thought into this, but the intercept really is a different beast to me. It helps out in ways we don't even realize. Plus, even if we fully expect the response to be 0 when the predictors are 0, I don't think that is a good reason not to fit the intercept unless we know that we are fitting exactly the correct model. I talk about this somewhat here: http://www.statisticspedia.com/articles/basic-statistics/regression-without-an-intercept-yay-or-nay/

I'm not saying that you should just not think about it and put the intercept in - just that unless you have VERY good reasons to exclude it you should keep it in there.

#### noetsi

##### Fortran must die
> However, for other datasets where the model predictions do not differ much between these two types of models, it's not at all clear that we should still be obligated to model the intercept.

It would seem that in this case the intercept would not be substantively meaningful anyway (not statistically different from 0), in which case you would not have an intercept. So what would be the point of getting rid of an intercept that effectively did not exist?

#### Jake

I do recall your previous example regarding this issue, and it was actually pondering that example that got me thinking about this issue a little more deeply. I didn't settle on a well-formed view on the matter until after that discussion died out, so in truth I've sort of been waiting for another good opportunity to come along for me to talk about it.

In terms of practical recommendations, I think we are pretty much on the same page: it is almost always a better idea to model the intercept than not to. I suppose my point is a bit more of an abstract methodological one. In my view, each and every term should be included in the model only and entirely to the extent that we believe including that term results in a better model of the data. As it happens, this logic usually compels us to model the intercept. But in a deep sense, I think there is nothing special about the intercept term. This is why I am uncomfortable with your argument. It apparently designates the intercept as special, but in my opinion, there is nothing sacred about any parameter. Crucially, there may exist scenarios, albeit few, where forcing the intercept to be 0 (or some other constant) is sensible.

I guess my view can be summed up by paraphrasing George Orwell here. In theory, all parameters are equal... but in practice, some parameters are more equal than others.

To the OP: it seems to me that the situation you are describing is basically analogous to the example that Dason illustrated. Although you may have prior theoretical reasons for believing that the parameter estimates from the model with 0 intercept make more sense, the data themselves just seem to be screaming out loud that this is not a good model overall. I'm sure if you did a likelihood ratio test you'd find that the model with an intercept provides a much closer fit. Part of the problem appears to be that the base rate of switching is less than 50% (that is, there are more 0s than 1s). So if you force the probability of switching to be 50% when return is 0, then the only direction for the predicted values to go as return increases is downward if the function is going to best fit the data, given the constraint you've imposed.
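A small simulation shows the same pattern (the numbers here are made up, and the model is fit by a hand-rolled Newton-Raphson rather than any particular package): when the base rate is well below 50% and the intercept is forced to 0, the return coefficient flips sign.

```python
import numpy as np

def fit_logistic(X, y, iters=30):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score
        hess = X.T @ (X * (p * (1 - p))[:, None])     # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(42)
n = 5000
ret = rng.uniform(0.8, 1.2, n)              # returns around 1.0, like the OP's
true_logit = -20.0 + 18.0 * ret             # base rate well below 50%
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

b_with = fit_logistic(np.column_stack([np.ones(n), ret]), y)
b_without = fit_logistic(ret[:, None], y)

print(b_with)     # roughly (-20, +18): large positive return coefficient
print(b_without)  # a single negative coefficient, near the logit of the base rate
```

With the intercept removed, the model must hit a probability of 0.5 at return = 0, so the only way it can get the predictions down toward the low base rate is to make the return coefficient negative.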

#### jockem

##### New Member
Thank you for your input, Dason and Jake. I think I've got the hang of it now...

> Part of the problem appears to be that the base rate of switching is less than 50% (that is, there are more 0s than 1s). So if you force the probability of switching to be 50% when return is 0, then the only direction for the predicted values to go as return increases is downward

Wise point. Yes, you're right, there are far more 0s than 1s, so I understand your point. And yes, the likelihood ratio test shows that the model with the intercept provides a far better fit.

The reason this is a bit confusing for me is that whether I let the intercept vary or force it to be zero, the point X = (0, 0, 0, 0) does not make any sense in my data. So I might try to recode the Xs into groups where, for instance, age* = 0 when 0 < age < 25 and return* = 0 when the return is within certain levels, so that X* = (0, 0, 0, 0) makes sense. However, I guess the result will be the same except for the magnitude of the intercept (probably lower).
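Recoding does work roughly the way you describe. A tiny sketch with made-up data (ordinary least squares for simplicity; the same idea applies on the log-odds scale in a logistic model) shows that shifting a predictor changes only the intercept, never the slope:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(20.0, 80.0, 200)                    # e.g. age, never near 0
y = 3.0 + 0.5 * x + rng.normal(0.0, 1.0, 200)

def fit(x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_raw = fit(x, y)
b_centered = fit(x - x.mean(), y)                   # now x = 0 means "average age"

print(b_raw[1], b_centered[1])  # the two slopes are identical
print(b_centered[0])            # the intercept is now simply the mean of y
```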

Anyways thanks once again.