Should I include this variable or not?

#1
Hello,

My dependent binary variable is Turnout in 2010 elections (yes/no) and I'm performing logit analysis using a cross-sectional dataset.

There's a variable, though, called Past vote in 2005 elections (i.e. Abstain, Conservative, Labour, etc.) and, as we expected, it's highly correlated with Turnout in 2005 and significant in our regression.

Is it right to put it in the model? When we include it, the constant term of the logit regression has a p-value of 0.800, and when we exclude it, 0.000.

Cheers
 

noetsi

No cake for spunky
#2
Generally the constant is not seen as that important in logistic regression, so I am not sure why you are concerned with it. What you should consider is 1) whether the predictor you are talking about is statistically significant in terms of its Wald test (and, substantively, what its effect size is) and 2) how it affects model fit. You would compare the model fit with and without it (since the models are nested, you can test the change in the -2 log-likelihood, or in the deviance or Pearson value, against the corresponding change in DF).

Or you could use the Hosmer-Lemeshow goodness-of-fit test, although I am not sure whether there is a statistical test comparing these values from model to model.
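To make the nested-model comparison concrete, here is a minimal sketch of the likelihood-ratio test in plain Python. The counts are made up for illustration and are not the OP's data. It assumes the only predictor is a three-category past-vote variable, in which case the logit's fitted probabilities are just the observed turnout proportions in each category, so no fitting routine is needed. With other predictors in the model you would take the two log-likelihoods from your software's output instead.

```python
import math

# Made-up counts (NOT from the thread's data): for each 2005 past-vote
# category, (number who turned out in 2010, number who did not).
counts = {
    "Abstain": (30, 70),
    "Conservative": (85, 15),
    "Labour": (80, 20),
}

def binom_loglik(yes, no):
    """Bernoulli log-likelihood evaluated at the MLE p = yes / (yes + no)."""
    n = yes + no
    ll = 0.0
    if yes:
        ll += yes * math.log(yes / n)
    if no:
        ll += no * math.log(no / n)
    return ll

# Model without the predictor: intercept only, one common turnout probability.
total_yes = sum(y for y, _ in counts.values())
total_no = sum(n for _, n in counts.values())
ll_reduced = binom_loglik(total_yes, total_no)

# Model with the categorical predictor: one fitted probability per category.
ll_full = sum(binom_loglik(y, n) for y, n in counts.values())

# Change in -2 log-likelihood (the deviance difference) and its degrees of freedom.
lr_stat = 2.0 * (ll_full - ll_reduced)
df = len(counts) - 1  # two dummies are added for three categories

# For df = 2 the chi-square upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-lr_stat / 2.0)

print(f"LR chi-square = {lr_stat:.2f}, df = {df}, p = {p_value:.3g}")
```

A small p-value here says the model with past vote fits better than the model without it. Note that the constant's own p-value plays no part in the calculation.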
 

noetsi

No cake for spunky
#4
Whether the constant is statistically significant or not does not tell you whether you should include a variable in a model. The methods I described earlier (or theory) are what you base the decision to keep a variable on. The constant is not a consideration in this decision.

I may not be understanding what you are asking. If you are asking whether you should exclude the constant term because it is or is not significant, then normally you keep it in the model regardless.
 
#5
@noetsi -- The constant should almost always be included in your models. I don't know why it would be less important for a logit/probit, and I've certainly never heard that before. Do you have a cite?

Putting turnout 2005 in the model is a bigger question. So what is the goal of this model? Is it prediction? If so, then sure, include that variable, because it will help you predict your outcome that much better. But is your goal, on the other hand, to understand how some variable (like college completion) affects turnout? If so, then including turnout 2005 should bias your college completion variable by reducing its size.
 

noetsi

No cake for spunky
#6
If you read Paul Allison, for example, he notes that the constant is nearly never of theoretical interest in logit. Certain methods require suppression of the constant (although it's been too long since I read them to comment on this). In ordinal regression he actually comments that you should "not bother" to interpret them.

I never quite understood this, since it seems useful for analyzing dummy variables, but he is pretty committed to this view.
 

#8
So it is difficult to interpret but that doesn't mean that you should exclude it.

Imagine a regression...

y = intercept + b1*x1 + b2*x2 + b3*x3

The intercept tells us the predicted value of y given that x1 through x3 all equal zero. Depending on the situation, this may or may not mean anything substantive. If x1 through x3 can simultaneously equal zero then I might be interested in the intercept, but if they can't then I might not care.

That being said, I find it hard to believe that the intercept would truly equal zero in any model. Therefore, I would never exclude it, even if it's not something I can interpret in a substantive fashion.
 

noetsi

No cake for spunky
#9
I would never exclude it either. My original point was that you won't decide to add or not add a variable based on the intercept value - which the OP seemed to be suggesting to me.
 

noetsi

No cake for spunky
#11
I see! Then, I suppose we agree! I just want the OP to know that the intercept should (in most cases) be included.
Personally I agree. But there are some (Allison would certainly be one) who, while not arguing that it should be discarded, assign it little if any value.

Certain techniques suppress the intercept; why exactly, I don't know. Likely more confusing for those new to statistics is that centering (around a group or grand mean) changes the meaning of the intercept, at least in multilevel analysis and, I assume, in regression generally.

None of which has anything to do with the OP's comments :)
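On the centering point, a minimal sketch with made-up numbers, using ordinary least squares rather than a multilevel model (so it only illustrates the "regression generally" case): grand-mean centering the predictor leaves the slope alone but moves the intercept from "predicted y at x = 0" to "predicted y at the average x".

```python
from statistics import mean

# Made-up data for illustration only.
x = [20.0, 30.0, 40.0, 50.0, 60.0]  # e.g. age in years; x = 0 lies outside the data
y = [1.0, 2.1, 2.9, 4.2, 4.8]

def ols(xs, ys):
    """Return (intercept, slope) of a simple least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

b0_raw, b1_raw = ols(x, y)

# Grand-mean centering: subtract the mean of x from every x value.
x_bar = mean(x)
x_centered = [xi - x_bar for xi in x]
b0_cen, b1_cen = ols(x_centered, y)

print(f"raw:      intercept = {b0_raw:.3f}, slope = {b1_raw:.3f}")
print(f"centered: intercept = {b0_cen:.3f}, slope = {b1_cen:.3f}")
print(f"mean of y = {mean(y):.3f}")
```

The centered intercept equals the mean of y, which is a value you can actually talk about, whereas the raw intercept is an extrapolation to x = 0.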
 
#12
Whether it has SUBSTANTIVE meaning (it always has a literal meaning) or not depends on the model. If I want to predict a country's probability of fighting a war using a single dummy variable equal to 1 for democracy like...

pr(war) = B0 + B1*Democracy

Then B0 has a clear interpretation as the probability of war for non-democracies. If on the other hand I include a continuous measure of GDP in addition to the dummy for democracy like...

pr(war) = B0 + B1*Democracy + B2*GDP

Then the interpretation of B0, while still clear (the pr of war for non-democracies with GDP = 0), no longer has a substantive meaning, since no country has a GDP of zero. Again, this does not mean you should exclude it.

No author should say that it either does or does not have a substantive meaning.
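To make the single-dummy case concrete for a logit specifically, here is a small sketch with made-up counts (not real conflict data). The equations above are written on the probability scale. In a logit, B0 is the log-odds of war for non-democracies, and applying the inverse logit to it recovers the probability. With one dummy predictor, the maximum-likelihood estimates reproduce the two group proportions exactly, so they can be computed by hand.

```python
import math

# Made-up counts for illustration (not real conflict data):
# (observations with war, observations without war)
non_democracies = (30, 70)
democracies = (10, 90)

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-z))

p_non = non_democracies[0] / sum(non_democracies)
p_dem = democracies[0] / sum(democracies)

# With a single dummy predictor, the logit MLE matches the group proportions:
# B0 is the log-odds for Democracy = 0, and B1 is the log odds ratio.
b0 = logit(p_non)
b1 = logit(p_dem) - logit(p_non)

print(f"B0 = {b0:.3f} (log-odds), inv_logit(B0) = {inv_logit(b0):.3f}")
print(f"B1 = {b1:.3f}, inv_logit(B0 + B1) = {inv_logit(b0 + b1):.3f}")
```

Once a continuous variable such as GDP is added, inv_logit(B0) becomes the predicted probability at GDP = 0, which is the extrapolation described above.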