Forward selection of non-significant variables


In SAS Enterprise Miner, I trained a logistic regression using forward selection with the AIC criterion.

I grouped rare levels of the categorical variables. One of these variables was selected by the algorithm, but none of its category coefficients was statistically significantly different from 0.

Why would the algorithm select such a variable if none of its categories is significant? Does anyone know a scientific explanation?

The significance level is .05, and the p-values for the categories are all around .93.
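To see why an AIC criterion and a .05 test can disagree, here is a minimal Python sketch. It assumes nested models are compared directly by AIC = 2k - 2 ln L; the exact way Enterprise Miner applies the criterion during selection is not modeled here, and the function names are hypothetical. Adding a term with `df` extra parameters lowers AIC exactly when the likelihood-ratio statistic exceeds 2 * df, and the sketch converts that cut-off into the p-value it implicitly allows.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses the recurrence Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) for odd df and Q(x; 0) = 0 for even df.
    """
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = 0.0, 0
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion, AIC = 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * loglik


def aic_prefers_larger(loglik_small: float, k_small: int,
                       loglik_large: float, k_large: int) -> bool:
    """True if the larger of two nested models has the lower AIC.

    Equivalent to 2 * (lnL_large - lnL_small) > 2 * (k_large - k_small).
    """
    return aic(loglik_large, k_large) < aic(loglik_small, k_small)


# Implied significance level of the AIC entry rule "LR statistic > 2 * df".
for df in range(1, 5):
    print(f"df={df}: LR > {2 * df} <=> p < {chi2_sf(2 * df, df):.3f}")
```

For a term with only a few degrees of freedom, AIC therefore accepts a likelihood-ratio p-value well above .05 (about .157 at 1 df, about .09 at 4 df). Note that this concerns the joint contribution of the whole variable; it says nothing directly about the per-category p-values, which each test one level against the reference level.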

Thank you for your help,


Ambassador to the humans
Do you have the history of when the variables were added? It's possible for a variable to look very important but drop to non-significance after other variables are added to the model. Forward selection doesn't drop variables once they're added, so that might be what happened here. Is there a reason you're using a stepwise procedure in the first place, though?
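A minimal Python sketch of the greedy forward step may make the "never dropped" point concrete. It assumes a generic score to minimize (AIC, say); the function names and the AIC values for the three hypothetical variables `x1`, `x2`, `x3` are made up for illustration and are not Enterprise Miner output.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def forward_select(candidates: Sequence[str],
                   score: Callable[[Sequence[str]], float]) -> List[str]:
    """Greedy forward selection minimizing `score` (e.g. AIC).

    At each step, add the candidate that lowers the score the most; stop when
    no candidate improves it. Variables are only ever added, never removed.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = score(selected)
    while remaining:
        trial = {v: score(selected + [v]) for v in remaining}
        v_best = min(trial, key=trial.__getitem__)
        if trial[v_best] >= best:
            break  # no candidate improves the score
        selected.append(v_best)  # once added, a variable is never reconsidered
        remaining.remove(v_best)
        best = trial[v_best]
    return selected


# Made-up AIC values for every subset of three hypothetical variables.
TOY_AIC: Dict[FrozenSet[str], float] = {
    frozenset(): 200.0,
    frozenset({"x1"}): 180.0,
    frozenset({"x2"}): 185.0,
    frozenset({"x3"}): 186.0,
    frozenset({"x1", "x2"}): 175.0,
    frozenset({"x1", "x3"}): 176.0,
    frozenset({"x2", "x3"}): 160.0,
    frozenset({"x1", "x2", "x3"}): 162.0,
}


def toy_score(variables: Sequence[str]) -> float:
    return TOY_AIC[frozenset(variables)]


chosen = forward_select(["x1", "x2", "x3"], toy_score)
print(chosen, toy_score(chosen))
```

In this toy case `x1` enters first because it is the best single predictor, but once `x2` and `x3` are in, the model without `x1` has the lower AIC (160 vs 162). Forward selection never goes back to check that, which is how an early entrant can end up looking non-significant in the final model.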


Less is more. Stay pure. Stay poor.
Run a traditional model using the selected variables, look at the Type I and then the Type III effects for that variable, and report back what you see.
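The per-category p-values and the Type III test answer different questions: each category p-value tests one level against the reference level, while the Type III test asks whether the variable as a whole contributes. A minimal Python sketch of that difference, using Wald statistics for a variable with two dummy-coded levels, follows. The estimates, standard errors and covariance are hypothetical numbers chosen for illustration; the positive covariance reflects that both levels are contrasted with the same reference level.

```python
import math


def wald_p_single(b: float, se: float) -> float:
    """Two-sided p-value for H0: beta = 0 from the Wald statistic z = b / se."""
    return math.erfc(abs(b / se) / math.sqrt(2))


def wald_joint_2df(b1: float, b2: float, v11: float, v12: float, v22: float):
    """Joint Wald test of H0: beta1 = beta2 = 0 (2 df).

    W = b' V^{-1} b, where V = [[v11, v12], [v12, v22]] is the covariance
    matrix of the two estimates. For 2 df, P(chi2 > W) = exp(-W / 2).
    """
    det = v11 * v22 - v12 ** 2
    w = (b1 * b1 * v22 - 2 * b1 * b2 * v12 + b2 * b2 * v11) / det
    return w, math.exp(-w / 2)


# Hypothetical estimates for two levels of one categorical variable,
# both contrasted with the same reference level.
b1, b2 = 1.5, -1.5
se1 = se2 = 1.0
cov12 = 0.8

print("per-level p-values:",
      round(wald_p_single(b1, se1), 3), round(wald_p_single(b2, se2), 3))
w, p = wald_joint_2df(b1, b2, se1 ** 2, cov12, se2 ** 2)
print("joint Wald chi-square:", round(w, 2), "p =", p)
```

Here neither level differs clearly from the reference (p of about .13 each), yet the variable as a whole is highly significant because the two levels differ strongly from each other. The per-level p-values would also change if a different reference level were chosen, while the joint test would not, so the per-category table alone does not settle whether the variable matters.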

PS: Say I have a categorical variable such as insurance type. If I enter it into the model and one group is highly predictive of, say, death, why shouldn't the model keep the whole categorical variable? The only way around that is to dummy-code each level myself as a separate yes/no variable. Try that as well if you are not satisfied that the categorical variable as a whole is staying in the model.
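For illustration, here is a minimal Python sketch of dummy-coding a categorical variable into separate yes/no indicators (reference, or treatment, coding). The function name, the `is_` column prefix and the insurance data are hypothetical; in practice you would build the equivalent indicator variables in SAS, so treat this only as a picture of what the columns look like.

```python
from typing import Dict, List, Sequence


def dummy_code(values: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """Reference (treatment) coding: one 0/1 indicator per non-reference level.

    Each indicator can then be offered to the selection procedure as its own
    yes/no variable, so levels enter (or stay out) individually.
    """
    if reference not in values:
        raise ValueError(f"reference level {reference!r} does not occur in the data")
    # Sorted so the column order is deterministic.
    levels = sorted(set(values) - {reference})
    return {f"is_{level}": [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical insurance-type data; "private" is chosen as the reference level.
insurance = ["private", "public", "none", "public", "private"]
for name, column in dummy_code(insurance, reference="private").items():
    print(name, column)
```

With `private` as the reference, this yields `is_none` and `is_public`; a row with zeros in both columns is a `private` row.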