Dropping variables that are not statistically significant

noetsi

Fortran must die
#1
I have large samples if it matters (5000 or more cases).

I am confused about what you should or should not do when you have a model and some of the variables are not statistically significant. I generally do not have theory to guide me in the original choice of variables, so I use the ones that make sense to me. But commonly some of the variables do not matter statistically and/or have very small slope coefficients that are not substantively important.
 

hlsmith

Not a robit
#2
Is the purpose inference or prediction? If the latter, marginally insignificant variables can help, but if you have a huge sample then they really might not be of interest.
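To make the large-sample point concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration (the variable names, the true slope of 0.02, unit-variance noise); it is not anyone's actual model. It fits a one-predictor regression on 5000 cases and shows how narrow the 95% interval on the slope is, so a term that is still insignificant at this sample size cannot be hiding a large effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000  # matches the "5000 or more cases" mentioned in the thread

# Hypothetical data: one predictor with a deliberately tiny true slope.
x = rng.normal(size=n)
y = 0.02 * x + rng.normal(size=n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical standard errors: sigma^2 * (X'X)^-1.
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

slope, slope_se = beta[1], se[1]
half_width = 1.96 * slope_se  # normal approximation, fine at this n
print(f"slope = {slope:.4f}")
print(f"95% CI = [{slope - half_width:.4f}, {slope + half_width:.4f}]")
```

With standardized predictor and noise, the standard error is about 1/sqrt(n), so the interval half-width here is on the order of 0.03. Whether an effect that small matters is a substantive question, not a statistical one.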
 

hlsmith

Not a robit
#4
Well then, when modeling you only include the terms you want to draw inferences about, along with confounding variables, which bias the estimated relationship between the terms of interest and the outcome. It is that simple, though you have to know which variables may be confounders. The general rule is that if you are unsure whether a variable is a confounder but have some suspicion, put it in the model to block the backdoor path, just to be cautious.
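A small simulation can show why a suspected confounder belongs in the model. This is only a sketch under made-up assumptions: z is a hypothetical confounder that drives both the exposure x and the outcome y, and the coefficients (0.8, 0.5, 1.0) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (all coefficients are assumptions):
# z -> x and z -> y, so z opens a backdoor path between x and y.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 0.5 * x + 1.0 * z + rng.normal(size=n)  # true effect of x on y is 0.5


def ols(columns, outcome):
    """Return least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(outcome))] + columns)
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta


b_unadjusted = ols([x], y)[1]    # z left out: backdoor path open
b_adjusted = ols([x, z], y)[1]   # z included: backdoor path blocked

print(f"slope on x without z: {b_unadjusted:.3f}")
print(f"slope on x with z:    {b_adjusted:.3f}")
```

Under these assumed coefficients the unadjusted slope is pushed upward, toward roughly 0.5 + 0.8/1.64 ≈ 0.99, while the adjusted slope should land near the true 0.5. The point is that z stays in the model because of its role in the causal structure, not because of its own p-value.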
 

noetsi

Fortran must die
#5
That makes sense, although I thought the goal was to develop a parsimonious model.

One thing I am doing, and yes this is strange, is trying to find confounds, that is, control variables that should be in a model. We work for a federal agency that is developing a control model - that is, a model with only controls. But we have limited confidence in their statistics (they are not known for being data savvy), so we are running our own models to capture what they leave out.