Hi guys.
I'm doing an econometrics project and I have been given the task of finding out what factors determine the male wage.
I have been given a lot of data, with many independent variables, and I'm having trouble knowing which variables to choose for my regression. Many of the variables are dummy variables. I was wondering if anyone could give me some insight into how to choose which variables to include in the model. Should I include all of them? If so, the regression would have 23 variables, of which 17 would be dummies.
Any help would be greatly appreciated.
This is a useful discussion for you:
http://stats.stackexchange.com/quest...-ridge/876#876
Always start with a theory: you should have a working hypothesis about which variables ought to have an impact. Testing an a priori theory carries a lower risk of false positives than a post hoc analysis. Next, perform a graphical exploratory data analysis. This will help identify likely candidates, including candidates with nonlinear relationships to the response. Best subsets regression and stepwise regression are useful tools, but they should be applied wisely: you, not the method, should control which factors are included in the model. That is, a factor should be in the model only if it makes theoretical sense.
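To make the false-positive point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not taken from the wage data: 50 observations, 23 candidate predictors that are all pure noise, and a cutoff of roughly 0.279 on the absolute correlation, which is approximately the two-sided 5% critical value for that sample size. It compares how often one predictor named in advance looks "significant" with how often the best of the 23 does.

```python
import random


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def false_positive_rates(n_obs=50, n_candidates=23, n_sims=300,
                         r_crit=0.279, seed=0):
    """Share of simulations in which a noise predictor passes the cutoff.

    All predictors are unrelated to y by construction, so every pass is a
    false positive. r_crit is an approximate 5% two-sided critical value
    for |r| when n_obs = 50 (an assumption of this sketch).
    """
    rng = random.Random(seed)
    a_priori_hits = 0   # one predictor chosen before looking at the data
    post_hoc_hits = 0   # best-looking predictor chosen after looking
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        abs_r = []
        for _ in range(n_candidates):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            abs_r.append(abs(pearson_r(x, y)))
        if abs_r[0] > r_crit:
            a_priori_hits += 1
        if max(abs_r) > r_crit:
            post_hoc_hits += 1
    return a_priori_hits / n_sims, post_hoc_hits / n_sims


a_priori_rate, post_hoc_rate = false_positive_rates()
print(f"a priori false-positive rate: {a_priori_rate:.2f}")
print(f"post hoc false-positive rate: {post_hoc_rate:.2f}")
```

The pre-specified predictor should pass the cutoff in only a small share of runs, while picking the best of 23 noise variables passes far more often. That gap is the reason for letting theory, rather than the search procedure, decide what enters the model.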
Also, in some domains (e.g., finance/economics), you will never see those purely statistical approaches to variable selection. Variable selection there is driven almost wholly by theory and a priori reasoning.
So, for the sake of being able to publish, you should be aware of the approach that is most accepted in the literature and mimic it.
FYI, for that response variable, you might be interested in quantile regression or panel quantile regression.
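In case quantile regression is unfamiliar: instead of squared error it minimizes the "check" (pinball) loss, which for a quantile level tau weights positive residuals by tau and negative ones by 1 - tau. The sketch below covers only the intercept-only case, where the minimizer is a sample quantile. The wage figures are made up for illustration, and the helper names are hypothetical. A real analysis would add covariates and use a statistics package.

```python
def pinball_loss(residual, tau):
    """Check loss for one residual at quantile level tau (0 < tau < 1)."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def total_loss(values, q, tau):
    """Sum of check losses when every observation is predicted by q."""
    return sum(pinball_loss(v - q, tau) for v in values)


def best_constant(values, tau):
    """Intercept-only quantile fit.

    The check loss is piecewise linear in q, so a minimizer can always be
    found among the observed values; we simply try each of them.
    """
    return min(sorted(set(values)), key=lambda q: total_loss(values, q, tau))


# Hypothetical hourly wages with a long right tail (illustrative only).
wages = [9.5, 11.0, 12.0, 12.5, 14.0, 15.5, 18.0, 22.0, 27.0, 35.0, 80.0]

median_fit = best_constant(wages, 0.5)   # typical earner
p90_fit = best_constant(wages, 0.9)      # high earner
mean_wage = sum(wages) / len(wages)      # what a least-squares intercept gives

print(median_fit, p90_fit, round(mean_wage, 2))
```

With skewed data like wages, the mean is pulled up by the top earners, whereas separate fits at tau = 0.5 and tau = 0.9 describe the middle and the upper end of the distribution separately. With covariates, the same idea lets a factor have a different effect at different points of the wage distribution.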