Thanks, jpkelley, for replying so quickly.

I only have 400 rows of data.

Your response made me wonder if perhaps I'm not asking the right questions.

"How many variables should I include?" is a question I cannot answer clearly. To build my model, I asked an expert which variables would be of interest, took that list, and ran backward stepwise regression. I know stepwise selection is now frowned upon, but I could not figure out how to fit a lasso or ridge logistic regression in Stata.
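For reference, the lasso idea itself does not depend on Stata. Below is a minimal sketch in plain Python of an L1-penalized (lasso) logistic regression fitted by proximal gradient descent. The function names, the step size, the iteration count and the toy data are all my own assumptions for illustration; it only shows what the penalty does and is not a substitute for a tested library such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")`.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero, exactly to zero if |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def lasso_logit(X, y, lam, step=0.1, iters=2000):
    """Hypothetical helper: minimize mean logistic loss + lam * sum(|b_j|)
    by proximal gradient descent. The intercept b0 is not penalized.
    Predictors should be standardized first so the penalty treats them equally."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        # Gradient of the mean negative log-likelihood.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= step * g0 / n
        # Gradient step on the slopes, then shrink them with the L1 proximal step.
        b = [soft_threshold(b[j] - step * g[j] / n, step * lam) for j in range(p)]
    return b0, b

# Made-up, centered toy data: column 0 is informative, column 1 is pure noise.
X = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
y = [0, 0, 1, 1, 0, 0, 1, 1]
print(lasso_logit(X, y, lam=0.05))
```

With a large `lam` every slope is shrunk to exactly zero; with a small one the informative column keeps a positive coefficient while the noise column stays at zero. In practice `lam` would be chosen by cross-validation rather than fixed by hand.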

Let me explain the background of my research:

I am interested in what affects the probability that an employee is graded A by a company (this is the company's internal rating system). I ended up with variables like IQ, highest educational attainment, personality test scores, arithmetic scores, grammar scores, seminar activity, etc. Actually, the bulk of those variables are categorical; if you are familiar with the 16PF test, I used each category score as a variable.

Again, forgive me: I only have an undergraduate understanding of regression. As I understand it, marginal effects are computed at the point on the curve where all the covariates are set to their means. I therefore assumed that including more significant variables would reduce the residual. Including as many significant variables as I could seemed to confirm this, since it improved the fit and the p-values of my model.
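To make the "marginal effect at the means" idea concrete: for a logit, the effect of covariate j on the predicted probability is b_j * p * (1 - p), where p is the predicted probability with every covariate set to its mean. Here is a minimal Python sketch; the function names, coefficients and means are hypothetical and only illustrate the formula. Note this is one convention; an "average marginal effect" instead averages b_j * p_i * (1 - p_i) over all observations, and for 0/1 dummies a discrete change in probability is often reported instead.

```python
import math

def logistic(z):
    # Numerically stable logistic CDF, Lambda(z) = 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def marginal_effects_at_means(intercept, coefs, means):
    """Marginal effect of each covariate on P(grade A), evaluated at the covariate means.

    For a logit, dP/dx_j = b_j * p * (1 - p), with p = Lambda(intercept + sum_k b_k * mean_k).
    """
    z = intercept + sum(b * m for b, m in zip(coefs, means))
    p = logistic(z)
    return [b * p * (1.0 - p) for b in coefs]

# Hypothetical coefficients and means, for illustration only.
print(marginal_effects_at_means(0.0, [2.0, -1.0], [0.0, 0.0]))  # [0.5, -0.25]
```

Because every effect is scaled by the same p * (1 - p), adding or dropping covariates changes each marginal effect both through its coefficient and through the point at which p is evaluated.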

Is this approach actually wrong? Should I subdivide my model into smaller models?