Basic questions for an important project.


No cake for spunky
For the first time in forever I am running a regression that might have real impact. It looks at what impact the services we spend a lot of money on actually have. So I wanted to ask about possible issues that I should concern myself with.

The dependent variable is income for customers (there are thousands, and they are the entire population of interest). There are about 40 predictors that the federal government requires. The variable I am interested in is service impact. I do the following. (Note that the data is restricted by law, so I cannot send it. The regression code is really long because of the numerous variables, most of which are CLASS variables (categorical). I was not going to post it for that reason, but can if that helps.)

First, I determine if someone was eligible to get a service. The logic is that the impact of a service variable should be measured on those who professional counselors think will be helped by the service and who are allowed to get it (as a public agency we have rules about what we can provide to whom). So I look for impact specifically on those eligible, not on the population of customers as a whole. Is that the right way to do it? (I have never seen it done this way.)
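To make the first step concrete, here is a minimal sketch in Python. It is not my real code, and the data are invented because the real data are restricted. The variable names (`eligible`, `service`, `female`, `income`) are hypothetical stand-ins, and the single control `female` stands in for the roughly 40 required predictors. It only shows the mechanics: restrict to eligible customers, then regress income on the service dummy plus controls.

```python
import numpy as np

# Hypothetical toy data (invented for illustration only).
eligible = np.array([1, 1, 1, 1, 1, 1, 0, 0])   # 1 = eligible for the service
service  = np.array([1, 0, 1, 0, 1, 0, 0, 0])   # 1 = got a paid service, 0 = none
female   = np.array([1, 1, 0, 0, 1, 0, 1, 0])   # stand-in for the control variables

# Income is constructed so the "true" service slope among eligibles is 1500;
# ineligible customers get an extra 10000 so the filter visibly matters.
income = 20000 + 1500 * service + 3000 * female + 10000 * (1 - eligible)

# Step 1: keep only customers who were eligible for the service.
mask = eligible == 1

# Step 2: ordinary least squares on the eligible subset.
# Columns of X: intercept, service dummy, control dummy.
X = np.column_stack([np.ones(mask.sum()), service[mask], female[mask]])
beta, *_ = np.linalg.lstsq(X, income[mask], rcond=None)

intercept, service_slope, female_slope = beta
print(service_slope)  # slope on the service dummy among eligible customers
```

Because the toy income is built without noise, the fitted service slope simply recovers the value that was put in. With real data the slope would also pick up anything correlated with getting the service that is not in the model.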

I have not tested the regression assumptions yet, but in honesty I don't think they can matter. The predictors are dummy variables, so non-linearity would not apply. We have the whole population, so normality and heteroscedasticity are not an issue. This is a population, not a sample, so the effect size is real.

Second, I run descriptives to see what percent of those eligible got any such service. My predictor of interest is a dummy variable: 1 means the customer got a paid service of the type being analyzed, and 0 (the reference level) means the customer got no paid services.
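The descriptives in this step amount to something like the following sketch. Again this is Python with made-up records, not my actual code or data; the tuple layout and the names `usage_pct` and `med_inc` are just assumptions for illustration.

```python
from statistics import median

# Hypothetical toy records: (eligible, got_paid_service, income).
# Invented purely for illustration; not the restricted agency data.
customers = [
    (1, 1, 21000),
    (1, 0, 18000),
    (1, 1, 25000),
    (1, 0, 30000),
    (0, 0, 40000),
    (0, 0, 35000),
]

# Keep only customers who were eligible for the service.
eligible = [c for c in customers if c[0] == 1]

# Usage: percent of eligible customers who received the paid service.
usage_pct = 100 * sum(c[1] for c in eligible) / len(eligible)

# Median income among those eligible (both service and no-service groups).
med_inc = median(c[2] for c in eligible)

print(usage_pct, med_inc)
```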

This is one result.


Regression reflects the slope, usage is the percent who got one service, and med inc is a descriptive showing the median income for those who were eligible for a service. I interpret the slope as the impact of the service controlling for about 40 variables such as gender. Is that reasonable? I don't really know how to explain the negative numbers. I really doubt that getting a service causes you to do less well, so possibly there are factors not in the model explaining income.

I would really appreciate comments on this and ways to improve it. It is very important we get it right; services provided to customers will depend on it. I am sure this is too simple to be correct and would be happy to change it (assuming I know the method suggested). One concern I have: one of the predictors, which the federal agency I work for requires, is a measure of whether you get one of three very broad types of service. I am not sure whether this is impacting these slopes (robbing them of information).


No cake for spunky
OK, I ran a model and there is a regression coefficient associated with services. There are about 50 variables to control for things like gender, age, education, etc. There is absolutely zero theory to identify key agents of change here.

So how do I know, or at least how can I be more sure, that the slope is the true effect of the service and not just capturing the effect of a variable left out of the model? I know I cannot prove causality here; I am just trying to be more sure of my results because they would have real impact and I don't claim to be an expert in this. :p I already listed what the model was on other threads. Are there methods or approaches that can make findings more certain? Obviously I never came across any or I would have tried them.

I don't have data over time so I can't run fixed effects models at different points in time.