Follow-up analysis for regression

noetsi

Fortran must die
#1
I ran a regression on 20 services we supply to our customers. I controlled for what seemed like obvious factors such as gender, public support, and severity of disability (we provide services to people who have disabilities).

The dependent variable was income gain as a result of coming to our agency (or during the time they were here; we can't be certain their stay had anything to do with the benefit in a causal sense). The problem is that some common services show significant income losses; that is, customers who got a given service saw much lower than average income gains. That does not make much sense to me.
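For readers following along, the model described above (income gain regressed on service dummies plus controls) can be sketched as follows. This is a minimal illustration on simulated data only: the column names, the effect sizes, and the use of NumPy's `lstsq` as the OLS routine are all assumptions, not the agency's actual data or software.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 5000
# Hypothetical controls (illustrative names, not real agency fields)
female = rng.integers(0, 2, n)
public_support = rng.integers(0, 2, n)
severity = rng.integers(1, 4, n)  # 1 = mild ... 3 = severe
# Two hypothetical service dummies: 1 if the customer received the service
service_a = rng.integers(0, 2, n)
service_b = rng.integers(0, 2, n)

# Simulated income gain with known effects, so the fit can be checked
income_gain = (2000 + 300 * female - 800 * public_support - 400 * severity
               + 1500 * service_a + 500 * service_b + rng.normal(0, 1000, n))

X = np.column_stack([np.ones(n), female, public_support, severity, service_a, service_b])
names = ["intercept", "female", "public_support", "severity", "service_a", "service_b"]
beta = fit_ols(X, income_gain)
for name, b in zip(names, beta):
    print(f"{name:>15}: {b:8.1f}")
```

Each dummy's coefficient is the adjusted difference in mean income gain between customers who did and did not get that service, holding the listed controls fixed.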

What follow-up analyses could I run? My guess is that some factor I have not controlled for is at work here, and that the service dummies are capturing this effect, which is left out of the model.
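The omitted-variable explanation can be checked with a small simulation. In this sketch the unmeasured factor (a hypothetical "readiness" score) raises income gain and, by assumption, makes a customer less likely to receive the service. The service's true effect is set to +200, yet leaving readiness out of the model produces a negative coefficient. All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
# Hypothetical unmeasured factor, e.g. how job-ready a customer was on arrival
readiness = rng.normal(0, 1, n)
# Assumption: less job-ready customers are more likely to be given the service
p_service = 1 / (1 + np.exp(2 * readiness))
service = (rng.random(n) < p_service).astype(float)
# True service effect is +200; readiness strongly drives income gain
income_gain = 200 * service + 1500 * readiness + rng.normal(0, 1000, n)

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

ones = np.ones(n)
short = ols(np.column_stack([ones, service]), income_gain)            # readiness omitted
full = ols(np.column_stack([ones, service, readiness]), income_gain)  # readiness included
print("service coef, readiness omitted: ", round(short[1]))
print("service coef, readiness included:", round(full[1]))
```

A practical follow-up in the same spirit is to look for intake variables (prior work history, referral source, and so on) that differ sharply between recipients and non-recipients of the "harmful" services.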

As a second question, and assuming I had the right controls, what is the best way to rate which services had the greatest impact (all of these are dummy variables: you get the service or you do not)? Beta weights are iffy with dummy variables.
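One common alternative to beta weights, offered here as a suggestion rather than as the answer, is to rank services by their unstandardized coefficients. For a 0/1 dummy the coefficient is already in income units, so it can be compared directly across services, ideally together with a confidence interval. The sketch below uses classical (homoskedastic) standard errors and simulated data; the service names and effects are hypothetical.

```python
import numpy as np

def ols_with_ci(X, y, z=1.96):
    """OLS coefficients with classical standard errors and approximate 95% CIs."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, beta - z * se, beta + z * se

rng = np.random.default_rng(2)
n = 4000
services = ["job_search", "training", "assistive_tech"]  # hypothetical names
D = rng.integers(0, 2, (n, len(services))).astype(float)
severity = rng.integers(1, 4, n).astype(float)
true_effects = np.array([900.0, 300.0, -100.0])  # simulated, not real estimates
y = 1000 - 400 * severity + D @ true_effects + rng.normal(0, 1500, n)

X = np.column_stack([np.ones(n), severity, D])
beta, se, lo, hi = ols_with_ci(X, y)
# Service coefficients start at column 2 (after intercept and severity)
order = np.argsort(-beta[2:])
for i in order:
    j = i + 2
    print(f"{services[i]:>15}: {beta[j]:7.0f}  95% CI [{lo[j]:7.0f}, {hi[j]:7.0f}]")
```

Overlapping intervals are a warning that the ranking between two services may not be reliable.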

I could go back and use the raw variables, that is, how much we spent on each customer, rather than a dummy variable, but I am not sure the logic that you gain more as we spend more holds up.
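Whether "more spending, more gain" holds up can be checked before committing to a linear spending term. A simple sketch, assuming per-customer spending data were available: split spending into quartiles and compare mean income gain across them. The data below are simulated with diminishing returns purely to show what a non-linear pattern looks like.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3000
# Hypothetical spending on one service per customer (dollars); simulated
spend = rng.uniform(0, 10000, n)
# Simulated diminishing returns: gains level off as spending rises
income_gain = 3000 * (1 - np.exp(-spend / 2000)) + rng.normal(0, 800, n)

# Split customers into spending quartiles and compare mean gain per quartile
edges = np.quantile(spend, [0.25, 0.5, 0.75])
quartile = np.digitize(spend, edges)  # values 0..3
means = np.array([income_gain[quartile == q].mean() for q in range(4)])
for q, m in enumerate(means):
    print(f"spending quartile {q + 1}: mean income gain {m:7.0f}")
# Shrinking steps suggest a straight-line spending effect would be misleading
print("increase between quartiles:", np.round(np.diff(means)).astype(int))
```

In a real analysis the controls would also need to be included, for example by putting quartile dummies into the regression instead of comparing raw means.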
 

noetsi

Fortran must die
#3
Almost none of these customers would have had jobs when they arrived; that is why they come here: they don't have a job. But I can check how many came here with a job compared to those who did not (we provide limited services for customers who are trying to keep their job).
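The check described above can be sketched as a cross-tabulation. The point it illustrates is an assumption worth testing: if a service goes mostly to people who already had a job (and so have little room for an income gain), a raw comparison can make that service look harmful even when it has no effect within either group. Everything here is simulated and the flags are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
# Hypothetical intake flag: 1 if the customer already had a job on arrival
employed_at_intake = (rng.random(n) < 0.1).astype(int)
# Hypothetical service, assumed to go mostly to job-retention cases
service = np.where(employed_at_intake == 1,
                   rng.random(n) < 0.8, rng.random(n) < 0.2).astype(int)
# Simulated: people who already had a job show smaller measured gains;
# the service itself has no effect in this simulation
income_gain = np.where(employed_at_intake == 1, 200.0, 2500.0) + rng.normal(0, 500, n)

print("customers employed at intake:", int(employed_at_intake.sum()), "of", n)
for e in (0, 1):
    for s in (0, 1):
        mask = (employed_at_intake == e) & (service == s)
        print(f"employed_at_intake={e} service={s}: n={int(mask.sum()):5d}  "
              f"mean gain={income_gain[mask].mean():7.0f}")

# Raw comparison that ignores employment at intake
raw = {s: income_gain[service == s].mean() for s in (0, 1)}
print(f"raw mean gain without service: {raw[0]:.0f}, with service: {raw[1]:.0f}")
```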
 

Miner

TS Contributor
#4
What about someone that came in on unemployment then later found a job that paid less than they made on unemployment?
 

hlsmith

Omega Contributor
#5
Yeah, if something doesn't make sense either you don't know the underlying function well enough given the context and/or you have an omitted variable(s), mediator, systematic biases, or perhaps you are controlling on a variable that is an antecedent of the DV and IV variables - opening a backdoor.
 

noetsi

Fortran must die
#6
Miner said:
What about someone that came in on unemployment then later found a job that paid less than they made on unemployment?
That is an interesting thought. In theory I would think that would fall under the rubric of public support, but the numbers we use are self-reported, and it's possible that customers are counting unemployment as income.
 

noetsi

Fortran must die
#7
hlsmith said:
...or perhaps you are controlling on a variable that is an antecedent of the DV and IV variables - opening a backdoor.
What does that mean? I am not familiar with that concept.

I have read a fair number of books on regression but have not found any that deal in detail with issues like moderation or antecedents, or with how to fix them. Do you have any suggestions?
 

hlsmith

Omega Contributor
#8
It is called "collider bias." It occurs when you have a variable in the model that is a descendant (an effect) of both the IV and the DV: X -> Z <- Y, where Z is that variable. Having such a term in your model opens a path leading from the DV back to the IV, which can bias the interpretation of their actual relationship.
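A small simulation makes collider bias concrete. Here X and Y are generated independently, so the true effect of X on Y is zero, and Z is caused by both (X -> Z <- Y). Regressing Y on X alone gives a coefficient near zero; adding Z to the model pulls the X coefficient to roughly -0.5 in this setup. The variables are abstract stand-ins, not anything from the agency's data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50000
x = rng.normal(size=n)          # IV
y = rng.normal(size=n)          # DV, independent of x (true effect is 0)
z = x + y + rng.normal(size=n)  # collider: caused by both x and y

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

ones = np.ones(n)
b_without_z = ols(np.column_stack([ones, x]), y)[1]
b_with_z = ols(np.column_stack([ones, x, z]), y)[1]
print(f"effect of x on y, z left out: {b_without_z:.2f}")  # near 0
print(f"effect of x on y, z included: {b_with_z:.2f}")     # near -0.5
```

The spurious negative coefficient appears only because Z was conditioned on, which is the kind of "service looks harmful" pattern a post-intervention control could produce.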


I was just thinking that if you had a variable collected after the intervention you all provided, and it is also influenced by the IV (e.g., disability level), then you are at risk of opening the backdoor. But I don't know your context, so I don't know whether it could be something like government support. An easy way to think about it: it is a variable affected by both the IV and the DV, and it is in your model predicting the DV. Common sense usually says not to put effects of the DV in the model, but it can happen sometimes.


IV = independent variable; DV = dependent variable.
 

Miner

TS Contributor
#9
Another possibility is someone who was working two low-paying jobs and was able to move to one higher-paying job that still paid less than the two lower-paying jobs combined.