Estimating effect of those not in the system

noetsi

No cake for spunky
#1
We have an entire population we serve. But many never apply with us. We want to know if our services help those we serve or could serve. We can tell if services impact who we do serve, but I am not sure what this tells you about those you don't serve.

I am not sure how, or if, you can analyze the impact of your services on the general population from those you actually serve. What type of literature or methods deal with this?
 

noetsi

No cake for spunky
#3
No cake for Spunky. Dason and I are adamant on this point. It is why we built the trap in the cake depository.

We have no way to sample those who do not use our service. We don't even know who they are. And legally we can't get information on the dependent variable (income or whether they are employed) for them.

We know that those who use our services are more likely to be employed than those who don't. But we can't control for threats to external validity like history, demographics or the like. There are many factors believed to impact results, such as whether they receive public support or parental support, that would not be possible to obtain unless we contacted them.

But in theory, would you try to get a list of your population, randomly contact people on it, and collect as much as you can on how they compare to those you serve (other than services), along with the DV for them?
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
Yes, you need to have some type of information on people that opt not to use your services in order to generalize or transport results to them. If you had content knowledge you could assume some of this information or as @GretaGarbo mentions - randomly subsampling them would be ideal.

Very Black Swan-esque.
 

noetsi

No cake for spunky
#5
We don't, and almost certainly never will, have that information. Are there any alternatives?

If not I am not sure how you do regression other than to hope what you find would be true of those who don't get services.
 

hlsmith

Less is more. Stay pure. Stay poor.
#6
You have to have a subsample or assumptions - otherwise you would be straight making up stuff.

So given you have either, you can then create weights to apply to your data. Perhaps there are customers who start but quit, and these people can be matched to people who are similar but didn't quit, or some other clever proxy.
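A minimal sketch of the weighting idea, assuming you are willing to state population shares for some grouping variable. The groups, outcomes, and shares below are hypothetical, and the result is only as good as those assumed shares:

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Weight each served person so the weighted sample matches assumed population shares."""
    n = len(sample_groups)
    sample_shares = {g: c / n for g, c in Counter(sample_groups).items()}
    missing = set(sample_shares) - set(population_shares)
    if missing:
        raise ValueError(f"no assumed population share for groups: {sorted(missing)}")
    return [population_shares[g] / sample_shares[g] for g in sample_groups]

def weighted_mean(values, weights):
    """Weighted average of the outcome."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical served customers: a grouping variable and an employment indicator.
groups = ["urban", "urban", "urban", "rural"]
employed = [1, 1, 0, 0]

# ASSUMPTION: the full population is half urban, half rural.
assumed_shares = {"urban": 0.5, "rural": 0.5}

weights = poststratification_weights(groups, assumed_shares)
unweighted_rate = sum(employed) / len(employed)
weighted_rate = weighted_mean(employed, weights)
print(unweighted_rate, weighted_rate)
```

Rural customers are underrepresented in this toy sample, so they are weighted up and the weighted employment rate moves away from the raw one.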
 

noetsi

No cake for spunky
#7
What I did was to look at those eligible for a service who got it and those that were eligible for a service and did not (among our customers). So that is a binary predictor and if those that got the service did significantly better than those that did not I said the service mattered (controlling for a variety of predictors identified as important by the federal government).
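A rough sketch of that comparison in its simplest, unadjusted form: restrict to eligible customers, then test whether the employment rate differs by receipt of the service. The records and field names are hypothetical, and unlike the analysis described above it does not control for any other predictors:

```python
import math

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled standard error)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value

# Hypothetical customer records.
records = (
    [{"eligible": True, "got_service": True, "employed": 1}] * 60
    + [{"eligible": True, "got_service": True, "employed": 0}] * 40
    + [{"eligible": True, "got_service": False, "employed": 1}] * 45
    + [{"eligible": True, "got_service": False, "employed": 0}] * 55
    + [{"eligible": False, "got_service": False, "employed": 1}] * 30
)

# Keep only those eligible for the service, then split by whether they got it.
eligible = [r for r in records if r["eligible"]]
served = [r["employed"] for r in eligible if r["got_service"]]
not_served = [r["employed"] for r in eligible if not r["got_service"]]

diff, z, p_value = two_proportion_ztest(
    sum(served), len(served), sum(not_served), len(not_served)
)
print(f"difference={diff:.3f} z={z:.2f} p={p_value:.4f}")
```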

Is that a reasonable approach? In theory, according to federal regulation and counselor expertise, if they were eligible they needed the service/would benefit from it. Not sure if that was clever in the sense you mean. :p
 

hlsmith

Less is more. Stay pure. Stay poor.
#8
The g-formula may be beneficial here. I'll discuss tomorrow, but you use propensity scores in a standardized model. You can also search for doubly robust estimation. I think SAS has a procedure, CAUSALTRT, that may make it easy for your first time.
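One common form of the g-formula is plain standardization: estimate the outcome difference within each covariate stratum, then average those differences over the whole sample's stratum distribution. A minimal sketch with a single hypothetical stratum variable and made-up outcomes (not the SAS procedure, and real data would need a fitted outcome model rather than raw strata):

```python
from collections import defaultdict

def g_formula_effect(rows):
    """Standardized effect of service from (stratum, got_service, outcome) rows."""
    totals = defaultdict(float)        # (stratum, got_service) -> sum of outcomes
    counts = defaultdict(int)          # (stratum, got_service) -> number of people
    stratum_sizes = defaultdict(int)   # stratum -> number of people
    for stratum, got_service, outcome in rows:
        totals[(stratum, got_service)] += outcome
        counts[(stratum, got_service)] += 1
        stratum_sizes[stratum] += 1

    effect = 0.0
    for stratum, size in stratum_sizes.items():
        if counts[(stratum, 1)] == 0 or counts[(stratum, 0)] == 0:
            raise ValueError(f"no overlap in stratum {stratum!r}")
        mean_served = totals[(stratum, 1)] / counts[(stratum, 1)]
        mean_unserved = totals[(stratum, 0)] / counts[(stratum, 0)]
        # Weight each stratum-specific difference by the stratum's share of the sample.
        effect += (size / len(rows)) * (mean_served - mean_unserved)
    return effect

# Hypothetical data: stratum 1 is both more likely to get the service and better off.
rows = (
    [(0, 1, 2.0)] * 2 + [(0, 0, 0.0)] * 6
    + [(1, 1, 12.0)] * 6 + [(1, 0, 10.0)] * 2
)

served = [y for _, t, y in rows if t == 1]
unserved = [y for _, t, y in rows if t == 0]
naive = sum(served) / len(served) - sum(unserved) / len(unserved)
standardized = g_formula_effect(rows)
print(naive, standardized)
```

In this toy data the naive gap is 7 while the standardized estimate is 2, because most of the raw gap comes from who gets the service rather than from the service itself.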
 

noetsi

No cake for spunky
#9
Don't forget to do this. :) Also I was hoping you would comment on if my design makes any sense...that is comparing those who got services who were eligible to those who did not who were eligible rather than just comparing everyone who got a service whether they are eligible or not.
 

hlsmith

Less is more. Stay pure. Stay poor.
#12
It is a two-part model. You first model the probability of getting the service (y/n) given you were eligible. Then you use those propensity scores in your outcome model of income on service (y/n).

If you don't have good overlap in the propensity score histograms, or have additional reservations about residual confounding in the covariates used, you can also put the covariates from the propensity score model into the outcome model. This is called an augmented outcome model, but it slightly changes your outcome model interpretation (from marginal estimates to conditional).
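A quick numeric stand-in for eyeballing the histograms, assuming you already have fitted propensity scores. The scores below are hypothetical; the function reports the range where both groups have scores and the share of people falling outside it:

```python
def overlap_summary(scores, treated):
    """Summarize common support of propensity scores between the two groups."""
    served = [s for s, t in zip(scores, treated) if t == 1]
    unserved = [s for s, t in zip(scores, treated) if t == 0]
    low = max(min(served), min(unserved))
    high = min(max(served), max(unserved))
    outside = sum(1 for s in scores if s < low or s > high)
    return {"common_support": (low, high), "share_outside": outside / len(scores)}

# Hypothetical propensity scores and service indicators.
scores = [0.10, 0.30, 0.50, 0.70, 0.40, 0.60, 0.80, 0.95]
treated = [0, 0, 0, 0, 1, 1, 1, 1]

summary = overlap_summary(scores, treated)
print(summary)
```

A large share outside the common range suggests the two groups are not very comparable on the modeled covariates. If the lower bound comes out above the upper bound, the groups do not overlap at all.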

Model 1: got service (y/n) = B0 + B1*X1 + ... + Bk*Xk, where X1,...,Xk are the background covariates
Model 2: outcome = got service (y/n), and it uses weights from model 1 to balance background differences.
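A minimal sketch of the two models in plain Python, assuming a logistic regression for Model 1 and inverse probability weights for Model 2. The data, the single covariate, and the hand-rolled fitting routine are illustrative assumptions; in practice you would use a tested library or the SAS procedure linked below:

```python
import math

def fit_logistic(X, t, lr=0.5, n_iter=5000):
    """Model 1: logistic regression by gradient ascent (intercept added here)."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, ti in zip(X, t):
            feats = [1.0] + list(row)
            p = 1.0 / (1.0 + math.exp(-sum(b * f for b, f in zip(beta, feats))))
            for j, f in enumerate(feats):
                grad[j] += (ti - p) * f
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of getting the service for each person."""
    return [
        1.0 / (1.0 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))))
        for row in X
    ]

def ipw_effect(t, y, scores):
    """Model 2: weighted difference in mean outcome, served minus not served."""
    w1 = [ti / e for ti, e in zip(t, scores)]
    w0 = [(1 - ti) / (1 - e) for ti, e in zip(t, scores)]
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0

# Hypothetical data: one binary covariate that drives both service receipt and outcome.
X = [[0]] * 8 + [[1]] * 8
t = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]
y = [10 * row[0] + 2 * ti for row, ti in zip(X, t)]  # built-in service effect of 2

beta = fit_logistic(X, t)
scores = propensity_scores(X, beta)
weighted = ipw_effect(t, y, scores)
n_served = sum(t)
naive = (
    sum(yi for yi, ti in zip(y, t) if ti == 1) / n_served
    - sum(yi for yi, ti in zip(y, t) if ti == 0) / (len(t) - n_served)
)
print(f"naive={naive:.2f} weighted={weighted:.2f}")
```

The weights rebalance the covariate between the two groups, so the weighted difference lands near the built-in effect of 2 while the naive difference does not.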

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_causaltrt_examples01.htm

P.S. This is simpler than it seems, and this SAS interface (procedure) makes it pretty simple. The only drawback would be if the people that did and did not get services differed on undocumented variables that also impact the outcome of income.
 

noetsi

No cake for spunky
#13
Thanks very much hlsmith. I need to learn propensity scoring anyway.

I suspect that those who did and did not get services do differ on some variables left out of the model (although there are over 50 variables in the model to serve as controls). But I know of no theory to guess at what they are and am not an expert in VR amusingly.

I suspect this is true because getting a service when eligible can lead to worse results than being eligible and not getting a service. I can't imagine why getting a service would make results worse (well, if it made them get discouraged or took too long, but that is a stretch on my part).
 

hlsmith

Less is more. Stay pure. Stay poor.
#14
What are the services? Is it a dose-response thing or say one-time training?

Is it well-defined and not that variable?
 

noetsi

No cake for spunky
#15
A huge number of different services that are measured many different ways. Anything from a medical procedure to training, to a successful placement by a vendor, to having your van fixed.

That is part of why I only measure counts of services, because I think cost distorts the reality of the impact on a customer. But that is imperfect to say the least. Getting a bus ride is one service and so is a major van modification or medical procedure. Unfortunately I lack the expertise to create a system to address this (I am an analyst not a counselor, and my appeals for help from the counselors who are experts have gone ignored. The data analysis I do is not seen as important in my organization - it is why I want to move on).

We are a social work type organization not a quantitative one. I am the remnant of a subunit that was supposed to change this, but did not.
 

hlsmith

Less is more. Stay pure. Stay poor.
#16
The approach I described gets a little mottled when you have lots of treatment groups.

Side comment, perhaps win them over by making beautiful/interpretable graphs. Funny enough that is all people ever really want. They don't want to have things and concepts explained that make a difference, they just want a picture.
 

noetsi

No cake for spunky
#17
They will take whatever I give them. I am trying to be sure the way I am doing it is valid.

I am still not sure that measuring the impact of a service by comparing those who got it and those who did not, when both are eligible, is logical. I am not entirely sure in what statistical literature to look this up.