Propensity score matching - pre-treatment outcome variable

Hi all,

maybe this is rather a silly question, but I have not found anything helpful. I have two groups of samples, which are basically a treatment group and a control group.

My task at hand is to compare a certain variable between those two groups. This variable, however, is not observed after the treatment (t) but before it, at t-1.

Is PSM a valid method to find comparable matches for this comparison, given that (a) the variable of interest is not expected to influence the probability of being in the treatment group, or (b) it is expected to have such an influence?

Many thanks in advance!


Less is more. Stay pure. Stay poor.
Your wording is a little confusing! So your baseline covariate is suspected to be related to the outcome but not to the treatment assignment mechanism, correct?

If so, you don't have to control for the baseline covariate in order to get an unbiased estimate of the treatment effect. You can if you want, but it is not necessary. Though, it may be prudent to check that the covariate is balanced across treatment groups to make sure you don't need to control for it.
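One way to run such a balance check is the standardized mean difference (SMD) of the baseline covariate between the two groups. The snippet below is only a minimal sketch: the numbers are made up for illustration, the function name is hypothetical, and the often-quoted cutoff of |SMD| < 0.1 is a rule of thumb, not a formal test.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical baseline (t-1) values, invented for this example only.
treated = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3]
control = [5.0, 4.7, 5.4, 5.2, 4.8, 5.1, 5.0, 4.9]

smd = standardized_mean_difference(treated, control)
print(f"SMD = {smd:.3f}")
```

An SMD close to zero suggests the covariate is distributed similarly in both groups; a large one suggests you may want to adjust for it after all.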

Kind regards!
Given the two cases (a and b) I outlined, the variable of interest might be affecting the treatment.

So when calculating the propensity scores with respect to the treatment the variable could act as a covariate in the one case and not in the other.

I am not 100% familiar with the wording but let me try to explain it using the common example of smoking of the mother and the height of the kid afterwards:

- Let's assume I have a group of mothers who smoked (treatment) and a relatively large group of mothers who didn't smoke (control). Then let's assume I have information on the color of the house the mothers live in.

I want to know whether the color of the houses differs between the treatment and control group. Hence, can I use PSM based on other factors that determine whether the mother smokes or not (i.e. whether she becomes part of the treatment group), such as education, income etc., to identify suitable mothers from the control group?

In the case that
a) the color of the house has no influence on becoming part of the treatment
b) there might be an influence

Hope that makes my question understandable.
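To make the setup concrete, here is a rough sketch of what I have in mind for case (a). Everything in it is an assumption for illustration: the data are simulated, the coefficients are invented, the logistic regression is a bare-bones gradient ascent rather than a proper statistics package, and the matching is simple greedy 1:1 nearest neighbor on the propensity score without replacement. The house color is deliberately left out of the propensity model.

```python
import math
import random

random.seed(0)

# Simulated, purely hypothetical data: education and income drive smoking;
# house color (1 = red, 0 = other) is unrelated to smoking (case a).
mothers = []
for _ in range(400):
    education = random.gauss(0, 1)
    income = random.gauss(0, 1)
    p_smoke = 1 / (1 + math.exp(-(-1.0 - 0.8 * education - 0.5 * income)))
    mothers.append({
        "x": (education, income),
        "smokes": 1 if random.random() < p_smoke else 0,
        "red": 1 if random.random() < 0.3 else 0,
    })


def fit_logistic(rows, steps=300, lr=1.0):
    """Gradient ascent on the mean logistic log-likelihood."""
    w = [0.0, 0.0, 0.0]  # intercept, education, income
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for r in rows:
            feats = (1.0,) + r["x"]
            z = sum(wi * fi for wi, fi in zip(w, feats))
            p = 1 / (1 + math.exp(-z))
            for j in range(3):
                grad[j] += (r["smokes"] - p) * feats[j]
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


# Propensity score: estimated probability of smoking given education and income.
w = fit_logistic(mothers)
for r in mothers:
    z = w[0] + w[1] * r["x"][0] + w[2] * r["x"][1]
    r["ps"] = 1 / (1 + math.exp(-z))

treated = [r for r in mothers if r["smokes"] == 1]
pool = [r for r in mothers if r["smokes"] == 0]

# Greedy 1:1 nearest-neighbor matching without replacement.
pairs = []
for t in treated:
    if not pool:
        break
    idx = min(range(len(pool)), key=lambda i: abs(pool[i]["ps"] - t["ps"]))
    pairs.append((t, pool.pop(idx)))

# Compare the pre-treatment variable (house color) in the matched sample.
share_t = sum(t["red"] for t, _ in pairs) / len(pairs)
share_c = sum(c["red"] for _, c in pairs) / len(pairs)
print(f"matched pairs: {len(pairs)}")
print(f"red houses, smokers: {share_t:.2f}, matched non-smokers: {share_c:.2f}")
```

For case (b) the open question would be whether the house color itself has to go into the propensity model, which is exactly what I am unsure about.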
"Though, it may be prudent to check that the covariate is balanced across treatment groups to make sure you don't need to control for it."

I don't have experience on the subject and the above sentence triggered a thought/question.

I want to analyze survey data with a continuous dependent variable and some categorical independent variables with two response levels (yes/no). But sometimes 80% of the responses are "yes" and only 20% are "no". Is there a common or recommended way to account or correct for this imbalance? What would be the issue if I moved on with the analysis without accounting for it?

I appreciate your thoughts on this!


Well, you probably don't need to control for the imbalance; it is common in natural studies. Balanced groups make some analyses run optimally, but imbalances won't mess up the effect estimates. If you had sparsity in your data, say a model with many of these covariates where one level has only a few persons, the risk is that the model sometimes won't converge, and another issue is that the standard errors can get large. So you could have a large effect size, but it may not be significant since the SEs are large. The other issue is the generalizability of the results. Say you have 10% males, 10% of them are minorities, and 10% of them have no insurance. Can you say that this very small handful of people represents everyone like them in the world? In other words, can you generalize the results of just a couple of people to their peers?
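To put a rough number on the standard error point, here is a small sketch for the simplest case: comparing the mean of a continuous outcome between the "yes" and "no" groups. It assumes the same outcome standard deviation in both groups (set to 1 here), and the sample sizes are invented for illustration.

```python
from math import sqrt


def se_diff_means(n_yes, n_no, sd=1.0):
    """Standard error of a difference in means, assuming equal SD in both groups."""
    return sd * sqrt(1 / n_yes + 1 / n_no)


# Same total sample size (200), different splits between "yes" and "no".
balanced = se_diff_means(100, 100)   # 50/50 split
unbalanced = se_diff_means(160, 40)  # 80/20 split
sparse = se_diff_means(198, 2)       # one level nearly empty

print(f"50/50: {balanced:.3f}  80/20: {unbalanced:.3f}  99/1: {sparse:.3f}")
print(f"80/20 SE relative to 50/50: {unbalanced / balanced:.2f}")
```

Under these assumptions an 80/20 split inflates the standard error by 25% compared with a 50/50 split of the same total size, which is usually tolerable. The real trouble starts when one level is nearly empty.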