Omitted variable bias

#1
Dear all,

I've learned that omitted variable bias (OVB) happens when the missing variable is correlated with an independent variable and has a causal relationship with the dependent variable.

I've also learned that Confounding has the same criteria as above.

My question is: is confounding a subset of omitted variable bias?? In other words, is OVB broader than confounding?
Or are they the same thing??

Thank you in advance.
 

hlsmith

Not a robit
#2
I have always considered them the same thing. Many fields use statistics (medicine, economics, etc.), and I believe OVB might have come out of the social sciences. Thus, you will find multiple terms for the same concepts.
 

hlsmith

Not a robit
#4
The only confusion I come across is whether, beyond the confounding triangle, the spurious DAG is also included, where there is no arrow between exposure and outcome. I believe it is.
 
#5
I asked a teacher whether OVB and confounding are the same thing. His reply was as follows:
"no, confounding and OV bias are different. OV bias occurs when you have left out a variable you should include, confounding occurs when you have multiple variables that capture the same idea in a regression".

What do you think??
 

hlsmith

Not a robit
#6
Your instructor seems wrong. Granted, I don't use the OV term, but leaving out a variable you should have included seems like confounding to me. If you are looking at X -> Y and you have a confounder, leaving it out will give you a biased estimate of the effect of X -> Y. If you have two independent variables associated with Y but not associated with each other (X -> Y <- Z), not controlling for Z will not bias the estimate for X -> Y. So the relationship of interest has to be confounded by the omitted variable for the omission to matter. You could get nuanced and ask what happens if it is a moderated-mediated effect, but then they should say that is what they are talking about.
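
A small simulation can make the two cases above concrete. This is only a sketch under assumed numbers: the true effect of X on Y is set to 1.0, Z affects Y with coefficient 1.5, and in the confounded case Z also drives X with coefficient 0.8. All names and coefficients are illustrative and not from any real dataset.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x alone (intercept included)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(z_causes_x, n=20000, seed=1):
    """Estimate the X -> Y slope while leaving Z out of the regression."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        # Confounded case: Z -> X. Otherwise X is independent of Z.
        x = (0.8 * z if z_causes_x else 0.0) + rng.gauss(0, 1)
        # Assumed truth: X has effect 1.0 on Y, Z has effect 1.5 on Y.
        y = 1.0 * x + 1.5 * z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return slope(xs, ys)

confounded = simulate(z_causes_x=True)    # Z is a common cause of X and Y
independent = simulate(z_causes_x=False)  # X -> Y <- Z, X and Z unrelated

print("Z omitted, Z causes X:        ", round(confounded, 3))
print("Z omitted, Z independent of X:", round(independent, 3))
```

Under these assumptions the first estimate should drift well above 1.0 (theory gives 1 + 1.5 * 0.8 / 1.64, about 1.73), while the second should stay close to 1.0 even though Z was left out in both runs.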

If two variables are both saying the same thing they are considered collinear, and omitting one or including both won't bias the estimated relationship of X -> Y. Inclusion of both terms just increases the confidence interval via larger standard errors. So that isn't confounding.
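
The collinearity point can be sketched the same way. The assumptions here are mine: W is a noisy copy of X with no effect of its own on Y, the true coefficient on X is 2.0, and the sample size and noise levels are arbitrary. The sketch repeats the experiment many times and compares how much the X coefficient varies with and without W in the model.

```python
import random
import statistics

def cross(a, b):
    """Sum of cross-products of a and b around their means."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

def coef_x_alone(x, y):
    """Coefficient on x from the regression y ~ x."""
    return cross(x, y) / cross(x, x)

def coef_x_with_w(x, w, y):
    """Coefficient on x from the regression y ~ x + w (closed form)."""
    sxx, sww, sxw = cross(x, x), cross(w, w), cross(x, w)
    sxy, swy = cross(x, y), cross(w, y)
    return (sww * sxy - sxw * swy) / (sxx * sww - sxw ** 2)

def one_draw(rng, n=200):
    x = [rng.gauss(0, 1) for _ in range(n)]
    w = [a + rng.gauss(0, 0.1) for a in x]      # near-duplicate of x
    y = [2.0 * a + rng.gauss(0, 1) for a in x]  # assumed truth: only x matters
    return coef_x_alone(x, y), coef_x_with_w(x, w, y)

rng = random.Random(0)
draws = [one_draw(rng) for _ in range(500)]
alone = [d[0] for d in draws]
both = [d[1] for d in draws]

mean_alone, sd_alone = statistics.mean(alone), statistics.stdev(alone)
mean_both, sd_both = statistics.mean(both), statistics.stdev(both)

print("x only:  mean", round(mean_alone, 3), "sd", round(sd_alone, 3))
print("x and w: mean", round(mean_both, 3), "sd", round(sd_both, 3))
```

In this setup both versions should average close to 2.0, but the spread of the estimate should be several times larger when the near-duplicate W is included, which is the wider-confidence-interval effect described above.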

Wikipedia on OVB: "... an independent variable that is correlated with both the dependent variable and one or more of the included independent variables."

I would have to imagine your instructor is likely misrepresenting both issues. What type of class is this and at what level?
 

ondansetron

TS Contributor
#7
I think confounding and OVB at least can be the same, if they are not always the same.

The first example I learned of was a fractional factorial experimental design. The guy teaching it (PhD in stats from a good program, along with decades of consulting) explained a fractional factorial as making the grid of all possible combinations of treatments to assign participants to, but then leaving out certain treatment groups for some reason (maybe cost or lack of interest in a specific combination). He said the downside is that you may end up with confounded treatment effects due to omitting those portions of the factorial.

Example (what I recall off the top of my head; this is the gist of it, he used more factor-level combos):
Treatments A, B, and C
I could give: A, B, C, AB, AC, BC, ABC.

Let's say I leave out all of the 2-way interactions (AB, AC, BC) for cost and only assign and measure A, B, C, and ABC. I can get an estimate for the 3-way interaction; however, the 2-way interactions (such as the impact of A on Y depending on the level of B, independent of C) are confounded within the 3-way interaction. (The effect of A on Y depends on the specific B-C combination; for example, it could be thought of as the effect of A on Y depending on the effect of B on Y for a particular setting of C.) The ABC estimate may appear one way in this design, but if we included/measured AB, BC, and AC, we would then have the estimate of ABC after adjusting for the confounding that occurs by omitting AB, AC, and BC.

I think looking at the definition of confounding (obscuring [informal term] the relationship of one variable with the dependent variable because of unmeasured/improperly-measured variables), this fits well with omitted variable bias. I would suspect though, that to be more rigorous, one should show that there is a calculable bias in the estimators (i.e. beta coefficient(s)) to show that omitting the (confounding) variable causes the expected value of the estimator to differ from the true value of the parameter. Pretty sure OVB and confounding are the same (at least in many cases). I vote with @hlsmith. [And, I suppose, the omitted variable is correlated with an included variable and obviously is related to the dependent variable. This would show up doing the calculation to determine unbiasedness.]
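
For reference, the standard textbook version of that calculation can be sketched as follows. The symbols are my own notation, not from this thread: x is the included variable, z the omitted one, and the long model is assumed to be correctly specified with E[u | x, z] = 0.

```latex
% Long (true) model, assuming E[u | x, z] = 0
y = \beta_0 + \beta_1 x + \beta_2 z + u

% Short regression that omits z
y = \tilde{\beta}_0 + \tilde{\beta}_1 x + \tilde{u}

% Probability limit of the short-regression slope
\operatorname{plim}\, \tilde{\beta}_1
  = \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
```

The bias term vanishes only if the omitted variable has no effect on y (\beta_2 = 0) or is uncorrelated with the included variable (Cov(x, z) = 0), which matches the two conditions in the bracketed remark.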
 

hlsmith

Not a robit
#8
Interesting perspective @ondansetron - a couple of comments:

1.) Improperly-measured variables would not be confounding, but information bias (measurement error), and yes, that could lead to bias.

2.) Content knowledge is important, since there is a small chance that the confounder (common cause of X and Y) may have a directional impact on one variable and an opposite directional response on the other variable that perfectly masks the confounder and leads to no bias.

3.) I wonder what the three variable interaction model would look like if it was graphed? Not sure.
 

ondansetron

TS Contributor
#9
1) I disagree here, and Harrell makes a case for this when he points out that if you dichotomize a continuous independent variable, you can reduce confounding by also including the original continuous variable.
3) Assuming linearity, it wouldn't look different from a two-variable interaction, because the cross product of two could be conceptualized as a single variable to hold fixed (at different combos). I.e., the slope relating Y and X1 changes based on the specific combination of X2 and X3. If they're all quantitative, it's technically a third-order relationship between Y and (X1X2X3), where the most obvious case is when X1=X2=X3 and we get X1^3.
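
Written out, the slope described in point 3 looks like this. It is a sketch assuming a full linear model with all lower-order terms included; the coefficient subscripts are my own labeling.

```latex
% Full three-way interaction model (assumed form)
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3
  + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{23} X_2 X_3
  + \beta_{123} X_1 X_2 X_3 + \varepsilon

% Slope of Y in X_1 at a given (X_2, X_3) combination
\frac{\partial\, \mathrm{E}[Y]}{\partial X_1}
  = \beta_1 + \beta_{12} X_2 + \beta_{13} X_3 + \beta_{123} X_2 X_3
```

The last term is where the product X2X3 acts like a single variable that shifts the X1 slope.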
 

hlsmith

Not a robit
#10
@ondansetron - Can you provide the Harrell reference? The continuous version of X doesn't cause the binary version of X; it is the same variable reformatted, so it wouldn't be a confounder. I am not even sure it would bias the relationship, since it is just another, less informative estimate looking at a different identification of the relation. Is it a bad idea to categorize a continuous variable? Well, yeah, most of the time, unless the true underlying data-generating process includes a true composite distribution. It can also result in type 1 error at the hypothesis level, but if the question was what the relationship between X-binary and Y is, well, it is an unbiased estimate for that not-so-good question.