Controlling for a Variable that directly affects both independent and dependent var

Sorry, this may be a very basic question, but I was wondering what statistical method I can use to control for this.

Let's say I am measuring whether X is strongly correlated with Y.

I measure X, and I measure Y.

However, let's say that we have another variable Z: the higher the value of X, the higher the value of Z, but Z can also contribute to increasing Y.

How would I control for Z in this hypothetical study?

For example, let's say I am measuring the concentration of a bacterium in the gut and seeing if it is related to a quantifiable quality-of-life scale. Let's say that being overweight also directly increases the scores on that scale.

Let's also say that the higher the concentration of bacteria in your gut, the more overweight you are. (So the concentration of bacteria leads to increases in both weight and the score on the scale.) How do I control for being overweight?

Thank you very much!


Re: Controlling for a Variable that directly affects both independent and dependent v

What is typically done in this case is to add the confounding variable to the model: you measure weight and include it as a predictor. Under certain assumptions, this lets you estimate the relation with the independent variable while controlling for the confounding variable. Assuming linear regression (the principle is the same with any approach), you might get:

[math]Y= a + bX + cZ + \epsilon[/math]

Where Y is your dependent variable, X is your independent variable, Z is your confounder, a, b and c are coefficients, and [math]\epsilon[/math] is the error term. Each slope coefficient (b and c) gives you the relation between the corresponding variable and the dependent variable, holding the other variable constant. So mathematically the model treats X and Z identically; by themselves, the results tell you nothing about the causal interpretation.
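To make the model above concrete, here is a minimal sketch in Python using NumPy. The data are simulated and every number (sample size, coefficients, noise) is invented purely for illustration; the helper `ols` is a hypothetical name, not part of any library. It fits Y on X alone and then on X and Z together, so you can see how the coefficient b changes once Z is in the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data; all values below are assumptions made for illustration.
z = rng.normal(size=n)                             # confounder, e.g. weight
x = 0.8 * z + rng.normal(size=n)                   # independent variable, correlated with z
y = 1.0 + 0.5 * x + 2.0 * z + rng.normal(size=n)   # dependent variable

def ols(outcome, *predictors):
    """Return least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

a_unadj, b_unadj = ols(y, x)          # Y = a + bX
a_adj, b_adj, c_adj = ols(y, x, z)    # Y = a + bX + cZ

print(f"b without Z in the model: {b_unadj:.2f}")
print(f"b with Z in the model:    {b_adj:.2f} (c = {c_adj:.2f})")
```

In this simulation the slope on X is inflated when Z is left out, because X is then also picking up part of Z's effect on Y. With Z included, b lands close to the 0.5 used to generate the data. This only shows the arithmetic of adjustment; it says nothing about which causal story is correct.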

For instance, when we speak about causation, there is a difference between a confounding and a mediating variable, which you seem to hint at in your last paragraph. If the influence of the independent variable is hypothesized to go via another variable, then you might want to look at mediation analysis. See here for a nice blog explaining the difference.
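As a rough illustration of the mediation idea, the sketch below uses the simple product-of-coefficients decomposition for linear models. It assumes, purely for the sake of the example, that X affects Y both directly and through Z; the simulated numbers and the helper name `slopes` are hypothetical. A real mediation analysis would also need uncertainty estimates (for example by bootstrapping) and relies on assumptions about unmeasured confounding that this sketch ignores.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Simulated chain X -> Z -> Y plus a direct X -> Y path (made-up numbers).
x = rng.normal(size=n)                        # e.g. bacterial concentration
z = 0.8 * x + rng.normal(size=n)              # mediator, e.g. weight
y = 0.5 * x + 2.0 * z + rng.normal(size=n)    # outcome, e.g. quality-of-life score

def slopes(outcome, *predictors):
    """Return least-squares slopes (intercept fitted but dropped)."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1:]

(total,) = slopes(y, x)            # total effect: Y regressed on X only
(a_path,) = slopes(z, x)           # X -> Z path
direct, b_path = slopes(y, x, z)   # direct X -> Y path and Z -> Y path
indirect = a_path * b_path         # effect of X on Y that runs through Z

print(f"total    = {total:.2f}")
print(f"direct   = {direct:.2f}")
print(f"indirect = {indirect:.2f}")
```

For ordinary least squares the total slope equals the direct slope plus the indirect product, so the regression with Z included isolates the part of the X effect that does not run through Z. Whether Z should be read as a confounder or a mediator is a causal assumption you bring to the data; the regression output is the same either way.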