Help setting up a model with confounding variables


If I have a multiple regression model with two IVs, such as DV ~ IV1 + IV2 + IV1:IV2, what is the appropriate way to include a confounding variable (COV) in the model?

DV ~ IV1 + IV2 + IV1:IV2 + COV

or do I need to specify all interactions with the COV as well?

DV ~ IV1 + IV2 + COV + IV1:COV + IV2:COV + IV1:IV2 + IV1:IV2:COV
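For reference, the two candidate formulas differ only in the terms where COV interacts with the IVs. Below is a minimal sketch in base R, assuming the formulas are read as literal R syntax, that lists the terms each one contains. The names `f_additive` and `f_full` are made up for illustration. `IV1 * IV2 * COV` is R shorthand that expands to the same terms as the second formula.

```r
# Compare which terms each candidate formula puts in the model.
# No data are needed: terms() works on the symbolic formula alone.
f_additive <- DV ~ IV1 + IV2 + IV1:IV2 + COV
f_full     <- DV ~ IV1 * IV2 * COV   # all main effects plus all 2-way and 3-way interactions

terms_additive <- attr(terms(f_additive), "term.labels")
terms_full     <- attr(terms(f_full), "term.labels")

print(terms_additive)
print(terms_full)

# Terms that only the fully interacted model contains
extra_terms <- setdiff(terms_full, terms_additive)
print(extra_terms)
```

The question then comes down to whether those extra terms (COV modifying the effect of IV1, IV2, or their interaction) are plausible for the data at hand.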



Less is more. Stay pure. Stay poor.
What program are you using?

I have never used any COV. Typically you just put the base terms in the model along with their product to model the multiplicative interaction.
@hlsmith, sorry, that was supposed to be R-like syntax. DV is the dependent variable, IV an independent variable, and COV the confounding variable.

Say my data frame (df) looks like:
DV  IV1  IV2  COV
43  x    a    21
11  x    b    32
53  y    a    32
44  y    b    12

Then the R syntax:
lm(DV ~ IV1 + IV2 + IV1:IV2 + COV, data=df)
means main effects of IV1, IV2, and COV, plus the interaction between IV1 and IV2.
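To make the formula concrete, here is a small sketch that builds the four example rows and inspects the design matrix R would construct. It assumes the columns are in the order DV, IV1, IV2, COV and that IV1 and IV2 are factors; neither is stated explicitly above. Four rows are too few to actually fit this model (it has five coefficients), so the sketch only looks at the columns.

```r
# Example data; column order DV, IV1, IV2, COV is an assumption.
df <- data.frame(
  DV  = c(43, 11, 53, 44),
  IV1 = factor(c("x", "x", "y", "y")),
  IV2 = factor(c("a", "b", "a", "b")),
  COV = c(21, 32, 32, 12)
)

# Design matrix for the model with the IV1:IV2 interaction and COV as a main effect
X <- model.matrix(DV ~ IV1 + IV2 + IV1:IV2 + COV, data = df)
print(colnames(X))
print(X)

# The interaction column is the product of the two dummy-coded main-effect columns
interaction_is_product <- all(X[, "IV1y:IV2b"] == X[, "IV1y"] * X[, "IV2b"])
print(interaction_is_product)
```

With dummy coding, the interaction column is literally the product of the main-effect columns, which is what "multiplicative interaction" refers to.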

I'm relatively new to statistics. Could you please spell out what you mean by "put the base terms in the model along with their product to model the multiplicative interaction."? I suppose the base terms would be IV1 + IV2 + COV, right? Is the product the 2-way and 3-way interactions? And would you include interactions between the confounding variable and the predictors of interest in the model?



I totally botched my above reply. I must have been distracted and saw your use of ":" and got derailed.

If you think you have a potential confounder, you just add that term to your model; it is usually that easy, as in your first model. However, you seem to have both a suspected multiplicative interaction and a confounder. I have not dealt with this personally. I think you should draw out your model as a directed acyclic graph and see if you have any content knowledge to direct your model. Does the confounder (W in lieu of COV) interact with only one of the terms or both? Perhaps Andrew Hayes or Tyler VanderWeele have written on this topic.

You could play the game of running a bunch of models to try to tease it out, but purists may call that data mining and undirected analysis. Of note, what if the confounder has a positive relationship with one term and a negative one with the other? Will effects be masked or hidden?

A simulation may help with deciding what to model or how to interpret results, but that would require knowing what the relationships are. If you find something, please update this post; I would be interested in the solution.
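As a rough idea of what such a simulation could look like, here is a sketch in base R. Everything in it is assumed for illustration: a binary confounder w that raises the chance of x1 and also shifts y, a true x1-by-x2 interaction, and arbitrary effect sizes. It fits the model with and without w, then checks whether adding the w interactions improves on the main-effect adjustment.

```r
set.seed(1)
n  <- 5000

# Assumed data-generating process (hypothetical effect sizes)
w  <- rbinom(n, 1, 0.5)              # binary confounder
x1 <- rbinom(n, 1, 0.2 + 0.6 * w)    # w makes x1 more likely
x2 <- rbinom(n, 1, 0.5)              # independent of w
y  <- 1 + 2 * x1 + 1 * x2 + 1.5 * x1 * x2 + 3 * w + rnorm(n)
sim <- data.frame(y, x1, x2, w)

fit_unadjusted <- lm(y ~ x1 + x2 + x1:x2, data = sim)      # ignores w
fit_adjusted   <- lm(y ~ x1 + x2 + x1:x2 + w, data = sim)  # w as main effect
fit_full       <- lm(y ~ x1 * x2 * w, data = sim)          # w interacts with everything

# Estimated x1 coefficient (true value is 2 in this setup)
b_unadjusted <- unname(coef(fit_unadjusted)["x1"])
b_adjusted   <- unname(coef(fit_adjusted)["x1"])
print(c(unadjusted = b_unadjusted, adjusted = b_adjusted))

# Nested comparison: do the w interactions add anything beyond the main effect?
print(anova(fit_adjusted, fit_full))
```

In this setup w only shifts y and does not modify the other effects, so the main-effect adjustment should be enough; changing the line that generates y to include, say, a w * x1 term would show the other case.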


Thinking a little bit more about this: perhaps an option, if the confounder is binary, is to run two models, since one way to address a confounder's effect is to stratify the data.

y-hat = x1 + x2 + x1*x2, where w = 1.

y-hat = x1 + x2 + x1*x2, where w = 0.
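A sketch of the two stratified fits in base R, on simulated data with made-up effect sizes (the variable names follow the equations above; the data are not from this thread). It also illustrates that, for a binary w, the two stratified models reproduce the coefficients of the single model in which w interacts with every term, so stratifying is the same as fitting the fully interacted model in terms of point estimates.

```r
set.seed(2)
n  <- 400

# Simulated data; all coefficients are hypothetical
w  <- rbinom(n, 1, 0.5)
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + 0.8 * x1 * x2 + w * (1 + 0.4 * x1) + rnorm(n)
sim <- data.frame(y, x1, x2, w)

# One model per stratum of the confounder
fit_w0 <- lm(y ~ x1 + x2 + x1:x2, data = subset(sim, w == 0))
fit_w1 <- lm(y ~ x1 + x2 + x1:x2, data = subset(sim, w == 1))

# Single model with w interacting with every term
fit_full <- lm(y ~ x1 * x2 * w, data = sim)
cf <- coef(fit_full)

# Coefficients for w = 0 are the terms without w;
# coefficients for w = 1 add the matching w terms.
w0_from_full <- unname(cf[c("(Intercept)", "x1", "x2", "x1:x2")])
w1_from_full <- unname(cf[c("(Intercept)", "x1", "x2", "x1:x2")] +
                       cf[c("w", "x1:w", "x2:w", "x1:x2:w")])

print(rbind(stratified_w0 = unname(coef(fit_w0)), from_full_w0 = w0_from_full))
print(rbind(stratified_w1 = unname(coef(fit_w1)), from_full_w1 = w1_from_full))
```

The standard errors differ, though: the stratified fits estimate a separate residual variance per stratum, while the single model pools it.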