predictor 2 on the causal pathway between predictor 1 and outcome


This is a re-post of something I put in the 'statistics' forum a while ago. There were 75 views but no responses, so I thought I'd try again in the 'regression' forum. If anyone has any advice on this, I'd greatly appreciate it. Here's my previous post:


I'm putting together a model which has the purpose of predicting an outcome using a given set of predictor variables; i.e. the main purpose of the model is NOT to assess the individual contributions of each predictor.

It's a longitudinal multilevel model, but I think the essence of my problem is about the complicated relations between the outcome (y) and two predictors (x1, x2).

x1 is well known to affect y via various pathways. One of the paths is via x2; that is, x2 is on ONE of the causal paths between x1 and y. Additionally, x2 may also affect y independently of x1. I'm interested in estimating y given both x1 and x2.

My question is:
If I regress y on x1 and x2, can I then use that model to predict y given x1 and x2, or will the model parameters produce biased estimates?

(I realise you can use path analysis to look at the relations, but the longitudinal multilevel nature of the data makes this difficult. Also, as above, the main purpose of the model is to make 'correct' predictions of y).

Any advice would be greatly appreciated.


Thanks to all those who have taken a look at this post. I'm not sure whether there have been no responses because it's obvious the model will make reasonable estimates, obvious it will make biased estimates, a silly question, or no one knows.

My feeling is that it will make reasonable estimates with the parameter for x1 capturing effects on y not operating through x2, and the parameter for x2 capturing the effects of x1 on y via x2 as well as x2's own effects on y.

Here's a quote from Kirkwood and Sterne (2003) that's actually about confounding but that supports the above: "Note that a variable that is part of the causal chain leading from E to D is not a confounder. That is, if E affects C, which in turn affects D, then we should not adjust for the effect of C in our analysis of the E-D association (unless we wish to estimate the effect of E on D which is not caused by the E-C association)".

In my case: E is x1, C is x2, and D is y. I want to capture the combined effects of x1 and x2 on y and the final bracketed text in the above quote seems to suggest this is fine.
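The reasoning above can be checked on simulated data. What follows is only a minimal sketch under strong assumptions: a single-level linear model with no unmeasured confounding, ignoring the longitudinal multilevel structure entirely. The path values a, b and c and the helper `ols` are made up for illustration. Regressing y on x1 and x2 estimates the expected value of y given both predictors, which is what prediction needs, so in this setup the held-out prediction error should average close to zero even though the x1 coefficient is only the direct effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical path values, chosen only for illustration.
a, b, c = 0.8, 0.5, 0.3   # x1 -> x2, x2 -> y, direct x1 -> y

x1 = rng.normal(size=n)
x2 = a * x1 + rng.normal(size=n)           # x2 is partly caused by x1, partly independent
y = c * x1 + b * x2 + rng.normal(size=n)   # y depends on x1 directly and via x2

def ols(columns, target):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

# Fit on the first half, predict the second half.
train, test = slice(0, n // 2), slice(n // 2, n)

both = ols([x1[train], x2[train]], y[train])   # y ~ x1 + x2
only_x1 = ols([x1[train]], y[train])           # y ~ x1

pred = both[0] + both[1] * x1[test] + both[2] * x2[test]
mean_error = float(np.mean(y[test] - pred))

print("y ~ x1 + x2 slopes:", both[1:])   # expected to be near (c, b)
print("y ~ x1 slope:", only_x1[1])       # expected to be near c + a * b
print("mean held-out prediction error:", mean_error)
```

In this sketch the x1 slope in the joint model recovers the path that does not run through x2, the x2 slope recovers the effect of x2 on y with x1 held fixed, and the x1-only model recovers the total effect. None of this says how the result carries over to a multilevel model with repeated measures per country.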

If anyone has any thoughts on this I'd greatly appreciate hearing them.



Omega Contributor
I am guessing you have looked at mediation analysis. Within it there are ways to look at:

controlled direct effects
natural direct effects
natural indirect effects

I'm not sure how this plays out in a multilevel model. You mean clusters, right?

Thanks a lot for the response. That's right - I'd been trying to find the answer by looking at this. I understand how to test for mediation (both informally and formally) in a single-level model, but I'm not sure what to do in a multilevel model. The model is longitudinal, with level 1 being measurement occasion (i.e. time) and level 2 being country. At least informally (i.e. without formal tests of significance such as Sobel-Goodman) and theoretically, it meets the following criteria:

"A variable may be considered a mediator to the extent to which it carries the influence of a given independent variable (IV) to a given dependent variable (DV). Generally speaking, mediation can be said to occur when (1) the IV significantly affects the mediator, (2) the IV significantly affects the DV in the absence of the mediator, (3) the mediator has a significant unique effect on the DV, and (4) the effect of the IV on the DV shrinks upon the addition of the mediator to the model."
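The four criteria in the quote can be laid out as a set of regressions. This is a rough single-level sketch on simulated data, not a multilevel analysis: it reports coefficients only and runs no significance tests, and the function names `slopes` and `mediation_steps` and the simulated path values are hypothetical.

```python
import numpy as np

def slopes(predictors, target):
    """OLS with an intercept; returns one slope per predictor."""
    X = np.column_stack([np.ones(len(target)), *predictors])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[1:]

def mediation_steps(iv, mediator, dv):
    """Coefficients behind the four informal mediation criteria."""
    a = slopes([iv], mediator)[0]            # (1) IV -> mediator
    total = slopes([iv], dv)[0]              # (2) IV -> DV without the mediator
    direct, b = slopes([iv, mediator], dv)   # (3) b: mediator's unique effect
                                             # (4) direct: IV effect after adding the mediator
    return {"a": a, "total": total, "b": b,
            "direct": direct, "indirect": a * b}

# Simulated example with made-up path values.
rng = np.random.default_rng(1)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = 0.3 * x1 + 0.5 * x2 + rng.normal(size=n)

steps = mediation_steps(x1, x2, y)
for name, value in steps.items():
    print(f"{name}: {value:.3f}")
```

With ordinary least squares fitted on the same sample, the total effect equals the direct effect plus a times b, so criterion (4) amounts to the indirect term a times b being non-zero with the same sign as the total effect. That identity is a property of single-level OLS and should not be assumed to hold exactly in a multilevel model.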

So, I think I'm OK, but it's surprisingly hard to find the answer. As I say, as long as ultimately using the two predictors together gives reasonable estimates of the outcome (and correlations between observed and predicted values suggest this is the case), I'm OK.



Omega Contributor
I guess another question would be whether the relationship is an interaction rather than mediation. If the former, you may need an interaction term. I bet you can find your answer if you keep looking; you may have to go to peer-reviewed journals.

Also, it may be interesting to explore the relationship with more basic procedures, just looking at single time points.

Thanks again. A good question - at least theoretically it should be mediation rather than interaction. I've tested some interaction models and this also seems to suggest mediation.
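One informal way to run that kind of interaction check is to add an x1 * x2 product term and see whether it takes a non-trivial coefficient or noticeably improves the fit. The sketch below is a single-level illustration on simulated data generated without any interaction, so the product term is expected to come out near zero. The helper `fit` and all numeric values are assumptions, and no formal test is performed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = 0.3 * x1 + 0.5 * x2 + rng.normal(size=n)   # generated with no x1 * x2 term

def fit(columns, target):
    """OLS with an intercept; returns (coefficients, residual sum of squares)."""
    X = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    return coef, rss

additive_coef, additive_rss = fit([x1, x2], y)            # y ~ x1 + x2
inter_coef, inter_rss = fit([x1, x2, x1 * x2], y)         # y ~ x1 + x2 + x1:x2

interaction_slope = float(inter_coef[3])
rss_reduction = (additive_rss - inter_rss) / additive_rss

print("interaction slope:", interaction_slope)
print("relative reduction in residual sum of squares:", rss_reduction)
```

A product term near zero and a negligible drop in residual sum of squares are consistent with an additive (mediation-style) structure here, although with real panel data the comparison would need to respect the clustering by country.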

A good idea to dive more deeply into journals. Also an interesting idea to try single time points - it's an unbalanced panel, but I could block off particular time periods.