Conditional expectation

Hi everyone,
I've usually seen the conditional mean of X_n given X_1, X_2, ..., X_{n-1} expressed as shown on p. 5 of:
or in Appendix B of Greene's Econometrics textbook.

In a Journal of Finance article, I've recently come across this alternative way of expressing the conditional mean:

E(X_n|X_1,X_2) = E(X_n) + {Cov[X_n - E(X_n), X_2|X_1]/Var(X_2|X_1)}*[X_2 - E(X_2|X_1)]

All variables are normal.

Could anyone help me understand how they got there?

Many thanks in advance.



TS Contributor
First of all, you need the standard result for the conditional distribution of a multivariate normal:
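
For reference, the result usually meant here is the partitioned multivariate normal formula. A sketch in generic block notation (the block labels a and b are introduced here only for illustration, and [math] \Sigma_{bb} [/math] is assumed invertible): if

[math] \begin{bmatrix} \mathbf{X}_a \\ \mathbf{X}_b \end{bmatrix} \sim \mathcal{N}
\left(\begin{bmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{bmatrix},
\begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix} \right) [/math]

then

[math] \mathbf{X}_a | \mathbf{X}_b = \mathbf{x}_b \sim \mathcal{N}
\left( \boldsymbol{\mu}_a + \Sigma_{ab}\Sigma_{bb}^{-1}(\mathbf{x}_b - \boldsymbol{\mu}_b),\;
\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} \right) [/math]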

Intuition: in the bivariate case, it is easy to see that

[math] E[X_n|X_2 = x_2] = E[X_n] + \frac {Cov[X_n, X_2]} {Var[X_2]} (x_2 - E[X_2]) [/math]

which is a standard result seen in regression.

Since this result holds [math] \forall x_2 \in \mathbb{R} [/math], we have

[math] E[X_n|X_2] = E[X_n] + \frac {Cov[X_n, X_2]} {Var[X_2]} (X_2 - E[X_2]) [/math]

and it is tempting to condition every term on [math] X_1 [/math] to reach the conclusion. To verify this, we can do the following calculation.

To shorten the notation, first we write

[math] \begin{bmatrix} X_1 \\ X_2 \\ X_n \end{bmatrix} \sim \mathcal{N}
\left(\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_n \end{bmatrix},
\begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{1n} \\
\sigma_{12} & \sigma_2^2 & \sigma_{2n} \\
\sigma_{1n} & \sigma_{2n} & \sigma_n^2 \end{bmatrix} \right)[/math]

Then following the formula,

[math] E[X_n|X_1 = x_1, X_2 = x_2] [/math]

[math] = \mu_n + \begin{bmatrix} \sigma_{1n} & \sigma_{2n} \end{bmatrix}
\begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}^{-1}
\begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} [/math]

[math] = \mu_n + \begin{bmatrix} \sigma_{1n} & \sigma_{2n} \end{bmatrix}
\frac {1} {\sigma_1^2\sigma_2^2 - \sigma_{12}^2}\begin{bmatrix} \sigma_2^2 & -\sigma_{12} \\ -\sigma_{12} & \sigma_1^2 \end{bmatrix}\begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} [/math]

[math] = \mu_n + \frac {(\sigma_{1n}\sigma_2^2 - \sigma_{2n}\sigma_{12})(x_1 - \mu_1) + (\sigma_{2n}\sigma_1^2 - \sigma_{1n}\sigma_{12})(x_2 - \mu_2)} {\sigma_1^2\sigma_2^2 - \sigma_{12}^2} [/math]

On the other hand, the covariance matrix of [math] X_2, X_n|X_1 [/math] is

[math] \begin{bmatrix} \sigma_2^2 & \sigma_{2n} \\ \sigma_{2n} & \sigma_n^2 \end{bmatrix} -
\begin{bmatrix} \sigma_{12} \\ \sigma_{1n} \end{bmatrix}
\begin{bmatrix} \sigma_1^2 \end{bmatrix}^{-1}
\begin{bmatrix} \sigma_{12} & \sigma_{1n} \end{bmatrix} [/math]

[math] = \begin{bmatrix}
\displaystyle \sigma_2^2 - \frac {\sigma_{12}^2} {\sigma_1^2} &
\displaystyle \sigma_{2n} - \frac {\sigma_{12}\sigma_{1n}} {\sigma_1^2} \\
\displaystyle \sigma_{2n} - \frac {\sigma_{1n}\sigma_{12}} {\sigma_1^2} &
\displaystyle \sigma_n^2 - \frac {\sigma_{1n}^2}{\sigma_1^2} \end{bmatrix} [/math]

which means that

[math] Cov[X_2, X_n|X_1] = \sigma_{2n} - \frac {\sigma_{12}\sigma_{1n}} {\sigma_1^2} [/math]

Now consider

[math] \frac {Cov[X_n - \mu_n, X_2|X_1]} {Var[X_2|X_1]} \times (X_2 - E[X_2|X_1]) [/math]

[math] = \frac {\displaystyle \sigma_{2n} - \frac {\sigma_{12}\sigma_{1n}} {\sigma_1^2}}
{\displaystyle \sigma_2^2 - \frac {\sigma_{12}^2} {\sigma_1^2}}
\times \left\{X_2 - \left[\mu_2 + \frac {\sigma_{12}} {\sigma_1^2}(X_1 - \mu_1) \right]\right\} [/math]

[math] = \frac {\sigma_{2n}\sigma_1^2 - \sigma_{12}\sigma_{1n}} {\sigma_1^2\sigma_2^2 - \sigma_{12}^2}
\times \left[(X_2 - \mu_2) - \frac {\sigma_{12}} {\sigma_1^2}(X_1 - \mu_1) \right][/math]

[math] = \frac {\displaystyle \left(\frac {\sigma_{12}^2\sigma_{1n}} {\sigma_1^2} - \sigma_{2n}\sigma_{12}\right)(X_1 - \mu_1) + (\sigma_{2n}\sigma_1^2 - \sigma_{12}\sigma_{1n})(X_2 - \mu_2)} {\sigma_1^2\sigma_2^2 - \sigma_{12}^2} [/math]


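From the bivariate result applied to [math] X_1, X_n [/math], we also have

[math] E[X_n|X_1] = \mu_n + \frac {\sigma_{1n}} {\sigma_1^2} (X_1 - \mu_1) [/math]

Combining the two (note that the leading term is [math] E[X_n|X_1] [/math], which reduces to the [math] E(X_n) [/math] written in the question only when [math] \sigma_{1n} = 0 [/math]),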

[math] E[X_n|X_1] + \frac {Cov[X_n - \mu_n, X_2|X_1]} {Var[X_2|X_1]} \times (X_2 - E[X_2|X_1]) [/math]

[math] = \mu_n + \frac {(\sigma_{1n}\sigma_2^2 - \sigma_{2n}\sigma_{12})(X_1 - \mu_1) + (\sigma_{2n}\sigma_1^2 - \sigma_{12}\sigma_{1n})(X_2 - \mu_2)} {\sigma_1^2\sigma_2^2 - \sigma_{12}^2} [/math]

which is the same as the expression calculated above.
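
If you would rather not trust the algebra alone, here is a small numerical sanity check in plain Python. It is only a sketch: the means, variances and covariances are arbitrary illustrative values (not taken from the article), and the function names `direct` and `sequential` are hypothetical.

```python
# Numerical check that the two expressions for E[X_n | X_1, X_2] agree.
# All parameter values are arbitrary assumptions, chosen so that the
# 3x3 covariance matrix is positive definite.

mu1, mu2, mun = 0.5, -1.0, 2.0
s11, s22, snn = 2.0, 1.5, 3.0      # sigma_1^2, sigma_2^2, sigma_n^2
s12, s1n, s2n = 0.6, -0.4, 0.9     # sigma_12, sigma_1n, sigma_2n


def direct(x1, x2):
    """E[X_n | X_1 = x1, X_2 = x2] from the partitioned-normal formula."""
    det = s11 * s22 - s12 ** 2
    w1 = (s1n * s22 - s2n * s12) / det
    w2 = (s2n * s11 - s1n * s12) / det
    return mun + w1 * (x1 - mu1) + w2 * (x2 - mu2)


def sequential(x1, x2):
    """E[X_n | X_1] plus the conditional regression term on X_2."""
    e_n_given_1 = mun + s1n / s11 * (x1 - mu1)
    e_2_given_1 = mu2 + s12 / s11 * (x1 - mu1)
    cov_2n_given_1 = s2n - s12 * s1n / s11
    var_2_given_1 = s22 - s12 ** 2 / s11
    return e_n_given_1 + cov_2n_given_1 / var_2_given_1 * (x2 - e_2_given_1)


# Compare the two expressions at a few arbitrary points.
points = [(0.0, 0.0), (1.3, -2.1), (-0.7, 4.2)]
max_gap = max(abs(direct(a, b) - sequential(a, b)) for a, b in points)
print(max_gap)  # should be zero up to floating-point rounding
```

Replacing `e_n_given_1` with `mun` in `sequential` makes the two functions disagree whenever `s1n` is nonzero, which is a quick way to see why the leading term has to be the conditional mean.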

Thank you so much. This is incredibly helpful.
Is this result also valid when X_2 is a single variable while X_1 is a vector of two (or more) variables?

Thanks again.



TS Contributor
The answer is yes. The above calculation is a little tedious, but it helps you verify the simplest case.

The reason is that we can regard the conditional probability measure as a new measure, and the good thing is that the conditional joint distribution of

[math] X_2, X_n|\mathcal{F}(\mathbf{X}) [/math]

(where [math] \mathbf{X} [/math] is the vector of conditioning variables) is still bivariate normal. So you can still apply the formula for the bivariate normal, but in a conditional fashion.
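
Written out, the general statement would look roughly as follows. This is a sketch assuming [math] (\mathbf{X}_1, X_2, X_n) [/math] is jointly normal with [math] Var[X_2|\mathbf{X}_1] > 0 [/math]; the block symbols [math] \Sigma_{11}, \Sigma_{21}, \Sigma_{1n} [/math] are notation introduced here, not taken from the article:

[math] E[X_n|\mathbf{X}_1, X_2] = E[X_n|\mathbf{X}_1] + \frac {Cov[X_n, X_2|\mathbf{X}_1]} {Var[X_2|\mathbf{X}_1]} (X_2 - E[X_2|\mathbf{X}_1]) [/math]

with the conditional moments given by the partitioned-normal formula, e.g.

[math] Cov[X_2, X_n|\mathbf{X}_1] = \sigma_{2n} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{1n}, \quad
Var[X_2|\mathbf{X}_1] = \sigma_2^2 - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} [/math]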


TS Contributor
It gets even more tedious when you include one more term, so I have not checked the details :p, but the overall form should be all right.

The reason I did not mention the measure-theoretic view at first was to let you work through an elementary calculation and get the intuition behind it. I know that material is not so easy to understand.

The key thing is that you just regard the conditional joint pdf of [math] X_2, X_n|X_1 = x_1[/math]

as the joint pdf of a pair of new variables [math] X_2^*, X_n^* [/math]

i.e. define

[math] f_{X_2^*, X_n^*}(x_2, x_n) = f_{X_2, X_n|X_1 = x_1}(x_2, x_n|x_1) [/math]

(the [math] x_1 [/math] just acts as a parameter)

The pair of random variables [math] X_2^*, X_n^* [/math] then jointly follows the distribution given by this joint pdf. We know that it is bivariate normal, so it shares the properties of any other bivariate normal.

That's why we can generalize: the multivariate normal is closed under conditioning :)
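
Since the details of the vector case were not checked by hand, here is a numerical sketch of it in plain Python, with X_1 taken to be a pair (X_a, X_b). Everything specific is an assumption for illustration: the 4x4 covariance matrix and mean vector are arbitrary values, and the helpers `solve`, `cond_mean` and `cond_cov` are hypothetical names, not from any library.

```python
# Check E[Xn | X1, X2] = E[Xn | X1] + Cov[Xn, X2 | X1] / Var[X2 | X1] * (X2 - E[X2 | X1])
# when X1 = (Xa, Xb) is a vector. Variable order: Xa, Xb, X2, Xn.
A, B, TWO, N = 0, 1, 2, 3
mean = [0.5, 1.0, -1.0, 2.0]            # arbitrary illustrative means
S = [[2.0, 0.5, 0.3, -0.4],             # arbitrary symmetric, strictly
     [0.5, 1.0, 0.2, 0.1],              # diagonally dominant matrix,
     [0.3, 0.2, 1.5, 0.6],              # hence positive definite
     [-0.4, 0.1, 0.6, 3.0]]


def solve(a, b):
    """Solve a x = b for a small square system by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def sub(rows, cols):
    """Sub-block of the covariance matrix S."""
    return [[S[i][j] for j in cols] for i in rows]


def cond_mean(target, given, x):
    """E[X_target | X_given = x] from the partitioned-normal formula."""
    d = [xi - mean[g] for xi, g in zip(x, given)]
    return mean[target] + dot([S[target][g] for g in given],
                              solve(sub(given, given), d))


def cond_cov(i, j, given):
    """Cov[X_i, X_j | X_given], i.e. one entry of the Schur complement."""
    w = solve(sub(given, given), [S[g][j] for g in given])
    return S[i][j] - dot([S[i][g] for g in given], w)


def direct(xa, xb, x2):
    return cond_mean(N, [A, B, TWO], [xa, xb, x2])


def sequential(xa, xb, x2):
    x1 = [xa, xb]
    slope = cond_cov(TWO, N, [A, B]) / cond_cov(TWO, TWO, [A, B])
    return cond_mean(N, [A, B], x1) + slope * (x2 - cond_mean(TWO, [A, B], x1))


points = [(0.0, 0.0, 0.0), (1.2, -0.3, 2.5), (-2.0, 0.7, -1.1)]
max_gap = max(abs(direct(*p) - sequential(*p)) for p in points)
print(max_gap)  # should be zero up to floating-point rounding
```

The same helpers work for a longer X_1: only the index list `[A, B]` and the matrix size change, which mirrors the point that conditioning on X_1 just produces another bivariate normal for the remaining pair.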