Conditional Variance


I have difficulty understanding the definition of conditional variance as:

var(X|Y) = E[(X-E[X|Y])**2 | Y].

Given the definition of variance of a random variable X:

var(X) = E[(X-E[X])**2],

and substituting X with X|Y, you get:

var(X|Y) = E[(X|Y-E[X|Y])**2].

How can one get from this equation to the definition of the conditional variance given above? I would appreciate any help on this. Thank you!


TS Contributor
It seems that this is just a notation issue.

When we write a conditional expectation, we put the conditioning part at the end, inside the operator. The reason one can write something like \( X|Y \) is that the conditional distribution of \( X \) given \( Y \) can be regarded as a new distribution. One more property is worth emphasizing here:

\( E[E[X|Y]|Y] = E[X|Y] \)
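
As a rough illustration of regarding the conditional distribution as a new distribution, here is a minimal Python sketch for the discrete case. The joint pmf `joint` and the helper names `conditional_pmf`, `mean` and `variance` are made up for this example; they are assumptions, not something taken from the thread or the book.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X=x, Y=y), chosen only for illustration.
joint = {
    (0, 0): F(1, 8), (1, 0): F(3, 8),
    (0, 1): F(1, 4), (2, 1): F(1, 4),
}

def conditional_pmf(joint, y):
    """Return P(X=x | Y=y) as a dict: the 'new distribution' of X given Y=y."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

def mean(pmf):
    """Ordinary mean of a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Ordinary variance: average squared distance from the pmf's own mean."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

# For each y, the ordinary mean and variance of the conditional pmf are
# E[X | Y=y] and var(X | Y=y).
for y in (0, 1):
    pmf = conditional_pmf(joint, y)
    print(y, mean(pmf), variance(pmf))
```

Once the conditional pmf is in hand, nothing new is needed: the usual variance formula applied to it is the conditional variance for that value of y.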
Thanks BGM. But I don't seem to be able to get it. Assuming that this is a notation convention, why doesn't the convention apply to the second term E[X|Y]? I couldn't use the law of iterated expectations that you mentioned here either. Could you please elaborate? Sorry, I am new to the subject matter and am studying Dimitri Bertsekas's book, Introduction to Probability. Thanks!


TS Contributor
The reason we define the conditional variance like this is to mirror the way we define the ordinary variance:

\( Var[Z] = E[(Z - E[Z])^2] \)

So I would say it is a notation issue: the concept is, as you said, a "substitution" (although not really a substitution).

And of course we know that in general \( E[X|Y] \) and \( E[X] \) are different. The variance is the average squared difference from the distribution's own mean, which is why we use \( E[X|Y] \).

If you carefully expand the squared term, you get

\( E[(X - E[X|Y])^2|Y] \)

\( = E[X^2 - 2XE[X|Y] + E[X|Y]^2|Y] \)

\( = E[X^2|Y] - 2E[X|Y]E[X|Y] + E[X|Y]^2 \)

\( = E[X^2|Y] - E[X|Y]^2 \)


If we instead center at the unconditional mean \( E[X] \), we get

\( E[(X - E[X])^2|Y] \)

\( = E[X^2 - 2XE[X] + E[X]^2|Y] \)

\( = E[X^2|Y] - 2E[X]E[X|Y] + E[X]^2 \)

so indeed they are different in general: the second exceeds the first by \( (E[X|Y] - E[X])^2 \).
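
To see the two expansions side by side on numbers, here is a small Python sketch for the discrete case. The joint pmf `joint` and the helpers `cond_exp` and `three_ways` are hypothetical, invented only for this check.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X=x, Y=y), chosen only for illustration.
joint = {
    (0, 0): F(1, 8), (1, 0): F(3, 8),
    (0, 1): F(1, 4), (2, 1): F(1, 4),
}

def cond_exp(g, y):
    """E[g(X) | Y=y] under the joint pmf above."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(g(x) * p for (x, yy), p in joint.items() if yy == y) / p_y

# Unconditional mean E[X].
ex = sum(x * p for (x, _), p in joint.items())

def three_ways(y):
    """Return the three quantities compared in the expansions, for a given y."""
    m = cond_exp(lambda x: x, y)                       # E[X | Y=y]
    around_cond = cond_exp(lambda x: (x - m) ** 2, y)  # E[(X - E[X|Y])^2 | Y=y]
    shortcut = cond_exp(lambda x: x * x, y) - m ** 2   # E[X^2|Y=y] - E[X|Y=y]^2
    around_ex = cond_exp(lambda x: (x - ex) ** 2, y)   # E[(X - E[X])^2 | Y=y]
    return around_cond, shortcut, around_ex

for y in (0, 1):
    print(y, *three_ways(y))
```

For each y the first two printed values agree, while the third (centered at \( E[X] \)) is larger whenever \( E[X|Y=y] \) differs from \( E[X] \).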

If you still have questions, please say more about which part of the iterated expectation you are stuck on.
Thanks again! I understood your point about E[X|Y] in the definition. If you clarify one point for me, I will get the whole thing. In your first expansion, how did you calculate the second term? That is, how did you get from E[2XE[X|Y]|Y] to 2E[X|Y]E[X|Y]? Thanks!


TS Contributor
It is similar to something like

\( E[cX] = cE[X] \)

where \( c \) is a constant: we pull out the constant. More generally, this is the "taking out what is known" property of conditional expectation, \( E[g(Y)X|Y] = g(Y)E[X|Y] \).

\( E[X|Y] \) is \( \sigma(Y) \)-measurable, and in this sense you can treat it like a "constant" inside the conditional expectation \( E[\cdot|Y] \). But note that \( E[X|Y] \) is itself a random variable, not a constant.
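
For the discrete case the step can be written out directly. The following is only a sketch under the assumption that \( X \) and \( Y \) are discrete with conditional pmf \( p_{X|Y}(x|y) \); the shorthand \( g(y) = E[X|Y=y] \) is introduced here for convenience.

```latex
\begin{align*}
E\big[\,2X\,g(Y) \mid Y = y\,\big]
  &= \sum_x 2x\,g(y)\,p_{X|Y}(x \mid y) \\
  &= 2\,g(y) \sum_x x\,p_{X|Y}(x \mid y) \\
  &= 2\,g(y)\,E[X \mid Y = y] \\
  &= 2\,\big(E[X \mid Y = y]\big)^2 .
\end{align*}
```

Once \( Y = y \) is fixed, \( g(y) \) does not depend on the summation variable \( x \), so it comes out of the sum exactly like a constant. Since this holds for every \( y \), it gives \( E[2XE[X|Y]|Y] = 2E[X|Y]E[X|Y] \) as random variables.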