I have difficulty understanding the definition of conditional variance as:

var(X|Y) = E[(X-E[X|Y])**2 | Y].

Given the definition of variance of a random variable X:

var(X) = E[(X-E[X])**2],

and substituting X with X|Y, you get:

var(X|Y) = E[(X|Y-E[X|Y])**2].

How can one get from this equation to the definition of the conditional variance given above? I would appreciate any help on this. Thank you!

When you write a conditional expectation, the conditioning part goes at the end of the operator. The reason we write something like $E[(X - E[X \mid Y])^2 \mid Y]$ is that we can just regard the conditional distribution as a new distribution. Maybe just one more property to emphasize here, the law of iterated expectations: $E[E[X \mid Y]] = E[X]$.
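To make the law of iterated expectations concrete, here is a minimal sketch that checks it on a small discrete example with exact fractions. The joint pmf `joint` and the helper names `marginal_y` and `cond_exp` are made up for illustration; they are not from Bertsekas's book.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the numbers are invented for illustration.
joint = {
    (0, 0): Fraction(1, 4),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8),
    (3, 1): Fraction(3, 8),
}

def marginal_y(joint):
    """Return the marginal pmf p(y) as a dict."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, Fraction(0)) + p
    return p_y

def cond_exp(joint, g, y):
    """E[g(X) | Y = y]: average g(X) under the conditional pmf p(x | y)."""
    p_y = marginal_y(joint)[y]
    return sum(g(x) * p for (x, yy), p in joint.items() if yy == y) / p_y

p_y = marginal_y(joint)

# E[X], computed directly from the joint pmf.
e_x = sum(x * p for (x, _), p in joint.items())

# E[E[X | Y]]: average the conditional means over the distribution of Y.
e_of_cond = sum(cond_exp(joint, lambda x: x, y) * p for y, p in p_y.items())

print(e_x, e_of_cond)
```

Note that `cond_exp` is just an ordinary expectation taken under a different distribution, which is the sense in which the conditional distribution can be regarded as a new distribution.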

Thanks BGM. But I don't seem to be able to get it. Assuming that this is a notation convention, why doesn't the convention apply to the second term E[X|Y]? I couldn't use the law of iterated expectations that you mentioned here either. Could you please elaborate? Sorry, I am new to the subject matter and am studying Dimitri Bertsekas's book, Introduction to Probability. Thanks!

The reason we define the conditional variance like this is to mirror the way we define the ordinary variance, $\mathrm{Var}[X] = E[(X - E[X])^2]$.

So I would say it is a notational issue; the concept is, as you said, a "substitution" (although not really a substitution).

And of course we know that in general $E[X]$ and $E[X \mid Y]$ are different. The conditional variance measures the average squared difference from its own (conditional) mean, so that is why we use $E[X \mid Y]$.

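If you carefully expand out the squared term, we have

$$E[(X - E[X \mid Y])^2 \mid Y] = E[X^2 \mid Y] - E[2X\,E[X \mid Y] \mid Y] + E[E[X \mid Y]^2 \mid Y] = E[X^2 \mid Y] - 2E[X \mid Y]\,E[X \mid Y] + E[X \mid Y]^2 = E[X^2 \mid Y] - E[X \mid Y]^2,$$

whereas

$$E[(X - E[X])^2 \mid Y] = E[X^2 \mid Y] - 2E[X]\,E[X \mid Y] + E[X]^2,$$

so indeed they are different.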
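As a numerical illustration of the difference between centering at the conditional mean and centering at the unconditional mean, here is a minimal sketch on a small discrete example with exact fractions. The joint pmf `joint` and the helper `cond_exp` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the numbers are invented for illustration.
joint = {
    (0, 0): Fraction(1, 4),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8),
    (3, 1): Fraction(3, 8),
}

def cond_exp(g, y):
    """E[g(X) | Y = y] under the conditional pmf p(x | y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(g(x) * p for (x, yy), p in joint.items() if yy == y) / p_y

# Unconditional mean E[X].
e_x = sum(x * p for (x, _), p in joint.items())

results = {}
for y in sorted({yy for (_, yy) in joint}):
    m = cond_exp(lambda x: x, y)                           # E[X | Y = y]
    second_moment = cond_exp(lambda x: x * x, y)           # E[X^2 | Y = y]
    cond_var = cond_exp(lambda x: (x - m) ** 2, y)         # centered at E[X | Y = y]
    about_global = cond_exp(lambda x: (x - e_x) ** 2, y)   # centered at E[X]

    # The two expanded forms of the squared term.
    assert cond_var == second_moment - m ** 2
    assert about_global == second_moment - 2 * e_x * m + e_x ** 2

    results[y] = (cond_var, about_global)
    print(y, cond_var, about_global)
```

For each value of y the two quantities differ by exactly (E[X|Y=y] - E[X])**2, so they agree only when the conditional mean happens to equal the unconditional mean.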

If you still have questions, you may need to say more about which part of the iterated expectation you are stuck on.

Thanks again! I understood your point about E[X|Y] in the definition. If you clarify one point to me, I will get the whole thing. In your first expansion, how did you calculate the second term? Meaning, how did you get from E[2XE[X|Y]|Y] to 2E[X|Y]E[X|Y]? Thanks!

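Conditional on $Y = y$, we have $E[2X\,E[X \mid Y = y] \mid Y = y] = 2E[X \mid Y = y]\,E[X \mid Y = y]$, where $E[X \mid Y = y]$ is a constant. We pull out the constant. This can also be interpreted more generally through the "taking out what is known" property of conditional expectation, which is usually listed alongside the tower property.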

$E[X \mid Y]$ is $\sigma(Y)$-measurable, and in this sense you can treat it like a "constant" inside the conditional expectation $E[\,\cdot \mid Y]$. But you should note that $E[X \mid Y]$ itself is a random variable, not a constant.
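For the discrete case, the "pulling out" step can be written explicitly. This is a sketch under the assumption that X and Y are discrete with conditional pmf $p_{X|Y}(x \mid y)$; the function $g$ is a placeholder introduced here for any function of Y:

```latex
\begin{align*}
E\big[X\,g(Y) \,\big|\, Y = y\big]
  &= \sum_x x\,g(y)\,p_{X|Y}(x \mid y) \\
  &= g(y) \sum_x x\,p_{X|Y}(x \mid y) \\
  &= g(y)\,E[X \mid Y = y].
\end{align*}
```

Taking $g(y) = E[X \mid Y = y]$ gives $E[X\,E[X \mid Y] \mid Y = y] = E[X \mid Y = y]^2$ for every y, which is the step from E[2XE[X|Y]|Y] to 2E[X|Y]E[X|Y].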