In statistical learning, is the learning function a random variable or a constant?


Consider a predictor x and a response Y, where the true relationship between them is given by Y = f(x) + e, with e a random error term (mean zero, independent of x).

A training data set (x_1, Y_1), ..., (x_n, Y_n) is collected and from this an estimated learning function f_hat is fitted. Then Y_hat = f_hat(x) becomes an estimate for the true response Y.

My question is about the derivation of the error of this estimate. This derivation shows that the total error can be divided into a reducible and irreducible component and can be summarized as ...

E[(Y - Y_hat)^2] = E[(f(x) - f_hat(x))^2] + Var(e).

For the reducible component, in some derivations I've seen, they simply write E(f(x) - f_hat(x))^2 = (f(x) - f_hat(x))^2.
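Spelled out, the expansion these derivations perform looks something like the following sketch, which assumes E[e] = 0 and that f_hat is held fixed (e.g., by conditioning on the training data):

```latex
\begin{aligned}
E\big[(Y - \hat{Y})^2\big]
  &= E\big[(f(x) + \varepsilon - \hat{f}(x))^2\big] \\
  &= \big(f(x) - \hat{f}(x)\big)^2
     + 2\big(f(x) - \hat{f}(x)\big)\,E[\varepsilon]
     + E[\varepsilon^2] \\
  &= \underbrace{\big(f(x) - \hat{f}(x)\big)^2}_{\text{reducible}}
     + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}},
\end{aligned}
```

where the cross term vanishes because E[ε] = 0, and E[ε²] = Var(ε) for a mean-zero error. Note the middle line only works if f_hat(x) can be pulled out of the expectation, i.e., if it is treated as a constant.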

This treats the fitted/estimated learning function f_hat(x) as a constant rather than a random variable. My question is: why is this the case?

The estimated learning function f_hat is constructed from a training data set which is a random sample since each response Y_i is a random variable. Therefore if you collected a different sample, you should get a different estimate for f_hat.

Shouldn't f_hat be a random variable then?
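(The sample-to-sample variability described above is easy to see in a quick simulation. This is purely illustrative, with a made-up linear f and noise level; only numpy is assumed.)

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical true regression function
    return 2.0 + 3.0 * x

x = np.linspace(0.0, 1.0, 50)  # fixed design points (the x's are held fixed)
x0 = 0.5                       # point at which we evaluate f_hat

# Refit f_hat on many independent training samples and record f_hat(x0)
preds = []
for _ in range(1000):
    y = f(x) + rng.normal(scale=1.0, size=x.size)  # fresh noise each sample
    b1, b0 = np.polyfit(x, y, 1)                   # least-squares line fit
    preds.append(b0 + b1 * x0)

preds = np.array(preds)
print(preds.std())   # strictly positive: f_hat(x0) varies across samples
print(preds.mean())  # centered near f(0.5) = 3.5 for this unbiased fit
```

So yes, before conditioning on a particular training set, f_hat(x0) has a whole sampling distribution; once a specific training set is fixed, f_hat(x0) is a single number.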

Appreciate any insight.


Active Member
We take a random sample and gather data on it (height, weight, occupation, residence, etc.). These are the x's, or inputs to the model. They are treated as fixed because we observe them directly. The parameters of the model (the beta coefficients) are also fixed, but unknown. We use statistics to study the parameters of a population/model and to characterize variability. The random, unexplained variation from sample to sample is captured in the error term.