I am a complete beginner, so sorry for the very simple questions.

First, I have a very general question about linear regression.

Here is my understanding of it so far, put in a simple example:

I have a set of data points x_i for 1 <= i <= 10, for instance the ages of ten people. I want to predict the income of these people, and I assume there is a linear relation between age and income (income here would be y_i for 1 <= i <= 10). Now, in my model, I have to find a suitable function f (the "response function") that, for instance, minimizes some notion of "error", i.e. the difference between the output values of my response function and the real values of the income.
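For the usual choice of "error" (ordinary least squares), the fitting step can be sketched as below. This is a minimal illustration, assuming f(x) = b0 + b1 * x and squared differences; the ages and incomes are made-up numbers, not real data, and `fit_line` is a hypothetical helper name.

```python
# Ordinary least squares with one predictor: f(x) = b0 + b1 * x.
# The ages and incomes below are invented for illustration only.
ages = [23, 25, 31, 35, 38, 42, 47, 51, 56, 60]                         # x_i, i = 1..10
incomes = [28.0, 31.5, 36.0, 41.0, 43.5, 47.0, 52.5, 55.0, 58.5, 61.0]  # y_i (thousands)


def fit_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_line(ages, incomes)
fitted = [b0 + b1 * x for x in ages]
# Residuals compare the fitted values with the *observed* incomes y_i.
residuals = [y - f for y, f in zip(incomes, fitted)]
sse = sum(r * r for r in residuals)
print(f"f(x) = {b0:.3f} + {b1:.3f} * x, sum of squared residuals = {sse:.3f}")
```

Note that the only "real values" used here are the observed y_i in the data set.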

Now my first question: how do I know any real values? As I remember from statistics, one usually only has estimates of the real values (for instance, an estimate of some expected value), not the real values themselves. So does the notion of "error", which relies on the notion of real values, make sense in this context at all?
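One way to see the distinction behind this question is a small simulation. Below, the "true" line (`TRUE_B0`, `TRUE_B1`) and the noise level are assumptions chosen purely for illustration. Because we generate the data ourselves, we can compute both the *errors* (observed y minus the true line, normally unknowable) and the *residuals* (observed y minus the fitted line, computable from data alone). Least squares only ever uses the residuals.

```python
import random

random.seed(0)
TRUE_B0, TRUE_B1 = 10.0, 0.8  # assumed "true" line, known only because we simulate
xs = [20 + 4 * i for i in range(10)]
ys = [TRUE_B0 + TRUE_B1 * x + random.gauss(0, 3) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for f(x) = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


b0, b1 = fit_line(xs, ys)
# Errors need the (normally unknown) true line.
errors = [y - (TRUE_B0 + TRUE_B1 * x) for x, y in zip(xs, ys)]
# Residuals need only the observed data and the fitted line.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print("estimated line:", round(b0, 3), round(b1, 3))
print("sum of squared errors:   ", round(sum(e * e for e in errors), 3))
print("sum of squared residuals:", round(sum(r * r for r in residuals), 3))
```

Since least squares picks the line with the smallest sum of squared residuals among all lines, including the true one, the residual sum can never exceed the error sum.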

My second question: I have read on Wikipedia that regression can be used for prediction. Elsewhere, though, I have read that residuals cannot be used for prediction when given new data points, which is why, for instance, cross-validation is used. Which of these is true?
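A sketch of how the two statements fit together: the fitted line *is* used for prediction (plug a new x into f), but the residuals on the data used for fitting tend to understate how well f predicts new points, because the line was chosen to fit exactly those points. Leave-one-out cross-validation, shown below as one possible variant, estimates out-of-sample error by refitting without each point and predicting it. The simulated data, the new age of 45, and the helper names are assumptions for illustration only.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for f(x) = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


def mse(xs, ys, b0, b1):
    """Mean squared difference between observed ys and the line's outputs."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loocv_mse(xs, ys):
    """Leave-one-out CV: fit on n-1 points, predict the held-out point."""
    sq = []
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        sq.append((ys[i] - (b0 + b1 * xs[i])) ** 2)
    return sum(sq) / len(sq)


random.seed(1)
xs = [20 + 4 * i for i in range(10)]                          # simulated ages
ys = [10.0 + 0.8 * x + random.gauss(0, 3) for x in xs]        # simulated incomes

b0, b1 = fit_line(xs, ys)
train_mse = mse(xs, ys, b0, b1)   # in-sample (residual-based) error
cv_mse = loocv_mse(xs, ys)        # estimate of error on new data
prediction = b0 + b1 * 45         # predicted income for a hypothetical new person aged 45
print(f"training MSE = {train_mse:.3f}, LOOCV MSE = {cv_mse:.3f}, f(45) = {prediction:.3f}")
```

For ordinary least squares, each leave-one-out prediction error is at least as large in magnitude as the corresponding in-sample residual, so the cross-validated error is never smaller than the training error.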

Thanks a lot for any help!