Hi,
I am a complete beginner, so sorry for the very simple questions.
First, I have a very general question about linear regression.
This is my understanding of it so far, put into a simple example:
I have a set of data points x_i for 1 <= i <= 10, for instance the ages of ten people. I want to predict the incomes of these people and assume there is a linear relation between age and income (the incomes would be y_i for 1 <= i <= 10). In my model, I then have to find a suitable function f - the "response function" - which, for instance, minimizes some notion of "error", namely the difference between the output values of my response function and the real values of the income.
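To make this setup concrete, here is a minimal sketch in Python of fitting a straight line by ordinary least squares. The ages and incomes are made-up numbers, and `fit_line` is a hypothetical helper name chosen only for illustration. Note that the quantities being minimized are the residuals, i.e. the differences between the observed incomes y_i in the sample and the fitted values f(x_i).

```python
# Hypothetical sample: ages x_i and observed incomes y_i (in thousands) of ten people.
ages = [23, 27, 31, 35, 39, 43, 47, 51, 55, 59]
incomes = [28.0, 31.5, 35.0, 37.5, 43.0, 44.5, 50.0, 52.5, 55.0, 60.5]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(ages, incomes)

# Fitted values f(x_i) and residuals y_i - f(x_i) on the observed sample.
fitted = [intercept + slope * a for a in ages]
residuals = [y - f for y, f in zip(incomes, fitted)]

# The sum of squared residuals is the "error" that least squares minimizes.
sse = sum(r ** 2 for r in residuals)

print(f"intercept={intercept:.3f}, slope={slope:.3f}, SSE={sse:.3f}")
```

Any other line, for example one with a slightly different slope, gives a sum of squared residuals on this sample that is at least as large.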
Now my first question: how do I know any real values? As I remember from statistics, one usually only has estimates of the real values, for instance an estimate of some expected value, but not the real values themselves. So does the notion of "error", which relies on the notion of real values, make sense at all in this context?
My second question: I have read on Wikipedia that regression can be used for prediction. Somewhere else, though, I have read that residuals cannot be used for prediction when given new data points, which is why, for instance, cross-validation is used. Which of these is true?
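One way to read the two statements side by side is sketched below, again with made-up numbers and hypothetical helper names (`fit_line`, `mse`). The fitted line itself produces the prediction for a new age, while the residuals only describe the fit on the points used for fitting. Leave-one-out cross-validation refits the line with each point held out in turn and measures the error on that held-out point, which gives an estimate of how the model behaves on data it has not seen.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(errors):
    """Mean squared error of a list of errors."""
    return sum(e ** 2 for e in errors) / len(errors)


# Hypothetical sample: ages and observed incomes (in thousands) of ten people.
ages = [23, 27, 31, 35, 39, 43, 47, 51, 55, 59]
incomes = [28.0, 31.5, 35.0, 37.5, 43.0, 44.5, 50.0, 52.5, 55.0, 60.5]

# Fit on all ten points; the residuals measure the fit on these same points.
intercept, slope = fit_line(ages, incomes)
train_residuals = [y - (intercept + slope * a) for a, y in zip(ages, incomes)]
train_mse = mse(train_residuals)

# Leave-one-out cross-validation: hold out point i, refit, predict point i.
loo_errors = []
for i in range(len(ages)):
    x_train = ages[:i] + ages[i + 1:]
    y_train = incomes[:i] + incomes[i + 1:]
    b0, b1 = fit_line(x_train, y_train)
    loo_errors.append(incomes[i] - (b0 + b1 * ages[i]))
loo_mse = mse(loo_errors)

# Prediction for a new person uses the fitted line, not the residuals.
new_age = 45
predicted_income = intercept + slope * new_age

print(f"training MSE={train_mse:.3f}, leave-one-out MSE={loo_mse:.3f}")
print(f"predicted income at age {new_age}: {predicted_income:.2f}")
```

For a least-squares line, the leave-one-out error is never smaller than the error computed from the residuals on the fitting data, which is why the residuals alone tend to be too optimistic about new data points.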
Thanks a lot for any help!