beta_0 and beta_1 are the true values of the coefficients (as I'll call them). Their values are unknown (but we know they exist)--that's why we must estimate them from the observed data. On the other hand, b_0 and b_1 are the estimated values of those coefficients.
It's like the difference between the true population mean (mu) and the sample mean (x-bar).
So, if your line of best fit is y = 8 + 5x, then 8 is b_0 and 5 is b_1. If we knew beta_0 and beta_1, we wouldn't need to estimate.
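To make that concrete, here's a quick simulation sketch in Python (numpy), where the "true" beta_0 = 8 and beta_1 = 5 and the noise level are made up for illustration--in real life you'd never see them, only the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" coefficients -- unknown in practice, chosen here for illustration.
beta_0, beta_1 = 8.0, 5.0

# Simulate observed data: Y_i = beta_0 + beta_1 * X_i + epsilon_i
x = rng.uniform(0, 10, size=100)
y = beta_0 + beta_1 * x + rng.normal(0, 2, size=100)

# Least-squares estimates b_0 and b_1 computed from the observed data only.
b_1, b_0 = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept]

print(b_0, b_1)
```

Run it and b_0, b_1 come out close to 8 and 5 but not exactly equal--that gap is exactly why we call them estimates.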
Without accounting for random error, your model would say that everyone with the same X_i value would therefore have the same Y_i value. For (improbable) example, say you had a model for weight (Y_i) as a function of height (X_i). Without the random error, everyone who was the same height would have to also be the same weight.
Now epsilon_i is in the true model (the one with the beta's). Note that in the fitted model, there is no epsilon_i
It's simply Y_i,hat = b_0 + b_1 * X_i
You can't estimate random error ("this guy who is 180 cm will weigh 77 kg, while this guy who is 180 cm will weigh 91 kg,...").
In the fitted model, everyone with the same X_i would have the same Y_i,hat. But they would have different observed Y_i, so they would each have their own e_i.
e_i is based on your estimate of the model, while epsilon_i is the true random error value.
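A small numpy sketch of that distinction (the true coefficients, the error spread, and the height/weight framing are all invented for illustration--in practice you would only ever see x and y, never eps):

```python
import numpy as np

rng = np.random.default_rng(1)

beta_0, beta_1 = 2.0, 0.5              # assumed "true" coefficients (illustration only)
x = rng.uniform(150, 200, size=50)     # e.g. heights in cm
eps = rng.normal(0, 5, size=50)        # true random errors epsilon_i -- unobservable
y = beta_0 + beta_1 * x + eps          # observed weights

b_1, b_0 = np.polyfit(x, y, 1)         # fitted coefficients
y_hat = b_0 + b_1 * x                  # fitted values -- note: no epsilon term here
e = y - y_hat                          # residuals e_i, our stand-in for epsilon_i

# Residuals track the true errors closely, but are not identical to them.
print(np.corrcoef(e, eps)[0, 1])
```

The residuals correlate strongly with the true errors without matching them exactly, and (with an intercept in the model) they average to zero by construction.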
1) But why do we need to ESTIMATE β0 and β1? Just do a least-squares line of best fit on the scatter plot; then we have a line, so we know the values of β0 and β1. Why are β0 and β1 unknown? This is what I don't seem to understand.
I can see the difference between population mean and sample mean. So I guess β0 and β1 are the true POPULATION parameters?? But what is the "population" in this case?
2) I think I have a pretty clear concept and picture in my mind of what a "residual" is. I can see a scatter plot with lots of points and a fitted line. The residual for each point is just the (signed) vertical distance between that point and the fitted line.
However, I still don't understand what a random error (ε) is. What is the meaning of it? How can we calculate the value of ε? And how can it be displayed graphically?
1) OK, so Yhat = b0 + b1*X is the sample regression equation based on our observed data points (observed sample) and
E(Y) = β0 + β1*X is the population regression equation.
For example, suppose we have height vs. age (Y vs. X). The population is ALL the data points from the ENTIRE population, and we can IMAGINE a population line of best fit going through all those points, but we will never actually know what it is (and we will never know the exact values of β0 and β1). And the sample would be, say, 10 data points, so the scatter plot will have 10 points, and the sample line of best fit is based on b0 and b1. Right?
Yeah, you can't get β0 and β1 because you don't have the scatter plot for the entire population.
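If it helps, here's a toy numpy sketch where the "entire population" is a finite made-up dataset (sizes and coefficients invented for illustration), so we can actually compute the population line--which plays the role of (β0, β1)--and compare it to what one 10-point sample gives us:

```python
import numpy as np

rng = np.random.default_rng(2)

# A finite made-up "population": every (age, height) pair we could ever observe.
age = rng.uniform(2, 18, size=100_000)
height = 75 + 5.5 * age + rng.normal(0, 6, size=100_000)

# The population line of best fit plays the role of (beta_0, beta_1).
beta_1, beta_0 = np.polyfit(age, height, 1)

# One sample of 10 points gives one (b_0, b_1) pair -- a different one every sample.
idx = rng.choice(100_000, size=10, replace=False)
b_1, b_0 = np.polyfit(age[idx], height[idx], 1)

print(beta_1, b_1)  # population slope vs. one sample's estimate of it
```

In real life we only ever get to run the last two lines, which is why (β0, β1) stay unknown.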
Acknowledging the random error ε is necessary, or else the model would imply that everyone with the same X value would also have the same Y value. But each individual (with the same X value) is different. So they each have their own ε_i. But in a (good) model, the ε_i are such that their expected value is 0.
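A quick numpy sketch of that last point (true coefficients and error spread invented for illustration): many individuals sharing the same X value get different Y values through their own ε_i, but because E(ε) = 0, their Y values average out to β0 + β1*X:

```python
import numpy as np

rng = np.random.default_rng(3)

beta_0, beta_1 = 8.0, 5.0   # assumed "true" coefficients (illustration only)
x = 3.0                     # many individuals all sharing this same X value

# Each individual gets their own epsilon_i, drawn with mean 0 ...
eps = rng.normal(0, 2, size=100_000)
y = beta_0 + beta_1 * x + eps

# ... so the average Y at this X is close to E(Y) = beta_0 + beta_1 * x = 23.
print(y.mean())
```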
Thanks for clearing my doubts! I have 2 more questions:
3) "A simple linear model has the form
(i) Y= β0 + β1*X + ε
(ii) E(Y) = β0 + β1*X"
To me, equivalent means "if and only if".
I can see how (i) implies (ii), but how does (ii) imply (i)? (how can we go from E(Y) to Y?)
4) "E(Y) = β0 + β1*X
Y hat = b0 + b1*X
where b0 and b1 are estimators of β0 and β1, respectively.
Then Y hat is clearly an estimator of E(Y)"
(i) Why is Y hat clearly an estimator of E(Y)?
(ii) Also, if Y hat is an estimator of E(Y), shouldn't the hat be taken over the whole of E(Y)? Using the notation Y hat for an estimator of E(Y) doesn't seem consistent with the common usage of "hat": a hat above something usually means it is estimating the thing under the hat, but here we have Y hat instead of "[E(Y)] hat".
On the first question, you're essentially asking whether there is any other model for Y with that expectation for Y. In the largest possible context, yes; in the context of multiple regression, basically no.
On the second question,
(i) In statistics the word "estimator" is far too nebulous to ever reject anything being an estimator :-) [basically, an estimator is any function of the data that maps to the domain of the parameter to be estimated]. My problem with the author's language there is that I might just as well say "Clearly 0 is an estimator for E(Y)". It is a bad estimator, but nevertheless an estimator. So the author might use stronger language, and then there would be something to care about. There are a number of properties associated with that particular estimator, but he didn't mention any, so there is nothing to elaborate on. (E.g., it is an unbiased estimator of E(Y) when b0 and b1 are unbiased estimators of their respective quantities, which they turn out to be.)
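For what it's worth, that unbiasedness claim is easy to see in a simulation (true values and noise level made up for illustration): refit the line on many independent samples and the estimates average out to the true β's:

```python
import numpy as np

rng = np.random.default_rng(4)

beta_0, beta_1 = 8.0, 5.0            # assumed "true" values (illustration only)
x = rng.uniform(0, 10, size=30)      # fixed design points

# Refit on many independent samples; collect the estimates.
b0s, b1s = [], []
for _ in range(2000):
    y = beta_0 + beta_1 * x + rng.normal(0, 2, size=30)
    b1, b0 = np.polyfit(x, y, 1)
    b0s.append(b0)
    b1s.append(b1)

# Unbiasedness: the estimates center on the true values.
print(np.mean(b0s), np.mean(b1s))
```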
(ii) You are basically correct. But nobody does that. There are *deep* traditions in regression notation that are respected. In a multi-level model context they might use mu for the expectation, and then mu hat becomes the typical notation. Which becomes weird, because Y hat and mu hat refer to the same thing, even though Y is a random variable and mu is a location parameter.
The reason this is tolerable is because Y is technically observed so Y hat has a clear interpretation (once you are introduced to it). Y need not be estimated ... you saw it! But its expectation surely does need to be estimated.