- Thread starter kingwinner
- Start date

It's like the difference between the true population mean (mu) and the sample mean (x-bar)

So, if your line of best fit is y = 8 + 5x, then 8 is b_0 and 5 is b_1. If we knew beta_0 and beta_1, we wouldn't need to estimate.

Is there a greek letter font I could use?

Without accounting for random error, your model would say that everyone with the same X_i value would therefore have the same Y_i value. For (improbable) example, say you had a model for weight (Y_i) as a function of height (X_i). Without the random error, everyone who was the same height would have to also be the same weight.

Now epsilon_i is in the

It's simply Y_i,hat = b_0 + b_1 * X_i

You can't estimate random error ("this guy who is 180 cm will weigh 77 kg, while this guy who is 180 cm will weigh 91 kg,...").

In the fitted model, everyone with the same X_i would have the same Y_i,hat. But they would have different observed Y_i, so they would each have their own residual e_i = Y_i - Y_i,hat.

e_i is based on your estimate of the model, while epsilon_i is the true random error value.
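To make the e_i versus epsilon_i distinction concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the "true" values beta0 = 8 and beta1 = 5 are borrowed from the y = 8 + 5x example above, the error standard deviation of 2 is made up, and `fit_line` is a hypothetical helper, not something from the textbook. In real data the epsilon_i column could never be printed, because beta_0 and beta_1 are unknown.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

rng = random.Random(0)
beta0, beta1 = 8.0, 5.0                  # "true" parameters, known only because we simulate
xs = [float(x) for x in range(1, 11)]
eps = [rng.gauss(0.0, 2.0) for _ in xs]  # true random errors epsilon_i (assumed normal)
ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]

b0, b1 = fit_line(xs, ys)                # estimates computed from the sample
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # e_i

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
for e_true, e_hat in zip(eps, residuals):
    print(f"epsilon_i = {e_true:+.3f}   e_i = {e_hat:+.3f}")
```

The residuals always sum to (numerically) zero because the fitted line is chosen that way, while the true errors only have expected value zero.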

1) But why do we need to ESTIMATE β0 and β1? Just do a least-squares line of best fit on the scatter plot, then we have a line, so we know the values of β0 and β1. Why are β0 and β1 unknown? This is what I don't seem to understand.

I can see the difference between population mean and sample mean. So I guess β0 and β1 are the true POPULATION parameters?? But what is the "population" in this case?

2) I think I have a pretty clear concept and picture in my mind of what a "residual" is. I can see a scatter plot with lots of points and a fitted line. The residual for each point is just the (signed) vertical distance between that point and the fitted line.

However, I still don't understand what a random error (ε) is. What is the meaning of it? How can we calculate the value of ε? And how can it be displayed graphically?

[for greek letters, just copy and paste β, ε]

Thank you for answering!

Okay, remember, there is a population regression function just like there is a population mean.

Also, remember, X is traditionally assumed to be **fixed**.

Thus, when we sample we have:

Yhat = b0 + b1*X

this is a

For more details on your question see Chapter 2 pages 26-37 in :

Gujarati, D. N. (1988).

I think he does a good job of explaining what you're asking.

Unfortunately, I don't have that textbook.

1) OK, so Yhat = b0 + b1*X is the sample regression equation based on our observed data points (observed sample) and

E(Y) = β0 + β1*X is the population regression equation.

For example, if we have height vs. age (Y vs. X). The population is ALL the data points from the ENTIRE population and we can IMAGINE a population line of best fit going through all those data points, but we will never actually know what it is (and we will never know the exact values of β0 and β1). And the sample would be, say, 10 data points, so the scatter plot will have 10 points, and the sample line of best fit is based on b0 and b1. Right?
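The picture described here can be sketched in code. This is only an illustration with made-up numbers: the "population" of 100,000 (age, height) pairs, the coefficients 80 and 6, and the noise level are all assumptions, and `fit_line` is a hypothetical helper. The point is that the line through the whole population and the line through a sample of 10 are different lines.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

rng = random.Random(1)

# Hypothetical population: height (cm) as a noisy linear function of age (years).
population = []
for _ in range(100_000):
    age = rng.uniform(2.0, 12.0)
    height = 80.0 + 6.0 * age + rng.gauss(0.0, 5.0)
    population.append((age, height))

# Line through the ENTIRE population (normally impossible to compute).
pop_b0, pop_b1 = fit_line([p[0] for p in population], [p[1] for p in population])

# Line through an observed sample of 10 points.
sample = rng.sample(population, 10)
b0, b1 = fit_line([p[0] for p in sample], [p[1] for p in sample])

print(f"population line: {pop_b0:.2f} + {pop_b1:.2f} * age")
print(f"sample line:     {b0:.2f} + {b1:.2f} * age")
```

Drawing a different sample of 10 would give yet another b0 and b1, which is why they are called estimates.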

Acknowledging the random error ε is necessary, or else the model would imply that everyone with the same X value would also have the same Y value. But each individual (with the same X value) is different. So they each have their own ε_i. But in a (good) model, the ε_i are such that their expected value is 0.

3)

(i) Y= β0 + β1*X + ε

or EQUIVALENTLY,

(ii) E(Y) = β0 + β1*X"

To me, equivalent means "if and only if".

I can see how (i) implies (ii), but how does (ii) imply (i)? (how can we go from E(Y) to Y?)

4)

Y hat = b0 + b1*X

where bo and b1 are estimators of β0 and β1, respectively.

Then Y hat is clearly an estimator of E(Y)"

(i) Why is Y hat clearly an estimator of E(Y)?

(ii) Also, if Y hat is an estimator of E(Y), shouldn't the notation be

^

[E(Y)]

where the hat is taken over the whole E(Y)? Using the notation Y hat as an estimator of E(Y) doesn't seem to be consistent with the common usage of "hat": a hat above something usually means that it is estimating the thing under the hat, but here we have Y hat instead of "[E(Y)] hat".

On the first question, you're sort of asking whether there are any other models for Y with that expectation for Y. In the largest possible context, yes. In the context of multiple regression, basically no.
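One way to see the direction from (ii) to (i), sketched under the assumption that the error term is simply *defined* as the deviation of Y from its mean (this covers only the mean-zero property, not any further assumptions a textbook may attach to ε, such as constant variance or independence):

```latex
\begin{aligned}
\text{Given: }  & E(Y) = \beta_0 + \beta_1 X \\
\text{Define: } & \varepsilon := Y - E(Y) \\
\text{Then: }   & Y = E(Y) + \varepsilon = \beta_0 + \beta_1 X + \varepsilon, \\
                & E(\varepsilon) = E(Y) - E(Y) = 0 .
\end{aligned}
```

So any Y with mean β0 + β1*X can be written in form (i) with a mean-zero ε; what (ii) alone does not pin down is the rest of the distribution of ε.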

On the second question,

(i) In statistics the word "estimator" is far too nebulous to ever reject anything as an estimator [basically, an estimator is any function of the data that maps to the domain of the parameter to be estimated]. My problem with the author's language there is that I might say "Clearly 0 is an estimator for E(Y)". It is a bad estimator, but nevertheless an estimator. So the author might use stronger language, and then there would be something to care about. There are a number of properties associated with that particular estimator, but he didn't mention any, so there is nothing to elaborate on (e.g., it is an unbiased estimator of E(Y) when b0 and b1 are unbiased estimators of their respective quantities, which they turn out to be).

(ii) You are basically correct. But nobody does that. There are *deep* traditions in regression notation that are respected. In a multi-level model context they might use mu for the expectation, and then mu hat becomes the typical notation. That becomes weird, because Y hat and mu hat refer to the same thing even though Y is a random variable and mu is a location parameter.

The reason this is tolerable is because Y is technically observed so Y hat has a clear interpretation (once you are introduced to it). Y need not be estimated ... you saw it! But its expectation surely does need to be estimated.
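The unbiasedness remark in (i) can be checked numerically. The sketch below is a Monte Carlo illustration under assumed values (true line 8 + 5x, normal errors with standard deviation 2, X held fixed at 1..10); `fit_line` is a hypothetical helper. It refits the line on many simulated samples and averages the estimates.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

rng = random.Random(42)
beta0, beta1 = 8.0, 5.0                 # assumed "true" parameters
xs = [float(x) for x in range(1, 11)]   # X is fixed across all replications

b0_draws, b1_draws = [], []
for _ in range(5000):
    # New random errors each time, same X values.
    ys = [beta0 + beta1 * x + rng.gauss(0.0, 2.0) for x in xs]
    b0, b1 = fit_line(xs, ys)
    b0_draws.append(b0)
    b1_draws.append(b1)

mean_b0 = sum(b0_draws) / len(b0_draws)
mean_b1 = sum(b1_draws) / len(b1_draws)

# Average fitted value at x = 5 versus the true E(Y) at x = 5.
x_new = 5.0
mean_y_hat = mean_b0 + mean_b1 * x_new
true_mean = beta0 + beta1 * x_new

print(f"average b0 = {mean_b0:.3f}, average b1 = {mean_b1:.3f}")
print(f"average Y hat at x=5: {mean_y_hat:.3f}, true E(Y): {true_mean:.3f}")
```

Any single Y hat misses E(Y), but the average over many samples lands close to it, which is what "unbiased" means here.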

I have another question....

For simple linear regression model, we typically write Y= β0 + β1*X + ε as

E(Y) = β0 + β1*X

However, I have seen occasionally that Y= β0 + β1*X + ε is written as

E(Y|X) = β0 + β1*X, which looks a bit inconsistent with the above...

How come?? And I don't think E(Y|X) and E(Y) can ever be equal.

Thanks for the helpful comments!

Because in the context of linear regression, X, the independent variable, is traditionally assumed to be **fixed**.

Mkay.

If X is FIXED, does this ALWAYS imply that X and Y are INDEPENDENT and E(Y) = E(Y|X=x)?? Why or why not?

[I am trying to figure out why...when X is fixed, they seem to use E(Y) and E(Y|X=x) interchangeably...]

Thanks!

The appropriate form is E[Y|X] not E[Y] because the expected value of Y is conditioned on the value of X.

But Y= β0 + β1*X + ε

=> E(Y)= β0 + β1*X + E(ε)

=> E(Y)= β0 + β1*X (since E(ε)=0 by assumption)

So we have E(Y), although it is a function of X.

On the other hand, how can we see that E(Y|X)= β0 + β1*X?
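A sketch of the conditional version, assuming the error has mean zero at every value of X, i.e. E(ε | X = x) = 0 (that assumption is not spelled out in the thread, so treat it as a stated premise rather than something derived):

```latex
\begin{aligned}
E(Y \mid X = x) &= E(\beta_0 + \beta_1 X + \varepsilon \mid X = x) \\
                &= \beta_0 + \beta_1 x + E(\varepsilon \mid X = x) \\
                &= \beta_0 + \beta_1 x
                   \quad \text{(assuming } E(\varepsilon \mid X = x) = 0\text{)} .
\end{aligned}
```

When X is treated as fixed rather than random, there is nothing random to condition on, so "E(Y) at this value of x" and E(Y | X = x) describe the same quantity. That is why the two notations get used interchangeably.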