Explaining Variation in the Simple Linear Regression Model

#1
For one data point, how can the variation explained by the regression line exceed the total variation?

Here the blue line represents the mean of the response variable. I drew 2 data points. For Y1, the total variation is broken down into the variation explained by the regression line, and variation not explained by the line (error).

But say in the model you had the point Y2 as shown. You can see the total variation from Y2 to the blue line. How can the variation explained by the regression for Y2 exceed the total variation from Y2 to Ybar?

My textbook says that the total variation can be broken down into variation explained by the line and variation not explained by the line. But for data point Y2, as described above, that does not make sense, because the variation explained by the regression is greater than the total variation. To me it is like saying I have a total of 10 eggs when I really have 15 eggs, as if you are just adding variation out of nowhere.
 

Dason

Ambassador to the humans
#2
Because the relationships your book talks about are for the entire dataset. Note that the first S in those quantities stands for "Sum". So we don't expect those relationships to hold for a single point, but they do hold when considering all the data.
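
A sketch of why the sum matters, in standard notation that the thread itself does not use (\hat{y}_i for the fitted value, \bar{y} for the mean of the response, and a least-squares line with an intercept assumed). The raw deviations always add up point by point, but squaring introduces a cross term that can be negative for an individual point and only cancels once it is summed over all the data:

```latex
\begin{align*}
y_i - \bar{y} &= (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \\
(y_i - \bar{y})^2 &= (\hat{y}_i - \bar{y})^2 + (y_i - \hat{y}_i)^2
  + 2(\hat{y}_i - \bar{y})(y_i - \hat{y}_i) \\
\sum_{i=1}^{n} (y_i - \bar{y})^2 &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
  + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  \quad\text{since}\quad \sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = 0
\end{align*}
```

For a point like Y2, where the fitted value lies farther from the mean than the observation does, the cross term is negative, which is how the squared explained piece can exceed the squared total deviation for that one point.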
 
#3
But each data point is a part of the total Sum or SS. I would think though that the idea should apply to each data point as well as the whole data set.

Considering one data point, no other sort of variation should exceed the total variation for that single data point. Just does not seem to make sense to me.
 

Dason

Ambassador to the humans
#4
Work out a small example. Use three data points and see if you can get things to break the way you claim they should. You'll see that the relationship holds when looking at the sum. It doesn't need to hold for an arbitrary individual point, though.
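
A minimal sketch of such a three-point example in plain Python. The data values and all variable names are made up for illustration and do not come from the thread. It fits the least-squares line by hand, then prints the squared total, explained, and error pieces for each point, followed by the sums:

```python
# Three hypothetical data points (illustrative values, not from the thread).
x = [1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least-squares slope and intercept for a simple linear regression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# Squared pieces for each individual point.
total = [(yi - y_bar) ** 2 for yi in y]                # (y_i - ybar)^2
explained = [(fi - y_bar) ** 2 for fi in y_hat]        # (yhat_i - ybar)^2
error = [(yi - fi) ** 2 for yi, fi in zip(y, y_hat)]   # (y_i - yhat_i)^2
# Cross term left over when the deviation is squared point by point.
cross = [2 * (fi - y_bar) * (yi - fi) for yi, fi in zip(y, y_hat)]

for i in range(n):
    print(f"point {i + 1}: total={total[i]:.2f} explained={explained[i]:.2f} "
          f"error={error[i]:.2f} cross={cross[i]:.2f}")

# The sums of squares over the whole dataset.
sst, ssr, sse = sum(total), sum(explained), sum(error)
print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}  SSR+SSE={ssr + sse:.2f}")
```

In this example the third point sits exactly at the mean of y, so its total deviation is zero while its explained piece is not. That is the same situation as Y2 in the question. The per-point cross terms are what make up the difference, and they cancel in the sum, so SSR + SSE still equals SST.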