For one data point, how can the variation explained by the regression line exceed the total variation?

Here the blue line represents the mean of the response variable (Ybar). I drew two data points. For Y1, the total variation is broken down into the variation explained by the regression line and the variation not explained by the line (error).

But say the model also had the point Y2 as shown. You can see the total variation from Y2 to the blue line. How can the variation explained by the regression for Y2 exceed the total variation from Y2 to Ybar?

My textbook says that the total variation can be broken down into variation explained by the line and variation not explained by the line. But as I mentioned above, for data point Y2 this does not make sense to me, because the explained part (Y2's contribution to SSR) is greater than the total variation for that point. To me it is like saying I have a total of 10 eggs but really have 15 eggs. It is as if variation is being added out of nowhere.
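A small numerical check may help pin down where the bookkeeping seems to go wrong. The sketch below uses made-up data (the values in `x` and `y` are assumptions for illustration, not taken from the textbook or the drawing) and fits an ordinary least squares line by hand. It prints the signed deviations for each point: total (y - Ybar), explained (fitted - Ybar), and residual (y - fitted). For a single point these are signed quantities, so the explained part can be larger than the total whenever the residual has the opposite sign. The textbook decomposition SST = SSR + SSE is a statement about the sums of squares over all points, which holds for a least squares fit with an intercept because the cross term sums to zero.

```python
# Made-up data, chosen only for illustration (an assumption, not from the post).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares slope and intercept for a line with an intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# Signed per-point deviations: total = explained + residual for every point.
total = [yi - y_bar for yi in y]
explained = [fi - y_bar for fi in y_hat]
residual = [yi - fi for yi, fi in zip(y, y_hat)]

# Sums of squares over all points, plus the cross term.
sst = sum(t ** 2 for t in total)
ssr = sum(e ** 2 for e in explained)
sse = sum(r ** 2 for r in residual)
cross = sum(e * r for e, r in zip(explained, residual))

for i in range(n):
    print(f"point {i + 1}: total={total[i]:+.2f} "
          f"explained={explained[i]:+.2f} residual={residual[i]:+.2f}")

print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}  cross term={cross:.2f}")
```

In this made-up data set the last point plays the role of Y2: its fitted value lies farther from Ybar than the observation does, so its residual is negative and the signed pieces still add up to the total. Nothing is added out of nowhere at the level of a single point; only the squared pieces fail to add up point by point, and they do add up once summed over the whole sample.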
