Coefficient of determination interpretation

bgst

New Member
#1
Hey guys, this is my first post here, so hello everyone :)

I have a question regarding the coefficient of determination. Wikipedia says:

An interior value such as R^2 = 0.7 may be interpreted as follows: "Seventy percent of the variation in the response variable can be explained by the explanatory variables. The remaining thirty percent can be attributed to unknown, lurking variables or inherent variability."

Shouldn't this be "Seventy percent of the squared variation in the response variable can be explained by the explanatory variables"?

I know I'm probably wrong, since I've seen the same interpretation on several sites.

Thanks
 

hlsmith

Not a robit
#2
I always see it phrased as 70% of the variance, over and over again. I am not the best at articulating and explaining things, since I use applied statistics, but I will give it a go and let anyone correct me.

If we think of standard deviation as the square root of variance, this forces us to think about how variance is calculated (the sum of the squared differences from the mean, divided by an adjusted number of terms). Now think about simple linear regression (least squares). Don't they both seem like the same thing when you break them down?
 

bgst

New Member
#3
Variance instead of variation makes sense. I don't understand your point about standard deviation though. When I read "variation" I would expect |x_i-x| and not (x_i-x)^2 but I guess both are a measure of variation...
 
#4
It all depends on how you define variation. I have always encountered variation as a measure of dispersion around a given location. The coefficient of determination is uniquely defined by the decomposition of the squared deviations of the dependent variable from its arithmetic mean \(\bar{y}\), given by

\(
\left\|\mathbf{y} - \bar{y}\mathbf{1} \right\|_2^2 = \left\|\hat{\mathbf{y}} - \bar{y}\mathbf{1} \right\|_2^2 + \left\|\hat{\mathbf{u}} \right\|_2^2
\)

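given that the vector of ones \(\mathbf{1} \in \mathbb{R}^n \) is an element of the column space of the data matrix (for example, when the model includes an intercept). Here \(\hat{\mathbf{y}}\) denotes the fitted values and \(\hat{\mathbf{u}}\) the residuals. From this one obtains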

\(
0 \leq \frac{\left\|\hat{\mathbf{y}} - \bar{y}\mathbf{1} \right\|_2^2}{\left\|\mathbf{y} - \bar{y}\mathbf{1} \right\|_2^2} \leq 1
\)
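
As a quick numerical check of the decomposition above, here is a minimal Python sketch. It assumes simple linear regression with an intercept (so the vector of ones is in the column space), and the simulated data, the seed, and the helper names `fit_simple_ols` and `sums_of_squares` are made up purely for illustration.

```python
import random


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x (intercept included)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sums_of_squares(x, y):
    """Return (total, explained, residual) sums of squares."""
    a, b = fit_simple_ols(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [a + b * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_explained = sum((fi - y_bar) ** 2 for fi in y_hat)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return ss_total, ss_explained, ss_resid


# Hypothetical data: a linear trend plus Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 3) for xi in x]

ss_total, ss_explained, ss_resid = sums_of_squares(x, y)
r2 = ss_explained / ss_total

print("total     :", ss_total)
print("explained :", ss_explained)
print("residual  :", ss_resid)
print("R^2       :", r2)
```

Up to floating-point error, the total should equal the explained part plus the residual part, and the ratio of explained to total is the \( R^2 \) bounded above. Note that all three quantities are sums of squared deviations, which is why "variance" is the natural word here.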

In fact, the interpretation of \( R^2 \) should be

... percent of the variance in the response variable can be explained by the explanatory variables.

since the above mentioned type of variation is commonly called variance.

I would even go so far as to say

It can be expected that ... percent of the variance in the response variable can be explained by the explanatory variables.

due to the fact that \( R^2 \) can be treated as a random variable.
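
To illustrate that last point, here is a rough simulation sketch in Python. It draws many samples from one and the same (made-up) linear model and computes \( R^2 \) for each. The model coefficients, noise level, sample size, and function names are all assumptions chosen for illustration, and it uses the fact that in simple linear regression with an intercept, \( R^2 \) equals the squared sample correlation between x and y.

```python
import random
import statistics


def r_squared(x, y):
    """R^2 of a simple linear regression with intercept,
    computed as the squared sample correlation of x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def simulate_r2(n_samples=500, n=30, seed=1):
    """Draw repeated samples from the same model; return each sample's R^2."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        x = [rng.uniform(0, 10) for _ in range(n)]
        y = [1.0 + 0.5 * xi + rng.gauss(0, 2) for xi in x]
        values.append(r_squared(x, y))
    return values


r2_values = simulate_r2()
print("min  R^2:", min(r2_values))
print("mean R^2:", statistics.mean(r2_values))
print("max  R^2:", max(r2_values))
```

Even though the underlying model never changes, the computed \( R^2 \) differs from sample to sample, so any single value is one realization of a random quantity.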

Hope that helped

Regards
 