Predicted R square

#1
Hello everyone

While looking for an explanation of how well adjusted R squared describes the data, I came across a term I hadn't met before - predicted R squared.
Apparently, I've been living a lie all this time, and neither R squared nor adjusted R squared is a good measure of how well the model fits the population. Supposedly, predicted R squared is the measure to turn to.

However, to my understanding, even predicted R squared isn't enough - you must compare it to R square / adjusted R square. If predicted R square is much lower than them, it means that the model is overfitting the data. In other words, should I test my model on a different sample, I might come up with entirely different results.
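
For anyone who wants to try the comparison described above, here is a minimal sketch in Python with NumPy. It assumes an ordinary least squares model with an intercept and uses the usual definition of predicted R squared, 1 - PRESS/SST, where PRESS is the sum of squared leave-one-out prediction errors. The function name `r2_summary` and the simulated data are made up for illustration, so treat this as a rough check rather than a reference implementation.

```python
import numpy as np

def r2_summary(X, y):
    """Return (R^2, adjusted R^2, predicted R^2) for an OLS fit with intercept.

    Hypothetical helper for illustration; assumes the design matrix has
    full column rank.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])   # design matrix with intercept
    p = A.shape[1]                         # number of coefficients

    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # Leverages h_ii: diagonal of the hat matrix, computed from a thin QR.
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q ** 2, axis=1)

    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
    sse = np.sum(resid ** 2)               # residual sum of squares
    # Leave-one-out residual for point i equals resid_i / (1 - h_ii).
    press = np.sum((resid / (1.0 - h)) ** 2)

    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    pred_r2 = 1.0 - press / sst
    return r2, adj_r2, pred_r2

# Simulated example: 30 observations, 10 predictors, only the first matters.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 10))
y = 2.0 * X[:, 0] + rng.normal(size=n)

r2, adj_r2, pred_r2 = r2_summary(X, y)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}, predicted R^2 = {pred_r2:.3f}")
```

Because PRESS can never be smaller than the residual sum of squares, predicted R squared is never larger than R squared. What matters for the overfitting question is how big the gap is, and adding noise predictors like the nine above tends to widen it.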

So my question is - how small is too small? Is there any rule of thumb for, let's say, the percentage of R squared that predicted R squared should reach?

Thanks in advance
 

Miner

TS Contributor
#2
See the following blog on this topic. This should help explain some of the differences. Regarding your final question, the answer is "It depends". It depends on how accurately and precisely your model needs to predict. In some cases a model with an R^2 pred in the 70% range is acceptable, because the measurements are coarse and an approximation is good enough. In other cases, such as an algorithm for an electronics application, an R^2 pred of 99% is required.
 

noetsi

Fortran must die
#3
I also think it depends on the complexity of what you are studying. With many economic variables so many things predict Y that no small subset of X is ever going to have a high R square. That is for adjusted R square. I have never heard of predicted R square before now.
 
#4
See the following blog on this topic. This should help explain some of the differences. Regarding your final question, the answer is "It depends". It depends on how accurately and precisely your model needs to predict. In some cases a model with an R^2 pred in the 70% range is acceptable, because the measurements are coarse and an approximation is good enough. In other cases, such as an algorithm for an electronics application, an R^2 pred of 99% is required.
Thank you Miner. I already read this blog, and while I understood the difference between the terms, it did not help me understand how high the R^2 pred should be.
The model I'm trying to create is from the field of psychology. In this field, you never reach a high R^2. Even a 50% R^2 is considered very high. So I wonder if the percentage of R^2 pred should be even lower than what you have proposed.

I also think it depends on the complexity of what you are studying. With many economic variables so many things predict Y that no small subset of X is ever going to have a high R square. That is for adjusted R square. I have never heard of predicted R square before now.
I know. I imagine it is even more of a problem in psychology. As I mentioned, I hadn't heard of this term either until a few days ago. Apparently, while it is not a new term, it has not been widely used until the last few years. And I haven't found any paper in psychology that even mentions it, with the exception of one dissertation from 2009.
 

noetsi

Fortran must die
#5
In honesty I don't think research should be based on how high the R square is (I don't know psychology research, but it's not common to stress it in the fields I know). In academics you are most commonly testing theory, and the relationships either are as predicted or not. What the R square is, is not all that critical. Even the model fit is rarely discussed unless it is not significant (I don't think I have ever seen that).
 
#6
In honesty I don't think research should be based on how high the R square is (I don't know psychology research, but it's not common to stress it in the fields I know). In academics you are most commonly testing theory, and the relationships either are as predicted or not. What the R square is, is not all that critical. Even the model fit is rarely discussed unless it is not significant (I don't think I have ever seen that).
You are correct. Normally, R^2 won't be that much of an issue, as long as it is above a certain threshold. However, in this particular case, it is of value to me.
 

Miner

TS Contributor
#7
The model I'm trying to create is from the field of psychology. In this field, you never reach a high R^2. Even a 50% R^2 is considered very high. So I wonder if the percentage of R^2 pred should be even lower than what you have proposed.
At those levels of R^2, I would question whether you are truly using the model to make a prediction, or whether you are simply trying to understand how each factor influences the response. Using it to make a prediction brings Andrew Gelman's "weighing the mass of a feather" analogy to mind.