http://data.library.virginia.edu/is-r-squared-useless/

The take-home message is that as long as you know the limitations and interpretation of R^2, it is a fine statistic.

What is a good measure of fit for linear regression? I know what the options are for logistic regression, but not for linear regression. The truth is I don't pay much attention to overall model fit when I run models; I focus on the effect sizes of the variables, as long as the various model diagnostics show the model meets the minimum requirements. R-square is not one of those diagnostics for me.

I find AIC much more useful for what I do than R-square. Beyond the limitations mentioned above, it is rarely clear what a "good" R-square should be. You can look in the literature, if you have access to it (which I commonly don't) and if it exists (which it commonly doesn't in my field). Beyond that, too many things influence R-square to know whether what you found is a good result.
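A minimal numpy sketch of the contrast being drawn: R-square can only go up when a predictor is added, even a useless one, while AIC charges a penalty per parameter. The data, helper names, and the AIC formula up to its additive constant (n*ln(RSS/n) + 2k for Gaussian OLS) are my own illustration, not anything from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                       # a pure-noise predictor
y = 2.0 + 1.5 * x1 + rng.normal(size=n)       # true model uses only x1

def ols_rss(X, y):
    """Residual sum of squares from an OLS fit (intercept included in X)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def aic(rss, n, k):
    """Gaussian OLS AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * np.log(rss / n) + 2 * k

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
rss_small, rss_big = ols_rss(X_small, y), ols_rss(X_big, y)
tss = np.sum((y - y.mean()) ** 2)

# R^2 never decreases when a predictor is added, however useless;
# AIC adds 2 per extra parameter, so it can prefer the smaller model.
print("R^2:", 1 - rss_small / tss, 1 - rss_big / tss)
print("AIC:", aic(rss_small, n, 2), aic(rss_big, n, 3))
```

The point is not the exact numbers but the direction: R-square mechanically rewards the bigger model, whereas AIC trades fit against parameter count.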

My own theory, which I have not seen addressed, is that for complex phenomena you will get a lower R-square than for less complex ones, simply because so many things can influence the results in complex settings. So you will be less likely to have all the useful variables in the model.

What is a good way to measure fit in a linear regression (or a non-linear regression) model?

I explain the importance of confidence intervals over effect size by saying I can predict something as five plus or minus a million with great confidence.

You use the delta to talk about the percentage of additional explained variance in the outcome. Is this wrong?

But is the delta-R^2 any more meaningful? It can be high or low depending on the multicollinearity, even if the model's beta parameters are the same in two hypothetical setups. Remember that multicollinearity is a property of the sample, not the population.
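That claim can be checked numerically: hold the population betas fixed and vary only the correlation between the two predictors, then measure the incremental R^2 from entering x2 after x1. This is a hypothetical numpy sketch; the data and helper names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

def delta_r2(rho):
    """Incremental R^2 from adding x2 after x1, with corr(x1, x2) = rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    # Identical betas in both scenarios; only the collinearity changes.
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

    def r2(*cols):
        X = np.column_stack([np.ones(n), *cols])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

    return r2(x1, x2) - r2(x1)

d_independent = delta_r2(0.0)
d_collinear = delta_r2(0.9)
# Same coefficients, but delta-R^2 shrinks as collinearity rises,
# because x1 already carries most of x2's signal.
print(d_independent, d_collinear)
```

With uncorrelated predictors the increment is large; with corr = 0.9 it collapses toward zero, even though x2's coefficient is unchanged.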

The R^2 by itself is quite meaningless. But what drives it is not meaningless: the standard deviation of the residuals and the "spread" of the X-values. Those matter.

By the way, it is not strange that you get the same R^2 when regressing y on x as when regressing x on y, since R^2 is the squared correlation (r) and:

r = Cov(x,y)/(sd(x)*sd(y))

where Cov() is the covariance and sd() is the standard deviation. Both are symmetric in x and y, so r, and hence R^2, stays the same.
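The symmetry above can be demonstrated in a few lines of numpy: fit both simple regressions and compare against the squared correlation. Data and helper names here are my own illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 3.0 + 0.5 * x + rng.normal(size=100)

def r2(pred, resp):
    """R^2 from a simple OLS regression of resp on pred (with intercept)."""
    X = np.column_stack([np.ones(len(pred)), pred])
    beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
    resid = resp - X @ beta
    return 1 - resid @ resid / np.sum((resp - resp.mean()) ** 2)

r_sq_yx = r2(x, y)              # regress y on x
r_sq_xy = r2(y, x)              # regress x on y
r = np.corrcoef(x, y)[0, 1]     # Cov(x,y)/(sd(x)*sd(y))

# All three agree: R^2 is symmetric because it is just r squared.
print(r_sq_yx, r_sq_xy, r ** 2)
```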

I think the biggest problem with R-sq(adj) is that it invites overfitting. It can easily happen that one gets a fine R-sq value on the training set and abysmal performance on a test set. R-sq(pred) alleviates this somewhat, but it still has a pretty weak link to prediction performance, IMHO.
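The train/test gap described here is easy to reproduce: fit many pure-noise predictors on a small training set and score the same coefficients on held-out data. A hypothetical numpy sketch (all names and data are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n_train, n_test, p = 30, 1000, 25   # many junk predictors, tiny training set

# Pure noise: no predictor has any real relationship to y.
X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = rng.normal(size=n_train)
y_test = rng.normal(size=n_test)

def with_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

# Fit once on the training data, then freeze the coefficients.
beta, *_ = np.linalg.lstsq(with_intercept(X_train), y_train, rcond=None)

def r2(X, y):
    resid = y - with_intercept(X) @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

r2_train = r2(X_train, y_train)
r2_test = r2(X_test, y_test)
# Training R^2 looks impressive; test R^2 is near zero or negative.
print("train R^2:", r2_train)
print("test  R^2:", r2_test)
```

With 25 noise predictors on 30 observations, training R^2 lands well above 0.5 purely by capacity, while out-of-sample R^2 collapses: exactly the weak link between in-sample fit indices and prediction performance.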

IIRC, BIC has a proven link to test-set performance, so theoretically it would be a better measure (and so, perhaps, would AIC?).

regards

I really wonder who is publishing without reporting the whole suite of statistics: p-values, effect sizes, fit indices, random-effect correlations, confidence intervals, etc. What are these journals, and are they indexed?