1. ## Cross validation

Hello,
I have a basic question about cross validation and regression models. What is the final regression model built from? I mean the final model that we report, for example in our research: is it the model fitted to the whole data set or the model fitted to the training set? Also, what is the best type of cross validation for 32 observations? I am really confused about what the point of cross validation is.
Regards

2. ## Re: Cross validation

You should provide more details about your study. Since you are talking about a final model, I assume you are running a stepwise regression. In such a regression, the final model is the one with the best indicators of model fit, determined blindly by the computer algorithm. So it consists of one dependent variable and a number of selected independent variables that are most strongly associated with that dependent variable.

Since you are talking about training and test sets, I assume you want to fit a model and then test its predictive merit. In this case, the model is created from the training set (and not the test set). What is the model here? A dependent variable and a number of independent variables, plus their beta values and standard errors. You then apply this model to the test set and see how effectively it can predict the dependent variable from the values of the independent variables.

Cross validation is a method to determine the predictive value of the model, without any test sets. Therefore, you can use cross validation to optimize the model further (over the training set), before applying it to the test set.
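To make the idea concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the data are synthetic, the helper names (`fit_ols`, `predict`, `kfold_mse`) are made up, and 8 folds is just one possible choice for 32 observations. It also follows a common convention that is not stated in this thread: cross validation is used only to estimate prediction error, and the model that is finally reported is refitted on all observations.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def kfold_mse(X, y, k=8, seed=0):
    """Mean squared prediction error estimated by k-fold cross validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sq_errors = []
    for held_out in np.array_split(idx, k):
        train = np.setdiff1d(idx, held_out)
        beta = fit_ols(X[train], y[train])          # fit without the held-out fold
        sq_errors.extend((y[held_out] - predict(beta, X[held_out])) ** 2)
    return float(np.mean(sq_errors))

# Synthetic stand-in data (made up): 32 observations, 2 predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(32, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=32)

cv_mse = kfold_mse(X, y, k=8)    # estimate of out-of-sample error
final_beta = fit_ols(X, y)       # model refitted on all 32 rows
print("CV mean squared error:", cv_mse)
print("Coefficients fitted on all data:", final_beta)
```

With a sample this small, leave-one-out (setting `k` equal to the number of observations) is another option people often use, since every fit then uses all but one observation.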

Again, you should provide more details regarding your study, variables, etc.

3. ## The Following User Says Thank You to victorxstc For This Useful Post:

Bahareh (10-01-2015)

4. ## Re: Cross validation

I have 32 observations and 27 independent variables. I ran a multiple linear regression with the "forward" selection method and chose the model with a high adjusted R2. This model contains only 2 of the 27 initial variables. Now I have been asked to do cross validation to assess the predictive ability of this model. To do this, I ran another regression, this time with only those 2 variables (Enter method), and selected the option to calculate PRESS (the sum of squares of prediction errors). For PRESS, the regression is fitted to the remaining n-1 observations, and the fitted regression function is then used to obtain the predicted value for the ith observation. Along with PRESS, the software also calculates the predicted R2. My PRESS value is 13.31 and R2(pred) is 56.57%. Is this the predictive ability of my regression model?
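The PRESS procedure described above can be sketched as follows. This is only an illustration under assumptions: the data are synthetic (the actual 32 observations are not shown in this thread), the function names are hypothetical, and predicted R2 is taken to be 1 - PRESS/SST, which is a common definition but may not be exactly what every package computes. If the software does use that definition, the reported PRESS of 13.31 and R2(pred) of 56.57% would imply a total sum of squares of roughly 30.6.

```python
import numpy as np

def design(X):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones(len(X)), X])

def press_leave_one_out(X, y):
    """PRESS by explicitly refitting the model n times, each time without row i."""
    A = design(X)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += float((y[i] - A[i] @ beta) ** 2)
    return press

def press_hat_matrix(X, y):
    """Same quantity from a single fit, using residual_i / (1 - leverage_i)."""
    A = design(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    Q, _ = np.linalg.qr(A)                 # leverages are the row sums of Q**2
    leverage = np.sum(Q ** 2, axis=1)
    return float(np.sum((resid / (1.0 - leverage)) ** 2))

def predicted_r2(press, y):
    """Assumed definition: 1 - PRESS / total sum of squares."""
    sst = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - press / sst

# Synthetic stand-in data (made up): 32 observations, 2 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))
y = 0.5 + 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=32)

press_loo = press_leave_one_out(X, y)
press_hat = press_hat_matrix(X, y)
print("PRESS (explicit leave-one-out):", press_loo)
print("PRESS (hat-matrix shortcut):  ", press_hat)
print("Predicted R2:", predicted_r2(press_loo, y))
```

Note that this only cross validates the final 2-variable fit. The forward selection step that picked those 2 variables out of 27 is not repeated inside the loop, so the resulting figure may be optimistic.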

5. ## Re: Cross validation

Yes, it seems to be the predicted R-squared of your model.

Selecting the best model based on the adjusted R-squared is an accepted approach, although AIC and other criteria are recommended as well. However, the stepwise method (which includes "forward selection" etc.) is not the best way to determine the model. It is strongly recommended to avoid stepwise regression (the one you conducted). Instead, try building the model on the basis of theory (subjective common sense + literature) and beta values. As I said, the computer blindly selects the model for you, and this "blindly" can get very serious sometimes. Though I agree the stepwise method is very commonly practiced.
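For reference, adjusted R-squared and AIC can be computed side by side for candidate models. The sketch below is an assumption-laden illustration: the data are synthetic, `fit_stats` is a hypothetical helper, and AIC is written as n*ln(RSS/n) + 2k with k counting the intercept, the slopes and the error variance. Statistical packages may report AIC with different additive constants, so only differences between models fitted to the same data are meaningful.

```python
import numpy as np

def fit_stats(X, y):
    """R2, adjusted R2 and AIC (up to an additive constant) for an OLS fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    p = A.shape[1] - 1                                # number of predictors
    r2 = 1.0 - rss / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    aic = n * np.log(rss / n) + 2 * (p + 2)           # intercept + p slopes + variance
    return {"r2": r2, "adj_r2": adj_r2, "aic": float(aic)}

# Synthetic stand-in data (made up): the third predictor is pure noise
rng = np.random.default_rng(2)
X = rng.normal(size=(32, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=32)

small = fit_stats(X[:, :2], y)   # the two relevant predictors
big = fit_stats(X, y)            # adds the noise predictor
print("2 predictors:", small)
print("3 predictors:", big)
```

Plain R2 can never go down when a variable is added, which is why it is not used for selection. Adjusted R2 and AIC both penalise the extra parameter, each in its own way; lower AIC and higher adjusted R2 are preferred.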

Plus, 32 observations is not a good sample size for evaluating 27 independent variables.

7. ## Re: Cross validation

Thanks a lot victor, you've helped many times...

8. ## Re: Cross validation

My pleasure Bahareh
