# Thread: breaking data into training and testing (hold-out) sets

1. ## breaking data into training and testing (hold-out) sets

Dear Stats Forum,
I am a grad student with an introductory stats background. For my research, I'm conducting an experiment to compare the predictive capability of two types of models (a physically based simulation and a multivariate regression model). I have 92 samples, which I split randomly: 2/3 into a training set, which I used to create the regression model, and 1/3 into a testing (hold-out) set. I am trying to predict the energy use in buildings, so the measured and predicted variables are numeric.

I would like to ask for your kind help with two questions.
1. Should I be doing some sort of test to determine if my split into two sets was OK? For example, should I make sure that both sets have similar mean values or some other test?
2. What sort of metric would you recommend to evaluate the predictive success of my model? Is R^2 generally considered inappropriate for a predictive model? Is AIC appropriate here? I am planning to use RMSE and MAPE.

Thank you in advance for any help!
Holly

2. ## Re: breaking data into training and testing (hold-out) sets

I would suggest simply using the sum of squared errors (SSE) on the held-out data.
Furthermore, you can repeat the random split many times, not just once. Alternatively, you can leave a single observation out as the test set, fit the model on the rest, and calculate the squared prediction error for that observation.
Then do this for every observation in turn and finally sum the squared errors.
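
The "repeat the split many times" idea might be sketched as below. This is only an illustration under stated assumptions: the data are synthetic (92 made-up samples, not the poster's measurements), the model is a single-predictor least-squares line rather than a full multivariate regression, and the helper names `fit_line` and `repeated_holdout` are hypothetical. It reports the RMSE and MAPE mentioned in the question, averaged over the random splits.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def repeated_holdout(xs, ys, n_repeats=200, test_fraction=1 / 3, seed=0):
    """Average hold-out RMSE and MAPE (in percent) over many random splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = round(n * test_fraction)
    rmses, mapes = [], []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on every repetition
        test, train = idx[:n_test], idx[n_test:]
        # Fit on the training part only, then predict the held-out part.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(a + b * xs[i]) - ys[i] for i in test]
        rmses.append(math.sqrt(sum(e * e for e in errors) / n_test))
        mapes.append(100 * sum(abs(e / ys[i]) for e, i in zip(errors, test)) / n_test)
    return sum(rmses) / n_repeats, sum(mapes) / n_repeats


# Synthetic stand-in data (an assumption for illustration only).
data_rng = random.Random(42)
x = [data_rng.uniform(50, 500) for _ in range(92)]
y = [20 + 0.8 * xi + data_rng.gauss(0, 15) for xi in x]

mean_rmse, mean_mape = repeated_holdout(x, y)
print(f"mean hold-out RMSE: {mean_rmse:.2f}")
print(f"mean hold-out MAPE: {mean_mape:.2f}%")
```

Averaging over many splits makes the error estimate depend less on how any one random split happened to fall. Note that MAPE divides by the observed values, so it is only meaningful when none of them are zero or close to zero.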

4. ## Re: breaking data into training and testing (hold-out) sets

Thanks very much. I understand the SSE as a metric, and the idea of repeating the break many times. However, I'm sorry, I'm a little confused about the last part. What am I calculating each time I leave an observation out?

5. ## Re: breaking data into training and testing (hold-out) sets

Say you have y1,...,y30 and x1,...,x30. Leave out (y2, x2), fit the model on the remaining 29 observations, and then use x2 to predict y2. Then calculate (hat(y2) - y2)^2. Do this for all i = 1,...,30 and then sum the squares up.
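
The leave-one-out steps above could look roughly like this in code. It is a minimal sketch, assuming a single predictor and a straight-line least-squares fit. The data are synthetic (30 made-up points), and `fit_line` and `loo_sse` are hypothetical helper names, not part of any library.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def loo_sse(xs, ys):
    """Sum of squared leave-one-out prediction errors."""
    sse = 0.0
    for i in range(len(xs)):
        # Leave observation i out and fit on the remaining n - 1 points.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Predict the left-out y_i from x_i and accumulate the squared error.
        sse += ((a + b * xs[i]) - ys[i]) ** 2
    return sse


# Synthetic stand-in data: 30 made-up points, for illustration only.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(30)]
y = [3 + 2 * xi + rng.gauss(0, 1) for xi in x]

total = loo_sse(x, y)
print(f"leave-one-out SSE: {total:.3f}")
print(f"leave-one-out RMSE: {(total / len(x)) ** 0.5:.3f}")
```

Each observation is predicted by a model that never saw it, so the resulting sum measures out-of-sample error while still using all of the data.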
