Thread: breaking data into training and testing (hold-out) sets

  1. #1

    breaking data into training and testing (hold-out) sets



    Dear Stats Forum,
    I am a grad student with an introductory stats background. For my research, I'm conducting an experiment to compare the predictive capability of two types of models (a physically based simulation and a multivariate regression model). I have 92 samples, which I split randomly: 2/3 into a training set, which I used to fit the regression model, and 1/3 into a testing (hold-out) set. I am trying to predict energy use in buildings, so both the measured and the predicted variables are numeric.

    I would like to ask for your kind help with two questions.
    1. Should I be doing some sort of test to determine if my split into two sets was OK? For example, should I make sure that both sets have similar mean values or some other test?
    2. What sort of metric would you recommend to evaluate the predictive success of my model? Is R^2 generally considered inappropriate for a predictive model? Is AIC appropriate here? I am planning to use RMSE and MAPE.

    Thank you in advance for any help!
    Holly

  2. #2
    TS Contributor

    Re: breaking data into training and testing (hold-out) sets

    I would suggest simply using the sum of squared errors (SSE).
    Furthermore, you can repeat the random split many times rather than just once. Alternatively, you can leave a single observation out as the test case, fit the model on the rest, and calculate the squared prediction error for that observation.
    Do this for every observation in turn and finally sum the squared errors (this is leave-one-out cross-validation).
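
    The repeated-split idea can be sketched as follows. This is only a minimal illustration, assuming a single predictor and a straight-line fit; the data, the helper names (fit_line, holdout_sse, repeated_holdout), the 200 repeats and the seeds are all made up for the example and are not from the original question.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def holdout_sse(xs, ys, test_frac, rng):
    """One random split: fit on the training part, return SSE on the test part."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = max(1, round(test_frac * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)

def repeated_holdout(xs, ys, test_frac=1/3, repeats=200, seed=0):
    """Average the test-set SSE over many random train/test splits."""
    rng = random.Random(seed)
    sses = [holdout_sse(xs, ys, test_frac, rng) for _ in range(repeats)]
    return sum(sses) / len(sses)

# Made-up data: 92 observations scattered around a straight line.
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 10) for _ in range(92)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0, 1) for x in xs]

mean_sse = repeated_holdout(xs, ys)
print("mean hold-out SSE over 200 splits:", mean_sse)
```

    Averaging over many splits makes the result depend much less on which observations happened to land in the test set of any one split.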

  3. The Following User Says Thank You to Masteras For This Useful Post:

    hollywas (07-21-2012)

  4. #3

    Re: breaking data into training and testing (hold-out) sets

    Thanks very much. I understand the SSE as a metric, and the idea of repeating the break many times. However, I'm sorry, I'm a little confused about the last part. What am I calculating each time I leave an observation out?

  5. #4
    TS Contributor

    Re: breaking data into training and testing (hold-out) sets


    Say you have y1, ..., y30 and x1, ..., x30. Leave out (y2, x2). Fit the model on the remaining 29 observations and then use x2 to predict y2. Then calculate (hat(y2) - y2)^2. Do this for all
    i = 1, ..., 30 and then sum the squares up.
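
    The procedure above might look like this in code. It is a rough sketch only, assuming one predictor and a straight-line model; the 30 data points are made up, and fit_line and loo_sse are hypothetical helper names chosen for the example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_sse(xs, ys):
    """Leave each observation out in turn and sum the squared prediction errors."""
    total = 0.0
    for i in range(len(xs)):
        # Fit on every observation except the i-th one.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Predict the left-out y from its x and accumulate the squared error.
        y_hat = a + b * xs[i]
        total += (y_hat - ys[i]) ** 2
    return total

# Made-up data: 30 observations, as in the y1..y30 example.
data_rng = random.Random(2)
xs = [data_rng.uniform(0, 10) for _ in range(30)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0, 1) for x in xs]

sse_loo = loo_sse(xs, ys)
rmse_loo = (sse_loo / len(xs)) ** 0.5  # same information on the scale of y
print("leave-one-out SSE:", sse_loo, "RMSE:", rmse_loo)
```

    Dividing the summed squares by the number of observations and taking the square root gives an RMSE, which is easier to compare across data sets of different sizes.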

  6. The Following User Says Thank You to Masteras For This Useful Post:

    hollywas (07-21-2012)
