# Standard Deviation of Response / Prediction - Linear Regression

#### dzeni

Hi,

I'm hoping someone out there can help me with what is probably a very simple question.

For a stats assignment, I've created a linear regression model that uses the capital value of houses to predict their sale price. We were given a large amount of data and got Minitab to generate the appropriate statistics, residual analysis, etc.

The final part of the question asks us to evaluate our model (i.e., how good is it at making predictions). The linear regression equation is:

Price = 10708 + 0.992 Capital Value

And the associated Minitab data is:

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 10708 | 5203 | 2.06 | 0.041 |
| Capital Value | 0.99234 | 0.03232 | 30.71 | 0.000 |

S = 28223.4 R-Sq = 82.8% R-Sq(adj) = 82.7%
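
As a quick sanity check on the output above, the short Python sketch below recomputes the T column (each coefficient divided by its standard error) and evaluates the fitted line at one point. The \$200,000 capital value is a hypothetical input chosen for illustration, not a house from the data set, and the names `predict_price` and `example_price` are made up for this sketch.

```python
# Values copied from the Minitab output above.
coef = {"Constant": 10708.0, "Capital Value": 0.99234}
se_coef = {"Constant": 5203.0, "Capital Value": 0.03232}

# T is each coefficient divided by its standard error.
t_stats = {name: coef[name] / se_coef[name] for name in coef}


def predict_price(capital_value):
    """Point prediction from the fitted line Price = 10708 + 0.99234 * Capital Value."""
    return coef["Constant"] + coef["Capital Value"] * capital_value


# Hypothetical house with a capital value of $200,000 (an assumption, not from the data).
example_price = predict_price(200_000)

for name, t in t_stats.items():
    print(f"{name}: T = {t:.2f}")
print(f"Predicted price: {example_price:,.0f}")
```

The recomputed T values agree with Minitab's to within rounding, since the printed coefficients and standard errors are themselves rounded.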

We have a nice high R-Sq value and a reasonably low S value (given that we are looking at house prices which range from under \$100,000 to over \$300,000 in value).

So far so good. I can say that the slope is positive and has a value of 0.992. The question then asks if the standard deviation of a prediction is likely to be smaller than the standard deviation of the responses.

I have no idea what this means.

I googled and found the following for "standard deviation of prediction":

> The standard deviations of the predicted values of the estimated regression function depend on the standard deviation of the random errors in the data, the experimental design used to collect the data and fit the model, and the values of the predictor variables used to obtain the predicted values. These standard deviations are not simple quantities that can be read off of the output summarizing the fit of the model, but they can often be obtained from the software used to fit the model.

From the question, it does not look like they are asking us to find either value, they just want us to say if the standard deviation of the prediction would be smaller than the standard deviation of the responses. The problem is that I can't seem to find a definition of "standard deviation of response" compared with "standard deviation of prediction".
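
For reference, the standard textbook formulas for simple linear regression that sit behind these terms can be written as below. The symbols are not from the thread and are introduced here as assumptions: $n$ is the number of houses, $x_0$ the capital value at which a prediction is made, $\bar{x}$ the mean capital value, $S_{xx} = \sum_i (x_i - \bar{x})^2$, $s$ the residual standard deviation (Minitab's S), and $s_y$ the sample standard deviation of the observed prices.

$$
s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

$$
\operatorname{SE}\big(\hat{y}(x_0)\big) = s\,\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
$$

$$
\operatorname{SE}\big(y_{\text{new}} - \hat{y}(x_0)\big) = s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
$$

The first line is the spread of the responses with the predictor ignored. The second is the standard error of the fitted value on the line, and the third is the standard error for predicting a single new house. Which of the last two the assignment means by "standard deviation of a prediction" is not stated in the thread; both are built from $s$ rather than $s_y$.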

Given that our model predicts events based on a straight line, would we not expect the standard deviation of our prediction to always be smaller than the actual observed standard deviation?
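
One way to put numbers on this comparison, offered as a sketch rather than as the assignment's intended answer: adjusted R-Sq satisfies R-Sq(adj) = 1 - S² / s_y², where s_y is the sample standard deviation of the observed prices. The Minitab output therefore implies a value for s_y that can be set against S. Treating S as the "standard deviation of a prediction" is an assumption here; the course text may define the term differently.

```python
import math

# Values copied from the Minitab output in the question.
S = 28223.4        # residual standard deviation
r_sq_adj = 0.827   # R-Sq(adj), as a proportion

# Adjusted R-squared identity: r_sq_adj = 1 - S**2 / s_y**2, solved for s_y.
s_y = S / math.sqrt(1.0 - r_sq_adj)

# Scatter about the fitted line as a fraction of the overall scatter in price.
ratio = S / s_y

print(f"Implied SD of responses: {s_y:,.0f}")
print(f"S as a fraction of it:   {ratio:.2f}")
```

Because R-Sq(adj) is printed to only three significant figures, the implied s_y is approximate. Under this reading, any model with a positive R-Sq(adj) has S below s_y, and the gap widens as R-Sq(adj) grows.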

I'm so lost and am hoping that someone out there can clarify this for me

Looking forward to any (and all) suggestions.

Thanks

Dzeni

#### statboy314

Hi, to answer your original question I have a few suggestions. Your r^2 value is decent at about 83%, which tells you that about 83% of the variation in price is explained by the linear relationship with capital value. The other way to check whether it's a good fit is to view a residual plot: plot the residuals against the predictor (capital value). There should be random scatter in this plot. Any pattern suggests that the model isn't good for making predictions. Not sure if this helps.
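
The residual check described above can be sketched numerically without a plotting library. The Python example below fits a least-squares line to synthetic data (generated here for illustration, not the assignment data) and reports the mean residual within bands of the predictor. A band whose mean sits far from zero is the numeric counterpart of a visible pattern in the plot. The band edges and the helper name `mean_residual_by_band` are assumptions made for this sketch.

```python
import random

random.seed(0)  # fixed seed so the synthetic example is repeatable

# Synthetic data, NOT the assignment data: capital values and noisy prices.
x = [random.uniform(80_000, 350_000) for _ in range(200)]
y = [10_000 + 1.0 * xi + random.gauss(0, 25_000) for xi in x]

# Ordinary least squares for a single predictor.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def mean_residual_by_band(x, residuals, edges):
    """Average residual within each [low, high) band of the predictor."""
    result = {}
    for low, high in zip(edges[:-1], edges[1:]):
        band = [r for xi, r in zip(x, residuals) if low <= xi < high]
        result[(low, high)] = sum(band) / len(band) if band else None
    return result


bands = mean_residual_by_band(x, residuals, [80_000, 170_000, 260_000, 350_001])
for (low, high), avg in bands.items():
    label = "no points" if avg is None else f"{avg:,.0f}"
    print(f"{low:>7,} to {high:>7,}: mean residual {label}")
```

With real data, the same helper could be pointed at the residuals Minitab stores, using band edges that suit the range of capital values.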

#### dzeni

Hi Statboy,

Thanks for your speedy response. I know that the r^2 value is nice and high. The residuals look "OK-ish", as in they are fine for anything less than \$325,000, after which problems start to emerge with all the points being below the "zero" line.

My major problem is that I don't know what the question means when it asks if it is likely that the standard deviation of the prediction will be smaller than the standard deviation of the response. I need to make a statement based on the idea that my model has fairly good predictive value. As in, if the predictive value is good, then the standard deviation of the prediction would (or would not) be smaller than that of the response ... because ...

Which is where I am lost.

#### stat08

Hi

Are you doing 161-100 at Massey University too? I'm doing pretty much the same question, but with a different data set - obviously.

I think the answer is in section 11.3 of D&F (i.e. the textbook). I was hoping I could find the answer using Google, but no such luck.

Good luck with the rest of the Assignment

#### dzeni

Small world!! I'm doing that paper. It's going OK. I resubmitted my 3rd assignment as the marking scheme was a bit hard in places. For example, I got marked down for only creating a box plot where it said to create a box plot or a dot diagram (can't remember the exact wording). The mark scheme also said to invoke the CLT at some point, but this was not asked in the question, so I appealed.

Ended up going from a 79% to 86%. This is not as great as it sounds. I'm trying to do really well as I currently teach Mathematics and needed some training to be able to teach Statistics next year.

You may well be right that the answer is in 11.3. I'm not too stressed about it as I sent in that assignment a while ago and am now focused on studying for the exam.

