Help in deciphering output from a regression

#1
**** the spaces between the values are missing in this post, please refer to the text file attached.

Hi,

I'm trying to do some regression analysis and I do not understand the output.

I have run a simpler analysis so I hope someone can help me make sense of the output.

I am trying to get n and e in the following equation

x = n y + e

The data I ran is:

x y
56 89
45 89
12 56
12 263
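
For anyone who wants to reproduce this, a minimal sketch of how the four rows could be loaded is shown here. The DATA step is an assumption, since the post does not show how the dataset was created; only the dataset name Try1 and the variable names x and y come from the PROC REG call.

```sas
/* Hypothetical DATA step: the original post does not show how Try1 was built. */
data Try1;
    input x y;   /* two numeric columns, in the order listed above */
    datalines;
56 89
45 89
12 56
12 263
;
run;
```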


The code I used is

proc reg data=Try1;
model x= y;
run;

The output I get is

The REG Procedure
Model: MODEL1
Dependent Variable: x x

Number of Observations Read 4
Number of Observations Used 4


Analysis of Variance

                             Sum of          Mean
Source             DF       Squares        Square    F Value    Pr > F

Model               1     279.11433     279.11433       0.44    0.5747
Error               2    1263.63567     631.81783
Corrected Total     3    1542.75000

Root MSE          25.13599    R-Square     0.1809
Dependent Mean    31.25000    Adj R-Sq    -0.2286
Coeff Var         80.43516

Parameter Estimates

                                 Parameter      Standard
Variable     Label        DF      Estimate         Error    t Value    Pr > |t|

Intercept    Intercept     1      44.02699      22.96735       1.92      0.1953
y            y             1      -0.10283       0.15472      -0.66      0.5747


I also plotted the graph in Excel and I got

y = -1.7594x + 179.23
R² = 0.1809
from Excel.

I see the R² value is the same, but I don't know which parts of the SAS output are n and e.

I would appreciate any help in deciphering the output.

Thank you.



-Connor
 
jrai

New Member
#2
Connor, what you're trying to accomplish isn't very clear. I have two quick observations:
1) The model that you're estimating in Excel is not the one you're estimating in SAS. Both fit a straight line to the same data, but the response and the predictor are swapped, so you can't expect the coefficients to be the same. The Excel model is y=b+mx and the SAS model is x=a+ny.

2) 'e' usually stands for the error term. If you really mean the error when you say 'e', then you're estimating the wrong model in both cases. If your 'e' denotes the intercept, then it is fine.

Assuming that 'e' is the intercept, then from the SAS output your equation is x=44.02699-0.10283y. The parameter estimate for y gives you 'n' and the one for Intercept gives you 'e'.
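
To see where the Excel numbers come from, one option is to swap the roles of the two variables in PROC REG. This is a minimal sketch, assuming the Excel trendline was fitted with x on the horizontal axis (the post implies this but does not state it). Fitting y on x should then reproduce the Excel line y = -1.7594x + 179.23 with the same R-Square of 0.1809.

```sas
/* Regress y on x (the reverse of the original model) to mirror the Excel trendline. */
proc reg data=Try1;
    model y = x;
run;
quit;
```

The two slopes are not reciprocals of each other. Their product, -0.10283 * -1.7594 = 0.1809 (approximately), equals R-Square, which is why R-Square is the only number the two fits share.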

If your 'e' is for error & not for intercept then you need to estimate x=ny. You can do this by using NOINT option in the model statement:
proc reg data=Try1;
model x= y /NOINT;
run;

This will estimate the model without the intercept, i.e. x=ny. Then you can get the error term 'e' for each observation by subtracting the predicted value from the observed value.

Caution: your sample size is too small for any practical application.
 
#3
Hey Jrai,
Thanks for responding!!

1) You are right, my bad. I want x = n y + e

2) I meant e as in error term

Does the NOINT command force the intercept to be the origin?

After I run the code you gave I get (text file attached)

The REG Procedure
Model: MODEL1
Dependent Variable: x x

Number of Observations Read 4
Number of Observations Used 4


NOTE: No intercept in model. R-Square is redefined.

Analysis of Variance

                               Sum of          Mean
Source               DF       Squares        Square    F Value    Pr > F

Model                 1    1863.65377    1863.65377       1.56    0.3003
Error                 3    3585.34623    1195.11541
Uncorrected Total     4    5449.00000

Root MSE           34.57044    R-Square    0.3420
Dependent Mean     31.25000    Adj R-Sq    0.1227
Coeff Var         110.62541

Parameter Estimates

                            Parameter      Standard
Variable    Label    DF      Estimate         Error    t Value    Pr > |t|

y           y         1       0.14540       0.11644       1.25      0.3003


Does that mean x= 0.14540y?

If so, the expected y values are

Expected y
8.1424
6.543
1.7448
1.7448


So the error is

y minus expected y
80.8576
82.457
54.2552
261.2552

which sums up to 478.825.

so the final equation is x= 0.14540y + 478.825? (I'm fairly sure I'm very off)

My actual dataset has 3000 values and 4 variables but I figured I'd use a smaller set to understand what I'm doing.

Hope you can clarify my confusion again.

Thanks Jrai!!


-Connor
 

jrai

New Member
#4
Yes, the NOINT option forces the fitted line through the origin (intercept = 0). That is generally not a good idea unless you know that the true specification of your model doesn't have an intercept.

Connor, I'd recommend reading an introductory text on OLS / linear regression analysis. You need to get your basics right (no rudeness meant). The error term varies from observation to observation, so you can't just sum the errors into a single constant. Usually the error term is not written in the fitted equation. You just leave the equation as E(x)=0.14540y. The error is the random component, which by the assumptions of OLS has mean 0.

And for calculating e for each observation, you do x-ny. The observed value of x is x & predicted value of x is ny i.e. 0.14540y. To make things easier you can do this all in SAS:

proc reg data=Try1;
model x= y /NOINT P R;
run;

The P option prints the predicted values, which give you ny for each observation. The R option prints the residuals, i.e. 'e' = x-ny for each observation. With 3000 observations I'd recommend not printing all of this but saving it to a dataset instead, in the following way:

proc reg data=Try1;
model x= y /NOINT;
output out=try2 r=resid p=pred;
run;

This will save the predicted values(in variable pred) & residual values(in variable resid) in dataset named try2.
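
Once try2 exists, a quick way to inspect it might look like the sketch below. It lists the observed, predicted and residual values, then summarizes the residuals. PROC PRINT and PROC MEANS are just one possible choice here and are not part of the original reply. Keep in mind that with NOINT the residuals are not guaranteed to sum to zero, unlike in a model with an intercept.

```sas
/* List observed x, predicted value and residual for each observation. */
proc print data=try2;
    var x y pred resid;
run;

/* Summarize the residuals: count, sum and mean. */
proc means data=try2 n sum mean;
    var resid;
run;
```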

It'll be good to understand the difference between error & residuals: http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics
 

jrai

New Member
#6
Dason,
Thanks for this article. A quick question:

I remember a model where I was predicting sales. Due to some constraints I was not allowed to transform the response variable. The intercept was negative (~=-1000) and this led to many negative predicted values (approx. 1000-2000 or maybe even more out of 700,000 subjects). My idea was to restrict the intercept to be positive (I expected this would make all predictions positive). Is that a good idea? If not, are there any alternative suggestions for keeping the number of negative predictions low?
 

Dason

Ambassador to the humans
#7
Did a normal response seem appropriate? Was there any issue with non-constant variance? Some sort of GLM using something like a gamma response might work.
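
A gamma GLM of the kind suggested could be sketched in SAS with PROC GENMOD. The dataset name sales_data and the variables sales, price and promo are hypothetical placeholders, since the thread does not describe the actual data. The log link is also an assumption: it keeps every predicted mean positive, and a gamma response requires strictly positive sales values.

```sas
/* Hypothetical names: sales_data, sales, price and promo are placeholders. */
proc genmod data=sales_data;
    /* Gamma response with a log link, so predicted means are always positive. */
    model sales = price promo / dist=gamma link=log;
    /* Save the predicted mean (on the original sales scale) to a new dataset. */
    output out=sales_pred pred=pred_sales;
run;
```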