# Thread: Help with comprehending output from proc reg

1. ## Help with comprehending output from proc reg

So I just ran my very first regression and the data output is pasted below (I've also attached the .txt file)

Analysis of Variance

Source DF Sum of Squares Mean Square F Value Pr > F

Model 20 3.670499E11 18352493740 1.58 0.0496
Error 1101 1.277317E13 11601428676
Corrected Total 1121 1.314022E13

Root MSE 107710 R-Square 0.0279
Dependent Mean 100585 Adj R-Sq 0.0103
Coeff Var 107.08322

Parameter Estimates

Variable Label DF Parameter Estimate Standard Error t Value Pr > |t|

Intercept Intercept 1 85167 34579 2.46 0.0139
A A 1 0.00000861 0.00790 0.00 0.9991
B B 1 2.98968 2.46392 1.21 0.2252
C C 1 110.87518 85.72047 1.29 0.1961
D D 1 -347.03355 127.76966 -2.72 0.0067
E E 1 94.43187 99.11811 0.95 0.3409
F F 1 -0.00041549 0.00609 -0.07 0.9456
G G 1 -5.06148 49.23571 -0.10 0.9181
H H 1 16426 13104 1.25 0.2103
I I 1 -45101 22002 -2.05 0.0406
J J 1 -46856 29047 -1.61 0.1070
K K 1 -48402 22305 -2.17 0.0302
L L 1 -30408 21738 -1.40 0.1621
M M 1 -46714 22433 -2.08 0.0375
N N 1 -27593 21538 -1.28 0.2004
O O 1 -31519 23769 -1.33 0.1851
P P 1 -54905 20912 -2.63 0.0088
Q Q 1 -24632 34383 -0.72 0.4739
R R 1 912.08579 377.31024 2.42 0.0158
S S 1 1210.77268 7284.51610 0.17 0.8680
T T 1 -22182 16564 -1.34 0.1808

Given this output, the following are the conclusions I have drawn:

DEPD = 0.00000861 A + 2.98968 B + 110.87518 C - 347.03355 D + 94.43187 E - 0.00041549 F - 5.06148 G + 16426 H - 45101 I - 46856 J - 48402 K - 30408 L - 46714 M - 27593 N - 31519 O - 54905 P - 24632 Q + 912.08579 R + 1210.77268 S - 22182 T + 85167

(However, since all the Pr > |t| values are greater than 0.001 the coefficients are not very accurate. Also since the Standard error values are pretty high, the coefficients are not very accurate.)

Parameters A to T explain 2.79% (R-square value) of the variation shown in DEPD.

1) Is this correct?

2) What is F Value and Pr > F?

3) What is t Value?

4) Is there anything else I can get from the SAS output?

Thank you.

2. ## Re: Help with comprehending output from proc reg

Mods, I just realised this is probably in the wrong subforum. Could you help me delete this thread so I can repost it in the statistics forum?

Thanks.

3. ## Re: Help with comprehending output from proc reg

Nope - but I will move it for you.

4. ## Re: Help with comprehending output from proc reg

that works out even better, thanks man!

5. ## Re: Help with comprehending output from proc reg

The first question I have for you is: WHAT IS YOUR OBJECTIVE? Are you trying to screen out the variables and find out which of the variables are significant in explaining the response?

If you are trying to find out which of the variables are important, then running a variable selection approach would be useful (forward, backward or stepwise). If you already know that some of the variables are important, you can force them into the model even if they are not significant.

2) What is F Value and Pr > F?
The F-test assesses the significance of your model as a whole. To be precise, it tests the null hypothesis that none of the variables is linearly related to the response variable (all slope coefficients are zero) against the alternative that at least one of them is.
Pr > F is the p-value associated with the F-test. At 0.0496 it is marginally significant at the 5% level, indicating that at least one of the variables is linearly related to the response variable.
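
As a worked check against the Analysis of Variance table in the first post: each mean square is a sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares. The symbol names below are just shorthand chosen here, not SAS output labels.

```latex
\begin{aligned}
MS_{\text{Model}} &= \frac{SS_{\text{Model}}}{DF_{\text{Model}}}
  = \frac{3.670499\times 10^{11}}{20} \approx 1.8352\times 10^{10} \\
MS_{\text{Error}} &= \frac{SS_{\text{Error}}}{DF_{\text{Error}}}
  = \frac{1.277317\times 10^{13}}{1101} \approx 1.1601\times 10^{10} \\
F &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}} \approx 1.58 \\
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
  = \frac{3.670499\times 10^{11}}{1.314022\times 10^{13}} \approx 0.0279
\end{aligned}
```

Pr > F is then the upper-tail probability of an F distribution with 20 and 1101 degrees of freedom at 1.58.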

3) What is t Value?
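The t value is the test statistic for testing the significance of an individual model coefficient: the parameter estimate divided by its standard error. Pr > |t| is the corresponding two-sided p-value.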
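
For instance, taking two rows from the Parameter Estimates table in the first post (the hat notation is shorthand chosen here for the estimate), the t value is just the estimate over its standard error:

```latex
\begin{aligned}
t_D &= \frac{\hat{\beta}_D}{SE(\hat{\beta}_D)} = \frac{-347.03355}{127.76966} \approx -2.72 \\
t_A &= \frac{\hat{\beta}_A}{SE(\hat{\beta}_A)} = \frac{0.00000861}{0.00790} \approx 0.00
\end{aligned}
```

A large standard error relative to the estimate, as for A, gives a t value near zero and a large p-value.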

4) Is there anything else I can get from the SAS output?
Yes. Since you have so many variables, i.e. you are in a multiple linear regression setting, you can definitely explore a bit more. You want to know whether there is multicollinearity, i.e. whether any of the independent variables are strongly correlated with each other. This can be checked using the Variance Inflation Factor (VIF). As a rule of thumb, VIF > 10 means the multicollinearity is serious.
You can explore the residuals and check the underlying assumptions of the model.
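
A minimal sketch of how both checks might be requested for the model in the first post. `File_name` is a placeholder data set name, and `diag`, `predicted` and `residual` are illustrative names chosen here, not anything from the original run.

```sas
/* Sketch only: File_name is a placeholder for the real data set */
proc reg data=File_name;
  /* vif adds a Variance Inflation column to the Parameter Estimates table */
  model DEPD = A B C D E F G H I J K L M N O P Q R S T / vif;
  /* save fitted values and residuals; diag, predicted, residual are illustrative names */
  output out=diag p=predicted r=residual;
run; quit;

/* residuals against fitted values: look for funnel shapes or curvature */
proc sgplot data=diag;
  scatter x=predicted y=residual;
  refline 0 / axis=y;
run;

/* rough check of the normality assumption on the residuals */
proc univariate data=diag normal;
  var residual;
  qqplot residual / normal(mu=est sigma=est);
run;
```

With a coefficient of variation above 100% and an R-square under 3%, the residual plots are likely to be more informative than the individual p-values.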

6. ## The Following User Says Thank You to ledzep For This Useful Post:

david_q (02-12-2012)

7. ## Re: Help with comprehending output from proc reg

Ledzep,

Yes, I am trying to find out which of the independent variables have an effect on the dependent variable and to what extent they have an effect on the dependent variable. I know it seems like I have a lot of independent variables but variables H to P exist because there is 1 qualitative variable with 10 possible options.

Right now am I right to say that from the results it seems like none of the independent variables have an impact on the dependent variable?

Also, could you explain a bit more about Variance Inflation Factor? Do I test the VIF between independent variables or between the dependent variable and the independent variables?

Thank you!

8. ## Re: Help with comprehending output from proc reg

Right now am I right to say that from the results it seems like none of the independent variables have an impact on the dependent variable?
Your p-values for "D", "I", "K", "M", "P" and "R" are all below 0.05, so these variables are significant at the 5% level of significance. This means that these variables are significant for your response variable.

However, the results may not be reliable if this is not the right model, which large VIF values would indicate.

VIF is a measure of the severity of collinearity. Larger values (>10) mean more correlation among the independent variables. SAS reports one VIF for each independent variable, so you just read off the VIF for a given variable. For example:

Code:
``````/*fake data*/
data test;
input y x1 x2;
cards;
8  3  6
3  4  1
2  2  2
4  4  3
2  5  4
;
run;

/*Run proc reg with the vif option*/
proc reg data=test;
model y= x1 x2/vif;
run;quit;

/*trimmed output*/

Parameter Estimates

Parameter       Standard                              Variance
Variable     DF       Estimate          Error    t Value    Pr > |t|      Inflation

Intercept     1        2.61458        3.97927       0.66      0.5787              0
x1            1       -0.53646        0.96588      -0.56      0.6344        1.00208
x2            1        0.97396        0.57253       1.70      0.2310        1.00208

Here the VIFs are less than 10. So, there is no serious collinearity between the independent variables.``````
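
On the question of which variables the VIF is computed between: it involves only the independent variables. The standard definition, written here in notation of my own choosing, regresses each independent variable on all the others:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad R_j^2 = \text{R-square from regressing } x_j \text{ on the other independent variables}
```

With only two independent variables, as in the fake data above, $R_j^2$ is the squared correlation between x1 and x2, which works out to roughly 0.0021, so

```latex
\mathrm{VIF}_{x1} = \mathrm{VIF}_{x2} \approx \frac{1}{1 - 0.0021} \approx 1.002
```

which agrees with the 1.00208 in the trimmed output. The dependent variable y plays no part in the calculation.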
Running a variable selection is usually useful, as it helps you pick out the important variables. Once you have selected your variables, you can fit the selected model and run diagnostic checks on the appropriateness of the fitted model.

Code:
``````
/*Run variable selection*/
proc reg data=test ;
model y= x1 x2/selection=stepwise;
run;quit;``````

10. ## Re: Help with comprehending output from proc reg

Originally Posted by david_q
I know it seems like I have a lot of independent variables but variables H to P exist because there is 1 qualitative variable with 10 possible options.
OK!!! Thanks for this additional information. I was firmly assuming up until now that all the variables were continuous variables (as you used proc reg).
So, H to P are different levels of the same variable. Is it possible to have them as a single column? Then you can use proc glm, specifying the variable as a class variable, instead of proc reg.

The danger of using proc reg is that it assumes the independent variables are continuous even when they are categorical. To specify the class correctly you have to use "proc glm" and use the class statement to say that a variable is a factor, not continuous.

12. ## Re: Help with comprehending output from proc reg

Ledzep,
Dude you're like a statistics and SAS jedi!

I will rerun the data with the vif code.

What is the reason you put "selection=stepwise" after the model statement in the second code?

I can definitely get the categorical values into one column, the raw data specifies it as one column and I separated it out. How would I run the code then?

proc glm data=File_name;
model DEPD = A B C D E F G H Q R S T;
run;

In that case what does the coefficient for H represent?

Thanks Ledzep!

13. ## Re: Help with comprehending output from proc reg

Originally Posted by david_q
What is the reason you put "selection=stepwise" after the model statement in the second code?
I thought you were interested in finding out which of the variables are significant. Using selection=stepwise will give you the variables that are significant for your response in the presence of the other variables in the model.

I can definitely get the categorical values into one column, the raw data specifies it as one column and I separated it out. How would I run the code then?

proc glm data=File_name;
model DEPD = A B C D E F G H Q R S T;
run;
Yes, the code is pretty much as you wrote it, with the addition of a class statement.

Code:
``````proc glm data=File_name;
class H;  *list all your categorical/factor variables here. SAS calls them class variables;
model DEPD = A B C D E F G H Q R S T / solution;  *solution prints the parameter estimates;
run; quit;``````
In that case what does the coefficient for H represent?
With the solution option on the model statement, it should list 10 different estimates for H, one for each level of H (one of them should be zero, as SAS sets it as the reference category).

15. ## Re: Help with comprehending output from proc reg

Ledzep,
I have 1 last question before I re run the regression:

I have 2 other categorical variables (S and T) but these are either yes or no. So S and T have value of 1 for yes and value of 0 for no. Should I leave them as they are or move them to class variables?

Thank you!

16. ## Re: Help with comprehending output from proc reg

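You should move S and T to the class list.
If you don't specify that a variable is a class variable, SAS will assume it is a continuous variable and fit it as a linear effect. For a 0/1 variable the fitted model is the same either way (only the reference level, and hence the sign of the coefficient, changes), but listing it in the class statement makes the intent explicit.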
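
A sketch of what the call might then look like, assuming H has been collapsed back into a single column and `File_name` is still the placeholder data set name used earlier in the thread:

```sas
/* Sketch only: File_name is a placeholder; H is assumed to be one column again */
proc glm data=File_name;
  /* all categorical variables go in the class statement */
  class H S T;
  /* solution asks proc glm to print the parameter estimates */
  model DEPD = A B C D E F G H Q R S T / solution;
run; quit;
```

Each class variable gets one estimate per level, with the last level in sort order fixed at zero as the reference.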

17. ## Re: Help with comprehending output from proc reg

Originally Posted by ledzep
I thought you were interested in finding out which of the variables are significant. Using selection=stepwise will give you the variables that are significant for your response in the presence of the other variables in the model.
Except that stepwise selection is NOT a good procedure. Here is a link explaining a few of the reasons why you really shouldn't use it: http://www.childrensmercy.org/stats/faq/faq12.aspx

18. ## Re: Help with comprehending output from proc reg

Just to give you an example of what happens when you don't specify class.

Code:
``````/*fake data*/
data test;
input y x1 x2;
cards;
8  1  6
3  1  1
2  1  2
4  0  3
2  0  4
;
run;

*x1 is a yes/no variable;``````
Code:
``````/*With Class specified for x1*/
proc glm data=test ;
class x1;
model y= x1 x2/solution;
run;quit;

*output;
Standard
Parameter           Estimate             Error    t Value    Pr > |t|

Intercept        1.229885057 B      1.84671547       0.67      0.5740
x1        0     -1.850574713 B      1.74372021      -1.06      0.3998
x1        1      0.000000000 B       .                .         .
x2               1.034482759        0.49651979       2.08      0.1726

*TWO estimates for x1, one for each level of x1. The highest level is set as the reference category by SAS, hence the 0;``````
Code:
``````/*Now, class not told to SAS*/
proc glm data=test ;
model y= x1 x2;
run;quit;

*output;
Standard
Parameter         Estimate           Error    t Value    Pr > |t|

Intercept     -0.620689655      2.19257205      -0.28      0.8037
x1             1.850574713      1.74372021       1.06      0.3998
x2             1.034482759      0.49651979       2.08      0.1726

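* SEE that there is only one estimate for x1, as SAS is treating x1 as a continuous variable, i.e. assuming a linear effect. For a two-level variable this is the same fit with the reference level switched (the estimate changes sign and the intercept shifts). With more than two levels the linear assumption would generally be wrong;``````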

20. ## Re: Help with comprehending output from proc reg

But they would probably just get an error I'm guessing since their categorical variable is probably actually text and not numeric so it wouldn't be able to treat it as continuous.
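
To make that scenario concrete, here is a variation on the fake data above with x1 stored as text. The data set name `test_char` and the yes/no values are made up for illustration. I have not pasted a log, so the exact message is not shown, but the first proc glm call would be expected to stop with an error rather than fit anything.

```sas
/* fake data again, but x1 is read as a character variable (note the $) */
data test_char;
input y x1 $ x2;
cards;
8  yes  6
3  yes  1
2  yes  2
4  no   3
2  no   4
;
run;

/* No class statement: a character variable cannot be used as a numeric */
/* regressor, so this step is expected to fail with an error in the log */
proc glm data=test_char;
model y = x1 x2;
run; quit;

/* With the class statement the character variable is handled as a factor */
proc glm data=test_char;
class x1;
model y = x1 x2 / solution;
run; quit;
```

So the silent "treated as continuous" problem only arises when the categories are stored as numbers, as with the 0/1 coding of S and T.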

21. ## The Following User Says Thank You to Dason For This Useful Post:

david_q (02-12-2012)