
Thread: Help with comprehending output from proc reg

  1. #1
    Points: 851, Level: 15
    Level completed: 51%, Points required for next Level: 49

    Posts
    61
    Thanks
    27
    Thanked 1 Time in 1 Post

    So I just ran my very first regression and the data output is pasted below (I've also attached the .txt file)

    Analysis of Variance

                                    Sum of            Mean
    Source               DF        Squares          Square    F Value    Pr > F

    Model                20    3.670499E11     18352493740       1.58    0.0496
    Error              1101    1.277317E13     11601428676
    Corrected Total    1121    1.314022E13


    Root MSE             107710    R-Square    0.0279
    Dependent Mean       100585    Adj R-Sq    0.0103
    Coeff Var         107.08322


    Parameter Estimates

                                     Parameter      Standard
    Variable   Label      DF          Estimate         Error   t Value   Pr > |t|

    Intercept  Intercept   1             85167         34579      2.46     0.0139
    A          A           1        0.00000861       0.00790      0.00     0.9991
    B          B           1           2.98968       2.46392      1.21     0.2252
    C          C           1         110.87518      85.72047      1.29     0.1961
    D          D           1        -347.03355     127.76966     -2.72     0.0067
    E          E           1          94.43187      99.11811      0.95     0.3409
    F          F           1       -0.00041549       0.00609     -0.07     0.9456
    G          G           1          -5.06148      49.23571     -0.10     0.9181
    H          H           1             16426         13104      1.25     0.2103
    I          I           1            -45101         22002     -2.05     0.0406
    J          J           1            -46856         29047     -1.61     0.1070
    K          K           1            -48402         22305     -2.17     0.0302
    L          L           1            -30408         21738     -1.40     0.1621
    M          M           1            -46714         22433     -2.08     0.0375
    N          N           1            -27593         21538     -1.28     0.2004
    O          O           1            -31519         23769     -1.33     0.1851
    P          P           1            -54905         20912     -2.63     0.0088
    Q          Q           1            -24632         34383     -0.72     0.4739
    R          R           1         912.08579     377.31024      2.42     0.0158
    S          S           1        1210.77268    7284.51610      0.17     0.8680
    T          T           1            -22182         16564     -1.34     0.1808





    Given this output, the following are the conclusions I have drawn:

    DEPD = 85167 + 0.00000861 A + 2.98968 B + 110.87518 C - 347.03355 D + 94.43187 E - 0.00041549 F - 5.06148 G + 16426 H - 45101 I - 46856 J - 48402 K - 30408 L - 46714 M - 27593 N - 31519 O - 54905 P - 24632 Q + 912.08579 R + 1210.77268 S - 22182 T

    (However, since all the Pr > |t| values are greater than 0.001 the coefficients are not very accurate. Also since the Standard error values are pretty high, the coefficients are not very accurate.)

    Parameters A to T explain 2.79% (R-square value) of the variation shown in DEPD.

    1) Is this correct?

    In addition,

    2) What is F Value and Pr > F?

    3) What is t Value?

    4) Is there anything else I can get from the SAS output?

    Thank you.

  2. #2
    Mods, I just realised this is probably in the wrong subforum. Could you help me delete this thread so I can repost it in the statistics forum?

    Thanks.

  3. #3
    Devorador de queso
    Points: 95,705, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Posting AwardCommunity AwardDiscussion EnderFrequent Poster
    Dason's Avatar
    Location
    Tampa, FL
    Posts
    12,931
    Thanks
    307
    Thanked 2,629 Times in 2,245 Posts

    Nope - but I will move it for you.
    I don't have emotions and sometimes that makes me very sad.

  4. #4
    that works out even better, thanks man!

  5. #5
    Point Mass at Zero
    Points: 13,828, Level: 76
    Level completed: 45%, Points required for next Level: 222
    ledzep's Avatar
    Location
    Berks,UK
    Posts
    684
    Thanks
    188
    Thanked 143 Times in 139 Posts

    The first question I have for you is: WHAT IS YOUR OBJECTIVE? Are you trying to screen out the variables and find out which of the variables are significant in explaining the response?

    If you are trying to find out which of the variables are important, then running a variable selection approach would be useful (forward, backward or stepwise). If you already know that some of the variables are important, then you can force them into the model even if they are not significant.


    2) What is F Value and Pr > F?
    The F test assesses the overall significance of your model. To be precise, it tests the null hypothesis that none of the variables is linearly related to the response variable (all slope coefficients are zero) against the alternative that at least one of them is.
    Pr > F is the p-value associated with the F test. At 0.0496 it is marginally significant at the 5% level, indicating that at least one of the variables is linearly related to the response variable.
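
    As a rough check on where these numbers come from: the F value is the Model mean square divided by the Error mean square, and Pr > F is the upper tail of an F distribution with 20 and 1101 degrees of freedom. A minimal sketch, assuming only the figures in the posted Analysis of Variance table (the variable names are made up for illustration):

    ```sas
    data _null_;
      /* values copied from the posted Analysis of Variance table */
      ms_model = 18352493740;
      ms_error = 11601428676;
      ss_model = 3.670499E11;
      ss_total = 1.314022E13;

      f_value  = ms_model / ms_error;                 /* F = MS(Model) / MS(Error)    */
      p_value  = 1 - probf(f_value, 20, 1101);        /* upper tail of F(20, 1101)    */
      r_square = ss_model / ss_total;                 /* share of variation explained */
      adj_rsq  = 1 - (1 - r_square) * 1121 / 1101;    /* penalised for 20 predictors  */

      put f_value= 8.2 p_value= 8.4 r_square= 8.4 adj_rsq= 8.4;
    run;
    ```

    The values written to the log should round to the F Value, Pr > F, R-Square and Adj R-Sq printed in the output.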

    3) What is t Value?
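    The t value is the test statistic for the significance of an individual model coefficient: the estimate divided by its standard error. Pr > |t| is the matching two-sided p-value for the null hypothesis that the coefficient is zero.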
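
    To illustrate, here is a small sketch that recomputes the t value and p-value for variable D from the posted Parameter Estimates table. The variable names are invented for the example; the error degrees of freedom (1101) come from the Analysis of Variance table.

    ```sas
    data _null_;
      /* row D from the posted Parameter Estimates table */
      estimate = -347.03355;
      std_err  = 127.76966;
      df_error = 1101;                                    /* Error DF from the ANOVA table */

      t_value = estimate / std_err;                       /* t = estimate / standard error */
      p_value = 2 * (1 - probt(abs(t_value), df_error));  /* two-sided p-value             */

      put t_value= 8.2 p_value= 8.4;
    run;
    ```

    The result should round to the -2.72 and 0.0067 shown for D.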

    4) Is there anything else I can get from the SAS output?
    Yes, since you have so many variables, i.e. you are in a multiple linear regression setting, you can definitely explore a bit more. You want to know if there is multicollinearity, i.e. whether some of the independent variables are strongly correlated with each other. This can be checked using the Variance Inflation Factor (VIF). As a rule of thumb, VIF>10 means the multicollinearity is serious.
    You can explore the residuals and check the underlying assumptions of the model.
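
    A minimal sketch of both checks, assuming the data set is called File_name and the variables are named as in the posted output (both are placeholders; diag, pred and resid are names made up here):

    ```sas
    /* VIF for every independent variable, plus residuals saved for checking */
    proc reg data=File_name;
      model DEPD = A B C D E F G H I J K L M N O P Q R S T / vif;
      output out=diag p=pred r=resid;
    run;
    quit;

    /* residuals vs. predicted values: look for funnels or curves */
    proc sgplot data=diag;
      scatter x=pred y=resid;
      refline 0 / axis=y;
    run;

    /* normality of the residuals */
    proc univariate data=diag normal;
      var resid;
      qqplot resid / normal(mu=est sigma=est);
    run;
    ```

    A funnel shape in the scatter plot would suggest non-constant variance, and strong departures from the line in the Q-Q plot would suggest non-normal residuals.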
    Oh Thou Perelman! Poincare's was for you and Riemann's is for me.

  6. The Following User Says Thank You to ledzep For This Useful Post:

    david_q (02-12-2012)

  7. #6
    Ledzep,
    Thank you so much for your reply.

    Yes, I am trying to find out which of the independent variables have an effect on the dependent variable and to what extent they have an effect on the dependent variable. I know it seems like I have a lot of independent variables but variables H to P exist because there is 1 qualitative variable with 10 possible options.

    Right now am I right to say that from the results it seems like none of the independent variables have an impact on the dependent variable?

    Also, could you explain a bit more about Variance Inflation Factor? Do I test the VIF between independent variables or between the dependent variable and the independent variables?

    Thank you!

  8. #7
    ledzep's Avatar
    Right now am I right to say that from the results it seems like none of the independent variables have an impact on the dependent variable?
    The p-values for "D", "I", "K", "M", "P" and "R" are all below 0.05, so these variables are significant at the 5% level of significance. This means there is evidence that they are related to your response variable, given the other variables in the model.
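
    Rather than reading the p-values off by eye, one option is to capture the Parameter Estimates table and filter it. This is only a sketch: File_name is a placeholder for the data set, pe is a made-up name, and the column names (Variable, Estimate, StdErr, tValue, Probt) are the ones I would expect in the ParameterEstimates ODS table of proc reg, so check them with proc contents if your output differs.

    ```sas
    /* capture the Parameter Estimates table as a data set */
    ods output ParameterEstimates=pe;
    proc reg data=File_name;
      model DEPD = A B C D E F G H I J K L M N O P Q R S T;
    run;
    quit;

    /* keep only the slopes with p < 0.05 */
    proc print data=pe noobs;
      where Variable ne 'Intercept' and Probt < 0.05;
      var Variable Estimate StdErr tValue Probt;
    run;
    ```

    This lists exactly the rows whose Pr > |t| is below 0.05.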

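    However, the results may not be reliable: with an R-square of only 0.0279 this doesn't seem to be the right model, and collinearity between the variables has not been checked yet. The VIF will tell you whether that is a problem.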

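    VIF is a measure of the severity of collinearity. For each independent variable it equals 1/(1 - R-square), where the R-square comes from regressing that variable on the other independent variables, so it only involves the independent variables and not the dependent one. Larger values (>10) mean more correlation between independent variables. You just read off the VIF reported for each variable. For example: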

    Code: 
    /*fake data*/
    data test;
    input y x1 x2;
    cards;
    8  3  6
    3  4  1
    2  2  2
    4  4  3
    2  5  4
    ;
    run;
    
    /*Run proc reg with the VIF option*/
    proc reg data=test;
    model y= x1 x2/vif;
    run;quit;
    
    /*trimmed output*/
    
                                             Parameter Estimates
    
                                  Parameter       Standard                              Variance
             Variable     DF       Estimate          Error    t Value    Pr > |t|      Inflation
    
             Intercept     1        2.61458        3.97927       0.66      0.5787              0
             x1            1       -0.53646        0.96588      -0.56      0.6344        1.00208
             x2            1        0.97396        0.57253       1.70      0.2310        1.00208
    
    Here the VIFs are close to 1, well below 10. So there is no collinearity problem between the independent variables.
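
    To see where a VIF comes from, it can be rebuilt by hand: regress one independent variable on the other(s) and compute 1/(1 - R-square). A rough sketch on the same fake data set test, assuming the EDF option of proc reg writes the R-square to the OUTEST= data set as _RSQ_ (aux and vif_x1 are names made up here):

    ```sas
    /* regress x1 on the other independent variable and keep the R-square */
    proc reg data=test outest=aux edf noprint;
      model x1 = x2;
    run;
    quit;

    data _null_;
      set aux;
      vif_x1 = 1 / (1 - _RSQ_);   /* VIF = 1 / (1 - R-square) */
      put vif_x1= 8.5;
    run;
    ```

    The value in the log should agree with the Variance Inflation column reported for x1.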
    Usually running a variable selection is useful, as it helps you screen for the important variables. Once you have screened your variables, you can fit the selected model and run diagnostic checks on its appropriateness.

    Code: 
    
    /*Run variable selection*/
    proc reg data=test ;
    model y= x1 x2/selection=stepwise;
    run;quit;

  10. #8
    ledzep's Avatar
    Quote Originally Posted by david_q View Post
    I know it seems like I have a lot of independent variables but variables H to P exist because there is 1 qualitative variable with 10 possible options.
    OK!!! Thanks for this additional information. I was firmly assuming up until now that all the variables were continuous variables (as you used proc reg).
    So, H to P are different levels of the same variable. Is it possible to have them as a single column? Then you can use proc glm instead of proc reg and specify the variable as a class variable.

    The danger of using proc reg is that it treats every independent variable as continuous, even if it is categorical. To specify the variable correctly you have to use "proc glm" and declare it in a class statement, which tells SAS that it is a factor and not continuous.

  12. #9
    Ledzep,
    Dude you're like a statistics and SAS jedi!

    I will rerun the data with the vif code.

    What is the reason you put "selection=stepwise" after the model statement in the second code?

    I can definitely get the categorical values into one column, the raw data specifies it as one column and I separated it out. How would I run the code then?

    proc glm data=File_name;
    model DEPD = A B C D E F G H Q R S T;
    run;

    In that case what does the coefficient for H represent?

    Thanks Ledzep!

  13. #10
    ledzep's Avatar
    Quote Originally Posted by david_q View Post
    What is the reason you put "selection=stepwise" after the model statement in the second code?
    I thought you were interested in finding out which of the variables are significant. Using selection=stepwise lets SAS arrive at the variables that are significant for your response in the presence of the other variables in the model.

    I can definitely get the categorical values into one column, the raw data specifies it as one column and I separated it out. How would I run the code then?

    proc glm data=File_name;
    model DEPD = A B C D E F G H Q R S T;
    run;
    Yes, the code is pretty much as you wrote it, with the addition of a class line and the solution option.

    Code: 
    proc glm data=File_name;
    class H;  *list all your categorical/factor variables here. SAS calls them Class; 
    model DEPD = A B C D E F G H Q R S T / solution;  *solution prints the parameter estimates;
    run;quit;
    In that case what does the coefficient for H represent?
    It should list 10 different estimates for H, one for each level of H. One of them will be zero, as SAS sets that level (the last one in sort order) as the reference category; the other nine are differences from that reference level.

  15. #11
    Ledzep,
    I have 1 last question before I re run the regression:

    I have 2 other categorical variables (S and T) but these are either yes or no. So S and T have value of 1 for yes and value of 0 for no. Should I leave them as they are or move them to class variables?

    Thank you!

  16. #12
    ledzep's Avatar
    You should move S and T to the class list.
    If you don't specify that a variable is a class variable, SAS will assume it is continuous and fit it as a linear effect.
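
    Putting that together, the call might look like the sketch below. It assumes H has been recombined into a single 10-level column and that S and T are the 0/1 yes/no columns; File_name is a placeholder for the data set.

    ```sas
    proc glm data=File_name;
      class H S T;   /* H = the single 10-level column; S and T = yes/no */
      model DEPD = A B C D E F G H Q R S T / solution;
    run;
    quit;
    ```

    The solution option asks proc glm to print the parameter estimates, which it does not do by default once a class statement is present.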

  17. #13
    Dason's Avatar
    Quote Originally Posted by ledzep View Post
    I thought you're interested to find out which of the variables were significant. Using selection=stepwise will allow you to come up with the variables which were significant for your response in the presence of other variables in the model.
    Except that stepwise selection is NOT a good procedure. Here is a link explaining a few of the reasons why you really shouldn't use it: http://www.childrensmercy.org/stats/faq/faq12.aspx

  18. #14
    ledzep's Avatar
    Just to give you an example of what happens when you don't specify class.

    Code: 
    /*fake data*/
    data test;
    input y x1 x2;
    cards;
    8  1  6
    3  1  1
    2  1  2
    4  0  3
    2  0  4
    ;
    run;
    
    *x1 is a yes No variable;
    Code: 
    /*With Class specified for x1*/
    proc glm data=test ;
    class x1;
    model y= x1 x2/solution;
    run;quit;
    
    *output;
                                             Standard
                    Parameter           Estimate             Error    t Value    Pr > |t|
    
                    Intercept        1.229885057 B      1.84671547       0.67      0.5740
                    x1        0     -1.850574713 B      1.74372021      -1.06      0.3998
                    x1        1      0.000000000 B       .                .         .
                    x2               1.034482759        0.49651979       2.08      0.1726
    
    *TWO estimates for x1, one for each level of x1. SAS sets the last level in sort order (here x1=1) as the reference category, hence its estimate is 0.
    Code: 
    /*Now, class not told to SAS*/
    proc glm data=test ;
    model y= x1 x2;
    run;quit;
    
    *output;
                                         Standard
                      Parameter         Estimate           Error    t Value    Pr > |t|
    
                      Intercept     -0.620689655      2.19257205      -0.28      0.8037
                      x1             1.850574713      1.74372021       1.06      0.3998
                      x2             1.034482759      0.49651979       2.08      0.1726
    
    
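    * SEE that there is only one estimate for x1, as SAS is treating x1 as a continuous variable, i.e. assuming a linear effect. For a two-level 0/1 variable like this one the two fits are equivalent (same size of estimate, same standard error and p-value; only the sign and the intercept change). With three or more levels, though, a single slope would wrongly impose a linear trend across the level codes.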

  20. #15
    Dason's Avatar
    But they would probably just get an error, I'm guessing, since their categorical variable is probably stored as text rather than numeric, so SAS wouldn't be able to treat it as continuous.
