
Thread: Regression diagnostics with proc glm or proc reg

  1. #1
    Points: 10,148, Level: 67
    Level completed: 25%, Points required for next Level: 302

    Posts
    158
    Thanks
    88
    Thanked 1 Time in 1 Post

    Regression diagnostics with proc glm or proc reg




    I fit my model using proc glm, but it now seems that proc reg should be used for the diagnostics. Do I need to fit the model all over again in proc reg, creating the dummy variables that proc glm let me avoid, or can the diagnostics be done with proc glm?

  2. #2
    Points: 2,626, Level: 31
    Level completed: 18%, Points required for next Level: 124

    Location
    Dallas, TX
    Posts
    311
    Thanks
    12
    Thanked 90 Times in 88 Posts

    Re: Regression diagnostics with proc glm or proc reg

    What diagnostics are you referring to in particular?

  3. #3

    Quote Originally Posted by jrai:
    "What diagnostics are you referring to in particular?"

    Outliers, leverage, Cook's D, multicollinearity (VIF), and whatever else needs to be checked in multiple linear regression (continuous outcome; both categorical and continuous predictors; interactions present).

  4. #4

    1) VIF can be obtained from the tolerance statistics: tolerance = 1/VIF, and it is requested with the tolerance option in the model statement of Proc GLM.

    2) Cook's D can be written to the output dataset using the cookd= option in the output statement of Proc GLM.

    3) If you still need to estimate the model using Proc Reg, then you'll have to create dummies, and if you want similar results the coding has to match the way Proc GLM does it; otherwise the coefficients might be different.

    Proc Reg does give more diagnostic statistics than Proc GLM.
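
    One possible way to avoid hand-coding the dummies for Proc Reg is to let Proc GLMMOD build the design matrix first. This is only a sketch under assumptions: it uses the dataset and variable names posted later in this thread (statsclue, outcome, gender, drugcategory, volume), the dataset names design and parms are made up for illustration, and the column range col2-col9 assumes gender has 2 levels and drugcategory has 3. GLMMOD uses the same less-than-full-rank coding as Proc GLM, so Proc Reg should report the redundant columns as linear combinations of the others and set their estimates to zero.

    ```sas
    /* Build the GLM-style design matrix; design and parms are hypothetical names */
    proc glmmod data=statsclue outdesign=design outparm=parms noprint;
      class gender drugcategory;
      model outcome = gender drugcategory volume gender*volume;
    run;

    /* parms maps each design column (Col1, Col2, ...) back to its effect and level */
    proc print data=parms;
    run;

    /* Col1 is the intercept column, so it is left out of the REG model.
       col2-col9 assumes 2 gender levels and 3 drugcategory levels. */
    proc reg data=design;
      model outcome = col2-col9 / vif tol collin;
    run;
    quit;
    ```

    Check the printed parms dataset before trusting the column range, since the number of columns depends on how many levels each class variable really has.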

  5. The Following User Says Thank You to jrai For This Useful Post:

    StatsClue (02-06-2012)

  6. #5

    Thanks so much! Is there anything else (e.g. residuals, leverage, outliers) that I could do with glm? And could you spell out the code a little more? For example:

    proc glm data=statsclue;
    class gender drugcategory;
    model outcome=gender drugcategory volume gender*volume;
    run;

    Where do I put the tolerance thing? Thanks again, this was really helpful.

    EDIT: I just put tolerance in the right place (it would help to know about the others, e.g. residuals, Cook's D) and got an impossible number of dummies in the output. I know I should be worried about VIF > 10. What tolerance value should I be worried about? Also, would it be Type 1 or Type 2 tolerance (the output shows these two types)?

    And a somewhat unrelated question: for the glm output, should I read the Type 1 SS or the Type 3 SS to decide which variables to keep in the model? (The p-values for the two aren't always the same.) Thanks a LOT.
    Last edited by StatsClue; 02-06-2012 at 10:56 PM.

  7. #6

    "I know I should be worried about vif>10. What tolerance value should I be worried about?"
    If your criterion is VIF > 10, then the tolerance cutoff is 0.1, since tolerance = 1/VIF. Anything less than 0.1 indicates multicollinearity.
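
    The 0.1 cutoff follows directly from the definitions. Writing R_j^2 for the R-square from regressing predictor j on all the other predictors (standard textbook notation, not something from the posts above):

    ```latex
    \mathrm{TOL}_j = 1 - R_j^2, \qquad
    \mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \frac{1}{\mathrm{TOL}_j}, \qquad
    \mathrm{VIF}_j > 10 \iff \mathrm{TOL}_j < 0.1
    ```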

    "Also, would it be type 1 or type 2 tolerance (the output shows these two types)?"
    Type 2 is the same as the tolerance (and the corresponding VIF) output from Proc Reg. Therefore, I prefer using Type 2.

    "for the output with glm, should I read the type 1 SS or type 3 SS to decide about which variables to keep in the model? (the p-values for the two outputs aren't always the same)."
    Either can be used if you understand what it is testing. Usually Type 3 makes more intuitive sense. Say you have 3 IVs: a, b & c. The Type 3 statistic for a is calculated by comparing the full model with the model that has only the intercept, b & c, i.e. excluding a. It therefore gives the additional effect of a after adjusting for b and c. If the p-value for a is not significant, the model can usually be estimated without a.

    Type 1 is sequential testing. Say you specified the IVs as b, c & a in the model statement, in that order. Type 1 then fits the models in sequence: intercept first, then intercept + b, then intercept + b + c, and so on, so each test depends on the order of the terms. Because of this sequential structure it is less intuitive.
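
    A quick way to see the difference is to print both sets of tests and then refit with the terms in another order. A minimal sketch, assuming the dataset and variable names from the model posted earlier in the thread; the ss1 and ss3 options only control which sums of squares are printed.

    ```sas
    /* Original term order */
    proc glm data=statsclue;
      class gender drugcategory;
      model outcome = gender drugcategory volume gender*volume / ss1 ss3;
    run;
    quit;

    /* Same model with volume listed first */
    proc glm data=statsclue;
      class gender drugcategory;
      model outcome = volume gender drugcategory gender*volume / ss1 ss3;
    run;
    quit;
    ```

    The Type 1 lines should change between the two runs because each term is tested after only the terms listed before it, while the Type 3 lines should stay the same.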

    "would help to know about others, eg. residuals, cooks D"
    Code:
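    proc glm data=statsclue;
      class gender drugcategory;
      /* tolerance prints the Type 1 and Type 2 tolerances */
      model outcome = gender drugcategory volume gender*volume / tolerance;
      /* each keyword= is followed by the name of a new variable in the output dataset */
      output out=statsclue1 cookd=cooks_statistics_in_this_var r=residuals_in_this_var;
    run;
    quit;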

  8. The Following User Says Thank You to jrai For This Useful Post:

    StatsClue (02-07-2012)

  9. #7

    Thanks! Not too sure about the output line. Do you mean: output out=statsclue1 cookd=cooks volume r=residuals volume; ?
    Basically, I'm not sure what should be written in place of cooks_statistics_in_this_var.

    As for tolerance, I got values for the individual levels instead of the whole variable, e.g. the tolerance for two of the drug categories (the 3rd would be the referent, I guess) was <.1. I didn't get a tolerance for drugcategory as a whole. It was also <.1 for some interactions and levels of interactions. What does this mean? Should these be excluded from the model?

  10. #8

    "Basically, not sure what you mean to be written in place of cooks_statistics_in_this_var ."
    Replace it with the name of the variable in which you want to store the results.
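
    For example, the output statement might look like the sketch below. The names cooksd, resid, rstud, leverage and pred are made up for illustration; any valid SAS names that are not already in the input dataset should do. Each keyword is followed by a single new name, with no spaces inside the name.

    ```sas
    proc glm data=statsclue;
      class gender drugcategory;
      model outcome = gender drugcategory volume gender*volume;
      output out=statsclue1
             cookd=cooksd      /* Cook's D */
             r=resid           /* raw residual */
             rstudent=rstud    /* studentized residual with the current observation deleted */
             h=leverage        /* leverage */
             p=pred;           /* predicted value */
    run;
    quit;
    ```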

    "In the context of tolerance , I got values for different levels instead of the whole variable, eg. tolerace for the 2 different drug categories (3rd would be referent I guess) was <.1. I didn't get a tolerance for drugcategory as a whole. Also it was <.1 for some interactions and levels of interactions. What does this mean? Should these be excluded from the model?"
    This is the same as getting the VIF for the dummies. Low tolerance means either that there is not much variation within the variable (which I personally think is not a good case for prediction) or that your dummy is closely correlated with some other variable. I'd suggest investigating a bit, but I personally don't worry too much about multicollinearity with dummies.

  11. The Following User Says Thank You to jrai For This Useful Post:

    StatsClue (02-08-2012)

  12. #9

    That code didn't really work for cookd and residuals, but thanks a bunch, jrai.

    Interactions should be kept when investigating tolerance/vif, right? Thanks.

  13. #10

    What was the problem with the code? Did you get any error message?

    Yes, interactions should be kept, but they often show high collinearity with the base variables, e.g. x_sq will usually show high collinearity with x. You can either leave it like that or use ridge regression.

  14. The Following User Says Thank You to jrai For This Useful Post:

    StatsClue (02-09-2012)

  15. #11

    ridge regression...gawd, my brain can't handle any new terminology, let alone regression. Well, some silly associations got into my model in a highly significant way, and I was hoping to find a statistical reason further along in the analysis to drop them.
    Have another q, more for a preceding stage: if, after placing all significant interactions in the model, a formerly significant variable becomes insignificant (but one of its interactions remains significant), is that reason enough to throw out the now-insignificant variable along with its significant interaction?

    I didn't get any error message. The command ran and gave everything else in the output without a hint of cookd or residuals.

    Thanks.

  16. #12

    Quote Originally Posted by StatsClue:
    "I didn't get any error message. The command ran and gave everything else in the output without a hint of cookd or residuals."

    Did you check the dataset named work.statsclue1 (if you used the same code)? This is the output dataset where the results are stored.
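
    The output statement writes to a dataset and prints nothing in the results window, which is why nothing new shows up in the listing. A minimal sketch for checking the dataset, assuming it was created as work.statsclue1 by the code above:

    ```sas
    /* List the variables in the output dataset; the cookd= and r= variables should appear here */
    proc contents data=work.statsclue1;
    run;

    /* Look at the first 10 rows */
    proc print data=work.statsclue1 (obs=10);
    run;
    ```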

    As for model selection, the ideal way is to keep the base variables if you are keeping their interactions. According to McClave, Benson & Sincich, first see whether the overall model is useful, as indicated by the F-test, and then see whether the interaction is significant. If the interaction is significant, then the tests on the base variables are meaningless, as the significance of the interaction term implies that both variables are important.

  17. The Following User Says Thank You to jrai For This Useful Post:

    StatsClue (02-09-2012)

  18. #13

    Quote Originally Posted by jrai:
    "If the interaction is significant then the tests on base variables are meaningless as the significance of the interaction term implies that both the variables are important."

    Very useful information. So that's saying that the tests on the base variables are meaningless, and NOT that KEEPING the base variables individually is meaningless when they're part of an interaction, right? I.e., you've got to keep them individually if you're keeping them as part of an interaction?

    Checked the work folder. Only has the original dataset. Oh and the log does give a strange message at the bottom:

    Variable volume already exists on file WORK.statsclue1, using volume2 instead.

    Variable volume already exists on file WORK.statsclue1, using volume3 instead.

  19. #14

    Yes, keep the individual variables if their interaction is being kept.

    Statsclue1 should be created in the work folder. There seems to be some problem with the variable names in your code. Post the code & I can check.

  20. The Following User Says Thank You to jrai For This Useful Post:

    StatsClue (02-09-2012)

  21. #15


    That code for cookd and residual worked but my N >1000, so it's tough looking for influential observations (though I'm not even sure what to do with them if I do find them).

    Is there a way to get SAS to print out only the observations of concern, with proc glm?

    Thanks.
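
    One possible approach is to save the diagnostics with the output statement and then print only the rows that exceed a cutoff. This is a sketch, not a definitive recipe: the variable names cooksd, rstud and leverage are made up for illustration, and the cutoffs (Cook's D above 4/n, absolute studentized residual above 3) are common rules of thumb chosen here as assumptions, not values given anywhere in this thread.

    ```sas
    proc glm data=statsclue;
      class gender drugcategory;
      model outcome = gender drugcategory volume gender*volume;
      output out=statsclue1 cookd=cooksd rstudent=rstud h=leverage;
    run;
    quit;

    /* Count the observations that actually received a Cook's D value */
    proc sql noprint;
      select count(*) into :n
      from statsclue1
      where cooksd is not missing;
    quit;

    /* Print only the observations flagged by the assumed rule-of-thumb cutoffs */
    proc print data=statsclue1;
      where cooksd > 4/&n or abs(rstud) > 3;
      var outcome gender drugcategory volume cooksd rstud leverage;
    run;
    ```

    The where statement keeps the printout short even with N > 1000; loosening or tightening the cutoffs is a judgment call.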
