proc gen mode diagnostics I don't understand.

noetsi

Fortran must die
#1
I am not used to working with proc gen mode. It generates two sets of diagnostics of the residuals. One looks fine, the other may have a pattern in it. I don't know why I am getting two totally different sets of diagnostics, and which I should use. There are no comments other than what I show below.
 

Attachments

noetsi

Fortran must die
#3
I am not sure which one you mean, but do you know which is the correct set of diagnostics, hlsmith? I can't figure this out at all.

This is payment data, and we get a small number of people who earn huge amounts of money while most get very little. I tend to assume that with 20 thousand cases one outlier won't matter much. Looking at the DFBETAs, this seems to be confirmed.

Which diagnostics do you look at in proc genmode for linear regression?
 

noetsi

Fortran must die
#4
I think plots=all generates strange plots. I am new to genmod, but these residual plots suggest no violation of the regression assumptions to me. I wanted a second opinion, though.
 

Attachments

noetsi

Fortran must die
#5
Ok I have another question.

I am using PROC Genmode. In the second column, 1 means you are in the group and 0 means you are not (please ignore the third column). I interpret this to mean that 16-18 year olds earn more than the excluded group, which is 25-44, controlling for other variables. But that is hard to believe given what we know of our organizations and the descriptives. The code is really long so I did not include it. The response variable (income) is interval.

[Attached image: 1615840403507.png]
 

hlsmith

Less is more. Stay pure. Stay poor.
#6
It is hard to ignore the third column! The indicators aren't clear given the above table. What I usually do is get the estimates from the model, then make sure I can reproduce them in the ESTIMATE statement, then tailor those to my question or reference groups of interest. It may help to show more of the output.
 

noetsi

Fortran must die
#7
I have the whole population which is why I did not post the rest. In this case being 0 means you are not 16-18. So does this mean that if you are not 16-18 you earn 771 dollars more than the reference group and if you are 16-18 you earn 771 dollars less? That is a bit confusing to me. I don't understand why they score it by the 0 group relative to the reference category rather than the 1 (which means you are in the group) relative to the reference category.

[Attached image: 1615843540280.png]
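
One possible reading of the sign, sketched in Python with made-up numbers (the incomes and the 0/1 flag below are hypothetical, not the thread's data). It assumes the 0/1 indicator was listed in the CLASS statement: under GENMOD's default GLM coding the last sorted level (here 1) is the one whose parameter is set to zero, so the row reported for level 0 compares "not 16-18" against "16-18". With a single indicator and no other covariates, that estimate is just a difference in group means:

```python
from statistics import mean

# Hypothetical (flag, income) pairs; flag = 1 means the person is 16-18.
records = [
    (1, 1000.0), (1, 1200.0), (1, 1100.0),
    (0, 1800.0), (0, 1900.0), (0, 2000.0), (0, 1700.0),
]

def group_mean(level):
    """Mean income of everyone whose flag equals `level`."""
    return mean(income for flag, income in records if flag == level)

mean_in = group_mean(1)    # 16-18
mean_out = group_mean(0)   # everyone else

# Level 1 is the zeroed (last) level: the reported row is for level 0.
intercept_last_ref = mean_in
estimate_level_0 = mean_out - mean_in

# Level 0 as the reference instead: the reported row is for level 1.
intercept_first_ref = mean_out
estimate_level_1 = mean_in - mean_out

print(estimate_level_0, estimate_level_1)
```

The two estimates have the same size and opposite signs, and both parameterizations give the same fitted group means. Only the level that serves as the baseline changes.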
 

hlsmith

Less is more. Stay pure. Stay poor.
#10
I almost said earlier that if he kept calling it "mode" I wouldn't reply. Do you have an identity link and a normal distribution? If so, the non-reference group has an expected mean 777 greater than the reference group when controlling for other variables. Boom.
 

noetsi

Fortran must die
#11
Since I have a population I don't need a normal distribution; I don't even pay attention to p-values. I don't know what an identity link is; I have never heard of that. Do I need to test for non-linearity? I believe that is the only assumption that matters.

Genmod does not use reference coding. It uses GLM coding as the default. I really do not understand how that changes the interpretation.
 

hlsmith

Less is more. Stay pure. Stay poor.
#12
The identity link is the default that goes with dist=normal, kind of like pairing the logit link with the binomial distribution. It refers to the identity function: the link leaves the mean untransformed, so the linear predictor models the mean of the response directly.
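
For reference, a short statement of what the link function does in a generalized linear model. The notation is standard textbook notation rather than anything from the thread: $\mu$ is the mean of the response $Y$, $x$ the covariate vector, and $\beta$ the coefficients.

```latex
g(\mu) = x^\top \beta, \qquad \mu = E[Y \mid x]

\text{identity link: } g(\mu) = \mu \;\Rightarrow\; E[Y \mid x] = x^\top \beta

\text{logit link: } g(\mu) = \log\frac{\mu}{1-\mu}
```

With the identity link the coefficients are on the scale of the response, so for an income model a coefficient reads directly as a difference in dollars.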
 

noetsi

Fortran must die
#13
hlsmith said: "I almost said earlier that if he kept calling it "mode" I wouldn't reply. Do you have an identity link and a normal distribution? If so, the non-reference group has an expected mean 777 greater than the reference group when controlling for other variables. Boom."
What confuses me in proc genmod is the difference between GLM and reference coding for dummy variables. I know what reference coding does, but GLM coding is something I cannot find covered anywhere, and I spent a lot of time looking. Maybe the difference is not large, given your explanation.
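
A minimal sketch of the two coding schemes, written in Python only to show the design-matrix rows (the age bands and function names are hypothetical, and this is not SAS's implementation). GLM coding creates one indicator column per level. Reference coding drops the column for the reference level.

```python
def glm_coding(levels, value):
    """One indicator column per level (the less-than-full-rank layout)."""
    return [1 if value == lev else 0 for lev in levels]

def reference_coding(levels, value, ref):
    """One indicator column per non-reference level."""
    return [1 if value == lev else 0 for lev in levels if lev != ref]

levels = ["16-18", "19-24", "25-44"]   # hypothetical age bands, sorted
ref = levels[-1]                        # last level as the reference

for value in levels:
    print(value, glm_coding(levels, value), reference_coding(levels, value, ref))
```

Under GLM coding the indicator columns sum to the intercept column, so the software sets the last level's parameter to zero. The remaining main-effect parameters are then differences from that last level, which matches reference coding with the last level as the reference. The practical differences are mostly which level ends up as the baseline and how many coefficients an ESTIMATE or CONTRAST statement expects.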