Normality tests for SAS

#1
Hi All,
I just ran PROC MIXED in SAS to output the residual errors [to check for normality].
I am wondering whether I should look at the Shapiro-Wilk or the Kolmogorov-Smirnov test. My sample size is 217 [measured at 3 time points]. When I check the residual errors for normality, should I test them separately at each time point or all together?
Does that make a difference?
Thanks for your time.

I am new to this forum and new to SAS; any feedback would be greatly appreciated!
Thanks,
PhD student
 
#2
Shapiro-Wilk W 0.889567 Pr < W <0.0001
Kolmogorov-Smirnov D 0.112944 Pr > D <0.0100
Cramer-von Mises W-Sq 2.839237 Pr > W-Sq <0.0050
Anderson-Darling A-Sq 19.61635 Pr > A-Sq <0.0050
This is one of the outputs I have.
My data are not normally distributed, right?
Thanks!
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
There are some basic rules out there for selecting normality tests, which I cannot remember off the top of my head, but they provide cut-offs for when to use which normality test based on your sample size. I think they should be easy to look up (the number 200 keeps jumping into my head).

But yes, those data do not look normally distributed according to any of the tests. Perhaps run the tests both on the residuals pooled over time and on the residuals at each time point separately; if both come up with p < 0.05, the answer does not matter, since neither grouping is normal. But of course we would want to figure out what the rule is for future analyses.

As for the time points, I am not confident of the answer. I can think of arguments for doing either. Hopefully someone else can jump in on this one. If you are not getting any feedback, perhaps look at what is standard for repeated measures ANOVA for an idea or rationale.
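A minimal sketch of running the tests both ways (pooled and per time point). It assumes the residuals from PROC MIXED are in a data set named `resids` with the residual variable `Resid` and a time-point variable `time`; `resids` and `time` are hypothetical names, so substitute your own.

```sas
/* Pooled: all residuals from all time points together */
proc univariate data=resids normal;
  var Resid;
run;

/* Separately: BY-group processing needs the data sorted by time */
proc sort data=resids;
  by time;
run;

/* One set of normality tests per time point */
proc univariate data=resids normal;
  by time;
  var Resid;
run;
```

If both runs reject normality, the choice of grouping does not change the conclusion.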
 
#4
Thanks so much hlsmith! I appreciate your feedback.
Most of my data are not normally distributed. While I was discussing this with a friend, he mentioned that using PROC MIXED solves the problem even if my data are not normally distributed.
Is that true? Please let me know if any of you have any suggestions. Best, PhD student
 
#5
Be careful about the statistical tests of normality. They have very weak power, which is why they are commonly criticized. You can have a non-normal distribution and fail to detect it with them because of this. Graphical checks like a QQ plot are better.
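A sketch of a QQ plot with a fitted normal reference line, assuming the residuals are in a hypothetical data set `resids` with the variable `Resid`. The `MU=EST SIGMA=EST` options ask SAS to estimate the reference line's mean and standard deviation from the data; points that fall close to the line suggest approximate normality.

```sas
proc univariate data=resids;
  var Resid;
  /* Normal reference line drawn with the sample mean and standard deviation */
  qqplot Resid / normal(mu=est sigma=est);
run;
```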
 
#7
I am not sure what you mean by robust. The way I use the term, it means the method still works when assumptions like normality are violated. Since nonparametric methods don't rely on normality, they are by definition more robust to violations of it. That is why you commonly use nonparametric methods: you have seriously non-normal data, and thus parametric methods are questionable. Otherwise the parametric method would provide more useful results (and is more commonly supported by textbooks and software).
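For example, a nonparametric comparison of two treatment groups could use the Wilcoxon rank-sum test in PROC NPAR1WAY. This sketch assumes the data set `file19` with variables `trt`, `stage`, and `FRUITS` (as in the PROC MIXED code later in this thread). It restricts to a single stage because this test assumes independent observations and ignores the repeated-measures structure; the value `stage = 1` is hypothetical.

```sas
proc npar1way data=file19 wilcoxon;
  /* One time point only: the test assumes independent observations */
  where stage = 1;
  class trt;      /* grouping variable: the two treatments */
  var FRUITS;     /* response compared between groups      */
run;
```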
 
#8
Phdscientist: Your data is not normally distributed. I would take one of two approaches. First, you could try to transform your data sets to get a normal distribution. This is pretty easy in SAS, and works a fair amount of the time. The second is to run a different test that does not require normality. I assume from the last post you want to run a regression. From the SAS website:

http://support.sas.com/documentatio...lt/viewer.htm#statug_introreg_a0000000433.htm

Might be a start!
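A minimal sketch of the transformation approach in a DATA step, assuming a data set `file19` with a nonnegative response `FRUITS`. Taking log(y + 1) is one common choice, not the only one; the + 1 offset is an assumption to keep zero values defined, and the new variable name `log_fruits` is hypothetical.

```sas
data file19_t;
  set file19;
  /* log(y + 1) keeps zero values defined; assumes FRUITS >= 0 */
  log_fruits = log(FRUITS + 1);
run;
```

After transforming, refit the model with `log_fruits` as the response and recheck the residuals (not the raw variable) for normality.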
 

hlsmith

#9
May or may not be relevant any more:

Code:
proc univariate data=residualss normal;
  var resid;
  histogram resid / normal;
  QQplot resid;
run;
 
#10
Thanks for the responses noetsi and cdhowar!

And thanks for the program, hlsmith; it is still relevant. I am going to test the residual errors for normality without the major outliers and see.

Any other feedback would be super greatly appreciated :)
 
#11
There is a strong dispute about how important normality is in various methods (formally, you are interested in the normality of the residuals, not the normality of the IV, something commonly confused in discussions of this issue). I have spent a long time studying this for various projects, enough to know that statisticians strongly disagree on it.
 
#13
I posted an example of two well-known statisticians who clearly disagree with your perspective, including exactly what they stated (it is a quote). I have also asked the professors who taught statistics in my graduate program, and they too disagree with your perspective :) For my prelims in the Measurement and Statistics program, I read a series of articles by methods experts who strongly disagreed on the requirement for normality. If I have more time, I can send you the citations.

The authors I cited wrote one of the better-known statistics texts... :)
 

Dason

Ambassador to the humans
#14
And I'm fairly sure a few of us agreed that you were misinterpreting what they were saying. I recall it not being particularly clear and that it was easy to misinterpret but it didn't contradict what I've been saying about the issue.
 
#15
To me their quote did not seem unclear. But perhaps you should post it again (I don't know where it is) so others can read it. Obviously I could be wrong.
 

Dason

#16
Essentially it comes down to whether they were referring to the marginal distribution of the response or the conditional distribution of the response. If one took it to mean the conditional distribution then it was in line with the truth :D ;) - If one took it to mean the marginal distribution then it was wrong. It was something along those lines and without the author providing clarification I recall thinking that it was easy to be confused by it but ... we've talked about this many times and it seems like it's a battle I can't win (even though I know I'm right :p)
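A small simulation can make the marginal vs. conditional distinction concrete. This is a hypothetical example, not data from the thread: two groups whose means differ by 10 units, with normal errors (SD = 1) within each group. The marginal distribution of `y` is bimodal, so normality tests on `y` would be expected to reject, while the residuals from regressing `y` on `group` (the conditional distribution) should look normal.

```sas
/* Hypothetical data: group means 0 and 10, normal errors with SD = 1 */
data sim;
  call streaminit(2024);
  do i = 1 to 200;
    group = mod(i, 2);                   /* 0/1 group indicator */
    y = 10*group + rand('normal', 0, 1);
    output;
  end;
run;

/* Fit y on group and save the residuals */
proc reg data=sim;
  model y = group;
  output out=simres r=resid;
run;
quit;

/* Compare: marginal distribution of y vs. conditional (residuals) */
proc univariate data=simres normal;
  var y resid;
run;
```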
 
#17
All I can say is that other individuals (including professors) I have spoken to disagree with your perspective:) I would never argue with you on a statistical point - your expertise and ability is far greater. My point is simply that experts don't seem to agree on this.

I have one professor (a statistician whose PhD is from Harvard, and one of the smartest people I have ever met). His comment is that statisticians commonly disagree on a wide range of issues, as of course do most academics. It's what fills journals.... :p

Incidentally, I found a macro for SAS that generates Mardia's test of multivariate normality (which has been recommended to me as a way of testing the normality of the residuals). It is too long to post here; if there is a way to post Word documents, I will.
 
#18
Thanks for the very insightful discussion!

I have a question regarding PROC MIXED.
After I removed the outliers, one of my variables passed the normality test, so I decided to go ahead with the regression analysis.
I have the following program:

proc mixed data=file19;
  class id trt community BMI stage;   /* BMI listed in CLASS is treated as categorical */
  /* Looking at the changes in fruit consumption (FRUITS = Y) */
  model FRUITS = trt community BMI BMI*community BMI*stage BMI*trt
                 stage stage*trt / outp=FRUITSwk;
  repeated / type=un subject=id(trt) sscp rcorr;
run;

I have:
2 treatments
2 communities
3 stages (repeated measures)

The output I have is:

Effect          Num DF   Den DF   F Value   Pr > F
Trt                  1      232      0.99   0.3218
Community            1      232    106.11   <.0001
BMI                135      232      3.36   <.0001
Community*BMI       69      232      3.41   <.0001
BMI*Stage          134      232      3.41   <.0001
Trt*BMI             64      232      4.30   <.0001
Stage                2      232      8.67   0.0002
Trt*Stage            2      232      3.04   0.0500


I am interested in seeing the different treatment effects [2 treatments] across the different communities [2 communities] and stages [3 stages]. What is the appropriate statement for looking at the individual effects? Any feedback would be very helpful.
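One common way to look at treatment effects within each stage or community is the LSMEANS statement with the SLICE= and DIFF options. This is a sketch under stated assumptions, not the poster's model: it treats BMI as a continuous covariate (removed from CLASS) and adds `trt*community` to the MODEL statement, because an effect must be in the model before LSMEANS can slice it. SLICE= tests the treatment effect within each level of the named factor; DIFF prints pairwise differences of the least-squares means.

```sas
proc mixed data=file19;
  class id trt community stage;        /* assumption: BMI is continuous */
  model FRUITS = trt community stage trt*community trt*stage
                 BMI BMI*community BMI*stage BMI*trt;
  repeated stage / type=un subject=id(trt);
  /* Treatment effect within each stage */
  lsmeans trt*stage / slice=stage diff;
  /* Treatment effect within each community (requires trt*community above) */
  lsmeans trt*community / slice=community diff;
run;
```

With BMI in the model as a covariate, the least-squares means are computed at the mean BMI.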

Thanks!
 
#19
Hi,

I am using PROC MIXED but am having a hard time outputting the residual errors. I need them to check for normality. Can you tell me how you did that, i.e., the SAS code you used?
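The residuals come from the OUTP= option on the MODEL statement, as in the PROC MIXED code earlier in this thread; the output data set contains a variable named `Resid`. A minimal sketch, using a simplified model (an assumption, for illustration only) and the data set names `file19` and `FRUITSwk` from that code. OUTPM= is the companion option for marginal residuals; without a RANDOM statement the two give the same predictions.

```sas
proc mixed data=file19;
  class id trt stage;
  /* OUTP= writes predicted values and residuals (variable Resid) */
  model FRUITS = trt stage trt*stage / outp=FRUITSwk;
  repeated stage / type=un subject=id(trt);
run;

/* Normality tests, histogram, and QQ plot of the residuals */
proc univariate data=FRUITSwk normal;
  var Resid;
  histogram Resid / normal;
  qqplot Resid / normal(mu=est sigma=est);
run;
```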