1. ## Multivariate normality

I know various methods of determining whether the bivariate relationship between two variables is normal. But, as Dason has drilled into me, it's multivariate normality (that is, the normality of the residuals) that actually matters.

I am not sure how to test for that. If you plotted the residuals in a QQ plot and it suggested normality, would that be a valid way to be sure the regression model met the multivariate normality assumption?

2. ## Re: Multivariate normality

Although I am not particularly a fan of normality tests, I know that tests like the Shapiro-Wilk Multivariate Normality Test exist.

A quick Google search also shows there are R packages (PDF alert).

Maybe you could look into the theory behind these tests and come up with something satisfactory?
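One piece of that theory is easy to use directly: if the rows of a data matrix are multivariate normal, their squared Mahalanobis distances from the sample mean are approximately chi-square with p degrees of freedom (p = number of variables), so a chi-square QQ plot of those distances should lie close to a straight line. A minimal base-R sketch, assuming simulated data; `chisq_qq` is a hypothetical helper name, not a function from any package:

```r
# Graphical check for multivariate normality (a sketch, not a formal test).
# Under MVN, squared Mahalanobis distances are approximately chi-square(p).
chisq_qq <- function(x) {
  x  <- as.matrix(x)
  d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))  # squared distances
  q  <- qchisq(ppoints(nrow(x)), df = ncol(x))              # theoretical quantiles
  plot(q, sort(d2),
       xlab = "Chi-square quantiles",
       ylab = "Sorted squared Mahalanobis distances")
  abline(0, 1, col = "red")  # points near this line are consistent with MVN
  invisible(data.frame(theoretical = q, observed = sort(d2)))
}

set.seed(1)
x  <- matrix(rnorm(300 * 3), ncol = 3)  # hypothetical MVN data, 3 variables
qq <- chisq_qq(x)
```

Because the mean and covariance are estimated from the same data, the chi-square reference is only an approximation, which matters most for small n.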

3. ## The Following User Says Thank You to TheEcologist For This Useful Post:

noetsi (06-21-2013)

4. ## Re: Multivariate normality

I think you still have a misunderstanding. The reason I kept mentioning multivariate normality is that you were talking about the response variable being normally distributed. Some authors might say that Y needs to be normal, but if that's the case then they're talking about multivariate normality of Y, where the mean vector is a function of X. In symbols, Y ~ N(Xβ, σ²I) is the same statement as the errors ε = Y - Xβ being independent N(0, σ²). So if all we want to do is check the normality assumption, we can stick to univariate normality tests, because the previous statement is the same as asking whether the residuals are univariately normally distributed...
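A minimal sketch of that univariate check on regression residuals, using simulated data (the variable names and coefficients are made up for illustration). `rstandard()` is used rather than raw residuals because raw residuals do not all have the same variance; for a visual check either usually works:

```r
# Simulated regression with normal errors (hypothetical data).
set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- runif(n)
y  <- 1 + 0.5 * x1 - 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Univariate normality check on the standardized residuals.
r <- rstandard(fit)
qqnorm(r)
qqline(r, col = "red")  # reference line through the quartiles
shapiro.test(r)         # formal test, with the caveats discussed in this thread
```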

5. ## The Following User Says Thank You to Dason For This Useful Post:

noetsi (06-21-2013)

6. ## Re: Multivariate normality

the psych package in R also has Mardia's test of multivariate skewness/kurtosis which, if statistically significant, gives you evidence to suspect that your distribution is not multivariate normal.
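For anyone without psych at hand, Mardia's two statistics can be computed directly. A sketch of the classical definitions in base R, assuming complete numeric data; `mardia_test` is a hypothetical name, and package implementations (psych included) may differ in details such as the covariance divisor or small-sample corrections, so p-values need not match exactly:

```r
# Mardia's multivariate skewness and kurtosis (classical large-sample versions).
mardia_test <- function(x) {
  x  <- as.matrix(x)
  n  <- nrow(x)
  p  <- ncol(x)
  xc <- sweep(x, 2, colMeans(x))   # centre each column
  S  <- crossprod(xc) / n          # ML covariance estimate (divisor n)
  D  <- xc %*% solve(S) %*% t(xc)  # D[i, j] = (x_i - xbar)' S^-1 (x_j - xbar)
  b1p <- sum(D^3) / n^2            # multivariate skewness
  b2p <- sum(diag(D)^2) / n        # multivariate kurtosis
  # n * b1p / 6 is approximately chi-square with p(p+1)(p+2)/6 df under MVN
  skew_stat <- n * b1p / 6
  skew_df   <- p * (p + 1) * (p + 2) / 6
  # b2p is approximately N(p(p+2), 8p(p+2)/n) under MVN
  kurt_stat <- (b2p - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
  list(b1p = b1p, b2p = b2p,
       p.skew = pchisq(skew_stat, df = skew_df, lower.tail = FALSE),
       p.kurt = 2 * pnorm(-abs(kurt_stat)))
}

set.seed(1)
res_mvn  <- mardia_test(matrix(rnorm(500 * 3), ncol = 3))  # MVN data
res_skew <- mardia_test(matrix(rexp(500 * 3), ncol = 3))   # skewed data
```

With the skewed (exponential) data both p-values come out essentially zero; with the normal data they behave like ordinary p-values under the null.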

i know noetsi uses Mplus, and Mplus also gives you Mardia's test.

now, for what reason in particular do you need to test for multivariate normality?

7. ## The Following User Says Thank You to spunky For This Useful Post:

noetsi (06-21-2013)

8. ## Re: Multivariate normality

Originally Posted by TheEcologist
Although I am not particularly a fan of normality tests
.... because...?

9. ## Re: Multivariate normality

Originally Posted by spunky
.... because...?
I can't answer for him but I feel similarly. Typically they aren't that great with small sample sizes and once you get a large enough sample size to detect departure from normality then you have a large enough sample to not care about normality...
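Both halves of that claim show up in a quick simulation. A sketch, assuming t-distributed data with 5 degrees of freedom as the (arbitrary) departure from normality and 500 replicates per sample size; `reject_rate` is a hypothetical helper:

```r
# Rejection rate of shapiro.test at the 5% level for heavy-tailed t(5) data.
set.seed(123)
reject_rate <- function(n, reps = 500, df = 5) {
  pvals <- replicate(reps, shapiro.test(rt(n, df = df))$p.value)
  mean(pvals < 0.05)
}
sizes <- c(20, 100, 2000)
rates <- sapply(sizes, reject_rate)
names(rates) <- paste0("n=", sizes)
rates
```

One would expect a low rejection rate at n = 20 (weak power) and a rate close to 1 at n = 2000, for the same mild departure.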

10. ## Re: Multivariate normality

I don't use them because they have a strong reputation for very weak power.

I cannot use Mplus at work (the state will not allow its purchase, nor let me purchase it personally and put it on the computer - don't ask why), and it will be a while before I learn R.

Is it legitimate to use the residuals of a regression in a QQ plot to test for normality?

once you get a large enough sample size to detect departure from normality then you have a large enough sample to not care about normality...
Why would that ever be true? I understand the CLT comes into play at a certain point, but I read treatments of normality in the literature all the time and I have almost never seen one argue that beyond a certain sample size normality does not matter.
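One way to see the point being questioned: simulate a regression with clearly non-normal errors, check whether the usual 95% confidence interval for the slope still covers the true value about 95% of the time, and record how often Shapiro-Wilk rejects normality of the residuals. A sketch under assumed settings (centred exponential errors, n = 2000, 300 replicates); it illustrates the least-squares case only and says nothing about outliers or other models:

```r
set.seed(42)
sim_once <- function(n) {
  x <- runif(n)
  e <- rexp(n) - 1            # skewed errors with mean 0 and variance 1
  y <- 1 + 2 * x + e          # true slope is 2
  fit <- lm(y ~ x)
  ci  <- confint(fit)["x", ]  # nominal 95% interval for the slope
  c(covered = unname(ci[1] <= 2 && 2 <= ci[2]),
    sw_p    = shapiro.test(residuals(fit))$p.value)
}
res <- replicate(300, sim_once(2000))
coverage  <- mean(res["covered", ])      # expected to be close to 0.95
sw_reject <- mean(res["sw_p", ] < 0.05)  # Shapiro-Wilk rejection rate
c(coverage = coverage, sw_reject = sw_reject)
```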

11. ## Re: Multivariate normality

Originally Posted by Dason
I can't answer for him but I feel similarly. Typically they aren't that great with small sample sizes and once you get a large enough sample size to detect departure from normality then you have a large enough sample to not care about normality...
Exactly. Also, once you have a large sample size these tests tend to detect significant "non-normality" even when the departures from normality are meaningless.

I mean look what one outlier in 5000 does to a shapiro.test

Code:
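```r
# test once: 4999 standard normal draws plus a single outlier at -5.5
shapiro.test(c(rnorm(4999), -5.5))
# test 100 times and look at the distribution of the p-values
pvals <- replicate(100, shapiro.test(c(rnorm(4999), -5.5))$p.value)
plot(density(pvals))
abline(v = 0.05, col = 'red')  # conventional 0.05 cutoff
```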
You can bet your pretty pink panties that the "sampling distribution" of the above is normal.

12. ## Re: Multivariate normality

Of course, years ago I read a Stanford professor argue that outliers could make the results of ANOVA invalid regardless of the sample size (that is, asymptotic methods were no protection against bias; he argued the bias could actually get worse with larger samples given this issue).

13. ## Re: Multivariate normality

I don't have pretty pink panties. Did I not get my pair with TS membership?

14. ## Re: Multivariate normality

I can't remember who (link or vinux maybe) wrote a post about multivariate normality on the old statspedia, wherever that got to.

15. ## Re: Multivariate normality

Originally Posted by Dason
large enough sample size to detect departure from normality then you have a large enough sample to not care about normality...
uhm... i can see how this is true in the case of least squares, but would it also apply to ML? i remember reading in Pawitan's classic book on everything-you-need-to-know-about-maximum-likelihood that the choice of likelihood could (or could not) create a whole bunch of problems in terms of parameter bias, etc. so i do think that multivariate normality should be satisfied (at least as much as possible) if one is choosing a normal likelihood model, or something similar to it.
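Part of the answer for the linear model specifically: with a normal likelihood, the ML estimates of the regression coefficients are exactly the least-squares estimates, so whatever holds for least squares holds for those estimates. A sketch that maximizes the normal log-likelihood numerically with `optim()` on simulated data with skewed errors and compares the result with `lm()`; it does not address other models (e.g. SEM) where the likelihood plays a larger role:

```r
set.seed(3)
n <- 500
x <- runif(n)
y <- 1 + 2 * x + (rexp(n) - 1)  # deliberately non-normal (skewed) errors

# Negative normal log-likelihood; par = (intercept, slope, log sigma).
negloglik <- function(par) {
  mu    <- par[1] + par[2] * x
  sigma <- exp(par[3])          # log scale keeps sigma positive
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}
ml  <- optim(c(0, 0, 0), negloglik, method = "BFGS")
ols <- coef(lm(y ~ x))
rbind(ML = ml$par[1:2], OLS = ols)  # agree up to optimizer tolerance
```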

16. ## Re: Multivariate normality

Originally Posted by TheEcologist
Exactly, also once you have a large sample size these tests also tend detect significant "non-normality" when departures from normality are meaningless.
oh pff... that's just bad data practice not to check (and dump) outliers before the analysis

17. ## The Following User Says Thank You to spunky For This Useful Post:

trinker (06-21-2013)

18. ## Re: Multivariate normality

Originally Posted by spunky
oh pff... that's just bad data practice not to check (and dump) outliers before the analysis
I'm sorry, but what you are describing is actually bad practice, IMO very bad practice. It's sad that this is still taught as "common statistical sense" in some fields.

You should only ever "dump" outliers, kicking and screaming, being very certain they are errors.
You should certainly not dump them on a reflex!

The best thing I can do is quote the relevant part of my FAQ:

Originally Posted by TheEcologist
How do I remove or deal with outliers?

Removing outliers can cause your data to become more normal but contrary to what is sometimes perceived, outlier removal is subjective, there is no real objective way of removing outliers.

Always remember that these points remain observations and you should not just throw them out on a whim. Instead you should have good reasons to remove your outliers. There may be many truly valid reasons to remove data points. These include outliers caused by measurement errors, incorrectly entered data points or values that are impossible in real life. If you feel that any outliers are erroneous data points and you can validate this, then you should feel free to remove them.

On the other hand, if you see no reason why your outliers are erroneous measurements then there is no truly objective way to remove them. They are true observations and you may have to consider that the assumptions of your test do not correspond to the reality of your situation. You could always try a non-parametric test (which in general are less sensitive to outliers) or some other analysis that does not require the assumption that your data is normally distributed.
Or from Wikipedia
Deletion of outlier data is a controversial practice frowned on by many scientists and science instructors
Outliers are data; dumping them is bad data practice, and you should feel very dirty every time you do it without a very good reason.
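The FAQ's suggestion of a non-parametric alternative can be illustrated with a two-sample comparison. A sketch with simulated data and a single artificial outlier (all values are assumptions for illustration). The Wilcoxon rank-sum test does not test exactly the same hypothesis as the t-test, so this shows sensitivity to the outlier rather than a drop-in replacement:

```r
set.seed(7)
a <- rnorm(30, mean = 0)
b <- rnorm(30, mean = 1)

# Replace one value of b with a gross outlier of two different sizes.
b50  <- replace(b, 1, 50)
b500 <- replace(b, 1, 500)

# The t-test p-value depends on how extreme the outlier is ...
p_t <- c(t.test(a, b50)$p.value, t.test(a, b500)$p.value)
# ... but the rank-sum test only uses ranks, so it is unchanged
# as long as the outlier remains the largest value.
p_w <- c(wilcox.test(a, b50)$p.value, wilcox.test(a, b500)$p.value)
p_t
p_w
```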

19. ## The Following User Says Thank You to TheEcologist For This Useful Post:

Englund (06-24-2013)

20. ## Re: Multivariate normality

oh, pff....

if dumping outliers is good enough for NASA then it's good enough for me

nah, in all seriousness. i was really into this (and other good statistical practices) a few years ago... then when i started doing stats consulting for students and profs alike i realised everybody was dumping them (or tinkering with their data in other unspeakable ways), and i kept on saying no... then the weeks became months and the months became years and i started noticing that even after people had gone through the mandatory research methods courses where they were instructed not to do it... they were still doing it.

i felt too tired to swim against the current and just lost interest. i now know it shouldn't be done and i guess i'm quite happy with that.

