View Full Version : Anderson-Darling test appropriate?

rhayman
09-14-2010, 03:17 AM
Hi,

I have a large (>2500) number of z-scores and I want to test whether their distribution deviates from a unit normal distribution. I have read around a bit and it seems as though the Anderson-Darling test is the most appropriate for assessing this. I suppose my question is: is it really? Most of the cases I have seen where the test is deployed have n's much lower than mine, so I was wondering if there is a correction I should be employing OR a more appropriate test to use?

Many thanks in advance for any help.

ichbin
10-02-2010, 02:43 AM
Anderson-Darling is fine. I tend to prefer Kuiper or Kolmogorov-Smirnov because there is a well-developed theory of their distributions, which can be computed even for low N. Many people argue endlessly over the best possible test specifically for normality, but I like the EDF tests precisely because they can be applied to test against any theoretical distribution, not just the normal distribution.

Big N is not a problem. On the contrary, the assumptions underlying most tests get better satisfied as N gets bigger.
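For what it's worth, here is a quick sketch of what those EDF tests look like in practice with scipy (the simulated z array is just a stand-in for your actual z-scores):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
z = rng.standard_normal(2500)  # stand-in for the poster's actual z-scores

# KS compares the empirical CDF of z against the unit-normal CDF
ks_stat, ks_p = stats.kstest(z, 'norm')

# Anderson-Darling weights the same ECDF discrepancy toward the tails
ad = stats.anderson(z, dist='norm')

print(f"KS: D = {ks_stat:.4f}, p = {ks_p:.3f}")
print(f"AD: A^2 = {ad.statistic:.4f}, 5% critical value = {ad.critical_values[2]}")
```

Note that `anderson` reports critical values rather than a p-value; you reject at the 5% level when the statistic exceeds the tabulated 5% entry.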

Dason
10-07-2010, 03:39 PM
Big N is not a problem. On the contrary, the assumptions underlying most tests get better satisfied as N gets bigger.

This is true. But don't let it get you down if you reject a normality test when N is large (depending on what you're trying to do). If all you're doing is testing whether natural data is approximated well by a normal distribution AND you have a very large sample size, you will most likely reject the null, because almost no real data is perfectly normally distributed.
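A quick simulation makes the point (the t distribution with 20 df here is an arbitrary stand-in for "almost normal" data, and D'Agostino's K^2 an arbitrary choice of test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Data that is *almost* normal: Student's t with 20 degrees of freedom
# (excess kurtosis of just 0.375). Small samples rarely reject it;
# huge samples almost always do.
results = {}
for n in (100, 2500, 100_000):
    rejections = sum(
        stats.normaltest(rng.standard_t(df=20, size=n)).pvalue < 0.05
        for _ in range(200)
    )
    results[n] = rejections
    print(f"n = {n:>6}: rejected {rejections}/200 times at alpha = 0.05")
```

The deviation from normality is identical in every run; only the sample size changes the verdict.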

terzi
10-17-2010, 02:25 PM
But I like the EDF tests precisely because they can be applied to test against any theoretical distribution, not just the normal distribution.

Big N is not a problem. On the contrary, the assumptions underlying most tests get better satisfied as N gets bigger.

Hi all,

I'd like to jump into this thread since I think I can add something interesting. I'll start with a quote from D'Agostino and Stephens (1986), taken from H. Kvam's nonparametric statistics book:

. . . for testing for normality, the Kolmogorov-Smirnov test is only a historical curiosity. It should never be used. It has poor power in comparison to [specialized tests such as Shapiro-Wilk, D'Agostino-Pearson, Bowman-Shenton and Anderson-Darling tests].

The specialized options should generally be preferred, since the KS test is more general. In fact, most tests, such as the Anderson-Darling test, were developed to improve on the Kolmogorov-Smirnov approach.

On the other hand, big n is indeed a problem, a huge one in some cases. As Dason stated, a huge sample size will almost certainly reject normality, since any minor deviation will influence the statistic. For these cases with big n, some special alternatives have been developed, such as the Shapiro-Francia test, a corrected version of the Shapiro-Wilk test that works better with big sample sizes.

Hope to help a bit. Greetings!

ichbin
10-17-2010, 07:26 PM
Hi terzi! I appreciate your chiming in, and I certainly don't disagree that it's better to use a more powerful test. But I do have quibbles with some of what you say here.

1. AD isn't actually a normal-specific test. It's just KS with a re-weighting to increase tail sensitivity. A side effect of that re-weighting is that the null distribution is no longer universal, but the asymptotic form of the null distribution is available for other distributions besides the normal. A better approach to the tail-insensitivity problem of KS, in my opinion, is the Kuiper test, which, like the KS test, has a universal null distribution that is known for small N as well as asymptotically large N. (I actually always use Kuiper instead of KS or AD whenever I can, but KS has the advantage of greater familiarity for many people.)

2. The only sense in which the failure of normality tests for large N is a problem is that it is telling people a true thing about their data that they do not want to hear. Of course, there is a difference between significance and strength, and one feels intuitively that, if a departure from normality is very small, tests that assume normality should still be very good. The trouble is, I am unaware of any quantitative theory expressing how a small departure from normality affects the reliability of, say, an ANOVA. I would be very interested if you could point me to any such analysis.

3. For scientists with good control over their experiments and detailed theories of them, the failure of a hypothesis is not dismissed lightly just because there is a lot of data. Analyzing spectral data from a single-ion trap, I can get no significant deviation from a Voigt profile with tens of millions of counts.

4. Shapiro-Francia isn't better than Shapiro-Wilk for large N in the sense that it purposely ignores small deviations from normality that Shapiro-Wilk would flag. (That would make it a worse test, not a better one!) It is better for large N only in the sense that calculating it does not require computing and inverting a very large covariance matrix.

5. I am not aware of any work showing that truly normal-specific tests like Shapiro-Wilk or D'Agostino's K^2 are actually more powerful than EDF tests like AD or Kuiper. (I'll grant you KS because of its tail-insensitivity problem.) Can you point me to any?
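To make point 1 concrete: scipy has no built-in Kuiper test, but the statistic itself takes only a few lines (a hand-rolled sketch, with the usual plug-in ECDF formulas):

```python
import numpy as np
from scipy import stats

def kuiper_statistic(x, cdf=stats.norm.cdf):
    """Kuiper's V = D+ + D-, the sum of the largest deviations of the
    empirical CDF above and below the hypothesized CDF. Unlike the KS
    statistic, V stays equally sensitive in both tails."""
    n = len(x)
    u = cdf(np.sort(x))
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)         # max height of ECDF above the model CDF
    d_minus = np.max(u - (i - 1) / n)  # max height of ECDF below the model CDF
    return d_plus + d_minus

rng = np.random.default_rng(0)
v = kuiper_statistic(rng.standard_normal(2500))
print(f"V = {v:.4f}  (sqrt(n)*V = {np.sqrt(2500) * v:.3f})")
```

For reference, the asymptotic 5% point of sqrt(n)*V is roughly 1.75, so well-fitting data should land comfortably below that.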

Thanks again for your remarks. I would be very interested in your responses!

Dason
10-17-2010, 07:52 PM
The trouble is, I am unaware of any quantitative theory expressing how a small departure from normality affects the reliability of, say, an ANOVA. I would be very interested if you could point me to any such analysis.

ANOVA models (and linear models in general) can have their point estimates derived without any distributional assumptions; it's only once we get to inference that we make the distributional assumption of normality. In this sense the point estimates are robust. As for the effect of nonnormality, I don't know of any quantitative theory about how large a departure can be before it seriously has an impact, but if you're interested it's easy enough to throw together a Monte Carlo simulation to investigate this. I'm sure it's been done before, but it's simple enough that it wouldn't take much time (and then you'd have the tools to investigate other types of departures if those interest you as well).
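A minimal Monte Carlo sketch along those lines (the group count, per-group n, and the exponential alternative are all arbitrary choices for illustration, not anything from this thread):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def type1_rate(sampler, groups=3, n=30, reps=2000, alpha=0.05):
    """Fraction of Monte Carlo runs in which one-way ANOVA rejects
    equal means even though every group has the same distribution."""
    hits = 0
    for _ in range(reps):
        if stats.f_oneway(*(sampler(n) for _ in range(groups))).pvalue < alpha:
            hits += 1
    return hits / reps

normal_rate = type1_rate(lambda size: rng.standard_normal(size))
skewed_rate = type1_rate(lambda size: rng.exponential(size=size))  # heavily skewed errors
print(f"Type I error with normal errors:      {normal_rate:.3f}")
print(f"Type I error with exponential errors: {skewed_rate:.3f}")
```

With balanced groups both rates typically land near the nominal 0.05, which is the classic robustness result; swapping in other samplers, unequal n, or unequal variances turns this into exactly the investigation Dason describes.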

terzi
10-17-2010, 10:04 PM

AD isn't actually a normal-specific test. It's just KS with a re-weighting to increase tail sensitivity.

I never meant a normal-specific test, just a specialized one. It is true that the Anderson-Darling test is used to verify whether a sample of data came from a population with a specific distribution (not necessarily normal). It is indeed a modification of the KS test that accounts for the distribution and gives more attention to the tails. The KS test is distribution-free in the sense that its critical values do not depend on the specific distribution being tested; the Anderson-Darling test makes use of the specific distribution in calculating its critical values.
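That difference is easy to see in scipy, whose anderson routine ships a separate critical-value table per family (the data here are simulated just to pull the tables out):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.standard_normal(500)

# AD critical values are tabulated per distribution family, so the
# same sample size yields different cutoffs under different nulls.
ad_norm = stats.anderson(x, dist='norm')
ad_expon = stats.anderson(np.abs(x), dist='expon')
print("AD 5% critical value, normal null:     ", ad_norm.critical_values[2])
print("AD 5% critical value, exponential null:", ad_expon.critical_values[2])
```

The KS null distribution, by contrast, depends only on n, which is exactly what "distribution-free" means here.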

The only sense in which the failure of normality tests for large N is a problem is that it is telling people a true thing about their data that they do not want to hear.

I think you are wrong here, or maybe I'm just confused. If you are claiming that the effects of large sample sizes on normality tests are not important, you may be incorrect. Analytical tests can be misleading with large samples: as the sample size grows, the test gets pickier about what counts as a departure from the hypothesized null distribution. In short, your data might look normally distributed for all practical purposes, but if it is not exactly normal the goodness-of-fit test will eventually find this out. For large sample sizes, a normality test's power becomes huge, so it will detect departures from normality that are really not relevant:

http://www.basic.northwestern.edu/statguidefiles/n-dist_ass_viol.html#Special problems with large sample sizes

The trouble is, I am unaware of any quantitative theory expressing how a small departure from normality affects the reliability, of, say, an ANOVA. I would be very intersted if you could point me to any such analysis.

This is a really common analysis, for both normality and multivariate normality assumptions. This is just an example for ANOVA:

http://dx.doi.org/10.1016/0360-8352(96)00127-1

For scientists with good control over their experiments and detailed theories of them, the failure of a hypothesis is not dismissed lightly just because there is a lot of data.

Absolutely true. In fact, as is well known, the greater the sample size, the greater the power a test will have. This problem is particularly acute for normality tests, which is one of the reasons why I usually recommend mostly graphical procedures that, albeit subjective, tend to give more information, at least for checking normality. Personally, I'd recommend using formal tests only for published studies (because you are usually ethically forced to do so:))
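Even the graphical route can be boiled down to one descriptive number: scipy's probplot returns the normal Q-Q coordinates along with a least-squares fit, and the correlation r of the points to the reference line is a handy informal check (a sketch only, with simulated data standing in for real z-scores; this is a description, not a formal test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
z = rng.standard_normal(2500)

# probplot pairs the ordered data with normal quantiles and fits a
# line; r near 1 means the Q-Q points hug that straight line.
(osm, osr), (slope, intercept, r) = stats.probplot(z, dist='norm')
print(f"Q-Q line: slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.4f}")
```

Passing `plot=plt` (with matplotlib) draws the familiar Q-Q picture from the same call.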

Shapiro-Francia [...] is better for large N only in the sense that calculating it does not require calculating and inverting a very large matrix

Yeah, the Shapiro-Francia test is easier to calculate and involves fewer computational errors, yet it is also more consistent and works better on aggregated (or tied) data. I think the advantages go far beyond a matrix:

http://www.jstor.org/pss/2335386

http://stata-press.com/journals/stbcontents/stb3.pdf

http://www.hicstatistics.org/2003StatsProceedings/Roumporn%20Sittimongkol.pdf

I am not aware of any work showing that truly normal-specific tests like Shapiro-Wilk or D’Agostino’s K^2 are actually more powerful than EDF tests like AD or Kuiper. Can you point me to any?

Here's one:

http://interstat.statjournals.net/YEAR/2002/articles/0201001.pdf