- Thread starter ooostats
- Tags assumptions distribution normality

I'm just thinking: if I want to decide whether to analyse my data with ANOVA or some non-parametric alternative, then I see whether I'm violating the assumptions. But if the residuals are calculated after running the test itself, then I have no way of knowing whether the test is appropriate before running it in the first place. I was taught to check whether my DV itself was normally distributed, and if so, go parametric; otherwise, non-parametric.

It might sound counterintuitive, but we typically don't check the normality, equal variances, independence, ... assumptions until after we fit the model.

Not so much counterintuitive, just completely the opposite of what I thought I was being taught! I study psychology, and all resources explicitly talk about checking assumptions prior to applying tests.

It makes me wonder whether people teaching/writing the textbooks don't fully understand it themselves, and/or there is a large degree of subjectivity.




Every time I have a student tell me "but my textbook says..." or "well, I was taught that..." my immediate reply is something like "Sure. But then again you're coming from the field that started the Replication Crisis so... ¯\_(ツ)_/¯"

BTW, shameless self-promotion that is actually tied to what you're asking: https://psychometroscar.com/2018/07/11/normality-residuals-or-dependent-variable/

Not at all opposite. Dawson wrote "we typically don't check ... assumptions until **after we fit the model**". He did not say: "we typically don't check assumptions until after we performed statistical tests of significance".

Or maybe students sometimes do not read carefully enough.

With kind regards

Karabiner



I guess I'm one of those students then!

It seems like textbooks and teaching material are the main sources of error.


So could you both please recommend a textbook/resource that covers these topics correctly? All I ever hear about is Andy Field's book, but I can't see him covering this in a definitive way. On one page we should check assumptions and our DV should be normal; on the next page it doesn't matter, because the sampling distribution is normal and we shouldn't bother checking whether our DV is normal.

Like sometimes I feel I wanna scream every time I hear the whole "just look for n>30 and you should be OK". OMG SO.MUCH.WRONG.WITH.THAT.

Honestly, I'm still searching for a good book in the social sciences that accurately covers the stuff we actually need. For better or worse, the most "approachable" book I've ever found is this one:

https://www.amazon.ca/Mathematical-Statistics-Data-Analysis-Sets/dp/0534399428

And even I wouldn't dare use that for a graduate methods course in the social sciences. I usually just recommend it to people I'm helping supervise if I see they've got good math chops. And then we work on it together.

I think I even have 1 or 2 simulation examples in R that I now copy-paste into my reviews to show why n>30 doesn't automatically mean you can get away with whatever you want. Because the papers that use it don't have Ns even in the 100s. It's more like "We got 35 people. But by the powers of the CLT, everything we did is now justified."

OMG did the Andy Field book say that? And here was me hoping at least *he* would get things right.

So should we even care about normality when analysing our data or not? And why is there so much emphasis on normal distributions and statistics that assume normality?

Thanks for the discussion!

He does state that normality refers to the residuals of the model "or the sampling distribution". However, we don't have access to the sampling distribution, so we use our data as representative of the sampling distribution. (I don't know how accurate/valid that is.)

Then shortly after, he talks about the CLT and how it means we can assume normality regardless of the shape of our sample data, and that with larger samples (I guess > 30?) we don't really need to worry about normality.

Now, what Andy is then forgetting (and I'm gonna take your word for it because I don't have the book here) is that the Central Limit Theorem is asymptotic. Sure, at infinity a lot of distributions do converge to the normal one. But I don't think you or I or most people have access to infinite sample sizes, right? So what becomes REALLY crucial here is the rate of convergence. Or, in other words: how large does a "LARGE SAMPLE" need to be?

Example here. Consider a Poisson random variable (a type of discrete distribution) with parameter \(\lambda=.01\). Folk wisdom tells us we only need to get to n>30 so we can assume normality of the sampling distribution of the mean, right? Well, let's be REALLY generous with ourselves and start with n=100. Small simulation in R:
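The R code itself isn't reproduced in the thread, but the same experiment can be sketched with nothing beyond the Python standard library (Poisson draws via Knuth's algorithm; λ and n as in the post):

```python
import math
import random

def rpois(lam, rng):
    # One Poisson(lam) draw via Knuth's algorithm (fine for small/moderate lam)
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def skewness(xs):
    # Sample skewness: m3 / m2^(3/2); a normal sample should give ~0
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)
lam, n, reps = 0.01, 100, 20000

# 20,000 replications of "draw n = 100 Poisson(0.01) values, take the mean"
means = [sum(rpois(lam, rng) for _ in range(n)) / n for _ in range(reps)]

print(skewness(means))                    # theory says 1/sqrt(n*lam) = 1
print(sum(m == 0 for m in means) / reps)  # theory says exp(-1) ~ 0.37
```

The theoretical skewness of the mean here is \(1/\sqrt{n\lambda} = 1\) (a normal distribution has skewness 0), and about 37% of the sample means land on exactly zero, so the histogram is nowhere near bell-shaped.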

Yeah... this is not looking very normal to me. And notice how we're already... what? More than 3 times over the recommended n>30? Let's try with a sample size of 1000:

Better... but still a lot of gaps in between. Ok, what about a sample size of 10,000:
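The thread's R code isn't shown, but a stdlib-Python sketch can run all three sample sizes at once and watch the skewness of the sampling distribution shrink at its theoretical rate of \(1/\sqrt{n\lambda}\):

```python
import math
import random

def rpois(lam, rng):
    # Knuth's Poisson sampler; OK as long as exp(-lam) doesn't underflow
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def skewness(xs):
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)
lam, reps = 0.01, 20000
results = {}
for n in (100, 1000, 10000):
    # Shortcut: the sum of n iid Poisson(lam) draws is itself Poisson(n*lam),
    # so the sampling distribution of the mean can be sampled in one draw
    means = [rpois(n * lam, rng) / n for _ in range(reps)]
    results[n] = skewness(means)
    print(n, results[n])  # theory: 1/sqrt(n*lam) = 1, 0.32, 0.10
```

So even at n = 1,000 the sampling distribution of the mean is still clearly skewed; only around n = 10,000 does it start to look respectably normal.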

Yeah, that looks more promising. So... yeah. You will, as \(n \to \infty\), get the normal distribution from the Central Limit Theorem, assuming a few other things hold true first, like the Lyapunov condition. So this is not a free lunch that you can go around assuming all willy-nilly. Yes, it is a powerful result, and yes, it holds in a lot of cases, but there are also other things that need to be true for it to work out. The question is: how large a sample, or how many trials, or how much data do you have to collect before you have access to it? Because in most social-sciency stuff, getting samples in the 1000s is simply not feasible for the most part.

So should we even care about normality when analysing our data or not?

Very little of what we do is routine. Aside from mind-numbingly boring and trivial cases, very few types of analyses can be done in a "rote" fashion. Sometimes you need to transform the data. Sometimes you need to choose a different, non-standard model. Sometimes you need to come up with your *own* version of a regression-type model. Nothing that has any real scientific interest can be default-coded in SPSS. Yes, you see a lot of people doing it and getting away with it because, honest to god, there aren't enough people in our fields who are properly trained enough to catch the mistakes other people make.

And why is there so much emphasis on normal distributions and statistics that assume normality?

Now, the overall lack of formal training. There simply is no way around this, so I'm just gonna say it: anything that doesn't assume a normal distribution is hard. Like, REAL hard. I've been doing research on the sampling distribution of the simple, bivariate correlation under a very specific (and restrictive) type of non-normality. If you assume normality, the sampling distribution of the correlation is a simple, friendly t-distribution, like you'd find in linear regression. You know, a simple regression with only one predictor, so that the slope is the correlation coefficient if the variables are standardized. So far so good. This is what it looks like under a very restrictive and relatively "simple" type of non-normality:

Could you imagine if we taught something like *that* to intro psych students?
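For contrast, the friendly normal-theory case is easy to verify by simulation. Under bivariate normality with \(\rho = 0\), the statistic \(r\sqrt{n-2}/\sqrt{1-r^2}\) follows a t-distribution with n−2 degrees of freedom, so rejecting whenever it exceeds the tabled critical value (2.101 for 18 df) should happen about 5% of the time. A stdlib-Python sketch:

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
n, reps = 20, 20000
crit = 2.101  # tabled two-sided 5% critical value of t with n - 2 = 18 df

rejections = 0
for _ in range(reps):
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # independent of x, so rho = 0
    r = pearson_r(x, y)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    rejections += abs(t) > crit

print(rejections / reps)  # should sit near the nominal 0.05
```

Under non-normality that tidy correspondence is exactly what breaks down, which is the point of the figure above.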


Maybe he means the sampling distribution of the mean?

Interesting post. I would like to learn as much as I can about all of this now; it's just that I don't have a maths background, so this is quite heavy. And that's the thing: if it were taught in psych classes it would be quite pointless, because most wouldn't have a clue. But this is clearly part of the problem - people need better training, and those training them also need better training, I suppose. I remember I had an ex-physicist teaching my stats, but whenever things got complicated the attitude was "I'm not going into the details of why, just trust me...". I was grateful at the time, but now I'm questioning everything.

What do you think this means for other non-psychology applications of parametric statistics? For example I'm aware many applications of "machine learning" are simply based on multiple linear regression. There is never any talk of specific parametric assumptions before applying such methods in any resource I've come across (other than that the outcome data should be continuous). I can't see why these would be less of a consideration just because they aren't necessarily interested in p < .05... they are still ultimately trying to fit a model that might not be appropriate.

cannot achieve normality of the sampling distribution unless you have very large n's (in which case standard errors become so small that one wouldn't care anyway). But if the distribution looks roughly normal (or if Poisson, Cauchy, Gamma distributions etc. can be ruled out, based on substantive considerations), then 30 (or 100?) should be sufficient"?

Regards

Karabiner

Yeah this is what he means.


Dr. Leona Aiken from Arizona State University has done a lot of interesting research on this in psychology, showing some pretty convincing (and damning) evidence that, at least in the United States, people with PhDs in Quantitative or Mathematical Psychology are undervalued, lose out on academic positions for which they are perfectly qualified if the other candidates have a substantive area of expertise and, if they do get hired, are expected to perform FAR more service for the department than others. So kudos to you for questioning everything! Borrowing something from Twitter, I honestly don't think I trust any psych/social science paper from before 2011, when the Replication Crisis started making its rumbles.

Well… a lot of Machine Learning stuff is more interested in prediction than in inference, and parametric assumptions matter most for inference. HOWEVER (and this is my real pet peeve): it doesn't matter which method from the Machine Learning/Data Mining/Data Science/choose-a-fancy-name toolbox you want to use, there are ALWAYS assumptions in everything we do. And these methods are pernicious in the sense that they are very easy to implement but very hard to understand from a theoretical/mathematical perspective. And sometimes the method is developed before the theory gets solid around it, which means we don't always understand why things work the way they work and, more importantly, in which cases they *shouldn't* work. Yet you have a constantly increasing group of people jumping into this (because, if anything, it promises a quick, good-paying job) without the necessary theoretical skills to be critical of it. And that's gonna end up spelling disaster at some point. Dr. Cathy O'Neil (author of "Weapons of Math Destruction") goes into this in quite a detailed fashion and concludes her book by saying she expects a "2008 Financial Crisis"-type meltdown in the future (she was, after all, one of the Wall Street "quants" who contributed to that Crisis) from applying these types of algorithms without any regard for their limitations or critical appreciation of what they can and cannot do. Of much more epic proportions, of course, because this won't be limited to the financial markets, given that most of our everyday lives are influenced, in one way or another, by the decisions made by these algorithms.


Like… let's unpack a few things. When you say "you cannot achieve normality of the sampling distribution," the immediate question is: the sampling distribution of what? Are we exclusively talking about the sampling distribution of the mean? Because, for example, the sampling distribution of the sample variance (suitably scaled) is chi-square. And if you rule out enough distributions (because a result similar to what I showed with the Poisson can be shown with the Binomial, for instance, or the Hypergeometric), then you are probably going to end up where the original rule of thumb comes from: the t-distribution. If my memory serves me right, the whole n>30 thing comes from the old times, back in the day when we needed tables of numbers to obtain p-values. I think the story goes that by the time you get to n>30, the t-distribution is within some minimal error of the normal, so you can use the z-table as opposed to the t-table. The irony being that the "rule of thumb" was really more a matter of convenience, where textbooks wanted to save on paper by re-using the z-tables for the t-test. But of course, this took on a mind of its own and now it's prescribed almost as a theorem.
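That closeness of t(30) to the normal is easy to check empirically: build t-variates from normals (Z divided by the square root of an independent chi-square over its df) and compare the simulated 97.5th percentile with the tabled values (2.042 for 30 df versus 1.960 for the normal). A Python sketch:

```python
import math
import random

rng = random.Random(3)
df, reps = 30, 100000

def t_draw():
    # t(df) variate built from normals: Z / sqrt(chi2_df / df)
    z = rng.gauss(0, 1)
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)

draws = sorted(t_draw() for _ in range(reps))
q975 = draws[int(0.975 * reps)]
print(q975)  # tabled t_{.975, 30} is 2.042; the normal value is 1.960
```

An error of 0.08 on the critical value was tolerable for hand calculation with printed tables, which is very different from a guarantee about the sampling distribution of your data.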

Now, when you say "standard errors become so small that one wouldn't care anyway," let's place ourselves on the opposite side of things. If you look at Yuan, Bentler & Zhang (2005), Eqn (6), you can see that the kurtosis of the distribution biases the MLE standard error of the variance (and the covariance/correlation). If the excess kurtosis is positive, the standard error is biased downwards and, if it is negative, it is biased upwards. And this is an asymptotic result: only as \(n \to \infty\) does its influence go away. Now you can find yourself in the awkward situation where minute effect sizes are statistically significant not because the effect is there, but because of the kurtosis of the distribution. Or, on the other hand, you have to deal with the curious interplay between a negative kurtosis and the sample size. So… I don't know about the "one wouldn't care anyway" part. *UNLESS* we are really *only* talking about the sample mean and nothing beyond that.
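The kurtosis effect can be demonstrated without the full Yuan-Bentler-Zhang machinery. As an illustration (the Laplace distribution here is my choice, not theirs): it has excess kurtosis 3, the asymptotic SD of the sample variance is \(\sqrt{(\mu_4-\sigma^4)/n}\), and the normal-theory formula \(\sqrt{2\sigma^4/n}\) understates it by a factor of about 1.6:

```python
import math
import random

rng = random.Random(11)
n, reps = 200, 5000

def laplace(rng):
    # Standard Laplace: exponential magnitude with a random sign
    # (variance 2, excess kurtosis 3 -- deliberately heavy-tailed)
    e = rng.expovariate(1.0)
    return e if rng.random() < 0.5 else -e

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Empirical spread of the sample variance across replications...
variances = [sample_var([laplace(rng) for _ in range(n)]) for _ in range(reps)]
mu = sum(variances) / reps
emp_sd = math.sqrt(sum((v - mu) ** 2 for v in variances) / reps)

# ...versus what the normal-theory formula claims it should be
sigma2 = 2.0                                     # true variance of this Laplace
normal_theory_se = math.sqrt(2 * sigma2**2 / n)  # assumes zero excess kurtosis
print(emp_sd, normal_theory_se)
```

So any inference that plugs in the normal-theory standard error of the variance will be overconfident here, even at n = 200.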

So, to be honest, what I do these days is to advocate for training in simulation methods OR pairing up with people who can do simulations for you. Especially when it comes to power analysis and stuff like that. That would be my rule of thumb: check by simulation first.
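As a flavour of what "check by simulation first" can look like, here is a toy power simulation for a two-group mean difference (the sample size, effect size, and the use of the normal cutoff 1.96 instead of the exact t critical value are all my illustrative choices):

```python
import math
import random

rng = random.Random(2024)
n, d, reps = 50, 0.5, 10000  # per-group n, effect size in SD units
crit = 1.96                  # normal cutoff; close enough to t at ~98 df

hits = 0
for _ in range(reps):
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(d, 1.0) for _ in range(n)]
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / (n - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n - 1)
    z = (mb - ma) / math.sqrt(va / n + vb / n)  # Welch-type statistic
    hits += abs(z) > crit

print(hits / reps)  # estimated power for d = 0.5 with n = 50 per group
```

The estimate should land near the analytic approximation \(\Phi(d\sqrt{n/2} - 1.96) \approx 0.70\), and the same scaffold extends to any design you can generate data from, including the non-normal ones discussed above.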
