He does state that normality refers to the residuals of the model
Yes, distributional assumptions are on the residuals of our models.
"or the sampling distribution". However, we don't have access to the sampling distribution so we use our data as representative of the sampling distribution. (I don't know how accurate/valid that is).
I'm... not sure what he means here. I mean, the sampling distribution of the standard deviation (coming from a normal parent population) is not normal: the (scaled) sample variance follows a chi-squared distribution, so the standard deviation follows a scaled chi distribution, which is right-skewed. Maybe he means the sampling distribution of the mean?
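A quick way to see it for yourself (the n = 5 and the 10,000 replications below are arbitrary choices on my part):

```r
set.seed(123)                            # arbitrary seed, just for reproducibility

# sampling distribution of the sample SD for n = 5 draws from a standard normal
sds <- replicate(10000, sd(rnorm(5)))

hist(sds, breaks = 50,
     main = "Sample SDs, n = 5, normal parent",
     xlab = "Sample standard deviation") # clearly right-skewed, not normal
```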
Then, shortly after, he talks about the CLT and how it means we can assume normality regardless of the shape of our sample data, and that with larger samples (I guess > 30?) we don't really need to worry about normality.
So... a lot to unpack here. First and foremost, the claim, as stated, is simply wrong. The sampling distribution of the sample mean taken from a Cauchy distribution is NOT normal: it's Cauchy-distributed, no matter how large the sample. The sampling distribution of the maximum value of a sample coming from a normal parent population is, in itself, NOT normal either: it follows (asymptotically) a Gumbel distribution.
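Don't take my word for the Cauchy part; here's a quick sketch in R (the sample size of 1,000 and the 10,000 replications are arbitrary choices):

```r
set.seed(123)

# mean of n = 1000 standard Cauchy draws, repeated 10,000 times
cauchy_means <- replicate(10000, mean(rcauchy(1000)))

# single standard Cauchy draws, for comparison
single_draws <- rcauchy(10000)

# if the CLT applied, the means would be tightly concentrated around the centre;
# instead, their quantiles look just like those of a single draw
quantile(cauchy_means, c(0.01, 0.25, 0.5, 0.75, 0.99))
quantile(single_draws, c(0.01, 0.25, 0.5, 0.75, 0.99))
```

Averaging buys you nothing here, because the Cauchy has no finite mean (or variance) for the Central Limit Theorem to latch onto.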
Now, what Andy is then forgetting (and I'm gonna take your word for it because I don't have the book here) is that the Central Limit Theorem is asymptotic. Sure, in the limit a lot of distributions do converge to the normal one. But neither you nor I nor most people have access to infinite sample sizes, right? So what becomes REALLY crucial here is the rate of convergence. Or, in other words: how large does a "large sample" need to be?
Example here. Consider a Poisson random variable (a type of discrete distribution) with parameter \(\lambda=.01\). Folk wisdom tells us we only need to get to n>30 so we can assume normality of the sampling distribution of the mean, right? Well, let's be REALLY generous with ourselves and start with n=100. Small simulation in R:
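Something along these lines (a minimal sketch; the seed and the 10,000 replications are arbitrary choices):

```r
set.seed(123)   # arbitrary seed, for reproducibility

# sampling distribution of the mean of a Poisson(lambda = 0.01) sample of size n,
# approximated with 'reps' simulated samples
sim_means <- function(n, lambda = 0.01, reps = 10000) {
  replicate(reps, mean(rpois(n, lambda)))
}

hist(sim_means(100), breaks = 50,
     main = "Sampling distribution of the mean, n = 100",
     xlab = "Sample mean")
```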
Yeah... this is not looking very normal to me. And notice how we're already... what? More than 3 times over the recommended n>30? Let's try with a sample size of 1000:
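(Same sketch as above, reusing sim_means with a bigger sample size.)

```r
hist(sim_means(1000), breaks = 50,
     main = "Sampling distribution of the mean, n = 1000",
     xlab = "Sample mean")
```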
Better... but still a lot of gaps in between. Ok, what about a sample size of 10,000:
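(Once more, reusing sim_means with n = 10,000.)

```r
hist(sim_means(10000), breaks = 50,
     main = "Sampling distribution of the mean, n = 10000",
     xlab = "Sample mean")
```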
Yeah, that looks more promising. So... yeah. You will, as \(n \to \infty\), get the normal distribution from the Central Limit Theorem, assuming a few other things hold true first, like the Lyapunov condition. So this is not a free lunch that you can go around assuming all willy-nilly. Yes, it is a powerful result, and yes, it holds in a lot of cases, but there are also other things that need to be true for it to work out. The question is: how large a sample, how many trials, or how much data do you have to collect before you have access to it? Because in most social-science-y settings, getting samples in the 1000s is simply not feasible for the most part.
So should we even care about normality when analysing our data or not?
Over time you will learn that the correct answer to most of these questions is the very underwhelming, hair-pulling-inducing "it depends". Does it matter much if you're calculating a simple mean-difference t-test? Probably not. Does it matter if you're fitting some complicated Structural Equation Model? Oh, quite a bit! To the point that people have made whole careers just out of corrections for non-normality.
Very little of what we do is routine. Aside from mind-numbingly boring and trivial cases, very few types of analyses can be done in a "rote" fashion. Sometimes you need to transform the data. Sometimes you need to switch to a different, non-standard model. Sometimes you need to come up with your *own* version of a regression-type model. Nothing that has any real scientific interest can be default-coded in SPSS. Yes, you see a lot of people doing it and getting away with it because, honest to god, there aren't enough people in our fields who are properly trained to catch the mistakes that other people make.
And why is there so much emphasis on normal distributions and statistics that assume normality?
Honestly, I think it's mostly a combination of convenience and a lack of formal training in more advanced methods. Keep in mind that most of the statistical methods you know of and will be using were developed... what? Maybe around 100-150 years ago? People did not have access to the computing power that we have today, so they needed to make simplifying assumptions to keep these methods usable. And the normal distribution is a convenient assumption because it shows up a lot in nature. Then psychology, sociology and all the other -ologies from social-science-land showed up trying to gain legitimacy as scientific endeavours circa the 19th century; so they decided that whatever physics was doing (physics being considered the "standard" of science in pre-WWI Europe), they needed to do as well. That's the convenience part.
Now for the overall lack of formal training. There simply is no way around this, so I'm just gonna say it: anything that doesn't assume a normal distribution is hard. Like, REAL hard. I've been doing research on the sampling distribution of the simple, bivariate correlation under a very specific (and restrictive) type of non-normality. If you assume normality, the test statistic for the correlation follows a simple, friendly t-distribution, just like you'd find in linear regression. (You know, a simple regression with only one predictor, so that the slope is the correlation coefficient if the variables are standardized.) So far so good. This is what it looks like under a very restrictive and relatively "simple" type of non-normality:
Could you imagine if we taught something like *that* to intro psych students?
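For reference, the "friendly" normal-theory result I'm talking about is just the familiar test statistic you'd see under bivariate normality and \(H_0: \rho = 0\):

\[
t = r\sqrt{\frac{n-2}{1-r^2}} \sim t_{n-2},
\]

which is the same t-test you'd run on the slope of a one-predictor regression with standardized variables.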
