This would mean, simply neglect the normality assumptions for the testing of Pearson's r, if sample size is large?

Yeah, I've never quite got my head around that one either; it doesn't make sense. Anybody got any ideas?

well... it seems pretty reasonable to me that the behaviour of the test statistic for correlation and for simple bivariate regression SHOULD be the same... but the important thing here is that we don't need to conjecture anything about it. we can simulate it.
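as a quick sanity check on the "should be the same" part, here is a minimal sketch (not part of the simulation itself) that fits both tests to one made-up dataset and compares the p-values. the seed, the sample size of 30 and the use of `rnorm()` are just assumptions for illustration.

```r
set.seed(1)   # arbitrary seed, only so the example is reproducible
x <- rnorm(30)
y <- rnorm(30)

# p-value for the slope in the simple regression of y on x
p_reg <- summary(lm(y ~ x))$coefficients[2, 4]

# p-value for the test of Pearson's r
p_cor <- cor.test(x, y)$p.value

# both tests use the same t statistic on N - 2 degrees of freedom,
# so the p-values should agree up to floating point error
same_p <- isTRUE(all.equal(unname(p_reg), unname(p_cor)))
same_p
```

if the two tests agree on every dataset, their rejection rates have to agree too, whatever the distribution of the data.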

first, i'd like to direct people's attention HERE for why i am just simulating data with a non-zero kurtosis. kurtosis is intrinsically related to the variability of *anything* variance related... so if you're after the standard errors of covariances, variances or correlations, kurtosis is going to show up one way or another.

the first half of the article is a very good and useful explanation of how kurtosis distorts the standard errors of variances/covariances/correlations. the second half deals with more Structural Equation Modelling (SEM) stuff so i wouldn't recommend it as much unless you're interested.
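to make the kurtosis point a bit more concrete, here is a small sketch of my own (not taken from the article) for the simplest case, the sample variance. for i.i.d. data the sampling variance of s^2 is sigma^4 * (excess kurtosis / n + 2 / (n - 1)), so heavier tails mean a noisier variance estimate. the Laplace distribution, the seed and the number of repetitions are assumptions picked for illustration.

```r
set.seed(123)   # arbitrary seed
reps <- 20000
n <- 30

# normal data with variance 2 (excess kurtosis 0)
s2_norm <- replicate(reps, var(rnorm(n, sd = sqrt(2))))

# Laplace data built as the difference of two Exp(1) draws:
# variance 2, excess kurtosis 3
s2_lap <- replicate(reps, var(rexp(n) - rexp(n)))

# theoretical sampling variance of s^2 for i.i.d. data
var_s2 <- function(excess, sigma2, n) sigma2^2 * (excess / n + 2 / (n - 1))

# empirical vs theoretical variability of the sample variance
c(empirical = var(s2_norm), theory = var_s2(0, 2, n))
c(empirical = var(s2_lap),  theory = var_s2(3, 2, n))
```

both populations have the same variance, but the sample variance should bounce around a lot more in the heavy-tailed case.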

anyhoo... the simulation.

first, i ran a "baseline" simulation where the data was distributed bivariate normal and got what you would expect: the empirical rejection rates were very, very close to .05 (and equal in both cases) because it's essentially the same test.
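i didn't post the baseline code, so here is a rough sketch of what such a run could look like. `rejection_rates()` and `gen_normal()` are hypothetical helper names made up for this example, and i'm assuming that with a population correlation of 0, bivariate normal data is just two independent normal draws. the number of repetitions is cut down so it runs quickly.

```r
# hypothetical helper: empirical rejection rates for both tests
# gen is any function that takes N and returns a data frame with columns x and y
rejection_rates <- function(N, reps, gen, alpha = .05) {
  pval_reg <- double(reps)
  pval_cor <- double(reps)
  for (i in seq_len(reps)) {
    datum <- gen(N)
    pval_reg[i] <- summary(lm(y ~ x, data = datum))$coefficients[2, 4]
    pval_cor[i] <- cor.test(datum$x, datum$y)$p.value
  }
  c(regression = mean(pval_reg < alpha), correlation = mean(pval_cor < alpha))
}

# baseline generator: zero correlation + bivariate normal = two independent normals
gen_normal <- function(N) data.frame(x = rnorm(N), y = rnorm(N))

set.seed(2024)   # arbitrary seed
baseline <- rejection_rates(N = 30, reps = 2000, gen = gen_normal)
baseline
```

swapping `gen_normal` for a generator built around the `simulateData()` call used further down should give the non-normal runs, and keeping N as a function argument makes it harder to forget to change it between runs.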

let's see what happens when you have a smallish sample size of 30:

Code:

```
library(lavaan) # need this to generate the data
# correlation is 0 in the population so any rejection rate over .05 is an inflated Type I error rate
mod1 <- 'x ~~ 0.0*y
x ~~ 1*x
y ~~ 1*y'
reps <- 10000 #10,000 repetitions
N <- 30 #sample size of 30
pval_reg <- double(reps)
pval_cor <- double(reps)
for (i in 1:reps){
datum <- simulateData(mod1, sample.nobs=N, skewness=c(0,0), kurtosis=c(25,25)) #population skewness is 0 and population kurtosis is 25 for each variable
fitt <- summary(lm(y~x, data=datum))
pval_reg[i]<-fitt$coefficients[2,4]
pval_cor[i]<-cor.test(datum$x,datum$y)$p.value
}
sum(pval_reg<.05)/reps
[1] 0.062
sum(pval_cor<.05)/reps
[1] 0.062
```

so an inflated type I error rate as an effect of non-zero kurtosis for both.

let's do it again but with a sample size of 1000

Code:

```
library(lavaan) # need this to generate the data
# correlation is 0 in the population so any rejection rate over .05 is an inflated Type I error rate
mod1 <- 'x ~~ 0.0*y
x ~~ 1*x
y ~~ 1*y'
reps <- 10000 #10,000 repetitions
N <- 1000 #sample size of 1000
pval_reg <- double(reps)
pval_cor <- double(reps)
for (i in 1:reps){
datum <- simulateData(mod1, sample.nobs=N, skewness=c(0,0), kurtosis=c(25,25)) #population skewness is 0 and population kurtosis is 25 for each variable
fitt <- summary(lm(y~x, data=datum))
pval_reg[i]<-fitt$coefficients[2,4]
pval_cor[i]<-cor.test(datum$x,datum$y)$p.value
}
sum(pval_reg<.05)/reps
[1] 0.048
sum(pval_cor<.05)/reps
[1] 0.048
```

which is pretty close to .05. so yeah... we can validate the intuition Karabiner had. at least in this simulation, once the sample size is large (N = 1000 here), the normality assumption stops mattering much for these two tests. my guess would be, of course, that if the method that you're using is more complex (like SEM instead of a simple bivariate regression/correlation) then the distributional assumptions become important again.