Multivariate Normality

I have a data set I am going to use SEM on. Each of the variables seems to have acceptable skewness and kurtosis (using cutoffs of 3 and 10 on skewness index and kurtosis index respectively).

However, using Mardia's test for multivariate normality I am getting a Mardia's skewness of 15.675, with a test statistic of 1080, p=0, and a Mardia's kurtosis of 210, with a test statistic of 7.77, p<.0001. N=410.
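For reference, here is a minimal sketch of how Mardia's coefficients and their large-sample test statistics are usually defined. The function name `mardia` and the returned keys are made up for illustration. It assumes complete data in an n x p array and uses the maximum likelihood (divide by n) covariance. SEM packages may apply small-sample corrections or report a normalized estimate instead, so the numbers will not necessarily match a given program's output.

```python
import numpy as np
from scipy import stats


def mardia(X):
    """Mardia's multivariate skewness and kurtosis (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    # ML covariance estimate (divides by n), as in Mardia's definition.
    S = centered.T @ centered / n
    # n x n matrix of Mahalanobis cross-products between all pairs of cases.
    D = centered @ np.linalg.inv(S) @ centered.T

    skewness = (D ** 3).sum() / n ** 2        # b_{1,p}
    kurtosis = (np.diag(D) ** 2).sum() / n    # b_{2,p}

    # Skewness: n * b1 / 6 is asymptotically chi-square.
    skew_stat = n * skewness / 6
    skew_df = p * (p + 1) * (p + 2) // 6
    skew_p = stats.chi2.sf(skew_stat, skew_df)

    # Kurtosis: centered at p(p + 2) and scaled, asymptotically standard normal.
    kurt_z = (kurtosis - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    kurt_p = 2 * stats.norm.sf(abs(kurt_z))

    return {
        "skewness": skewness, "skew_stat": skew_stat,
        "skew_df": skew_df, "skew_p": skew_p,
        "kurtosis": kurtosis, "kurt_z": kurt_z, "kurt_p": kurt_p,
    }


if __name__ == "__main__":
    # Simulated normal data, only to show the call; not the poster's data.
    rng = np.random.default_rng(0)
    print(mardia(rng.standard_normal((410, 5))))
```

Under multivariate normality the kurtosis coefficient is expected to be close to p(p + 2), so the raw value only makes sense relative to the number of variables.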

I know that the p-values are sensitive to sample size, so maybe the p-value isn't the best thing to go by (although these p-values seem very small) - is there a cutoff for the kurtosis and skewness coefficients themselves that I should use?

Given that we are attempting to publish this SEM, how should I handle this? Should I try to transform some of my variables? Use an estimation method other than maximum likelihood? Or just ignore the apparent non-normality? Most of the SEM papers I've seen don't really address normality - but that doesn't mean it's the right thing to do, I suppose...


TS Contributor
Hi xralphyx,

Certainly, hypothesis tests are sensitive to sample size, though your p-values are very low. You can either try another multivariate normality test or check the assumption graphically. I would recommend the latter, since that way you also get some idea of what the problem may be. To check multivariate normality with a graph, plot the ordered squared Mahalanobis distances against the quantiles of a chi-square distribution with degrees of freedom equal to the number of variables. If the data are multivariate normal, the points should fall roughly on a straight line; points that drift away in the upper tail indicate heavy tails or outlying cases. There's some info about that graph in this link. With that, you can understand the distribution of your data more deeply.
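The plot described above can be sketched roughly as follows. This is an illustration only: the function name `chi2_qq_points` is made up, the data are assumed to be a complete n x p array, and the plotting positions (i - 0.5)/n are one common convention among several.

```python
import numpy as np
from scipy import stats


def chi2_qq_points(X):
    """Return (theoretical chi-square quantiles, ordered squared distances)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))  # sample covariance (n - 1)
    # Squared Mahalanobis distance of each case from the centroid.
    d2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)
    # Plotting positions and matching chi-square quantiles with df = p.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = stats.chi2.ppf(probs, df=p)
    return theoretical, np.sort(d2)


if __name__ == "__main__":
    # Simulated data, only to show the call.
    rng = np.random.default_rng(0)
    theo, obs = chi2_qq_points(rng.standard_normal((410, 5)))
    print(np.column_stack([theo, obs])[-5:])  # largest five pairs
```

Plotting the second array against the first with any plotting library gives the graph. Cases with the largest distances sit in the upper right corner, which makes it easy to see whether a few observations or the whole tail are responsible for the departure from the line.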

Now, you are quite right that multivariate normality is an important assumption in SEM and should be addressed. When it is violated, there are other models available, like PLS models, which don't require multivariate normality, as far as I recall. There are also corrections that can be applied to the estimators to improve the results. And of course, you can transform your data (though that may affect interpretation). The way to go will depend on what you are measuring and the type of distribution you have.

Hope this helps!


New Member

It's incredibly important to understand what your data look like (checking for skew and kurtosis, both uni- and multivariate), but at the end of the day I think it's easiest to just use test statistics that account for non-normality. A lot of programs will provide these just by clicking the appropriate options.

I do SEM in EQS and often use robust ML (maximum likelihood; see Satorra & Bentler, 1990) with N~400-500 and Mardia coefficients anywhere from 3 to 200. I'm a grad student at UCLA, and Peter Bentler himself (a faculty member at UCLA) has looked over manuscripts of mine in which I report these stats and this approach, and he has approved of it. So, for whatever that's worth. For a more formal source to cite, Ullman and Bentler (2013) report data analyzed (albeit as a descriptive example) with robust ML when Mardia's coefficient = 238.65. Citations below.

You could play around with transformations, but as the other post mentions, they may not fix the problem anyway. And in terms of publications, unfortunately I think there's still some stigma around transformations (as though you've done something sneaky to make your data look better, which is completely unfounded). So, again, I think it's easiest just to mention that there was considerable variance in each of your variables but the data were not multivariate normally distributed, so you used robust maximum likelihood estimation (cite Satorra & Bentler, 1990).

Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235-249.

Ullman, J. B., & Bentler, P. M. (2013). Structural equation modeling. In J. A. Schinka & W. F. Velicer (Eds.), Handbook of Psychology, Vol. 2: Research Methods in Psychology (pp. 661-690). Hoboken, NJ: Wiley.