Non-normal data, non-parametric tests for normality, and determination of statistical parameters

#1
Hi,

I have a database with more than 50000 observations. I have applied non-parametric tests to check the normality of the data, but in every case p < 0.05, so the null hypothesis of normality is rejected (in many cases the histograms also look bimodal). However, to build a summary table, I don't know whether it would be better to report the mean and standard deviation or the median and median absolute deviation, since, following the Central Limit Theorem (CLT), when the sample is large enough it can be approximated by a normal distribution.
[Attached screenshot (2021-04-25): histogram of the annual temperature values]
Furthermore, to check normality with a hypothesis test: since the Shapiro-Wilk test cannot be used with this many observations, would it be better to use the Kolmogorov-Smirnov test or the Anderson-Darling test?

Thanks,

David
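Both candidate tests compare the empirical CDF of the data with a fitted normal CDF; they differ in how they measure the gap. The sketch below is a minimal illustration using only Python's standard library, assuming a hypothetical 50/50 mixture of "winter" days (mean 5 °C, SD 3) and "summer" days (mean 22 °C, SD 3) as a stand-in for the real data. It computes the statistics only, not p-values: with the mean and SD estimated from the same data, the standard Kolmogorov-Smirnov p-values are too conservative (the Lilliefors correction exists for this case), while `scipy.stats.anderson(x, dist="norm")` reports critical values adjusted for estimated parameters.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def fitted_normal(xs):
    """Normal distribution with mean and SD estimated from the data itself."""
    return NormalDist(fmean(xs), stdev(xs))


def ks_statistic(xs, dist):
    """Kolmogorov-Smirnov D: largest vertical gap between the ECDF and dist.cdf."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ad_statistic(xs, dist):
    """Anderson-Darling A^2: an ECDF distance that gives extra weight to the tails."""
    xs = sorted(xs)
    n = len(xs)
    eps = 1e-12  # keeps log() finite if a CDF value rounds to exactly 0 or 1
    cdf = [min(max(dist.cdf(x), eps), 1 - eps) for x in xs]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
                for i in range(n))
    return -n - total / n


random.seed(0)
n = 50_000
# Hypothetical stand-in for the temperatures: half "winter" days near 5 degrees,
# half "summer" days near 22 degrees.
bimodal = [random.gauss(5, 3) if random.random() < 0.5 else random.gauss(22, 3)
           for _ in range(n)]
normal = [random.gauss(13.5, 9) for _ in range(n)]

for name, xs in (("bimodal", bimodal), ("normal", normal)):
    dist = fitted_normal(xs)
    print(f"{name:8s} D = {ks_statistic(xs, dist):.4f}   A2 = {ad_statistic(xs, dist):.1f}")
```

With 50000 observations both statistics are far beyond any usual critical value for the bimodal sample. At this sample size either test will also flag departures from normality too small to matter in practice, so the choice between them does not settle which summaries to report.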
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
You need to understand the data-generating process. There seem to be two underlying distributions, as with the weights of male and female lions: if you don't stratify, you end up with a bimodal distribution. Figure out your underlying distributions and the issue can be resolved.
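A small simulation of the lion example, assuming hypothetical weights (females around 125 kg, males around 190 kg; the numbers are illustrative, not real data). The pooled sample is bimodal, its SD is inflated by the gap between the groups, and few animals weigh anything close to its mean, whereas the per-group summaries describe their groups well.

```python
import random
from statistics import fmean, stdev

random.seed(42)
# Hypothetical weights in kg; the group means and SDs are illustrative only.
groups = {
    "female": [random.gauss(125, 12) for _ in range(5_000)],
    "male": [random.gauss(190, 15) for _ in range(5_000)],
}
pooled = groups["female"] + groups["male"]


def summarize(xs):
    """Return (mean, SD) rounded to one decimal."""
    return round(fmean(xs), 1), round(stdev(xs), 1)


print("pooled ", summarize(pooled))
for name, xs in groups.items():
    print(f"{name:7s}", summarize(xs))

# Share of animals that actually weigh within 10 kg of the pooled mean.
m = fmean(pooled)
near = sum(abs(x - m) < 10 for x in pooled) / len(pooled)
print(f"within 10 kg of the pooled mean: {near:.1%}")
```

Stratifying (here by sex; for the temperature data, by season) recovers two unimodal groups whose means and SDs actually describe them.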
 
#3
You're right. I have decomposed this histogram of temperatures by season (the figure I posted shows the distribution of the annual temperature values). The problem is that I have to report either the mean and SD or the median and median absolute deviation for this histogram, and I don't know which is the better option, since the CLT says that I can assume a normal distribution...
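Both pairs of summaries can be computed for any data set, whatever its distribution. The minimal sketch below (standard library only, on a deliberately two-clustered toy data set that is purely hypothetical) shows what they report for bimodal data: the mean and the median both land at 12, a value no observation is near. The MAD is rescaled by 1/Φ⁻¹(0.75) ≈ 1.4826 so that it estimates the SD under normality; `scipy.stats.median_abs_deviation(x, scale="normal")` applies the same rescaling.

```python
from statistics import NormalDist, fmean, median, stdev

# Hypothetical two-cluster data set: four "cold" and four "warm" values.
data = [2, 3, 3, 4, 20, 21, 21, 22]

mean = fmean(data)
sd = stdev(data)  # sample SD (n - 1 denominator)
med = median(data)
mad = median([abs(x - med) for x in data])  # median absolute deviation

# Dividing the MAD by Phi^-1(0.75) (multiplying by about 1.4826) makes it
# comparable to the SD when the data really are normal.
mad_scaled = mad / NormalDist().inv_cdf(0.75)

print(f"mean = {mean}, SD = {sd:.2f}")  # mean = 12.0, SD = 9.65
print(f"median = {med}, MAD = {mad}, scaled MAD = {mad_scaled:.2f}")
# median = 12.0, MAD = 9.0, scaled MAD = 13.34
```

Neither pair is wrong to compute, but neither conveys that the data have two peaks.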
 

Karabiner

TS Contributor
#4
"I have to determine mean and sd or median and median absolute deviation for this histogram,"
So where's the problem? You can calculate these parameters regardless of the distribution.
"and I don't know which will be the best option since the CLT affirms that I can assume the normal distribution..."
You cannot assume a normal distribution of the data if the data are not normally distributed.
The CLT has nothing to do with this. The CLT tells you that, for example, sample means are
approximately normally distributed if the sample size is large enough. But maybe that is what you wanted to say.

With kind regards

Karabiner
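Karabiner's distinction can be checked by simulation. This standard-library sketch assumes a hypothetical bimodal population (half the days around 5 °C, half around 22 °C, SD 3 each), draws 2000 samples of 100 days, and looks at the 2000 sample means: they center on the population mean, their SD is close to σ/√n, and roughly 68 % fall within one standard error, approximately what the CLT predicts, even though the individual temperatures are bimodal.

```python
import math
import random
from statistics import fmean, stdev

random.seed(7)


def draw_temperature():
    # Hypothetical bimodal population: 50/50 mix of N(5, 3) and N(22, 3).
    return random.gauss(5, 3) if random.random() < 0.5 else random.gauss(22, 3)


# Mixture moments: mean = 13.5; variance = 3**2 + ((22 - 5) / 2)**2 = 81.25.
mu, sigma = 13.5, math.sqrt(81.25)

n, reps = 100, 2_000
means = [fmean([draw_temperature() for _ in range(n)]) for _ in range(reps)]

se = sigma / math.sqrt(n)  # standard error predicted by the CLT
within_1se = sum(abs(m - mu) < se for m in means) / reps

print(f"mean of sample means: {fmean(means):.2f} (population mean {mu})")
print(f"SD of sample means:   {stdev(means):.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1 SE:    {within_1se:.1%} (about 68% for a normal)")
```

In other words, the CLT justifies a normal approximation for the mean as an estimator (for example, for a confidence interval), not for the distribution of the 50000 temperatures themselves.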
 

hlsmith

#5
@David E.S., why do you "have to" calculate these if they are not informative about the data? If I reviewed these estimates without a visualization of the data, I would be completely misled about its distribution.
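hlsmith's point can be illustrated with two hypothetical samples, one normal and one bimodal, each rescaled to mean 0 and SD 1 so that their summary statistics are identical by construction. A crude text histogram (standard library only; the bin range and count are arbitrary choices) is enough to tell them apart.

```python
import random
from statistics import fmean, stdev


def standardize(xs):
    """Rescale to mean 0 and SD 1, so both samples share these summary statistics."""
    m, s = fmean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def bin_counts(xs, lo=-3.0, hi=3.0, bins=12):
    """Count values in equal-width bins on [lo, hi); values outside are ignored."""
    step = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        if lo <= x < hi:
            counts[min(int((x - lo) / step), bins - 1)] += 1
    return counts


def text_histogram(xs, lo=-3.0, hi=3.0, bins=12, width=40):
    """One row per bin: left edge, then a bar scaled to the tallest bin."""
    counts = bin_counts(xs, lo, hi, bins)
    step = (hi - lo) / bins
    peak = max(counts)
    return "\n".join(
        f"{lo + i * step:+5.1f} | " + "#" * round(width * c / peak)
        for i, c in enumerate(counts)
    )


random.seed(3)
normal = standardize([random.gauss(0, 1) for _ in range(10_000)])
bimodal = standardize([random.gauss(-2, 0.7) if random.random() < 0.5
                       else random.gauss(2, 0.7) for _ in range(10_000)])

for name, xs in (("normal", normal), ("bimodal", bimodal)):
    print(f"{name}: mean = {fmean(xs):.2f}, SD = {stdev(xs):.2f}")
    print(text_histogram(xs))
```

Both samples print mean 0.00 and SD 1.00, but the bimodal histogram has a trough exactly where the mean sits.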
 

noetsi

No cake for spunky
#6
QQ plots are a better way to look at the distribution anyhow. The tests rarely work (with samples this large they reject normality even for trivial departures), or are heavily criticized in any case.
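A QQ plot puts the sorted, standardized data against the matching quantiles of a standard normal; a normal sample follows the line y = x. The standard-library sketch below computes those points for two hypothetical samples and, rather than drawing them, prints the vertical distance from that line at a few probabilities (the plotting positions (i + 0.5)/n are one common convention among several). In practice `scipy.stats.probplot` or `statsmodels.api.qqplot` draw the plot directly.

```python
import random
from statistics import NormalDist, fmean, stdev


def qq_points(data):
    """Pairs (theoretical N(0, 1) quantile, standardized sample quantile).

    The i-th sorted value (0-based) is matched to the normal quantile at
    probability (i + 0.5) / n.
    """
    xs = sorted(data)
    n = len(xs)
    m, s = fmean(xs), stdev(xs)
    z = NormalDist()  # standard normal
    return [(z.inv_cdf((i + 0.5) / n), (x - m) / s) for i, x in enumerate(xs)]


def gap_at(points, p):
    """Sample quantile minus theoretical quantile near probability p."""
    theoretical, sample = points[int(p * len(points))]
    return sample - theoretical


random.seed(11)
n = 20_000
normal = [random.gauss(13.5, 9) for _ in range(n)]
# Hypothetical bimodal sample: 50/50 mix of N(5, 3) and N(22, 3).
bimodal = [random.gauss(5, 3) if random.random() < 0.5 else random.gauss(22, 3)
           for _ in range(n)]

for name, xs in (("normal", normal), ("bimodal", bimodal)):
    pts = qq_points(xs)
    gaps = ", ".join(f"p={p}: {gap_at(pts, p):+.2f}" for p in (0.01, 0.25, 0.75, 0.99))
    print(f"{name:8s}{gaps}")
```

For the bimodal sample the points sit below the line in the lower shoulder and above it in the lower tail (and the mirror image in the upper half), the S-shaped pattern of a distribution with lighter tails than a normal; the normal sample stays close to the line throughout.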