Help with normal distribution

#1
Hello friends of the forum,
I have the following question:

If I have a sample of only 24 discrete data points, how can I prove that these data follow a normal distribution?

When I enter the data into specialized software, it tells me that the sample fits a normal distribution. But I'm confused, because I understand that the normal distribution is meant for continuous data. Also, which normality test can I use? As far as I can tell, both the Kolmogorov-Smirnov test and the Shapiro-Wilk test are only meant for continuous variables.

Thank you very much for your answers.
 
#2
Hi, what do you mean by discrete data? Count data? What kind of statistical analysis do you want to perform with your data? Perhaps you should choose a method that does not assume normality.
 
#3
Thank you very much for replying.

Yes, I mean data that can only take certain values, for example the number of students in a class (you can't have half a student), as opposed to continuous data.

I have to characterize the behavior of demand for certain types of products and find out which distribution it fits.

So when I enter the data into the software and run a distribution-fitting test, the results show that the data follow a normal distribution. But I'm confused: how can I apply normality tests to these data if those tests (Kolmogorov-Smirnov, Shapiro-Wilk) are meant for continuous data?
 

rogojel

TS Contributor
#4
Actually, discrete data can fit a normal distribution quite well; see the normal approximation to the binomial or Poisson distributions.
regards
 

Miner

TS Contributor
#5
Count data are often modeled with a Poisson distribution, but when the counts (more precisely, the Poisson mean) rise above a certain threshold, they approximate the normal distribution, as rogojel pointed out.
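To see the approximation rogojel and Miner describe, the sketch below (Python standard library only; the function names are ours, not from any package) compares the exact Poisson CDF with a normal distribution of the same mean and variance, using the usual +0.5 continuity correction. The largest gap should shrink as the mean grows. It illustrates the approximation only; it says nothing about whether any particular demand data are Poisson.

```python
import math
from statistics import NormalDist


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        total += term
    return total


def max_cdf_gap(lam: float) -> float:
    """Largest |Poisson CDF - normal CDF| over counts 0..mean+6sd,
    using N(lam, sqrt(lam)) with a +0.5 continuity correction."""
    approx = NormalDist(mu=lam, sigma=math.sqrt(lam))
    upper = int(lam + 6 * math.sqrt(lam)) + 1
    return max(
        abs(poisson_cdf(k, lam) - approx.cdf(k + 0.5)) for k in range(upper + 1)
    )


for lam in (2, 10, 50, 200):
    print(f"mean {lam:>3}: max CDF gap = {max_cdf_gap(lam):.4f}")
```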
 

ondansetron

TS Contributor
#6
First, you won't be able to prove that the data come from a normal distribution. As others pointed out, discrete data definitely don't come from a normal distribution, but a normal distribution might be a reasonable approximation. I would recommend plotting your data on a normal probability plot. If the data are well approximated by a normal distribution, the points should fall more or less on a straight line. There are no p-values with this approach.

Be careful. As I mentioned above, tests of normality don't confirm or offer evidence of normality. They look for evidence of nonnormality in the underlying population (rejecting H0 would indicate this). Failing to reject H0 does not confirm or support normality. Additionally, formal tests for normality, such as Kolmogorov-Smirnov (K-S), Shapiro-Wilk (S-W), and Anderson-Darling (A-D), can reject H0, suggesting the data come from a nonnormal distribution, even when the data are drawn from a known normal distribution. Conversely, we can sample from a truly nonnormal distribution and still get a nonsignificant result from these tests (i.e., insufficient evidence of nonnormality). For these reasons, it's best not to base your choice of statistical analysis on these normality tests. Use the normal probability plot and good judgment to see whether the data can be approximated by a normal distribution (especially helpful when the true distribution is unknown).
 
#7
Thank you all for responding

But which normality test could I apply if my data are discrete? (Kolmogorov-Smirnov, Anderson-Darling, Shapiro-Wilk, etc.)
 

ondansetron

TS Contributor
#8
Is there a particular reason why you want to do a formal test?

I wouldn't recommend a formal test for normality for the reasons I mentioned above (and others mentioned some as well). Following formal tests of normality can lead to inappropriate decisions. And again, failing to reject the null hypothesis on one of those tests doesn't, in any way, prove or support the null hypothesis that the underlying distribution is normal.

I would recommend using a normal probability plot (Q-Q plot) for your data to see if the normal distribution is a reasonable approximation for your case (if you really want some sort of assessment). This will not involve p-values, but you will look to see if the plotted values fall on the straight line to a reasonable extent.

I hope this helps.
 
#9
Hi ondansetron, thank you very much for the advice; I know it is accurate and well founded.

The problem is that I have more than 3,000 samples of discrete data that I need to analyze. I have tried fitting probability distributions to many of these samples in several software packages, and the results show that they follow a normal distribution, but it is very difficult to enter the samples one by one (there are too many).

I was thinking of applying a normality test to all the samples through an Excel macro, since making a normal probability plot for each one would be difficult and take a lot of time.

Thanks for replying.
 
#10
Do you have access to any other software aside from Excel?

Your first post said you have 24 observations. Do you have many different groups?
 
#11
Yes, ondansetron, I have different groups: almost 3,000 samples, each with 24 values. That is the problem, and the reason why I wanted to apply a normality test.
Thanks for replying.
 

rogojel

TS Contributor
#12
Hi,
the point ondansetron is making is that we generally want to test for normality because it might be a precondition for applying some statistical method or test, like an ANOVA or a t-test. If that is your case, you might not even need the normality test, or you might find other tests that do not have normality as a precondition.

So, what is your purpose with the data?

regards
 
#13
Hi rogojel, thank you for taking the time to reply.

Well, I am working with inventory systems with probabilistic demand, so I must work with the probability distribution to obtain information from the samples (lead time, safety stock, quantity, etc.).

Since I do not know which statistical distribution the data fit, I used several software packages. Their results show that the data fit a normal distribution, but I must prove it theoretically.

So I need to prove theoretically that these data fit a normal distribution; the problem is that there are more than 3,000 samples of discrete data, so a normal probability plot for each one is impractical.

Thank you very much for your answers.
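If a normal model is adopted for a group's demand, this is roughly how it would feed the inventory quantities mentioned above. A minimal sketch under textbook assumptions that are ours, not the OP's: demand in each period is independent and normal with mean mu and standard deviation sigma, the lead time L (in periods) is fixed, and the target is a cycle service level (probability of no stockout during a replenishment cycle). Then lead-time demand is normal with mean L*mu and standard deviation sigma*sqrt(L), safety stock is z*sigma*sqrt(L), and the reorder point is L*mu plus the safety stock, where z is the standard normal quantile of the service level. Function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def safety_stock(sigma_per_period, lead_time_periods, service_level):
    """z * sigma * sqrt(L), assuming independent normal demand per period and a fixed lead time."""
    z = NormalDist().inv_cdf(service_level)  # standard normal quantile
    return z * sigma_per_period * math.sqrt(lead_time_periods)


def reorder_point(mean_per_period, sigma_per_period, lead_time_periods, service_level):
    """Expected demand over the lead time plus safety stock."""
    expected_lead_time_demand = mean_per_period * lead_time_periods
    return expected_lead_time_demand + safety_stock(
        sigma_per_period, lead_time_periods, service_level
    )


# Hypothetical group: mean 12.6 units/week, sd 2.1, lead time 4 weeks, 95% cycle service level.
ss = safety_stock(2.1, 4, 0.95)
rop = reorder_point(12.6, 2.1, 4, 0.95)
print(f"safety stock = {ss:.1f} units, reorder point = {rop:.1f} units")
```

With count data the results are usually rounded up to whole units, and the normal model only makes sense when demand is comfortably far from zero.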
 
#14
Yes, this is why I asked whether there was a specific purpose (rogojel, you just asked it more clearly). I also want to stress that the OP keeps saying he wants to prove that the data come from (or fit) a normal distribution. In those tests, that would be equivalent to saying the null hypothesis is true. This is an incorrect conclusion, because we can never prove the null, and failure to reject the null does not count as evidence for the null. Given this, the OP should be careful not to interpret the results as having proven normality in any way. At best, there would be insufficient evidence of nonnormality (but again, that doesn't prove or support normality).
 
#15
Hi ondansetron, I agree with what you wrote, and I did not express myself well (my apologies): failing to reject H0 does not confirm or support normality. I just need to establish some assumptions.

How can I justify a normality assumption for a (discrete) sample without plotting my data on a normal probability plot? As I said before, there are many samples. Is that possible? Is there any other technique or method?

Thank you for taking the time to reply, regards
 

rogojel

TS Contributor
#16
Hi,
if I understand you correctly, you have many data points but they belong to different groups, so you have only a small number of data points per group? And you need a model for each group?

There is a saying that a small dataset is always normally distributed and a large one never is. This reflects the power of normality tests: they will almost always fail to reject the null for a small dataset. So running the tests on small groups is not going to help you, except in really extreme cases, as has already been pointed out here. Depending on what you need to do, you might want to consider simply using your data to make predictions, as in bootstrapping (creating larger samples by sampling with replacement from your original data). That could give you an idea of how often demand would exceed a given limit, for example.

regards
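The resampling idea above can be sketched as follows: simulate demand over a lead time by drawing periods with replacement from a group's observed history, then count how often the simulated total exceeds a limit, or read off an empirical quantile. A minimal sketch in Python (standard library only), assuming periods are independent and that the observed values are representative of future demand; the function names, the 4-period lead time, the limit of 60 and the history values are all hypothetical.

```python
import math
import random


def simulate_lead_time_demand(history, periods, n_sims=10_000, seed=0):
    """Totals of `periods` draws (with replacement) from the observed history."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    return [sum(rng.choices(history, k=periods)) for _ in range(n_sims)]


def exceedance_probability(totals, limit):
    """Share of simulated totals strictly above `limit`."""
    return sum(t > limit for t in totals) / len(totals)


def empirical_quantile(totals, level):
    """Simple empirical quantile: the value at position ceil(level * n) in sorted order."""
    ordered = sorted(totals)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]


# Made-up history of 24 weekly demand counts for one group.
history = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14, 13, 11,
           15, 12, 10, 14, 13, 12, 17, 11, 13, 12, 14, 9]
totals = simulate_lead_time_demand(history, periods=4)
print("P(4-week demand > 60):", exceedance_probability(totals, 60))
print("95th percentile of 4-week demand:", empirical_quantile(totals, 0.95))
```

Because no distribution is assumed, the same code runs unchanged on every group, which fits the many-small-groups situation; the price is that it can never produce a value outside the range of sums seen in the history.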
 
#17
I will try to apply this advice. Thank you very much for the help.
 

CowboyBear

Super Moderator
#18
Nice posts in this thread, ondansetron. You really seem to be making a positive contribution on the forums :)