Hello friends of the forum
I have the following question.
If I have a sample of only 24 discrete values, how can I prove that these data follow a normal distribution?
When I enter the data into specialized software, it tells me that the sample fits a normal distribution. I'm confused, because I understand that the normal distribution is meant for continuous data. Also, which normality test can I use? As far as I can tell, the Kolmogorov-Smirnov and Shapiro-Wilk tests are only used for continuous variables.
Thank you very much for your answers.
Hi, what do you mean by discrete data? Count data? What kind of statistical analysis do you want to perform with your data? Perhaps you should choose a method that does not need to assume normality.
Thank you very much for the reply.
Yes, I mean data that can only take certain values. For example: the number of students in a class (you can't have half a student). This is the opposite of continuous data.
I have to characterize the behavior of demand for certain types of products and find out which distribution it fits.
So, when I enter the data into the software and run a distribution-fitting test, the results show that the data follow a normal distribution. I'm confused, because how can I apply normality tests to these data if those tests (Kolmogorov-Smirnov or Shapiro-Wilk) are meant for continuous data?
Actually, discrete data can fit a normal distribution quite well; see the normal approximation of the binomial or Poisson distributions.
regards
Technically, count data are often modeled with a Poisson distribution, but once the mean rises above a certain threshold the counts, as rogojel pointed out, are approximately normally distributed.
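To make the approximation concrete, here is a minimal Python sketch (standard library only) that compares the Poisson CDF with the CDF of a normal distribution having the same mean and variance, using a continuity correction. The function names and the three example means (2, 20, 200) are illustrative assumptions, not anything taken from the OP's data.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def max_cdf_gap(lam):
    """Largest |Poisson CDF - normal CDF| over k, with a continuity correction."""
    normal = NormalDist(mu=lam, sigma=math.sqrt(lam))
    upper = int(lam + 10 * math.sqrt(lam))  # far enough into the right tail
    return max(
        abs(poisson_cdf(k, lam) - normal.cdf(k + 0.5)) for k in range(upper + 1)
    )


# Example means are arbitrary; a larger mean should give a smaller gap.
for lam in (2, 20, 200):
    print(lam, round(max_cdf_gap(lam), 4))
```

The gap never reaches zero, which matches the point made in this thread: the counts are not normal, but the normal curve can be a reasonable approximation when the mean is large.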
First, you won't be able to prove the data come from a normal distribution. As others pointed out, your data definitely don't come from a normal distribution, but a normal distribution might be a reasonable approximation. I would recommend checking your data plotted on a normal probability plot. If the data are well approximated by a normal distribution, you should see your data plotting on the straight line, more or less. There are no p-values with this approach.
Be careful. As I mentioned above, tests of normality don't confirm or offer evidence of normality. They look for evidence of nonnormality in the underlying population (rejecting Ho would indicate this). Failing to reject Ho does not confirm or support normality. Additionally, formal tests for normality, such as K-S, S-W, and A-D, can reject Ho, suggesting the data are from a nonnormal distribution, even when the data are drawn from a known normal distribution. Similarly, we can sample a truly nonnormal distribution, yet get these tests to give a nonsignificant result (i.e. insufficient evidence of nonnormality). For these reasons, it's best not to base your choice of statistical analysis on these normality tests. Use the normal probability plot and good judgement to see if the data can be approximated with a normal distribution (especially helpful when the true distribution is unknown).
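A small simulation can illustrate both points. The sketch below does not use K-S, S-W, or A-D; as a stand-in it uses the correlation between the sorted data and normal quantiles (the straightness of the normal probability plot) and calibrates a 5% cutoff by simulation. The sample size of 24 comes from the OP; the Poisson mean of 20, the number of trials, the seed, and all function names are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def ppcc(sample):
    """Correlation between sorted data and normal quantiles (Q-Q straightness)."""
    n = len(sample)
    xs = sorted(sample)
    zs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mz = sum(xs) / n, sum(zs) / n
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz) if sxx > 0 else 0.0


def poisson_draw(rng, lam):
    """One Poisson(lam) variate using Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def rejection_rates(n=24, lam=20, trials=1000, alpha=0.05, seed=1):
    """Return (cutoff, rejection rate on normal data, rejection rate on Poisson data)."""
    rng = random.Random(seed)
    # Calibrate the cutoff: the alpha-quantile of the statistic under true normality.
    null_r = sorted(ppcc([rng.gauss(0, 1) for _ in range(n)]) for _ in range(trials))
    cutoff = null_r[int(alpha * trials)]

    def rate(draw):
        hits = sum(ppcc([draw() for _ in range(n)]) < cutoff for _ in range(trials))
        return hits / trials

    return cutoff, rate(lambda: rng.gauss(0, 1)), rate(lambda: poisson_draw(rng, lam))


cutoff, normal_rate, poisson_rate = rejection_rates()
print(f"cutoff={cutoff:.3f}  normal rejected={normal_rate:.3f}  "
      f"Poisson rejected={poisson_rate:.3f}")
```

The first rate shows how often truly normal samples get flagged anyway; the second shows how often samples that are definitely not normal get flagged. If the two rates are close, a nonsignificant result on 24 values says very little about the underlying distribution.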
Thank you all for responding
But which normality test could I apply if my data are discrete (Kolmogorov-Smirnov, Anderson-Darling, Shapiro-Wilk, etc.)?
Is there a particular reason why you want to do a formal test?
I wouldn't recommend a formal test for normality for the reasons I mentioned above (and others mentioned some as well). Following formal tests of normality can lead to inappropriate decisions. And again, failing to reject the null hypothesis on one of those tests doesn't, in any way, prove or support the null hypothesis that the underlying distribution is normal.
I would recommend using a normal probability plot (Q-Q plot) for your data to see if the normal distribution is a reasonable approximation for your case (if you really want some sort of assessment). This will not involve p-values, but you will look to see if the plotted values fall on the straight line to a reasonable extent.
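As a rough illustration of what goes into such a plot, here is a minimal Python sketch (standard library only) that computes the plotting coordinates: each sorted observation is paired with the standard normal quantile for its rank. The demand numbers are made up, the function name is hypothetical, and the Blom plotting positions are only one of several common conventions.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(sample)
    # Blom plotting positions (i - 0.375) / (n + 0.25); other conventions exist.
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Hypothetical demand counts for one product over 24 periods (made-up numbers).
demand = [18, 22, 19, 25, 21, 20, 17, 23, 24, 20, 19, 21,
          22, 26, 18, 20, 21, 23, 19, 22, 20, 24, 21, 16]

for z, x in normal_qq_points(demand):
    print(f"{z:6.2f}  {x}")
```

Plotting the second column against the first in any charting tool gives the normal probability plot. With discrete data the points form small horizontal steps (ties), so what matters is whether the steps follow a straight line overall.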
I hope this helps.
Hi ondansetron, thank you very much for the advice. It is very accurate and well supported.
The problem is that I have more than 3000 samples of discrete data, and I need to analyze all of them. I tried fitting probability distributions to many of the samples in several software packages. The results show that they follow a normal distribution, but it is very difficult to enter the samples one by one (because there are so many).
I was thinking of applying a normality test to all the samples through an Excel macro, since a normal probability plot for each one would be difficult and take a lot of time.
Thanks for the reply.
Yes ondansetron, I have different groups: almost 3000 samples, each with 24 values. That is the problem, and the reason why I wanted to apply a test of normality.
Thanks for the reply.
Hi,
The point ondansetron is making is that we generally test normality because it is a precondition for some statistical method or test, like an ANOVA or a t-test. If this is your case, you might not even need the normality test, or there may be other tests that do not require normality.
So, what is your purpose with the data?
regards
Hi rogojel, thank you for taking the time to reply.
Well, I am working with inventory systems with probabilistic demand, so I must work with the probability distribution to obtain information from the samples (lead time, safety stock, quantity, etc.).
Since I do not know which statistical distribution the data fit, I used several software packages. Their results show that the data fit a normal distribution, but I must prove it theoretically.
So I need to prove theoretically that these data fit a normal distribution. The problem is that there are more than 3000 samples of discrete data, so a normal probability plot for each one is not very practical.
Thank you very much for your answers
Yes, this is why I asked if there was a specific purpose (you just asked it more clearly). I also want to stress that the OP keeps saying he wants to prove the data come from a normal distribution (or fit it). In those tests, that would be equivalent to saying the null hypothesis is true. This is an incorrect conclusion, as we can never prove the null, and failure to reject the null does not count as evidence for the null. Given this, the OP should be careful not to interpret the results as having proven normality in any way. At best, there would be insufficient evidence of non-normality (which, again, doesn't prove or support normality).
Hi ondansetron, I agree with what you write, and I have not expressed myself well (my apologies). Failing to reject Ho does not confirm or support normality; I just need to establish some assumptions.
How can I justify a normality assumption for a (discrete) sample without checking my data on a normal probability plot? As I said before, there are many samples. Is that possible? Is there any other technique or method?
Thank you for taking the time to reply. Regards
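One possible way to handle thousands of samples without drawing every plot is to summarize each normal probability plot by a single number, the correlation between the sorted data and the normal quantiles, and then look only at the samples with the lowest values. The Python sketch below (standard library only) assumes the samples are held in a dictionary keyed by product name; the product names, the data, and the 0.95 threshold are all hypothetical, and the threshold is an arbitrary screening cutoff, not a published critical value. In line with the advice in this thread, a high value does not prove normality; it only says the plot is close to a straight line.

```python
import math
from statistics import NormalDist


def ppcc(sample):
    """Straightness of the normal Q-Q plot as a correlation; None if constant."""
    n = len(sample)
    xs = sorted(sample)
    zs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None  # all values identical: no line can be assessed
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)


def screen_samples(samples, threshold=0.95):
    """Return (name, r) for samples worth a visual check, least straight first."""
    flagged = []
    for name, values in samples.items():
        r = ppcc(values)
        if r is None or r < threshold:
            flagged.append((name, r))
    return sorted(flagged, key=lambda item: -1.0 if item[1] is None else item[1])


# Hypothetical data: three products with 24 demand values each (made-up numbers).
samples = {
    "product_A": [18, 22, 19, 25, 21, 20, 17, 23, 24, 20, 19, 21,
                  22, 26, 18, 20, 21, 23, 19, 22, 20, 24, 21, 16],
    "product_B": [0] * 18 + [1, 1, 2, 3, 40, 55],  # mostly zeros, two spikes
    "product_C": [5] * 24,                          # constant demand
}

for name, r in screen_samples(samples):
    print(name, "constant" if r is None else round(r, 3))
```

Only the flagged samples then need a plot and a judgement call, which keeps the visual check manageable even with 3000 groups. The same calculation could be written as an Excel macro; Python is used here only to keep the sketch short.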