I'm writing a paper on regression analysis in SPSS, and the outcome variable is assumed to be normally distributed. It's a constructed data set (N=200?) with possible values 0, 1, 2, 3, 4 and 5. The mean is 2.5 and the standard deviation is 1.3 (?). Is it possible to construct a data set that counts as normally distributed according to the Kolmogorov-Smirnov and Shapiro-Wilk tests? What frequencies of the values 0, 1, 2, 3, 4 and 5 would achieve such normality? I'd be grateful if anyone can help me.
So the response variable is an integer from 0 to 5, with n=200. Now, do you want to know whether it is normal, or whether you can simulate a normal sample given those moments?
Stop cowardice, ban guns!
Morten67 (12-01-2016)
Thanks hlsmith! I want to simulate (generate, create) a sample with those moments that is normally distributed according to the K-S test and the S-W test in SPSS. I have tried different approaches, including simulation in Excel (the average of 2, 3, 5 and 10 randomly generated integers from 0-5) and constructing the data by hand. But none of these sets seems to be normally distributed according to the two tests in SPSS.
I wonder if you could use a uniform distribution, or just simulate from a normal distribution and then round the values to the nearest integer?
Morten67 (12-02-2016)
I have tried both of these approaches. For example:
0 - 0.052 → 52
1 - 0.159 → 159
2 - 0.279 → 279
3 - 0.279 → 279
4 - 0.159 → 159
5 - 0.052 → 52
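For anyone who wants to derive frequencies like these rather than build them by hand, here is a minimal sketch in Python (standard library only). It assumes a normal variable with mean 2.5 and SD 1.3 that is rounded to the nearest integer, with the tails folded into 0 and 5. The function name `expected_counts` is made up for illustration, and the sketch does not claim to reproduce the figures above.

```python
from statistics import NormalDist

def expected_counts(mean, sd, n, low=0, high=5):
    """Expected count of each integer low..high when a normal variable
    is rounded to the nearest integer and clamped to [low, high]."""
    dist = NormalDist(mean, sd)
    counts = {}
    for k in range(low, high + 1):
        # Integer k collects the probability mass of [k - 0.5, k + 0.5);
        # the two end values also absorb the tails beyond the range.
        lower = 0.0 if k == low else dist.cdf(k - 0.5)
        upper = 1.0 if k == high else dist.cdf(k + 0.5)
        counts[k] = n * (upper - lower)
    return counts

counts = expected_counts(mean=2.5, sd=1.3, n=200)
for value, count in counts.items():
    print(value, round(count, 1))
```

Note that the rounded and clamped variable will not have an SD of exactly 1.3, since rounding and clamping both change the variance a little.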
A normal distribution is a continuous distribution. You're forcing data to follow a discrete distribution. I guess it shouldn't be too much of a surprise when you test a discrete (and, therefore, non-normal) distribution for normality and you don't find it.
for all your psychometric needs! https://psychometroscar.wordpress.com/about/
Morten67 (12-02-2016)
Thanks Spunky, your answer addresses the core of the problem. So the K-S and S-W test statistics do not apply to a discrete data set. Should I then rely on a visual examination in combination with skewness and kurtosis to support the assumption of a normal distribution?
I just generated normal random numbers and rounded them to integers. Then I recoded the negative values to 0 and the values larger than 5 to 5.
Then I did the Kolmogorov- Smirnov test.
I was surprised when the test gave a warning. (I don't usually use the Kolmogorov-Smirnov test.) But of course, as Spunky said, the normal distribution is continuous and the data are discrete.
For the discrete data with zero decimals, normality was rejected (but of course not rejected for the original, unrounded data).
With 1 decimal, normality was not rejected, but the p-value deviated from the "correct" value obtained from the original data. The more decimals that were allowed, the closer the p-value was to the "correct" value.
The code is in R. You can download R and RStudio for free.
Spoiler:
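The R code from the post is not shown here. As a rough stand-in, and not the poster's code, here is a minimal Python sketch of the same experiment using only the standard library. Assumptions: the data are tested against a fully specified N(2.5, 1.3) rather than with estimated parameters (so this is the plain K-S test, not the Lilliefors-corrected version SPSS reports), the p-value is the asymptotic Kolmogorov approximation, and rounded values are clamped to 0-5 at every rounding level. The helper names `ks_statistic` and `ks_pvalue` are made up for illustration.

```python
import math
import random
from statistics import NormalDist

def ks_statistic(data, dist):
    """One-sample K-S distance D between the empirical CDF and dist."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # Compare the model CDF with the empirical CDF just after
        # and just before each sorted observation.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value (an approximation)."""
    lam = math.sqrt(n) * d
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))

rng = random.Random(1)
target = NormalDist(2.5, 1.3)
raw = [rng.gauss(2.5, 1.3) for _ in range(200)]

results = {}
for decimals in (0, 1, 2, None):  # None means the unrounded data
    if decimals is None:
        data = raw
    else:
        # Round, then recode values below 0 to 0 and above 5 to 5.
        data = [min(5.0, max(0.0, round(x, decimals))) for x in raw]
    d = ks_statistic(data, target)
    results[decimals] = (d, ks_pvalue(d, len(data)))

for decimals, (d, p) in results.items():
    print(decimals, round(d, 4), round(p, 4))
```

With integer rounding the empirical CDF has only six steps, so D cannot get small however the sample turns out, which is why the test rejects at n=200.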
I prefer to do a histogram and QQ-plot to evaluate the normality of residuals.
When I was looking for code for Shapiro-Wilk I found this amusing blog post and also this.
If we round the data to zero decimals, then the data will be discrete (and not continuous and thus not normal). But if we round it to one or two decimals it will still be discrete, but just with more levels, but still a discrete variable. Even if we have 15 decimals it will still be discrete. In fact, all of our data are discrete since we always do a rounding. The real question is, does it deviate a lot from normality? A discrete binomial variable or a Poisson variable can be approximated with a normal distribution for certain parameter values. "All models are wrong, but some models are useful" as Box put it.
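To make the "for certain parameter values" part concrete, here is a small Python sketch (standard library only) that compares the binomial probabilities with a normal approximation using a continuity correction. The two parameter choices and the function names are just illustrative assumptions.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def max_approx_error(n, p):
    """Largest gap, over k = 0..n, between the binomial pmf and the
    normal approximation with continuity correction."""
    approx = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    return max(
        abs(binomial_pmf(k, n, p)
            - (approx.cdf(k + 0.5) - approx.cdf(k - 0.5)))
        for k in range(n + 1)
    )

good = max_approx_error(50, 0.5)  # symmetric, many trials
poor = max_approx_error(5, 0.1)   # skewed, few trials
print(good, poor)
```

The first case is approximated closely and the second is not, even though both variables are equally "discrete".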
Rounding a variable loses information. It causes the variance to increase. So those people who give us some data and say that they have rounded the data "because the uncertainty is so large anyway" have not done us a favour.
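A quick way to see the variance increase is to simulate it. This is a minimal Python sketch under assumed values (mean 2.5, SD 1.3, rounding to whole numbers); the comparison figure of 1/12 is the usual rule of thumb for rounding with step width 1 (step squared divided by 12), which holds approximately when the SD is not small relative to the step.

```python
import random
from statistics import pvariance

rng = random.Random(42)
raw = [rng.gauss(2.5, 1.3) for _ in range(50_000)]
rounded = [round(x) for x in raw]  # round to the nearest integer

var_raw = pvariance(raw)
var_rounded = pvariance(rounded)
inflation = var_rounded - var_raw

print("variance before rounding:", round(var_raw, 4))
print("variance after rounding: ", round(var_rounded, 4))
print("increase:", round(inflation, 4), "vs. 1/12 =", round(1 / 12, 4))
```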
If I read an instrument value with 15 decimals, then I can enter it as data with 2 decimals, especially if the standard deviation is 1.3 (more than 100 times larger than the last decimal).
"Are the residuals exactly normally distributed?" is not the appropriate question to ask.
The important question is: "Do the residuals deviate so much from normality that it will have a destructive influence on the regression parameters (alpha and beta) that you are really interested in?"
Regression parameter estimates are known to be robust to non-normality (but they are certainly not robust to outliers).
hlsmith (12-02-2016)
Nice synopsis, GG. Did the number of decimals matter? I am guessing so. Otherwise, if the test just needs a distinction between values, could one round the simulated values to integers and then add the slightest noise back at, say, the thousandths place?
Floating points are always a mystery to me, but beyond people rounding data, it seems that in exact scenarios the floating point can also be a nuisance. I saw something like what you mention about rounding data somewhere else a while ago, in something about know and variance. I am thinking it was John Cook or Gelman.
These blog posts make a great point which is often overlooked by applied researchers. About 2-3 weeks ago a student sent me an email concerning some analyses he was doing. What caught my attention in what he said was this:
Specifically, I am worried about using a t-test for such a small sample size (5 participants) even though I tested the Normality of the sample using the Shapiro-Wilk's test
So, in his mind, because he tested for normality and the null hypothesis was not rejected, n=5 should be appropriate. It took me a while to explain to him that what he was looking at was an underpowered test of the assumption of normality.
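How underpowered? Here is a rough Python sketch (standard library only) that estimates how often a normality test rejects at n=5 when the data are clearly non-normal. Assumptions: Shapiro-Wilk is not in Python's standard library, so this uses a K-S test with estimated mean and SD (the Lilliefors variant) as a stand-in; the 5% critical value is obtained by simulation rather than from a table; and the non-normal data are exponential. The function name `lilliefors_statistic` is made up for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()

def lilliefors_statistic(sample):
    """K-S distance to a normal whose mean and SD are estimated
    from the sample itself."""
    m, s = mean(sample), stdev(sample)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = STD_NORMAL.cdf((x - m) / s)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

rng = random.Random(7)
n, reps = 5, 4000

# Step 1: simulate the statistic for truly normal samples and take
# the 95th percentile as the critical value for a 5% test.
null_stats = sorted(
    lilliefors_statistic([rng.gauss(0, 1) for _ in range(n)])
    for _ in range(reps)
)
critical = null_stats[int(0.95 * reps)]

# Step 2: draw strongly skewed (exponential) samples of the same size
# and count how often the test rejects normality.
rejections = sum(
    lilliefors_statistic([rng.expovariate(1.0) for _ in range(n)]) > critical
    for _ in range(reps)
)
power = rejections / reps

print("critical value:", round(critical, 3))
print("share of exponential samples rejected:", round(power, 3))
```

If most of the skewed samples pass, a non-significant result at n=5 says very little about whether the data are normal.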
Sometimes it makes me wonder how many people just go ahead and do these things though, without stopping to think about what they're actually doing.
What are you talking about Spunky, look at all of these beautiful normal simulations of n= 5.
P.S., I thought you were going to say the person also had n=5 integer data, that would be awesome.
P.P.S. When I ran these 40 through normality tests, S-W was more generous (gave greater p-values). All were normal per S-W and two were rejected by K-S.
Any guesses which two failed the K-S normality test? There will be cake for the winner!
__________________________________
GG has argued for no lower limit on sample size for the t-test; the following is for her visual interpretation.