tests for normal data

#1
Hi,

I have posted similar questions before, but I did not state my problem very clearly.

I have some rows of data like
Code:
1 0 2 1 3 0
I need to test whether the first 3 elements have the same distribution as the last 3. Some rows have larger numbers, like
Code:
23 32 12 33 23 54
My idea is to test each row for normality. If the row looks normal, I will apply tests that assume normality (parametric tests). If it does not, I will apply non-parametric tests.

Is this approach correct? I ask because I only have 6 values in each row. Is it valid to apply shapiro.test to test normality with so few values?

Thank you very much
 

hlsmith

Omega Contributor
#3
Your original question is a little unclear. Can you take some time to rephrase it, so we better understand what you are doing and why?

Also, it may be helpful for you to read about the Kolmogorov–Smirnov test.
 
#4
My original question: I need to test whether the two groups (the first 3 elements and the last 3) have the same distribution. To do this I need to apply some statistical tests, but first I need to know whether my data are normal.

What is the difference between the two tests (Kolmogorov–Smirnov and shapiro.test) in this case?
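For the comparison itself, one option that does not rely on a normality pre-test is an exact permutation test. Below is a minimal sketch in plain Python (the thread uses R, Python is used here only to keep the example self-contained). It assumes a 3/3 split and takes the absolute difference in group means as the statistic; both are assumptions, and `permutation_p_value` is a hypothetical helper name.

```python
from itertools import combinations


def permutation_p_value(row, split=3):
    """Exact two-sided permutation p-value for the difference in means
    between the first `split` values of a row and the remaining values."""
    n = len(row)
    total = sum(row)

    def abs_mean_diff(first_sum):
        # |mean(first group) - mean(second group)| from the first group's sum
        return abs(first_sum / split - (total - first_sum) / (n - split))

    observed = abs_mean_diff(sum(row[:split]))
    as_extreme = 0
    n_splits = 0
    # Enumerate every way to choose which values form the first group.
    for idx in combinations(range(n), split):
        n_splits += 1
        # Small tolerance so that ties are not lost to floating-point rounding.
        if abs_mean_diff(sum(row[i] for i in idx)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_splits


print(permutation_p_value([23, 32, 12, 33, 23, 54]))  # 0.3
```

With 3 values against 3 there are only 20 possible splits, and the observed split and its mirror image always count, so the two-sided p-value can never go below 2/20 = 0.1 however different the two halves look. That limit comes from the sample size, not from the choice of test.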
 
#5
What do you do if you have a non-significant Shapiro-Wilk test but a significant Kolmogorov–Smirnov test? If kurtosis and skewness are both within ±1, can I use a repeated-measures ANCOVA?
 
#6
Are you testing whether the residuals of the ANCOVA are normally distributed? The normality assumption rests on the residuals, NOT the data.
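To make the residuals-versus-data distinction concrete, here is a minimal Python sketch. It assumes the simplest possible model, in which each group (first 3 values, last 3 values) is fitted by its own mean; a real repeated-measures ANCOVA has more terms, and `group_residuals` is a hypothetical helper name.

```python
from statistics import mean


def group_residuals(row, split=3):
    """Residuals from a two-group model: each value minus its own group mean."""
    first, last = row[:split], row[split:]
    mean_first, mean_last = mean(first), mean(last)
    return [x - mean_first for x in first] + [x - mean_last for x in last]


# The raw values sit in two clusters, but the residuals are centred on zero.
print(group_residuals([1, 2, 3, 10, 11, 12]))  # [-1, 0, 1, -1, 0, 1]
```

A normality check applied to the raw row would be distorted by the gap between the two groups, while the residuals carry only the within-group scatter that the assumption is about.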
 

noetsi

Fortran must die
#7
QQ plots are commonly used to assess the normality of data (and are probably the best way to do so). Although I have never seen this done (interestingly), I assume it could be done with residuals just as easily as with raw data.

It's interesting that QQ plots are commonly shown with raw data, not residuals, even for methods like ANOVA or regression that make assumptions about the normality of the residuals, not the raw data.
 
#10
Thank you.

The best option is a QQ plot. But can I get a measure from it, a number, or only the graph? I need to apply this to a lot of rows, and it's impossible to look at all the graphs...

After that (and I know that the power of the test isn't good), can I apply the appropriate test (one that assumes normality or one that does not) to compare the distributions?
 

noetsi

Fortran must die
#11
QQ plots only plot the observed data against a theoretical distribution. They don't calculate a test statistic.

I am not sure what you mean by a lot of rows. Regardless of how many variables you have, the residuals (your observed data) will be a single set of data, so you can use one QQ plot.
 
#12
"A lot of rows" means a lot of data sets to test for normality. I asked whether it's possible to get a "value" from the QQ plot because I need to run the test on about 20000 rows of data (independently), and it's impossible to look at all the graphs. If I have a number, I can filter on it; that is why I mentioned shapiro.test and Kolmogorov–Smirnov.

In this case I need to work with one of them (shapiro.test or ks.test). Which is better, or more correct to use?

Thank you
 

noetsi

Fortran must die
#13
As far as I know, QQ plots don't generate a statistic or value you can test.

In all the discussions of various normality tests I have not seen authors come down on one as being better. The general comment is that all are weak in terms of statistical power.
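One way to boil a QQ plot down to a single number is the correlation between the sorted data and the matching theoretical normal quantiles (often called a probability plot correlation coefficient). The Python sketch below is a rough illustration only: the plotting positions (i + 0.5)/n are one of several conventions and are an assumption here, `qq_correlation` is a hypothetical helper name, and the result is a descriptive score, not a p-value.

```python
from math import sqrt
from statistics import NormalDist, mean


def qq_correlation(values):
    """Pearson correlation between the sorted values and standard normal
    quantiles at plotting positions (i + 0.5) / n (an assumed convention)."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_t, mean_o = mean(theoretical), mean(ordered)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in zip(theoretical, ordered))
    s_tt = sum((t - mean_t) ** 2 for t in theoretical)
    s_oo = sum((o - mean_o) ** 2 for o in ordered)
    if s_oo == 0:
        return float("nan")  # constant row: the correlation is undefined
    return s_to / sqrt(s_tt * s_oo)


# One score per row, so many rows can be ranked or filtered without plotting.
rows = [[1, 0, 2, 1, 3, 0], [23, 32, 12, 33, 23, 54]]
scores = [qq_correlation(row) for row in rows]
```

Values close to 1 mean the QQ plot is close to a straight line. Any cut-off for "close enough" would have to be calibrated separately, for example by simulation, and with only 6 values per row the score will vary a lot even for truly normal data.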
 

noetsi

Fortran must die
#16
If you accept that the power is weak, then there certainly is no rule against using them. If there were, they would not exist :) Anderson-Darling is also used a lot.
 
#17
I know about the power, but I don't have more data. I'm trying to do the best that is possible with the data that I have...

Thank you very much
 
#18
Hi again.
And if my data is
Code:
 0   0   0   0   0   1   1   1   1   1  1   1   1   0   0   0  0  1  0  0  0  0  0   1   1   1   1   1   1  1   1   1   1  1   1   1   1   1   1   1   1  1  1  1  1  1  1  1
is it correct to apply shapiro.test to test the normality of the data?
 

Dason

Ambassador to the humans
#19
Here's a hint - if the only values your data can take are 0 and 1 then the data itself is NOT normally distributed...
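The hint can be turned into a cheap screen that runs before any normality test: count how many distinct values a row contains. The Python sketch below is only an illustration; `value_tally` and `too_discrete_for_normality` are hypothetical helper names, and the threshold of 3 distinct values is an arbitrary choice, not a standard rule.

```python
from collections import Counter


def value_tally(row):
    """Count how often each distinct value occurs in the row."""
    return Counter(row)


def too_discrete_for_normality(row, min_distinct=3):
    """Flag rows with fewer than `min_distinct` distinct values.
    The default of 3 is an illustrative assumption, not an established rule."""
    return len(set(row)) < min_distinct


binary_row = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
print(value_tally(binary_row))                 # Counter({1: 7, 0: 5})
print(too_discrete_for_normality(binary_row))  # True
```

A row that only ever takes the values 0 and 1 is better described by the proportion of ones than by a mean and a standard deviation, so a normality test on it has no useful answer to give.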
 
#20
Hi.

I can also have values like

Code:
 0   0   0   0   0   1   2   4   1   1  4   2  1   0   0   0  0  1  44  0  25  0  0   1   1   1   1   1   1  1   1   1   1  1   1   1   1   1   1   1   1  1  1  1  1  1  1  1
Is it appropriate to apply shapiro.test here?