# tests for normal data

#### slavia

##### New Member
Hi,

I posted a similar question before, but I didn't explain my problem very clearly.

I have some rows of data like
Code:
1 0 2 1 3 0
I need to apply some tests to check whether the first 3 elements have the same distribution as the last 3. Some rows have larger numbers, like
Code:
23 32 12 33 23 54
My idea is to test each row for normality. If the row is normally distributed, I will apply parametric tests. If it is not, I will apply non-parametric tests.

Is this approach correct? I ask because I have only 6 values in each row. Is it valid to apply shapiro.test to test normality?

Thank you very much

#### hlsmith

##### Omega Contributor
Your original question is a little unclear. Can you take some time to rephrase it, so we better understand what you are doing and why?

Also, it may be helpful for you to read about the Kolmogorov–Smirnov test.
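For readers unfamiliar with it, the two-sample Kolmogorov–Smirnov statistic is the largest gap between the empirical CDFs of the two groups. Below is a minimal sketch using only the Python standard library; the function names are illustrative, and the row is the second example from the first post, split into its first 3 and last 3 values. It computes the statistic only, not a p-value.

```python
def ecdf(sample, x):
    """Fraction of values in `sample` that are <= x."""
    return sum(v <= x for v in sample) / len(sample)


def ks_two_sample_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of a and b."""
    # The gap can only change at observed values, so checking those is enough.
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


row = [23, 32, 12, 33, 23, 54]
d = ks_two_sample_statistic(row[:3], row[3:])
print(d)  # D = 2/3 for this row
```

With 3 values per group the statistic can only take the values 0, 1/3, 2/3 or 1, which is one way to see how little information such a comparison carries.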

#### slavia

##### New Member
My original question: I need to test whether the two groups (the first 3 elements and the last 3) have the same distribution. To do this I need to apply some statistical tests, but first I need to know whether my data is normal.

What is the difference between the two tests (Kolmogorov–Smirnov and shapiro.test) in this case?

#### benjwalt

##### New Member
What do you do if you have a non-significant Shapiro–Wilk test but a significant Kolmogorov–Smirnov test? If kurtosis and skewness are both within ±1, can I use a repeated-measures ANCOVA?

#### TheEcologist

##### R purist
> What do you do if you have a non-significant Shapiro–Wilk test but a significant Kolmogorov–Smirnov test? If kurtosis and skewness are both within ±1, can I use a repeated-measures ANCOVA?

Are you testing whether the residuals of the ANCOVA are normally distributed? The normality assumption rests on the residuals, NOT the data.
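To make the residuals-versus-data distinction concrete, here is a small sketch. It assumes the simplest possible model, a separate mean for each group, rather than an ANCOVA, and the helper name is hypothetical. The row is the second example from the first post.

```python
from statistics import mean


def group_residuals(groups):
    """Subtract each group's own mean from its values."""
    residuals = []
    for g in groups:
        m = mean(g)
        residuals.append([x - m for x in g])
    return residuals


row = [23, 32, 12, 33, 23, 54]
per_group = group_residuals([row[:3], row[3:]])
# Pool the residuals: these, not the raw values, are what a
# normality check on the model's errors would look at.
pooled = [r for g in per_group for r in g]
print(pooled)
```

The raw values of two groups with different means can look non-normal when pooled even if the residuals around each group mean are perfectly well behaved.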

#### noetsi

##### Fortran must die
QQ plots are commonly used to assess normality of data (and are probably the best way to do so). Although I have never seen this done (interestingly), I assume it could be done with residuals just as easily as with raw data.

It's interesting that QQ plots are commonly shown with raw data rather than residuals, even with methods like ANOVA or regression that make assumptions about the normality of the residuals, not the raw data.

#### slavia

##### New Member
> There are no tests that have reasonable size and power properties for such small samples! Please correct me if I'm wrong.

I agree, but this is the only data I have, and I need to test it. What can I do?
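The small-sample limitation can be shown directly. The sketch below uses an exact permutation test on the difference of group means, which is a technique swapped in for illustration and not one proposed in this thread; the function name is hypothetical. With 3 values against 3 there are only 20 ways to split a row, and the observed split and its mirror image always count as "at least as extreme", so the two-sided p-value can never fall below 2/20 = 0.1.

```python
from itertools import combinations


def permutation_p_value(row, k=3):
    """Exact two-sided permutation p-value for the difference in means
    between the first k values and the remaining values of `row`."""
    n = len(row)
    total = sum(row)

    def abs_mean_diff(indices):
        s = sum(row[i] for i in indices)
        return abs(s / k - (total - s) / (n - k))

    observed = abs_mean_diff(range(k))
    splits = list(combinations(range(n), k))  # 20 splits when n=6, k=3
    # Small tolerance so that ties are not lost to floating-point rounding.
    extreme = sum(abs_mean_diff(idx) >= observed - 1e-12 for idx in splits)
    return extreme / len(splits)


for row in ([1, 0, 2, 1, 3, 0], [23, 32, 12, 33, 23, 54]):
    print(row, permutation_p_value(row))
```

So even a row whose two halves are completely separated cannot reach the usual 0.05 threshold with this test, whatever the data look like.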

#### slavia

##### New Member
> What do you do if you have a non-significant Shapiro–Wilk test but a significant Kolmogorov–Smirnov test? If kurtosis and skewness are both within ±1, can I use a repeated-measures ANCOVA?

Sorry, I don't know ANCOVA. What is it?

#### slavia

##### New Member
> QQ plots are commonly used to assess normality of data (and are probably the best way to do so). Although I have never seen this done (interestingly), I assume it could be done with residuals just as easily as with raw data.
>
> It's interesting that QQ plots are commonly shown with raw data rather than residuals, even with methods like ANOVA or regression that make assumptions about the normality of the residuals, not the raw data.

Thank you.

The best thing would be a QQ plot. But does it give me a measure, a number, or only the graph? I need to apply this to a lot of rows, and it's impossible to look at all the graphs...

After that (and I know that the power of the test isn't good), can I apply the test (parametric or non-parametric) to compare the distributions?

#### noetsi

##### Fortran must die
QQ plots only plot the observed data against a theoretical distribution. They don't calculate a test statistic.

I am not sure what you mean by a lot of rows. Regardless of how many variables you have, the residuals (your observed data) will be a single set of data, so you can use one QQ plot.

#### slavia

##### New Member
"A lot of rows" means a lot of data sets to test for normality. I asked whether it's possible to get a "value" from the QQ plot because I need to test about 20000 rows of data (independently), and it's impossible to look at all the graphs. With a number I can filter, which is why I mentioned shapiro.test and Kolmogorov–Smirnov.

In this case I need to work with one of them (shapiro.test or ks.test). Which is better, or more correct to use?

Thank you

#### noetsi

##### Fortran must die
As far as I know, QQ plots don't generate a statistic or value you can test.

In all the discussions of various normality tests I have not seen authors come down on one as being better. The general comment is that all are weak in terms of statistical power.
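The plot itself is only a graph, but its straightness can be summarised numerically as the correlation between the sorted data and the matching normal quantiles (often called a probability plot correlation coefficient). Below is a minimal standard-library sketch; the function name is hypothetical, and the Blom plotting positions `(i - 0.375) / (n + 0.25)` are an assumed, common choice rather than anything specified in this thread. No cutoff is given here, because a sensible threshold depends on the sample size.

```python
from math import sqrt
from statistics import NormalDist, mean


def qq_correlation(row):
    """Correlation between sorted data and normal quantiles.

    Returns None when every value in the row is identical, because the
    correlation is undefined for data with zero variance.
    """
    n = len(row)
    xs = sorted(row)
    # Blom plotting positions: an assumption, other conventions exist.
    qs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mq = mean(xs), mean(qs)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    sqq = sum((q - mq) ** 2 for q in qs)
    sxq = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    return sxq / sqrt(sxx * sqq)


rows = [[1, 0, 2, 1, 3, 0], [23, 32, 12, 33, 23, 54]]
for row in rows:
    print(row, qq_correlation(row))
```

Values close to 1 correspond to a nearly straight QQ plot. With only 6 points per row the coefficient is very noisy, so it is better suited to ranking or filtering rows than to deciding anything about a single row.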

#### slavia

##### New Member
I know the power is very bad, but is it correct to do a shapiro.test or ks.test? Or are there rules that prevent me from applying them?

#### Englund

##### TS Contributor
> Or are there rules that prevent me from applying them?

For the Shapiro–Wilk test you'll need at least 5 observations, and for Jarque–Bera 8 or more observations are required, at least according to Stata's help section.

#### noetsi

##### Fortran must die
If you accept that the power is weak, then there certainly is no rule against using them. If there were, they would not exist. Anderson–Darling is also used a lot.

#### slavia

##### New Member
> If you accept that the power is weak, then there certainly is no rule against using them. If there were, they would not exist. Anderson–Darling is also used a lot.

I know about the power, but I don't have more data. I'm trying to do the best that is possible with the data that I have...

Thank you very much

#### slavia

##### New Member
Hi again.
And if my data is
Code:
 0   0   0   0   0   1   1   1   1   1  1   1   1   0   0   0  0  1  0  0  0  0  0   1   1   1   1   1   1  1   1   1   1  1   1   1   1   1   1   1   1  1  1  1  1  1  1  1
is it correct to apply a shapiro.test to test the normality of the data?

#### Dason

##### Ambassador to the humans
Here's a hint - if the only values your data can take are 0 and 1 then the data itself is NOT normally distributed...
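When there are thousands of rows, this hint can be applied mechanically before any normality test is run. Here is a minimal sketch, with a hypothetical helper name and a short made-up 0/1 row alongside one row from the first post:

```python
def takes_few_values(row, max_distinct=2):
    """True when the row contains at most `max_distinct` different values."""
    return len(set(row)) <= max_distinct


rows = [
    [0, 0, 1, 1, 1, 0],         # made-up 0/1 row for illustration
    [23, 32, 12, 33, 23, 54],   # row from the first post
]
for row in rows:
    if takes_few_values(row):
        print(row, "-> takes at most two values, so it cannot be normal")
    else:
        print(row, "-> more than two distinct values")
```

Rows flagged this way can be set aside for methods meant for binary or count data instead of being passed to a normality test.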

#### slavia

##### New Member
Hi.

I can have values like

Code:
 0   0   0   0   0   1   2   4   1   1  4   2  1   0   0   0  0  1  44  0  25  0  0   1   1   1   1   1   1  1   1   1   1  1   1   1   1   1   1   1   1  1  1  1  1  1  1  1
Is it appropriate to apply a shapiro.test?