# Kolmogorov-Smirnov Two Sample test statistic confusion

#### pring10

##### New Member
Hi,

What is the test statistic for the two-sample Kolmogorov-Smirnov test, which tests whether two distributions are the same? I have been using the Handbook of Parametric and Nonparametric Statistical Procedures, but I have also found a conflicting version of the test statistic online:

1) on Wikipedia: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
2) in this set of lecture notes: http://ocw.mit.edu/courses/mathematics/18-443-statistics-for-applications-fall-2006/lecture-notes/lecture14.pdf

Both sources give a test statistic with an extra factor of sqrt [(n1*n2/(n1+n2))]. Instead of

D_n1n2 = sup|F-G|

where F and G are the empirical cumulative distribution functions of two samples of sizes n1 and n2, they use:

D_n1n2 = sqrt [(n1*n2/(n1+n2))] * sup|F-G|

This is confusing. Is the difference because the two versions refer to different tables, where the values in one table are multiplied by a factor, or is one of them a mistake? I am working with a large sample size, and when I look up the large-sample tables I do see this sqrt factor, but inverted.

Any suggestions/help would be great!

Thanks!

#### Dason

The extra factor is there on wikipedia too if you check out the Two-sample Kolmogorov–Smirnov test section.

#### pring10

##### New Member
@dason: ... i'm confused... i thought that's what my post was pointing out? =/ why is this factor there? is this wrong?

#### Dason

I suppose it's there because it needs to be modified to get an analogous result to Theorem 2 (in the notes you posted) for the two-sample case.

#### pring10

##### New Member
So which is the correct statistic to use?
D_n1n2 = sup|F-G|
or
D_n1n2 = sqrt [(n1*n2/(n1+n2))] * sup|F-G|

#### Dason

Well it depends on what you're doing with it. But for the two-sample case both sources eventually use the second one. So I'd say the second one...
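
To make the two candidates concrete, here is a minimal sketch in plain Python that computes both sup|F-G| and the scaled version from two samples. The function name `ks_two_sample` and the toy data are assumptions made for illustration; they do not come from the handbook, Wikipedia, or the lecture notes.

```python
from bisect import bisect_right
from math import sqrt


def ks_two_sample(x, y):
    """Return (D, scaled_D) for two samples (hypothetical helper).

    D        = sup|F - G| over the two empirical CDFs
    scaled_D = sqrt(n1*n2/(n1+n2)) * D
    """
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    d = 0.0
    # Both ECDFs are right-continuous step functions that only jump at
    # observed values, so the supremum is attained at a data point.
    for v in xs + ys:
        f = bisect_right(xs, v) / n1  # F(v): fraction of x values <= v
        g = bisect_right(ys, v) / n2  # G(v): fraction of y values <= v
        d = max(d, abs(f - g))
    return d, sqrt(n1 * n2 / (n1 + n2)) * d


# Toy data, made up for this example
d, scaled = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])
print(d, scaled)
```

The two numbers differ only by a factor that depends on n1 and n2, so they carry the same information. What matters is which critical value each one is compared against.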

#### pring10

##### New Member
Well, I want to use the KS test on two sets of data to test the null hypothesis that they have the same distribution, and I have large samples (> 25). So for the critical value at, say, alpha = 0.05, do I just use what the tables give:
1.36 * sqrt((n1+n2)/(n1*n2))?
It's kind of important, seeing as I get a completely opposite result if I don't use the factor to calculate the test statistic.

#### Dason

I don't know which table you're looking at so I can't help there. But both sources agree on the procedure. Why are you doing this by hand anyways? Most stats software packages will do this for you.
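
One way to see that the two procedures agree for large samples: comparing the unscaled sup|F-G| against 1.36 * sqrt((n1+n2)/(n1*n2)) is algebraically the same as comparing the scaled statistic against 1.36. A small sketch, assuming the 1.36 coefficient quoted in this thread for alpha = 0.05; the function names and sample sizes are made up for illustration.

```python
from math import sqrt

C_ALPHA_05 = 1.36  # large-sample coefficient for alpha = 0.05, as quoted in the thread


def reject_unscaled(d, n1, n2, c=C_ALPHA_05):
    """Compare the raw sup|F-G| against the table-style critical value."""
    return d > c * sqrt((n1 + n2) / (n1 * n2))


def reject_scaled(d, n1, n2, c=C_ALPHA_05):
    """Compare the scaled statistic against the bare coefficient."""
    return sqrt(n1 * n2 / (n1 + n2)) * d > c


# Hypothetical sample sizes and observed sup|F-G| values
n1, n2 = 30, 40
for d in (0.2, 0.4):
    print(d, reject_unscaled(d, n1, n2), reject_scaled(d, n1, n2))
```

The two functions always return the same decision. A contradictory result would be expected only if the conventions are mixed, for example comparing the scaled statistic against the table value that already contains the inverted square root factor.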

#### pring10

##### New Member
This table has the appropriate critical values and is for the standard 'exact' KS 2 sample test:
http://www.soest.hawaii.edu/wessel/courses/gg313/Critical_KS.pdf
I'm just wondering whether I should use the same test, seeing as I can't find any book that mentions this other test statistic with the square root factor, or any accompanying tables. I looked up the tables that the Wikipedia entry cites, in Biometrika Tables for Statisticians. That reference gives a table of m*n*D_mn values, and these are not the same as the critical values in the table I linked above.

#### pring10

##### New Member
No specific reason I'm doing it by hand - I just happen to be. But it should work and give me the same answer either way.

#### pring10

##### New Member
I think I see what you mean now: they calculate their own separate critical values in the second source. Thanks!