Is there any method to determine the size of the subgroup used in the Kolmogorov-Smirnov test?

#1
Hi everybody,

I have a data set of 5000 samples. I want to apply the K-S test to subgroups of this data set to check whether two consecutive subgroups are drawn from the same population. I am confused about the size of the subgroup to be used. Does anyone know how to determine/estimate that size?

Regards.
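
For readers who want something concrete to experiment with, the procedure described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual method: the function names (ks_statistic, critical_distance, compare_consecutive) are made up for this example, the data are assumed to be a plain list of numbers in their original order, and the rejection threshold uses the usual large-sample approximation D > c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-ln(alpha / 2) / 2). No correction for running many tests in a row is applied.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate both of them at every data point.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / n
        fb = bisect_right(b_sorted, x) / m
        d = max(d, abs(fa - fb))
    return d


def critical_distance(n, m, alpha=0.05):
    """Large-sample approximation of the rejection threshold for D."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


def compare_consecutive(data, size, alpha=0.05):
    """Split data into consecutive full chunks of `size` and test neighbours.

    Returns one (D, rejected) pair per pair of adjacent chunks.
    A trailing partial chunk is dropped (an assumption of this sketch).
    """
    chunks = [data[i:i + size] for i in range(0, len(data) - size + 1, size)]
    results = []
    for left, right in zip(chunks, chunks[1:]):
        d = ks_statistic(left, right)
        results.append((d, d > critical_distance(len(left), len(right), alpha)))
    return results


if __name__ == "__main__":
    random.seed(0)  # hypothetical data: 5000 draws from a standard normal
    data = [random.gauss(0.0, 1.0) for _ in range(5000)]
    for size in (50, 250, 1000):
        results = compare_consecutive(data, size)
        rejected = sum(1 for _, flag in results if flag)
        print(size, round(critical_distance(size, size), 3), rejected, len(results))
```

With equal subgroup sizes n, the threshold in this approximation is about 1.358 * sqrt(2 / n), so it shrinks as the subgroups grow: roughly 0.27 for n = 50 and 0.12 for n = 250. That is the trade-off behind the question: small subgroups can only flag large differences, while large subgroups leave fewer consecutive pairs to compare. SciPy's scipy.stats.ks_2samp could be used in place of the hand-written statistic if p-values are wanted.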
 
#2
That's an interesting question. I think it is basically akin to asking how many clusters there should be in a clustering analysis. In a parametric (finite mixture) model this is 'ICL'; I can't recall how it works, but you have the internet. To me, the problem is much better defined if you parameterize what the distribution in each subgroup is, e.g. normal. A fully non-parametric approach, as implied by the K-S test, leads to weird issues about how the clusters are determined, in my view, although I'm sure some clustering experts will have some ideas.
 

Karabiner

TS Contributor
#3
I have a data set of 5000 samples. I want to apply the K-S test to subgroups of this data set to check whether two consecutive subgroups are drawn from the same population. I am confused about the size of the subgroup to be used. Does anyone know how to determine/estimate that size?
Could you tell us what these data represent and how they were collected?
In addition, could you tell us in detail how the data were measured and on which scale they are?
And why do you need to know whether consecutive subgroups are from the same population?
I.e. what is the research question, and what is the theoretical or practical impact of the answer to it?

With kind regards

Karabiner