Choosing a subset of data to represent the total data distribution from given subsets