Newbie here, so bear with me...

I am looking at a problem that can be stated as follows: I need to select a small list of cities (tens) that are 'representative' of the full population, which is a much larger list of cities (10,000+). Each city has individuals (people) for whom we have the following attributes: gender, age, ethnicity, income. Assume we have complete information: these are not samples of a population, we have the full population.

I spent the last couple of days searching the internet to see how this problem could be tackled. The closest thing I found was the topic of sampling methods, but none of the textbook methods cover selecting representative predefined clusters from a list of clusters; they select representative individuals (stratified sampling).

I would appreciate it if you could point me in the right direction in terms of topics or known methods so that I can pursue my research.

Note that the notion of 'representative' seems quite subjective to me; no mathematical definition was given. Currently, the method I have in mind is based on measuring, for each attribute, the difference between the histogram of each city and the histogram of the full population.
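To make that idea concrete, here is a minimal sketch of what I have in mind. Everything in it is my own assumption rather than an established method: the data layout (a dict mapping city name to a list of per-person dicts), the attribute names, the binning of age and income into bands beforehand, and the choice of total variation distance as the "difference between histograms", averaged with equal weight over attributes.

```python
from collections import Counter

# Hypothetical attribute names; age and income are assumed to be pre-binned.
ATTRIBUTES = ("gender", "age_band", "ethnicity", "income_band")


def distribution(people, attribute):
    """Normalised histogram (category -> share) of one attribute.

    Assumes `people` is non-empty; an empty city would divide by zero.
    """
    counts = Counter(person[attribute] for person in people)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two histograms (0 = identical, 1 = disjoint)."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def city_score(city_people, population, attributes=ATTRIBUTES):
    """Mean per-attribute distance between one city and the full population."""
    distances = [
        total_variation(distribution(city_people, a), distribution(population, a))
        for a in attributes
    ]
    return sum(distances) / len(distances)


def rank_cities(cities, attributes=ATTRIBUTES):
    """Return (city, score) pairs sorted from most to least representative."""
    # The full population is simply every person in every city.
    population = [person for people in cities.values() for person in people]
    scores = {
        name: city_score(people, population, attributes)
        for name, people in cities.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

Taking the first ten or so entries of `rank_cities(cities)` would give the small list. Note that this scores each city on its own and only compares one attribute at a time, so it ignores both joint distributions (e.g. income by age) and whether the selected cities taken together match the population.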

Any help is appreciated.

Thanks in advance.