I am trying to determine the standard deviation of data that is not itself numerical, but whose counts are. For example, suppose I have a dataset from a survey that shows which cities people live in.

In the data, 9,000 people live in Miami, 200 in Dallas, and 1 in Toronto. I am trying to represent numerically just how unusual it is to have a survey respondent living in Toronto. In general terms, a Z-score seems appropriate, since it would tell me how many standard deviations away a data point is, but that requires determining the standard deviation first, and that is proving problematic: when I calculate the average, nothing recognizes the heavy skew toward Miami. If I take the 9,201 total people and divide by 3 cities, I get an average of 3,067 people per city, but that number is clearly way off from reality, since so many live in Miami and only 1 lives in Toronto. It's true that 3,067 is the "average" of 3 equally weighted cities, but it doesn't describe this data well. I could instead use percentages and say that the likelihood of someone living in Toronto is 1/9,201 ≈ 0.0109%, but is that the right approach? I like the idea of a standard deviation and a Z-score, since that number makes it easy to separate "normal" from "abnormal".
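To make the arithmetic concrete, here's a quick sketch in plain Python of both approaches I'm weighing (the only data is the three counts from my example):

```python
import math

# City counts from the survey responses
counts = {"Miami": 9000, "Dallas": 200, "Toronto": 1}
values = list(counts.values())
n = len(values)
total = sum(values)                     # 9201 respondents

# Approach 1: treat the three city counts themselves as the data points
mean = total / n                        # 3067.0 "people per city"
variance = sum((v - mean) ** 2 for v in values) / n   # population variance
std = math.sqrt(variance)
z_toronto = (counts["Toronto"] - mean) / std

# Approach 2: just use the proportion of respondents in Toronto
p_toronto = counts["Toronto"] / total   # 1/9201 ≈ 0.0109%

print(mean, std, z_toronto, p_toronto)
```

Notably, the Toronto count comes out well under one standard deviation from the mean here (|z| < 1), which is exactly why the count-based approach feels wrong to me: it doesn't flag Toronto as unusual at all.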

My ultimate goal here is to be able to represent numerically a vast deviation from "normal", because I want to take it a step further: do a similar analysis based on eye color, and another based on age, and then put these together to tell me just how unusual it is to have a person who is 23 years old, with green eyes, and lives in Toronto.
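The "put these together" step I have in mind would look something like the following, where the eye-color and age proportions are made-up placeholders, and multiplying them assumes the three attributes are independent (which may well be false in real data):

```python
# Proportion from my survey example
p_toronto = 1 / 9201

# Made-up proportions purely for illustration
p_green_eyes = 0.02    # hypothetical share of green-eyed people
p_age_23 = 0.015       # hypothetical share of 23-year-olds

# If (and only if) the attributes are independent,
# the joint likelihood is the product of the individual ones
p_combined = p_toronto * p_green_eyes * p_age_23

print(p_combined)      # far smaller than any single attribute's likelihood
```

Is this product-of-proportions idea the right direction, or is there a standard way to turn it back into something z-score-like?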