How to compare population distributions?

abhishekkar

I have census data for three cities (i.e. population data, not sample data). I can plot a histogram (discrete data) of age for each city's population. I want to run a statistical test to check whether the differences between the three population distributions, and between their mean values, are statistically significant.

With sample data, I would have used ANOVA and, if it was significant, followed up with a post-hoc test. What package should I use to test for differences between population distributions?

obh

Hi Abhishek,

ANOVA assumes normality (though it is not very sensitive to non-normality if the distributions are symmetric).
I think the non-parametric equivalent of ANOVA, which checks whether the data came from the same distribution, is the Kruskal–Wallis test.
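If the ages were a sample rather than a full census, the Kruskal–Wallis test works on ranks instead of raw values. Below is a minimal pure-Python sketch of its tie-corrected H statistic, assuming exactly three independent groups so that the chi-square approximation with k − 1 = 2 degrees of freedom has the closed-form tail probability exp(−H/2). The function names and example ages are hypothetical; in practice `scipy.stats.kruskal` is a tested implementation that handles any number of groups.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (3 groups only)."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes exactly 3 groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum)^2 / group size over the groups.
    rank_term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups of size t.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)

    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    p = math.exp(-h / 2)
    return h, p


# Hypothetical ages drawn from three cities (a sample, not a census).
h, p = kruskal_wallis([23, 35, 41, 52], [19, 28, 33, 30], [45, 61, 38, 57])
print(f"H = {h:.3f}, p = {p:.4f}")
```

For example, `kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])` gives H ≈ 7.2 and p ≈ 0.027. A p-value like this only has a meaning when the ages are a sample from a larger population.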

GretaGarbo

If you have the total population, then you don’t need to do any statistical test at all. You already have the population numbers. If the relative frequency of 31–35-year-olds is larger in one city than in another city, then you know that it is larger.
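With full-census counts, the share of residents in an age band is computed directly, and comparing cities is plain subtraction. A minimal sketch, assuming the census is available as a hypothetical mapping from single-year age to number of residents; the city names and counts are invented for illustration.

```python
def band_share(counts, lo, hi):
    """Fraction of a city's residents whose age lies in [lo, hi], inclusive."""
    total = sum(counts.values())
    in_band = sum(n for age, n in counts.items() if lo <= age <= hi)
    return in_band / total


# Hypothetical census counts (age -> residents); a real table would list every age.
census = {
    "City A": {25: 200, 33: 300, 40: 500},
    "City B": {25: 400, 31: 100, 35: 100, 50: 400},
}

shares = {city: band_share(counts, 31, 35) for city, counts in census.items()}
for city, share in shares.items():
    print(f"{city}: {share:.1%} of residents are aged 31-35")

# These are population values, so the difference is exact: no test, no p-value.
diff_points = (shares["City A"] - shares["City B"]) * 100
print(f"City A exceeds City B by {diff_points:.1f} percentage points")
```

Whether a gap of that size matters is a substantive judgement rather than a statistical one.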

Statistical significance applies when you have a sample drawn from a population, or two samples drawn from two populations, and you want to infer whether the population means or relative frequencies differ between the populations.

Another question is what is meant by “large”. If the relative frequency of 31–35-year-olds is 4% larger in one city, then it is up to your judgement whether that is large.

hlsmith

I agree with @GretaGarbo: the work is done. As she mentioned, you could compare age ranges and report the differences, but there is no sampling uncertainty in the comparisons, since you have all of the data. That is what removes the need for statistical testing and for measures of precision or uncertainty in your results.
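The same reasoning covers the mean age asked about in the original question: with every resident counted, each city's mean age is a population parameter that can be calculated exactly, and the pairwise differences are simply reported. A minimal sketch, assuming hypothetical toy counts (age -> residents) for three cities; no standard errors or confidence intervals are computed because there is no sampling uncertainty to quantify.

```python
from itertools import combinations


def population_mean_age(counts):
    """Exact mean age of a population given age -> number-of-residents counts."""
    total = sum(counts.values())
    return sum(age * n for age, n in counts.items()) / total


# Hypothetical toy census counts for three cities (age -> residents).
census = {
    "City A": {20: 1, 30: 2, 40: 1},
    "City B": {30: 1, 50: 1},
    "City C": {10: 3, 70: 1},
}

means = {city: population_mean_age(counts) for city, counts in census.items()}

# Report every pairwise difference; these are exact values, not estimates.
differences = {}
for a, b in combinations(census, 2):
    differences[(a, b)] = means[a] - means[b]
    print(f"{a} minus {b}: {differences[(a, b)]:+.1f} years")
```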

@obh, I think you misunderstood the OP's question.

Karabiner

There was no explanation of why this study is being carried out, or of what the research goal and research questions are.
There are circumstances in which it can make perfect sense to use inferential statistics even if the current samples are the current full populations.

With kind regards

Karabiner