# Comparison of a proportion to a population confidence interval

#### mjs549

##### New Member
I am comparing the % of minorities from my organization to a population % of minorities, to see if it is high or low. I have data for my whole organization (not a sample) so I do not show CIs. The “population” data is a sample and thus has CIs. For example, see the below (hypothetical) data.

Question #1: If my organization value is within the population CI range, can I say it is “consistent” with the population data?
Question #2: If my organization value is outside the population CI range, is it correct to say there is a statistically significant difference, or should I just say it is “higher” or “lower” compared to the population?

#### Karabiner

##### TS Contributor
How large is the population sample, and how large is your organization?

#### mjs549

##### New Member
The population data had about 1500 survey respondents, but with weights applied the size was over 200,000. In my organization I have different divisions that will each be compared separately to the population values. The divisions typically have 100-1,000 people, although some are smaller, with around 20 people. (Maybe I should only compare larger divisions.)

#### Karabiner

##### TS Contributor
I do not know how "with weights applied" increases a sample size of 1500 by a factor of more
than 100. How does that happen?

You could consider creating 2x2 tables with the factors "population vs. division" and
"minority vs. non-minority". You can a) perform a statistical test, which deals with the
question whether the populations from which the 2 samples are drawn have exactly
the same proportion of minority members (Chi² test), and b) calculate a 90% or 95%
confidence interval for the difference between proportions, to get an idea of how
variable the differences can be, due to chance (sampling error).
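
For illustration, the 2x2 test in a) could be sketched in Python roughly as follows. This is a minimal sketch using only the standard library; the function name `chi2_2x2` and all counts are hypothetical, and it works on raw unweighted counts, so it does not account for survey weights.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: (minority, non-minority) in each group.
population = (675, 825)  # 45% of 1500 survey respondents (assumed)
division = (140, 60)     # 70% of a 200-person division (assumed)

stat, p = chi2_2x2(*population, *division)
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

For a very small division (such as the 20-person case), expected cell counts can get low, and an exact test may be a safer choice than the chi-square approximation.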

With kind regards

Karabiner

#### mjs549

##### New Member
Thank you! A couple of follow-up questions:
1. Would it be a terrible sin not to calculate statistical significance, but just to make comparisons as "higher" or "lower"?
2. I understand the chi-square test, but it seems that to carry it out I would need to combine the two datasets into one. However, the "population data" is based on a sample and thus has weights, so half the data would need to be weighted and the other half not, which seems confusing. I'd be worried about calculating the statistical significance correctly.

#### Karabiner

##### TS Contributor
The significance test is just an option. Essentially, a 95% confidence interval around the difference
supplies the same information - if it contains zero, then one can maintain doubt whether the
difference between samples might be totally due to chance (sampling error). But as mentioned before,
I do not understand that weighting thing. One cannot use 1500 observations and make a sample
size of 200,000 out of it, at least as far as I know.
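
A confidence interval around the difference could be sketched as below. This is only a rough illustration: it uses the simple Wald (normal-approximation) interval, treats both groups as simple random samples, and ignores survey weights. The function name `diff_ci` and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def diff_ci(x1, n1, x2, n2, level=0.95):
    """Wald confidence interval for p1 - p2, given x successes out of n in each group.
    Returns (difference, lower_bound, upper_bound)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return diff, diff - z * se, diff + z * se

# Hypothetical counts: 140 of 200 in a division vs. 675 of 1500 survey respondents.
diff, lo, hi = diff_ci(140, 200, 675, 1500)
contains_zero = lo <= 0 <= hi
print(f"difference = {diff:.3f}, 95% CI = ({lo:.3f}, {hi:.3f}), contains zero: {contains_zero}")
```

If `contains_zero` is true, chance alone remains a plausible explanation for the observed difference.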

With kind regards

Karabiner

#### mjs549

##### New Member
Thank you! I looked at the survey file and the weights assigned to each individual are large in some cases, e.g. up to 7,000. For example, the below study gives weighted counts from an American Community Survey - “The study sample obtained from the 2019 ACS comprised a weighted total count of 148 358 252 individuals aged 20 to 65 years” when the number of people actually completing the survey would be much less.

https://jamanetwork.com/journals/jamanetworkopen/article-abstract/2777977

Is there anything inherently wrong with analyzing the data in a descriptive manner? For example, the % minorities in the population ranges from 40-50% (confidence interval) and our organization has 70%, thus we have a higher percentage of minorities. Or, the population ranges from 15-25% and we have 22%, so we are within the range one would expect.
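
The comparison rule described in this post could be written down as the small sketch below. It only encodes the rule as stated, using the two examples given, and does not by itself show that the rule is an adequate analysis. The function name `compare_to_interval` is hypothetical.

```python
def compare_to_interval(org_pct, ci_low, ci_high):
    """Report where a fixed organization value falls relative to a population interval."""
    if org_pct < ci_low:
        return "lower"
    if org_pct > ci_high:
        return "higher"
    return "within"

# The two examples from the post: 70% vs. 40-50%, and 22% vs. 15-25%.
first = compare_to_interval(70, 40, 50)
second = compare_to_interval(22, 15, 25)
print(first, second)
```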

#### Karabiner

##### TS Contributor
> Thank you! I looked at the survey file and the weights assigned to each individual are large in some cases, e.g. up to 7,000.

I suppose that you had better use the original data then, but I am not sure.

> Is there anything inherently wrong with analyzing the data in a descriptive manner? For example, the % minorities in the population ranges from 40-50% (confidence interval) and our organization has 70%, thus we have a higher percentage of minorities. Or, the population ranges from 15-25% and we have 22%, so we are within the range one would expect.

Confidence intervals are not descriptive statistics, but inferential statistics.
And a confidence interval means something different from "the true population
value is with 90% confidence between A% and B%".

I am not sure what's wrong with "the difference between the department and the population is D%,
the 95% confidence interval for this difference is between A% and B% and does not contain zero."

With kind regards

Karabiner

#### mjs549

##### New Member
Thank you for your help! Much appreciated.