[SPSS] Comparison to homogeneous distribution.


This is my first post here, so please forgive any mistakes.

I am designing the statistics for a project. The aim is to compare samples taken against a perfectly homogeneous population. I have identified both Similarity Factor and Chi-squared as potentially good statistics to utilise.

I have been thinking about the statistics for a little while and have possibly confused myself.

The problem I am having is that the input mix used to produce the output population is not necessarily equal across categories. It could be, for example, 40% A, 30% B, 15% C and 15% D, so the output would contain a greater proportion of A than B, and so on.

Would this mean that I could not utilise Chi-squared? I was going to set the table out roughly like the one attached (formulae not included, and I will be using SPSS rather than Excel). Obs would be the values for each category recorded on sampling. For Exp I am unsure: it could be the input values above, but since a homogeneous distribution is required, should it instead be 25% for each category? That does not seem possible, as the input proportions for each category differ.

If I were then to split the sample down further into sections to test the distributions within it, would a Chi-squared test also be used there? For example, for Sample 1.1, would the Exp values be the input proportions of A, B, C and D, as for a homogeneous sample?

I think where I am getting confused is with the proportion of the categories on input and how that would affect the expected value.

The following is just to check. I think I would be able to, but I would very much appreciate other opinions.

If I were only able to undertake sampling once, but took 12 samples across the population, would I be able to do a Chi-squared analysis on each of the separate samples? If I were able to undertake sampling three times, giving 3 repeats for each of the 12 sampling locations, and the observed counts in each category were close across repeats, would I be able to combine the repeats and then undertake the analysis?

Thank you for your time and answers. I am really sorry that this post is quite confused.

Edit: I think I understand what I have to do for the expected values. The observed value for each category would be a count. These counts are totalled, and the total is multiplied by the corresponding input percentage to give the expected count for that category. So if the input percentage for A is 40% and the observed total was 120 individuals, the expected count for A would be 48. If this is right I have been a complete fool.
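
To check my own arithmetic, here is a minimal sketch in Python (not SPSS) of how I think the expected counts and the Chi-squared statistic would be worked out. The input proportions are the ones from this post; the observed counts are made up purely for illustration, and the function names are my own. It assumes the standard goodness-of-fit statistic, the sum over categories of (Obs - Exp)^2 / Exp.

```python
# Input proportions from the post (40% A, 30% B, 15% C, 15% D).
input_proportions = {"A": 0.40, "B": 0.30, "C": 0.15, "D": 0.15}

# Hypothetical observed counts, invented for illustration; they total 120.
observed = {"A": 50, "B": 35, "C": 20, "D": 15}


def expected_counts(proportions, total):
    """Expected count per category = input proportion * observed total."""
    return {category: p * total for category, p in proportions.items()}


def chi_squared_statistic(obs, exp):
    """Goodness-of-fit statistic: sum of (Obs - Exp)^2 / Exp over categories."""
    return sum((obs[c] - exp[c]) ** 2 / exp[c] for c in exp)


total = sum(observed.values())
expected = expected_counts(input_proportions, total)
statistic = chi_squared_statistic(observed, expected)

# Degrees of freedom for a goodness-of-fit test = number of categories - 1.
degrees_of_freedom = len(observed) - 1

# Tabulated critical value of Chi-squared for 3 degrees of freedom at alpha = 0.05.
CRITICAL_VALUE_DF3_ALPHA_05 = 7.815

print("Expected counts:", expected)
print("Chi-squared statistic:", round(statistic, 4))
print("Degrees of freedom:", degrees_of_freedom)
print("Exceeds 5% critical value:", statistic > CRITICAL_VALUE_DF3_ALPHA_05)
```

If I have this right, the same calculation could be repeated for each sub-sample (e.g. Sample 1.1) with that sub-sample's own total, which is the part I would still like confirmed.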