# Compare Samples

#### Dik

##### New Member
I’m not a statistician, but I have a problem that might have a statistical solution.

The problem is actually related to other data, but the best example I can provide is as follows:

I have two bins with six different types of veggies. They could be carrots, potatoes, turnips, beets, cauliflower, and broccoli.

Bin01 has 10 carrots, 12 potatoes, 4 turnips, 8 beets, 16 cauliflowers, and 9 broccoli.

Bin02 has 12 carrots, 8 potatoes, 5 turnips, 9 beets, 12 cauliflower, and 7 broccoli.

- Is there a way to determine whether Bin01 is similar to Bin02?
- What happens if Bin02 has 0 turnips? I assume the order of items must remain the same.
- Can the degree of certainty (I don’t know if that’s the correct term) that they are similar or different be determined?
- Is any other information required, or can any other information be determined?
- Can the degree of certainty include a component that reflects the small number of each veggie in the bin?
- If I have 10 bins, can the same method be used to compare all 10?

Can someone outline a methodology to undertake this analysis, if it is possible? Any help would be appreciated.

Thanks, Dik

#### obh

##### Active Member
Hi,

As I understand the question, it is not strictly a statistical question, but you can find a statistical test that does a similar comparison.
I can suggest the chi-square goodness-of-fit test.
Keep in mind, though, that your problem is not really a chi-square test situation (if I understand your question correctly...).

Also, in the chi-square goodness-of-fit test the totals must be equal in both groups (or you rescale one group using a ratio).
Do you want to compare the absolute totals or the ratios?
For example, is (10 carrots, 12 potatoes, 4 turnips, 8 beets) similar to (100 carrots, 120 potatoes, 40 turnips, 80 beets)?

But if you tell me the goal of your comparison, I may be able to think of a better approach.
What is the goal of the comparison?

http://www.statskingdom.com/310GoodnessChi.html
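For comparing whole bins with each other, a closely related test is the chi-square test of homogeneity (a contingency-table test, a swapped-in variant of the goodness-of-fit calculator linked above). It compares the category proportions of each bin, so the bins do not need equal totals. Below is a minimal sketch in standard-library Python, applied to the two bins from the first post; the names `BINS`, `CATEGORIES`, `CRITICAL_05`, and `chi_square_homogeneity` are illustrative, and the critical values are standard 5% table values.

```python
# Rows are bins; columns are categories in the same fixed order for every bin.
CATEGORIES = ["carrots", "potatoes", "turnips", "beets", "cauliflower", "broccoli"]
BINS = {
    "Bin01": [10, 12, 4, 8, 16, 9],
    "Bin02": [12, 8, 5, 9, 12, 7],
}

# 5% critical values of the chi-square distribution by degrees of freedom
# (standard table values; bigger tables need other values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_homogeneity(table):
    """Pearson chi-square test of homogeneity for an r x c table of counts.

    Every row must list the same categories in the same order. A category
    that is zero in every row must be dropped first, because its expected
    count would be zero. Returns (statistic, degrees of freedom, expected).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count if every bin had the same category proportions.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


table = list(BINS.values())
stat, dof, expected = chi_square_homogeneity(table)
print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
if dof in CRITICAL_05:
    verdict = "differ" if stat > CRITICAL_05[dof] else "show no clear difference"
    print(f"At the 5% level the bins {verdict} (critical value {CRITICAL_05[dof]})")

# The chi-square approximation is shaky when expected counts are small;
# a common rule of thumb is to be cautious when any expected count is below 5.
for name, exp_row in zip(BINS, expected):
    for category, e in zip(CATEGORIES, exp_row):
        if e < 5:
            print(f"warning: expected {category} in {name} is only {e:.2f}")
```

For these two bins the statistic is about 1.66 with 5 degrees of freedom, well under 11.07, and the turnip cells get a small-count warning. A bin with 0 turnips still works as long as turnips are not zero in every bin. Comparing 10 bins is the same call with 10 rows, giving (10 - 1) * (6 - 1) = 45 degrees of freedom; that critical value is not in the small table, so look it up or use `scipy.stats.chi2_contingency`, which computes the same statistic along with a p-value.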

#### Dik

##### New Member
Our professional association has a program called 'professional development' ('pro-dev'). Each year, members have to spend time showing the association that they are maintaining a professional level. Across all members, this adds up to thousands of hours.

In the event of non-professional conduct, the association has a few options, ranging from a reprimand to expulsion depending on the degree of non-conformance. These options are the 'different veggies' in my example, and the bins are the respective years. I think the total number of misconduct items would affect the comparison, so the data probably cannot be 'scaled' linearly.

The program has been in place for six or eight years. The professional practice group will have records for these years as well as for decades prior to engaging in pro-dev.

I was looking for a method to compare professional performance in the years before the pro-dev program was implemented with performance since it was implemented. The objective is to see whether implementing the program has produced a measurable improvement in the profession.

If there is no measurable improvement, then thousands of hours are being wasted each year.

An interesting link... I'll look at it later tonight.

Thanks, Dik


#### obh

##### Active Member
Hi Dik,

Why are you interested in the relations between the 'different veggies' rather than in the ratio between the total number of problematic cases (veggies) and the total number of members?

You might consider giving a different weight to each "veggie", for example:

- reprimand: 0.1
- xxxx1: 0.3
- xxxx2: 0.6
- expulsion: 1

This gives you one mark for each year.

Total mark example, with 1222 members: `100 * (1222 - (10*0.1 + 22*0.3 + 11*0.6 + 4*1)) / 1222` ≈ 98.51
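A minimal Python sketch of the weighted yearly mark, assuming the example weights and counts from this post and 1222 total members. `WEIGHTS`, `weighted_mark`, and the placeholder sanction names `xxxx1`/`xxxx2` are illustrative, not an established scoring scheme.

```python
# Hypothetical weights per sanction, taken from the example above;
# "xxxx1" and "xxxx2" stand in for the intermediate sanctions.
WEIGHTS = {"reprimand": 0.1, "xxxx1": 0.3, "xxxx2": 0.6, "expulsion": 1.0}


def weighted_mark(counts, members, weights=WEIGHTS):
    """Return 100 * (members - weighted sanctions) / members for one year."""
    penalty = sum(weights[kind] * n for kind, n in counts.items())
    return 100 * (members - penalty) / members


year_counts = {"reprimand": 10, "xxxx1": 22, "xxxx2": 11, "expulsion": 4}
print(f"{weighted_mark(year_counts, members=1222):.2f}")  # 98.51
```

Computing this for every year before and after pro-dev gives one number per year to compare; the choice of weights remains a judgment call.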

#### Dik

##### New Member

Thanks for the comment. I had thought of weighting the values by the severity of the sanction. Because the number of occurrences is likely to vary, a couple of serious occurrences could outweigh several minor ones. I wanted to determine whether the effect of pro-dev is actually measurable by comparing the results before pro-dev with those since.

Pro-dev is becoming more common in some professions, and I was wondering if it was possible to show whether it is an improvement or a waste of time.

Dik

#### Dik

##### New Member

Neat link, I wasn't aware that this could be done, thanks. Is there a test that can be used to compare multiple items in a sample with multiple items in another sample?

Dik

#### obh

##### Active Member
Hi Dik,

You choose the weights. You can use less dramatic weight differences between the "veggies", but then the mark may not describe reality correctly.

Another option is to add a second measurement, such as the percentage of "professional people": `100 * (1222 - (10 + 22 + 11 + 4)) / 1222` ≈ 96.15
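A minimal sketch of this unweighted measure, assuming each member receives at most one sanction per year (otherwise the sum counts sanctions rather than people). `professional_percentage` is a hypothetical helper name, and the counts are the example figures from this thread.

```python
def professional_percentage(counts, members):
    """Percentage of members with no sanction, ignoring severity.

    Assumes each member appears in at most one sanction count per year.
    """
    return 100 * (members - sum(counts.values())) / members


year_counts = {"reprimand": 10, "xxxx1": 22, "xxxx2": 11, "expulsion": 4}
print(f"{professional_percentage(year_counts, members=1222):.2f}")  # 96.15
```

Reporting this figure next to a severity-weighted mark for each year shows both how many members were sanctioned and how serious the sanctions were.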

The combination of the two measurements will give you a better picture.

Obh

#### obh

##### Active Member
The test compares multiple groups (just press insert row ...).
You should also make sure the totals are the same. For example, to compare observed (10,20,30) with expected (60,30,30), change the expected counts from (60,30,30) to (30,15,15): 60/120 * 60 = 30, 30/120 * 60 = 15, ...
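A minimal Python sketch of that rescaling step, followed by the Pearson goodness-of-fit statistic for the (10,20,30) versus (60,30,30) example; `rescale_expected` and `chi_square_gof` are illustrative names, not part of the linked calculator.

```python
def rescale_expected(observed, expected):
    """Scale expected counts so they sum to the same total as observed."""
    factor = sum(observed) / sum(expected)
    return [e * factor for e in expected]


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic after rescaling the expected counts."""
    scaled = rescale_expected(observed, expected)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, scaled))
    return stat, scaled


observed = [10, 20, 30]
expected = [60, 30, 30]
stat, scaled = chi_square_gof(observed, expected)
print(scaled)  # [30.0, 15.0, 15.0]
print(f"chi-square = {stat:.2f}, df = {len(observed) - 1}")  # 30.00, df = 2
```

With three categories and expected proportions fixed in advance there are 2 degrees of freedom, and the 5% critical value is 5.991, so a statistic of 30 indicates a clear difference between the two distributions.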

Please note that in your case you are not using samples but the full population ...

If, for example, you conclude that (10,20,30) differs from (60,30,30), what will you do with that knowledge?

#### Dik

##### New Member
I'm currently going through a bunch of papers... I will check whether 'normalising' the data helps. Intuitively, I'm not sure it will, but I'll keep an open mind. I have to 'wrap my ears' around the methodology to understand it.

Thanks Dik