# Which method is best for assigning samples to groups?

#### xxqtony

##### New Member
I have 100 samples, each of which has a real value of 1, 2, 3, or 4 for a characteristic (say C). I'll develop a few methods (say X, Y, Z) to measure C.

I generated data like this, hypothetically:
sampleName, expected_value, observed_value_by_X, observed_value_by_Y, observed_value_by_Z
S1, 1, 0.5, 1.1, 3.3
S2, 1, 1.3, 0.9, 0.7
S3, 2, 1.8, 2.2, 5.2
S4, 2, 2.8, 1.9, 1.1
S5, 2, 2.2, 2.0, 2.9
S6, 3, 2.9, 3.1, 6.0
S7, 3, 1.1, 2.9, 5.9
S8, 4, ......

Ultimately I need to develop a method that can assign samples to groups (with expected values of 1, 2, 3, or 4) based on their observed values. Of course, the more samples assigned correctly, the better.

I need to know which of X, Y, and Z performs best. From the hypothetical data above, method Y would be the best, since its observed values are closest to the expected ones. In reality, however, all methods give me values from 0.5 to 3.5 for an expected value of 1, 0.9 to 4.2 for an expected value of 2, 1.2 to 6.1 for 3, and 3.1 to 12 for 4.
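One way to make "the more samples assigned correctly, the better" concrete is to score each method by the fraction of samples it puts in the right group. The sketch below uses only the hypothetical rows S1 to S7 from above (S8 is elided, so it is left out) and assumes a simple nearest-value rule, which is an illustration and not a method stated in this thread. The names `dat`, `assign_nearest`, and `accuracy` are made up for the example.

```r
# Hypothetical rows S1-S7 copied from the post; S8 is elided there and omitted here.
dat <- data.frame(
  sample   = paste0("S", 1:7),
  expected = c(1, 1, 2, 2, 2, 3, 3),
  X = c(0.5, 1.3, 1.8, 2.8, 2.2, 2.9, 1.1),
  Y = c(1.1, 0.9, 2.2, 1.9, 2.0, 3.1, 2.9),
  Z = c(3.3, 0.7, 5.2, 1.1, 2.9, 6.0, 5.9)
)

# Assumed rule (not from the thread): assign each observed value to the
# nearest of the four possible group values.
assign_nearest <- function(obs, groups = 1:4) {
  groups[sapply(obs, function(v) which.min(abs(v - groups)))]
}

# Fraction of samples assigned to their expected group, per method.
accuracy <- sapply(dat[c("X", "Y", "Z")], function(obs) {
  mean(assign_nearest(obs) == dat$expected)
})
print(accuracy)
```

On these seven rows the rule assigns all seven samples correctly for Y, five for X, and one for Z. With the wide, overlapping ranges described for the real data, a fixed nearest-value rule would probably do much worse, so the group boundaries would need to be estimated from the data instead.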

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Do the 3 predictors perform differently across the 1, 2, 3, 4 groupings? For example, X does well with 3 and 4, Y does well with 2 and 3, etc.?

Have you looked at the mean differences for the 3 methods, i.e., take each observed value, subtract the real value, and find the means?
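A minimal sketch of that check in R, using the hypothetical rows S1 to S7 from the first post (S8 is elided there, so it is omitted). The names `dat`, `err`, `bias`, and `mae` are made up for illustration. The mean absolute difference is added alongside the plain mean as an assumption, because signed differences can cancel each other out.

```r
# Hypothetical rows S1-S7 from the first post; S8 is elided there and omitted here.
dat <- data.frame(
  expected = c(1, 1, 2, 2, 2, 3, 3),
  X = c(0.5, 1.3, 1.8, 2.8, 2.2, 2.9, 1.1),
  Y = c(1.1, 0.9, 2.2, 1.9, 2.0, 3.1, 2.9),
  Z = c(3.3, 0.7, 5.2, 1.1, 2.9, 6.0, 5.9)
)

# Observed minus real value, one column per method.
err <- sapply(dat[c("X", "Y", "Z")], function(obs) obs - dat$expected)

bias <- colMeans(err)       # mean signed difference per method
mae  <- colMeans(abs(err))  # mean absolute difference per method
print(round(rbind(bias, mae), 3))

# The same mean differences, split by expected group, to see whether a
# method does better in some groups than in others.
by_group <- aggregate(as.data.frame(err),
                      by = list(expected = dat$expected), FUN = mean)
print(by_group)
```

On these rows Y has the smallest mean absolute difference (0.1), which matches the impression that Y tracks the expected values most closely.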

#### xxqtony

##### New Member
Data summaries for two of the methods are shown below.

Method X:
> summary(r1$V2) #--- group for expected value = 1
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.019 1.694 2.095 2.110 2.451 3.435
> summary(r2$V2) #--- group for expected value = 2
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.892 3.235 3.865 4.215 4.917 9.557
> summary(r3$V2) #--- group for expected value = 3
Min. 1st Qu. Median Mean 3rd Qu. Max.
5.122 6.438 7.031 7.491 8.155 10.760
> summary(r4$V2) #--- group for expected value = 4
Min. 1st Qu. Median Mean 3rd Qu. Max.
6.732 7.269 8.644 8.975 10.110 12.370

Method Y:
> summary(r1$V2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7025 1.2420 1.2720 1.6030 1.8330 3.1200
> summary(r2$V2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.210 2.344 3.181 3.633 4.895 10.270
> summary(r3$V2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.574 3.784 4.801 5.182 6.380 11.590
> summary(r4$V2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
5.232 6.876 7.547 8.473 8.701 15.580
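One rough way to read these summaries is to check whether the interquartile ranges of neighbouring groups overlap, since overlap in the middle half of the data suggests many samples would be hard to tell apart. The sketch below copies the 1st Qu. and 3rd Qu. figures from the output above. It assumes that `r1` to `r4` under Method Y correspond to expected values 1 to 4, as they are labelled for Method X. The names `q_x`, `q_y`, and `iqr_overlap` are hypothetical.

```r
# Quartiles copied from the summary() output above; rows are expected groups 1-4.
# Assumption: r1..r4 under Method Y are in the same group order as under Method X.
q_x <- data.frame(group = 1:4,
                  q1 = c(1.694, 3.235, 6.438, 7.269),
                  q3 = c(2.451, 4.917, 8.155, 10.110))
q_y <- data.frame(group = 1:4,
                  q1 = c(1.2420, 2.344, 3.784, 6.876),
                  q3 = c(1.8330, 4.895, 6.380, 8.701))

# TRUE where the 3rd quartile of group g lies above the 1st quartile of
# group g + 1, i.e. the two interquartile ranges overlap.
iqr_overlap <- function(q) {
  n <- nrow(q)
  data.frame(pair    = paste(q$group[-n], q$group[-1], sep = "-"),
             overlap = q$q3[-n] > q$q1[-1])
}

print(iqr_overlap(q_x))
print(iqr_overlap(q_y))
```

By this check the interquartile ranges overlap for groups 3 and 4 under Method X and for groups 2 and 3 under Method Y. It is only a coarse screen: the minima and maxima above show that the full ranges overlap much more than the quartiles do.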
