I have a problem related to statistics, and I hope that you can help me out. First, an attempt at a short version of my problem:

I have 2 data sets, each consisting of 3 sub-sets, and I want to compare them with a method that takes the order of the data points into account.

Now, for the actual details.

I have multiple data sets of 3 sub data sets each, and the 3 sub data sets of a set belong together. For any given set, data point 1 of sub data sets 1, 2, and 3 is based on the exact same measurement; the sub data sets just use different criteria in the data analysis.

As an example, this is how one of my data sets looks:

x      y (sub data set 1)   y (sub data set 2)   y (sub data set 3)
0      0.5                  0.5                  0.5
0.25   0.6706               0.8098               0.7515
0.5    0.654                0.7033               0.6742
0.75   0.6504               0.6766               0.5333
1      0.8377               0.8662               0.736
1.25   0.9792               0.9857               0.9298
1.5    0.9869               0.9952               0.9721

I want to compare this to a different data set that also consists of 3 sub data sets. I basically need a measure of how different the two data sets are, subject to two requirements. First, the order of the data points has to be taken into account: if, for example, the two sets included exactly the same numbers but in a different order (e.g. data point 1 of sub-set 1 of set 1 equaled data point 7 of sub-set 1 of set 2), the method should catch that. Second, sub data set x of one data set should be compared to sub data set x of the other data set.
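To illustrate what "taking the order into account" means here, the following is a minimal Python sketch. The root-mean-square of the pointwise differences is used purely as a placeholder measure (an assumption for illustration, not a method settled on in this post), and the "second" data set is just sub data set 1 in reversed order, i.e. hypothetical rather than measured.

```python
import math

def rms_difference(a, b):
    """Root-mean-square of the pointwise differences a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("both sequences need the same number of data points")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a))

# y-values of sub data set 1 from the example data set.
sub1 = [0.5, 0.6706, 0.654, 0.6504, 0.8377, 0.9792, 0.9869]

# Hypothetical comparison partner: the same numbers in reversed order
# (not real data, only used to show the order sensitivity).
sub1_reversed = sub1[::-1]

print(rms_difference(sub1, sub1))           # same values, same order
print(rms_difference(sub1, sub1_reversed))  # same values, different order
```

A measure that only looked at the set of values (e.g. comparing means or sorted values) would treat both cases as identical, whereas a pointwise measure like this one does not.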

The method I’m currently using is the following: Let the y-values of data point 1 of data set 1 for the 3 different criteria (i.e. the 3 different sub data sets) be denoted as y_C1, y_C2, and y_C3, respectively, and let min=min(y_C1,y_C2,y_C3) and max=max(y_C1,y_C2,y_C3). I then calculate the range within which the y-value of data point 1 of data set 1 lies as (min+max)/2 +/- (max-(min+max)/2), which is the same as (min+max)/2 +/- (max-min)/2. I then compare this range to the range of data set 2 for the same data point.

For the data point at x=1.0, for example, this gives me 0.8011+/-0.0651 as the y-range within which the y-value at x=1.0 lies, independent of the sub data set. However, comparing these ranges is not straightforward, and ideally, I would not have to apply the method to every single data point separately, but could compare whole data sets or sub data sets at once. This approach also doesn’t take the error bars of the data points into account at all.
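For reference, the range calculation described above can be sketched in Python as follows. The function name `midrange_and_halfwidth` and the variable names are made up for this sketch; the numbers are the example data set from this post.

```python
# Example data set: x-values and the y-values of the 3 sub data sets.
x = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5]
sub1 = [0.5, 0.6706, 0.654, 0.6504, 0.8377, 0.9792, 0.9869]
sub2 = [0.5, 0.8098, 0.7033, 0.6766, 0.8662, 0.9857, 0.9952]
sub3 = [0.5, 0.7515, 0.6742, 0.5333, 0.736, 0.9298, 0.9721]

def midrange_and_halfwidth(*subsets):
    """For each data point, return (centre, half_width) across the sub data sets.

    centre     = (min + max) / 2
    half_width = (max - min) / 2
    """
    result = []
    for ys in zip(*subsets):  # ys holds the y-values of one data point
        lo, hi = min(ys), max(ys)
        result.append(((lo + hi) / 2, (hi - lo) / 2))
    return result

ranges = midrange_and_halfwidth(sub1, sub2, sub3)
for xi, (centre, half) in zip(x, ranges):
    print(f"x={xi}: {centre:.4f} +/- {half:.4f}")
```

For x=1 this reproduces the 0.8011 +/- 0.0651 quoted above.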

Does anyone have any suggestions on how to improve upon any aspect of this very basic method?

Thanks in advance and best regards!

P.S.: Just in case this matters, the data I’m looking at resulted from the following process:

I have 29 different cases of something that I measured. I then modified and re-measured each of the 29 cases in 6 different ways. Together with the original measurements, I thereby got 7 different data sets of 29 measurements each.

I then created receiver operating characteristic (ROC) curves for each of these 7 data sets. The data set (also consisting of 29 measurements) to which I compared these other 29 measurements was the same for all 7 data sets. Doing this yielded 7 ROC curves, and thereby 7 area under the curve (AUC) values.
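As a rough illustration of this step, here is a minimal Python sketch of how an AUC value can be obtained from two groups of measurements. It assumes that the ROC curve is built by thresholding the measured value and that larger values indicate the "modified" group; both are assumptions of this sketch, and the numbers are hypothetical toy values, not the real measurements. The function `auc` uses the standard equivalence between the area under the ROC curve and the probability that a value from one group exceeds a value from the other.

```python
def auc(reference, modified):
    """Area under the ROC curve for separating `modified` from `reference`.

    Computed as the fraction of (modified, reference) pairs in which the
    modified value is larger, with ties counted as one half.
    """
    wins = 0.0
    for m in modified:
        for r in reference:
            if m > r:
                wins += 1.0
            elif m == r:
                wins += 0.5
    return wins / (len(reference) * len(modified))

# Hypothetical toy values, NOT the real measurements.
reference = [1.0, 2.0, 3.0, 4.0]
shifted = [3.5, 4.5, 5.5, 6.5]

print(auc(reference, reference))  # two identical groups
print(auc(reference, shifted))    # partially separated groups
```

Two identical groups give an AUC of 0.5, and perfectly separated groups give 1.0.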

I then repeated the whole process for 2 further criteria, so that I had 3 sub data sets (one per criterion) that belong together. Finally, I repeated the whole procedure for another condition, meaning that I ended up with 2 sets of 3 sub data sets of 7 data points each.