Hi!

I'm not sure if I'm posting in the right place, but maybe you can help, and perhaps some probability can be brought into the discussion of the following problem:


I have N sensor measurements (N=5000000) and a random sample of size s (s=20) drawn from this data set. For each measurement a rank is computed as the minimum distance to the sample values, so in the end we have 5000000 ranks.
The problem is how to compute the ranks when 100 new measurements arrive, so that we then have 5000100 data objects. We could draw a new random sample s1 and repeat all computations for all 5000100 measurements, but I want to avoid this since it is computationally expensive.
So I was thinking of keeping the already generated sample s and the already computed 5000000 ranks, and computing ranks incrementally only for the 100 new observations.
Is there any way to assess the error that would be introduced if we computed the ranks of the 100 new values with respect to the old sample s?
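
To make the computation concrete, here is a small Python sketch of what I mean (the function names rank, rank_all, and rank_new are just mine, for illustration): a measurement's rank is its minimum absolute distance to any of the sample values, and in the incremental variant only the new measurements are ranked, against the old sample.

    def rank(value, sample):
        # rank = minimum absolute distance from the value to any of the sample values
        return min(abs(value - s) for s in sample)

    def rank_all(measurements, sample):
        # full recomputation: rank every measurement against the same sample
        return [rank(x, sample) for x in measurements]

    def rank_new(new_measurements, old_sample):
        # incremental step: rank only the new measurements against the OLD sample,
        # leaving the previously computed ranks untouched
        return [rank(x, old_sample) for x in new_measurements]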

Example

N=10 [2 7 10 2 6 9 11 3 15 8]
s=3 [10 7 2]

rank[2]=0
rank[7]=0
rank[10]=0
rank[2]=0
rank[6]=1
rank[9]=1
rank[11]=1
rank[3]=1
rank[15]=5
rank[8]=1

Suppose we have 3 new observations [5 19 4]; the new set of observations would be

[2 7 10 2 6 9 11 3 15 8 5 19 4]

It is not correct to consider [10 7 2] a random sample of this set, since the newly introduced elements are given no chance of being selected.
Still, if we keep [10 7 2] as the sample, how can we estimate the error of computing the ranks of [5 19 4] with respect to the sample [10 7 2], i.e.
rank[5]=2
rank[19]=9
rank[4]=2

What would be the error in taking these values as the real ranks?
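
For reference, using the sketch above on the toy example gives exactly these numbers (output shown in comments):

    old = [2, 7, 10, 2, 6, 9, 11, 3, 15, 8]
    sample = [10, 7, 2]
    new = [5, 19, 4]

    print(rank_all(old, sample))  # [0, 0, 0, 0, 1, 1, 1, 1, 5, 1]
    print(rank_new(new, sample))  # [2, 9, 2] -- ranks of the new values w.r.t. the old sample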

Thanks and regards
Sorin