I am calculating an attendance factor by dividing the average daily attendance by the highest single-day attendance.

For example:

total attendance: 577

total number of school days in the month: 23

average daily attendance = 577 / 23 ≈ 25.0870

attendance factor = 25.0870 / 26 (highest attendance day) ≈ 0.9649
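The arithmetic above can be checked with a short Python sketch. The function name `attendance_factor` and its parameter names are made up for illustration. Python rounds when formatting rather than truncating, so the last printed digit can differ from a hand-truncated figure.

```python
def attendance_factor(total_attendance, school_days, peak_attendance):
    """Return (average daily attendance, attendance factor)."""
    average_daily = total_attendance / school_days
    # The factor is the average day expressed as a fraction of the busiest day.
    return average_daily, average_daily / peak_attendance


# Figures from the example: 577 total, 23 school days, busiest day 26.
average_daily, factor = attendance_factor(577, 23, 26)
print(f"average daily attendance = {average_daily:.4f}")
print(f"attendance factor        = {factor:.4f}")
```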

The problem: what if I don't have the highest attendance day? Is there some way to simulate the daily attendance from the total attendance of 577 and the 23 school days in the month?
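This is not a simulation, but the totals alone do pin down one thing. If 577 attendances are spread over 23 days, at least one day must have had the average rounded up or more, so the missing peak cannot be below that and the factor cannot be above the corresponding value. A minimal sketch of that bound follows; the function names are hypothetical, and the true peak may well be higher than this minimum.

```python
def minimum_peak(total_attendance, school_days):
    """Smallest possible value of the busiest day, given only the totals.

    Integer ceiling division: some day must reach at least the average,
    rounded up to a whole number of attendees.
    """
    return -(-total_attendance // school_days)


def max_attendance_factor(total_attendance, school_days):
    """Upper bound on the attendance factor when the peak day is unknown."""
    average_daily = total_attendance / school_days
    return average_daily / minimum_peak(total_attendance, school_days)


lowest_possible_peak = minimum_peak(577, 23)
factor_upper_bound = max_attendance_factor(577, 23)
print(lowest_possible_peak, round(factor_upper_bound, 4))
```

With these totals the lowest possible peak equals the peak used in the example, so the example's factor is the largest value the totals allow.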

All help is greatly appreciated. Have a great day.

It's 8am. There's some % chance of him calling before 5pm. Let's say that's X%.

So then let's say it's 4:59pm... The chance of him calling before 5pm is very low. Much less than X%. Let's say there's a Z% chance.

Now let's say it's 4pm. The chance of him calling is higher than Z% but less than X%. Let's call the chance at 4pm Y%.

So just fumbling around with this toy example, it would appear to me that X% >= Y% >= Z%.
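To put numbers on the toy example, here is a minimal sketch under one specific assumption that the question does not establish: calls arrive at a constant rate, so the waiting time is exponential. The rate of 0.2 calls per hour is an arbitrary, hypothetical value. Under this model the ordering comes purely from the shrinking time window before 5pm.

```python
import math


def prob_call_before_deadline(hours_remaining, calls_per_hour):
    """P(at least one call in the remaining window), constant-rate model."""
    return 1.0 - math.exp(-calls_per_hour * hours_remaining)


RATE = 0.2  # hypothetical calls per hour; not from the original post

x = prob_call_before_deadline(9.0, RATE)       # asked at 8:00am, 9 hours left
y = prob_call_before_deadline(1.0, RATE)       # asked at 4:00pm, 1 hour left
z = prob_call_before_deadline(1.0 / 60, RATE)  # asked at 4:59pm, 1 minute left

print(f"X = {x:.2%}, Y = {y:.2%}, Z = {z:.2%}")
```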

What is this phenomenon called? Is it likely some kind of exponential-decay distribution?

I have the following problem: given a set S of N=100000 data elements, I need to extract a random sample R of size n=20 and then, for each element in S, compute the minimum distance to the points in R. That minimum distance is declared to be the rank of the element.
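For reference, the single-machine version of this computation might look like the sketch below. It assumes the elements are 2-D points compared with Euclidean distance, which the post does not specify, and it uses a smaller N so it runs quickly. The name `min_distance_ranks` is made up for illustration.

```python
import math
import random


def min_distance_ranks(points, sample):
    """Rank of each point = distance to its nearest point in the sample."""
    return [min(math.dist(p, r) for r in sample) for p in points]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Assumption: elements are 2-D points; N is reduced from 100000 to 1000.
S = [(rng.random(), rng.random()) for _ in range(1000)]
R = rng.sample(S, 20)  # random sample of size n=20, drawn without replacement

ranks = min_distance_ranks(S, R)
print(min(ranks), max(ranks))
```

Points that were drawn into R get rank 0, since each is at distance 0 from itself.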

The problem is that the N data elements are not all available from the beginning. They are processed in parallel, in a distributed fashion, and only partitions of M=8000 elements are available at a time.

I was thinking of generating a random sample for each partition and computing the ranks per partition, but then the ranks won't have the same meaning when compared across partitions, since the partitions have different random samples (a point with rank=0.2 in one partition is not the same as a point with rank=0.2 in another partition).
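The mismatch can be seen with a tiny hand-made case: the same point ranked against two different samples gets two different ranks. The coordinates below are invented for illustration, again assuming 2-D points and Euclidean distance.

```python
import math


def rank(point, sample):
    """Distance from the point to its nearest neighbour in the sample."""
    return min(math.dist(point, r) for r in sample)


point = (0.0, 0.0)

# Hypothetical samples drawn independently in two partitions.
sample_a = [(0.2, 0.0), (1.0, 1.0)]
sample_b = [(0.5, 0.0), (1.0, 1.0)]

rank_a = rank(point, sample_a)  # nearest sample point is (0.2, 0.0)
rank_b = rank(point, sample_b)  # nearest sample point is (0.5, 0.0)

print(rank_a, rank_b)
```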

Do you have any idea how the ranks can be unified globally, after they are computed locally, per partition?

Is there any way of comparing how similar two random samples are?

Based on that, can we derive a metric to measure the impact of one random sample on a rank?

The final goal is to unify all ranks and to have a correct global rank instead of partition-based ranks.

Any ideas would be highly appreciated.

Thanks

Sorin