# finding maximums when checking for duplicate random number patterns

#### SedoSan7

##### New Member
I'm working on a small project to tell whether a random number was generated by a human or a machine.
The method I'm using is to check for duplicate series, with the series lengths depending on the length of the number entered.
The series I'm checking for are 2, 3, 4, 5, 6 and 7 digits long. Beyond that it gets pretty ridiculous.

What I've come up with so far:
Suppose a number 100 digits long is entered, so Len = 100.
When counting 2-digit duplicates, the possible values are [00 ~ 99], which is 100 items.
A number of Len 100 contains only 99 (overlapping) 2-digit items.
What we need to do now is find the duplicates among these 99 items by comparing each one against all the others.
To do that, we have (99 * (99 - 1)) / 2 = 4851 unique pairs to compare.
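
The pair counting described above can be sketched in Python. This is only a minimal illustration, assuming the items are overlapping windows of the entered string; the name `count_matches` is made up for the example. Instead of comparing all 4851 pairs one by one, it groups equal items and counts the pairs inside each group, which gives the same total.

```python
from collections import Counter


def count_matches(number: str, digits: int) -> int:
    """Count matching pairs among all overlapping `digits`-long items."""
    # A number of length Len contains Len - (digits - 1) overlapping items.
    windows = [number[i:i + digits] for i in range(len(number) - digits + 1)]
    # An item value seen c times contributes c * (c - 1) / 2 matching pairs.
    counts = Counter(windows)
    return sum(c * (c - 1) // 2 for c in counts.values())


print(count_matches("1212", 2))  # items "12", "21", "12" -> 1 match
```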

Now, I did find a formula that tells me the expected (average) number of matches for any given Length and Items. For this example we need:
Length = 100
Number of Digits (Digits) = 2
Items = 10^digits = 10^2 = 100
Edit length (E.L) = Length - (Digits -1) = 100 - 1 = 99

Formula is: (E.L * (E.L - 1)) / (2 * Items) = (99 * 98)/(2 * 100) = 48.51

This means that in any 100 truly random digits, there should be on average 48.51 matching pairs of 2-digit items.
If we apply this to 3 digits, the formula gives
(98 * 97)/(2 * 1000) = 4.753 matches,
and for 4 digits: (97 * 96)/(2 * 10000) = 0.4656 matches.
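
For reference, the formula can be written as a small Python function. This is just a sketch of the arithmetic above; the function name `expected_matches` and its parameter names are chosen for the example.

```python
def expected_matches(length: int, digits: int) -> float:
    """Expected number of matching pairs for truly random digits."""
    items = 10 ** digits                   # e.g. 100 possible 2-digit items
    edit_length = length - (digits - 1)    # e.g. 99 items in 100 digits
    unique_pairs = edit_length * (edit_length - 1) // 2
    return unique_pairs / items


# Series lengths 2 through 7 for a 100-digit number.
for digits in range(2, 8):
    print(digits, expected_matches(100, digits))
```

For Length = 100 this reproduces 48.51, 4.753 and 0.4656 for 2, 3 and 4 digits.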

I've run multiple simulations and those numbers hold up. However, I'm trying to work out the maximums that a simulation can give, and I'm not that good with statistics.
With a sample size of 100 simulated numbers I got the following for 2 digits:
Mean: 48.63
Std. Dev: 6.43
Mean ± 1 std. dev. range: [42.2 - 55.1]
Sample Min: 36
Sample Max: 67
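
A simulation along these lines might look like the sketch below. It is not the code that produced the figures above; it assumes Python's `random` module as the "machine" generator, and the helper names (`count_matches`, `simulate`) are made up for the example. It also prints a 99th percentile, which is one possible stand-in for a "maximum" that is less sensitive to the number of trials than the raw sample max.

```python
import random
import statistics
from collections import Counter


def count_matches(number, digits):
    """Count matching pairs among all overlapping `digits`-long items."""
    windows = [number[i:i + digits] for i in range(len(number) - digits + 1)]
    return sum(c * (c - 1) // 2 for c in Counter(windows).values())


def simulate(trials, length=100, digits=2, seed=None):
    """Return the match count for each of `trials` random digit strings."""
    rng = random.Random(seed)  # a fixed seed makes runs repeatable
    results = []
    for _ in range(trials):
        number = "".join(rng.choice("0123456789") for _ in range(length))
        results.append(count_matches(number, digits))
    return results


if __name__ == "__main__":
    counts = simulate(trials=10_000, seed=1)
    print("mean:", statistics.mean(counts))
    print("std dev:", statistics.stdev(counts))
    print("min/max:", min(counts), max(counts))
    ordered = sorted(counts)
    print("99th percentile:", ordered[int(0.99 * len(ordered))])
```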

Obviously I got the min and max from the simulation, and 100 samples isn't enough. Can I create a formula that gives me a reasonable range to expect?
When I asked random people to generate a random 100-digit number I got:
- 85 matches for the 2-digit series check vs 48.51 average
- 20 matches for the 3-digit series check vs 4.75 average
- 3 matches for the 4-digit series check vs 0.47 average
- 1 match for the 5-digit series check vs 0.0456 average

Statistically speaking, these results are very rare for truly random digits, so I can conclude that a human entered them.
What I'm stuck on is figuring out the maximums for the random case. As an additional step, I also need to calculate a score for how far off a given sample is from the random pool.
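
One possible way to express such a score, offered only as a sketch: a z-score (how many standard deviations the observed count sits above the simulated mean), plus the fraction of simulated counts that are at least as large as the observed one. Both function names are made up for the example, and the z-score is only a rough guide, since it assumes the match counts are roughly normally distributed, which may not hold in the tails.

```python
def z_score(observed, mean, std_dev):
    """Standard deviations between the observed count and the mean."""
    return (observed - mean) / std_dev


def tail_fraction(observed, simulated_counts):
    """Fraction of simulated counts that are >= the observed count."""
    hits = sum(1 for c in simulated_counts if c >= observed)
    return hits / len(simulated_counts)


# 2-digit check: 85 observed matches vs simulated mean 48.63, std dev 6.43.
print(z_score(85, 48.63, 6.43))
```

With the figures above, 85 matches is roughly 5.7 standard deviations above the simulated mean. `tail_fraction` needs a list of simulated counts, ideally many more than 100, to give a meaningful answer.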