The method I'm using checks for duplicate series, depending on the length of the number entered.

The series I'm checking are 2, 3, 4, 5, 6 and 7 digits long. Beyond that it gets pretty ridiculous.

What I've come up with so far:

When entering a number 100 digits long: Len = 100

When counting 2-digit duplicates, there is the range [00 ~ 99], which is 100 possible items.

In a number of Len 100 there are only 99 (overlapping) 2-digit numbers.

What we need to do now is find the duplicates among these 99 items by checking them against each other.

To do that, we have (99 * (99 - 1)) / 2 = 4851 unique pairs to compare.
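A minimal Python sketch of the counting step, assuming a "match" means a pair of positions (i < j) whose overlapping windows are equal, which is what the 4851-pair count implies. `count_matching_pairs` is a hypothetical helper name. Instead of comparing all pairs one by one, it tallies each window value and adds c * (c - 1) / 2 pairs for every value seen c times, which gives the same total.

```python
from collections import Counter

def count_matching_pairs(digits: str, width: int) -> int:
    """Count pairs (i < j) of overlapping `width`-digit windows that are equal.

    A string of length n has n - (width - 1) windows.
    """
    windows = [digits[i:i + width] for i in range(len(digits) - width + 1)]
    counts = Counter(windows)
    # A window value seen c times forms c * (c - 1) / 2 matching pairs.
    return sum(c * (c - 1) // 2 for c in counts.values())

if __name__ == "__main__":
    print(count_matching_pairs("12121", 2))  # windows: 12, 21, 12, 21
```

As a sanity check, a string of 100 identical digits makes every one of its 99 windows match every other, so the function returns the full 4851.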

Now, I found a formula that tells me the expected number of matches for any given Length and Items. For this example we need:

Length = 100

Number of Digits (Digits) = 2

Items = 10^digits = 10^2 = 100

Edit length (E.L) = Length - (Digits - 1) = 100 - 1 = 99

The formula is: (E.L * (E.L - 1)) / (2 * Items) = (99 * 98)/(2 * 100) = 48.51

This means that in any 100 truly random digits, we should expect 48.51 matching pairs of 2-digit numbers on average.
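The formula can be justified with linearity of expectation. Writing n for Length, k for Digits and m = n - (k - 1) for E.L, each pair of windows is equal with probability 10^-k, even when the two windows overlap: if they start s < k positions apart, equality forces the s + k digits involved to repeat with period s, so only 10^s of the 10^(s+k) possible digit strings qualify. Summing over all pairs:

```latex
% n = Length, k = Digits, m = n - (k - 1) = E.L, Items = 10^k
% w_i = the k-digit window starting at position i
\begin{aligned}
X &= \sum_{1 \le i < j \le m} \mathbf{1}[w_i = w_j], \\
\Pr[w_i = w_j] &= 10^{-k} \quad \text{(also when } w_i \text{ and } w_j \text{ overlap)}, \\
\mathbb{E}[X] &= \binom{m}{2} \cdot 10^{-k} = \frac{m(m-1)}{2 \cdot 10^{k}}.
\end{aligned}
```

Linearity of expectation does not require the pairs to be independent, which is why the mean comes out exact. That same dependence (a run like "1111" creates several matches at once) is what makes the spread harder to capture with an equally simple formula.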

If we apply this to 3 digits (E.L = 98), the formula becomes:

(98 * 97)/(2 * 1000) = 4.753 matches.

And for 4 digits it is: (97 * 96)/(2 * 10000) = 0.4656
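The same formula applied across the whole 2 to 7 digit range from the question, as a small Python sketch (`expected_matches` is a hypothetical name):

```python
def expected_matches(length: int, digits: int) -> float:
    """Expected number of equal pairs of `digits`-long windows in `length` random digits."""
    el = length - (digits - 1)   # E.L: number of overlapping windows
    items = 10 ** digits         # number of possible window values
    return el * (el - 1) / (2 * items)

if __name__ == "__main__":
    for d in range(2, 8):
        print(f"{d} digits: {expected_matches(100, d):.6g}")
```

For 100 digits this reproduces the 48.51, 4.753 and 0.4656 above and extends the same calculation to 5, 6 and 7 digits.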

I've run multiple simulations and these numbers check out. However, I'm trying to work out the maximum a simulation can reasonably produce, and I'm not that good with statistics.

So with a sample size of 100, I got the following for 2 digits:

Mean: 48.63

Std. Dev: 6.43

Mean ± 1 Std. Dev.: [42.2 - 55.1]

Sample Min: 36

Sample Max: 67

Obviously the min and max came from the simulation, and 100 samples isn't enough. Is there a formula that gives me a reasonable range?
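One assumption-light way to get a reasonable range without a closed-form variance is to run far more simulations and read off empirical percentiles, for example the 0.1% and 99.9% points. The sketch below assumes Python's `random` module is an adequate source of uniform digits; the function names, the 10,000-trial default, the fixed seed and the percentile choice are all illustrative.

```python
import random
import statistics
from collections import Counter

def count_matching_pairs(digits: str, width: int) -> int:
    # Pairs (i < j) of equal overlapping `width`-digit windows.
    counts = Counter(digits[i:i + width] for i in range(len(digits) - width + 1))
    return sum(c * (c - 1) // 2 for c in counts.values())

def simulate(length: int = 100, width: int = 2, trials: int = 10_000, seed: int = 1) -> list:
    """Match counts for `trials` uniformly random digit strings of `length` digits."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return [
        count_matching_pairs("".join(rng.choices("0123456789", k=length)), width)
        for _ in range(trials)
    ]

def summarize(results: list) -> dict:
    # statistics.quantiles with n=1000 returns 999 cut points:
    # the first is the 0.1th percentile, the last is the 99.9th.
    cuts = statistics.quantiles(results, n=1000)
    return {
        "mean": statistics.mean(results),
        "stdev": statistics.stdev(results),
        "p0.1": cuts[0],
        "p99.9": cuts[-1],
    }

if __name__ == "__main__":
    for width in (2, 3, 4):
        print(width, summarize(simulate(width=width)))
```

With more trials the extreme percentiles settle down, whereas the sample min and max keep drifting outward as the number of trials grows, which makes them a poor threshold.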

When I asked various people to make up a random 100-digit number, I got:

- 85 matches for the 2-digit series check vs. 48.51 expected

- 20 matches for the 3-digit series check vs. 4.75 expected

- 3 matches for the 4-digit series check vs. 0.47 expected

- 1 match for the 5-digit series check vs. 0.0456 expected

Statistically speaking, these results are very rare, so I can conclude that a human most likely entered them.
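For the 3-digit and longer checks, where the expected count is small, a common rough approximation for birthday-style collision counts is to treat the number of matches as Poisson with mean equal to the formula's value. This is an assumption, not an exact result: overlapping windows make matches clump, so these tail probabilities should be checked against simulation before relying on them. `poisson_pmf` and `poisson_tail` are hypothetical helper names.

```python
import math

def poisson_pmf(i: int, mean: float) -> float:
    # Evaluated in log space so large i does not overflow.
    return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

def poisson_tail(observed: int, mean: float, terms: int = 1000) -> float:
    """Approximate P(X >= observed) for X ~ Poisson(mean), with mean > 0.

    Sums the upper tail directly instead of computing 1 - CDF, which avoids
    cancellation for very small tails. Truncating after `terms` terms is
    negligible for the small means used here.
    """
    return math.fsum(poisson_pmf(i, mean) for i in range(observed, observed + terms))

if __name__ == "__main__":
    # (observed, expected) pairs from the human-entered sample above
    for obs, mean in [(20, 4.753), (3, 0.4656), (1, 0.0456)]:
        print(obs, mean, poisson_tail(obs, mean))
```

For the single 5-digit match (mean 0.0456) this gives roughly 0.045. The tails for different digit lengths should not simply be multiplied together: every 5-digit match also produces 4-digit matches, so the counts are correlated.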

What I'm stuck on is figuring out the maximums for the random case and, as an additional step, calculating a score for how far off a sample is from the pool.
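For the score, one simple option is a z-score (how many simulated standard deviations the observation sits above the simulated mean) together with an empirical p-value (the fraction of random simulations that scored at least as high). The sketch assumes you already have a list of simulated match counts for the same length and digit width; `deviation_score` is a hypothetical name.

```python
import math
import statistics
from typing import Sequence, Tuple

def deviation_score(observed: float, simulated: Sequence[float]) -> Tuple[float, float]:
    """Score an observed match count against match counts from random simulations.

    Returns (z_score, empirical_p):
      z_score: standard deviations above the simulated mean
      empirical_p: fraction of simulations with a count >= observed, using
                   +1 smoothing so it is never exactly 0
    """
    mean = statistics.mean(simulated)
    sd = statistics.stdev(simulated)  # needs at least two simulated values
    if sd == 0:
        # All simulations gave the same count (e.g. all zero for long widths).
        z_score = 0.0 if observed == mean else math.copysign(math.inf, observed - mean)
    else:
        z_score = (observed - mean) / sd
    at_least = sum(1 for x in simulated if x >= observed)
    empirical_p = (at_least + 1) / (len(simulated) + 1)
    return z_score, empirical_p

if __name__ == "__main__":
    print(deviation_score(85, [46, 48, 50, 52]))
```

With the 2-digit figures from the question (mean 48.63, std. dev. 6.43), the 85-match sample sits roughly 5.7 standard deviations above the mean. The empirical p-value can never go below 1 / (trials + 1), so showing that something is rarer than that needs more simulations or an analytic approximation.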