Thanks for the reply,
OK, now let’s assume there is a 1000-element series, as above, made up of the first ten letters of the alphabet: (a, b, c, d, e, f, g, h, i, j). We don't know whether it is randomly generated or not, but we work from the assumption that it is. We note that a three-letter sequence occurs twice in the series. That shouldn't be unlikely enough to call into question the assumption that the series is randomly generated.
But suppose we find a five-letter sequence that occurs twice in the 1000-element series, or a ten-letter sequence, or a twenty-letter sequence? At some length, the occurrence of two identical sequences in the 1000-element series is going to be so unlikely that it effectively refutes the assumption that the series is randomly generated.
I'm guessing that this is some sort of elementary probability/statistics problem which can be generalized as a function of the length of the series and the length of the identical sequences. I'm hoping someone can help me understand how to do this.
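To make the question concrete, here is a rough sketch of how I imagine a first pass might go. It assumes each letter is drawn independently and uniformly from the ten letters. Under that assumption, any two fixed positions carry the same k-letter sequence with probability 1/10^k, so the expected number of matching position pairs is C(n-k+1, 2) / 10^k, and by Markov's inequality that number also caps the probability of seeing at least one repeat. The function name and parameters are my own, just for illustration.

```python
from math import comb

def expected_repeat_pairs(n, k, m=10):
    """Expected number of position pairs whose length-k windows are identical.

    Assumes (hypothetically) that the n letters are independent and uniform
    over an alphabet of m symbols. Overlapping windows are counted too.
    """
    windows = n - k + 1          # number of starting positions for a length-k window
    if windows < 2:
        return 0.0               # fewer than two windows, so no pair can exist
    return comb(windows, 2) / m**k

# The expected count is also an upper bound on P(at least one repeat),
# so capping it at 1 gives a crude bound on that probability.
for k in (3, 5, 10, 20):
    e = expected_repeat_pairs(1000, k)
    print(k, e, min(1.0, e))
```

If this is right, a repeated three-letter sequence is expected about 500 times over, a repeated five-letter sequence about 5 times, and a repeated ten-letter sequence only about 5 in 100,000 series, which would put the "suspicious" length somewhere between five and ten letters.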
Any thoughts?