# Thread: probability of finding 2 random DNA fragments matching between genomes

1. ## probability of finding 2 random DNA fragments matching between genomes

Hi all,

I have a question about probability.
There are 2 genomes (each a long string composed of 4 letters: A, T, G & C), each 150,000 bases (letters) long. Now, I generate 10,000 random fragments (sub-strings) from each genome, each exactly 20 bases long. What is the probability of finding exactly the same fragment (string) in both genomes?

What I've tried:

Since there are 4 possible bases, there can be 4^20 total fragments. So the probability of finding the same fragment between the 2 genomes is (1/4^20), right?
But how does the fact that there are only 10,000 fragments affect the probability? Also, do I have to worry about the genome size (150,000 bases, from which the fragments were obtained)? I am unable to find an answer for this.

Another related question: for the same problem, if I allow some mismatches within a fragment match (say 5 out of 20 bases need not match), how will the probability change? Will it be just (1/4^15) or (1/4^15)*(15 choose 5)?

Any help will be greatly appreciated! Have a great day!

2. ## Re: probability of finding 2 random DNA fragments matching between genomes

If the strings are random, the probability of one string matching another string is $(1/4)^{20}$

The probability that one string matches any of a list of 10000 strings is $1 - \left(1 - (1/4)^{20}\right)^{10000}$

The probability of any string in the first list matching any string in the second list is $1 - \left(1 - (1/4)^{20}\right)^{10000 \times 10000}$
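As a rough numerical check of the three quantities above, here is a minimal Python sketch. It assumes the bases are uniformly random and that every fragment-to-fragment comparison is independent (which ignores overlapping fragments and the finite genome size). The function names `p_single` and `p_any` are made up for illustration.

```python
import math

def p_single(length=20, alphabet=4):
    """Probability that two independent random strings of this length are identical."""
    return alphabet ** -length

def p_any(p, comparisons):
    """Probability of at least one match in `comparisons` independent tries.

    Computes 1 - (1 - p)**comparisons via log1p/expm1, because p is tiny
    and the naive form loses precision in floating point.
    """
    return -math.expm1(comparisons * math.log1p(-p))

p = p_single()                            # one string vs one string
one_vs_list = p_any(p, 10_000)            # one string vs a list of 10,000
list_vs_list = p_any(p, 10_000 * 10_000)  # every pair across the two lists

print(f"one vs one:   {p:.4e}")
print(f"one vs list:  {one_vs_list:.4e}")
print(f"list vs list: {list_vs_list:.4e}")
```

Under these assumptions the list-vs-list probability comes out a little under 1 in 10,000, so a chance match of a full 20-base fragment is unlikely but not negligible.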

For the last question, the probability that exactly 15 of the 20 bases match is $\binom{20}{15} (1/4)^{15} (3/4)^{5}$

I haven't taken biology since high school, and am far from an expert, but it is my understanding that genomes are not random and there are rules like percentage of G = percentage of C and percentage of A = percentage of T.

3. ## The Following User Says Thank You to asterisk For This Useful Post:

GummyBear (11-26-2013)

4. ## Re: probability of finding 2 random DNA fragments matching between genomes

Yes, I know the genome composition is not random, it is just to test our simple hypothesis.

BTW for the last answer, I didn't clearly understand your answer. Can you please explain it to me (sorry for my ignorance)? Thanks once again!

5. ## Re: probability of finding 2 random DNA fragments matching between genomes

This is a Binomial Distribution

n = 20 = number of trials
k = 15 = number of successes
p = 1/4 = probability of success
(1-p) = 3/4 = probability of failure
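With the parameters listed above, the binomial probability can be evaluated directly. The sketch below is only illustrative and assumes each base matches independently with probability 1/4. It computes both "exactly 15 matches" and "at least 15 matches", since "5 out of 20 bases need not match" can be read either way. `binom_pmf` is a hypothetical helper name.

```python
from math import comb

def binom_pmf(k, n=20, p=0.25):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 15 of the 20 bases match (the other 5 mismatch).
exactly_15 = binom_pmf(15)

# At least 15 of the 20 bases match (up to 5 mismatches allowed).
at_least_15 = sum(binom_pmf(k) for k in range(15, 21))

print(f"exactly 15 matches:  {exactly_15:.4e}")
print(f"at least 15 matches: {at_least_15:.4e}")
```

Either value is for a single pair of fragments. To get the chance of at least one such match across the two lists of 10,000 fragments, it would replace (1/4)^20 in the earlier list-versus-list calculation.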

6. ## The Following User Says Thank You to asterisk For This Useful Post:

GummyBear (11-27-2013)

7. ## Re: probability of finding 2 random DNA fragments matching between genomes

Thanks very much, asterisk! It was very helpful.
