
Thread: probability of finding 2 random DNA fragments matching between genomes

  1. #1





    Hi all,

    I have a question about probability.
    There are 2 genomes (each a long string composed of 4 letters: A, T, G and C), each 150,000 bases (letters) long. Now I generate 10,000 random fragments (substrings) from each genome, each exactly 20 bases long. What is the probability of finding exactly the same fragment (string) in both genomes?

    What I've tried:

    Since there are 4 possible bases, there are 4^20 possible fragments. So the probability of finding the same fragment in the 2 genomes is 1/4^20, right?
    But how does the fact that there are only 10,000 fragments affect the probability? Also, do I have to worry about the genome size (150,000 bases, from which the fragments were obtained)? I have not been able to find an answer to this.

    Another related question: for the same problem, if I allow some mismatches between the two fragments (say, 5 of the 20 bases need not match), how will the probability change? Will it be just (1/4^15) or (1/4^15)*(15 choose 5)?

    Any help will be greatly appreciated! Have a great day!

  2. #2

    Location
    Philadelphia, PA


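    If the strings are random, the probability that one string matches another string is \frac{1}{4^{20}}

    The probability that one string matches at least one of a list of 10000 strings is approximately

    \frac{10000}{4^{20}}

    The probability that any string in the first list matches any string in the second list is approximately

    \frac{10000^2}{4^{20}}

    These approximations just add up the chances of the individual comparisons, so they are only accurate while the result is much smaller than 1, which is the case here.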
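    As a quick numerical check of these figures, here is a minimal Python sketch. It assumes, as the reply does, that every fragment is an independent, uniformly random 20-base string; the variable names are illustrative only. It compares the simple approximations with the values you get when the individual comparisons are treated as independent.

```python
import math

K = 20          # fragment length in bases
N = 10_000      # fragments drawn from each genome
TOTAL = 4 ** K  # number of distinct 20-base strings

# Probability that two independent random 20-mers are identical.
p_pair = 1 / TOTAL

# One fragment vs. a list of N independent random fragments:
# exact value 1 - (1 - p)^N, and the N / 4^20 approximation.
one_vs_list_exact = -math.expm1(N * math.log1p(-p_pair))
one_vs_list_approx = N * p_pair

# Any of N fragments vs. any of N fragments: N^2 / 4^20 is the expected
# number of matching pairs; treating the N^2 comparisons as independent
# gives an approximate probability of at least one match.
any_match_exact = -math.expm1(N * N * math.log1p(-p_pair))
any_match_approx = N * N * p_pair

print(f"4^20 = {TOTAL:,}")  # prints "4^20 = 1,099,511,627,776"
print(f"one vs list: approx {one_vs_list_approx:.3e}, exact {one_vs_list_exact:.3e}")
print(f"any vs any:  approx {any_match_approx:.3e}, exact {any_match_exact:.3e}")
```

    Both "any vs any" values come out at about 9.1 × 10^-5, so at this scale the simple approximation is essentially the same as the more careful calculation.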


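    For the last question, the probability that exactly 15 of the 20 bases match (that is, exactly 5 mismatches) is

    {{20}\choose{15}}*(1/4)^{15}*(3/4)^5

    If up to 5 mismatches are allowed, add the terms for 15 through 20 matching bases:

    \sum_{k=15}^{20}{{20}\choose{k}}*(1/4)^{k}*(3/4)^{20-k}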
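    The two mismatch cases can be evaluated with a short Python sketch. It assumes each aligned base independently agrees with probability 1/4, as in the formula above; prob_matches is a hypothetical helper name used only for illustration.

```python
from math import comb

N_BASES = 20     # fragment length
P_MATCH = 1 / 4  # chance that two random bases agree

def prob_matches(k: int, n: int = N_BASES, p: float = P_MATCH) -> float:
    """Binomial probability that exactly k of n aligned bases agree."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 15 matching bases (exactly 5 mismatches).
exactly_5_mismatches = prob_matches(15)

# At most 5 mismatches: 15, 16, ..., 20 matching bases.
at_most_5_mismatches = sum(prob_matches(k) for k in range(15, N_BASES + 1))

print(f"exactly 5 mismatches: {exactly_5_mismatches:.3e}")
print(f"at most 5 mismatches: {at_most_5_mismatches:.3e}")
```

    Most of the "at most 5 mismatches" total comes from the 15-match term; the term for 16 matches is already about ten times smaller.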

    I haven't taken biology since high school and am far from an expert, but it is my understanding that genomes are not random, and that there are rules such as percentage of G = percentage of C and percentage of A = percentage of T.

  3. The Following User Says Thank You to asterisk For This Useful Post:

    GummyBear (11-26-2013)

  4. #3


    Thanks very much for your answer!
    Yes, I know the genome composition is not random; this is just to test our simple hypothesis.

    BTW, I didn't fully understand your answer to the last question. Can you please explain it (sorry for my ignorance)? Thanks once again!
    Last edited by GummyBear; 11-26-2013 at 11:06 PM.

  5. #4


    This is a binomial distribution:

    {{20}\choose{15}}*(1/4)^{15}*(3/4)^5

    {{n}\choose{k}}*(p)^{k}*(1-p)^{n-k}

    n = 20 = number of trials (aligned base positions)
    k = 15 = number of successes (matching bases)
    p = 1/4 = probability of success (two random bases agree)
    (1-p) = 3/4 = probability of failure (a mismatch)
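    To see the binomial model at work, here is a small Monte Carlo sketch in Python that draws pairs of random 20-base fragments and tallies how many positions agree, then compares the tallies with the formula above. The seed, trial count, and function names are arbitrary illustrative choices. Pairs with 15 or more matches are far too rare to show up in a run this size (about 3 in a million pairs for exactly 15), so the comparison covers the common range of 0 to 10 matches.

```python
import random
from math import comb

LENGTH = 20      # bases per fragment (n)
P_MATCH = 0.25   # chance two random bases agree (p)
TRIALS = 50_000  # simulated fragment pairs (arbitrary choice)

def binomial_pmf(k, n=LENGTH, p=P_MATCH):
    """P(exactly k successes in n trials): C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def simulate_match_counts(trials, length=LENGTH, seed=2013):
    """Tally how many positions agree between pairs of random fragments."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = [0] * (length + 1)
    for _ in range(trials):
        a = rng.choices("ACGT", k=length)
        b = rng.choices("ACGT", k=length)
        matches = sum(x == y for x, y in zip(a, b))
        counts[matches] += 1
    return counts

counts = simulate_match_counts(TRIALS)
for k in range(11):
    simulated = counts[k] / TRIALS
    print(f"k={k:2d} matches: simulated {simulated:.4f}, binomial {binomial_pmf(k):.4f}")
```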

  6. The Following User Says Thank You to asterisk For This Useful Post:

    GummyBear (11-27-2013)

  7. #5



    Thanks very much, asterisk! It was very helpful.

Advertise on Talk Stats