I have a question about probability.

There are 2 genomes (each a long string composed of 4 letters: A, T, G and C), each 150,000 bases (letters) long. I generate 10,000 random fragments (substrings) from each genome, each exactly 20 bases long. What is the probability of finding exactly the same fragment (string) in both genomes?

What I've tried:

Since there are 4 possible bases, there are 4^20 possible fragments of length 20. So the probability of finding the same fragment between the 2 genomes is (1/4^20), right?

But how does the fact that there are only 10,000 fragments per genome affect the probability? Also, do I have to worry about the genome size (150,000 bases, from which the fragments were obtained)? I have been unable to find an answer to this.
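To make the question concrete, here is a minimal sketch of one possible model, not a definitive answer. It assumes both genomes are independent, uniformly random sequences (real genomes are not), and that the 10,000 x 10,000 fragment pairs can be treated as independent trials, each matching with probability 1/4^20. The function name `p_any_shared_fragment` and its parameters are made up for illustration.

```python
import math

def p_any_shared_fragment(n_frags_a, n_frags_b, frag_len, alphabet=4):
    """Approximate P(at least one fragment from A equals one from B).

    Assumption: bases are uniform and independent, and all fragment
    pairs are treated as independent trials (overlaps are ignored).
    """
    p_pair = alphabet ** -frag_len      # chance that one given pair matches
    n_pairs = n_frags_a * n_frags_b     # number of cross-genome pairs compared
    # 1 - (1 - p_pair)^n_pairs, computed in a numerically stable way
    return -math.expm1(n_pairs * math.log1p(-p_pair))

print(p_any_shared_fragment(10_000, 10_000, 20))
```

Under these assumptions the genome length of 150,000 does not enter the formula directly. It would matter only through overlaps and repeats among the sampled fragments, which this sketch ignores.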

Another related question: for the same problem, if I allow some mismatches between the two fragments (say 5 out of 20 bases need not match), how will the probability change? Will it be just (1/4^15) or (1/4^15)*(15 choose 5)?
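One way to check the two candidate expressions numerically is a direct count. The sketch below assumes two independent, uniformly random fragments and reads "5 out of 20 need not match" as "at most 5 mismatches". That reading, and the function name `p_pair_within_mismatches`, are assumptions for illustration. For each mismatch count i it chooses which positions differ (20 choose i) and allows 3 wrong bases at each of them.

```python
import math

def p_pair_within_mismatches(frag_len, max_mismatches, alphabet=4):
    """P(two uniform random fragments differ in at most max_mismatches positions).

    Assumption: every base is uniform and independent of all others.
    """
    favourable = sum(
        math.comb(frag_len, i) * (alphabet - 1) ** i   # positions x wrong bases
        for i in range(max_mismatches + 1)
    )
    return favourable / alphabet ** frag_len

p_exact = p_pair_within_mismatches(20, 0)   # reduces to 1/4^20
p_fuzzy = p_pair_within_mismatches(20, 5)   # up to 5 mismatches allowed
print(p_exact, p_fuzzy)
print(1 / 4**15, math.comb(15, 5) / 4**15)  # the two candidates, for comparison
```

This is a per-pair probability only. It does not yet account for comparing 10,000 fragments against 10,000 fragments.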

Any help will be greatly appreciated! Have a great day!