Hi,
I've been struggling a bit with this problem:
Suppose I have a DNA sequence made from the letters A, T, G and C, with a length of, say, 3571 letters. What is the probability that it contains the sequence AAAA (a 4-mer) at two positions spaced s letters apart, where s is one of 10, 50, 100 or 200? Occurrences may not overlap, so AAAAAA contains only 1 occurrence of AAAA and AAAAAAAA contains 2. I also allow a tolerance of +/- two letters for each occurrence, but I've left that out of my attempt.
I've attached my attempt, but I don't know how to calculate the probability of the second occurrence, which can be 10, 50, 100 or 200 letters later.
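
To make the counting rule concrete, here is a minimal Python sketch of a simulation that could be used to sanity-check any formula. It rests on several assumptions that are not settled above: each letter is drawn independently and uniformly from A, C, G, T; "spaced s letters apart" is read as the distance between the start positions of the two occurrences; non-overlapping occurrences are found by a greedy left-to-right scan; and the tolerance is applied to the spacing. The function names (find_nonoverlapping, has_pair, estimate) and the trial count are made up for illustration.

```python
import random

def find_nonoverlapping(seq, motif="AAAA"):
    """Start positions of non-overlapping matches, scanning left to right."""
    starts = []
    i = seq.find(motif)
    while i != -1:
        starts.append(i)
        # Resume after the end of this match so that matches never overlap.
        i = seq.find(motif, i + len(motif))
    return starts

def has_pair(starts, s, tol=0):
    """True if two start positions differ by s, give or take tol (assumed reading)."""
    start_set = set(starts)
    return any(p + s + d in start_set
               for p in starts
               for d in range(-tol, tol + 1))

def estimate(length=3571, spacings=(10, 50, 100, 200), trials=2000, seed=0):
    """Monte Carlo estimate of P(at least one pair at spacing s) for each s."""
    rng = random.Random(seed)
    hits = {s: 0 for s in spacings}
    for _ in range(trials):
        # Assumption: letters are independent and equally likely.
        seq = "".join(rng.choices("ACGT", k=length))
        starts = find_nonoverlapping(seq)
        for s in spacings:
            if has_pair(starts, s):
                hits[s] += 1
    return {s: hits[s] / trials for s in spacings}

if __name__ == "__main__":
    print(find_nonoverlapping("AAAAAA"))    # one occurrence
    print(find_nonoverlapping("AAAAAAAA"))  # two occurrences
    print(estimate())
```

Passing tol=2 to has_pair would cover the +/- two letters case under the reading described above. The estimates are only as good as the number of trials, so they are a check on a derivation rather than a replacement for one.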
Any help would be appreciated :-)
Best wishes,
James
