Let's say I have a sequence X of N non-integer values and another sequence Y of M non-integer values, where N >> M. I use a metric (say MSE) to find the closest alignment of Y within X, and that closest alignment gives me a score S. I would like to estimate the probability that the match is correct (i.e. that the alignment is the true one). Using Bayes' rule:
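Written out in the notation used in this post, and assuming "match" and "no match" are the only two hypotheses, the posterior would presumably take the standard two-hypothesis form:

$$
P(\text{match} \mid S) = \frac{P(S \mid \text{match})\,P(\text{match})}{P(S \mid \text{match})\,P(\text{match}) + P(S \mid \text{no match})\,P(\text{no match})},
\qquad P(\text{no match}) = 1 - P(\text{match}).
$$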
I have estimated the probabilities P(S|match) and P(S|no match) from synthesized data. The issue is determining P(match) and P(no match). If I use the fact that there is only one true alignment and treat every possible shift as equally likely, then P(match) is very small (1/(N-M+1), i.e. one over the number of possible shifts), and the metric begins to appear very ineffective. I could group the shifts and compute the probability of a match "within Z shifts", but this seems very arbitrary. Another issue that will affect P(match) and P(no match) is that the sequences X and Y contain some serial correlation. At this point, I'm not sure how to estimate that correlation or how to account for it.
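To make the setup concrete, here is a minimal Python sketch of the pieces described above: a sliding MSE over all shifts, the uniform prior over shifts, the Bayes-rule posterior, and a simple lag-1 autocorrelation estimate. Everything in it is an assumption for illustration: the function names (`sliding_mse`, `posterior_match`, `lag1_autocorr`) are hypothetical, the data are synthetic white noise, and the two likelihood values are made-up placeholders standing in for the densities estimated from synthesized data.

```python
import numpy as np


def sliding_mse(x, y):
    """MSE between y and every length-M window of x (one score per shift)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shape (N - M + 1, M): one row per candidate shift.
    windows = np.lib.stride_tricks.sliding_window_view(x, len(y))
    return np.mean((windows - y) ** 2, axis=1)


def posterior_match(lik_match, lik_no_match, prior_match):
    """Two-hypothesis Bayes rule for P(match | S)."""
    num = lik_match * prior_match
    return num / (num + lik_no_match * (1.0 - prior_match))


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, a basic measure of serial correlation."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


# Synthetic example (assumed data): Y is a noisy copy of a slice of X.
rng = np.random.default_rng(0)
N, M = 1000, 20
x = rng.normal(size=N)
true_shift = 300
y = x[true_shift:true_shift + M] + rng.normal(scale=0.1, size=M)

scores = sliding_mse(x, y)
best_shift = int(np.argmin(scores))   # shift with the smallest MSE
S = float(scores[best_shift])         # the score S from the question

n_shifts = N - M + 1                  # number of candidate alignments
prior = 1.0 / n_shifts                # uniform prior over shifts

# Placeholder likelihoods (NOT estimated from anything): in practice these
# would be P(S | match) and P(S | no match) evaluated at the observed S.
post = posterior_match(lik_match=5.0, lik_no_match=1e-4, prior_match=prior)

rho = lag1_autocorr(x)                # near 0 here because x is white noise
print(best_shift, S, prior, post, rho)
```

With these placeholder numbers the posterior is driven by the ratio P(S|match) / P(S|no match) weighed against the prior odds, so a prior as small as 1/(N-M+1) does not by itself force a small posterior. The sketch does not attempt to correct the prior or the likelihoods for serial correlation; `lag1_autocorr` only shows one common way to measure it.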
I have searched the web for possible solutions, but I haven't found any. I appreciate any comments.
Hi, thanks for the reply. I have seen some of the literature on RNA sequencing and it does seem like a very similar problem. However, I can't seem to find anything where probabilities or confidence metrics are calculated. If you know of any, I'd be very grateful for any pointers. Unfortunately, I'm forbidden to talk about the application I have in mind. I'm very sorry.