Cool problem. I think the idea is to flip whichever coin you think at that time has the higher $p$, so for the first flip it doesn't matter; for the second, you flip the same coin again if it came up heads or switch if not, and so on, always adjusting your posterior estimates of the probabilities in light of earlier flips. But I'm not sure it's as simple as just taking the point estimate of each probability. For example, that could leave you with the following: if you flip coin A once and get tails, your posterior mean is $E[p_A \mid T] = 1/3$ (under a uniform prior), so you'd flip coin B next, and if you got heads you'd keep flipping B, no matter what else happened. But suppose after a while you get an estimate of $p_B$ that's really low. Then at some point you'd have to wonder whether your one-flip estimate of coin A was just a fluke and it's worth trying A again. So it seems that what you really want to compare at each stage is $E[p_A \mid \text{all flips so far}]$ against $E[p_B \mid \text{all flips so far}]$. So you'd need the joint posterior $P(p_A, p_B \mid \text{all flips so far})$.
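To make the point-estimate strategy concrete, here's a quick simulation sketch. Assumptions that aren't in the problem statement: independent uniform Beta(1,1) priors on each coin (so the posterior mean after $h$ heads and $t$ tails is $(h+1)/(h+t+2)$), and ties broken toward coin A. The function name `greedy_play` is mine.

```python
import random

def greedy_play(p_true, n_flips, seed=None):
    """Greedy point-estimate policy: always flip the coin whose
    posterior mean is currently higher. Assumes Beta(1,1) priors,
    so the posterior mean after h heads, t tails is (h+1)/(h+t+2).
    Returns the total number of heads obtained."""
    rng = random.Random(seed)
    heads = [0, 0]
    tails = [0, 0]
    total = 0
    for _ in range(n_flips):
        means = [(heads[i] + 1) / (heads[i] + tails[i] + 2) for i in range(2)]
        i = 0 if means[0] >= means[1] else 1   # tie goes to coin A
        if rng.random() < p_true[i]:
            heads[i] += 1
            total += 1
        else:
            tails[i] += 1
    return total
```

Running this against, say, `p_true=(0.6, 0.4)` shows the failure mode I described: an early unlucky tail on the better coin can keep the policy on the worse coin for a long stretch.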
That's possible to compute, though a bit more involved. What worries me, and I don't have the time (or possibly the knowledge) to work it out, is whether there are cases in which the information value of flipping the worse but more uncertain coin outweighs the expected loss, even when we're considering the joint posterior. I'm quite tired right now and have some of my own work to do, but this is pretty interesting to me and I may come back to it. In the meantime, I'd work on deriving the joint conditional distribution above.
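One note that simplifies the derivation: since the coins are flipped independently, the joint posterior factorises into two independent Beta distributions (again assuming uniform priors, my assumption). A quantity like $P(p_A > p_B \mid \text{data})$ is then a one-dimensional integral. A dependency-free sketch by grid integration (`prob_A_better` is my name for it):

```python
from math import comb

def prob_A_better(hA, tA, hB, tB, grid=400):
    """Approximate P(p_A > p_B | data) under independent uniform priors,
    so p_A ~ Beta(hA+1, tA+1) and p_B ~ Beta(hB+1, tB+1) independently.
    Uses midpoint-rule integration on a grid; no scipy required."""
    def beta_pdf(x, h, t):
        # Beta(h+1, t+1) density; normaliser is (h+t+1) * C(h+t, h)
        return (h + t + 1) * comb(h + t, h) * x**h * (1 - x)**t
    dx = 1.0 / grid
    xs = [(i + 0.5) * dx for i in range(grid)]
    # running CDF of p_B at each grid point
    cdfB, acc = [], 0.0
    for x in xs:
        acc += beta_pdf(x, hB, tB) * dx
        cdfB.append(acc)
    # P(p_A > p_B) = integral of f_A(x) * F_B(x) dx
    return sum(beta_pdf(x, hA, tA) * c * dx for x, c in zip(xs, cdfB))
```

For instance, after A: one tail and B: one head, this gives about $1/6$, which quantifies how unlikely it is that A is actually the better coin at that point, and how that changes as B racks up tails.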
PS The fact that N is finite may also play into this; it might be that the information value works out in such a way that some experiment of X flips makes sense to run at the beginning but not near the end, because you'd run out of flips in which to exploit what you learned. Knowing that, you'd have to adjust your plans earlier on too.
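For small N you can actually sidestep all of the above and compute the exact Bayes-optimal policy by backward induction over the horizon, which makes the finite-N effect explicit: the value of a flip depends on how many flips remain. A sketch, again assuming uniform Beta(1,1) priors (the state is just the head/tail counts on each coin plus flips remaining):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_value(hA, tA, hB, tB, n):
    """Expected number of future heads under the Bayes-optimal policy
    with n flips remaining, assuming uniform Beta(1,1) priors."""
    if n == 0:
        return 0.0
    def flip_value(h, t, v_heads, v_tails):
        p = (h + 1) / (h + t + 2)          # posterior mean of this coin
        return p * (1 + v_heads) + (1 - p) * v_tails
    vA = flip_value(hA, tA,
                    best_value(hA + 1, tA, hB, tB, n - 1),
                    best_value(hA, tA + 1, hB, tB, n - 1))
    vB = flip_value(hB, tB,
                    best_value(hA, tA, hB + 1, tB, n - 1),
                    best_value(hA, tA, hB, tB + 1, n - 1))
    return max(vA, vB)
```

With two flips from scratch the optimal value is $13/12$, versus $1$ for a strategy that can't learn; that extra $1/12$ is exactly the information value of the first flip, and it shrinks to zero as the horizon runs out, which is the effect I was gesturing at. The state space grows quickly, so this only scales to modest N.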