I'm afraid I'm not an expert in statistics, but I have a particular problem I'm interested in solving. I'm pretty sure this area already has a lot of literature, but I'm having difficulty finding something directly applicable to what I'm doing, so it would be great if somebody could nudge me in the right direction.

I think the best way to describe my problem is with an example (though if I knew the terminology I could probably do it in about six words). Say I have a machine which produces biased coins. It has some underlying continuous probability density function which it is using to pick a number 0 < P < 1, then creates a coin which will come up heads with probability P.

My first task is to estimate the density the machine uses to generate these coins. I'm allowed to flip the coins, and I have access to a very large number of them, but each coin is taken away from me after some random number of flips.

My second task is this: once I've gotten through all of those coins, I'll be given new coins, one at a time. Again I'll be able to flip each one some random number of times, and then, using that result together with the model I've already built, I'll be asked to determine the probability density function for that particular coin (i.e. the probability density that the coin's bias P equals some particular value p).

My approach so far has been very straightforward. I treat each coin's flips as a binomial likelihood in p, sum those likelihoods over all the coins and normalize. This gives:

[TEX] \begin{equation} Prob(P=p) = \frac{1}{T} \sum_{i=1}^{T} {{n_i}\choose{h_i}}p^{h_i}(1-p)^{n_i-h_i} \end{equation} [/TEX]

where T is the total number of coins I've flipped, n_i is the number of flips of the ith coin, and h_i is the number of those flips that came up heads.
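To make this concrete, here is a minimal sketch of evaluating that sum on a grid of p values. The names `coin_records`, `grid_size` and `estimated_prior` are just illustrative, not anything standard; the flip data is assumed to be stored as (n_i, h_i) pairs.

```python
from math import comb

def estimated_prior(coin_records, grid_size=101):
    """Evaluate (1/T) * sum_i C(n_i, h_i) p^h_i (1 - p)^(n_i - h_i)
    on a uniform grid of p values in [0, 1].

    coin_records: list of (n_i, h_i) pairs, one per coin.
    """
    T = len(coin_records)
    ps = [k / (grid_size - 1) for k in range(grid_size)]
    density = []
    for p in ps:
        total = sum(comb(n, h) * p**h * (1 - p)**(n - h)
                    for n, h in coin_records)
        density.append(total / T)
    return ps, density

# Example: three coins, flipped 10, 8 and 12 times respectively.
records = [(10, 7), (8, 2), (12, 6)]
ps, f = estimated_prior(records)
```

With a large number of coins this is O(T) work per grid point, so precomputing the curve once on a fixed grid and reusing it is the obvious way to go.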

So, first: is that the correct approach? What would be good terminology to describe this if I wanted to look it up or describe it? Is there anything I should be aware of in particular with this approach (if it is a good one)?

For the second task, my approach was straightforward Bayes' theorem, but unfortunately it gets rather messy. Writing R for the random result of the flips and r for a specific result, I have:

[TEX] \begin{equation}Prob(P = p|R = r) = \frac{Prob(R = r | P = p) Prob(P = p)}{Prob(R = r)} \end{equation}[/TEX]

So let's say for the particular coin I want to generate a pdf for, I am allowed N trials and get heads for H of them. Then:

[TEX]\begin{equation} Prob(R = r | P = p) = {N \choose H}p^{H}(1-p)^{N-H} \end{equation} [/TEX] (From binomial distribution)

[TEX]\begin{equation} Prob(P = p) = \frac{1}{T} \sum_{i=1}^{T} {{n_i}\choose{h_i}}p^{h_i}(1-p)^{n_i-h_i} \end{equation} [/TEX] (From previously generated model)

[TEX]\begin{equation}Prob(R = r) = \int_{p=0}^1 dp {N \choose H}p^{H}(1-p)^{N-H} \frac{1}{T} \sum_{i=1}^{T} {{n_i}\choose{h_i}}p^{h_i}(1-p)^{n_i-h_i} \end{equation} [/TEX] (Combining the above two)
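A numerical sketch of this posterior calculation, using the same (n_i, h_i) record format as before; the function name `posterior` and the trapezoidal grid integration for Prob(R = r) are my own choices here, not anything from a library.

```python
from math import comb

def posterior(N, H, coin_records, grid_size=1001):
    """Return grid points p and values of Prob(P = p | H heads in N flips),
    using the mixture (1/T) sum_i C(n_i, h_i) p^h_i (1-p)^(n_i - h_i)
    as the prior, and normalizing by a trapezoidal-rule integral."""
    T = len(coin_records)
    ps = [k / (grid_size - 1) for k in range(grid_size)]

    def prior(p):
        return sum(comb(n, h) * p**h * (1 - p)**(n - h)
                   for n, h in coin_records) / T

    # Unnormalized posterior: likelihood * prior at each grid point.
    unnorm = [comb(N, H) * p**H * (1 - p)**(N - H) * prior(p) for p in ps]

    # Prob(R = r) = integral over p of the unnormalized posterior.
    dp = ps[1] - ps[0]
    Z = sum((a + b) / 2 * dp for a, b in zip(unnorm, unnorm[1:]))
    return ps, [u / Z for u in unnorm]

# Example: a new coin flipped N = 5 times, giving H = 4 heads.
ps, post = posterior(N=5, H=4, coin_records=[(10, 7), (8, 2), (12, 6)])
```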

I have yet to fully go through the process of working out the integral and simplifying, but I believe it mostly ends up as a bunch of factorials which don't simplify very nicely.
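One step that does come out cleanly: each term of the integral is a standard Beta integral, [TEX]\begin{equation}\int_0^1 p^{a-1}(1-p)^{b-1}\,dp = B(a,b) = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}\end{equation}[/TEX] for positive integers a and b. Applying this term by term with a = H + h_i + 1 and b = N - H + n_i - h_i + 1 gives

[TEX]\begin{equation} Prob(R = r) = \frac{{N \choose H}}{T} \sum_{i=1}^{T} {{n_i}\choose{h_i}} \frac{(H+h_i)!\,(N-H+n_i-h_i)!}{(N+n_i+1)!} \end{equation}[/TEX]

so at least the normalizing constant needs no numerical integration, though the factorials themselves don't cancel further in general.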

So, as before: is that the correct approach? Is there anything I should be aware of in particular with this approach (if it is a good one)? Any ideas on simplifying the final expression, or any algorithmic tricks to make calculating it faster?