Estimating a pdf from binomial trials - point me in the right direction?


I'm afraid I'm not an expert in statistics, but I have a particular problem I'm interested in solving. I'm pretty sure this area already has a lot of literature, but I'm having difficulty finding something directly applicable to what I'm doing, so it would be great if somebody could nudge me in the right direction.

I think the best way to describe my problem is with an example (though if I knew the terminology I could probably do it in about six words). Say I have a machine which produces biased coins. It has some underlying continuous probability density function which it is using to pick a number 0 < P < 1, then creates a coin which will come up heads with probability P.

My first task is to estimate the function which the machine uses to generate these coins. I'm allowed to flip the coins, and I have access to a very large number of coins, but each one after a certain, random, amount of flips is taken away from me.

My second task is, once I've gotten through all of those coins, I will be given new coins, one at a time. Again I'll be able to flip them some random number of times, and then for each one, using that result and the model I've already generated, be asked to determine the probability density function for that particular coin (i.e. the probability that the coin has P = some particular probability p)

My approach so far has been the very straightforward one. I treat each coin as a binomial distribution, sum them up and normalize. This gives:

[TEX] \begin{equation} Prob(P=p) = \frac{1}{T} \sum_{i=0}^T {{n_i}\choose{h_i}}p^{h_i}(1-p)^{n_i-h_i} \end{equation} [/TEX]

Where T is the total number of coins I've flipped, and n is the number of flips of a coin and h is the number of times it came up heads (for the ith coin).

So, first: is that the correct approach? What would be good terminology to describe this if I wanted to look it up or describe it? Is there anything I should be aware of in particular with this approach (if it is a good one)?

For the second task, my approach was straightforward Bayes' theorem, but unfortunately it seems rather messy. Describing R as a result, and r as a specific result, I have:

[TEX] \begin{equation}Prob(P = p|R = r) = \frac{Prob(R = r | P = p) Prob(P = p)}{Prob(R = r)} \end{equation}[/TEX]

So let's say for the particular coin I want to generate a pdf for, I am allowed N trials and get heads for H of them. Then:
[TEX]\begin{equation} Prob(R = r | P = p) = {N \choose H}p^{H}(1-p)^{N-H} \end{equation} [/TEX] (From binomial distribution)
[TEX]\begin{equation} Prob(P = p) = \frac{1}{T} \sum_{i=0}^T {{h_i}\choose{n_i}}p^{h_i}(1-p)^{n_i-h_i} \end{equation} [/TEX] (From previously generated model)
[TEX]\begin{equation}Prob(R = r) = \int_{p=0}^1 dp {N \choose H}p^{H}(1-p)^{N-H} \frac{1}{T} \sum_{i=0}^T {{h_i}\choose{n_i}}p^{h_i}(1-p)^{n_i-h_i} \end{equation} [/TEX] (Combining the above two)

I have yet to fully go through the process of working out the integral and simplifying, but I believe it mostly ends up as a bunch of factorials which don't simplify very nicely.

So, as before: Is that the correct approach? Is there anything I should be aware of in particular with this approach (if it is a good one)? Any idea on simplifying the final expression, or on any algorithmic tricks to make calculcating this faster?


Ambassador to the humans
It sounds like you want to look into the beta-binomial distribution. Then again the 'continuous probability distribution' you're drawing your p from might not be a beta. But it's an idea and the beta distribution is pretty flexible. Also it sounds like you want to do some sort of Bayesian analysis? I don't quite think what you've done is correct so far but I don't really have time to work through and provide more suggestions at the moment.
Just taken a quick glance at beta binomial distributions and they do look like they may have some relevance, especially in terms of the bayesian step. I'll look at whether they provide any simplification to the maths there a little later. Thanks for pointing me towards that!

I don't have much of a sense of how flexible the beta distribution is, but I'd probably prefer to avoid going down the road of assigning a specific distribution and parameter fitting as much as possible. Another consideration is that I also want to be able to generalise this to joint probability distribution functions- to continue the example above, imagine that instead of producing a single coin, the machine produced coins in sets of three, with one red, one blue and one green, using functions for the probability of each which may or may not be mutually independent. I'll think about how to do that a bit more later, for now I just want to focus on the single-variable case.

Also it sounds like you want to do some sort of Bayesian analysis?
Er, yes! Or at least this is the way that I know how to deal with a situation like the one described. If anybody can suggest a better one, I'm all ears. It's likely that I won't have heard of it, or know which kind of approach is appropriate in which situation

I don't quite think what you've done is correct so far
Quite possible. The calculations I've done so far have been a bit back-of-the-envelope, since I haven't wanted to spend huge amounts of time either going down a cul-de-sac or reinventing the wheel until I heard some opinions from people more knowledgeable than me.
Thinking about it, summing and normalising doesn't seem like the correct way of combining multiple observations of different coins. If I had a billion coins, was allowed to flip every one of them once, and got heads every time, I'd end up with the same pdf as if I had only flipped one coin and gotten heads. That can't be right.

Anybody able to suggest what might the right way of doing that?