Say 100 samples have been pulled from a Normal distribution:

• 23 out of 100 are greater than 200

• 9 out of 100 are greater than 300

• 2 out of 100 are greater than 400

How would I find the best estimate of the mean and SD of the distribution based on this information?

Thanks in advance!

- Thread starter schleprock
- Tags maximum likelihood

If I tell you mu and sigma, can you tell me the probability of an observation being less than 200? Greater than 200 but less than 300?

If I also tell you Pr(X>=200 AND X<300), then -- with some effort -- you could give me mu and sigma. But if I also give you Pr(X>=300 AND X<400), and those probabilities are based on observed results (not a Normal distribution w/ known parameters), then I think there probably is no mu and sigma which would describe that. I would be interested to know how you would solve for the most likely mu and sigma in that case.
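That "with some effort" step can be made concrete: if two cell probabilities were known exactly, they would pin down two quantiles, giving two linear equations in mu and sigma. A minimal Python sketch, with made-up probabilities (0.66 and 0.23 are illustrative, not fitted values):

```python
from statistics import NormalDist

# Suppose (hypothetically) the exact cell probabilities were known:
#   Pr(X < 200) = 0.66  and  Pr(200 <= X < 300) = 0.23
# Then Pr(X < 300) = 0.89, and the two cut points pin down two z-scores.
z1 = NormalDist().inv_cdf(0.66)   # z1 = (200 - mu) / sigma
z2 = NormalDist().inv_cdf(0.89)   # z2 = (300 - mu) / sigma

# Two linear equations: 200 = mu + z1*sigma and 300 = mu + z2*sigma
sigma = (300 - 200) / (z2 - z1)
mu = 200 - z1 * sigma
```

With observed proportions instead of exact probabilities, this only works for two cells; a third observed cell generally won't be matched exactly, which is where maximum likelihood comes in.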

But what is the likelihood function here? We're dealing with a cdf, not a pdf, and the Normal cdf has no closed form.

Forget the original problem exists and pretend that you're trying to solve this problem:

The probability of an observation being Red is \(\alpha\), the probability of an observation being Blue is \(\beta\), and the probability of it being neither is \(1-\alpha-\beta\). If you observe 32 reds, 23 blues, and 17 neithers, then what are the MLEs for \(\alpha\) and \(\beta\)?
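A quick sketch of the standard result this practice problem is driving at: for a multinomial, the MLEs are just the sample proportions. (The `loglik` helper below is only for checking; it is not part of the original problem.)

```python
from math import log

reds, blues, neither = 32, 23, 17
n = reds + blues + neither    # 72 observations in total

# For a multinomial, the MLEs are simply the sample proportions.
alpha_hat = reds / n          # 32/72
beta_hat = blues / n          # 23/72

# Log-likelihood: log L(a, b) = 32*log(a) + 23*log(b) + 17*log(1 - a - b)
def loglik(a, b):
    return reds * log(a) + blues * log(b) + neither * log(1 - a - b)
```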

But that doesn't really get me anywhere, does it? I want to know something about the Normal distribution that underlies the percent of observations falling into the various "bins".

So that's solved. But it doesn't appear to get me any closer to using that information to understand the underlying Normal distribution.

Plus it's not solved. The whole point of it was to get you to think about what the likelihood function actually is. There is a direct connection between the problem I gave you and the problem you're trying to do.

Say 100 samples have been pulled from a Normal distribution. Is it that:

23 out of 100 are greater than 200

9 out of 100 are greater than 300

2 out of 100 are greater than 400?

OR is it that:

66 out of 100 are less than 200

23 out of 100 are greater than 200 and less than 300

9 out of 100 are greater than 300 and less than 400

2 out of 100 are greater than 400?

And that all are statistically independent?
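Either way, the bin probabilities are differences of the Normal CDF at the cut points, and the four cells always sum to one. A small Python sketch (the parameter values 250 and 100 are arbitrary, for illustration only):

```python
from statistics import NormalDist

def bin_probs(mu, sigma):
    """Cell probabilities for (-inf, 200), [200, 300), [300, 400), [400, inf)."""
    F = NormalDist(mu, sigma).cdf
    return [F(200), F(300) - F(200), F(400) - F(300), 1 - F(400)]

probs = bin_probs(250.0, 100.0)   # arbitrary illustrative parameters
```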

Thanks for the feedback, but I don't think this is really getting me anywhere. I was really looking for suggested solutions, not additional problems related to the original question.

I think the problem could be equivalently stated that way, but I don't think those outcomes are statistically independent, since the number of samples is fixed. More outcomes less than 200 means that (statistically) fewer will be greater than 400.

So you don't really see how your original problem is actually a multinomial? I wasn't giving you more work for the hell of it. Like I said, it was *directly* related to your original problem. Take a minute and try to find the connection.

To be honest, I'm not sure you're understanding the problem. This may be my fault for not explaining it clearly. So let me try a slightly different angle.

You suggested thinking about the problem in terms of a multinomial (binomial) -- which I agreed with -- and suggested applying MLE -- which I also agreed with. However, ultimately I'm not interested in just solving for p (probability of success, or probability of "blue" in your example). I’m interested in solving for the parameters of the underlying distribution that drives p.

So yes, I can do as you suggested and establish the likelihood function for a multinomial distribution, take the natural log, take the derivative with respect to p, and solve for the p that maximizes the log-likelihood. Then what? I'm left with an estimate of p, which doesn't get me anywhere, because p is a function of mu and sigma. My first thought was to substitute the Normal CDF for p -- in other words, include mu and sigma in the likelihood function explicitly -- but again the CDF has no closed-form expression, so this didn't seem to be a viable approach.

I don’t know if this is coming across, but in any case I don’t think the problem is as easy as you are making it out to be. If I’m mistaken about any of the above, please let me know where.

So you agree that it's multinomial given the bin probabilities right? And how do you find the bin probabilities? Well they're just probabilities derived from the normal distribution (so these probabilities are functions of mu and sigma). So there you have it - that's how you get your likelihood.

Trust me I completely understand the problem - I was trying to get you to make that last step but I think you were just stopping at "oh I guess it could be a multinomial but I don't care about the ps" when you should have taken it one step further and realized the ps are functions of mu and sigma.
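The recipe described above can be sketched end to end: build the cell probabilities from the Normal CDF, form the multinomial log-likelihood, and maximize it numerically. A stdlib-only Python sketch, assuming the non-cumulative reading of the counts (66, 23, 9, 2); the crude grid search is just a stand-in for a proper optimizer:

```python
from math import log
from statistics import NormalDist

counts = [66, 23, 9, 2]        # cells: (-inf, 200), [200, 300), [300, 400), [400, inf)
cuts = (200.0, 300.0, 400.0)

def negloglik(mu, sigma):
    F = NormalDist(mu, sigma).cdf
    probs = [F(cuts[0]), F(cuts[1]) - F(cuts[0]),
             F(cuts[2]) - F(cuts[1]), 1 - F(cuts[2])]
    probs = [max(p, 1e-12) for p in probs]   # guard against log(0) at extreme parameters
    return -sum(n * log(p) for n, p in zip(counts, probs))

# Crude two-dimensional grid search over (mu, sigma).
best = min(((negloglik(m, s), m, s)
            for m in range(50, 351, 5)
            for s in range(40, 301, 5)),
           key=lambda t: t[0])
_, mu_hat, sigma_hat = best
```

No closed form for the Normal CDF is ever needed: it only has to be *evaluated*, which every numerical library does, and the maximization itself is numerical.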

I completely understand the problem. What I'm trying to get you to do is set up the **** likelihood! You just seem too stubborn to do that.

Rather than laboriously type out the likelihood function in a response -- finding the right symbol codes and whatnot -- I linked yesterday to a web page that showed the likelihood function (admittedly, for a binomial) that I would've ended up typing. Here it is again: https://onlinecourses.science.psu.edu/stat504/node/28. Regardless, you can safely assume for the purposes of this and future discussions that I understand MLE, how to set up a likelihood function, etc.

I think this is where we diverge. I've agreed several times that these probabilities are driven by mu and sigma. But the function is not closed-form. The PDF is, but the CDF isn't: it's a definite integral of the PDF from -infinity to x. So if you could please explain to me how to use that integral (i.e., the Normal CDF) in place of p such that I can maximize likelihood with respect to mu and sigma, that would be fantastic. Or if there's a simpler way to do it, that would be even better. Either way, I hope you can see why the maximum likelihood function for a multinomial isn't the piece of the puzzle that I'm missing.

See above. I fully realize that the ps are functions of mu and sigma. What I don’t yet understand – but am hoping you do – is how to employ the CDF of a Normal in the multinomial likelihood function.
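Putting the pieces of the discussion together, the likelihood in question -- with \(\Phi\) the standard Normal CDF and the binned counts 66, 23, 9, 2 -- would be

\[
L(\mu,\sigma) \propto \Phi\!\left(\frac{200-\mu}{\sigma}\right)^{66}\left[\Phi\!\left(\frac{300-\mu}{\sigma}\right)-\Phi\!\left(\frac{200-\mu}{\sigma}\right)\right]^{23}\left[\Phi\!\left(\frac{400-\mu}{\sigma}\right)-\Phi\!\left(\frac{300-\mu}{\sigma}\right)\right]^{9}\left[1-\Phi\!\left(\frac{400-\mu}{\sigma}\right)\right]^{2}.
\]

The point is that no closed form for \(\Phi\) is required: it can be evaluated numerically to high precision, so the log-likelihood can be maximized numerically over \(\mu\) and \(\sigma\).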
