Say 100 samples have been pulled from a Normal distribution with unknown mean and variance. All I know about the results is the following:
• 23 out of 100 are greater than 200
• 9 out of 100 are greater than 300
• 2 out of 100 are greater than 400
How would I find the best estimate of the mean and SD of the distribution based on this information?
Thanks in advance!
Do you know anything about maximum likelihood estimation?
I don't have emotions and sometimes that makes me very sad.
Yes, but I'm not clear how that would apply in this situation. Individual observations aren't available, so it doesn't seem like MLE applied to the PDF is helpful. The CDF has no closed form solution. I did think about using MLE and a binomial distribution (combined w/ an approximation to the Normal CDF) to solve for the mu and sigma, but it's unwieldy, and honestly I feel like I'm over-complicating things. Seems like a straightforward problem, but for some reason the answer isn't obvious to me. Thanks for your help!
If I tell you mu and sigma can you tell me the probability of an observation being less than 200? Greater than 200 but less than 300?
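To make that question concrete, here is a minimal sketch of how the bin probabilities could be computed once mu and sigma are given. It uses Python's standard-library NormalDist; the function name bin_probabilities and the trial values mu=150, sigma=80 are just illustrative assumptions, not anything from the original problem.

```python
from statistics import NormalDist

def bin_probabilities(mu, sigma, cutoffs=(200, 300, 400)):
    """Probabilities of X < 200, 200 <= X < 300, 300 <= X < 400, X >= 400
    for X ~ Normal(mu, sigma). Hypothetical helper for illustration."""
    dist = NormalDist(mu, sigma)
    # CDF evaluated at each cutoff, padded with 0 and 1 for the open-ended bins
    cdf_values = [0.0] + [dist.cdf(c) for c in cutoffs] + [1.0]
    # Each bin probability is the difference of consecutive CDF values
    return [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]

# Arbitrary trial parameters (an assumption, chosen only to show the call)
probs = bin_probabilities(mu=150, sigma=80)
print(probs)
```

The CDF has no closed form, but it is cheap to evaluate numerically, which is all this step needs.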
Of course. But if I tell you Pr(X<200), can you tell me mu and sigma? No, because there is not a unique solution.
If I also tell you Pr(X>=200 AND X<300), then -- with some effort -- you could give me mu and sigma. But if I also give you Pr(X>=300 AND X<400), and those probabilities are based on observed results (not a Normal distribution w/ known parameters), then I think there probably is no mu and sigma which would describe that. I would be interested to know how you would solve for the most likely mu and sigma in that case.
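As a rough sketch of the "with some effort" step: two tail probabilities pin down mu and sigma exactly through the inverse standard Normal CDF. The helper name normal_from_two_tail_probs is hypothetical, and plugging in 23/100 and 9/100 as if they were exact probabilities is an assumption made only for illustration.

```python
from statistics import NormalDist

def normal_from_two_tail_probs(x1, p_above_x1, x2, p_above_x2):
    """Solve for (mu, sigma) such that P(X > x1) = p_above_x1 and
    P(X > x2) = p_above_x2. Hypothetical helper for illustration."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # z-scores of the two cutoffs: (x - mu) / sigma = inv_cdf(P(X < x))
    z1 = std_normal.inv_cdf(1 - p_above_x1)
    z2 = std_normal.inv_cdf(1 - p_above_x2)
    # Two linear equations in mu and sigma: x1 = mu + sigma*z1, x2 = mu + sigma*z2
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - sigma * z1
    return mu, sigma

# Assumption: treat the observed proportions above 200 and 300 as exact
mu, sigma = normal_from_two_tail_probs(200, 0.23, 300, 0.09)
print(mu, sigma)

# The third cutoff is not used in the solve, so its implied tail probability
# will generally differ from the observed 2/100
print(1 - NormalDist(mu, sigma).cdf(400))
```

This matches two of the three observed proportions exactly and ignores the third, which is why it is not the same thing as a maximum likelihood fit.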
I think you're forgetting how maximum likelihood works though. You basically say "if mu = 50 and sigma = 30 what is the probability of observing the data" (the likelihood) and your goal is to find the values of mu and sigma that maximize that. So pretend that you know mu and sigma for a second - can you write out the joint distribution of the observed data? Then it becomes a task of finding which values maximize that.
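One way to sketch that "pretend you know mu and sigma" step in code is below. It assumes the counts in the question are cumulative tail counts (so 77 below 200, 14 in 200 to 300, 7 in 300 to 400, 2 at 400 or above); that reading, and the function name binned_log_likelihood, are assumptions for illustration.

```python
from math import log
from statistics import NormalDist

CUTOFFS = (200, 300, 400)
# Assumption: 23/9/2 are cumulative "greater than" counts, giving these bin counts
COUNTS = (77, 14, 7, 2)

def binned_log_likelihood(mu, sigma, cutoffs=CUTOFFS, counts=COUNTS):
    """Log-likelihood of the bin counts under Normal(mu, sigma),
    dropping the constant term that does not depend on mu or sigma."""
    dist = NormalDist(mu, sigma)
    cdf_values = [0.0] + [dist.cdf(c) for c in cutoffs] + [1.0]
    total = 0.0
    for count, lo, hi in zip(counts, cdf_values, cdf_values[1:]):
        p = max(hi - lo, 1e-300)  # guard against log(0) for far-off parameters
        total += count * log(p)
    return total

# Compare two candidate parameter pairs; the higher value is the better fit
print(binned_log_likelihood(50, 30))
print(binned_log_likelihood(100, 150))
```

Finding the MLE then means searching over mu and sigma for the pair that makes this number as large as possible.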
It's funny you say that, because I thought the problem might be that I'm too stuck on EXACTLY how MLE works in normal circumstances! Normally, we'd be dealing with individual observations, not some result like "23 of 100 obs are >200". Given individual observations, the path is to use the probability density function to establish a likelihood function, and then maximize the likelihood with respect to the individual parameters.
But what is the likelihood function here?? We're dealing with a cdf, not a pdf, and the Normal cdf is not closed form.
Well it might have been normally distributed originally but that's not what you see now. Now all you have is binned data but given the parameters you can find the probabilities of the bins.
Forget the original problem exists and pretend that you're trying to solve this problem:
The probability of an observation being Red is alpha, the probability of an observation being Blue is beta, and the probability of it being neither is 1 - alpha - beta. If you observe 32 reds, 23 blues, and 17 neithers, then what are the MLEs for alpha and beta?
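For reference, the likelihood for this Red/Blue/neither setup can be written out as follows. This is only a sketch: alpha and beta denote the Red and Blue probabilities, 1 - alpha - beta is the "neither" probability, and the leading multinomial coefficient does not involve the parameters, so it can be ignored when maximizing.

```latex
L(\alpha, \beta) = \frac{72!}{32!\,23!\,17!}\;
  \alpha^{32}\,\beta^{23}\,(1 - \alpha - \beta)^{17}

\log L(\alpha, \beta) = \text{const}
  + 32 \log \alpha + 23 \log \beta + 17 \log(1 - \alpha - \beta)
```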
I think you're hitting on one of the complications, which is that the information given speaks more directly to a binomial distribution than a Normal distribution. In your example, I don't think any MLE is needed; the best estimate for alpha is 44.4% (32/72) and for beta it's 31.9% (23/72).
But that doesn't really get me anywhere, does it? I want to know something about the Normal distribution that underlies the percent of observations falling into the various "bins".
In the example you gave -- what are the MLEs of alpha (Prob of an observation being red) and beta (Prob of an observation being blue) -- you're talking about a binomial distribution. k successes in n trials. This page (https://onlinecourses.science.psu.edu/stat504/node/28) walks through the calculation better than I could in this quick reply, but the conclusion is that the MLE of alpha is k/n, or number of successes (32) divided by number of trials (72), or 44.4%.
So that's solved. But it doesn't appear to get me any closer to using that information to understand the underlying Normal distribution.
No it's technically not a binomial distribution. It would be a multinomial distribution (with a binomial there are only two possibilities).
Plus it's not solved. The whole point of it was to get you to think about what the likelihood function actually is. There is a direct connection between the problem I gave you and the problem you're trying to do.
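To illustrate that connection, here is a hedged sketch of one way the binned likelihood could be maximized numerically. It assumes the counts are cumulative (77, 14, 7, 2 per bin), and it uses a simple coarse-to-fine grid search rather than a proper optimizer; the search ranges, round count, and function names are all illustrative assumptions.

```python
from math import log
from statistics import NormalDist

CUTOFFS = (200, 300, 400)
COUNTS = (77, 14, 7, 2)  # assumption: 23/9/2 read as cumulative tail counts

def log_likelihood(mu, sigma):
    """Multinomial log-likelihood (constant dropped) with bin
    probabilities taken from Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    cdf_values = [0.0] + [dist.cdf(c) for c in CUTOFFS] + [1.0]
    return sum(n * log(max(hi - lo, 1e-300))
               for n, lo, hi in zip(COUNTS, cdf_values, cdf_values[1:]))

def fit_by_grid_search(mu_range=(-200.0, 400.0), sigma_range=(10.0, 500.0),
                       rounds=6, steps=40):
    """Evaluate the log-likelihood on a grid, then repeatedly halve the
    search window around the best grid point found so far."""
    mu_lo, mu_hi = mu_range
    s_lo, s_hi = sigma_range
    best = None
    for _ in range(rounds):
        mus = [mu_lo + i * (mu_hi - mu_lo) / steps for i in range(steps + 1)]
        sigmas = [s_lo + j * (s_hi - s_lo) / steps for j in range(steps + 1)]
        best = max((log_likelihood(m, s), m, s) for m in mus for s in sigmas)
        _, m_best, s_best = best
        mu_half = (mu_hi - mu_lo) / 4   # new window is half as wide
        s_half = (s_hi - s_lo) / 4
        mu_lo, mu_hi = m_best - mu_half, m_best + mu_half
        s_lo, s_hi = max(s_best - s_half, 1e-6), s_best + s_half  # keep sigma > 0
    ll, mu_hat, sigma_hat = best
    return mu_hat, sigma_hat, ll

mu_hat, sigma_hat, ll_hat = fit_by_grid_search()
print(mu_hat, sigma_hat, ll_hat)
```

With only four bins the estimates will be rough, and a library optimizer would be a more usual choice than a grid search; the point is only that alpha and beta in the toy problem become functions of mu and sigma here.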
Are the conditions here that:
Say 100 samples have been pulled from a Normal distribution
23 out of 100 are greater than 200
9 out of 100 are greater than 300
2 out of 100 are greater than 400?
OR is it that:
66 out of 100 are less than 200
23 out of 100 are greater than 200 and less than 300
9 out of 100 are greater than 300 and less than 400
2 out of 100 are greater than 400?
And that all are statistically independent?
Yes, it's technically multinomial. However, it's just as easily stated as a binomial ("blue" or "not blue").
Thanks for the feedback, but I don't think this is really getting me anywhere. I was really looking for suggested solutions, not additional problems related to the original question.