Thread: parameter estimation based on observed survival probabilities

1. parameter estimation based on observed survival probabilities

Say 100 samples have been pulled from a Normal distribution with unknown mean and variance. All I know about the results are the following:

• 23 out of 100 are greater than 200
• 9 out of 100 are greater than 300
• 2 out of 100 are greater than 400

How would I find the best estimate of the mean and SD of the distribution based on this information?

2. Re: parameter estimation based on observed survival probabilities

Do you know anything about maximum likelihood estimation?

3. Re: parameter estimation based on observed survival probabilities

Yes, but I'm not clear how that would apply in this situation. Individual observations aren't available, so it doesn't seem like MLE applied to the PDF is helpful. The CDF has no closed form solution. I did think about using MLE and a binomial distribution (combined w/ an approximation to the Normal CDF) to solve for the mu and sigma, but it's unwieldy, and honestly I feel like I'm over-complicating things. Seems like a straightforward problem, but for some reason the answer isn't obvious to me. Thanks for your help!

4. Re: parameter estimation based on observed survival probabilities

If I tell you mu and sigma can you tell me the probability of an observation being less than 200? Greater than 200 but less than 300?
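For anyone following along, here is a minimal sketch of that calculation, assuming Python's standard-library `statistics.NormalDist`. The helper name `bin_probabilities` and the trial values mu = 150, sigma = 100 are made up for illustration.

```python
from statistics import NormalDist

def bin_probabilities(mu, sigma, cutoffs=(200, 300, 400)):
    """Probabilities of X < c1, c1 <= X < c2, ..., X >= c_last for X ~ Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    # CDF at each cutoff, padded with 0 and 1 so that consecutive
    # differences give the probability of each bin.
    cdf = [0.0] + [dist.cdf(c) for c in cutoffs] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

# Hypothetical trial parameters, not estimates.
probs = bin_probabilities(mu=150, sigma=100)
print(probs)
```

The four values always sum to 1, whatever mu and sigma are.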

5. Re: parameter estimation based on observed survival probabilities

Originally Posted by Dason
If I tell you mu and sigma can you tell me the probability of an observation being less than 200? Greater than 200 but less than 300?
Of course. But if I tell you Pr(X<200), can you tell me mu and sigma? No, because there is not a unique solution.

If I also tell you Pr(X>=200 AND X<300), then -- with some effort -- you could give me mu and sigma. But if I also give you Pr(X>=300 AND X<400), and those probabilities are based on observed results (not a Normal distribution w/ known parameters), then I think there probably is no mu and sigma which would describe that. I would be interested to know how you would solve for the most likely mu and sigma in that case.
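To make the two-probability case concrete, here is a rough sketch that backs out mu and sigma from two tail probabilities using the standard Normal quantile function. The function name `normal_from_two_tails` is hypothetical, and the inputs 0.23 and 0.09 are the figures from the first post, read as Pr(X>200) and Pr(X>300).

```python
from statistics import NormalDist

def normal_from_two_tails(x1, tail1, x2, tail2):
    """Solve for (mu, sigma) given Pr(X > x1) = tail1 and Pr(X > x2) = tail2."""
    std = NormalDist()               # standard Normal
    z1 = std.inv_cdf(1 - tail1)      # z-score matching the first cutoff
    z2 = std.inv_cdf(1 - tail2)      # z-score matching the second cutoff
    # Two linear equations: mu + sigma * z1 = x1 and mu + sigma * z2 = x2.
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - sigma * z1
    return mu, sigma

mu, sigma = normal_from_two_tails(200, 0.23, 300, 0.09)
# Tail probability above 400 implied by those two constraints alone.
third_tail = 1 - NormalDist(mu, sigma).cdf(400)
print(mu, sigma, third_tail)
```

Two constraints pin down the two parameters exactly. A third observed tail (0.02 above 400) makes the system overdetermined, so `third_tail` will generally not match it.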

6. Re: parameter estimation based on observed survival probabilities

I think you're forgetting how maximum likelihood works though. You basically say "if mu = 50 and sigma = 30 what is the probability of observing the data" (the likelihood) and your goal is to find the values of mu and sigma that maximize that. So pretend that you know mu and sigma for a second - can you write out the joint distribution of the observed data? Then it becomes a task of finding which values maximize that.
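As a hedged illustration of "the probability of observing the data", the sketch below evaluates a log-likelihood for the binned counts at a trial (mu, sigma). It assumes the figures in the first post are cumulative, which gives bin counts of 77, 14, 7 and 2, and it drops the multinomial coefficient because that does not depend on the parameters. The function name `log_likelihood` and the second trial point (100, 150) are made up for the example.

```python
import math
from statistics import NormalDist

CUTOFFS = (200, 300, 400)
# Assumed bin counts: below 200, 200-300, 300-400, above 400.
COUNTS = (77, 14, 7, 2)

def log_likelihood(mu, sigma):
    """Log-likelihood of the binned counts, up to an additive constant."""
    dist = NormalDist(mu, sigma)
    cdf = [0.0] + [dist.cdf(c) for c in CUTOFFS] + [1.0]
    total = 0.0
    for count, lo, hi in zip(COUNTS, cdf, cdf[1:]):
        p = hi - lo                  # probability of this bin
        if p <= 0.0:                 # bin is impossible in floating point
            return -math.inf
        total += count * math.log(p)
    return total

ll_a = log_likelihood(50, 30)     # the trial values mentioned in the post
ll_b = log_likelihood(100, 150)   # a second, arbitrary trial point
print(ll_a, ll_b)
```

The trial point with the higher log-likelihood explains the observed counts better. Maximum likelihood is the search for the pair that does best overall.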

7. Re: parameter estimation based on observed survival probabilities

It's funny you say that, because I thought the problem might be that I'm too stuck on EXACTLY how MLE works in normal circumstances! Normally, we'd be dealing with individual observations, not some result like "23 of 100 obs are >200". Given individual observations, the path is to use the probability density function to establish a likelihood function, and then maximize the likelihood with respect to the individual parameters.

But what is the likelihood function here?? We're dealing with a cdf, not a pdf, and the Normal cdf is not closed form.

8. Re: parameter estimation based on observed survival probabilities

Well, it might have been normally distributed originally, but that's not what you see now. Now all you have is binned data, but given the parameters you can find the probabilities of the bins.

Forget the original problem exists and pretend that you're trying to solve this problem:

The probability of an observation being Red is $\alpha$, the probability of an observation being Blue is $\beta$, and the probability of it being neither is $1 - \alpha - \beta$. If you observe 32 reds, 23 blues, and 17 neithers, then what are the MLEs for $\alpha$ and $\beta$?
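For reference, one way to write the likelihood for this toy problem is sketched below. It uses $\alpha$ and $\beta$ for the red and blue probabilities (the names used later in the thread) and assumes the three counts follow a multinomial distribution with 72 trials.

```latex
\begin{align*}
L(\alpha, \beta) &\propto \alpha^{32}\, \beta^{23}\, (1 - \alpha - \beta)^{17} \\
\log L(\alpha, \beta) &= 32 \log \alpha + 23 \log \beta + 17 \log(1 - \alpha - \beta) + \text{const} \\
\frac{\partial \log L}{\partial \alpha} &= \frac{32}{\alpha} - \frac{17}{1 - \alpha - \beta} = 0, \qquad
\frac{\partial \log L}{\partial \beta} = \frac{23}{\beta} - \frac{17}{1 - \alpha - \beta} = 0 \\
\hat{\alpha} &= \frac{32}{72}, \qquad \hat{\beta} = \frac{23}{72}
\end{align*}
```

The sample proportions come out as the maximizers because the cell probabilities are free parameters. If the cell probabilities were instead functions of some other parameters, the same likelihood would be maximized over those parameters.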

9. Re: parameter estimation based on observed survival probabilities

Originally Posted by Dason
Well, it might have been normally distributed originally, but that's not what you see now. Now all you have is binned data, but given the parameters you can find the probabilities of the bins.

Forget the original problem exists and pretend that you're trying to solve this problem:

The probability of an observation being Red is $\alpha$, the probability of an observation being Blue is $\beta$, and the probability of it being neither is $1 - \alpha - \beta$. If you observe 32 reds, 23 blues, and 17 neithers, then what are the MLEs for $\alpha$ and $\beta$?
I think you're hitting on one of the complications, which is that the information given speaks more directly to a binomial distribution than a Normal distribution. In your example, I don't think any MLE is needed; the best estimate for $\alpha$ is 44.4% (32/72) and for $\beta$ it's 31.9% (23/72).

But that doesn't really get me anywhere, does it? I want to know something about the Normal distribution that underlies the percent of observations falling into the various "bins".

10. Re: parameter estimation based on observed survival probabilities

Originally Posted by schleprock2
I don't think any MLE is needed
That's not a good way to think about this. Take a step back and ask yourself how would you find the MLE in this case.

11. Re: parameter estimation based on observed survival probabilities

In the example you gave -- what are the MLEs of alpha (Prob of an observation being red) and beta (Prob of an observation being blue) -- you're talking about a binomial distribution: k successes in n trials. This page (https://onlinecourses.science.psu.edu/stat504/node/28) walks through the calculation better than I could in this quick reply, but the conclusion is that the MLE of alpha is k/n, or number of successes (32) divided by number of trials (72), or 44.4%.

So that's solved. But it doesn't appear to get me any closer to using that information to understand the underlying Normal distribution.

12. Re: parameter estimation based on observed survival probabilities

No, it's technically not a binomial distribution. It would be a multinomial distribution (with a binomial there are only two possibilities).

Plus it's not solved. The whole point of it was to get you to think about what the likelihood function actually is. There is a direct connection between the problem I gave you and the problem you're trying to do.
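One way to sketch that connection in code: treat the four bin counts as multinomial, let the bin probabilities come from the Normal CDF, and maximize over mu and sigma numerically. This is only an illustration under stated assumptions. The counts 77, 14, 7, 2 assume the original figures are cumulative, the names `neg_log_likelihood` and `fit` are hypothetical, and the crude compass search stands in for whatever optimizer you would normally use.

```python
import math
from statistics import NormalDist

CUTOFFS = (200, 300, 400)
# Assumed bin counts: below 200, 200-300, 300-400, above 400.
COUNTS = (77, 14, 7, 2)

def neg_log_likelihood(mu, sigma):
    """Negative multinomial log-likelihood with Normal-CDF bin probabilities."""
    if sigma <= 0:
        return math.inf
    dist = NormalDist(mu, sigma)
    cdf = [0.0] + [dist.cdf(c) for c in CUTOFFS] + [1.0]
    total = 0.0
    for count, lo, hi in zip(COUNTS, cdf, cdf[1:]):
        p = hi - lo
        if p <= 0.0:
            return math.inf
        total -= count * math.log(p)
    return total

def fit(mu=100.0, sigma=100.0, step=50.0, tol=1e-6):
    """Compass search: try +/- step in each parameter, halve step when stuck."""
    best = neg_log_likelihood(mu, sigma)
    while step > tol:
        improved = False
        for dmu, dsigma in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = neg_log_likelihood(mu + dmu, sigma + dsigma)
            if cand < best:
                mu, sigma, best = mu + dmu, sigma + dsigma, cand
                improved = True
        if not improved:
            step /= 2
    return mu, sigma, best

mu_hat, sigma_hat, nll_hat = fit()
print(mu_hat, sigma_hat)
```

The starting point and step size are arbitrary choices. A library optimizer would do the same job with less code if one is available.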

13. Re: parameter estimation based on observed survival probabilities

Are the conditions here that:
Say 100 samples have been pulled from a Normal distribution
23 out of 100 are greater than 200
9 out of 100 are greater than 300
2 out of 100 are greater than 400?

OR is it that:

66 out of 100 are less than 200
23 out of 100 are greater than 200 and less than 300
9 out of 100 are greater than 300 and less than 400
2 out of 100 are greater than 400?
And that all are statistically independent?

14. Re: parameter estimation based on observed survival probabilities

Originally Posted by Dason
No, it's technically not a binomial distribution. It would be a multinomial distribution (with a binomial there are only two possibilities).

Plus it's not solved. The whole point of it was to get you to think about what the likelihood function actually is. There is a direct connection between the problem I gave you and the problem you're trying to do.
Yes, it's technically multinomial. However, it's just as easily stated as a binomial ("blue" or "not blue").

Thanks for the feedback, but I don't think this is really getting me anywhere. I was really looking for suggested solutions, not additional problems related to the original question.

15. Re: parameter estimation based on observed survival probabilities

Originally Posted by GretaGarbo
Are the conditions here that:
Say 100 samples have been pulled from a Normal distribution
23 out of 100 are greater than 200
9 out of 100 are greater than 300
2 out of 100 are greater than 400?

OR is it that:

66 out of 100 are less than 200
23 out of 100 are greater than 200 and less than 300
9 out of 100 are greater than 300 and less than 400
2 out of 100 are greater than 400?
And that all are statistically independent?
Hi Greta-

I think the problem could be equivalently stated that way, but I don't think those outcomes are statistically independent, since the number of samples is fixed. More outcomes less than 200 means that (statistically) fewer will be greater than 400.
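To put a formula on that dependence, assume the bin counts $N_1, \dots, N_k$ are multinomial with a fixed total $n$ and bin probabilities $p_1, \dots, p_k$ (notation introduced here, not taken from the thread). Their variances and covariances are then:

```latex
\operatorname{Var}(N_i) = n\, p_i (1 - p_i), \qquad
\operatorname{Cov}(N_i, N_j) = -n\, p_i\, p_j \quad (i \neq j)
```

The covariance is negative for every pair of distinct bins, which matches the intuition that more outcomes in one bin leave fewer for the others when $n$ is fixed at 100.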
