# Thread: parameter estimation based on observed survival probabilities

1. ## Re: parameter estimation based on observed survival probabilities

Originally Posted by schleprock2
Thanks for the feedback, but I don't think this is really getting me anywhere. I was really looking for suggested solutions, not additional problems related to the original question.
So you don't really see how your original problem is actually a multinomial? I wasn't giving you more work for the hell of it. Like I said it was *directly* related to your original problem. Take a minute and try to find the connection.

2. ## Re: parameter estimation based on observed survival probabilities

Originally Posted by Dason
So you don't really see how your original problem is actually a multinomial? I wasn't giving you more work for the hell of it. Like I said it was *directly* related to your original problem. Take a minute and try to find the connection.
I believe I already agreed that it was multinomial, though I also said I felt it could be stated as binomial. After all, the former is just a generalization of the latter.

To be honest, I'm not sure you're understanding the problem. This may be my fault for not explaining it clearly. So let me try a slightly different angle.

You suggested thinking about the problem in terms of a multinomial (binomial) -- which I agreed with -- and suggested applying MLE -- which I also agreed with. However, ultimately I'm not interested in just solving for p (probability of success, or probability of "blue" in your example). I’m interested in solving for the parameters of the underlying distribution that drives p.

So yes, I can do as you suggested and establish the likelihood function for a multinomial distribution, take the natural log, take the derivative with respect to p, and solve for the p that maximizes log-likelihood. Then what? I’m left with an estimate of p, which doesn’t get me anywhere because p is a function of mu and sigma. My first thought was to substitute the Normal CDF as p – in other words include mu and sigma in the likelihood function explicitly – but again the CDF has no closed form solution, so this didn’t seem to be a viable approach.

I don’t know if this is coming across, but in any case I don’t think the problem is as easy as you are making it out to be. If I’m mistaken about any of the above, please let me know where.

3. ## Re: parameter estimation based on observed survival probabilities

I completely understand the problem. What I'm trying to get you to do is set up the **** likelihood! You just seem too stubborn to do that.

So you agree that it's multinomial given the bin probabilities right? And how do you find the bin probabilities? Well they're just probabilities derived from the normal distribution (so these probabilities are functions of mu and sigma). So there you have it - that's how you get your likelihood.

Trust me I completely understand the problem - I was trying to get you to make that last step but I think you were just stopping at "oh I guess it could be a multinomial but I don't care about the ps" when you should have taken it one step further and realized the ps are functions of mu and sigma.

4. ## Re: parameter estimation based on observed survival probabilities

Originally Posted by Dason
I completely understand the problem. What I'm trying to get you to do is set up the **** likelihood! You just seem too stubborn to do that.
Rather than laboriously type out the likelihood function in a response -- finding the right symbol codes and whatnot -- I linked yesterday to a web page that showed the likelihood function (admittedly, for a binomial) that I would've ended up typing. Here it is again: https://onlinecourses.science.psu.edu/stat504/node/28. Regardless, you can safely assume for the purposes of this and future discussions that I understand MLE, how to set up a likelihood function, etc.

Originally Posted by Dason
So you agree that it's multinomial given the bin probabilities right? And how do you find the bin probabilities? Well they're just probabilities derived from the normal distribution (so these probabilities are functions of mu and sigma). So there you have it - that's how you get your likelihood.
I think this is where we diverge. I've agreed several times that these probabilities are driven by mu and sigma. But the function is not closed-form. The PDF is, but the CDF isn't...it's a definite integration of the PDF from -infinity to x. So if you could please explain to me how to use that integral (i.e. the Normal CDF) in place of p such that I can maximize likelihood relative to mu and sigma, that would be fantastic. Or if there's a simpler way to do it, that would be even better. Either way, I hope you can see where the maximum likelihood function for a multinomial isn’t the piece of the puzzle that I’m missing.

Originally Posted by Dason
Trust me I completely understand the problem - I was trying to get you to make that last step but I think you were just stopping at "oh I guess it could be a multinomial but I don't care about the ps" when you should have taken it one step further and realized the ps are functions of mu and sigma.
See above. I fully realize that the ps are functions of mu and sigma. What I don’t yet understand – but am hoping you do – is how to employ the CDF of a Normal in the multinomial likelihood function.

5. ## Re: parameter estimation based on observed survival probabilities

Originally Posted by schleprock2
Rather than laboriously type out the likelihood function in a response -- finding the right symbol codes and whatnot -- I linked yesterday to a web page that showed the likelihood function (admittedly, for a binomial) that I would've ended up typing. Here it is again: https://onlinecourses.science.psu.edu/stat504/node/28. Regardless, you can safely assume for the purposes of this and future discussions that I understand MLE, how to set up a likelihood function, etc.

I think this is where we diverge. I've agreed several times that these probabilities are driven by mu and sigma. But the function is not closed-form. The PDF is, but the CDF isn't...it's a definite integration of the PDF from -infinity to x. So if you could please explain to me how to use that integral (i.e. the Normal CDF) in place of p such that I can maximize likelihood relative to mu and sigma, that would be fantastic. Or if there's a simpler way to do it, that would be even better. Either way, I hope you can see where the maximum likelihood function for a multinomial isn’t the piece of the puzzle that I’m missing.

See above. I fully realize that the ps are functions of mu and sigma. What I don’t yet understand – but am hoping you do – is how to employ the CDF of a Normal in the multinomial likelihood function.
Just use the CDF. There isn't a closed form for it but we do have a symbol for the function and software can evaluate the CDF. Just like how you might not be able to easily tell me what the square root of 23423421 is without using a calculator we can do the same thing for the normal CDF.
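As a small illustration of "software can evaluate the CDF": a minimal sketch, assuming only Python's standard library. The helper name `normal_cdf` is illustrative and not something from this thread; it relies on the standard identity Phi(z) = (1 + erf(z / sqrt(2))) / 2.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF at x, evaluated numerically via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# No closed form is needed: for any given mu and sigma this is just a number.
print(normal_cdf(0.0))                          # standard normal at its mean
print(normal_cdf(250.0, mu=200.0, sigma=50.0))  # one standard deviation above the mean
```

In other words, the CDF can be treated like `sqrt` or `log`: a named function that the computer evaluates whenever concrete parameter values are supplied.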

6. ## Re: parameter estimation based on observed survival probabilities

Originally Posted by Dason
Just use the CDF. There isn't a closed form for it but we do have a symbol for the function and software can evaluate the CDF. Just like how you might not be able to easily tell me what the square root of 23423421 is without using a calculator we can do the same thing for the normal CDF.
Yes, of course software can evaluate the CDF. Or, equivalently, I could look it up in a table in the back of a textbook. But earlier you were suggesting that I just incorporate this into a log-likelihood function. How does that software or that table help me here? I'll reflect your earlier suggestion back at you: try substituting the Normal CDF formula for the p in a multinomial likelihood function and you'll see what I mean.

7. ## Re: parameter estimation based on observed survival probabilities

You don't just put the CDF in as p. Plus you have more than one "p" since it's multinomial (you really need to quit pretending that it's binomial). You can use the CDF to get the values for each p though. If I told you mu=100 and sigma = 20 can you tell me the probability of an observation falling into each "bin"? Now instead of using actual numbers do the same thing but keep it as mu and sigma.

The likelihood (or log-likelihood) is just a function: given choices for your parameters, it gives you some number. It doesn't matter if you have to evaluate a normal CDF or anything like that. Whenever you evaluate the likelihood you have values for the parameters, so it's perfectly possible to evaluate the corresponding normal CDF.
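To make that concrete, here is a minimal sketch using mu = 100 and sigma = 20 from the question above. The cut points (80, 100, 120) and the bin counts are hypothetical values chosen only for illustration; they are not the OP's data, and the function names are illustrative too.

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal CDF evaluated numerically via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_probs(mu, sigma, cuts):
    """Probabilities of the bins (-inf, c1], (c1, c2], ..., (ck, +inf)."""
    cdf = [0.0] + [normal_cdf(c, mu, sigma) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]

def multinomial_loglik(mu, sigma, cuts, counts):
    """Multinomial log-likelihood, dropping the constant multinomial coefficient."""
    total = 0.0
    for n, p in zip(counts, bin_probs(mu, sigma, cuts)):
        if n > 0:
            if p <= 0.0:
                return float("-inf")  # observed data impossible under these parameters
            total += n * math.log(p)
    return total

cuts = [80.0, 100.0, 120.0]   # hypothetical cut points
counts = [10, 40, 35, 15]     # hypothetical bin counts

print(bin_probs(100.0, 20.0, cuts))
print(multinomial_loglik(100.0, 20.0, cuts, counts))
print(multinomial_loglik(110.0, 20.0, cuts, counts))
```

Each bin probability is a difference of two CDF values, so the log-likelihood is an ordinary function of (mu, sigma) that can be evaluated at any candidate pair and compared.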

8. ## Re: parameter estimation based on observed survival probabilities

I am not sure what exactly OP is struggling with. Since post #7 the OP has responded using the term "CDF", so I think the OP really does have some understanding of MLE and of survival analysis as well.

Now the data have been binned - in other words we have censored data: interval, left, and right censoring. It seems that the OP's problem is that the likelihood is not in closed form, so there is no closed-form solution for the MLE, and the OP does not like that.

IMHO:
1. If OP agrees to use MLE to estimate the parameters, OP needs to accept the fact that there will be no closed-form solution available and that a numerical method is needed for the maximization.

2. If OP really needs some kind of closed-form solution, OP may look for another estimation method. But I am not sure whether there is a commonly known method that provides a closed form for this problem.

9. ## The Following User Says Thank You to BGM For This Useful Post:

schleprock2 (10-28-2014)

10. ## Re: parameter estimation based on observed survival probabilities

Thanks BGM. I think maybe they were struggling with the fact that there isn't going to be a closed form solution to the MLE. I guess I didn't point out that there never explicitly has to be a closed form solution. The main goal is to explicitly write out the likelihood and once you do that you can use a computer to find a maximum.

11. ## Re: parameter estimation based on observed survival probabilities

Dragan recently showed this Taylor series approximation for the normal CDF (post #8), in case it feels better and more concrete to be able to do pocket-calculator style computations.

12. ## The Following User Says Thank You to GretaGarbo For This Useful Post:

schleprock2 (10-28-2014)

13. ## Re: parameter estimation based on observed survival probabilities

Originally Posted by BGM
I am not sure what exactly OP is struggling with. Since post #7 the OP has responded using the term "CDF", so I think the OP really does have some understanding of MLE and of survival analysis as well.

Now the data have been binned - in other words we have censored data: interval, left, and right censoring. It seems that the OP's problem is that the likelihood is not in closed form, so there is no closed-form solution for the MLE, and the OP does not like that.

IMHO:
1. If OP agrees to use MLE to estimate the parameters, OP needs to accept the fact that there will be no closed-form solution available and that a numerical method is needed for the maximization.

2. If OP really needs some kind of closed-form solution, OP may look for another estimation method. But I am not sure whether there is a commonly known method that provides a closed form for this problem.
Thanks BGM. You're correct that I was hoping to find a closed form solution to the problem. Instead, I was able to maximize the likelihood via numerical methods -- and, FWIW, using a binomial distribution rather than multinomial. The "brute force" approach isn't ideal for what I was hoping to do, but I think it is likely the best solution available.

14. ## Re: parameter estimation based on observed survival probabilities

I honestly don't know what you're doing when you're saying you used the binomial instead of the multinomial. It worries me as it implies you're doing something kind of wrong. You have more than two options so this is a multinomial problem. Why are you so against using the multinomial?!?

15. ## Re: parameter estimation based on observed survival probabilities

Originally Posted by Dason
I honestly don't know what you're doing when you're saying you used the binomial instead of the multinomial. It worries me as it implies you're doing something kind of wrong. You have more than two options so this is a multinomial problem. Why are you so against using the multinomial?!?
I hope you don't worry too much about it...I've tested the method and confirmed that it works, so I'm confident I'm not doing anything wrong. I had already begun to set the problem up using the binomial distribution, so I continued on that path. To see why it works, recognize that the likelihood I'm looking to maximize in the example from the original post is the product of 3 binomial distributions:

1st: 23 successes out of 100 trials, p1 = 1 - Phi((200 - mu)/sigma)
2nd: 9 successes out of 100 trials, p2 = 1 - Phi((300 - mu)/sigma)
3rd: 2 successes out of 100 trials, p3 = 1 - Phi((400 - mu)/sigma)
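
The three terms listed above can be written as a single log-likelihood in (mu, sigma). This is a minimal sketch, assuming the three counts are treated as independent binomial samples exactly as described in this post; the function names are illustrative, and only the standard library is used.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF (Phi) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (threshold, successes, trials) for the three binomial terms listed above
groups = [(200.0, 23, 100), (300.0, 9, 100), (400.0, 2, 100)]

def binomial_product_loglik(mu, sigma):
    """Sum of the three binomial log-likelihoods, with p_i = 1 - Phi((t_i - mu)/sigma)."""
    total = 0.0
    for threshold, k, n in groups:
        p = 1.0 - std_normal_cdf((threshold - mu) / sigma)
        if not 0.0 < p < 1.0:
            return float("-inf")  # parameters under which the counts are impossible
        # log of the binomial coefficient, computed through log-gamma
        total += math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

# Two candidate parameter pairs (illustrative guesses, not fitted values)
print(binomial_product_loglik(100.0, 150.0))
print(binomial_product_loglik(200.0, 50.0))
```

A numerical optimizer can then search over (mu, sigma) for the pair giving the largest value of this function.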

16. ## Re: parameter estimation based on observed survival probabilities

Originally Posted by schleprock2
I hope you don't worry too much about it...I've tested the method and confirmed that it works
From what you've written I don't think you have it quite right. It is possible that I'm interpreting the data incorrectly, but from the way it's phrased I don't think I am. There are two different ways I can interpret the data you've given, but regardless you end up with 4 different bins: "< 200", "between 200 and 300", "between 300 and 400", "greater than 400". You need to find the number of observations that fall into each of those bins. Then the first bin will get its probability through one evaluation of the normal CDF, for the second and third bins you'll need to take the difference between two evaluations of the normal CDF, and the last bin is the only one that should have the form "1 - Phi((something - mu)/sigma)".

I still don't get why you continually claim that you are using the binomial. If you think they're the same thing then just use the correct terminology because you have more than two outcomes so it isn't binomial. It's a restricted case of the multinomial where the probabilities for each bin are dictated by the normal distribution.
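
A minimal sketch of the four-bin setup described above. It assumes one particular reading of the data, namely that the 23, 9 and 2 survivors from post 15 all come from the same 100 observations; under the other reading the counts would differ. The zooming grid search is only a crude stand-in for a proper numerical optimizer, and all names are illustrative.

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal CDF evaluated numerically via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

cuts = [200.0, 300.0, 400.0]
survivors = [23, 9, 2]  # counts exceeding each cut, taken from post 15
n_total = 100           # ASSUMPTION: a single shared sample of 100

# Bin counts for "< 200", "200 to 300", "300 to 400", "> 400"
counts = [n_total - survivors[0],
          survivors[0] - survivors[1],
          survivors[1] - survivors[2],
          survivors[2]]

def loglik(mu, sigma):
    """Multinomial log-likelihood (constant term dropped) with normal bin probabilities."""
    cdf = [0.0] + [normal_cdf(c, mu, sigma) for c in cuts] + [1.0]
    total = 0.0
    for n, lo, hi in zip(counts, cdf[:-1], cdf[1:]):
        p = hi - lo
        if p <= 0.0:
            return float("-inf")
        total += n * math.log(p)
    return total

def grid_search(mu_lo, mu_hi, s_lo, s_hi, steps=40, rounds=8):
    """Evaluate loglik on a grid, then repeatedly zoom in around the best point."""
    best = (float("-inf"), None, None)
    for _ in range(rounds):
        mu_step = (mu_hi - mu_lo) / steps
        s_step = (s_hi - s_lo) / steps
        for i in range(steps + 1):
            for j in range(steps + 1):
                mu, sigma = mu_lo + i * mu_step, s_lo + j * s_step
                value = loglik(mu, sigma)
                if value > best[0]:
                    best = (value, mu, sigma)
        _, mu_b, s_b = best
        mu_lo, mu_hi = mu_b - 4 * mu_step, mu_b + 4 * mu_step
        s_lo, s_hi = max(s_b - 4 * s_step, 1e-6), s_b + 4 * s_step
    return best

best_value, mu_hat, sigma_hat = grid_search(-200.0, 400.0, 10.0, 400.0)
print(counts)
print(mu_hat, sigma_hat, best_value)
```

The first bin uses a single CDF evaluation, the middle two use differences of CDF values, and the last uses one minus a CDF value, matching the description above.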
