Dear all,

I need help understanding the approach called "Multi-Stage Modeling". I came across a paper comparing regression models for LGD estimation (a characteristic of LGD is that it lies in the closed interval [0, 1]). The paper:

Since we observe many cases with LGD = 0 and LGD = 1 (see the distribution on page 3), the paper recommends so-called "Multi-Stage Modeling" or "Two-Stage Modeling". As far as I have read, this means that you first fit a logistic regression on the full sample to separate "No Recovery/No Loss" cases from "Some Recovery/Some Loss" cases. Then you fit a simple OLS (or another method) only on observations with LGD in the interval (0, 1), and finally you multiply the probability from the logistic regression by the LGD estimate from the OLS. This last step is the part I do not properly understand.
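To make my reading of the procedure concrete, here is a minimal sketch of the two-stage idea as I understand it, not the paper's implementation. Everything in it is an assumption for illustration: the toy data are invented, there is a single covariate x, the names fit_logistic, fit_ols and predict_lgd are hypothetical, and the models are fitted by hand so the example needs no statistics package. The toy data deliberately contain no LGD = 1 cases, so one logistic model plus one OLS is enough here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.5, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(a + b*x) by plain gradient ascent."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            grad_a += yi - p
            grad_b += (yi - p) * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def fit_ols(x, y):
    """Closed-form simple linear regression y = c + d*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    d = sxy / sxx
    return my - d * mx, d

# Toy data, invented purely for illustration (no LGD = 1 cases).
x   = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
lgd = [0.0, 0.0, 0.2, 0.0, 0.3, 0.5, 0.0, 0.6, 0.8, 0.9]

# Stage 1: logistic regression on the FULL sample, LGD = 0 versus LGD > 0.
has_loss = [1 if v > 0 else 0 for v in lgd]
a, b = fit_logistic(x, has_loss)

# Stage 2: OLS only on observations with 0 < LGD < 1.
inner = [(xi, v) for xi, v in zip(x, lgd) if 0.0 < v < 1.0]
c, d = fit_ols([xi for xi, _ in inner], [v for _, v in inner])

def predict_lgd(x_new):
    """Combine the stages: P(LGD > 0 | x) * E[LGD | LGD > 0, x]."""
    p_some_loss = sigmoid(a + b * x_new)
    conditional_lgd = min(1.0, max(0.0, c + d * x_new))  # keep OLS output in [0, 1]
    return p_some_loss * conditional_lgd

print(predict_lgd(0.5))
```

The multiplication in predict_lgd is what I understand by "multiply the probability by the OLS estimate". What I am unsure about is how the LGD = 1 cases should enter this product once they are present in the data.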

Please have a look at page 13 of the attached PDF, where you can find the decision-tree (multi-stage) approach. Page 14 then gives the formula for calculating the estimated LGD.

Could anybody please explain to me how to build such a model in practice? Why are they talking about two logistic models?

Isn't it sufficient to estimate only one logistic regression, separating observations with LGD = 0 from observations with LGD > 0 on the full sample, and then to fit OLS only on observations with LGD in the interval (0, 1)?

If it is sufficient to do it that way, what would be the formula for calculating the final LGD?
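For reference, here is my tentative attempt at such a formula. It is based only on the law of total expectation, not on the formula on page 14, and the symbols are my own notation: p_0(x) = P(LGD = 0 | x), p_1(x) = P(LGD = 1 | LGD > 0, x), and m(x) = E[LGD | 0 < LGD < 1, x], where m(x) would be the OLS prediction.

```latex
\mathbb{E}[\mathrm{LGD} \mid x]
  = \underbrace{p_0(x) \cdot 0}_{\text{no loss}}
  + \bigl(1 - p_0(x)\bigr)
    \Bigl[\, p_1(x) \cdot 1 + \bigl(1 - p_1(x)\bigr)\, m(x) \Bigr]
```

If this is right, the one-logistic version (1 - p_0(x)) m(x) would only be consistent when there are no LGD = 1 cases, or when the second-stage regression is fitted on all observations with LGD > 0 rather than on (0, 1) only. That might be why the paper uses a second logistic model, but I would like someone to confirm this.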

I really need help with this issue and would appreciate any suggestions.

Thank you very much.