Suppose a compound distribution is built from a gamma distribution with parameters (a, b), where b is itself gamma-distributed with parameters (c, d). Marginalising out b then gives a compound distribution that depends only on a, c and d, i.e. f(a, c, d).
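
For concreteness, here is the marginal density written out under one particular parameterisation. This is an assumption, since the parameterisation is not fixed above: a and c are taken as shape parameters, b as the rate of the first gamma, and d as the rate of the gamma on b. The symbol x for the observed value is also introduced here only for illustration.

```latex
\begin{aligned}
f(x \mid a, c, d)
  &= \int_0^\infty
     \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-b x}
     \cdot
     \frac{d^{c}}{\Gamma(c)}\, b^{c-1} e^{-d b}
     \, \mathrm{d}b \\
  &= \frac{x^{a-1} d^{c}}{\Gamma(a)\,\Gamma(c)}
     \int_0^\infty b^{a+c-1} e^{-(x+d)\, b} \, \mathrm{d}b \\
  &= \frac{\Gamma(a+c)}{\Gamma(a)\,\Gamma(c)}
     \cdot
     \frac{x^{a-1} d^{c}}{(x+d)^{a+c}},
  \qquad x > 0 .
\end{aligned}
```

Under these assumptions the result is a beta prime density with shape parameters (a, c) and scale d. If b were a scale parameter instead, the marginal would take a different form.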

My question is:

If I take the log-likelihood of the compound distribution and run optimization code on it to estimate a, c and d, are the resulting estimates equivalent to the values I would get by fitting (a, b) and (c, d) separately from their original gamma distributions? I am asking because I cannot get the compound log-likelihood to solve with standard optimization algorithms, whereas the individual gamma log-likelihoods do solve. That made me wonder why one would use the compound distribution in the first place. I am not sure how to go about proving the equivalence (or lack of it) mathematically.
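
To make the setup concrete, below is a minimal sketch of the kind of optimization I mean. It rests on assumptions that are not fixed by the description above: a and c are shapes, b is the rate of the first gamma, d is the rate of the gamma on b, and each observation gets its own unobserved draw of b. The function names `neg_log_lik`, `simulate` and `fit`, the log-parameter trick to keep a, c, d positive, and the choice of Nelder-Mead are all illustrative choices, not a definitive implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def neg_log_lik(log_params, x):
    """Negative log-likelihood of the compound density (assumed rate form).

    log f(x) = (a-1) log x + c log d - log B(a, c) - (a+c) log(x + d)
    Parameters are passed on the log scale so a, c, d stay positive.
    """
    a, c, d = np.exp(log_params)
    log_beta = gammaln(a) + gammaln(c) - gammaln(a + c)
    ll = (a - 1) * np.log(x) + c * np.log(d) - log_beta - (a + c) * np.log(x + d)
    return -np.sum(ll)


def simulate(n, a, c, d, seed=0):
    """Draw n values: b ~ Gamma(shape=c, rate=d), then x ~ Gamma(shape=a, rate=b)."""
    rng = np.random.default_rng(seed)
    b = rng.gamma(shape=c, scale=1.0 / d, size=n)  # NumPy uses scale = 1 / rate
    return rng.gamma(shape=a, scale=1.0 / b)


def fit(x):
    """Minimise the compound negative log-likelihood from a = c = d = 1."""
    start = np.zeros(3)  # log(1) for each of a, c, d
    res = minimize(neg_log_lik, start, args=(x,), method="Nelder-Mead")
    return np.exp(res.x), res


if __name__ == "__main__":
    x = simulate(2000, a=2.0, c=3.0, d=1.5)
    est, res = fit(x)
    print("estimates (a, c, d):", est)
    print("optimizer reported success:", res.success)
```

Note that in this sketch only x is observed, so the draws of b are not available as data for a separate fit of the (c, d) gamma. Whether that matches my real situation is part of what I am unsure about.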

Thanks for your responses!