The usual way to show that \( \sqrt{n} \left( \hat{\theta}_n - \theta_0 \right) \xrightarrow{D} N \left( 0, \frac{1}{I \left( \theta_0 \right)} \right) \) is to expand \( l^{\prime} \left( \theta \right) \), the derivative of the log-likelihood, into a Taylor series of order 2 about \( \theta_0 \) and evaluate it at \( \hat{\theta}_n \).
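Writing that step out explicitly (using that \( \hat{\theta}_n \) solves the likelihood equation \( l^{\prime} ( \hat{\theta}_n ) = 0 \)), the expansion is

\( 0 = l^{\prime} \left( \hat{\theta}_n \right) = l^{\prime} \left( \theta_0 \right) + \left( \hat{\theta}_n - \theta_0 \right) l^{\prime\prime} \left( \theta_0 \right) + \frac{1}{2} \left( \hat{\theta}_n - \theta_0 \right)^2 l^{\prime\prime\prime} \left( \theta^{*}_n \right) \)

with \( \theta^{*}_n \) coming from the remainder term.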

Doing so and rearranging, we obtain

\( \sqrt{n} \left( \hat{\theta}_n - \theta_0 \right) = \frac{n^{-1/2} \, l^{\prime} \left( \theta_0 \right)}{-n^{-1} l^{\prime\prime} \left( \theta_0 \right) - \left( 2n \right)^{-1} \left( \hat{\theta}_n - \theta_0 \right) l^{\prime\prime\prime} \left( \theta^{*}_n \right)} \)

where \( \theta^{*}_n \) is between \( \theta_0 \) and \( \hat{\theta}_n \), i.e. we use the Lagrange form of the remainder.

The usual asymptotic laws are at work in the numerator and the denominator, and those parts are fine for me. What I need your help on is understanding how we bound the second term in the denominator in probability. I will present the way my book does it, and you can advise me on how to better digest the method.
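For concreteness, the standard limits I am referring to are

\( \frac{1}{\sqrt{n}} l^{\prime} \left( \theta_0 \right) \xrightarrow{D} N \left( 0, I \left( \theta_0 \right) \right) \) by the CLT, and \( -\frac{1}{n} l^{\prime\prime} \left( \theta_0 \right) \xrightarrow{P} I \left( \theta_0 \right) \) by the WLLN.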

We assume the pdf is three times differentiable as a function of \( \theta \) and require that its third log-derivative is uniformly bounded in a neighborhood of \( \theta_0 \) by a function of \( x \) alone, independent of \( \theta \), i.e.

\( \left| \frac{\partial^3 \log f \left( x; \theta \right)}{\partial \theta^3} \right| \leq M \left( x \right) \)

where the bound holds for all \( \theta_0 - c < \theta < \theta_0 + c \) and all \( x \) in the support of \( X \), and \( E_{\theta_0} \left[ M \left( X \right) \right] < \infty \).

Now under these assumptions, on the event \( \left| \hat{\theta}_n - \theta_0 \right| < c_0 \) we also have \( \left| \theta^{*}_n - \theta_0 \right| < c_0 \), and therefore \( \left| - \frac{1}{n} l^{\prime\prime\prime} \left( \theta^{*}_n \right) \right| \leq \frac{1}{n} \sum_{i=1}^{n} M \left( X_i \right) \).
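Here \( l \left( \theta \right) = \sum_{i=1}^{n} \log f \left( X_i; \theta \right) \), so this inequality is just the triangle inequality applied term by term:

\( \left| - \frac{1}{n} l^{\prime\prime\prime} \left( \theta^{*}_n \right) \right| = \frac{1}{n} \left| \sum_{i=1}^{n} \frac{\partial^3 \log f \left( X_i; \theta^{*}_n \right)}{\partial \theta^3} \right| \leq \frac{1}{n} \sum_{i=1}^{n} M \left( X_i \right) \)

valid as long as \( \theta^{*}_n \) stays inside the neighborhood where the bound by \( M \) applies.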

But \( \frac{1}{n} \sum_{i=1}^{n} M \left( X_i \right) \xrightarrow{P} E_{\theta_0} \left[ M \left( X \right) \right] \) by the WLLN. For the bound they then select \( 1 + E_{\theta_0} \left[ M \left( X \right) \right] \). Then let \( \epsilon > 0 \) be given and choose \( N_1, N_2 \) so that

\( n \geq N_2 \Rightarrow P \left[ \left| \hat{\theta}_n - \theta_0 \right| < c_0 \right] \geq 1 - \frac{\epsilon}{2} \)

\( n \geq N_1 \Rightarrow P \left[ \left| \frac{1}{n} \sum _{i=1}^{n} M \left( X_i \right) -E_{\theta_0} \left[ M \left( X \right) \right] \right| <1 \right] \geq 1-\frac{\epsilon}{2} \)
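To convince myself of the WLLN step, I also ran a quick simulation. This is a hypothetical toy choice, not the book's setup: \( X_i \sim N(0,1) \) and \( M(x) = x^2 \), so \( E_{\theta_0} \left[ M \left( X \right) \right] = 1 \), and I estimate \( P \left[ \left| \frac{1}{n} \sum_{i=1}^{n} M \left( X_i \right) - 1 \right| < 1 \right] \) by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration (not the book's model): X_i ~ N(0,1), M(x) = x^2,
# so E[M(X)] = 1.  Estimate P[ |(1/n) sum M(X_i) - E M(X)| < 1 ] over many reps.
def coverage(n, reps=2000):
    x = rng.standard_normal((reps, n))
    sample_means = (x ** 2).mean(axis=1)     # (1/n) * sum_i M(X_i), per replication
    return np.mean(np.abs(sample_means - 1.0) < 1.0)

print(coverage(5), coverage(500))            # the probability climbs toward 1 as n grows
```

which matches the idea that for \( n \geq N_1 \) the event in the display above has probability at least \( 1 - \frac{\epsilon}{2} \).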

Finally,

\( n \geq \max \{ N_1, N_2 \} \Rightarrow P \left[ \left| -\frac{1}{n} l^{\prime\prime\prime} \left( \theta^{*}_n \right) \right| \leq 1 + E_{\theta_0} \left[ M \left( X \right) \right] \right] \geq 1 - \epsilon \)

hence \( n^{-1} l^{\prime\prime\prime} \left( \theta^{*}_n \right) \) is bounded in probability. That concludes the proof of the theorem.

My question is: how exactly is the last probability derived from the information we have? It seems a few steps are skipped at the most crucial point. I understand that since both \( \hat{\theta}_n \) and \( \frac{1}{n} \sum_{i=1}^n M \left( X_i \right) \) converge in probability, they are bounded in probability, but from there I cannot derive the last result. Any insight is greatly appreciated. Thank you.