Proof of consistency of Maximum Likelihood Estimators (MLE)

#1
Hi all,

I would appreciate some help understanding a logical step in the proof below about the consistency of the MLE. It comes directly from Introduction to Mathematical Statistics by Hogg and Craig, and it is slightly different from the standard intuitive one that makes use of the Weak Law of Large Numbers.

So here goes:

Assume that \( \hat{\theta}_n \) solves the estimating equation \( \frac{\partial l(\theta)}{\partial \theta}=0 \). We also assume the usual regularity conditions. Denote by \( \theta_0 \) the true parameter, which by assumption is an interior point of some set \( \Omega \). Then \( \hat{\theta}_n \xrightarrow{P} \theta_0 \).
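
(Here \( l(\theta) = l(\theta; \mathbf{X}) = \sum_{i=1}^n \log f(X_i; \theta) \) is the log-likelihood; this is the notation Hogg and Craig use, if I recall it correctly.)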

Proof

Let \( \mathbf{X}=(X_1, X_2, \ldots, X_n) \) be the vector of observations. Since \( \theta_0 \) is an interior point of \( \Omega \), \( (\theta_0 -a, \theta_0 +a) \subset \Omega \) for some \( a>0 \). Define \( S_n \) to be the event:

\( S_n = \{ \mathbf{X} : l(\theta_0; \mathbf{X}) > l(\theta_0 - a; \mathbf{X}) \} \cap \{ \mathbf{X} : l(\theta_0; \mathbf{X}) > l(\theta_0 + a; \mathbf{X}) \} \)

But on \( S_n \), \( l(\theta) \) has a local maximum \( \hat{\theta}_n \) such that \( \theta_0 - a < \hat{\theta}_n < \theta_0 + a \) and \( l^{\prime}\left(\hat{\theta}_n\right) = 0 \).

That is:

\( S_n \subset \{ \mathbf{X} : | \hat{\theta}_n(\mathbf{X}) - \theta_0 | < a \} \cap \{ \mathbf{X} : l^{\prime}( \hat{\theta}_n(\mathbf{X}) ) = 0 \} \)

It is precisely at this point that I find their proof a little obscure. Why is \( S_n \) a subset of that other set? Their explanation is unclear.

Of course the proof is not complete at this point, but if I can have this clarified, I can take it from there. Thank you in advance.
 

BGM

TS Contributor
#2
If an event \( E \) happening implies that another event \( F \) happens,

then in set notation \( E \subseteq F \).
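
For example (a standard toy case): if \( E = \{ \text{the die shows } 6 \} \) and \( F = \{ \text{the die shows an even number} \} \), then \( E \) occurring forces \( F \) to occur, and indeed \( E \subseteq F \) as sets of outcomes.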
 

BGM

TS Contributor
#4
Let me try to explain.

First note that an event is a subset of the sample space, containing elements \( \omega \). (Careful: the sample space is usually also written \( \Omega \), but it is not the same thing as the parameter space \( \Omega \) in the theorem you quoted.)

A real-valued random variable (vector) \( \mathbf{X}(\omega) \) is a measurable function mapping from the sample space to \( \mathbf{R}^n \). For simplicity, people usually suppress the argument and just write \( \mathbf{X} \) (treat it as an object in the functional-analysis sense), unless someone wants to emphasize it (like me).

So \( \mathbf{X} \) is random because it can take different arguments \( \omega \). Now if you fix an \( \omega \) inside the event \( S_n \), then we can treat \( \mathbf{X}(\omega) \) as a usual deterministic real number (vector).

By the definition of \( S_n \), if we fix an \( \omega \in S_n \), the likelihood at the true parameter, \( l(\theta_0; \mathbf{X}(\omega)) \), is larger than the likelihood at both boundary points of the \( a \)-neighborhood of \( \theta_0 \). Under the regularity conditions, say continuity and differentiability in \( \theta \), \( l \) attains a maximum on the closed interval \( [\theta_0 - a, \theta_0 + a] \) (extreme value theorem). The choice of \( S_n \) ensures that the values at the two endpoints are strictly smaller than the value at \( \theta_0 \), so the maximum cannot sit at an endpoint; this guarantees the existence of an interior local maximum inside the neighborhood.
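
To see this step in a toy example of my own (not from the book): take \( X_i \sim N(\theta, 1) \), so that up to an additive constant \( l(\theta; \mathbf{x}(\omega)) = -\frac{1}{2} \sum_{i=1}^n (x_i - \theta)^2 \). A direct computation gives \( l(\theta_0) - l(\theta_0 \mp a) = \pm a n (\bar{x} - \theta_0) + \frac{n a^2}{2} \), so here \( S_n = \{ |\bar{x} - \theta_0| < a/2 \} \); on this event the unique maximum \( \hat{\theta}_n = \bar{x} \) lies strictly inside \( (\theta_0 - a, \theta_0 + a) \), exactly as the general argument promises.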

Therefore we can define \( \hat{\theta}_n(\mathbf{X}(\omega)) \) to be one of the interior local maxima (if the log-likelihood is strictly concave, it is unique).

We know from elementary calculus that the derivative at an interior extremum is \( 0 \) whenever the derivative exists. So the second condition

\( l'(\hat{\theta}_n(\mathbf{X}(\omega))) = 0 \)

is obviously met.

The first condition is also very intuitive: by definition \( \hat{\theta}_n(\mathbf{X}(\omega)) \) is an interior point of the neighborhood. Therefore the distance from the centre \( \theta_0 \) must be smaller than the radius \( a \).

Mathematically, we can write

\( \theta_0 - a < \hat{\theta}_n < \theta_0 + a \)

\( \iff -a < \hat{\theta}_n - \theta_0 < a \)

\( \iff |\hat{\theta}_n - \theta_0| < a \)
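
By the way, the point of the whole construction is that \( P(S_n) \rightarrow 1 \), which by the inclusion above forces \( P(|\hat{\theta}_n - \theta_0| < a) \rightarrow 1 \), i.e. \( \hat{\theta}_n \xrightarrow{P} \theta_0 \). If it helps to see the conclusion numerically, here is a minimal simulation sketch of my own (Python; not from Hogg and Craig), assuming an exponential model with rate \( \theta \), where the score equation \( l'(\theta) = n/\theta - \sum_i X_i = 0 \) has the closed-form root \( \hat{\theta}_n = 1/\bar{X} \):

```python
# Minimal Monte Carlo sketch (my own illustration, not from Hogg & Craig):
# for an Exponential(rate theta) model the score equation
#     l'(theta) = n/theta - sum(x_i) = 0
# has the closed-form root theta_hat = 1 / xbar, so we can estimate
# P(|theta_hat_n - theta_0| < a) directly and watch it climb toward 1.

import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0   # true rate parameter
a = 0.1        # radius of the neighbourhood around theta0
reps = 5000    # Monte Carlo replications per sample size

for n in [10, 100, 1000, 10000]:
    # reps independent samples of size n from Exponential(rate = theta0)
    x = rng.exponential(scale=1.0 / theta0, size=(reps, n))
    theta_hat = 1.0 / x.mean(axis=1)   # MLE: root of the score equation
    coverage = np.mean(np.abs(theta_hat - theta0) < a)
    print(f"n = {n:6d}   P(|theta_hat - theta0| < a) ~ {coverage:.3f}")
```

The estimated probability increases toward \( 1 \) as \( n \) grows, which is exactly the convergence in probability that the theorem asserts.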
 
#5
Thank you very much for going to such lengths to give a clear response.

If I understood correctly, \( S_n \) is a subset of the other set because the construction of \( S_n \) implies the existence of a local maximum, which we know to be interior, and therefore the derivative there has to be zero.

I will have to do some more thinking about the chain of events but I think I understand the basic idea now.

Thanks again.
 