The joint normality of two estimators for a linear regression model

#1
My colleagues and I disagree on this question. Could someone help us clarify the issue?

\(
Y = X_1\beta_1 + X_2\beta_2 + \epsilon
\)
where \(X_1, X_2 \in \mathbb{R}^{n\times 1}\), \(\beta=\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix}\), and \(\epsilon \in \mathbb{R}^{n\times 1}\) with \(\epsilon \sim N(0, \sigma^2 I_{n\times n})\), i.e. the \(\epsilon_i\) are i.i.d. The least-squares estimator of \(\beta\) is \(\hat{\beta}=(X'X)^{-1}X'Y\), where \(X=(X_1,X_2)\). Now we consider another linear model,
\(
Y=X_2\gamma+\eta
\)
where \(Y\) and \(X_2\) are the same as above and \(\eta \sim N(0,\tau^2 I_{n\times n})\).
For this model, the estimator of \(\gamma\) is \(\hat{\gamma}=(X_2'X_2)^{-1}X_2'Y\). We know that \(\hat{\beta}\) and \(\hat{\gamma}\) are each normally distributed. The question is whether the joint distribution of \(\hat{\beta}\) and \(\hat{\gamma}\) is normal as well.
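To make the question concrete, here is a minimal numpy sketch (the sample size, true coefficients, noise level, and random design are all made up for illustration) that computes both estimators from the same simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n observations, true beta = (1, 2)'
n, sigma = 100, 1.0
X1 = rng.normal(size=(n, 1))   # treated as fixed once drawn
X2 = rng.normal(size=(n, 1))
X = np.hstack([X1, X2])
beta = np.array([[1.0], [2.0]])

# Generate Y from the first (full) model
eps = rng.normal(scale=sigma, size=(n, 1))
Y = X @ beta + eps

# Both estimators are computed from the same Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)      # (X'X)^{-1} X'Y
gamma_hat = np.linalg.solve(X2.T @ X2, X2.T @ Y)  # (X2'X2)^{-1} X2'Y
print(beta_hat.ravel(), gamma_hat.ravel())
```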
 

BGM

TS Contributor
#2
Assumptions and well-known facts:

1. \( \epsilon_i \sim \mathcal{N}(0, \sigma^2) \) i.i.d., for \( i = 1, 2, \ldots, n \)

2. \( X_{1i}, X_{2i} \) are constants.

Therefore \( Y_i = X_{1i}\beta_1 + X_{2i}\beta_2 + \epsilon_i \sim \mathcal{N}(X_{1i}\beta_1 + X_{2i}\beta_2, \sigma^2) \), and the \( Y_i \) are independent.

Most importantly, \( (Y_1, Y_2, \ldots, Y_n) \) jointly follows a multivariate normal distribution, since independent normals are jointly normal.

As \( \hat{\beta} \) is an affine transformation of the multivariate normal vector \( (Y_1, Y_2, \ldots, Y_n) \), it has a (multivariate) normal distribution.
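As a quick numerical illustration (a made-up fixed design and arbitrary true coefficients, not part of the original question), the Monte Carlo distribution of \( \hat{\beta} \) matches \( \mathcal{N}(\beta, \sigma^2 (X^TX)^{-1}) \):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 100, 1.0
X = rng.normal(size=(n, 2))    # fixed design, columns (X1, X2)
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Redraw eps only (X held fixed as constants) and re-estimate beta
reps = 20000
draws = np.empty((reps, 2))
for r in range(reps):
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    draws[r] = XtX_inv @ X.T @ Y

print(draws.mean(axis=0))      # ~ beta
print(np.cov(draws.T))         # ~ sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)
```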

The question now comes:

What is \( \eta \)? Is it defined to be something like \( \epsilon + X_1\beta_1 \), which has a non-zero mean?

Is the \( Y \) in the second model another independent sample, or the same one as above?

Are \( \eta \) and \( \epsilon \) independent?
 

Dason

Ambassador to the humans
#4
BGM's concerns still hold, though. You can't just say that \(Y\) and \(X_2\) are the same as above and then claim that \(\eta\) is normally distributed with mean 0.
 
#5
Could we just ignore the contradiction between these two models and talk only about the normality of these two estimators? Actually, this happens in practice: the first model is regarded as the full model, and the second model is misspecified (one or more covariates are omitted from the model). This is because, sometimes, you fail to observe all the covariates in the model.
 

BGM

TS Contributor
#6
So let's say the first model is the "true" model. Then, as you wrote,

\( \begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix} =
\begin{bmatrix} (X^TX)^{-1} X^T \\ (X_2^TX_2)^{-1} X_2^T \end{bmatrix}\mathbf{Y} \)

As long as \( \mathbf{Y} \) is a multivariate normal vector and those \( X \) are constants, the above estimators are just affine transformations of \( \mathbf{Y} \), and thus they are again jointly distributed as a multivariate normal. Of course, you have to ensure the transformation has full rank.
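A small numpy sketch of this stacking (design and noise level are arbitrary placeholders): the two estimator maps are stacked into one \( 3 \times n \) matrix \( A \), so the joint covariance of the stacked vector is \( \sigma^2 A A^T \):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100, 1.0
X1 = rng.normal(size=(n, 1))
X2 = rng.normal(size=(n, 1))
X = np.hstack([X1, X2])

# Stack the two estimator maps into one 3 x n matrix A, so that
# (beta_hat, gamma_hat)' = A Y is a single affine transform of Y
A = np.vstack([
    np.linalg.solve(X.T @ X, X.T),     # 2 x n block for beta_hat
    np.linalg.solve(X2.T @ X2, X2.T),  # 1 x n block for gamma_hat
])

cov_joint = sigma**2 * A @ A.T         # 3 x 3 covariance of (beta_hat, gamma_hat)'
print(cov_joint)
```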
 
#8
So let's say the first model is the "true" model. Then, as you wrote,

\( \begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix} =
\begin{bmatrix} (X^TX)^{-1} X^T \\ (X_2^TX_2)^{-1} X_2^T \end{bmatrix}\mathbf{Y} \)

As long as \( \mathbf{Y} \) is a multivariate normal vector and those \( X \) are constants, the above estimators are just affine transformations of \( \mathbf{Y} \), and thus they are again jointly distributed as a multivariate normal. Of course, you have to ensure the transformation has full rank.
We can't say the transformation has full rank, because
\(
\begin{align*}
\begin{pmatrix}\hat{\beta}\\ \hat{\gamma}\end{pmatrix}
&=\begin{pmatrix}(X'X)^{-1}X'\\ (X_2'X_2)^{-1}X_2'\end{pmatrix}Y\\
&=\begin{pmatrix}\left(\begin{pmatrix}X_1'\\ X_2'\end{pmatrix}(X_1,X_2)\right)^{-1}\begin{pmatrix}X_1'\\ X_2'\end{pmatrix}\\ (X_2'X_2)^{-1}X_2'\end{pmatrix}Y\\
&=\begin{pmatrix}\begin{pmatrix}X_1'X_1&X_1'X_2\\ X_2'X_1&X_2'X_2\end{pmatrix}^{-1}\begin{pmatrix}X_1'\\ X_2'\end{pmatrix}\\ (X_2'X_2)^{-1}X_2'\end{pmatrix}Y\\
&=\begin{pmatrix}\begin{pmatrix}X_1'X_1&X_1'X_2\\ X_2'X_1&X_2'X_2\end{pmatrix}^{-1}\\ \begin{pmatrix}0&(X_2'X_2)^{-1}\end{pmatrix}\end{pmatrix}\begin{pmatrix}X_1'\\ X_2'\end{pmatrix}Y
\end{align*}
\)

We know that \(\begin{pmatrix}X_1'\\ X_2'\end{pmatrix}Y\) is jointly normal, because \(Y\) is multivariate normal and \(\begin{pmatrix}X_1'\\ X_2'\end{pmatrix}\) has full row rank. But the \(3\times 2\) matrix \(\begin{pmatrix}\begin{pmatrix}X_1'X_1&X_1'X_2\\ X_2'X_1&X_2'X_2\end{pmatrix}^{-1}\\ \begin{pmatrix}0&(X_2'X_2)^{-1}\end{pmatrix}\end{pmatrix}\) does not have full row rank: its top \(2\times 2\) block is invertible, so the bottom row \(\begin{pmatrix}0&(X_2'X_2)^{-1}\end{pmatrix}\) must be a linear combination of the two rows above it.
Does this mean that \(\begin{pmatrix} \hat{\beta} \\ \hat{\gamma} \end{pmatrix}\) is not jointly normal?
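A quick numerical check of this rank claim (with the same kind of made-up simulated design as in the sketches above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100, 1.0
X1 = rng.normal(size=(n, 1))
X2 = rng.normal(size=(n, 1))
X = np.hstack([X1, X2])

A = np.vstack([np.linalg.solve(X.T @ X, X.T),
               np.linalg.solve(X2.T @ X2, X2.T)])  # 3 x n

# A factors through (X1'Y, X2'Y), i.e. A = M [X1'; X2'] with M 3 x 2,
# so rank(A) <= 2 < 3 and the 3 x 3 covariance is singular
print(np.linalg.matrix_rank(A))                    # should report 2
cov_joint = sigma**2 * A @ A.T
print(np.linalg.matrix_rank(cov_joint))            # should report 2, not 3
```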
 

BGM

TS Contributor
#9
I give you a simple example. Suppose \( \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}\) is a multivariate normal random vector.

Next consider the random vector

\( \begin{bmatrix} aY_1 + bY_2 \\ cY_1 + dY_2 \end{bmatrix}\) where \( a, b, c, d \) are constants.

So will you say that the joint distribution of this vector does not exist, when \( (a, b) \) and \( (c, d) \) are not linearly independent?

In other words, when the rank of the transformation matrix is 1, it is a degenerate one. When it has a full rank of 2, you still have a multivariate normal random vector.
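To illustrate numerically (the coefficient values below are arbitrary), compare a full-rank and a rank-1 transformation of the same bivariate normal vector:

```python
import numpy as np

rng = np.random.default_rng(3)
Y = rng.normal(size=(2, 100000))  # Y1, Y2 i.i.d. standard normal

full = np.array([[1.0, 2.0], [3.0, 4.0]])   # rank 2: proper bivariate normal
degen = np.array([[1.0, 0.0], [1.0, 0.0]])  # rank 1: e.g. a = c = 1, b = d = 0

Z_full, Z_degen = full @ Y, degen @ Y
print(np.linalg.matrix_rank(np.cov(Z_full)))   # 2
print(np.linalg.matrix_rank(np.cov(Z_degen)))  # 1: all mass on a line
print(np.allclose(Z_degen[0], Z_degen[1]))     # True: Z1 == Z2 == Y1
```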
 
#11
I give you a simple example. Suppose \( \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}\) is a multivariate normal random vector.

Next consider the random vector

\( \begin{bmatrix} aY_1 + bY_2 \\ cY_1 + dY_2 \end{bmatrix}\) where \( a, b, c, d \) are constants.

So will you say that the joint distribution of this vector does not exist, when \( (a, b) \) and \( (c, d) \) are not linearly independent?

In other words, when the rank of the transformation matrix is 1, it is a degenerate one. When it has a full rank of 2, you still have a multivariate normal random vector.
This is the point. Can we say \( \begin{pmatrix}Z_1\\Z_2 \end{pmatrix}=\begin{pmatrix} aY_1 + bY_2 \\ cY_1 + dY_2 \end{pmatrix}\) is bivariate normal when \( (a, b) \) and \( (c, d) \) are linearly dependent?
For example, take \(a=c=1\) and \(b=d=0\); in this case \( (a, b) \) and \( (c, d) \) are linearly dependent.
Since
\( \begin{pmatrix}Z_1\\Z_2 \end{pmatrix}=\begin{pmatrix} aY_1 + bY_2 \\ cY_1 + dY_2 \end{pmatrix}=\begin{pmatrix} a&b\\c&d\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\end{pmatrix}=\begin{pmatrix} 1&0\\1&0\end{pmatrix}\begin{pmatrix} Y_1\\Y_2\end{pmatrix}, \)
we get \( Z_1=Y_1 \) and \( Z_2=Y_1 \). Do you mean \( \begin{pmatrix}Z_1\\Z_2 \end{pmatrix}\) is normal? Its joint distribution function is
\( F(z_1,z_2)=P(Z_1\le z_1,\ Z_2\le z_2) \)
\( F(z_1,z_2)=P(Y_1\le z_1,\ Y_1\le z_2) \)
\( F(z_1,z_2)=P(Y_1\le \min(z_1,z_2)) \)
\( F(z_1,z_2)=\Phi(\min(z_1,z_2)) \), where \(\Phi\) is the distribution function of \(Y_1\).
Can we say \( F(z_1,z_2) \) is a bivariate normal distribution function?
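A quick Monte Carlo check of this identity (the evaluation points \( z_1, z_2 \) below are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
Y1 = rng.normal(size=1_000_000)         # Z1 = Z2 = Y1
z1, z2 = 0.3, -0.5

emp = np.mean((Y1 <= z1) & (Y1 <= z2))  # P(Z1 <= z1, Z2 <= z2)
print(emp, norm.cdf(min(z1, z2)))       # both ~ Phi(min(z1, z2))
```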
 

BGM

TS Contributor
#12
As I posted before, I think "degenerate multivariate normal distribution" is a proper name for this. The nomenclature alone is not important - as long as you know the "dimension" of the resulting vector, it should be fine.

Some will say a constant value has a degenerate distribution. So you may think that a constant and a random variable together have a degenerate joint distribution, but that may sound odd to other people.
 
#13
As I posted before, I think "degenerate multivariate normal distribution" is a proper name for this. The nomenclature alone is not important - as long as you know the "dimension" of the resulting vector, it should be fine.

Some will say a constant value has a degenerate distribution. So you may think that a constant and a random variable together have a degenerate joint distribution, but that may sound odd to other people.
Thanks for your reply. But I am a little confused about this degenerate multivariate normal distribution. Is it still a normal distribution or not? I think it is not, because the bivariate normal density can be expressed as
\(
f(x,y)=\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)
\)
where \(\mathbf{x}=(x,y)'\) and \(\Sigma=\begin{pmatrix} \sigma_x^2&\rho\sigma_x\sigma_y\\ \rho\sigma_x\sigma_y&\sigma_y^2 \end{pmatrix} \).
I don't think \(F(z_1,z_2)=\Phi(\min(z_1,z_2))\) can be expressed in the form above.
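Indeed, for \( (Z_1, Z_2) = (Y_1, Y_1) \) the covariance matrix is singular, which is easy to check numerically (taking \( \mathrm{Var}(Y_1) = 1 \) for illustration):

```python
import numpy as np

# For (Z1, Z2) = (Y1, Y1) with Var(Y1) = 1, we have rho = 1, so
# Sigma is singular and the density formula above is undefined:
Sigma = np.array([[1.0, 1.0],
                  [1.0, 1.0]])
print(np.linalg.det(Sigma))   # 0.0 -> Sigma^{-1} does not exist
rho = 1.0
print(np.sqrt(1 - rho**2))    # 0.0 -> normalizing constant divides by zero
```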
 

BGM

TS Contributor
#14
Sorry, I think we have either gone off topic or lost focus.

Again, I can give an even simpler example: let \( Y \) be a univariate normal random variable, and consider the normal random vector

\( \begin{bmatrix} Y \\ Y \end{bmatrix} \)

Actually, you know the behavior of this random vector very well: if you consider it as coordinates in the \( x\)-\(y \) plane, it represents a point normally distributed along the line \( x = y \). So you know how to simulate/generate it. You have all the information related to this random vector. You can describe it to other people. You can even give it a name like a "degenerate" multivariate normal. Yet you cannot write down a joint pdf for it (which is natural, as it is degenerate; in the multivariate normal case, you may think of this as the variance-covariance matrix not being invertible, since it does not have full rank). Is that the only trouble? I do not think it hinders your study of this object.