# Endogeneity and R-squared

#### Deepn6

##### New Member
Hello. I have a time series regression equation with lagged independent variables. One of the independent variables is strongly endogenous with respect to the dependent variable. I looked through statistics textbooks but could not find whether, in this situation, the adjusted R-squared is still valid. I know the coefficients are biased, and that the bias does not even vanish in large samples (the estimates are inconsistent). It's the explanatory power of the model as a whole I am asking about.
The equation models one-year-ahead stock returns as a function of a few explanatory variables.
Many thanks for considering my request, DEEP.

#### hlsmith

##### Not a robit
If something were endogenous to the DV, that would mean the DV causes it or is upstream of it, right? Is this what you mean?

#### spunky

##### Doesn't actually exist
Well... it's not super difficult to show that for OLS regression models of the form $$Y= \beta_0 + \beta_1X_1 + \beta_2X_2 +...+\beta_pX_p + \epsilon$$, the $$R^{2}$$ statistic is a function of the elements of the correlation matrix of the predictors and the vector of correlations between the predictors and the dependent variable.

More specifically, $$R^{2} = r'_{x,y} \beta$$ where $$r_{x,y}$$ is the $$p \times 1$$ vector of correlations between the predictors and the dependent variable and $$\mathbf{\beta}$$ is the vector of (standardized) regression coefficients. If the regression coefficients are biased, $$R^{2}$$ is most likely going to be biased as well.
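
For anyone who wants to check the identity numerically, here is a minimal sketch in Python with NumPy. The simulated data, the sample size, and the coefficient values are all made up for illustration; the only point is that the usual OLS $$R^{2}$$ and $$r'_{x,y} \beta$$ (with $$\beta$$ computed as standardized coefficients from the correlation matrix) agree up to floating-point error.

```python
import numpy as np

# Hypothetical data: 3 correlated predictors, arbitrary true coefficients.
rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))
X[:, 1] += 0.5 * X[:, 0]  # make two predictors correlated
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=n)

# (1) R^2 the usual way: 1 - SSR / SST from an OLS fit with an intercept.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
r2_ols = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# (2) R^2 from correlations only.
R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
R_xx = R[:p, :p]   # correlation matrix of the predictors
r_xy = R[:p, p]    # correlations between each predictor and y
beta_std = np.linalg.solve(R_xx, r_xy)  # standardized coefficients
r2_corr = r_xy @ beta_std

print(r2_ols, r2_corr)  # the two values should match
```

Note that `beta_std` is the standardized coefficient vector, which is what the formula needs; plugging in the raw slopes `coef[1:]` would not reproduce $$R^{2}$$ unless the variables were already standardized.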

#### Deepn6

##### New Member
> If something were endogenous to the DV, that would mean the DV causes it or is upstream of it, right? Is this what you mean?

Yes.

#### Deepn6

##### New Member
But only the coefficient on the endogenous explanatory variable is biased. Is it possible that this biases another coefficient the other way, leaving the R-squared roughly unaffected?

#### spunky

##### Doesn't actually exist
> But only the coefficient on the endogenous explanatory variable is biased. Is it possible that this biases another coefficient the other way, leaving the R-squared roughly unaffected?

True. Since $$R^{2}$$ is a linear combination of correlations and (standardized) regression coefficients, the bias it experiences depends on the size of the coefficients, on how many of them are biased, and on the direction of each bias.

However, the key point is that we don't always know that for sure, right? So it may or may not be a problem, but there is no way of knowing unless you have both population-level data **and** the necessary instruments to remove the endogeneity and then assess where the biases come from and in which direction they move.
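
A simulation is one setting where the truth is known, so here is a rough sketch of how the effect could be examined. Everything in it is an assumption made for illustration: the data-generating process, the parameter `rho` (the correlation between the endogenous regressor `x1` and the error), the true coefficients of 0.5, and in particular the benchmark `r2_true`, defined here as $$1 - \mathrm{Var}(\epsilon)/\mathrm{Var}(Y)$$ using the true errors. That benchmark is just one possible choice, not a standard definition.

```python
import numpy as np

def simulate(rho, n=200_000, seed=0):
    """Hypothetical DGP: y = 0.5*x1 + 0.5*x2 + e, with corr(x1, e) = rho."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)
    # x1 shares variation with the error term when rho != 0 (endogeneity).
    x1 = rho * e + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    x2 = rng.normal(size=n)  # exogenous regressor
    X = np.column_stack([x1, x2])
    y = X @ np.array([0.5, 0.5]) + e

    # OLS with an intercept.
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sst = np.sum((y - y.mean()) ** 2)

    r2_ols = 1.0 - (resid @ resid) / sst
    # Assumed benchmark: same formula, but with the true errors.
    r2_true = 1.0 - np.sum((e - e.mean()) ** 2) / sst
    return coef[1:], r2_ols, r2_true

coef_exo, r2_ols_exo, r2_true_exo = simulate(rho=0.0)
coef_endo, r2_ols_endo, r2_true_endo = simulate(rho=0.6)

print("exogenous :", coef_exo, r2_ols_exo, r2_true_exo)
print("endogenous:", coef_endo, r2_ols_endo, r2_true_endo)
```

Under these assumptions the slope on `x1` is pushed away from 0.5 when `rho` is nonzero, and the OLS $$R^{2}$$ sits above the benchmark. That direction is guaranteed in-sample, because OLS minimizes the residual sum of squares and so can never fit worse than the true coefficients do. With real data neither the true errors nor `rho` are observed, which is exactly the problem described above.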

#### Deepn6

##### New Member
I think you are right. But many research papers simply report an R-squared, or changes in it, even when there are endogenous variables.
Many thanks.
PS This may be something for an advanced econometrics paper...