Regression with non-stationarity


Fortran must die
This issue is big in time series, but recently I began to wonder whether it applies to all regression. First, I should note that our data (and I suspect many others') are non-stationary most of the time. Various authors assert that in the presence of non-stationarity (or seasonality) regression will generate invalid results, so-called spurious regression. This is not primarily an issue of autocorrelation disrupting statistical tests or widening the CI. It's an issue where the slopes themselves may be invalid. Dividing this into time series / not time series makes little sense to me. Much (possibly most) data is not going to consist of data points gathered at just one point in time, so all data is likely to be influenced by non-stationarity if it exists.
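To make the spurious-regression claim concrete, here is a minimal simulation sketch. It assumes the simplest non-stationary case, two independent Gaussian random walks, and uses a plain OLS t-test with a 1.96 cutoff. The function names `ols_slope_t` and `rejection_rate`, the sample size, and the number of simulations are all my own choices for illustration.

```python
import numpy as np

def ols_slope_t(x, y):
    """Return (slope, t-statistic) for OLS of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # classical OLS covariance
    return beta[1], beta[1] / np.sqrt(cov[1, 1])

def rejection_rate(n_obs=200, n_sims=500, difference=False, seed=0):
    """Share of simulations with |t| > 1.96 when x and y are unrelated random walks."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        # Two independent random walks: by construction the true slope is zero.
        x = np.cumsum(rng.standard_normal(n_obs))
        y = np.cumsum(rng.standard_normal(n_obs))
        if difference:
            x, y = np.diff(x), np.diff(y)      # first differences are stationary here
        _, t = ols_slope_t(x, y)
        hits += abs(t) > 1.96
    return hits / n_sims

levels_rate = rejection_rate(difference=False)
diff_rate = rejection_rate(difference=True)
print(f"share 'significant' in levels:      {levels_rate:.2f}")
print(f"share 'significant' in differences: {diff_rate:.2f}")
```

If the usual t-test were valid in levels, both shares should sit near the nominal 5%. In this setup the levels regression rejects far more often than that, while the differenced regression behaves roughly as expected. Real data with drift, seasonality, or breaks may behave differently from this toy case.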

Yet stationarity is not one of the assumptions of linear regression, and generally speaking this issue seems not to be considered in the non-time series literature. So is bias in the data tied to non-stationarity a problem or not? And what is the solution, if one exists? No method I have found in the time series world works well with non-stationarity. They are complex, rely on judgment, make it difficult to analyze slopes, and take a lot of time.

Is it true that it does not matter if the variables are stationary, only the residuals?
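One way to explore this question is to compare residuals from two regressions on the same I(1) regressor: one where y shares the regressor's stochastic trend (a cointegrated pair) and one where y is an unrelated random walk. The sketch below is only an informal check, assuming simulated Gaussian data and using the lag-1 autoregressive coefficient of the residuals as a rough persistence measure. It is not a formal test; residual-based cointegration tests such as Engle-Granger need their own critical values (statsmodels offers `statsmodels.tsa.stattools.coint` for that). The helper name `residual_ar1` and the coefficients 2.0 and 0.5 are hypothetical.

```python
import numpy as np

def residual_ar1(x, y):
    """Regress y on a constant and x; return the lag-1 AR coefficient of the residuals."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    # Slope from regressing e_t on e_{t-1} (no intercept; residuals have mean zero).
    return (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

rng = np.random.default_rng(1)
n = 500
x = np.cumsum(rng.standard_normal(n))               # I(1) regressor
y_coint = 2.0 + 0.5 * x + rng.standard_normal(n)    # shares x's stochastic trend
y_indep = np.cumsum(rng.standard_normal(n))         # unrelated random walk

rho_coint = residual_ar1(x, y_coint)
rho_indep = residual_ar1(x, y_indep)
print(f"cointegrated pair: residual AR(1) = {rho_coint:.2f}")
print(f"unrelated pair:    residual AR(1) = {rho_indep:.2f}")
```

In the cointegrated case the residuals look like noise even though both variables are non-stationary. In the unrelated case the residuals stay highly persistent, which is the pattern associated with spurious regression.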


Fortran must die
A related problem is what happens if your dependent variable is integrated of order 1 [requiring one level of differencing], one of your predictors is also integrated of order 1, but a second is integrated of order 2 (requiring two levels of differencing to be stationary).

Do you difference all the variables so they are stationary? If you do, is the interpretation of the slope different for those you differenced once than for those you differenced twice?
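A small sketch of what "difference each variable to stationarity" looks like mechanically, assuming simulated data where y and x1 are I(1) and x2 is I(2). The data-generating process (slopes 0.8 and 0.3, noise scale 0.5) is entirely hypothetical and chosen so the answer is known. The regression omits the intercept only because this simulated process has none, not as a general recommendation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

x1 = np.cumsum(rng.standard_normal(n))               # I(1): one cumulative sum
x2 = np.cumsum(np.cumsum(rng.standard_normal(n)))    # I(2): two cumulative sums

# Differencing once loses one observation, twice loses two,
# so drop the first once-differenced value to align the series in time.
d_x1 = np.diff(x1)[1:]       # change in x1, length n - 2
d2_x2 = np.diff(x2, n=2)     # change in the change of x2, length n - 2

# Hypothetical process: the change in y depends on both stationary series.
d_y_true = 0.8 * d_x1 + 0.3 * d2_x2 + 0.5 * rng.standard_normal(n - 2)
y = np.concatenate([[0.0], np.cumsum(d_y_true)])     # I(1) levels of y

d_y = np.diff(y)             # back to stationary changes, length n - 2

X = np.column_stack([d_x1, d2_x2])
coef, *_ = np.linalg.lstsq(X, d_y, rcond=None)
print(f"slope on change in x1:           {coef[0]:.2f}")
print(f"slope on change-in-change of x2: {coef[1]:.2f}")
```

The two slopes do read differently. The first is the change in y's period-to-period change per unit change in x1's period-to-period change. The second is per unit of x2's acceleration (the change in its change), not per unit of x2 or of its first difference.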

Also, it is argued that when you difference you should not have an intercept. But if you do this, how can you interpret dummy variables from categorical variables [which rely on the intercept to be interpreted when you have more than two levels of the categorical variable]?

"However, notice that the constant term α disappears when we take differences. Because α affects all values of y in the same way, taking the difference eliminates it from the equation. When performing a regression in differences, we generally want to remove the constant term. Including a constant in the differenced equation would be equivalent to having a time trend in the original “levels” equation." p 61
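The algebra behind that quote can be written out as follows. This is my own sketch, assuming a single regressor x and a linear time trend with coefficient δ; the symbols δ, β, and ε are not taken from the quoted source.

```latex
\begin{align*}
y_t     &= \alpha + \delta t + \beta x_t + \varepsilon_t \\
y_{t-1} &= \alpha + \delta (t-1) + \beta x_{t-1} + \varepsilon_{t-1} \\
\Delta y_t = y_t - y_{t-1}
        &= \delta + \beta \, \Delta x_t + \Delta \varepsilon_t
\end{align*}
```

The level intercept α cancels, and the trend coefficient δ is what remains as the constant in the differenced equation. So with δ = 0 (no trend in levels) the differenced equation has no constant, and including one amounts to allowing a trend in levels.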


Fortran must die
This is an example of what drives me (a non-statistician) crazy. Two authors who seem to totally disagree.

Author one

"In general, regression models for non-stationary variables give spurious results. Only exception is if [there is cointegration, in this case the residuals will be stationary]."


I think the above author is Milhoj.

Another author says....
"With one regressor, the order of integration of y and x must match for the specification to make economic sense. With more than one regressor and an integrated dependent variable, it is possible to have a mixture of integrated and stationary regressors. For example, we could add some (stationary) dummy variables to a regression with integrated y and x."

I don't have that link handy...

So who is correct? I am sure it will depend (on something I lack expertise in) :(