# Thread: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

1. ## 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

Hello dear forum members!

My univariate multiple regression model includes 5 predictors + 2 interaction terms. However, examination of the fitted-value plots revealed that one predictor has a curvilinear relationship with the DV (that predictor is also a theory-based moderator). The Ramsey regression specification-error test (RESET) rejected the null hypothesis that there are no omitted variables, so I included a squared term for the non-linear predictor in the equation.

The model then passed the RESET test. Moreover, the residual Q-Q plot (and the Jarque-Bera normality test) improved greatly after the squared term was included.

My question is: can I proceed with OLS estimation of the coefficients with these non-linear terms in the equation? What would be the proper way of addressing this issue (considering that all other predictors have a linear relationship with the DV)?

2. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

If you can transform the variable to make the relationship linear, you certainly can. Or model the variable as you did [by adding a non-linear term, a quadratic, if I understood what you did]. It is done all the time.

Ultimately the answer depends on whether the variable with the non-linear relationship is inherently non-linear or can be made linear [transformed to be linear]. This commonly depends (I believe) on whether the X or the slope is non-linear.

3. ## The Following User Says Thank You to noetsi For This Useful Post:

kiton (01-22-2015)

4. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

Would you consider this graph to show a linear relationship? The variable is log-transformed; see attachment.

Thank you.

5. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

Yes, it seems you can move forward. Is your purpose to define the Y variable or attempt to predict it?

Take noetsi's comments into consideration. If you keep the current term in the model, you just need to make sure you explain it accurately. The linearity in the model refers to the linear combination of model terms (linearity in the coefficients), and, as you seem to know already, the normality assumption applies to the model residuals.
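To illustrate the point about linearity in the coefficients, here is a minimal sketch, assuming made-up noiseless data and two hypothetical helpers (`solve`, `ols`) that are not from the thread. The squared term is just another column of the design matrix, so ordinary least squares handles it like any other predictor.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Return OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up curvilinear data: y = 2 + 0.5*x - 0.3*x^2 (no noise, for illustration only).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 0.5 * xi - 0.3 * xi ** 2 for xi in x]

# The model is non-linear in x but linear in the coefficients b0, b1, b2.
design = [[1.0, xi, xi ** 2] for xi in x]
coefs = ols(design, y)
print(coefs)
```

Because the data are noiseless, the estimates should match the generating coefficients up to floating-point error. With real data the same design matrix applies; only the residuals change.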

Did you also keep the original non-squared version of the variable in the model as well?

6. ## The Following User Says Thank You to hlsmith For This Useful Post:

kiton (01-22-2015)

7. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

I would say that the graph reflects a monotonic curve [which is formally non-linear, I believe]. I would think a quadratic term would model that, although a lowess is by definition non-parametric, so I am unsure of its usage here.

One way to know if an equation is non-linear is to specify a non-linear term and see if it is statistically significant. If it is not, then that supports the view that the model is linear [although it could be that you added the wrong non-linear term]. One fairly simple way to check for non-linearity is Box-Tidwell. This helps determine whether non-linearity is suggested by the data. I tried to find a link, but my experience with it is from books.

I played around with Generalized Additive Models for a while to address non-linearity. It appears to me to be an excellent approach to this, if for nothing else than the diagnostic elements it adds. But it is far from simple, and in the end there were elements I simply failed to grasp.

9. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

Yeah, but if the model fits fine except for one variable that lacks a straight-line relationship, and that can be addressed with a transformation (squared term), you are probably fine moving forward, as mentioned earlier.

11. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

I agree with that. To me the simplest approach is to run Box-Tidwell and try to transform the variables. Then see whether the transformation or the added non-linear terms work [by checking whether the new term is significant].

I am not sure how to determine whether a model with a linear term is better than one that has been transformed to be linear, or one with an added non-linear term such as a quadratic. Because of the nature of R squared, I would think you cannot use that [since it only looks at linearly explained variance]?

AIC maybe?
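As a rough sketch of what an AIC comparison could look like, assuming made-up data and hypothetical helpers (`dot`, `rss`, `aic`) that are not from the thread: the Gaussian AIC for an OLS fit can be written, up to an additive constant, as n·ln(RSS/n) + 2k, where k is the number of estimated coefficients.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rss(columns, y):
    """Residual sum of squares from regressing y on the given columns.

    Uses Gram-Schmidt orthogonalisation, so no matrix inverse is needed.
    """
    basis = []
    for col in columns:
        v = list(col)
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
    resid = list(y)
    for b in basis:
        coef = dot(resid, b) / dot(b, b)
        resid = [ri - coef * bi for ri, bi in zip(resid, b)]
    return dot(resid, resid)


def aic(columns, y):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    n, k = len(y), len(columns)
    return n * math.log(rss(columns, y) / n) + 2 * k


# Made-up data with a curvilinear trend plus small fixed perturbations.
x = [float(i) for i in range(1, 13)]
noise = [0.05, -0.08, 0.02, 0.07, -0.04, -0.06, 0.09, -0.01, 0.03, -0.07, 0.04, -0.02]
y = [1.0 + 2.0 * xi - 0.15 * xi ** 2 + e for xi, e in zip(x, noise)]

ones = [1.0] * len(x)
candidates = {
    "linear": [ones, x],
    "quadratic": [ones, x, [xi ** 2 for xi in x]],
    "log": [ones, [math.log(xi) for xi in x]],
}
scores = {name: aic(cols, y) for name, cols in candidates.items()}
best = min(scores, key=scores.get)  # lower AIC is preferred
print(scores, best)
```

The three scores are comparable only because every candidate models the same untransformed Y; if Y itself were transformed, the likelihoods would be on different scales and the AIC values could not be compared directly.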

13. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

Originally Posted by hlsmith
> Yes, it seems you can move forward. Is your purpose to define the Y variable or attempt to predict it?
>
> Take noetsi's comments into consideration. If you keep the current term in the model, you just need to make sure you explain it accurately. The linearity in the model refers to the linear combination of model terms (linearity in the coefficients), and, as you seem to know already, the normality assumption applies to the model residuals.
>
> Did you also keep the original non-squared version of the variable in the model as well?

The purpose of the study is to predict Y. One of the key problems is that the variables are not normal (log transformation does not solve the problem), so I am building the argument on the paper by Williams, Grajales, and Kurkiewicz (2013) and specifying my model in accordance with the best-fitting residuals.

I surely did keep the original non-squared version of the variable in the model.

I wonder though, in terms of proper justification, what is better: (a) modeling curvilinear relationship as X+X^2, or (b) ln(X) ?

Thank you very much for the feedback, hlsmith

14. ## The Following 2 Users Say Thank You to kiton For This Useful Post:

GretaGarbo (01-23-2015), hlsmith (01-22-2015)

15. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

Originally Posted by noetsi
> I agree with that. To me the simplest approach is to run Box-Tidwell and try to transform the variables. Then see whether the transformation or the added non-linear terms work [by checking whether the new term is significant].
>
> I am not sure how to determine whether a model with a linear term is better than one that has been transformed to be linear, or one with an added non-linear term such as a quadratic. Because of the nature of R squared, I would think you cannot use that [since it only looks at linearly explained variance]?
>
> AIC maybe?

I will surely explore the suggested Box-Tidwell test, thank you for the suggestion.

I did run the model comparison using a global F test and also the R squared difference (as suggested by Aiken and West, 1991). Both tests favor the model with the X+X^2 specification.
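For readers who want to see the mechanics of such a comparison, here is a minimal sketch on made-up data, using hypothetical helpers (`dot`, `rss`) that are not from the thread. It computes the R squared change and the nested-model F statistic for adding X^2 to a model that already contains X. The Python standard library has no F distribution, so the sketch compares the statistic with a tabled 5% critical value instead of computing a p-value.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rss(columns, y):
    """Residual sum of squares from regressing y on the given columns (Gram-Schmidt)."""
    basis = []
    for col in columns:
        v = list(col)
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
    resid = list(y)
    for b in basis:
        coef = dot(resid, b) / dot(b, b)
        resid = [ri - coef * bi for ri, bi in zip(resid, b)]
    return dot(resid, resid)


# Made-up data: curvilinear trend plus small fixed perturbations.
x = [float(i) for i in range(1, 13)]
noise = [0.05, -0.08, 0.02, 0.07, -0.04, -0.06, 0.09, -0.01, 0.03, -0.07, 0.04, -0.02]
y = [1.0 + 2.0 * xi - 0.15 * xi ** 2 + e for xi, e in zip(x, noise)]
ones = [1.0] * len(x)

reduced = [ones, x]                        # Y ~ X
full = [ones, x, [xi ** 2 for xi in x]]    # Y ~ X + X^2

n = len(y)
tss = rss([ones], y)                       # total sum of squares around the mean
rss_reduced, rss_full = rss(reduced, y), rss(full, y)

r2_reduced = 1.0 - rss_reduced / tss
r2_full = 1.0 - rss_full / tss
r2_change = r2_full - r2_reduced

q = len(full) - len(reduced)               # number of added terms (1)
df_resid = n - len(full)                   # residual df of the full model (9)
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)

F_CRIT_05 = 5.12                           # approximate tabled 5% value for F(1, 9)
print(r2_change, f_stat, f_stat > F_CRIT_05)
```

This test applies only to nested models, such as X versus X+X^2. It cannot compare X+X^2 against ln(X), because neither of those models contains the other.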

Also, attached is a Q-Q plot of the residuals that I include to justify the final model specification. The saved residuals passed the Shapiro-Wilk and Shapiro-Francia normality tests. However, they narrowly failed the Jarque-Bera test (p=.044), which I heard is the most robust of the three.

Other assumptions:

- Multicollinearity - NO;
- Exogeneity - NO;
- Heteroskedasticity - YES, addressing that by using robust SE for heteroskedastic data (vce(hc3) in Stata);
- RESET - OK.

I sincerely appreciate your feedback, Sir.

16. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

You do have some outliers in the upper tail. I would suggest a skew and kurtosis test. If you have extra time on your hands, you can run one of the many tests of influence, such as Cook's D or DFBETA, to gauge the impact of outliers on your results.
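The Jarque-Bera test mentioned earlier in the thread is built from exactly these two quantities, sample skewness and kurtosis. Below is a minimal sketch, assuming made-up residuals and a hypothetical helper `jarque_bera`. The p-value uses the asymptotic chi-square distribution with 2 degrees of freedom, whose upper tail is exp(-JB/2), so it is only approximate in small samples.

```python
import math


def jarque_bera(resid):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((r - mean) ** 2 for r in resid) / n   # second central moment
    m3 = sum((r - mean) ** 3 for r in resid) / n   # third central moment
    m4 = sum((r - mean) ** 4 for r in resid) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                            # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)                  # chi-square(2) upper tail
    return skew, kurt, jb, p_value


# Made-up residuals with one large value in the upper tail.
residuals = [-1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.1, 0.2, 0.4, 0.6, 1.0, 3.5]
skew, kurt, jb, p_value = jarque_bera(residuals)
print(skew, kurt, jb, p_value)
```

Positive skewness here reflects the single large upper-tail residual, which is the same pattern described in the Q-Q plot.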

Which robust SE did you use? White's?

For comparing models, I think the criterion most often recommended is AIC.

18. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

Originally Posted by kiton
> so I am building the argument on the paper by Williams, Grajales, and Kurkiewicz (2013)

I hear that's a good paper.

19. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

That is a pretty Q-Q plot. What did you do for the exogeneity and link test?

20. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

Originally Posted by noetsi
> You do have some outliers in the upper tail. I would suggest a skew and kurtosis test. If you have extra time on your hands, you can run one of the many tests of influence, such as Cook's D or DFBETA, to gauge the impact of outliers on your results.
>
> Which robust SE did you use? White's?
>
> For comparing models, I think the criterion most often recommended is AIC.

That is correct, I do have a number of outliers. It was a weighed decision to retain them, since they are "the story tellers". I am planning on mentioning that in the limitations section. I did examine the Cook's distances as well. Depending on the threshold, their number varies: none with D > 1, and approximately 5% with D > 4/N.
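The two thresholds can be illustrated with a minimal sketch, assuming a simple one-predictor regression on made-up data and a hypothetical helper `cooks_distance`. With one predictor the leverage has the closed form h = 1/n + (x - x̄)²/Sxx, and Cook's D is e²/(p·s²) · h/(1-h)², with p = 2 coefficients.

```python
def cooks_distance(x, y):
    """Cook's D for each observation in a simple regression of y on x."""
    n, p = len(x), 2                      # p counts intercept and slope
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)              # residual variance
    lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverage h_i
    return [e * e / (p * s2) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]


# Made-up data: nine points on a line plus one high-leverage point far off it.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
y = [2.0 * xi for xi in x[:-1]] + [10.0]

d = cooks_distance(x, y)
n = len(x)
flagged_strict = [i for i, di in enumerate(d) if di > 1.0]       # D > 1
flagged_loose = [i for i, di in enumerate(d) if di > 4.0 / n]    # D > 4/N
print(flagged_strict, flagged_loose)
```

In this toy example both rules flag only the last observation. In real data the 4/N rule usually flags more cases than D > 1, which is consistent with the counts reported above.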

For the SEs, I am using the HC3 form, u²/(1-h)² (Davidson and MacKinnon, 1993).
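As a sketch of what the u²/(1-h)² weighting does, here is the slope standard error in a simple one-predictor regression, computed three ways on made-up heteroskedastic data. The helper name `slope_standard_errors` is hypothetical, and the one-predictor closed form is an assumption made to keep the example short; the multi-predictor case uses the same weights inside a matrix sandwich formula.

```python
import math


def slope_standard_errors(x, y):
    """Return (slope, classical SE, HC0 SE, HC3 SE) for a regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dev = [xi - xbar for xi in x]
    sxx = sum(dv * dv for dv in dev)
    slope = sum(dv * (yi - ybar) for dv, yi in zip(dev, y)) / sxx
    intercept = ybar - slope * xbar
    u = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]   # residuals
    h = [1.0 / n + dv * dv / sxx for dv in dev]                   # leverages

    # Classical SE assumes one common error variance.
    classical = math.sqrt(sum(ui * ui for ui in u) / (n - 2) / sxx)
    # HC0 (White) weights each observation by its squared residual u^2.
    hc0 = math.sqrt(sum(dv * dv * ui * ui for dv, ui in zip(dev, u))) / sxx
    # HC3 inflates each squared residual to u^2 / (1 - h)^2.
    hc3 = math.sqrt(
        sum(dv * dv * ui * ui / (1.0 - hi) ** 2 for dv, ui, hi in zip(dev, u, h))
    ) / sxx
    return slope, classical, hc0, hc3


# Made-up data whose error spread grows with x.
x = [float(i) for i in range(1, 11)]
noise = [0.1 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [1.0 + 0.5 * xi + e for xi, e in zip(x, noise)]

slope, se_classical, se_hc0, se_hc3 = slope_standard_errors(x, y)
print(slope, se_classical, se_hc0, se_hc3)
```

Because 1/(1-h)² is at least 1, the HC3 standard error is never smaller than the HC0 one, and the gap is largest when high-leverage points carry large residuals.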

21. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

Originally Posted by hlsmith
> That is a pretty Q-Q plot. What did you do for the exogeneity and link test?

Link test: I just followed the guidelines in the Stata manual (-linktest- command).

For the exogeneity: (a) examined the correlation between the predictors and the residuals (must be zero), and (b) conducted a Hausman chi-square test.

22. ## Re: 1 out of 5 predictors is non-linear - can I proceed with OLS estimation?

> It was a weighed decision to retain them, since they are "the story tellers".

One of the most common comments on outliers is that you should always wonder why they exist. They can be your best learning experience about the data. And the common recommendation is that you should not remove non-clerical outliers, although when they badly distort the regression line I personally have always had problems with that advice.