# Thread: Variable transformation in OLS regression

1. ## Variable transformation in OLS regression

Hi there,

I have a question related to the transformation of independent variables in an OLS regression. More specifically:

When I transform an independent variable, when do I have to keep the original variable in the model?

For instance, I know that if I have an issue with linearity (in the model Y on X), I could try adding X squared to the model while keeping the original X. So I'd regress Y on X + X^2. Yet sometimes it seems that you don't need to keep the original variable in the model, such as when you have a normality issue and apply a log or square root transformation.

My impression is that when you transform a variable to fix a linearity issue, you keep the original variable in your model, but when you transform a variable to achieve normality, you keep only the transformed variable. Is that right?

Thank you,

Tux

2. ## Re: Variable transformation in OLS regression

Hello there! To tell you the truth, I'm not sure you would square the variable in order to normalize it (if that is what you want). In that case, you could simply take its log and use that in the model (the interpretation of the coefficient would be different, though). When you square the variable, you assume a curvilinear relationship between X and Y, and that is quite a complex issue that depends heavily on theory. Besides, if you do assume a curvilinear relationship and want to test it, then both X and X_sq must be retained in the model.
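To make the two specifications concrete, here is a minimal sketch in Python, assuming NumPy is available; the simulated data, the coefficient values, and the helper `fit_ols` are hypothetical and only for illustration. The curvilinear model keeps both X and X_sq as columns of the design matrix, while the log model replaces X with log(X) and has no separate X term.

```python
import numpy as np

def fit_ols(columns, y):
    """Hypothetical helper: OLS via least squares, with an intercept column added."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=5000)

# Curvilinear case: both X and X^2 enter the model.
y_quad = 2.0 + 1.5 * x - 0.2 * x**2 + rng.normal(0.0, 1.0, size=x.size)
b0, b1, b2 = fit_ols([x, x**2], y_quad)

# Log case: log(X) replaces X; there is no separate X term.
y_log = 1.0 + 3.0 * np.log(x) + rng.normal(0.0, 1.0, size=x.size)
a0, a1 = fit_ols([np.log(x)], y_log)

print(f"quadratic: b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")
print(f"log:       a0={a0:.2f}, a1={a1:.2f}")
```

Note the different readings: b1 and b2 only describe the curve jointly, whereas in the log model a 1% increase in X is associated with a change in Y of roughly a1/100.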

Hope this helps.

3. ## Re: Variable transformation in OLS regression

Just a reminder: OLS regression does not assume that any of your independent or dependent variables are normally distributed. It assumes only that the error terms are normally distributed, and even that is a relatively unimportant assumption with a reasonably large sample size. If you have a good, principled reason to think that transforming an IV will result in normal errors, this might be OK, but definitely don't transform an IV just to give that IV a normal distribution.
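A quick simulation can illustrate this point. The sketch below (Python with NumPy; the lognormal predictor and all coefficient values are made up for illustration) draws a heavily skewed IV together with normal errors. The IV's skewness is irrelevant to the assumption; it is the residuals, not the IV, whose distribution you would inspect.

```python
import numpy as np

def skewness(v):
    """Sample skewness: the third standardized moment."""
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(1)
n = 10_000
x = rng.lognormal(mean=0.0, sigma=1.0, size=n)    # heavily right-skewed IV
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)  # normal errors

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

print(f"skewness of x:         {skewness(x):.2f}")
print(f"skewness of residuals: {skewness(resid):.2f}")
print(f"estimated slope:       {coef[1]:.3f}")
```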

The usual shameless plug for our article on regression assumptions: http://pareonline.net/getvn.asp?v=18&n=11

4. ## Re: Variable transformation in OLS regression

Perfect point, Mr. Williams. Just wanted to double-check for myself, though: what would you consider a "reasonably large sample size" for it to be safe to "violate" this assumption?

5. ## Re: Variable transformation in OLS regression

Haha. Well, the assumption of normal errors is still violated even with a large sample size; the point is that the violation won't have any important effects. If the errors are independently and identically distributed with mean zero (and unrelated to the predictors), the normality assumption isn't required for the OLS estimator to be unbiased and efficient among linear unbiased estimators (BLUE, by the Gauss-Markov theorem) at any sample size, or for it to be consistent as the sample grows. The normal errors assumption is technically required for the coefficients to have an exactly normal sampling distribution in finite samples (and thus for significance tests and confidence intervals based on that assumption to be trustworthy). However, if the other assumptions hold, the sampling distribution of the coefficients will converge toward normality anyway as the sample size grows.

How big a sample is required for robustness to non-normal errors depends on how non-normal the errors are, and on how much of a change to Type I/II error rates and confidence interval coverage would actually bother you. But even something like N = 30 is probably enough to make the normality assumption unproblematic in many situations (or that's the widely accepted folklore, anyway). In general, I'd say it's almost always more important to focus on the other assumptions of OLS regression, which get less airtime than this one!
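One way to check the N = 30 folklore for a particular case is a Monte Carlo simulation. The sketch below (Python with NumPy; the uniform predictor, the shifted-exponential errors, and the true coefficients are assumptions chosen for illustration) draws strongly skewed errors, then records the average slope estimate and how often the usual t-based 95% confidence interval covers the true slope. The critical value 2.048 is the standard two-sided 95% t value for 28 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 30, 5000
beta0, beta1 = 1.0, 2.0
t_crit = 2.048  # two-sided 95% t critical value, df = n - 2 = 28

slopes = np.empty(reps)
covered = 0
for r in range(reps):
    x = rng.uniform(0.0, 1.0, size=n)
    # Strongly skewed errors: exponential(1) shifted to have mean zero.
    e = rng.exponential(1.0, size=n) - 1.0
    y = beta0 + beta1 * x + e
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - 2)  # residual variance estimate
    se_b1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    slopes[r] = coef[1]
    covered += int(abs(coef[1] - beta1) <= t_crit * se_b1)

print(f"mean slope estimate: {slopes.mean():.3f} (true {beta1})")
print(f"95% CI coverage:     {covered / reps:.3f}")
```

Other error distributions, predictor designs, or sample sizes can give different coverage, so this shows one scenario rather than a general guarantee.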

6. ## The Following User Says Thank You to CowboyBear For This Useful Post:

kiton (06-16-2016)

7. ## Re: Variable transformation in OLS regression

Thank you for such a great reply, Sir. It certainly clarified my understanding of the assumption about the residuals. I also couldn't agree more with you on the importance of the other assumptions and the remedies for addressing them.

8. ## Re: Variable transformation in OLS regression

Linearity, I think, is one of the assumptions whose violation does not go away asymptotically, so unlike with normality, a large sample size won't make up for violating it. I would also think this would bias the slope, unlike non-normality or heteroscedasticity, which only affect the p values.

I have never seen a rule that covers whether to include the time variable alongside its quadratic term. Given what each term means, it makes little sense to me for the quadratic to be significant when the time variable itself is not. Note that a quadratic is only one type of nonlinearity.

9. ## Re: Variable transformation in OLS regression

> Originally Posted by noetsi
> Violations of linearity, I think, is one of the assumptions that is not asymptotically correct so a large sample size won't make up for violations of assumptions unlike normality. Also I would think this would bias the slope unlike normality or heteroscedastic results which only effects the p values.

Yeah, this is totally right. Unmodelled nonlinearity leads to a violation of the assumption that the conditional mean of the error terms is zero, which in turn can lead to bias regardless of sample size. (Unless you have the whole population, I guess, and even then you might have an unbiased model that still doesn't accurately describe the actual underlying relationship!) That said, adding polynomial terms to match the pattern seen in the sample risks overfitting and won't necessarily make an assumption breach less likely, so it's something to do cautiously.
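The bias from unmodelled curvature can be seen directly in a small simulation. This sketch (Python with NumPy; the true model Y = 1 + 2X + 3X^2 with X ~ Uniform(0, 1) and normal noise is an assumption for illustration) fits Y on X alone. For this setup the linear-only slope settles near 2 + 3 × Cov(X, X^2)/Var(X) = 2 + 3 × (1/12)/(1/12) = 5 rather than the true B1 = 2, and a larger N only pins that wrong value down more precisely.

```python
import numpy as np

def slope_linear_only(n, rng):
    """Fit Y on X alone when the true model is quadratic; return the X slope."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 1.0, size=n)
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(3)
for n in (100, 10_000, 1_000_000):
    # The estimate does not drift toward the true B1 = 2 as n grows.
    print(n, round(slope_linear_only(n, rng), 3))
```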

And to go back to the original question: if you are trying to model a quadratic relationship, you must include both X and X^2. Both terms are required for the fitted relationship to be a general quadratic. Y = B0 + B1X + B2X^2 is a full quadratic model, but Y = B0 + B2X^2 is a restricted one: dropping X forces the curve's turning point to sit at X = 0, so it takes a completely different shape. The same goes for cubic and other polynomial models.
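The shape difference follows from the turning point: setting dY/dX = B1 + 2B2X to zero gives X = -B1/(2B2), so without B1 the vertex is pinned at X = 0. A minimal sketch (Python with NumPy; the data, with a true minimum at X = 2, are simulated for illustration, and the helper `fit` is hypothetical) fits both models and compares them.

```python
import numpy as np

def fit(columns, y):
    """Hypothetical helper: OLS with an intercept; returns coefficients and SSE."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ coef) ** 2))
    return coef, sse

rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 5.0, size=2000)
# True curve has its minimum at X = 2: Y = (X - 2)^2 = 4 - 4X + X^2, plus noise.
y = 4.0 - 4.0 * x + x**2 + rng.normal(0.0, 0.5, size=x.size)

(c0, c1, c2), sse_full = fit([x, x**2], y)  # Y = B0 + B1 X + B2 X^2
(d0, d2), sse_sq = fit([x**2], y)           # Y = B0 + B2 X^2 (vertex fixed at X = 0)

vertex_full = -c1 / (2.0 * c2)  # turning point of the full quadratic
print(f"full quadratic: vertex at X = {vertex_full:.2f}, SSE = {sse_full:.0f}")
print(f"X^2 only:       vertex at X = 0,    SSE = {sse_sq:.0f}")
```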
