
Thread: Variable transformation in OLS regression

  1. #1
    Tux

    Variable transformation in OLS regression




    Hi there,

    I have a question related to the transformation of independent variables in an OLS regression. More specifically:

    When I transform an independent variable, when do I have to keep the original variable in the model as well?

    For instance, I know that, if I have an issue with linearity (in the following model: Y on X), I could try to put X squared into the model while keeping the original X. So, I'd regress Y on X + X^2. Yet sometimes it seems that you don't need to keep the original variable in the model, such as when you have an issue of normality and apply a log or square root transformation.

    My impression is that, when you transform a variable for linearity issues, you keep the original variable in your model, but when you transform a variable to achieve normality, you keep only the transformed variable. Is that right?

    Thank you,

    Tux

  2. #2
    kiton

    Re: Variable transformation in OLS regression

    Hello there! To tell you the truth, I'm not sure that you square a variable in order to normalize it (if that is what you want). In that case, you could simply take a log and use it in the model (the interpretation of the coefficient would be different, though). When you square a variable, you assume a curvilinear relationship between X and Y -- and that is quite a complex issue that is heavily dependent on theory. Besides, if you do assume a curvilinear relationship and want to test it, then both X and X_sq must be retained in the model.

    Hope this helps.
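    To make the "retain both X and X_sq" point concrete, here is a minimal NumPy sketch with made-up coefficients: we simulate a curvilinear relationship and fit it by least squares with the intercept, X, and X^2 all in the design matrix.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate a curvilinear relationship: Y depends on both X and X^2.
    n = 500
    x = rng.uniform(-3, 3, n)
    y = 2.0 + 1.5 * x - 0.8 * x**2 + rng.normal(0, 0.5, n)

    # Design matrix with an intercept, X, and X^2 -- both X terms retained.
    X = np.column_stack([np.ones(n), x, x**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    print(beta)  # estimates close to the true (2.0, 1.5, -0.8)
    ```

    Dropping either X column from the design matrix would force a different, more restrictive curve through the data.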

  3. #3
    CowboyBear

    Re: Variable transformation in OLS regression

    Just a reminder: OLS regression does not assume that any of your independent or dependent variables are normally distributed. It assumes only that the error terms are normally distributed. And even that is a relatively unimportant assumption with a reasonably large sample size. If you have a good principled reason to think that transforming an IV will result in normal errors, this might be ok, but definitely don't transform an IV just to give that IV a normal distribution.

    The usual shameless plug for our article on regression assumptions: http://pareonline.net/getvn.asp?v=18&n=11
    Matt aka CB | twitter.com/matthewmatix
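    CB's point that the IVs themselves need not be normal can be checked with a small simulation (NumPy, made-up numbers): a heavily skewed log-normal predictor with normal errors still gives an unbiased slope estimate.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    # X is heavily skewed (log-normal) -- OLS does not care about that,
    # as long as the *errors* are well-behaved.
    n, reps = 200, 2000
    slopes = np.empty(reps)
    for i in range(reps):
        x = rng.lognormal(0.0, 1.0, n)           # skewed predictor
        y = 0.5 + 3.0 * x + rng.normal(0, 1, n)  # normal errors
        slopes[i] = np.polyfit(x, y, 1)[0]

    print(slopes.mean())  # close to the true slope of 3
    ```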

  4. #4
    kiton

    Re: Variable transformation in OLS regression

    Perfect point, Mr. Williams. I wanted to double-check for myself, though: what would you consider a "reasonably large sample size", beyond which "violating" this assumption stops mattering?

  5. #5
    CowboyBear

    Re: Variable transformation in OLS regression

    Haha. Well, the assumption of normal errors is still violated even with a large sample size: the point is that the violation won't have any important effects. If the errors are independently and identically distributed with mean zero, the normality assumption isn't required for the OLS estimator to be unbiased, consistent, and the best linear unbiased estimator (BLUE, by the Gauss-Markov theorem), regardless of sample size. The normal-errors assumption is technically required for the coefficients to have a normal sampling distribution (and thus for significance tests and confidence intervals based on normality to be trustworthy). However, if the other assumptions hold, the sampling distribution of the coefficients converges toward normality anyway as the sample size grows.

    How big a sample size is required for robustness to non-normal errors depends on the extent of the non-normality and on how much of a change to Type I/II error rates and confidence interval coverage would actually bother you. But even something like N = 30 is probably enough to make the normality assumption unproblematic in many situations (or that's the widely accepted folklore, anyway). In general I'd say it's almost always more important to focus on the other assumptions of OLS regression, which get less airtime than this one!
    Matt aka CB | twitter.com/matthewmatix
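    The robustness claim above is easy to sketch by simulation (NumPy, made-up numbers): with heavily skewed but mean-zero errors and only N = 30 per sample, the slope estimates are still centered on the truth and their sampling distribution is close to symmetric.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    # True model: y = 1 + 2x + e, with right-skewed (centered exponential) errors.
    n, reps = 30, 5000
    slopes = np.empty(reps)
    for i in range(reps):
        x = rng.uniform(0, 10, n)
        e = rng.exponential(1.0, n) - 1.0  # mean zero, but heavily skewed
        y = 1.0 + 2.0 * x + e
        slopes[i] = np.polyfit(x, y, 1)[0]

    print(slopes.mean())       # close to the true slope of 2
    print(np.median(slopes))   # median close to the mean: near-symmetric sampling distribution
    ```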

  6. The Following User Says Thank You to CowboyBear For This Useful Post:

    kiton (06-16-2016)

  7. #6
    kiton

    Re: Variable transformation in OLS regression

    Thank you for such a great reply, Sir. It surely added clarity to my understanding of the residual assumption. Also, I cannot agree more with you on the importance of the other assumptions and the remedies to address them.

  8. #7
    noetsi

    Re: Variable transformation in OLS regression

    Linearity is, I think, one of the assumptions for which violations aren't cured asymptotically, so a large sample size won't make up for violating it the way it does for normality. Also, I would think this would bias the slope estimates, unlike non-normality or heteroscedasticity, which only affect the p values.

    I have never seen a rule that covers including the time variable itself alongside its quadratic. It makes little sense to me for the quadratic term to be significant when the time variable itself is not, given what each means. Note that a quadratic is only one type of nonlinearity.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  9. #8
    CowboyBear

    Re: Variable transformation in OLS regression


    Quote Originally Posted by noetsi View Post
    Violations of linearity, I think, is one of the assumptions that is not asymptotically correct so a large sample size won't make up for violations of assumptions unlike normality. Also I would think this would bias the slope unlike normality or heteroscedastic results which only effects the p values.
    Yeah, this is totally right. Unmodelled nonlinearity leads to violation of the assumption that the conditional means of the error terms are all zero, which in turn can lead to bias regardless of sample size. (Unless you have the whole population, I guess -- and even then you might have an unbiased model that still doesn't accurately describe the actual underlying relationship!) That said, adding polynomial terms to match the pattern seen in the sample risks overfitting and isn't necessarily going to make an assumption breach less likely, so it's something to be done cautiously.
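    The bias mechanism is visible in a tiny deterministic sketch (NumPy, made-up numbers): fit a straight line to an exactly quadratic relationship and the residuals are systematically signed by region, so the conditional mean of the errors is not zero everywhere.

    ```python
    import numpy as np

    # Deterministic illustration: the true relationship is exactly quadratic.
    x = np.linspace(0, 4, 41)
    y = x**2

    # Misspecified model: straight line only.
    b, a = np.polyfit(x, y, 1)  # slope, intercept
    resid = y - (a + b * x)

    # Residuals are positive at both ends and negative in the middle,
    # so E[e | x] depends on x -- the zero-conditional-mean assumption fails.
    print(resid[0] > 0, resid[20] < 0, resid[-1] > 0)
    ```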

    And to go back to the original question: if you are trying to model a quadratic relationship, you must include both X and X^2; both terms are required for the fitted relationship to be quadratic. Y = B0 + B1X + B2X^2 is a quadratic model, but Y = B0 + B2X^2 isn't -- it takes a completely different shape. The same goes for cubic and other polynomial models.
    Matt aka CB | twitter.com/matthewmatix
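    A quick sketch of why dropping the linear term changes the shape (NumPy, made-up numbers): Y = B0 + B2X^2 is forced to have its turning point at X = 0, while the full quadratic can place the turning point anywhere, at X = -B1/(2*B2).

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Data whose true turning point is at x = 1.5.
    x = rng.uniform(-2, 5, 400)
    y = 4.0 - (x - 1.5)**2 + rng.normal(0, 0.3, 400)

    # Full quadratic: intercept, X, and X^2.
    full = np.linalg.lstsq(np.column_stack([np.ones_like(x), x, x**2]), y, rcond=None)[0]
    vertex_full = -full[1] / (2 * full[2])

    # X^2 only (no linear term): the fitted parabola must turn at x = 0.
    b0, b2 = np.linalg.lstsq(np.column_stack([np.ones_like(x), x**2]), y, rcond=None)[0]

    print(vertex_full)  # close to the true turning point at 1.5
    print(b0, b2)       # restricted fit: turning point pinned at x = 0
    ```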
