# Using skewed data in a regression

#### Jfield7

##### New Member
Hi, I was wondering if anyone could help me. I am conducting a mediation analysis with regression, and one of the variables is negatively skewed: the z-score is -3.078, so the skew is significant. I have tried transforming the variables. However, because you have to transform all the variables with the same type of transformation (and we have a lot of variables), it makes some variables that were not previously skewed become skewed.

So now I am trying to find out how much it would affect the results to just carry out a linear regression with the variables as they are, as there is no form of non-parametric regression.

Thanks!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Which variable is skewed, the dependent or an independent variable? Why not look at generalized linear models (GLM)? And have you examined the model's residuals for normality, or are you just talking about a single variable?

#### Jfield7

##### New Member
It is the independent variable which is skewed. I am an undergraduate, so my stats knowledge is not very good. I don't know what a generalized linear model is or why I would use that instead. I also do not know how to examine the model's residuals for normality. But it is just the independent variable which is skewed; the dependent variable and the mediator are fine. I don't know what the impact would be of using the skewed variable in the model.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You can have a skewed independent variable in linear regression as long as your model residuals are normally distributed. Do some research on how to check this in your software and let us know if you have any questions.
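As a rough illustration of checking the residuals rather than the predictor, here is a minimal Python sketch using NumPy. The data are simulated and purely hypothetical, and the helper `skew_z` is a made-up name. It divides the sample skewness by the large-sample approximation sqrt(6/n) for its standard error, which may differ slightly from the z-score your own software reports.

```python
import numpy as np

def skew_z(values):
    """Sample skewness divided by its approximate standard error sqrt(6/n)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    centered = values - values.mean()
    skew = np.mean(centered**3) / np.mean(centered**2) ** 1.5
    return skew / np.sqrt(6.0 / n)

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: a negatively skewed predictor with normally distributed errors.
x = 10.0 - rng.gamma(shape=2.0, scale=1.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares fit of y on x, with an intercept column.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

z_x = skew_z(x)
z_resid = skew_z(residuals)
print(f"skew z-score of predictor: {z_x:.2f}")
print(f"skew z-score of residuals: {z_resid:.2f}")
```

In this simulated setup the predictor is strongly skewed while the residuals are not, which is the situation described above. With real data you would replace `x` and `y` with your own variables, and a histogram or Q-Q plot of the residuals is a useful companion to the z-score.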

#### noetsi

##### No cake for spunky
There is considerable disagreement in practice on this topic. On this board, and I think this is correct, it is held that the distribution of the independent variables is not important; only the normality of the residuals is. A caution here is that this view is often not shared in classes (or in many books on data analysis), where skewness of the IV is stressed. And, regardless of who is right, if your instructor thinks it matters, it does.

#### Jfield7

##### New Member
Thank you for your help. Do you know of any papers that say it is okay to do the regression if it is just the IV that is skewed? My supervisor is very keen on us justifying why we chose to do everything the way we have.

#### noetsi

##### No cake for spunky
I believe skew only affects the normality assumption. If so, you might read William Berry's "Understanding Regression Assumptions" (Sage), pp. 81-82. He notes that it does not affect whether the parameter estimates are biased; its primary impact is on tests of statistical significance. Even so, he notes, this normally only matters with small sample sizes.

As with many statistical issues, I am sure you will find others who strongly disagree. I read one such book a decade ago that stressed that even a few outliers could totally distort the results, even with large samples.

The Berry book is a good starter for regression assumptions generally, not just for skew.
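The point above about bias versus significance tests can be illustrated with a small simulation. This is only a sketch under assumed conditions, not anything taken from Berry's book: the true slope, sample size, and distributions are all hypothetical choices. It repeatedly fits a simple regression with a skewed predictor and skewed, mean-zero errors, then checks that the slope estimates average out close to the true value.

```python
import numpy as np

rng = np.random.default_rng(1)

true_slope = 0.5      # hypothetical true effect
n, reps = 30, 5000    # small samples, many replications
slopes = np.empty(reps)

for i in range(reps):
    # Negatively skewed predictor and skewed (but mean-zero) errors.
    x = 10.0 - rng.gamma(2.0, 1.0, size=n)
    errors = rng.exponential(1.0, size=n) - 1.0
    y = 1.0 + true_slope * x + errors

    # Closed-form OLS slope for simple regression: cov(x, y) / var(x).
    xc = x - x.mean()
    slopes[i] = (xc @ y) / (xc @ xc)

mean_slope = slopes.mean()
print(f"true slope: {true_slope}, average estimate: {mean_slope:.3f}")
```

The average estimate landing near the true slope is consistent with the claim that skew does not bias the coefficients. The simulation says nothing about how accurate the p-values are in small samples, which is where the normality assumption comes in.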