normality transformation and regression

#1
Hi,
I have applied 1/sqrt(Y) and sqrt(Y) transformations to different data sets to achieve a normal distribution for simple linear regressions.

So after performing these regressions, please advise whether the following is correct.

In the following, X is the independent variable and Y is the dependent variable.
1) For 1/sqrt(X): Regression equation: y = b/sqrt(x) + a (same as the predicted value)
2) For 1/sqrt(Y): Regression equation: 1/sqrt(y) = bx + a; predicted value: y = 1/(bx + a)^2
3) For sqrt(Y): Regression equation: sqrt(y) = bx + a; predicted value: y = (bx + a)^2

Of the above three, only for 2) will the sign of the correlation coefficient / regression direction change.
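
For reference, the three back-transformations listed above can be written out as a minimal Python sketch. The helper names (`fit_line`, `predict_inv_sqrt_x`, and so on) and the data are made up for illustration, and NumPy's `polyfit` stands in for whatever regression routine was actually used. Cases 2) and 3) only make sense where bx + a is positive.

```python
import numpy as np

def fit_line(u, v):
    """Least-squares fit of v = b*u + a; returns (b, a). Hypothetical helper."""
    b, a = np.polyfit(u, v, deg=1)
    return b, a

def predict_inv_sqrt_x(x, b, a):
    # Case 1: model fitted as y = b * (1/sqrt(x)) + a; y is already on its original scale
    return b / np.sqrt(x) + a

def predict_inv_sqrt_y(x, b, a):
    # Case 2: model fitted as 1/sqrt(y) = b*x + a, so y = 1/(b*x + a)^2
    return 1.0 / (b * x + a) ** 2

def predict_sqrt_y(x, b, a):
    # Case 3: model fitted as sqrt(y) = b*x + a, so y = (b*x + a)^2
    return (b * x + a) ** 2

# Noise-free illustration for case 3 (made-up data): y = (2x + 1)^2
x = np.linspace(1.0, 10.0, 20)
y = (2.0 * x + 1.0) ** 2
b, a = fit_line(x, np.sqrt(y))   # regress sqrt(y) on x
y_hat = predict_sqrt_y(x, b, a)  # back on the original scale of y
```

With real, noisy data the back-transformed value is not an unbiased estimate of the mean of y, which is the retransformation issue raised later in the thread.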
 

Karabiner

TS Contributor
#2
I have applied 1/sqrt(Y) and sqrt(Y) transformations to different data sets to achieve a normal distribution for simple linear regressions.
If you mean that you wanted to achieve a normal distribution
of the dependent variable: this doesn't matter in linear
regression. The distribution of the independent variables
also doesn't have to be normal.

Only the distribution of the residuals matters. And if your
sample sizes are large enough (> 50, or > 80 or so),
then even the residuals' deviation from normality doesn't
matter much.

With kind regards

K.
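
A small simulation may make the point about residuals concrete. This is only a sketch with made-up data: the predictor is deliberately skewed, the errors are normal, and SciPy's `shapiro` is applied once to the raw Y and once to the residuals. The seed, sample size and coefficients are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.exponential(scale=2.0, size=n)        # strongly skewed predictor
y = 1.0 + 3.0 * x + rng.normal(0.0, 1.0, n)   # normal errors around a straight line

# Ordinary least-squares line and its residuals
b, a = np.polyfit(x, y, deg=1)
residuals = y - (b * x + a)

# Shapiro-Wilk on the raw response versus on the residuals
w_y, p_y = stats.shapiro(y)
w_res, p_res = stats.shapiro(residuals)
print(f"raw Y:     W = {w_y:.3f}, p = {p_y:.3g}")
print(f"residuals: W = {w_res:.3f}, p = {p_res:.3g}")
```

Here the raw Y inherits the skew of X and looks clearly non-normal, while the residuals come from normal errors, so no transformation of Y is called for in this setup.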
 
#3
Thanks Karabiner for your reply.
Just to confirm: for regression, it does not matter whether the raw data of the dependent or independent variables are normally distributed?
So does that mean that running the Shapiro-Wilk test and normal Q-Q plots on the dependent and independent variables is not necessary at all before the regression?
You mentioned that the residuals do need to be normally distributed, though. What is the best way to test that?
Could you please explain a bit more?
Thank you very much,
cheers,
 
#4
You can run a normality test on the residuals, or just construct a Q-Q plot and a histogram of them. As stated, it's the normality of the residuals that matters.
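
One way to put those checks together in Python is sketched below. The function name `residual_normality_summary` and the example data are hypothetical; the calls used are NumPy's `polyfit` and `histogram` and SciPy's `shapiro` and `probplot`.

```python
import numpy as np
from scipy import stats

def residual_normality_summary(x, y, bins=10):
    """Fit y = b*x + a by least squares and summarise normality of the residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, deg=1)
    residuals = y - (b * x + a)

    # Shapiro-Wilk test on the residuals, not on the raw data
    w, p = stats.shapiro(residuals)

    # Q-Q plot coordinates; r is the correlation of the points with a straight line
    (theoretical_q, ordered_res), (_, _, r) = stats.probplot(residuals, dist="norm")

    # Histogram counts of the residuals
    counts, edges = np.histogram(residuals, bins=bins)

    return {
        "residuals": residuals,
        "shapiro_w": w,
        "shapiro_p": p,
        "qq_theoretical": theoretical_q,
        "qq_ordered": ordered_res,
        "qq_r": r,
        "hist_counts": counts,
        "hist_edges": edges,
    }

# Made-up example: straight line plus normal noise
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=200)
summary = residual_normality_summary(x, y)
print(summary["shapiro_w"], summary["shapiro_p"], summary["qq_r"])
```

To actually draw the Q-Q plot, `stats.probplot(residuals, dist="norm", plot=plt)` with `matplotlib.pyplot` imported as `plt` should do it; points close to a straight line suggest roughly normal residuals.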
 
#5
Thanks Disvengeance! So 1) if a Shapiro-Wilk test is performed on the raw data, does it test the normality of the raw data or that of the residuals?
and 2) from my original first post: if for any reason transformations are used and you want to present the results in terms of the original, non-transformed variables, which transformations require reversing the direction of the relationship and which do not?
From the above, I suppose 1/sqrt(Y) would require it, and sqrt(Y) and 1/sqrt(X) would not. Is that correct?
 

maartenbuis

TS Contributor
#6
1) It is the residuals that should be tested. A Shapiro-Wilk test run on the raw data only tests the raw data.
2) Back-transforming is not that simple; you can look at Duan (1983) for a possible solution.

Duan, N. (1983). Smearing Estimate: A Nonparametric Retransformation Method. Journal of the American Statistical Association, 78(383), 605-610.
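
As a rough illustration of the smearing idea for the sqrt(Y) model: instead of back-transforming the fitted value alone, the inverse transformation is averaged over all observed residuals. The sketch below is an assumption-laden reading of that idea, not code from the paper; the function name and the data are hypothetical.

```python
import numpy as np

def smearing_predict_sqrt_model(x_train, y_train, x_new):
    """Fit sqrt(y) = b*x + a, then return naive and smearing-type predictions of y."""
    x_train = np.asarray(x_train, dtype=float)
    z = np.sqrt(np.asarray(y_train, dtype=float))   # transformed response
    b, a = np.polyfit(x_train, z, deg=1)
    resid = z - (b * x_train + a)                   # residuals on the sqrt scale

    fitted_new = b * np.atleast_1d(np.asarray(x_new, dtype=float)) + a

    # Naive back-transformation: square the fitted value
    naive = fitted_new ** 2

    # Smearing-type estimate: average the squared (fitted value + residual)
    # over all observed residuals, for each new x
    smeared = np.mean((fitted_new[:, None] + resid[None, :]) ** 2, axis=1)
    return naive, smeared, resid

# Made-up data: sqrt(y) is linear in x with normal noise
rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 50)
y = (2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)) ** 2
naive, smeared, resid = smearing_predict_sqrt_model(x, y, [2.0, 5.0, 8.0])
print(naive, smeared)
```

For the square-root case the two differ by the mean squared residual, so the naive back-transformation sits a little below the smearing-type estimate of the mean of y.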
 
#7
Thanks Maartenbuis!
I am a novice and could not understand much of the complicated terminology in that article, but I will research the smearing estimate further.
My concern is mainly the direction of the relationship and any correction it requires. If my aim is not to back-transform to predict Y, but only exploratory (investigating the direction of the relationship between x and y), I need to know: would the direction of the relationship be changed or unchanged in the regressions using those three transformations?
Thanks again,
Cheers!
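
The direction question can be checked numerically. For positive values, sqrt(.) is increasing and 1/sqrt(.) is decreasing, so a slope fitted on a 1/sqrt scale has the opposite sign to the underlying x-y relationship, whichever variable is transformed. The sketch below uses made-up, noise-free data in which y increases with x; `slope_sign` is a hypothetical helper.

```python
import numpy as np

x = np.linspace(1.0, 10.0, 50)
y = (2.0 * x + 1.0) ** 2          # made-up data: y increases with x

def slope_sign(u, v):
    """Sign of the least-squares slope when regressing v on u."""
    b, _ = np.polyfit(u, v, deg=1)
    return int(np.sign(b))

signs = {
    "raw: y ~ x": slope_sign(x, y),
    "1) y ~ 1/sqrt(x)": slope_sign(1.0 / np.sqrt(x), y),
    "2) 1/sqrt(y) ~ x": slope_sign(x, 1.0 / np.sqrt(y)),
    "3) sqrt(y) ~ x": slope_sign(x, np.sqrt(y)),
}
for name, s in signs.items():
    print(f"{name}: {s:+d}")
# raw: y ~ x: +1
# 1) y ~ 1/sqrt(x): -1
# 2) 1/sqrt(y) ~ x: -1
# 3) sqrt(y) ~ x: +1
```

In this example the sign flips for both 1) and 2) and is unchanged for 3), so a negative slope in a 1/sqrt(X) or 1/sqrt(Y) model corresponds to a positive relationship between the original x and y.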