When Log10 is not sufficient to achieve normal distribution...

I've examined numerous ways to get my positively skewed data to conform statistically via Shapiro-Wilk, to no avail. Log10 visually appears to produce approximately normally distributed data, but the test results (p = .000) say otherwise. I've already deleted outliers beyond 2.5 standard deviations. Thoughts?


Ambassador to the humans
Why are you trying to transform to normality in the first place? There are typically better methods you can use when normality isn't met. There is also a lot of misunderstanding about what we need to be normally distributed in the first place. So if you could describe your data and what you're actually trying to achieve we can probably give you better advice.


No cake for spunky
Tukey proposed a "ladder" of transformations to try when log transformations don't work. You might look at those. The ladder moves through various roots and powers, from lower-order transformations for mildly non-normal data up to stronger ones for data that is extremely skewed.
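A minimal sketch of working through the ladder (assuming NumPy and SciPy are available; the data here are simulated, not from the original post): apply each candidate power and report the skewness and Shapiro-Wilk p-value so you can see which rung gets closest to normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = rng.lognormal(mean=1.0, sigma=0.8, size=500)  # toy right-skewed data

# Tukey's ladder of powers: rungs below 1 pull in a right tail.
# Reciprocals are negated so the transformation stays order-preserving,
# and 0 on the ladder denotes the log transformation.
ladder = [("-1/y^2", lambda v: -1.0 / v**2),
          ("-1/y",   lambda v: -1.0 / v),
          ("log",    np.log),
          ("sqrt",   np.sqrt),
          ("y",      lambda v: v)]

for name, f in ladder:
    t = f(y)
    w, p = stats.shapiro(t)
    print(f"{name:7s} skew={stats.skew(t):+.3f}  Shapiro-Wilk p={p:.4f}")
```

With lognormal toy data the log rung will look best by construction; on real data you would simply pick the rung with the least skew, keeping in mind that reciprocal rungs change the scale's interpretation.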
The data set consists of 1,062 firm-level observations across 10 different industries. The goal is to perform a hierarchical multiple regression (DV = Tobin's Q) testing a moderating categorical variable on two different constructs (each summated z-scores), with 4 control variables.
Using a Log10 transformation on the DV, the result is a skewness of .507 with a standard error of .075; the ratio of the two is well above the target +/- 1.96 range.
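The skewness check described above is just a z test: divide the skewness estimate by its standard error and compare against +/-1.96. Plugging in the numbers from the post:

```python
# z test for skewness: estimate divided by its standard error,
# compared against the two-sided +/-1.96 cutoff.
skew_est, se = 0.507, 0.075   # values reported in the post
z = skew_est / se
print(f"z = {z:.2f}; outside +/-1.96 range: {abs(z) > 1.96}")  # z = 6.76
```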


No cake for spunky
Hierarchical multiple regression is used to describe at least two totally different methods: first, adding variables in a series of blocks and testing whether the new variables add additional predictive power; and second, actual multilevel analysis (which is commonly estimated with maximum likelihood or generalized least squares rather than OLS).


No cake for spunky
I think your best bet (if you want normality by transforming your data) is to work through Tukey's ladder, which is in many texts and likely online. What Dason was arguing, I think, is that it's not the normality of the IVs and DV per se that matters (which is what you appear to me to be testing); it's the normality of the residuals. And normality is only required in regression for the confidence intervals and significance tests. Your parameter estimates will be unbiased and BLUE, I believe, even if your data is not normal (even if the residuals are not). Although not all statisticians agree... (this is an ongoing dispute between Dason and me, but he is likely right on anything statistical). :p
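A quick sketch of the residuals point (simulated data, NumPy/SciPy assumed): a heavily skewed predictor is no problem at all, because it's the residuals, not the raw variables, that the normality assumption refers to.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=300)   # heavily right-skewed predictor
y = 1.5 + 0.8 * x + rng.normal(size=300)   # but normal errors

# Fit OLS by least squares and extract the residuals.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print(f"skew(x) = {stats.skew(x):+.2f}, skew(residuals) = {stats.skew(resid):+.2f}")
w, p = stats.shapiro(resid)
print(f"Shapiro-Wilk on residuals: p = {p:.3f}")
```

Testing `x` or `y` for normality here would fail badly, yet the model is perfectly well specified; the residuals are what to diagnose.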

Note that if you are using some form of maximum likelihood as an estimator (compared to OLS), this will commonly require normality to work even if regression itself does not. But most likely with linear regression you won't be using ML anyhow.

If you are doing blocks, then what you likely care about is the significance of the F-change test, which determines whether the R-squared increased significantly. I don't know if normality is required for the F test, although I doubt it.
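The F-change test for a block can be sketched as follows (simulated data, NumPy only; the formula is the standard one, F = [ΔR² / Δk] / [(1 − R²_full) / (n − k_full)]):

```python
import numpy as np

def r_squared(X, y):
    """OLS R-squared (with intercept) and number of fitted parameters."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot, X.shape[1]

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 * x1 + 0.7 * x2 + rng.normal(size=n)

r2_base, k_base = r_squared(x1, y)                        # block 1 only
r2_full, k_full = r_squared(np.column_stack([x1, x2]), y) # add block 2

# F-change for the (k_full - k_base) predictors added in block 2.
F = ((r2_full - r2_base) / (k_full - k_base)) / ((1 - r2_full) / (n - k_full))
print(f"R^2 change = {r2_full - r2_base:.3f}, F change = {F:.2f}")
```

Packages like statsmodels or SPSS report this same F-change statistic directly when you enter predictors in blocks.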

This is one of those areas where what classes commonly teach and what statisticians generally believe appear to be very different. There is often tremendous focus in statistics classes on normality, even though it's not actually part of the core Gauss-Markov assumptions of regression. That has been really hard for me to accept, given how often I was taught the importance of normality.