# standardizing and skewness

#### sten

##### New Member
Hi, I would really appreciate some input on this question. I want to create a measure of parenting that is derived from two scales, one measuring warmth and the other measuring control. Each scale is created by taking the mean of its own set of binary items, coded yes/no.

One thought was to standardize the two scales (mean = 0, SD = 1) and then sum the two standardized scores to generate a measure of parenting.
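
A minimal sketch of the two steps described above (scale = mean of binary items, then sum of z-scores), assuming answers are coded 1 = yes and 0 = no. The item data and the names `warmth_items` and `control_items` are hypothetical, made up for illustration.

```python
import statistics

# Hypothetical yes/no answers (1 = yes, 0 = no), one row per respondent.
warmth_items = [[1, 1, 1, 0], [1, 1, 1, 1], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1]]
control_items = [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 0, 0], [1, 1, 1]]

def scale_scores(rows):
    """Scale score = mean of the binary items (proportion of 'yes' answers)."""
    return [statistics.mean(row) for row in rows]

def standardize(xs):
    """z-scores: subtract the mean, divide by the (population) standard deviation."""
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

warmth = scale_scores(warmth_items)
control = scale_scores(control_items)

# Composite parenting score: sum of the two standardized scales.
parenting = [w + c for w, c in zip(standardize(warmth), standardize(control))]
print([round(p, 2) for p in parenting])
```

Summing z-scores gives the two scales equal weight regardless of their original spread; that weighting is a design choice, not a requirement.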

The problem is that both measures are highly skewed: the tails are to the left and most of the points are to the right (negative skew). What implications does this have for standardizing? I realize that standardizing doesn't change the skewness; each person will still be in the same relative position.
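
The point that standardizing leaves skewness unchanged can be checked directly, since a z-score is just a linear rescaling. A minimal sketch using the simple moment estimator of skewness, g1 = m3 / m2^1.5 (many packages report a bias-adjusted version that differs slightly for small samples); the scores are made up.

```python
import statistics

def skewness(xs):
    """Moment estimator g1 = m3 / m2**1.5 (m_k = k-th central moment)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Made-up, negatively skewed scores: long tail to the left.
scores = [0.2, 0.5, 0.75, 0.8, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0]

mu, sd = statistics.mean(scores), statistics.pstdev(scores)
z = [(x - mu) / sd for x in scores]

# Same value before and after standardizing (up to floating-point error).
print(round(skewness(scores), 4), round(skewness(z), 4))
```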

If my sample size is greater than 2000, does it matter what the distributions are? Can I rely on a normal approximation?

Any thoughts would be very much appreciated!!!

#### sten

##### New Member
So, because of the negative skew, I reflected both variables and then applied both a log and an inverse transformation in SAS.
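
The reflect-then-transform steps can be sketched as follows, assuming the common textbook reflection constant k = max(x) + 1, so that every reflected value is at least 1 and both the log and the inverse are defined; the scores are made up. Note that reflection reverses the order of respondents: after the log, high original scores become low values, whereas taking the inverse flips the order back.

```python
import math

scores = [0.2, 0.5, 0.75, 0.8, 0.9, 1.0, 1.0, 1.0]   # made-up, negatively skewed

k = max(scores) + 1                     # reflection constant (assumed choice)
reflected = [k - x for x in scores]     # now positively skewed, all values >= 1

log_reflected = [math.log(r) for r in reflected]   # decreasing in the original score
inv_reflected = [1 / r for r in reflected]         # increasing in the original score

for x, lg, inv in zip(scores, log_reflected, inv_reflected):
    print(f"{x:5.2f}  log: {lg:6.3f}  inverse: {inv:6.3f}")
```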

Here are the new statistics:

- original warmth skewness: -0.7555848
- log and reflected warmth skewness: 0.5193317
- inverse and reflected warmth skewness: -0.3049108

Would I basically conclude from this that the inverse and reflected measure is the best transformation?
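
Using only the three values reported above, one quick way to rank the options is by absolute skewness (closest to 0 is most symmetric). This is a single criterion; it says nothing about the tails or about how interpretable each scale is.

```python
# Skewness values as reported above for the warmth scale.
reported = {
    "original": -0.7555848,
    "log and reflected": 0.5193317,
    "inverse and reflected": -0.3049108,
}

# Rank by distance from 0 (perfect symmetry).
for name, skew in sorted(reported.items(), key=lambda kv: abs(kv[1])):
    print(f"{name:22s} |skewness| = {abs(skew):.3f}")
```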

Also, I'm a bit concerned that all this data manipulation really changes the interpretation of my measure. If I then use this measure as the outcome variable in a logistic regression, what does this mean for how I interpret the coefficients? Would they have to be back-transformed?

Thank you very much for your input!

#### jkotlerman

##### New Member
You can also try other transformations, such as the square root or the cube root; sometimes those work better. And when no transformation works, you can try breaking your measure into ordinal categories.

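
A sketch of these alternatives, assuming NumPy is available and using made-up negatively skewed data (drawn from a Beta(5, 1.5) distribution purely for illustration). Both roots are applied after reflection, because on the original scale a concave transformation would make the negative skew worse. The last step cuts the original scores into four ordinal categories at the sample quartiles.

```python
import numpy as np

def skewness(x):
    """Moment estimator g1 = m3 / m2**1.5."""
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(0)
scores = rng.beta(5, 1.5, size=2000)     # made-up, negatively skewed scores

reflected = scores.max() + 1 - scores    # reflect so the tail points right

for name, t in {
    "original": scores,
    "sqrt(reflected)": np.sqrt(reflected),
    "cbrt(reflected)": np.cbrt(reflected),
}.items():
    print(f"{name:16s} skewness = {skewness(t):6.3f}")

# Fallback: four ordinal categories (0-3) cut at the sample quartiles.
cuts = np.quantile(scores, [0.25, 0.5, 0.75])
category = np.digitize(scores, cuts)     # 0 = lowest quarter, 3 = highest
print(np.bincount(category))
```
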
If I understand correctly, and these two measures are your outcome measures, why would you be using logistic regression? Logistic regression is not for continuous outcomes. On the other hand, if you break your measure into categories, you can use ordinal logistic regression.

You are correct that the results of a model with a transformed outcome are harder to interpret, but you can use the 'estimate' statement in SAS and then retransform the estimate back to the original units. There is a retransformation factor (known as the smearing factor) that you need to apply when retransforming. If you don't use it, you introduce bias into your answer.
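
A minimal sketch of the smearing (Duan) retransformation outside SAS, assuming an ordinary least squares fit to log(y) and NumPy; the data are simulated for illustration. In general, the smearing estimate of E[y | x] averages g⁻¹(x'b + eᵢ) over the fitted residuals eᵢ, where g is the transformation; for the log this reduces to exp(x'b) times the mean of exp(eᵢ), which is the "smearing factor". For a reflected variable, the same averaging would be applied to the full inverse of the reflect-and-transform step.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: log(y) is linear in x with normal errors (illustration only).
n = 2000
x = rng.normal(size=n)
y = np.exp(0.5 + 0.3 * x + rng.normal(scale=0.8, size=n))

# Fit OLS on the transformed outcome log(y).
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
resid = np.log(y) - X @ beta

def smeared_mean(xb, resid, inverse):
    """Duan's smearing estimate of E[y | x]: average inverse(xb + e_i) over residuals."""
    return np.mean(inverse(xb + resid))

x0 = 1.0                                   # covariate value of interest
xb0 = beta[0] + beta[1] * x0               # prediction on the log scale

naive = np.exp(xb0)                        # simple back-transform: underestimates the mean
smear_factor = np.mean(np.exp(resid))      # for the log, E[y|x] ~ exp(xb) * factor
smeared = smeared_mean(xb0, resid, np.exp)

print(f"naive = {naive:.3f}, smearing factor = {smear_factor:.3f}, smeared = {smeared:.3f}")
```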

Jenny Kotlerman
www.statisticalconsultingnetwork.com

#### sten

##### New Member
Hi Jenny, sorry, I should clarify: I was considering transforming the variables and then doing a median split, which is why I was considering logistic regression. But I've been reconsidering that because of the potential loss of power; I would prefer to keep the measures continuous. My sample size is large, around 2200, but several of my other variables are binary or categorical, and I want to be careful about breaking everything into categories with tiny sample sizes.
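
The power concern can be illustrated with a quick simulation, assuming NumPy and made-up bivariate normal data: splitting a continuous variable at its median shrinks its correlation with another variable. For normal data the shrinkage factor is about sqrt(2/π) ≈ 0.80, so the squared correlation drops to roughly 64% of its value.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 200_000, 0.4

# Made-up bivariate normal data with correlation rho.
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)

x_split = (x > np.median(x)).astype(float)    # median split: 0 = low, 1 = high

r_continuous = np.corrcoef(x, y)[0, 1]
r_split = np.corrcoef(x_split, y)[0, 1]

print(f"r (continuous)   = {r_continuous:.3f}")
print(f"r (median split) = {r_split:.3f}")
print(f"ratio = {r_split / r_continuous:.3f}  (theory: {np.sqrt(2 / np.pi):.3f})")
```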

I should also mention that the data are multilevel (a 3-level hierarchical model).

I will try the other transformations. I didn't know about the 'estimate' statement; I will check that out.

Many thanks!

#### mp83

##### TS Contributor
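Just to chip in something: for large n, under the null hypothesis of normality,

$$\text{skewness} \cdot \sqrt{\frac{n}{6}} \;\overset{\text{approx.}}{\sim}\; N(0, 1)$$

so you can construct a test of whether the skewness differs from zero.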
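
The test can be sketched in a few lines, assuming independent observations and a large sample: z = skewness · sqrt(n/6) is compared with the standard normal, and the two-sided p-value is erfc(|z| / sqrt(2)). As an illustration it plugs in the original warmth skewness reported above with n = 2200 (the approximate sample size mentioned in the thread). Clustering in a multilevel design would violate the independence assumption, and software-reported skewness is often bias-adjusted, which matters little at this n.

```python
import math

def skewness_z_test(g1, n):
    """Large-sample test of H0: skewness = 0, using SE(g1) ~ sqrt(6/n)."""
    z = g1 * math.sqrt(n / 6)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value from N(0, 1)
    return z, p

z, p = skewness_z_test(-0.7555848, 2200)   # original warmth skewness, n assumed 2200
print(f"z = {z:.2f}, p = {p:.2e}")
```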