Dear all,
I am using a variable that is extremely skewed and has little variance. This is normal in my field (criminology), but I think this one goes to such extremes that I had better not use it. I also notice that its results run contrary to the expected outcomes, even when I apply techniques that do not assume normality and that account for heteroscedasticity (negative binomial regression).
But I was assigned to include this variable in my work, so now I am looking for a rule of thumb I can cite to back up my intuitive decision to leave it out. I have had trouble finding an answer to my question on the internet.
These are relevant characteristics of my variable:
serious property crime variety
Value | Frequency | Percent
0,00  | 402       | 86,5
1,00  | 38        | 8,2
2,00  | 12        | 2,6
3,00  | 8         | 1,7
4,00  | 3         | 0,6
5,00  | 2         | 0,4
Total | 465       | 100,0
Variance 0,498
Skewness 3,855
Std. Error of Skewness 0,113
Kurtosis 16,717
Std. Error of Kurtosis 0,226
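For anyone who wants to check these figures, here is a minimal Python sketch that recomputes them from the frequency table. It assumes the output above uses the usual bias-corrected sample formulas for skewness and excess kurtosis (the kind most statistics packages report). The post does not say which formulas were used, so treat that as an assumption. The names `freq` and `describe` are made up for this illustration.

```python
from math import sqrt

# Frequency table from the post: value -> count
freq = {0: 402, 1: 38, 2: 12, 3: 8, 4: 3, 5: 2}


def describe(freq):
    """Descriptives from a frequency table (assumed bias-corrected formulas)."""
    n = sum(freq.values())
    mean = sum(x * c for x, c in freq.items()) / n

    # Sums of 2nd, 3rd and 4th powers of deviations from the mean
    d2 = sum(c * (x - mean) ** 2 for x, c in freq.items())
    d3 = sum(c * (x - mean) ** 3 for x, c in freq.items())
    d4 = sum(c * (x - mean) ** 4 for x, c in freq.items())

    var = d2 / (n - 1)  # sample variance
    sd = sqrt(var)

    # Bias-corrected sample skewness and excess kurtosis
    skew = n / ((n - 1) * (n - 2)) * d3 / sd ** 3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * d4 / var ** 2
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    # Standard errors depend only on the sample size
    se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

    return {
        "n": n,
        "mean": mean,
        "variance": var,
        "skewness": skew,
        "se_skewness": se_skew,
        "kurtosis": kurt,
        "se_kurtosis": se_kurt,
    }


stats = describe(freq)
for name, value in stats.items():
    print(f"{name:12s} {value:.3f}")
```

Under that assumption the sketch should land on the posted variance, skewness, kurtosis and standard errors to about three decimals. Note that the standard errors are a function of n = 465 alone, so they say nothing about the shape of this particular variable.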
Does anybody know more about why it would be wrong to work with this variable? Is it that the law of large numbers is violated, so that correlates of this variable are not very trustworthy?
Thanks in advance!
M
Can't answer your question, but if you desperately must use it as a dependent variable one of the Wild bootstrap estimators has a skewness corrector (don't know if it's good enough for this).
Mathis (08-06-2012)
Thanks, much appreciated!! But what I really want to do is not use it as a dependent variable at all. I just want to back this decision up for my supervisor (who has used the variable before, so probably does not see any problems).
Anyone else have any ideas?
For anyone stumbling upon this thread wondering about the same thing: I eventually found some standards for acceptable skewness and kurtosis for psychometric purposes.
Skewness should be between -2 and +2, and kurtosis between -5 and +5.
See: http://jolt.merlot.org/vol4no4/sher_1208.htm
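As a quick illustration, the thresholds quoted above can be applied to the figures from the first post. This is only a sketch: the limits are the rule of thumb from this thread, not a formal test, and the helper names `parse_decimal_comma` and `within_rule_of_thumb` are hypothetical.

```python
# Rule-of-thumb limits quoted in this thread (not a formal test)
SKEW_LIMIT = 2.0
KURT_LIMIT = 5.0


def parse_decimal_comma(text):
    """Convert a decimal-comma string such as '3,855' to a float."""
    return float(text.replace(",", "."))


def within_rule_of_thumb(skewness, kurtosis,
                         skew_limit=SKEW_LIMIT, kurt_limit=KURT_LIMIT):
    """True if both statistics fall inside the quoted limits."""
    return abs(skewness) <= skew_limit and abs(kurtosis) <= kurt_limit


# Values as reported in the original post
skewness = parse_decimal_comma("3,855")
kurtosis = parse_decimal_comma("16,717")

acceptable = within_rule_of_thumb(skewness, kurtosis)
print("within rule of thumb:", acceptable)
```

With the posted values both statistics fall outside the limits, so the variable fails this particular rule of thumb.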
I do not understand why you are so against using it. Is it a significant predictor with a paradoxical effect, so that you are looking for a reason to exclude it? If this is just an example or homework and the outcomes do not agree with anecdotal experience, that is probably fine, since it is not a real dataset. Perhaps the variable is labelled wrongly, but if this is just a pedagogical example I would include it and explain its relationship as best you can.
Mathis (08-07-2012)
No, it is a real dataset and I am using it for my master's thesis.
It is not used as a predictor but as a dependent variable. The predictors behave very differently in regressions on it than in regressions on the other crime type scales, which do have acceptable skewness and kurtosis. I believe conclusions drawn from regressions with this variable are not correct.
Could I include it in my work and always add a note that the effects found should be interpreted with caution because of its characteristics? I'll think about it.

The rule for skew I have seen is that anything beyond ±3 is a problem. There are a variety of ways to deal with skew, one of which is a transformation (commonly taking the log of the DV). Another might be to use a non-parametric test. A transformation might also address the lack of variation (although I am not sure), which is a serious issue: a lack of variation will attenuate any results you get.
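To get a feel for what a log transformation would do here, the sketch below recomputes the skewness from the posted frequency table before and after transforming. Two assumptions: it uses log(1 + x) rather than log(x), because the variable contains zeros and the plain log is undefined there, and it uses the bias-corrected sample skewness. The function name `sample_skewness` is made up for this example.

```python
from math import log1p, sqrt

# Frequency table from the first post: value -> count
freq = {0: 402, 1: 38, 2: 12, 3: 8, 4: 3, 5: 2}


def sample_skewness(freq):
    """Bias-corrected sample skewness from a value -> count table."""
    n = sum(freq.values())
    mean = sum(x * c for x, c in freq.items()) / n
    d2 = sum(c * (x - mean) ** 2 for x, c in freq.items())
    d3 = sum(c * (x - mean) ** 3 for x, c in freq.items())
    sd = sqrt(d2 / (n - 1))
    return n / ((n - 1) * (n - 2)) * d3 / sd ** 3


# Apply log(1 + x) to each distinct value; the counts stay the same
log_freq = {log1p(x): c for x, c in freq.items()}

skew_raw = sample_skewness(freq)
skew_log = sample_skewness(log_freq)

print(f"skewness, raw scale:   {skew_raw:.3f}")
print(f"skewness, log1p scale: {skew_log:.3f}")
```

The transformation pulls in the right tail, so the skewness drops, but the zeros stay zeros. With about 86% of cases at zero, the transformed variable is still strongly skewed, which suggests that logging alone is unlikely to fix this variable.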
But my question is, why consider a variable with little variation? This suggests that no other factor has significant impact on it, because it does not vary much regardless.
"Facts are stubborn things, but statistics are more pliable." Mark Twain