log transformation of binary variables

#1
Hi everyone,

I am trying to do a factor analysis of a group of drinking behavior variables, and some of them are highly positively skewed. I want to do a log transformation of these variables, but they are binary. I can't find anything that deals with whether this is a problem or not - but when I do the transformation it doesn't look right. Is there any reason it would be different with a binary variable? Does anyone have any suggestions about what transformation is best (ln or log10, or something else)?

thanks for the help in advance!
 

Dason

Ambassador to the humans
#2
... what would be the point of transforming a binary variable? And if by binary you mean 0 or 1 then you really can't log transform the data since log(0) is undefined (on the real number line).
 
#3
Well, from my understanding, one can do ln(variable+1) to eliminate the problems of having zeros in the dataset. But I don't know if there are other issues with transforming binary variables. They are still non-normally distributed, which is what I'm trying to fix.
 

Dason

Ambassador to the humans
#4
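The thing is, though, that since the data are binary, all you would essentially do is change the coding from 0/1 to 0/log(2). That is just a rescaling of the same two values, so the skew and the correlations with other variables stay exactly the same, which doesn't do anything to help. Transforming binary data doesn't really make sense to me here.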
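To make the 0/log(2) point concrete, here is a minimal Python sketch. The two 0/1 items are made up purely for illustration (they are not the original poster's data), and the helper names `pearson` and `skewness` are just local functions written for this example.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def skewness(x):
    """Population skewness (third standardized moment)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((a - m) ** 2 for a in x) / n
    m3 = sum((a - m) ** 3 for a in x) / n
    return m3 / m2 ** 1.5

# Hypothetical 0/1 items, invented for this illustration only.
item_a = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
item_b = [0, 0, 1, 1, 0, 0, 1, 0, 0, 0]

# The ln(x + 1) transformation suggested in post #3.
log_a = [math.log(v + 1) for v in item_a]
log_b = [math.log(v + 1) for v in item_b]

# Still only two distinct values: 0 and log(2).
print(sorted(set(log_a)))

# Skewness and correlation match before and after (up to rounding),
# because the transform only rescales the two values.
print(skewness(item_a), skewness(log_a))
print(pearson(item_a, item_b), pearson(log_a, log_b))
```

The same would hold for log10(x + 1) or any other transformation of a 0/1 variable: two distinct values go in, two distinct values come out.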
 

noetsi

No cake for spunky
#5
I know there is a lot of disagreement on this, but many argue you can't do SEM including EFA with binary data.

I have never read anything that says that you can't transform binary data with logs. I don't think it will make the data normal if that is what you want.
 

noetsi

No cake for spunky
#7
After I posted my previous comment I went to the Mplus site (they have one of the better software packages for SEM). They noted that:

"Mplus does exploratory and confirmatory factor analysis of dichotomous items."
So apparently there is a way to do EFA with dichotomous data - although I have never heard about it. Nor my instructor in SEM who told me you should not do it (and is an expert in SEM). :)

http://www.statmodel.com/discussion/messages/8/50.html?1302294648
 

spunky

Can't make spagetti
#8
noetsi said: "So apparently there is a way to do EFA with dichotomous data - although I have never heard about it. Nor my instructor in SEM who told me you should not do it (and is an expert in SEM). :)"

that's because Muthen is a student of Joreskog... and Joreskog (in his brilliance) figured out this problem very well through the use of... yes...

TETRACHORIC CORRELATION MATRICES.

people, if anyone out there reads this and is doing factor analysis on either binary data (0/1) or polytomous data (Likert-type scales): you factor-analyze the tetrachoric (for binary) or polychoric (for Likert-type scales) correlation matrix, not the ordinary Pearson one... even Muthen cites his academic father Joreskog for the maximum likelihood estimation Mplus makes of the tetrachoric correlation matrix... that's how maximum-likelihood factor analysis was invented in the first place...
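For anyone wondering what a tetrachoric correlation actually computes, here is a rough Python sketch for a single pair of 0/1 items. It assumes each item is a cut of an underlying standard normal variable, takes the two thresholds from the marginal proportions, and then searches for the latent correlation that reproduces the observed "both = 1" proportion. This is only an illustration under those assumptions, not what Mplus or any other package does internally; the function names and the 2x2 counts are made up, and it will fail if a margin is all 0s or all 1s.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    q = (h * h - 2.0 * r * h * k + k * k) / (2.0 * (1.0 - r * r))
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - r * r))

def bvn_upper(h, k, rho, steps=200):
    """P(X > h, Y > k) under latent correlation rho.

    Uses P(rho) = P(0) + integral from 0 to rho of the density in r,
    evaluated with Simpson's rule (steps must be even).
    """
    base = (1.0 - STD.cdf(h)) * (1.0 - STD.cdf(k))
    if rho == 0.0:
        return base
    width = rho / steps
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(h, k, i * width)
    return base + total * width / 3.0

def tetrachoric(n11, n10, n01, n00, tol=1e-8):
    """Tetrachoric correlation from 2x2 counts (n11 = both items are 1)."""
    n = n11 + n10 + n01 + n00
    h = STD.inv_cdf(1.0 - (n11 + n10) / n)  # threshold for item 1
    k = STD.inv_cdf(1.0 - (n11 + n01) / n)  # threshold for item 2
    target = n11 / n
    lo, hi = -0.999, 0.999
    # bvn_upper increases with rho, so plain bisection is enough.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if bvn_upper(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def phi_coefficient(n11, n10, n01, n00):
    """Ordinary Pearson correlation of two 0/1 items, from the counts."""
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den

# Hypothetical counts, invented for illustration only.
print(phi_coefficient(40, 10, 10, 40))
print(tetrachoric(40, 10, 10, 40))
```

With these made-up counts the Pearson (phi) value is 0.6, while the tetrachoric estimate comes out noticeably higher (for 50/50 margins the closed form is sin(0.3 * pi)). In practice you would compute this for every pair of items with an established routine and factor-analyze the resulting matrix.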


although i do have to go back to jpkelly's original question... what exactly are you trying to do? how were these drinking behaviours coded? and why factor analysis?