data transformation links please

#4
Thanks for the links!
Depressingly, I have 4 sets of data, all with different distributions:
first one: lognormal
second one: Burr (4p)
third one: log-logistic (3p)
fourth one: gamma (3p)

Has anyone got any advice for these specific distributions?
cheers
--yoma
 
#6
My data is definitely not normally distributed, and for a glm it does need to be.
The only problem is I don't seem to be having much luck.
To try and normalise the gamma-distributed data I have tried:
square
square root
log

but to no avail.
This is the first time I have done data transformation and I would appreciate any help.
cheers
Yoma
 

Dason

Ambassador to the humans
#7
I agree with Link. Whatever you're trying to do can probably be done with the knowledge you have. You seem to know quite a bit about the data. Are you positive that those are the distributions or are you just guessing?
 

Dason

Ambassador to the humans
#9
Ok, but you still haven't told us what you're trying to do with the data. Also, just because you ran it through a program and it told you what it thinks the distribution is doesn't make you 100% positive that that is the true distribution. It's giving you its best guess. I was asking if you knew that it came from some theoretical distribution (like you generated the data yourself or something). I mean, I can generate some data from the gamma distribution and you could be pretty convinced that it came from a normal distribution if I didn't tell you what it really was...

So what are you trying to do with this data? Why do you feel it needs to be normally distributed?
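
To make the point about gamma data passing for normal concrete, here is a minimal sketch. The shape, rate, sample size and seed are assumptions picked for illustration (a gamma with a large shape parameter is nearly symmetric); nothing here comes from the actual data in this thread.

```r
# Assumed illustration: gamma data with a large shape parameter
set.seed(1)                            # arbitrary seed so the run is repeatable
x <- rgamma(50, shape = 50, rate = 1)  # truly gamma, not normal

hist(x)                                # tends to look roughly bell-shaped
qqnorm(x); qqline(x)                   # points tend to sit close to the line

sw <- shapiro.test(x)                  # one formal normality check
print(sw$p.value)                      # a large p-value means normality is not rejected
```

A small sample like this often cannot tell the two distributions apart, which is why a fitting program's answer is only a best guess.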
 
#10
I am trying to fit a general linear model to see if there is a significant difference between wild-type weight and 1a, 1b and 1c, across the levels of two factors, one with 7 levels and another with 2.
Basically it is what I was trying to find out how to do in this post:
Code:
http://talkstats.com/showthread.php?t=13096
The data is not generated but comes from an experiment (the same experiment as in the post linked to above).
I need the data to be normally distributed as this is an assumption of a glm!
 
#12
How would I go about doing that?
I have access to Minitab, R and EasyFit!
Why would I be interested only in the normality of my residuals?
cheers
yoma
 

Link

Ninja say what!?!
#13
I remember a classmate of mine worrying about this once. I sat and listened while he explained it to the professor. From my memory of the conversation, you don't need to worry about the data being normally distributed for glms. What you need to verify is that the residuals are normally distributed. Transforming the data just to make the raw values look normal isn't necessary.
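
For checking residuals in R, something along these lines should work. This is only a sketch on simulated data: the data frame `d`, the factor names `A` and `B`, and the response `weight` are made up to mimic a design with a 7-level and a 2-level factor, and would need replacing with the real columns.

```r
# Simulated stand-in for the experiment (all names and numbers are hypothetical)
set.seed(42)
d <- expand.grid(A = factor(1:7), B = factor(1:2), rep = 1:5)   # 7 x 2 x 5 = 70 rows
d$weight <- 20 + as.numeric(d$A) + 3 * (d$B == "2") + rnorm(nrow(d))

# Fit the linear model with both factors and their interaction
fit <- lm(weight ~ A * B, data = d)

# Normality is checked on the residuals, not on d$weight itself
res <- residuals(fit)
qqnorm(res); qqline(res)   # points near the line suggest roughly normal residuals
shapiro.test(res)          # optional formal test
```

The QQ plot is usually more informative than the test on its own, since the test gets very sensitive with large samples.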
 
#15
ok
I have looked again, and using a regression plot it does not look like the residuals are normally distributed.

HTML:
http://s884.photobucket.com/albums/ac50/yoma819/?action=view&current=minitabresidual.jpg
1a, 1b and 1c are plotted against wt
many thanks
yoma
 

Dason

Ambassador to the humans
#16
why would i be only interested in the normality of my residuals?
The only reason we even care is because one of the assumptions we make when deriving the theory is that the errors are normally distributed. We don't care how the data itself is distributed because we say that once we adjust for our predictors the errors/residuals will be normally distributed.

I guess one way to see why this is what we care about is to consider the case where we are comparing two groups.
Code:
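# data from the first group (mean 5, sd 1)
y1 <- rnorm(100, mean = 5)
# data from the second group (mean 100, sd 1)
y2 <- rnorm(100, mean = 100)
# overall data: both groups combined into one vector
y <- c(y1, y2)
hist(y)  # clearly not normal
hist(y1) # once we adjust for group, each one looks normal
hist(y2)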
Clearly the overall data isn't normally distributed... but who cares. Once we look at each group individually they look normal so we're all right. This is why we only care if the residuals are normally distributed.
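
The same two-group idea can be pushed one step further by actually fitting the model and looking at its residuals. This is a sketch under assumptions: the seed and the `group` factor are additions for illustration and are not part of the code above.

```r
set.seed(123)                              # arbitrary seed, for repeatability only
y1 <- rnorm(100, mean = 5)
y2 <- rnorm(100, mean = 100)
y <- c(y1, y2)

# Hypothetical grouping factor: first 100 values are g1, the rest g2
group <- factor(rep(c("g1", "g2"), each = 100))

fit <- lm(y ~ group)                       # adjusts for group by fitting each group's mean
res <- residuals(fit)                      # each value minus its own group mean

hist(y)                                    # two separate humps
hist(res)                                  # a single hump centred on zero
```

The residuals are just each observation with its group mean subtracted, so the gap between the groups disappears and only the within-group scatter is left.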
 

Link

Ninja say what!?!
#17
I don't know what the "%" is on the y-axis, but it looks like you plotted it wrong. You're supposed to plot the fitted values against the residuals.

Edit: Looking at the graph again, I think I see what kind of graph it is. If it's what I think it is, then yes, you do have a problem.
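
For reference, a fitted-versus-residuals plot can be sketched in R like this. It uses the built-in `ToothGrowth` data purely as a stand-in two-factor example (an assumption, since the thread's data is not available).

```r
# Stand-in example: built-in ToothGrowth data, two factors (supp and dose)
fit <- lm(len ~ supp * factor(dose), data = ToothGrowth)

fv  <- fitted(fit)      # fitted values (the cell means for this model)
res <- residuals(fit)   # observed minus fitted

plot(fv, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # reference line at zero
```

What you hope to see is an even band around the zero line. A funnel shape or a curve is a hint that something like a transformation may be needed.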
 
#18
ok got it:
Code:
http://i884.photobucket.com/albums/ac50/yoma819/fittedresiduals.jpg
So we have ascertained that I do in fact need to transform my data.
Any advice on what kind of transformation?
thanks again
--yoma
 
#19
Ok, I understand why I am looking at the residuals now, thanks for clearing that confusion up.
I take it in your R code you are generating random data and putting it into y1?
Code:
y1 <- rnorm(100,5)
and then normally distributed data into y2

Code:
y2 <- rnorm(100,100)
but what does:
Code:
y <- c(y1,y2)
do?
sorry just trying to understand your R code!
cheers
Yoma
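
On the `c()` question: in R, `c()` combines its arguments into a single vector, in the order given. A tiny sketch with made-up numbers (not the simulated values from the code above):

```r
y1 <- c(4.8, 5.1, 5.3)   # made-up values standing in for the first group
y2 <- c(99.7, 100.2)     # made-up values standing in for the second group

y <- c(y1, y2)           # y1's values followed by y2's values, in one vector
y
length(y)                # 5: three values from y1 plus two from y2
```

So `y <- c(y1,y2)` simply stacks the two groups into one long vector, which is what gets passed to `hist(y)`.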
 
#20
Does anyone know of any software that will transform data automatically (like quickfit does for distributions)?
I know quickfit is not 100% reliable, but it gives a great direction to go in and then test further.
many thanks
yoma
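
One option in R that comes close to an automatic suggestion is the Box-Cox procedure in the MASS package, which looks for a power transformation of the response that best suits a fitted linear model. The sketch below runs on simulated gamma data; the data frame `d`, the factor `g` and all the numbers are assumptions for illustration, and the response has to be strictly positive for this to work.

```r
library(MASS)   # provides boxcox(); MASS ships with standard R installations

# Simulated right-skewed response across 7 hypothetical groups
set.seed(7)
d <- data.frame(g = factor(rep(1:7, each = 10)))
d$y <- rgamma(nrow(d), shape = 2, rate = 2 / (5 + as.numeric(d$g)))

fit <- lm(y ~ g, data = d)

# Profile log-likelihood over a grid of lambda values (also draws the plot)
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05))
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Rough reading: lambda near 0 suggests log(y), near 0.5 suggests sqrt(y),
# near 1 suggests leaving y untransformed
```

Like a distribution fitter, this only gives a direction. After transforming, the residuals of the refitted model still need checking.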