#### yoma819

##### New Member
Can anyone suggest a guide or link with more information on transforming non-normally distributed data?
Many thanks,
yoma

#### George_Y

##### New Member
Here is a start: http://pareonline.net/getvn.asp?v=8&n=6. There are some references at the end of that too. Tabachnick and Fidell is always a good source. There is plenty on this on the net, though, so give it a good Google search and you should find your answers.

#### yoma819

##### New Member
Depressingly, I have 4 sets of data, all with different distributions:
first: lognormal
second: Burr (4P)
third: log-logistic (3P)
fourth: gamma (3P)

Has anyone got any advice for these specific distributions?
Cheers,
--yoma

##### Ninja say what!?!
Why are you interested in transforming the data? It's possible that you could use the data as-is.

#### yoma819

##### New Member
My data is definitely not normally distributed, and for a GLM it does need to be.
The only problem is I don't seem to be having much luck.
To try to normalise the gamma-distributed data I have tried:
square
square root
log

but to no avail.
This is the first time I have done a data transformation and would appreciate any help.
Cheers,
Yoma

#### Dason

I agree with Link. Whatever you're trying to do can probably be done with the knowledge you have. You seem to know quite a bit about the data. Are you positive that those are the distributions or are you just guessing?

#### yoma819

##### New Member
100% positive.
I have run the tests.
Basically, I ran it through Minitab, which told me that my 4 sets of data were not normally distributed; then I ran it through EasyFit, which told me which distributions fitted my data.
Have a look:
http://s884.photobucket.com/albums/ac50/yoma819/
Cheers,
yoma
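The two-step check described above (a normality test, then ranking candidate distributions by how well they fit) can be sketched in base R. This is only a minimal sketch on simulated data, not a reproduction of what Minitab or EasyFit compute internally: the vector `x` is a hypothetical stand-in for one of the four data sets, and only two candidates (normal and lognormal) are compared, using their closed-form maximum-likelihood estimates and AIC.

```r
set.seed(1)
# hypothetical stand-in for one data set (simulated, strictly positive)
x <- rlnorm(200, meanlog = 0, sdlog = 1)

# step 1: Shapiro-Wilk test (a small p-value is evidence against normality)
sw <- shapiro.test(x)
print(sw$p.value)

# step 2a: normal fit; the MLE of the sd uses n, not n - 1, as the denominator
mu_hat <- mean(x)
sd_hat <- sqrt(mean((x - mu_hat)^2))
ll_norm <- sum(dnorm(x, mean = mu_hat, sd = sd_hat, log = TRUE))

# step 2b: lognormal fit; the MLEs are the mean and (n-denominator) sd of log(x)
ml_hat <- mean(log(x))
sl_hat <- sqrt(mean((log(x) - ml_hat)^2))
ll_lnorm <- sum(dlnorm(x, meanlog = ml_hat, sdlog = sl_hat, log = TRUE))

# both candidates have 2 parameters, so AIC = 2 * 2 - 2 * logLik
aic <- c(normal = 4 - 2 * ll_norm, lognormal = 4 - 2 * ll_lnorm)
print(aic)  # lower AIC = better relative fit among these two candidates only
```

Whichever candidate gets the lowest AIC is only the best of those tried; it is not proof that the data were generated by that distribution.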

#### Dason

OK, but you still haven't told us what you're trying to do with the data. Also, just because you ran it through a program and it told you what it thinks the distribution is doesn't make you 100% positive that that is the true distribution. It's giving you its best guess. I was asking if you knew that it came from some theoretical distribution (like you generated the data yourself or something). I mean, I can generate some data from the gamma distribution and you could be pretty convinced that it came from a normal distribution if I didn't tell you what it really was...

So what are you trying to do with this data? Why do you feel it needs to be normally distributed?
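Dason's point that gamma data can pass for normal is easy to check by simulation. A minimal sketch, assuming a gamma distribution with a fairly large shape parameter (shape = 50 and n = 50 are arbitrary illustrative choices): the skew is then mild, and a normality test on a sample of this size will often fail to reject even though the true distribution is gamma.

```r
set.seed(42)
# simulated data whose true distribution is gamma (shape 50, rate 1)
x <- rgamma(50, shape = 50, rate = 1)

# theoretical skewness of a gamma is 2 / sqrt(shape): about 0.28 here, i.e. mild
2 / sqrt(50)

# Shapiro-Wilk test: a large p-value means "no evidence against normality",
# not "the data are normal"
sw <- shapiro.test(x)
sw

# the normal Q-Q plot will usually look close to a straight line as well
qqnorm(x)
qqline(x)
```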

#### yoma819

##### New Member
I am trying to fit it to a general linear model to see if there is a significant difference in weight between the wild type and 1a, 1b and 1c across the levels of two factors, one with 7 levels and the other with 2.
This is basically what I was trying to find out how to do in this post: http://talkstats.com/showthread.php?t=13096
The data are not generated but come from an experiment (the same experiment as in the post linked above).
I need the data to be normally distributed, as this is an assumption of a GLM!

#### Dason

Actually, we don't care whether the data are normally distributed, only whether the residuals are.

#### yoma819

##### New Member
How would I go about doing that?
Why would I only be interested in the normality of my residuals?
Cheers,
yoma

##### Ninja say what!?!
I remember a classmate of mine worrying about this once. I sat and listened while he explained it to the professor. From my memory of the conversation, you don't need to worry about the data being normally distributed for GLMs. What you need to verify is that the residuals are normally distributed. Transforming the data isn't necessary just because the raw data aren't normal.
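One way to do that residual check in R (the thread itself uses Minitab, so this is only an illustrative sketch): fit the model, then examine its residuals with a normal Q-Q plot and a Shapiro-Wilk test. The data frame `d`, its columns `weight`, `A` (7 levels) and `B` (2 levels), the 5 replicates per cell and the simulated values are all hypothetical stand-ins for the experiment described earlier.

```r
set.seed(3)
# hypothetical layout: factor A with 7 levels, factor B with 2 levels, 5 replicates per cell
d <- expand.grid(rep = 1:5,
                 A = factor(paste0("a", 1:7)),
                 B = factor(c("b1", "b2")))
# hypothetical response: a different mean in each A x B cell plus normal noise
cell_mean <- 10 + 3 * as.integer(d$A) + 8 * (d$B == "b2")
d$weight <- rnorm(nrow(d), mean = cell_mean, sd = 2)

# two-way model with interaction
fit <- lm(weight ~ A * B, data = d)
anova(fit)

# the normality assumption concerns these residuals, not d$weight itself
r <- residuals(fit)
qqnorm(r)
qqline(r)
shapiro.test(r)
```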

##### Ninja say what!?!
> Actually we don't care if the data is normally distributed. Just if the residuals are normally distributed.
LOL. Danggit Dason! Got there before I did!

#### yoma819

##### New Member
OK.
I have looked it over again, and using a regression plot it does not look like they are normally distributed:
http://s884.photobucket.com/albums/ac50/yoma819/?action=view&current=minitabresidual.jpg
1a, 1b and 1c are plotted against wt.
Many thanks,
yoma

#### Dason

> why would i be only interested in the normality of my residuals?
The only reason we even care is because one of the assumptions we make when deriving the theory is that the errors are normally distributed. We don't care how the data itself is distributed because we say that once we adjust for our predictors the errors/residuals will be normally distributed.

I guess one way to see why this is what we care about is to consider a case where we are comparing two groups.
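```r
# data from the first group (normal, mean 5, sd 1)
y1 <- rnorm(100, mean = 5)
# data from the second group (normal, mean 100, sd 1)
y2 <- rnorm(100, mean = 100)
# data overall: both groups pooled into one vector
y <- c(y1, y2)
hist(y)   # clearly not normal
hist(y1)  # once we adjust for group, though, each looks normal
hist(y2)
```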
Clearly the overall data isn't normally distributed... but who cares. Once we look at each group individually they look normal so we're all right. This is why we only care if the residuals are normally distributed.
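The same example can be taken one step further by actually fitting the group effect and testing what is left over. This sketch reuses the simulated `y1` and `y2` from above; the `group` factor, the fixed seed and the use of `lm()` plus `shapiro.test()` are additions for illustration.

```r
set.seed(10)
y1 <- rnorm(100, mean = 5)
y2 <- rnorm(100, mean = 100)
y <- c(y1, y2)
# label which group each observation came from
group <- factor(rep(c("g1", "g2"), each = 100))

# pooled data: strongly bimodal, so a normality test rejects
p_raw <- shapiro.test(y)$p.value

# residuals after adjusting for group: each value minus its group mean
fit <- lm(y ~ group)
p_res <- shapiro.test(residuals(fit))$p.value

c(raw = p_raw, residuals = p_res)
```

The raw-data test rejects decisively, while the residuals are simply two normal samples centred on zero, which is all the model assumes.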

##### Ninja say what!?!
I don't know what the "%" on the y-axis is, but it looks like you plotted it wrong. You're supposed to plot the fitted values against the residuals.

Edit: Looking at the graph again, I think I see what kind of graph it is. If it's what I think it is, then yes, you do have a problem.
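For reference, a residuals-versus-fitted plot like the one described here can be drawn in base R as below. The data are hypothetical: four groups named after the genotypes in the thread, with right-skewed noise whose spread grows with the group mean, chosen so that the plot shows the funnel shape that signals a problem.

```r
set.seed(7)
# hypothetical example: 4 groups whose spread grows with the mean
g <- factor(rep(c("wt", "1a", "1b", "1c"), each = 25))
mu <- rep(c(10, 20, 40, 80), each = 25)
# gamma noise with a fixed shape: mean = mu and sd = mu / sqrt(5), proportional to the mean
w <- rgamma(length(mu), shape = 5, rate = 5 / mu)
fit <- lm(w ~ g)

# fitted values on the x-axis, residuals on the y-axis
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # residuals should scatter evenly around this line
```

`plot(fit, which = 1)` draws the same diagnostic using R's built-in plot method for `lm` objects.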


#### yoma819

##### New Member
OK, got it:
http://i884.photobucket.com/albums/ac50/yoma819/fittedresiduals.jpg
So we have ascertained that I do in fact need to transform my data.
Any advice on what kind of transformation?
Thanks again,
--yoma
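Whatever transformation is tried, it should be judged by the residuals of the refitted model rather than by the raw data. A minimal sketch of that comparison, using the three transformations already tried in the thread on hypothetical right-skewed data; using the Shapiro-Wilk W statistic as the yardstick is an illustrative choice, and the Q-Q and residuals-vs-fitted plots of the winner should still be checked.

```r
set.seed(11)
# hypothetical data: 4 groups, right-skewed noise whose spread grows with the mean
g <- factor(rep(c("wt", "1a", "1b", "1c"), each = 25))
mu <- rep(c(10, 20, 40, 80), each = 25)
w <- rgamma(length(mu), shape = 5, rate = 5 / mu)

# candidate transformations of the response (the ones tried in the thread)
transforms <- list(
  none   = function(v) v,
  square = function(v) v^2,
  sqrt   = function(v) sqrt(v),
  log    = function(v) log(v)
)

# refit the same model on each transformed response and test its residuals;
# a W closer to 1 means residuals closer to normal
w_stat <- sapply(transforms, function(f) {
  fit <- lm(f(w) ~ g)
  unname(shapiro.test(residuals(fit))$statistic)
})
round(w_stat, 3)
```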

#### yoma819

##### New Member
OK, I understand why I am looking at the residuals now; thanks for clearing that confusion up.
I take it in your R code you are generating random data and putting it into y1 with `y1 <- rnorm(100,5)`, and then normally distributed data into y2 with `y2 <- rnorm(100,100)`?
But what does `y <- c(y1,y2)` do?
Sorry, just trying to understand your R code!
Cheers,
Yoma
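To answer the question about the R code: `y1` and `y2` are both random draws from normal distributions (they differ only in their means), and `c()` joins them end to end into one vector. A short annotated version:

```r
# rnorm(n, mean, sd): n random draws from a normal distribution;
# sd defaults to 1, so y1 and y2 are both normally distributed
y1 <- rnorm(100, mean = 5)    # centred on 5
y2 <- rnorm(100, mean = 100)  # centred on 100

# c() combines (concatenates) vectors end to end
y <- c(y1, y2)
length(y)                     # 200: all of y1 followed by all of y2
identical(y[1:100], y1)       # TRUE
identical(y[101:200], y2)     # TRUE

c(1, 2, 3, c(4, 5))           # a smaller example: 1 2 3 4 5
```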

#### yoma819

##### New Member
Does anyone know of any software that will transform data automatically (like quickfit does for distributions)?
I know quickfit is not 100%, but it gives a great direction to go in and then test further.
Many thanks,
yoma
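One standard way to choose a transformation automatically is the Box-Cox family of power transformations y^λ (with λ = 0 meaning log), which picks the λ under which a normal, constant-variance linear model fits best by maximizing a profile likelihood. R's recommended MASS package provides `boxcox()` for `lm` fits; the sketch below computes the same kind of profile log-likelihood by hand in base R so the steps are visible. The data are hypothetical, the response must be strictly positive, and the chosen λ should still be checked with the residual plots.

```r
set.seed(5)
# hypothetical data: 4 groups with right-skewed, strictly positive responses
g <- factor(rep(c("wt", "1a", "1b", "1c"), each = 25))
mu <- rep(c(10, 20, 40, 80), each = 25)
w <- rgamma(length(mu), shape = 5, rate = 5 / mu)

# Box-Cox transform (requires y > 0); lambda = 0 is the log transform
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# profile log-likelihood of lambda for the model y ~ g, up to an additive constant:
# -n/2 * log(RSS / n) + (lambda - 1) * sum(log(y))
bc_loglik <- function(lambda, y, g) {
  n <- length(y)
  rss <- sum(residuals(lm(bc(y, lambda) ~ g))^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

lambdas <- seq(-2, 2, by = 0.05)
ll <- sapply(lambdas, bc_loglik, y = w, g = g)
best <- lambdas[which.max(ll)]
best  # near 0 suggests log, near 0.5 square root, near 1 no transformation
```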