Hi everyone, I’ve been running some statistical analyses in R for a manuscript I’m hoping to get published in a biological journal. Unfortunately, the tests I ended up having to run are at the limit of my understanding, so I’d like to walk through what I did with my data and get opinions on whether this was a valid approach. So let’s start:

I needed to test the effect of a combination of categorical and continuous independent variables on a continuous dependent variable, with a random factor included to control for pseudoreplication.

My untransformed data for the response variable look like this:

Originally, I was going to run a GLMM (generalized linear mixed model), but no matter what families, links, and transformations I tried, I couldn’t meet the assumption of homoscedasticity (the variance always increased). This was the closest I got:

It was suggested to me that I might have more luck with a GEE (generalized estimating equation) instead, so I tried that. First, I fit a GLM (generalized linear model) to a transformed response variable: I took the natural log of the response variable, added a constant to bring the lowest value up to 1, and then took the natural log again (i.e. an ln-ln transformation). I used a gaussian family with an identity link. (The best combination was actually a quasipoisson family with a log link, but as far as I understand there is no quasipoisson family usable for GEEs in R, so I stayed with gaussian/identity.) This made the residuals homoscedastic and approximately normal, as shown by the GLM plot output:
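To make the transformation concrete, here is a rough sketch of the steps in R. The data are simulated and every name (`dat`, `resp`, `group`, `x`, `lnln_resp`) is made up for illustration, so this shows the idea rather than my exact script.

```r
# Simulated stand-in data; every name below is hypothetical.
set.seed(1)
n <- 120
dat <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = n / 3)),
  x     = runif(n, 0, 10)
)
# Positive, right-skewed response with some values below 1,
# so that log(resp) contains negative numbers.
dat$resp <- rlnorm(n, meanlog = 0.2 * dat$x - 1, sdlog = 1)

# Step 1: natural log of the response.
ln_resp <- log(dat$resp)

# Step 2: add a constant so the smallest logged value becomes 1.
shift <- 1 - min(ln_resp)

# Step 3: natural log again (the "ln-ln" transformation).
dat$lnln_resp <- log(ln_resp + shift)

# Gaussian GLM with identity link on the transformed response.
fit_glm <- glm(lnln_resp ~ group + x,
               family = gaussian(link = "identity"),
               data = dat)

# Diagnostic plots (residuals vs fitted, Q-Q, scale-location, leverage):
# par(mfrow = c(2, 2)); plot(fit_glm)
summary(fit_glm)$coefficients
```

Note that `shift` depends on the smallest value in the sample, so it has to be saved if anything is ever back-transformed.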

After fitting the GLM, I fit a GEE to the ln-ln transformed data, using the same family and link function as in the GLM. I used an exchangeable correlation structure because it gave the lowest QIC, QICu and CIC, and the highest quasi-likelihood.
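For reference, the GEE step looks roughly like the sketch below. It assumes the `geepack` package (its `geeglm()` and `QIC()` functions), and again uses simulated data with made-up names (`subject` stands in for my random factor, `resp` for the already-transformed response).

```r
library(geepack)

# Simulated clustered data; all names are hypothetical.
set.seed(2)
n_subj <- 24
n_rep  <- 5
dat <- data.frame(
  subject = rep(seq_len(n_subj), each = n_rep),
  group   = factor(rep(c("a", "b", "c"), each = n_subj * n_rep / 3)),
  x       = runif(n_subj * n_rep, 0, 10)
)
# A shared per-subject offset induces within-subject correlation.
subj_eff <- rnorm(n_subj, sd = 0.3)
dat$resp <- 1 + 0.1 * dat$x + subj_eff[dat$subject] +
  rnorm(nrow(dat), sd = 0.3)

# geeglm() expects the rows of each cluster to be contiguous.
dat <- dat[order(dat$subject), ]

# Fit the same mean model under three working correlation structures.
fit_one <- function(cs) {
  geeglm(resp ~ group + x, id = subject, data = dat,
         family = gaussian(link = "identity"), corstr = cs)
}
fits <- list(
  exchangeable = fit_one("exchangeable"),
  independence = fit_one("independence"),
  ar1          = fit_one("ar1")
)

# One row of information criteria per working correlation structure.
qic_table <- do.call(rbind, lapply(fits, QIC))
qic_table

summary(fits$exchangeable)
```

The `id` argument plays the role of the random factor here: it only tells the GEE which observations belong to the same cluster.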

Now, other than wondering if this process seems valid, I have three questions:

1. Someone I’ve been talking to said that although they thought my approach was statistically meaningful, it wasn’t scientifically meaningful. They said that transforming the response variable makes interpretation difficult and should be avoided, and that it would be much more meaningful to fit an untransformed quasipoisson-type model that violates assumptions than to transform the data to meet them. Is this true? Would a journal find that acceptable?

2. Since my response variable is continuous, I can’t use the poisson family even though it fits my data well. There also doesn’t seem to be any way to use a quasipoisson family in the GEE, though I can use link = "log". Is it even possible to fit a quasipoisson-type model in a GEE?

3. If I use the model I fitted here, would it be more appropriate to report untransformed means and medians in the descriptive statistics of my manuscript, or to back-transform? I am leaning towards untransformed, since back-transforming from ln or log values does some weird things.
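To show what I mean by "weird things" in question 3, here is a small sketch on simulated data (the names `to_lnln` and `from_lnln` are made up). The back-transformed mean does not match the raw mean, while the back-transformed median does match the raw median when the sample size is odd.

```r
# Simulated skewed data; odd n so the median is a single observed value.
set.seed(3)
y <- rlnorm(101, meanlog = 0, sdlog = 1)

# Same ln-ln transformation as described above, plus its inverse.
shift     <- 1 - min(log(y))
to_lnln   <- function(v) log(log(v) + shift)
from_lnln <- function(z) exp(exp(z) - shift)

z <- to_lnln(y)

# The mean does not survive the round trip: because the transformation
# is concave, the back-transformed mean falls below the raw mean.
c(raw_mean = mean(y), back_mean = from_lnln(mean(z)))

# The median does survive, because the transformation is monotone.
c(raw_median = median(y), back_median = from_lnln(median(z)))
```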

Any input would be extremely valuable!

Thanks!
