Mixed ANOVA - Can We Standardize Box-Cox-Transformed Values?

#1
Background:
In a mixed ANOVA (2 timepoints, 3 intervention groups, i.e., a 2x3 factorial design), homogeneity of the residual variances was violated, as indicated by a significant Levene's test for the post-intervention measurement timepoint (T2).
[Attached image: Levene's test output]
As far as I understand it, this prohibits any further analysis of the ANOVA output. We therefore performed a Box-Cox transformation on the data from measurement timepoint T2 (following this Stack Overflow post).

Question / Problem:
As our original variable has a range from -10 to +10, we added a constant (minimum value in T2 + 0.0001) to perform the Box-Cox transformation. This results in a strongly distorted plot of the effect: the line plot showing the differential effect, with one line per group and measurement time on the x-axis, is now distorted. We are thus wondering: would it be possible / statistically valid to simply standardize (z-transform) all values, i.e., the pre-intervention (T1) values as well as the already Box-Cox-transformed post-intervention (T2) values, in order to make the scaling of the two measurement timepoints comparable? Or can / should we just transform the values at T1 in the same fashion (adding a constant, minimum + 0.0001, followed by a Box-Cox transformation)?
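Since the question is partly about mechanics, here is a minimal Python sketch of the shift-then-Box-Cox step followed by z-standardization. The data, the fixed lambda, and the shift convention (subtracting the minimum, then adding 0.0001) are hypothetical illustrations, not our actual values; in practice lambda is estimated from the data, e.g. by maximum likelihood:

```python
import math
import statistics

def boxcox(values, lam, shift):
    """Shift values to be strictly positive, then apply the Box-Cox
    transform with a fixed lambda (lambda is normally estimated from
    the data; here it is passed in for illustration)."""
    shifted = [v + shift for v in values]
    if any(v <= 0 for v in shifted):
        raise ValueError("shift must make all values strictly positive")
    if lam == 0:
        return [math.log(v) for v in shifted]
    return [(v ** lam - 1) / lam for v in shifted]

def zscore(values):
    """Standardize to mean 0 and (sample) SD 1."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical T2 scores on the original -10..+10 scale
t2 = [-9.3, -2.0, 0.5, 3.1, 7.8]
shift = -min(t2) + 0.0001          # smallest shifted value becomes 0.0001
t2_z = zscore(boxcox(t2, lam=0.5, shift=shift))
```

Note that if T1 were transformed as well, the same shift and the same lambda would have to be applied there; otherwise the two timepoints end up on different scales, which is exactly the comparability problem described above.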
 

Karabiner

TS Contributor
#2
As far as I understand it, this prohibits any further analysis for the output of the ANOVA.
If the groups have the same size, and/or the difference of variances is not very large,
then you can proceed. Making that decision on the basis of a test of significance
is not useful. Whether something becomes significant or not depends on power,
i.e. on sample size in the first place.
Thus, we did perform a box-cox-transformation on the data from measurement timepoint T2
First of all, if you transform your dependent variable, then you change your study, since
you no longer analyse your original variable, but something else instead. What does
your Box-Cox-transformed variable stand for; what does it represent, substantively?

Moreover, you cannot transform your dependent variable at only one time point. You
will then compare values of the original variable with values of the transformed
variable. That is not interpretable.

Besides, your source states that Box-Cox is not so much intended for inhomogeneous
variances, but for non-normality of the residuals (no issue here), and for non-linearity.

With kind regards

Karabiner
 
#3
Thank you for taking the time and looking into this.
If the groups have the same size, and/or the difference of variances is not very large,
then you can proceed.
By equal, do you mean mathematically equal, i.e., n1 = n2 = n3? We have n1 = 29, n2 = 26, n3 = 30. So close, but not equal.
Would you happen to know, or have any reference on, what would be considered "not very large"?
As it is a new field, it is hard to judge based on domain knowledge. The variances for the 3 groups in the dependent variable at measurement timepoint 2 (T2) are S1² = 8.2, S2² = 8.1, S3² = 4.4. (On a side note, is there a way to insert LaTeX or formulas in posts on Talk Stats?)

First of all, if you transform your dependent variable, then you change your study, since
you no longer analyse your original variable, but something else instead. What does
your Box-Cox-transformed variable stand for; what does it represent, substantively?
Thanks for pointing this out! You are absolutely right. This is also the origin of the confusion regarding the new plots.
We are aware that it is always difficult to interpret results once you decide to transform a variable. We decided to try it, as a transformation was a common suggestion in the literature (and yes, it was suggested for residual variance heterogeneity as well, not only for non-normality).
We thought we could give it a try, as the scaling is not really important at this point. We are generally only interested in whether there is a meaningful interaction between condition (group) and timepoint.

Moreover, you cannot transform your dependent variable at only one time point. You
will then compare values of the original variable with values of the transformed
variable. That is not interpretable.
Thank you for this advice. Based on your previous comments, I am in my mind already searching for an alternative to a transformation, but just to clarify: would we apply the exact same transformation to the dependent-variable values at T1, i.e., adding the same constant and applying the same power transformation (with the identical lambda, even though it might not be optimal there)?

One last question: if the differences between the variances (8.2, 8.1, 4.4; see above) were considered too large, and you would, for good reasons, advise against transforming the two variables, what would be our options to model the effect of the intervention on the outcome?
I just found also this quote:
"An unfortunate common practice is to pursue multiple comparisons only when the null hypothesis of homogeneity (typically based on the F-test) is rejected […]" (Hsu, 1996)
Does this mean we could just dismiss the ANOVA and still conduct multiple-group comparisons?

Kind regards,
imposter
 

Karabiner

TS Contributor
#4
We have n1 = 29, n2 = 26, n3 = 30. So close but not equal.
There is a rule of thumb saying that a ratio of 1.5:1 between largest and smallest is acceptable.
Stevens, J. (1999). Intermediate Statistics. A Modern Approach. London: Erlbaum. pp 75-76.

Thank you for this advice. Based on your previous comments, I am in my mind already searching for an alternative to a transformation, but just to clarify: would we apply the exact same transformation to the dependent-variable values at T1?
Don't you think that this pursuit of a transformation on purely mathematical grounds is completely
detached from the substantive questions for which the study is done?

I do not know the nature of your dependent variable. If, for example, it is something with a natural zero,
a log transformation could perhaps be justified, as is sometimes the case with monetary
variables (income, wealth, ...), time-related variables (speed etc.), or growth.
if the differences between the variances (8.2, 8.1, 4.4; see above) were considered too large
Then I'd look for whether robust standard errors can be used also in a mixed ANOVA.

With kind regards

Karabiner
 
#5
Thank you so much, that was really helpful.

Then I'd look for whether robust standard errors can be used also in a mixed ANOVA.
Actually, this is a great suggestion, and exactly what is recommended here: either do a transformation
or use the WRS2 package for a robust ANOVA (https://www.datanovia.com/en/lessons/mixed-anova-in-r/#check-assumptions).

Don't you think that this pursuit of a transformation on purely mathematical grounds is completely
detached from the substantive questions for which the study is done?
As I totally agree with your line of argumentation, I will probably look into the robust mixed ANOVA with the WRS2 package.

Our dependent variable is somewhat strange. It is a sum score built from a questionnaire measuring an attitude.
The authors of the original questionnaire suggested first standardizing (z-transforming) all individual items and then adding them up.
This is why we naturally have negative values in the dependent variable and a mean of roughly zero (zero for each individual z-transformed item, though not necessarily exactly for the sum of all items). So, in response to your question: I guess there is no such thing as a natural zero point (or absolute scale).
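To make that construction concrete, here is a small Python sketch of such a sum score (the item data are made up, and `statistics.stdev` uses the sample SD, which may differ from whatever the questionnaire authors prescribed):

```python
import statistics

def sum_of_z_items(item_scores):
    """item_scores: one list per questionnaire item, each holding that
    item's raw scores across all respondents. Each item is z-standardized
    across respondents, then the z-scores are summed per respondent."""
    z_items = []
    for scores in item_scores:
        m = statistics.mean(scores)
        s = statistics.stdev(scores)
        z_items.append([(x - m) / s for x in scores])
    return [sum(per_item) for per_item in zip(*z_items)]  # sum across items

# Made-up example: 3 items, 4 respondents
items = [[1, 2, 3, 4], [2, 2, 4, 4], [5, 3, 1, 3]]
total = sum_of_z_items(items)
```

Because every z-standardized item has mean zero across respondents, the sum score also has mean zero across respondents, but it has no fixed range and no natural zero point in the substantive sense.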

As I am intrigued by, and highly motivated to understand, the statistics and the reasoning behind these analysis decisions, I am still wondering
why it is that:
If the groups have the same size, and/or the difference of variances is not very large,
then you can proceed
I understand your point that inferential hypothesis tests like Levene's test always
depend on sample size (and thus power). Is that the main point? Is there any reference one could cite on that?

Is this exemplary sentence for the results section correct / acceptable?
Homogeneity of the residual variances at the post measurement time point was rejected, Levene's test F(2,142) = 2.47, p = 0.047.
However, due to roughly equal group sizes (Stevens, 1999) and only moderate differences in the residual variances, we
continued with the analyses.
(Of course with the addition of a sentence or paragraph in the discussion).

Additionally, I am looking for a way to classify or judge the differences in the variances in Levene's test,
e.g., an effect size for the differences in the variances.
I found this https://stats.stackexchange.com/que...for-difference-between-variances-levenes-test
hinting at the variability ratio (Nakagawa et al., 2015). Could it be a good addition to the sentence above to provide an effect size
for Levene's test, in order to quantify / justify the decision that the difference in the residual variances is "not too large"?

Nakagawa, S., Poulin, R., Mengersen, K., Reinhold, K., Engqvist, L., Lagisz, M., & Senior, A. M. (2015). Meta‐analysis of variation: ecological and evolutionary applications and beyond. Methods in Ecology and Evolution, 6(2), 143-152.
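For anyone wanting to try the variability ratio: as I read Nakagawa et al. (2015), the log variability ratio (lnVR) for two groups, with a small-sample bias correction, is ln(SD1/SD2) + 1/(2(n1-1)) - 1/(2(n2-1)). Please treat this formula as my reading of the paper and double-check it against the original before citing. A Python sketch with the variances reported above:

```python
import math

def ln_vr(sd1, n1, sd2, n2):
    """Log variability ratio with small-sample bias correction
    (formula as I read it in Nakagawa et al., 2015 -- verify!)."""
    return math.log(sd1 / sd2) + 1 / (2 * (n1 - 1)) - 1 / (2 * (n2 - 1))

# Largest vs. smallest variance in this thread: 8.2 (n = 29) vs. 4.4 (n = 30)
vr = ln_vr(math.sqrt(8.2), 29, math.sqrt(4.4), 30)
```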

PS: I also ordered the 3rd ed. (2013) of the Stevens book from our university library and am still waiting, but from the preview available on Google Books it seems the referenced rule of thumb (1:1.5) has been moved to another chapter (or page), or removed: https://books.google.de/books?id=cv...aXv3Wn8Z&lr&hl=de&pg=PA74#v=onepage&q&f=false

As an addition for future readers:
I will be inspecting Hartley's Fmax (variance ratio), as explained in Andy Field's (2009) book: "As with the K–S test (and other tests of normality), when the sample size is large, small differences in group variances can produce a Levene’s test that is significant (because, as we saw in Chapter 1, the power of the test is improved). A useful double check, therefore, is to look at Hartley’s FMax, also known as the variance ratio (Pearson & Hartley, 1954). (...) 10 is more or less always going to be non-significant, with 15–20 per group the ratio needs to be less than about 5, and with samples of 30–60 the ratio should be below about 2 or 3" (Field, 2009; Chapter 5.6, p. 150)

Interpretation of Hartley's Fmax in our case:
In our case, Fmax = 8.2 / 4.4 ≈ 1.86, which is smaller than the critical value obtained from a table (2.78); thus, we can proceed with the ANOVA?

Field, A. (2009). Discovering statistics using SPSS. Thousand Oaks, CA: Sage Publications.
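The Fmax check itself is a one-liner; here is a Python sketch with the group variances reported earlier in this thread (the cut-off of 2 follows my reading of Field's rule of thumb for 30-60 cases per group):

```python
# Group variances of the dependent variable at T2, as reported above
variances = [8.2, 8.1, 4.4]
fmax = max(variances) / min(variances)  # Hartley's variance ratio
# Field (2009): with 30-60 cases per group the ratio should be below ~2-3
acceptable = fmax < 2.0
```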

All the best
Imposter
 

Karabiner

TS Contributor
#6
PS: I also ordered the 3rd ed. (2013) of the Stevens book from our university library and am still waiting, but from the preview available on Google Books it seems the referenced rule of thumb (1:1.5) has been moved to another chapter (or page), or removed
A number of references for that topic can be found here.

With kind regards

Karabiner