Tukey-like pairwise comparisons of variances

#1
Hi all,

I’m testing a hypothesis that requires Tukey-like comparisons of the variances (or standard deviations) of multiple groups. I’ve already used Tukey to conduct pairwise comparisons of the means for these groups, but I’m looking for a similar test that would compare their internal variances. To be clear, I’m looking for a test that would compare these variances while producing a p-value for each comparison separately (much like Tukey). Does such a test exist?

Unfortunately, it appears an ANOVA or a Levene’s test won’t suffice, since they produce p-values for the entire dataset rather than for separately-compared pairs. I’ve considered using F-tests to compare each pair of variances individually, but I’ve been told this approach is problematic because each test covers only part of the sample. I’ve also considered using Bartlett’s test, but because it’s generally used to measure homoscedasticity, it produces p-values that are highly sensitive to departures from normality (and therefore unreliable as an indicator of significant group differences).

To summarize, I’m looking for a multiple-comparisons test of group variances (or standard deviations) rather than of group means. I found a useful article on Tukey-like variance comparisons (see here), but it hasn’t yet led me to a method that could help.

If it helps to answer my question, I’d be happy to elaborate on the focus of my study. I generally use STATA to analyze my data, but I’m open to using other applications.

Can anyone help?

Thanks!

Zach
 

obh

Active Member
#3
Hi Zach,

I think the following Levene's test calculator also runs Tukey HSD on the variances (not the means).

Strictly speaking, Levene's test doesn't compare the variances as variance is defined, but a similar measure of spread. So you may use it to compare the variances, and this also makes it easy to use the Tukey HSD test for this case.

Please try the following. Is this what you are asking for?

http://www.statskingdom.com/230var_levenes.html

I assume you can do the same in R.
 
#5
Thanks!! I'll give this a try and follow up in the thread. Can you clarify what you mean when you say that Levene's test doesn't really compare the variances as defined, but a similar measure? I assume you're referring to this explanation from the website: "Target: To check if the difference between the variances of two or more groups is significant, using a sample data. The Levene's tests perform an ANOVA test over the absolute deviations from each group's average or the absolute deviations from each group's median." Is that correct?
 

obh

Active Member
#6
Yes. Using the median instead of the mean as the center counts as more robust.
Strictly, when the median is used it is called the Brown–Forsythe test, but apart from replacing the mean with the median the calculation is the same.
You may choose the center, but the default is the median.
 
#7
Great, thanks so much for clarifying. When I perform separate Tukey tests for the group means vs. medians, they indeed produce very different p-values across pairs. To be clear, my study has two hypotheses: The first focuses on differences among group means, while the second ― which is the focus of this thread ― focuses on differences among group variances/standard deviations. If I understand correctly, your advice implies that I should: A) Use the p-values from my first Tukey test (whose center is the mean), to report the significance of differences in group means (i.e. to test H1), AND B) Use the p-values from the second Tukey test (whose center is the median), to report the significance of differences in group variances (i.e. to test H2). Is that correct, or are you suggesting that the median be used as the center when testing both hypotheses? If you support the former approach, could you explain why selecting the median as the center is the preferable way of comparing variances?

It's worth noting that the p-values associated with the median-based test are much higher than for the mean-based test, which indicates that most of the group differences are not statistically significant.

Thanks again ― this has been very helpful.
 
#8
Sorry -- It might be more correct to describe the first test as Tukey, and the second test as a Levene's test with the median selected as the center. The difference is unclear to me, because the calculator for Levene's test produces pairwise comparisons labeled "Tukey HSD / Tukey Kramer"...

Regardless, my above question is the same: When testing my two hypotheses, should I perform the same test for H1 and H2 (median-based only), or separate tests (mean-based for H1; median-based for H2)?
 

katxt

Active Member
#9
Here is an idea which might work. Tukey's HSD uses the group means and the SEs of those means and goes through a particular process. I can't think of any reason why exactly the same process wouldn't work with the group variances and the SEs of those variances. (After all, the process doesn't know where the numbers came from.) The only problem is finding the SE of each sample variance. One method is an approximate formula, SE(var) = var × sqrt(2/(n-1)), which is quite good with normal samples of a dozen or more. Another would be to bootstrap each sample to find its SE.
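For anyone who wants to try the two SE options above, here is a minimal Python sketch using only the standard library. The function names, the number of bootstrap resamples and the fixed seed are illustrative assumptions, not anything prescribed in this thread, and the normal-theory formula is only reasonable when each sample looks roughly normal.

```python
import math
import random
import statistics


def se_variance_normal(sample):
    """Approximate SE of the sample variance: var * sqrt(2 / (n - 1)).

    Assumes the sample comes from a roughly normal population.
    """
    n = len(sample)
    return statistics.variance(sample) * math.sqrt(2 / (n - 1))


def se_variance_bootstrap(sample, n_boot=2000, seed=0):
    """Bootstrap SE of the sample variance.

    Resamples with replacement, recomputes the variance each time, and
    returns the standard deviation of those bootstrap variances.
    n_boot and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    n = len(sample)
    boot_vars = [
        statistics.variance(rng.choices(sample, k=n)) for _ in range(n_boot)
    ]
    return statistics.stdev(boot_vars)
```

Calling both functions on the same group gives two SE estimates to compare; a large disagreement between them would be a hint that the normal approximation is not a good fit for that group.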
 

obh

Active Member
#10
Hi Zach, Katxt

When you compare the means you do the following:
1. Run One way ANOVA over the data
2. Run Tukey HSD

To compare the variances you do the following:
1. Subtract the group's center from each value in each group and take the absolute value (center = median or average).
Now you have the processed data.
2. Run One way ANOVA over the processed data
3. Run Tukey HSD over the processed data.

When comparing the variances, steps 1 and 2 together are Levene's test when using the average as the center, or Brown-Forsythe when using the median as the center.
Step 3 is ... I don't know, "Tukey HSD over the differences from the center"?

Names ...
Tukey HSD assumes equal sample size for each group
Tukey Kramer doesn't assume an equal sample size and with equal sample size, it will give the same results as Tukey HSD.
When R runs the Tukey HSD test it is actually the Tukey Kramer test.
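The first two variance-comparison steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are made up, the groups are passed as a dict of lists, and the code stops at the F statistic. Getting the p-value, and the Tukey step over the processed data, needs the F and studentized range distributions from a statistics package, which is not shown here.

```python
import statistics


def abs_deviations(groups, center=statistics.median):
    """Step 1: absolute deviation of each value from its own group's center.

    center=statistics.median gives the Brown-Forsythe variant;
    center=statistics.fmean gives the mean-centered Levene variant.
    """
    processed = {}
    for name, values in groups.items():
        c = center(values)
        processed[name] = [abs(x - c) for x in values]
    return processed


def one_way_anova_f(groups):
    """Step 2: one-way ANOVA F statistic for a dict of groups."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    ss_between = sum(
        len(v) * (statistics.fmean(v) - grand_mean) ** 2
        for v in groups.values()
    )
    ss_within = sum(
        (x - statistics.fmean(v)) ** 2 for v in groups.values() for x in v
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Running `one_way_anova_f(abs_deviations(groups))` gives the Levene-type F statistic, with degrees of freedom k - 1 and n - k.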
 

obh

Active Member
#11
Zach, regarding your question in #7: if I understand you correctly, that's not quite right.

1. To compare the means you need to run the One way ANOVA test and then the "regular" Tukey HSD
http://www.statskingdom.com/180Anova1way.html
It will also give you the R code, so you may run on R if you don't trust an online calculator.

2. To compare the variances you need to run Levene's test. I believe the Tukey HSD test in the Levene's calculator runs over the differences from the center, i.e. it compares only the variances.
You can also get the R code for Levene's test there, but not for the Tukey HSD over the differences.
But calculating the differences is very easy so you may also try to run on R.

3. When comparing the variances should you use median or average?
I assume using the median is more robust for non-normal data but probably less powerful, which is probably why the p-values were higher.
If the data is reasonably normal or the sample size is more than 30, you should probably use the mean as the center.
PS did you calculate the sample size before?
 

katxt

Active Member
#12
As I understand the original question, we're not interested in showing that the variances are (more or less) equal which is the usual test before an anova. That's the sort of thing that Levene's test does.
What Zach is looking for (I think) is which particular groups have variances that are provably different from other particular groups.
One basic way would be to test the groups in pairs for equal variance and get a p value for each comparison. Then use Bonferroni to decide which p values indicate significant differences between particular groups. This would work, but because Bonferroni is conservative, Zach is (I think) looking for a Tukey type test for the multiple comparisons. kat
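As a rough sketch of the Bonferroni route described above: once each pair of groups has its own p-value from some two-group equal-variance test, the adjustment is just a multiplication by the number of comparisons. The function name and the dict-of-pairs layout are assumptions made for illustration, and the pairwise test itself is not shown.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a dict mapping group pairs to raw p-values.

    Each raw p-value is multiplied by the number of comparisons and
    capped at 1.0.
    """
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}
```

A pair is then declared different when its adjusted p-value falls below the chosen alpha, which is the same as comparing the raw p-value with alpha divided by the number of pairs.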
 

obh

Active Member
#13
Hi Katxt,

Lovely to chat with you :)

I understand the same as you.
I suggested running Tukey HSD on the differences from the average of each group.

Cheers
 

katxt

Active Member
#16
So effectively you are using a t test on the deviations in place of an F test on the variances to compare the variances, on the grounds that the average deviation will be higher if the variance is higher.
 

obh

Active Member
#17
Yes, kind of, although Tukey HSD actually uses the studentized range distribution.

Both the F and t distributions derive from the normal distribution.
A one-way ANOVA (F test, right tail) with 2 groups is equivalent to a two-sample t-test (pooled variance).

So if a one-way ANOVA on the differences (F test, right tail) with 2 groups, which is Levene's test, is good for comparing variances, I assume a t-test on the differences is good as well. (Actually, I checked and it gives exactly the same result.)

The only question is why there is a big difference between the result of Levene's test over two groups (right-tailed, center=mean) and the F test for two variances (two-tailed). I would expect them to have similar p-values.
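The two-group equivalence mentioned above (one-way ANOVA versus a pooled-variance t-test) can be checked numerically: the F statistic equals the square of the t statistic. A small standard-library Python sketch follows, with made-up function names; it compares the statistics only, not the p-values.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def anova_f_two_groups(a, b):
    """One-way ANOVA F statistic for exactly two groups (1 and n-2 df)."""
    grand_mean = statistics.fmean(a + b)
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in (a, b)
    )
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in (a, b) for x in g
    )
    return ss_between / (ss_within / (len(a) + len(b) - 2))
```

For any two lists `a` and `b`, `pooled_t(a, b) ** 2` and `anova_f_two_groups(a, b)` should agree up to floating-point rounding, whether the inputs are raw data or deviations from the group centers.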
 

katxt

Active Member
#18
Yes, that's strange all right. I would have thought that for two groups they would have been equivalent.
The differences aren't normal. Does that matter?
 

obh

Active Member
#19
Both tests, Levene's test and the F test for two variances, assume a normal population (of course, when n>30 the CLT works).
Generally, with an ANOVA test the normality assumption is not for each group but for the residuals.

If each group is normally distributed, then normal minus a constant (the average) is also normally distributed.
 
#20
OBH and Katxt -- Thanks so much for these extremely informative and helpful responses. I'm replying to OBH's above message because I understand it to be a summary of the approach I should use to test both hypotheses. The process for testing the first hypothesis (comparison of the means) is now clear to me. From what I understand, the tests for H1 should be conducted separately from the Levene's test using the ANOVA/Tukey operations described above. This all seems very straightforward, but please let me know if I'm mistaken.

Regarding the tests for H2 (comparison of the variances), I'll summarize your advice here, and you can let me know if this is correct:

First, I'll address a point that we haven't yet discussed explicitly: I assume the standard deviations calculated as part of the Levene's test (or Brown-Forsythe if using the median) can be reported as measures of internal variance for each group. I'm asking this because the Tukey output for the Levene's calculator appears to report differences between group means, not between group variances. When I hover the cursor above the "differences" column in the Tukey table, it tells me that "differences" actually refers to group means. In the event that the differences in variance aren't automatically calculated in Levene's, OBH notes that "calculating the differences is very easy so you may also try to run on R," but I'm not sure how exactly this should be done. Can you clarify?

Second, when comparing the group variances (pairwise), the p-values in the Levene's test Tukey table should indicate whether the differences in variance are statistically significant. If my data are reasonably normal and the sample size is above 30 (as it is), I should select the mean as the center. In this case, the test I'm conducting is Levene's. Otherwise, if the groups have non-normal distributions, I should select the median. If I select the median, the test is called Brown-Forsythe, even though I'll be using the Levene's calculator to get the results. Notably, Katxt originally wrote: "What Zach is looking for (I think) is which particular groups have variances that are provably different from other particular groups." This is absolutely correct, and it led Katxt to recommend different tests, but from what I understand, Katxt now agrees with the above advice from OBH.

Is all of this correct? Are OBH and Katxt in agreement? Also, if possible, could you clarify how the differences in variance might be calculated? Again, THANK YOU for this extraordinary advice!

Zach