# Thread: paired samples-equality of variance and 95% CI around difference

1. ## paired samples-equality of variance and 95% CI around difference

Hi,

Can a few of you please review the approach I plan to take for obvious errors?

I have 50 subjects, and each has a measure taken on the same variable before and after treatment. So this is standard paired t-test territory, but what I am actually interested in is the variance of the treatment values versus the control values. I would like to test the equality of variance for these two groups of values (treatment and control) and also place a 95% confidence interval around the difference of the two variances. I would prefer randomization/resampling methods for both, since normality assumptions do not hold and I would like a robust result. I have not found any routines that do exactly what I want, so I think I may have to do the following in R. Any advice on an easier or better approach is welcome.

I know that equality of variance for paired data can be tested using the Pitman-Morgan statistic. I was planning on calculating this for the original data and then randomly swapping the pre-treatment and post-treatment values within pairs, which gives a randomization scheme that respects the paired nature of the data. I could then compute a p-value as the proportion of randomizations with a more extreme Pitman-Morgan statistic.
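For concreteness, here is one way that permutation scheme could be sketched in R. This is only a sketch on simulated data, and `pm_stat()`/`pm_perm_test()` are hypothetical helper names, not existing R functions; it uses the fact that the Pitman-Morgan test for paired samples reduces to the t-test for zero correlation between the within-pair sums and differences.

```r
## Sketch only, on made-up data. Pitman-Morgan for paired samples reduces
## to testing cor(pre + post, pre - post) = 0.
pm_stat <- function(pre, post) {
  r <- cor(pre + post, pre - post)
  r * sqrt(length(pre) - 2) / sqrt(1 - r^2)
}

pm_perm_test <- function(pre, post, nperm = 9999) {
  n <- length(pre)
  t_obs <- pm_stat(pre, post)
  t_perm <- replicate(nperm, {
    swap <- runif(n) < 0.5                     # flip labels within each pair
    pm_stat(ifelse(swap, post, pre), ifelse(swap, pre, post))
  })
  mean(c(abs(t_perm) >= abs(t_obs), TRUE))     # two-sided permutation p-value
}

set.seed(1)
pre  <- rnorm(50, 50, 5)
post <- pre + rnorm(50, 0, 8)                  # post has larger variance
pm_perm_test(pre, post)
```

Because only the labels within each pair are swapped, the pairing structure of the data is preserved on every randomization.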

For the 95% CI around the difference, I thought I would resample pairs of values with replacement. So I would select among the 50 subjects 50 times with replacement, calculate the variance of the pre-treatment measures and of the post-treatment measures, take the difference, and store this value. I would do this many times and then determine the 95% confidence interval by ordering my resamples and simply taking the 2.5th and 97.5th percentiles.
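As a sketch of what that pairs bootstrap could look like in R (hypothetical function name, simulated data standing in for the real measurements):

```r
## Sketch only: percentile bootstrap CI for the difference of the
## post- and pre-treatment variances, resampling whole subjects (pairs).
boot_var_diff_ci <- function(pre, post, B = 10000, conf = 0.95) {
  n <- length(pre)
  diffs <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)   # resample subjects, keep pairing
    var(post[idx]) - var(pre[idx])
  })
  alpha <- 1 - conf
  quantile(diffs, c(alpha / 2, 1 - alpha / 2))
}

set.seed(1)
pre  <- rnorm(50, 50, 5)
post <- pre + rnorm(50, 0, 8)
boot_var_diff_ci(pre, post)
```

Because each draw picks whole subjects, both values of a selected pair always enter the resample together, which is what keeps the bootstrap honest for paired data.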

Does this make sense at all?

Thanks,
Seth

2. ## Re: paired samples-equality of variance and 95% CI around difference

The third paragraph looks fine to me. But bear in mind that since these are paired data, you should calculate the differences and construct a 95% CI for the mean difference.
Now, if you want to see whether the two groups are different, I suggest you do something like a repeated-measures ANOVA. It is, let's say, a paired t-test that will also tell you whether the two groups differ significantly. There is definitely a routine in R for this, but I haven't used it; I usually use SPSS.
About what you said in the second paragraph: why do you need a Pitman-Morgan test for two dependent samples?

3. ## Re: paired samples-equality of variance and 95% CI around difference

Thanks for the reply. I wanted a confidence interval around the difference in variances between control and treatment. That's why I mentioned resampling: taking the variance of control and treatment from each resample and then taking the difference of these two variances on each iteration. What you wrote was encouraging but left me a bit confused, so maybe I didn't express myself well. It would go like this for one resample (using 3 subjects for ease of illustration).

| subject | control | treatment |
|---------|---------|-----------|
| 1       | 10      | 15        |
| 2       | 15      | 20        |
| 3       | 7       | 9         |

A sample with replacement might be subject 1, subject 2, and subject 2 (again).

The variance for the control would be the variance of (10, 15, 15) = X, and that of the treatment would be the variance of (15, 20, 20) = Y. I would then subtract X from Y (Y - X) and store this value. If I did this, say, 1,000 times, I would have a mean difference in variances as well as a 95% CI using the percentile method or another. Make sense?

As for the Pitman-Morgan statistic, it is a test of equality of variance for paired data. Others, like Levene's and Bartlett's tests, are tests of equality for independent data (no pairs). There is a paper with a permutation test using the Pitman-Morgan statistic, but it tests the joint null hypothesis that there is no difference in mean or variance between the paired samples, so I would want to eliminate the difference-in-means part, as that is not of interest in my case.
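One reassuring detail on the means issue, sketched below on made-up data: because the Pitman-Morgan statistic is just the correlation t-statistic between the within-pair sums and differences, it is invariant to shifting either group by a constant, so by itself it already ignores any mean difference between the paired samples (`pm_stat()` is a hypothetical helper, not an existing R function).

```r
## Sketch on simulated data: the Pitman-Morgan statistic is unchanged by a
## location shift of either group, so it isolates the variance component.
pm_stat <- function(pre, post) {
  r <- cor(pre + post, pre - post)             # within-pair sums vs differences
  r * sqrt(length(pre) - 2) / sqrt(1 - r^2)
}

set.seed(2)
pre  <- rnorm(30, 50, 5)
post <- pre + rnorm(30, 0, 8)
all.equal(pm_stat(pre, post), pm_stat(pre, post + 100))   # TRUE
```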

Thanks again,

Seth

4. ## Re: paired samples-equality of variance and 95% CI around difference

OK, I see; yes, that seems fine for the bootstrap. But I strongly suggest using not the difference of the variances but their ratio.

5. ## Re: paired samples-equality of variance and 95% CI around difference

Originally Posted by Masteras
OK, I see; yes, that seems fine for the bootstrap. But I strongly suggest using not the difference of the variances but their ratio.
I second this suggestion.

7. ## Re: paired samples-equality of variance and 95% CI around difference

Because it holds more information. The F-test uses the same test statistic (not because it is similar to an F-test; that statement is not quite correct).

8. ## Re: paired samples-equality of variance and 95% CI around difference

I have to confess that I can't really see how the ratio of variances is in general a better choice than the difference of variances for this application. I was curious exactly how one would implement this type of bootstrapping in R anyway, so I worked it out on some simulated data and compared the results of var(A)-var(B) to var(A)/var(B). Code and results are below.

I'm going to cross-post a version of this code in the code sticky in the R/Splus forum shortly, along with some geeky details about the implementation and a discussion of how to get it to play even nicer with the boot() function, if anyone is interested. But you should be able to adapt this code to work on your data with minimal modification, Seth.

Code:
``````> library(data.table)
> library(boot)
> set.seed(12345) # I've got the same combination on my luggage!
>
> ### make some paired data with unequal variances
> dat <- data.table(subject=rep(1:50,2),
+                   prepost=rep(c(-1,1),each=50),
+                   subint=rep(rnorm(50,mean=0,sd=5),2),
+                   subslope=rep(rnorm(50,mean=5,sd=3),2),
+                   error=c(rnorm(50,mean=0,sd=5),rnorm(50,mean=0,sd=10)),
+                   key="subject,prepost")
> dat$dv <- round(55 + dat$subint + dat$subslope*dat$prepost + dat$error,2)
> dat <- data.table(subject=1:50,
+                   pre=dat[prepost==-1]$dv,
+                   post=dat[prepost==1]$dv,
+                   key="subject")
>
> ### examine
> head(dat)
subject   pre  post
[1,]       1 55.67 45.11
[2,]       2 41.92 74.87
[3,]       3 51.40 61.57
[4,]       4 40.05 50.72
[5,]       5 55.75 59.93
[6,]       6 37.40 49.23
> nrow(dat)
[1] 50
> dat[,list(mean_pre=mean(pre),mean_post=mean(post))]
mean_pre mean_post
[1,]  49.8998   62.8648
> dat[,list(var_pre=var(pre),var_post=var(post))]
var_pre var_post
[1,] 79.73543 142.5389
> cor(dat$pre,dat$post)
[1] 0.2546421
>
> ### bootstrap!
> getvarstats <- function(data, seeds) {
+   ## use the resample indices as a source of randomness to decide, per pair,
+   ## whether pre and post are swapped (i.e. resampling under the null)
+   index <- max.col(matrix(c(c(seeds[2:length(seeds)],seeds[1]),seeds),ncol=2))-1
+   index[length(seeds)] <- !index[length(seeds)]
+   index <- c(1:length(seeds)+index*length(seeds),
+              1:length(seeds)+index*length(seeds)+length(seeds))
+   values <- c(data$pre,data$post,data$pre)
+   d <- data.table(pre=values[index[1:length(seeds)]],
+                   post=values[index[seq(length(seeds)+1,2*length(seeds))]])
+   return(c(vardiff = var(d$post) - var(d$pre),
+            varratio = var(d$post)/var(d$pre),
+            postvar = var(d$post),
+            prevar = var(d$pre)))
+ }
> resamples <- 1000000
> system.time({results <- boot(data=dat, statistic=getvarstats, R=resamples)})
user   system  elapsed
983.075    8.630 1048.131
> hist(results$t[,1],breaks=100)
> p_diff <- mean(results$t[,1] > var(dat$post)-var(dat$pre)
+                | results$t[,1] < var(dat$pre)-var(dat$post))
> p_diff
[1] 0.085984
> hist(results$t[,2],breaks=100)
> p_ratio <- mean(results$t[,2] > var(dat$post)/var(dat$pre)
+                 | results$t[,2] < var(dat$pre)/var(dat$post))
> p_ratio
[1] 0.015534``````
The bootstrapped p-value for the variance difference is .086, while the bootstrapped p-value for the variance ratio is .016, despite using the exact same resamples. Since I simulated the data such that the variance for post "truly is" greater than the variance for pre, we might say that if alpha = .05, then the result from the variance difference is a Type II error while the result from the variance ratio is not. Testing the relative power and Type I error rates of these two statistics across many simulated data sets would be interesting and wouldn't involve much more work, if anyone is interested.
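For anyone who wants to pursue that, a minimal simulation harness might look like the sketch below (`reject_rates()` is a hypothetical function name; small `nsim` and `B` to keep it fast, and the within-pair swap mirrors the null resampling used above rather than the boot() machinery):

```r
## Sketch only: estimate rejection rates of the difference- and ratio-based
## permutation tests over many simulated paired data sets.
reject_rates <- function(nsim = 100, n = 50, sd_pre = 5, sd_post = 5,
                         B = 399, alpha = 0.05) {
  res <- replicate(nsim, {
    pre  <- rnorm(n, 0, sd_pre)
    post <- rnorm(n, 0, sd_post)
    obs_d <- var(post) - var(pre)
    obs_r <- max(var(post) / var(pre), var(pre) / var(post))
    null  <- replicate(B, {
      swap <- runif(n) < 0.5                   # swap pre/post within pairs
      p2 <- ifelse(swap, post, pre)
      q2 <- ifelse(swap, pre, post)
      c(d = var(q2) - var(p2), r = var(q2) / var(p2))
    })
    c(diff  = mean(abs(null["d", ]) >= abs(obs_d)) < alpha,
      ratio = mean(null["r", ] >= obs_r | null["r", ] <= 1 / obs_r) < alpha)
  })
  rowMeans(res)  # proportion of simulated data sets rejected by each test
}

set.seed(1)
reject_rates()                    # variances equal: estimates Type I error
set.seed(1)
reject_rates(sd_post = 10)        # variances unequal: estimates power
</code>
```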

But again, I confess that I don't have an intuitive grasp on why these two statistics differ practically in terms of the results they get. My prediction was that the two results from above would be identical. I would appreciate some insight on this. Even the apparently obvious fact that the variance ratio "holds more information" than the variance difference is not at all obvious to me...

9. ## Re: paired samples-equality of variance and 95% CI around difference

Opa, wait: you said you wanted to calculate confidence intervals, not perform hypothesis testing. Furthermore, the absolute difference of the variances could be of interest, not just the signed difference. Is the difference a pivotal statistic? No; the ratio is. But anyway, did you do the resampling under the null hypothesis? I think not.

10. ## Re: paired samples-equality of variance and 95% CI around difference

Yes, it resamples the variance difference and the variance ratio under the null hypothesis that they are 0 and 1, respectively.
Code:
``````> quantile(results\$t[,1],probs=c(.025,.5,.975))
2.5%          50%        97.5%
-71.14886036  -0.00124751  71.13353136
> quantile(results\$t[,2],probs=c(.025,.5,.975))
2.5%       50%     97.5%
0.6216802 0.9999918 1.6083679``````
Histogram of variance difference resamples:

Histogram of variance ratio resamples:

11. ## Re: paired samples-equality of variance and 95% CI around difference

Ok, let's say that you have a point. You said before

"The bootstrapped p-value for the variance difference is .086, while the bootstrapped p-value for the variance ratio is .016, despite using the exact same resamples." But the results agree: there is no difference in the variances based on the confidence intervals either way.

12. ## Re: paired samples-equality of variance and 95% CI around difference

How do you figure that they agree? The results based on the variance difference would lead us to believe that our observed results are over 5x more likely under the null hypothesis than the results based on the variance ratio.

13. ## Re: paired samples-equality of variance and 95% CI around difference

The confidence intervals you gave us, and the plot of the bootstrapped data also.

14. ## Re: paired samples-equality of variance and 95% CI around difference

...also what?

15. ## Re: paired samples-equality of variance and 95% CI around difference

Also the plots of the bootstrapped data.