# Thread: paired samples-equality of variance and 95% CI around difference

1. ## Re: paired samples-equality of variance and 95% CI around difference

I'm sorry, but I really don't know what you're saying...

855 of the 10000 resampled variance differences are either greater than the actual variance difference or less than -1 times the actual variance difference.
153 of the 10000 resampled variance ratios are either greater than the actual variance ratio or less than the reciprocal of the actual variance ratio.
In what sense does this indicate "agreement" between the variance difference and the variance ratio?
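To make the counting rule above concrete, here is a minimal sketch with made-up numbers (not the thread's actual data): the difference is reflected about 0 and the ratio about 1 (via the reciprocal) to get the "other tail".

```r
set.seed(1)
# toy resampled statistics under the null: differences centered at 0,
# ratios centered at 1 (all values here are illustrative, not from the thread)
boot_diff  <- rnorm(10000, mean = 0, sd = 2)
boot_ratio <- exp(rnorm(10000, mean = 0, sd = 0.3))

obs_diff  <- 3.5   # hypothetical observed variance difference
obs_ratio <- 1.8   # hypothetical observed variance ratio

# two-tailed counts: beyond the observed value in either direction,
# reflecting about 0 for the difference and about 1 for the ratio
p_diff  <- mean(boot_diff  > obs_diff  | boot_diff  < -obs_diff)
p_ratio <- mean(boot_ratio > obs_ratio | boot_ratio < 1/obs_ratio)
```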

2. ## Re: paired samples-equality of variance and 95% CI around difference

I wasn't sure about that either. But I do think the ratio is the better measure. Pretty much any time you're dealing with scale, you want to deal with ratios.

3. ## Re: paired samples-equality of variance and 95% CI around difference

Any specific thoughts on why it is the better measure here? The answers given by the two clearly differ somewhat. But it's not at all clear which answer I should "trust" more.

I mean, let's say we were interested in assessing through bootstrapping the relative means (a more typical example) rather than the relative variances. Rather than computing the mean difference at each resample, we could in principle compute the mean ratio and get a distribution of mean ratios centered at 1 just like we did for the variance ratio above. How would this be an obviously better or worse approach than resampling the mean difference? Isn't one of the points of bootstrapping that you don't have to make assumptions about the distribution of the statistic you are interested in?
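As a sketch of that idea (simulated paired data, not seth's; a simple label-swapping null resampler stands in for the boot() implementation used later in the thread), the same resamples can yield both a mean-difference and a mean-ratio p-value:

```r
set.seed(42)
n    <- 30
pre  <- rnorm(n, mean = 50, sd = 5)       # simulated paired data
post <- pre + rnorm(n, mean = 3, sd = 5)  # post shifted upward

B <- 2000
stats <- replicate(B, {
  # randomly relabel pre/post within each pair (resampling under the null)
  swap <- sample(c(TRUE, FALSE), n, replace = TRUE)
  a <- ifelse(swap, post, pre)
  b <- ifelse(swap, pre, post)
  c(diff = mean(a) - mean(b), ratio = mean(a) / mean(b))
})

obs_d <- mean(post) - mean(pre)
obs_r <- mean(post) / mean(pre)
# same two-tailed counting as for the variances: reflect about 0 or about 1
p_diff  <- mean(stats["diff", ]  > obs_d | stats["diff", ]  < -obs_d)
p_ratio <- mean(stats["ratio", ] > obs_r | stats["ratio", ] < 1/obs_r)
```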

4. ## Re: paired samples-equality of variance and 95% CI around difference

Originally Posted by Jake
I have to confess that I can't really see how the ratio of variances is in general a better choice than the difference of variances for this application. I was curious exactly how one would implement this type of bootstrapping in R anyway, so I worked it out on some simulated data and compared the results of var(A)-var(B) to var(A)/var(B). Code and results are below.

I'm going to cross-post a version of this code in the code sticky from the R/Splus forum shortly, along with some geeky details about the implementation and a discussion of how to get it to play even nicer with the boot() function, if anyone is interested in that. But you should be able to adapt this code with minimal modification to work on your data as is, seth.

Code:
``````> library(data.table)
> library(boot)
> set.seed(12345) # I've got the same combination on my luggage!
>
> ### make some paired data with unequal variances
> dat <- data.table(subject=rep(1:50,2),
+                   prepost=rep(c(-1,1),each=50),
+                   subint=rep(rnorm(50,mean=0,sd=5),2),
+                   subslope=rep(rnorm(50,mean=5,sd=3),2),
+                   error=c(rnorm(50,mean=0,sd=5),rnorm(50,mean=0,sd=10)),
+                   key="subject,prepost")
> dat$dv <- round(55 + dat$subint + dat$subslope*dat$prepost + dat$error,2)
> dat <- data.table(subject=1:50,
+                   pre=dat[prepost==-1]$dv,
+                   post=dat[prepost==1]$dv,
+                   key="subject")
>
> ### examine
> head(dat)
subject   pre  post
[1,]       1 55.67 45.11
[2,]       2 41.92 74.87
[3,]       3 51.40 61.57
[4,]       4 40.05 50.72
[5,]       5 55.75 59.93
[6,]       6 37.40 49.23
> nrow(dat)
[1] 50
> dat[,list(mean_pre=mean(pre),mean_post=mean(post))]
mean_pre mean_post
[1,]  49.8998   62.8648
> dat[,list(var_pre=var(pre),var_post=var(post))]
var_pre var_post
[1,] 79.73543 142.5389
> cor(dat$pre,dat$post)
[1] 0.2546421
>
> ### bootstrap!
> getvarstats <- function(data, seeds) {
+   index <- max.col(matrix(c(c(seeds[2:length(seeds)],seeds[1]),seeds),ncol=2))-1
+   index[length(seeds)] <- !index[length(seeds)]
+   index <- c(1:length(seeds)+index*length(seeds),
+              1:length(seeds)+index*length(seeds)+length(seeds))
+   values <- c(data$pre,data$post,data$pre)
+   d <- data.table(pre=values[index[1:length(seeds)]],
+                   post=values[index[seq(length(seeds)+1,2*length(seeds))]])
+   return(c(vardiff = var(d$post) - var(d$pre),
+            varratio = var(d$post)/var(d$pre),
+            postvar = var(d$post),
+            prevar = var(d$pre)))
+ }
> resamples <- 1000000
> system.time({results <- boot(data=dat, statistic=getvarstats, R=resamples)})
user   system  elapsed
983.075    8.630 1048.131
> hist(results$t[,1],breaks=100)
> p_diff <- mean(results$t[,1] > var(dat$post)-var(dat$pre)
+                | results$t[,1] < var(dat$pre)-var(dat$post))
> p_diff
[1] 0.085984
> hist(results$t[,2],breaks=100)
> p_ratio <- mean(results$t[,2] > var(dat$post)/var(dat$pre)
+                 | results$t[,2] < var(dat$pre)/var(dat$post))
> p_ratio
[1] 0.015534``````
The bootstrapped p-value for the variance difference is .086, while the bootstrapped p-value for the variance ratio is .016, even though they used the exact same resamples. Since I simulated the data such that the variance for post "truly is" greater than the variance for pre, we might say that if alpha=.05 then the result from the variance difference is a Type II error while the result from the variance ratio is not. Testing relative power and Type I error rates for these using many simulated data sets would be interesting, and wouldn't involve much more work at all if anyone is interested.

But again, I confess that I don't have an intuitive grasp on why these two statistics differ practically in terms of the results they get. My prediction was that the two results from above would be identical. I would appreciate some insight on this. Even the apparently obvious fact that the variance ratio "holds more information" than the variance difference is not at all obvious to me...
Were you constructing the bootstrap confidence intervals using the percentile-based method? If so, I'm initially surprised at your results. It seems that the subtraction of one number from another is monotonically related to the ratio (1/2 > 1/4, just as 1-2 > 1-4). Therefore, it seems that, if the same bootstrap resamples are used via the same seed, the p-values would be the same, because the ranks should be the same. One possibility is a rounding error being more common in one scenario. I may be way off here, however.

Thanks everyone for the discussion so far. I'll read more carefully what has come before and post again soon.
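The power/Type I error simulation suggested in the quoted post could be sketched like this (assumed effect sizes, and a cheap label-swapping resampler in place of the boot() machinery, purely for speed):

```r
set.seed(99)
# one simulated dataset -> one bootstrap p-value for the variance difference
one_pval <- function(n = 50, sd_pre = 5, sd_post = 10, B = 199) {
  pre  <- rnorm(n, 0, sd_pre)
  post <- rnorm(n, 0, sd_post)
  obs  <- var(post) - var(pre)
  null <- replicate(B, {
    swap <- sample(c(TRUE, FALSE), n, replace = TRUE)  # relabel pairs
    a <- ifelse(swap, post, pre)
    b <- ifelse(swap, pre, post)
    var(a) - var(b)
  })
  mean(null > obs | null < -obs)  # two-tailed count as above
}

# power estimate: proportion of simulated datasets rejecting at alpha = .05
# when the variances truly differ (set sd_pre = sd_post for Type I error rate)
pvals <- replicate(100, one_pval())
power_est <- mean(pvals < .05)
```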

5. ## Re: paired samples-equality of variance and 95% CI around difference

Yes, the p-values and confidence intervals are based on quantiles of the actual resampled statistics. I was surprised too.

6. ## Re: paired samples-equality of variance and 95% CI around difference

The confidence interval for the difference includes 0: do not reject. The confidence interval for the ratio includes unity: do not reject. Second, the calculation of the p-value is wrong.

7. ## Re: paired samples-equality of variance and 95% CI around difference

Masteras, the confidence intervals are around the null hypothesis. So of course they contain 0 and 1. In fact, you may notice that they are centered at 0 and 1. We reject if the actual point estimate is within the confidence interval around the null. For the variance difference, it is within that interval. For the variance ratio, it is not.

The p-values are correct. They are two-tailed p-values, so they are calculated as the proportion of bootstrap-resampled statistics that are either greater than the actual point estimate OR less than the point estimate IF the order of the groups had been reversed. Do you think these should be calculated some other way?

8. ## Re: paired samples-equality of variance and 95% CI around difference

Code:
``````> quantile(results$t[,2],probs=c(.025,.5,.975))
     2.5%       50%     97.5%
0.6216802 0.9999918 1.6083679``````
I see unity is included in the ratio.

"We reject if the actual point estimate is within the confidence interval around the null." reject?

The p-value for the bootstrap test is (#{T_boot > T_obs} + 1)/(B + 1), where B is the number of bootstrap samples.
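With an assumed vector t_boot of resampled statistics and an observed value t_obs, that +1-corrected p-value can be written out directly (the +1 keeps the estimate away from exactly zero):

```r
# bootstrap p-value with the +1 correction
boot_p <- function(t_boot, t_obs) {
  (sum(t_boot >= t_obs) + 1) / (length(t_boot) + 1)
}
```

For example, with four resampled values c(1, 2, 3, 4) and an observed 2.5, two resamples are at least as extreme, so the p-value is (2 + 1)/(4 + 1) = 0.6.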

9. ## Re: paired samples-equality of variance and 95% CI around difference

Looking at the two histograms, the ratio histogram is truncated at zero and so the way that numbers are handled in terms of rounding would be more likely to make a difference. It would be interesting to run this again but make the ratio double precision and use a smaller number of replicates, so that they could be examined in a spreadsheet for any "weirdness". Also, taking the log of the ratio may accomplish the same thing. As it stands though, the ratio and differences should have the same ranks, so I wonder if we are viewing an artifact ?!?

I have R on my other machine, I'll try what I suggested later this evening.
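On the log suggestion: since log is monotone, the two-tailed count for the log-ratio (reflected about 0) has to match the count for the ratio (reflected about 1). A quick check on toy numbers (not the thread's data):

```r
set.seed(7)
# toy resampled variance ratios centered at 1
boot_ratio <- exp(rnorm(10000, mean = 0, sd = 0.3))
obs <- 1.8  # hypothetical observed ratio

# ratio scale: reflect about 1 via the reciprocal
p_ratio <- mean(boot_ratio > obs | boot_ratio < 1/obs)
# log scale: reflect about 0 via the sign flip
p_log   <- mean(log(boot_ratio) > log(obs) | log(boot_ratio) < -log(obs))
# monotonicity of log means the two counts agree
```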

10. ## Re: paired samples-equality of variance and 95% CI around difference

Seems I was off my rocker in the last few posts. The ranks will vary between differences and ratios, as shown by a simple example (6-4 = 4-2 while 6/4 < 4/2). For now, I've gone with differences. I'll change if it gets "sent back". Thanks again. Will post anything I learn that is informative to this topic.
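The counterexample can be checked directly, and the reverse case (equal ratios, unequal differences) holds too:

```r
# equal differences, unequal ratios: ranks on the two scales can disagree
stopifnot((6 - 4) == (4 - 2), (6 / 4) < (4 / 2))

# and the mirror image: equal ratios, unequal differences
stopifnot((6 / 3) == (4 / 2), (6 - 3) != (4 - 2))
```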