# Bootstrap bca confidence intervals for large number of statistics (boot)

#### CB

##### Super Moderator
(This isn't really a question thread - I'm just providing some possibly useful code, and asking for suggestions on how it can be improved).

The boot.ci function in the boot package allows users to get confidence intervals for particular statistics, with the limits obtained via bootstrapping instead of traditional parametric methods. However, boot.ci doesn't readily allow one to bootstrap confidence intervals for multiple statistics at once (e.g. all the factor loadings from an exploratory factor analysis).

So I've written a function that uses the boot.ci function to produce confidence intervals for multiple statistics, with output given in a convenient matrix. Jacob Wegelin has previously suggested an alternative way to do this on the R help list. But I'm a bit of an R newbie, didn't understand his code, and thought it might be interesting to see if I could figure out how to do this from scratch.

Code for the function is below; I've called it bcafunction.

Code:
bcafunction <- function(boot.out, conf = 0.95){
matrixcis <- cbind(seq(1:ncol(boot.out$t)), sapply(seq(1:ncol(boot.out$t)), function(index){
bootci <- boot.ci(boot.out, type = "bca", index = index, conf = conf)
bootci$bca[1, 4] }), sapply(seq(1:ncol(boot.out$t)), function(index){
bootci <- boot.ci(boot.out, type = "bca", index = index, conf = conf)
bootci$bca[1, 5] })) colnames(matrixcis) <- c("index", "LL", "UL") matrixcis } Usage (once above code is pasted into the R window): bcafunction(boot.out, conf = 0.95) Arguments boot.out An object of class "boot" containing the output of a bootstrap calculation for the statistics of interest. conf The required confidence level. The default is 0.95 (a 95% confidence interval). I should probably comment the code above, but I'm having a bit of trouble doing so in a senseful way (the code makes sense in my head, but explaining it is harder). Let me know if it would be valuable to anyone for me to have another go at doing so. The function is only set up to work for bias corrected accelerated (bca) confidence intervals, which the boot package documentation refers to as "adjusted bootstrap percentile" intervals. Applying the function to work with other methods (e.g. first order normal approximation, studentized), is presumably feasible, but this paper suggests bca intervals are one of the best methods for bootstrapping confidence intervals. When used in its unadulterated form to produce confidence intervals one at a time, the boot.ci function sometimes produces warnings that a confidence interval may be unstable, sometimes accompanied by a statement that extreme quantiles were used in the calculation for a particular limit. The bcafunction I've suggested here doesn't produce these warnings, and I'm not sure how these could be implemented. Using a large number of replicates (e.g. R > 2000) when using boot object (boot.out) may help reduce the risk of these problems. The function can be quite time-consuming. If anyone has any suggestions for how the code could be improved or sped up, let me know! #### TheEcologist ##### Global Moderator Hi CowboyBear, Lets see if I can provide some code as well, it might help you or other people. The code works without any additional packages. First tell me a little bit more of your bootstrap approach. The code above looks far too complex for a bootstrap but then I don't really know what you are doing. [btw I never use the boot package as bootstrapping is relatively simple and I hate it when there is any 'blackbox' in my work - coded by someone else - that prevents me from knowing exactly what I am doing. Don't worry that is just me, a subjective little tick]. Could you provide a bit more details? I presume you have conducted a non-parametric bootstrap (sensu Efron & Tibshirani 1993) and the results of R 'resamplings' (not sure if that is a word ) are stored in boot.out? If you are using the 'bias-corrected and accelerated bootstrap" this means that you need to adjust for e.g. skewness in the bootstrap distribution. Is this the case? Otherwise just use the standard percentiles (or do you want to provide a code that does it all?) Anyway here is a code for bootstrapping without installing any packages. Some requirements to make the bootstrap work Code: # create some data & set random number seed & indicate the number of resamples (N) dat=runif(150); set.seed(90210); N=10000 The actual bootstrap, we're taking means, then calculating percentiles directly from the bootstrapped estimates, alpha = 0.05.. note that these are pretty robust CI as they do not make any assumptions about the bootstrap distribution. I prefer these above any that make use of the se/sd. Code: quantile(replicate(N,mean(dat[sample(1:length(dat),length(dat),replace=T)])),prob=c(0.025,0.975)) Last edited: #### Dason ##### Ambassador to the humans Code: bcafunction <- function(boot.out, conf = 0.95){ matrixcis <- cbind(seq(1:ncol(boot.out$t)), sapply(seq(1:ncol(boot.out$t)), function(index){ bootci <- boot.ci(boot.out, type = "bca", index = index, conf = conf) bootci$bca[1, 4]
}),
sapply(seq(1:ncol(boot.out$t)), function(index){ bootci <- boot.ci(boot.out, type = "bca", index = index, conf = conf) bootci$bca[1, 5]
}))
colnames(matrixcis) <- c("index", "LL", "UL")
matrixcis
}

I should probably comment the code above, but I'm having a bit of trouble doing so in a senseful way (the code makes sense in my head, but explaining it is harder). Let me know if it would be valuable to anyone for me to have another go at doing so.
I have some suggestions for how to make the code a little cleaner but that's not why I'm commenting. Good code shouldn't need many comments because it should be clear immediately what you're doing. However sometimes it's not easy to do that. In the case of the code that you've written it might not be apparent what you're doing in all cases. For example anytime you grab a specific value from a matrix/dataframe it's probably not clear to somebody just reading the code what it is you're grabbing. A comment saying something like
Code:
bootci <- boot.ci(boot.out, type = "bca", index = index, conf = conf)
# Grabbing the lower confidence limit
bootci$bca[1, 4] makes a lot of difference. The other thing I notice is that you define two anonymous functions that essential do the same thing but just return a different value from the results. You could create a single function and just add an extra parameter specifying which column you want returned. #### CB ##### Super Moderator Hi CowboyBear, Lets see if I can provide some code as well, it might help you or other people. The code works without any additional packages. Great! Could you provide a bit more details? I presume you have conducted a non-parametric bootstrap (sensu Efron & Tibshirani 1993) and the results of R 'resamplings' (not sure if that is a word ) are stored in boot.out? That's quite right. The boot package has a default option of an "ordinary" non-parametric bootstrap, which I believe would be the same variety as in the Efron & Tibshirani reference. In the example I've been working with, the boot.out object is the result of drawing 2000 resamples from an original dataset, and performing a factor analysis for each resample using the psych package. (The factor solution is rotated towards a common target solution for each resample). Btw: if "resamplings" isn't a word, I think it should be one! If you are using the 'bias-corrected and accelerated bootstrap" this means that you need to adjust for e.g. skewness in the bootstrap distribution. Is this the case? Otherwise just use the standard percentiles (or do you want to provide a code that does it all?) My interest in the bias-corrected accelerated method isn't so much about a particular concern about skewness or bias in the example I'm working with so much as just trying to be generally cautious. I.e., the BCa method is (as I understand it) trustworthy even in the presence of bias and skewness in the bootstrap distribution, whereas percentile confidence intervals or confidence intervals calculating using t-statistics are trustworthy in a more restricted range of situations. That said, I find technical articles about bootstrapping pretty hard going (my lack of mathematical stats background is a bit of a hindrance), so I might be mistaken or being overcautious. The BCa interval is described here, if it helps. Code: quantile(replicate(N,mean(dat[sample(1:length(dat),length(dat),replace=T)])),prob=c(0.025,0.975)) Phwoar. I must admit, that is very clean and elegant compared to my code! #### CB ##### Super Moderator I have some suggestions for how to make the code a little cleaner but that's not why I'm commenting. Good code shouldn't need many comments because it should be clear immediately what you're doing. However sometimes it's not easy to do that. In the case of the code that you've written it might not be apparent what you're doing in all cases. For example anytime you grab a specific value from a matrix/dataframe it's probably not clear to somebody just reading the code what it is you're grabbing. A comment saying something like Code: bootci <- boot.ci(boot.out, type = "bca", index = index, conf = conf) # Grabbing the lower confidence limit bootci$bca[1, 4]
makes a lot of difference. The other thing I notice is that you define two anonymous functions that essential do the same thing but just return a different value from the results. You could create a single function and just add an extra parameter specifying which column you want returned.
Thanks Dason! I'm going to have a go at commenting and editing the code now - good point about the two similar functions. (EDIT: perhaps not right now - gotta get ready to go watch Super 8!)