Sample size and width of bootstrapped confidence intervals

okapi12

New Member
Hello all,

I am looking for some help with confidence intervals calculated using the residual bootstrapping technique.

For confidence intervals calculated by multiplying the critical t value by the standard error of the mean, there is a clear link between the width of the interval and the sample size (quadrupling the sample size roughly halves the width). I want to know whether any such link exists between sample size and width when the confidence interval is instead calculated by residual bootstrapping. Note that I am not talking about the number of resamples drawn during bootstrapping, but the number of observations in the original dataset. Basically, I want to estimate how much the width of a confidence interval would shrink for a given change in sample size. This is for a continuous, rather than categorical, dataset.
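To make the t-interval relationship concrete (notation introduced here for illustration): with sample standard deviation s, the full width of the 95% interval is W(n) below. Treating s as fixed, since it estimates the same σ at every n, quadrupling n divides the width by 2 apart from the small change in the t quantile. For example, going from n = 5 to n = 20 gives 0.5 × 2.093/2.776 ≈ 0.377, slightly more than halving.

$$W(n) = 2\,t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad \frac{W(4n)}{W(n)} = \frac{t_{0.975,\,4n-1}}{t_{0.975,\,n-1}}\cdot\frac{\sqrt{n}}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{t_{0.975,\,4n-1}}{t_{0.975,\,n-1}} \approx \frac{1}{2}$$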

hlsmith

Less is more. Stay pure. Stay poor.
I have no documentation to base this on, but I suspect the bootstrap (BS) interval will narrow early on and then converge fairly quickly to a nearly constant width. A quick simulation would likely support this. The t-based CI narrows by formula, since n is in the denominator of the SE, whereas the bootstrap CI (BSCI) is built from the distribution of the point estimate itself, which has no n in the denominator and just narrows as the sample size approaches the super population.
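For reference, here is how the percentile bootstrap interval (the kind computed by the PROC UNIVARIATE step in the SAS code further down) is defined, in notation introduced here: draw B resamples of size n with replacement, recompute the estimate on each to get θ̂*₁, …, θ̂*_B, and take the empirical 2.5th and 97.5th percentiles. For the mean specifically, the variance of the resampled mean over all possible resamples has a closed form, with σ̂² the plug-in (divide-by-n) variance of the original data:

$$\mathrm{CI}_{95} = \left[\hat\theta^{*}_{(0.025)},\ \hat\theta^{*}_{(0.975)}\right], \qquad \operatorname{Var}^{*}\!\left(\bar{x}^{*}\right) = \frac{\hat\sigma^{2}}{n}, \quad \hat\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}$$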

I haven't heard it called the "residual bootstrap" before. You just mean sampling with replacement, with each resample the same size as the original sample, correct?
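"Residual bootstrap" usually refers to regression: fit the model, resample the residuals with replacement, add them back to the fitted values, and refit. For a plain mean the only model is x_i = μ + e_i, so the residual bootstrap reproduces ordinary resampling with replacement exactly. Below is a minimal NumPy sketch that checks this by feeding both methods the same random indices; the function names are hypothetical and numpy is assumed to be installed.

```python
import numpy as np

def case_bootstrap_means(x, n_boot, rng):
    """Ordinary (case) bootstrap: resample the observations with replacement."""
    n = len(x)
    idx = rng.integers(0, n, size=(n_boot, n))  # n_boot resamples of size n
    return x[idx].mean(axis=1)

def residual_bootstrap_means(x, n_boot, rng):
    """Residual bootstrap for the location-only model x_i = mu + e_i."""
    n = len(x)
    mu_hat = x.mean()                  # fitted value, the same for every observation
    resid = x - mu_hat                 # residuals e_i = x_i - mu_hat
    idx = rng.integers(0, n, size=(n_boot, n))
    x_star = mu_hat + resid[idx]       # fitted value plus resampled residuals
    return x_star.mean(axis=1)         # "refit" the model (the mean) on each resample

x = np.random.default_rng(1).normal(35, 10, size=20)
# Same seed for both generators, so both methods draw identical indices.
case_means = case_bootstrap_means(x, 5000, np.random.default_rng(2))
resid_means = residual_bootstrap_means(x, 5000, np.random.default_rng(2))
print(np.allclose(case_means, resid_means))
```

Because mu_hat + (x_j - mu_hat) is just x_j, the two sets of bootstrap means agree up to floating-point rounding; the methods only diverge for models with covariates, where the fitted values differ between observations.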


hlsmith

I did a quick simulation in SAS (code below) for the mean of a normal random variable with mean 35 and SD 10. I then changed the sample size and bootstrapped again. In the same respective order as above:

CI95_Lower  CI95_Upper
   30.2242     49.4568
   33.0238     38.2887
   33.7026     35.4450
   34.9273     35.4866
   34.8493     35.0256
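The widths of these five intervals follow directly from the bounds. The small Python snippet below computes them together with the ratio between consecutive runs. As a rough rule, assuming the bootstrap width scales like 1/√n as the t-interval width does, a width ratio r between two runs corresponds to the sample size growing by about r². That assumption should be checked against the sample sizes actually used in each run.

```python
# Bounds copied from the SAS output above, in the same order.
bounds = [
    (30.2242, 49.4568),
    (33.0238, 38.2887),
    (33.7026, 35.4450),
    (34.9273, 35.4866),
    (34.8493, 35.0256),
]

# Full width of each 95% interval.
widths = [upper - lower for lower, upper in bounds]
for w in widths:
    print(f"width = {w:.4f}")

# Ratio of each width to the next; under an assumed 1/sqrt(n) scaling,
# a width ratio r corresponds to an n ratio of r**2.
for w_prev, w_next in zip(widths, widths[1:]):
    r = w_prev / w_next
    print(f"width ratio = {r:.2f}, implied n ratio = {r**2:.1f}")
```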


hlsmith

For each sample size I just changed the value in %let n1 = and reran the code. I borrowed most of this code from Rick Wicklin's blog, The DO Loop, to save time.

Code:
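/* 1. Simulate the original sample: n1 draws from Normal(mean=35, SD=10) */
%let n1 = 5;                   /* sample size of the original data set */
data sample (drop=i);
   call streaminit(12345);     /* fix the seed so the sample is reproducible */
   do i = 1 to &n1;
      x = rand("Normal", 35, 10);
      output;
   end;
run;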
%let NumSamples = 5000;        /* number of bootstrap resamples */
/* 2. Generate many bootstrap samples */
proc surveyselect data=sample NOPRINT seed=1
     out=BootSSFreq(rename=(Replicate=SampleID))
     method=urs                /* resample with replacement */
     samprate=1                /* each bootstrap sample has N observations */
     /* OUTHITS                   optional: one row per hit instead of the NumberHits frequency variable */
     reps=&NumSamples;         /* generate NumSamples bootstrap resamples */
run;
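/* 3. Compute the mean of each bootstrap sample */
proc means data=BootSSFreq noprint;
   by SampleID;
   freq NumberHits;            /* weight each row by how often it was drawn */
   var x;
   output out=OutStats mean=mean;  /* approx sampling distribution */
run;
/* 4. Plot the bootstrap distribution of the mean */
title "sample size &n1";
proc sgplot data=OutStats;
   histogram mean;
run;
/* 5. Percentile confidence interval from the bootstrap distribution */
proc univariate data=OutStats noprint;
   var mean;
   output out=Pctl pctlpre =CI95_
          pctlpts =2.5  97.5   /* compute 95% bootstrap confidence interval */
          pctlname=Lower Upper;
run;
proc print data=Pctl noobs; run;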
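For readers without SAS, here is a rough NumPy port of the same steps. It is a sketch that assumes numpy is installed. The function name bootstrap_mean_ci and its defaults are hypothetical, chosen to mirror %let n1, NumSamples and seed=1, and the loop plays the role of editing n1 and rerunning. The numbers will not match the SAS output, because the random number generators differ and np.percentile's default interpolation is not the same as PROC UNIVARIATE's default percentile definition.

```python
import numpy as np

def bootstrap_mean_ci(n, num_samples=5000, seed=1, mu=35.0, sigma=10.0):
    """Simulate one sample of size n and return a 95% percentile bootstrap CI for its mean."""
    rng = np.random.default_rng(seed)
    # 1. Simulate the original sample (the DATA step): n draws from Normal(mu, sigma)
    x = rng.normal(mu, sigma, size=n)
    # 2. Draw num_samples resamples of size n with replacement
    #    (PROC SURVEYSELECT METHOD=URS SAMPRATE=1 REPS=...)
    idx = rng.integers(0, n, size=(num_samples, n))
    # 3. Mean of each resample (PROC MEANS BY SampleID)
    boot_means = x[idx].mean(axis=1)
    # 4. 2.5th and 97.5th percentiles (PROC UNIVARIATE PCTLPTS=2.5 97.5)
    lower, upper = np.percentile(boot_means, [2.5, 97.5])
    return lower, upper

for n in (5, 50, 500):
    lower, upper = bootstrap_mean_ci(n)
    print(f"n = {n:4d}: CI95 = ({lower:.4f}, {upper:.4f}), width = {upper - lower:.4f}")
```

Building the full index matrix stores n × num_samples integers, so for very large n it is cheaper to loop over resamples (or draw multinomial counts, which is what the NumberHits frequency variable represents) instead.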