Why does bootstrapping with a 50% sample size and no replacement always give appropriate confidence intervals?

I compared different ways to calculate confidence intervals. On the one hand I used the direct formula; on the other hand I used a percentile bootstrap method. (Calculate the statistic (e.g. the mean) on each of the subsamples and take the a/2 and 1-a/2 percentiles as the confidence interval.)

I noticed that applying the bootstrap method with replacement and a subsample size equal to the original sample size leads to (almost) the same confidence interval as the one I calculated by formula. The interesting part is that when I apply the bootstrap method without replacement and a subsample size equal to half the size of the original data, I also get (approximately) the same confidence interval.
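The comparison described above can be reproduced with a minimal sketch like the following. It is not the original poster's code: the function names, the seed, the smaller data set (10,000 standard normal values instead of 100,000, to keep the run time short) and the 2,000 resamples are all assumptions made for illustration, and only the mean is considered.

```python
import numpy as np

def percentile_ci(x, size, replace, n_resamples=2000, alpha=0.05, rng=None):
    """Percentile interval for the mean from repeated resamples of x."""
    rng = np.random.default_rng() if rng is None else rng
    means = np.array([
        rng.choice(x, size=size, replace=replace).mean()
        for _ in range(n_resamples)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def formula_ci(x, z=1.959964):
    """Normal-theory 95% interval for the mean: mean +/- z * s / sqrt(n)."""
    half = z * x.std(ddof=1) / np.sqrt(len(x))
    return x.mean() - half, x.mean() + half

rng = np.random.default_rng(0)      # assumed seed, for reproducibility only
x = rng.standard_normal(10_000)     # assumed example data

ci_formula = formula_ci(x)
ci_with = percentile_ci(x, len(x), replace=True, rng=rng)         # full size, with replacement
ci_half = percentile_ci(x, len(x) // 2, replace=False, rng=rng)   # half size, without replacement

print("formula:                  ", ci_formula)
print("with replacement, n:      ", ci_with)
print("without replacement, n/2: ", ci_half)
```

The three intervals should come out close to each other, with small differences caused by the finite number of resamples.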

Does anyone have an idea why both parametrizations of the bootstrap method lead to the same result? Or aren't the confidence intervals equal?

More details and thoughts:
For my research I took several input datasets (n=100,000) drawn from different distributions (standard normal, beta, Student's t, ...). Furthermore I calculated several statistics (mean, quantiles, regression coefficients, ...). For every combination of dataset and statistic I calculated the confidence intervals (number of subsamples = 1,000). For each combination the three approaches (formula, bootstrap with replacement, bootstrap without replacement) lead to very similar confidence intervals.
When I choose a bootstrap method without replacement and a subsample size equal to the original dataset, there is no variation between the subsamples and the "confidence interval" contains only one point. I get the maximum variation when each subsample contains only one data point. So the truth must be somewhere in the middle, but I wonder why it is always at 50% subsample size.
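The two extremes and the middle can be seen by sweeping the subsample fraction. The sketch below is only an illustration under assumed settings (2,000 standard normal values, 1,000 subsamples per fraction, the mean as the statistic); the helper name `width_without_replacement` is made up for this example.

```python
import numpy as np

rng = np.random.default_rng(1)     # assumed seed
x = rng.standard_normal(2_000)     # assumed example data

# Width of the formula-based 95% interval for the mean, used as the reference.
formula_width = 2 * 1.959964 * x.std(ddof=1) / np.sqrt(len(x))

def width_without_replacement(x, fraction, n_resamples=1000, alpha=0.05):
    """Width of the percentile interval when subsampling without replacement."""
    m = max(1, int(round(fraction * len(x))))
    means = [rng.choice(x, size=m, replace=False).mean() for _ in range(n_resamples)]
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return hi - lo

widths = {f: width_without_replacement(x, f) for f in (0.1, 0.25, 0.5, 0.75, 1.0)}
for f, w in widths.items():
    print(f"fraction {f:4.2f}: width / formula width = {w / formula_width:.2f}")
```

The ratio shrinks as the fraction grows, collapses to zero at a fraction of 1.0 (every subsample is the whole dataset), and is close to 1 at a fraction of 0.5.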

Thanks for your ideas.

I tried different kinds of distributions: log-normal, chi², alpha, beta, t, uniform and so on.
I didn't use smaller sample sizes because I think this would lead to more variation in the results, making them less comparable.


Less is more. Stay pure. Stay poor.
Well, there are reasons people use replacement and equal sample size. The justification is that you are creating another potential data realization from the super population, and the distribution of the resulting estimates serves as a proxy for the sampling distribution of the estimate in the underlying population. If you are not using replacement and are using a smaller sample, you are just creating a random sample of your sample, and doing this repeatedly just creates more, smaller samples of your original sample, which I am guessing is itself just random draws from the defined distribution. This process does not take into account other realizations of the true underlying population.

How are you confirming that the confidence intervals cover the true mean? Likely the reason you are not seeing issues is that 100K and 50K observations are a lot. If you were to repeat the process with a smaller original sample and smaller subsamples, I would imagine coverage would become questionable. Though I will note that my background is applied, not theoretical.


Well-Known Member
I wouldn't dismiss the jackknife too quickly. The delete-d jackknife method has a factor (n-d)/d, which is just 1 in your case. The only real difference is that you are taking a random sample of possible deletions instead of listing them all.
@hlsmith: In practice I have seen people use different parametrizations of bootstrapping (e.g. 80% sample size with replacement). Papers use "same sample size with replacement", but I haven't seen an argument for why this parametrization is used. I agree with you that "same sample size with replacement" produces many samples that are comparable to drawing those samples from the population.
But in my comparison these two parametrizations lead to similar confidence intervals, which is not true for other parametrizations.

For some functions (e.g. the mean) the confidence interval is well known on a theoretical basis. So I compared the theoretical approach with the bootstrap approach. (See picture.)
I only tried a few combinations of distributions and functions with small data sets, but the hypothesis that 50% without replacement works seems to hold there as well. The problem with small data sets is that the CI varies more and depends more on the seed I use.

@katxt: The jackknife has something to do with bootstrapping without replacement. Bootstrapping takes subsamples randomly, while the jackknife takes into account all possible subsamples (without replacement) of size n-d. But I don't see how this helps me on this point.



Less is more. Stay pure. Stay poor.
It kind of looks like you are running each scenario once. What happens when you run them 10k times each? Do they still have nominal coverage?
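A coverage check of this kind could be sketched as follows. Everything specific here is an assumption chosen for illustration: standard normal data with a known true mean of 0, a small sample size of 50, the half-sample percentile interval without replacement as the method under test, and 1,000 repetitions rather than 10k to keep the run time down. The function names are hypothetical.

```python
import numpy as np

def half_sample_ci(x, rng, n_resamples=500, alpha=0.05):
    """Percentile interval for the mean from half-size subsamples without replacement."""
    n = len(x)
    # Sorting random keys row by row gives a random permutation of the indices;
    # the first n // 2 columns are then distinct indices for one subsample.
    idx = np.argsort(rng.random((n_resamples, n)), axis=1)[:, : n // 2]
    means = x[idx].mean(axis=1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

def coverage(n=50, n_runs=1000, true_mean=0.0, seed=0):
    """Fraction of simulated datasets whose interval contains the true mean."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_runs):
        x = rng.normal(loc=true_mean, scale=1.0, size=n)
        lo, hi = half_sample_ci(x, rng)
        hits += int(lo <= true_mean <= hi)
    return hits / n_runs

print("estimated coverage of the nominal 95% interval:", coverage())
```

Changing `n`, the data-generating distribution, or the statistic inside `half_sample_ci` shows how far the estimated coverage drifts from the nominal level in other scenarios.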


Well-Known Member
@katxt: The jackknife has something to do with bootstrapping without replacement. Bootstrapping takes subsamples randomly, while the jackknife takes into account all possible subsamples (without replacement) of size n-d. But I don't see how this helps me on this point.
You're right - the jackknife should strictly use the means of all the subsamples. However, if there are too many to enumerate, then you will get a good approximation by taking a random sample of the subsample means. If, for instance, you have a sample of 50 and you consider all the subsamples of 25, then there are 50C25 subsamples or about 10^14 possible sub-means. This is clearly impossible to list but you can easily take a random sample of say 10000 of them and use those values instead.
In descriptive terms, using the delete-d jackknife subsamples, the SE of the mean = SD of the subsample means x sqrt((n-d)/d), where d is the number of values deleted, so each subsample has n-d values.
So, calculating the SE of the mean with a sample of 50, you can use the standard SD/sqrt(n), or the bootstrap on the 50 numbers with replacement and take the SD of the bootstrapped subsample means (the basic definition of SE), or resample 25 numbers out of the 50 without replacement (d = 25) and use the SE = SD of the subsample means x sqrt((50-25)/25) = SD of the subsample means x 1, and you should get the same number each time, more or less. This is what you have done. All is fine so far.
We can check the delete-d jackknife formula, SE of the mean = SD of the subsample means x sqrt((n-d)/d), using the same 50 numbers but taking samples of 40 without replacement, i.e. deleting d = 10. Sqrt((50-10)/10) = 2, so this time take the SD of the subsample means and multiply by 2. You should get the same as SD/sqrt(50) as before. Similarly, if you take samples of 10 without replacement (d = 40), sqrt((50-40)/40) = 1/2, so 1/2 of the SD of the subsample means should give the same SE as the other calculations.
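This check is easy to run numerically. The sketch below assumes 50 standard normal values as the example data and 20,000 random subsamples per setting in place of a full enumeration; the function name `delete_d_se` and the seed are made up for illustration. Here d is the number of deleted values, so each subsample keeps n-d values.

```python
import numpy as np

rng = np.random.default_rng(42)    # assumed seed
x = rng.standard_normal(50)        # assumed example data, n = 50
n = len(x)

# Reference value: the usual SE of the mean, SD / sqrt(n).
se_formula = x.std(ddof=1) / np.sqrt(n)

def delete_d_se(x, d, n_subsamples=20_000):
    """Delete-d jackknife SE of the mean from randomly chosen subsamples."""
    n = len(x)
    m = n - d  # number of values kept in each subsample
    # Row-wise random permutations of the indices; keep the first m columns.
    idx = np.argsort(rng.random((n_subsamples, n)), axis=1)[:, :m]
    sub_means = x[idx].mean(axis=1)
    return sub_means.std() * np.sqrt((n - d) / d)

print(f"SD / sqrt(n)             : {se_formula:.4f}")
for d in (10, 25, 40):
    print(f"delete-{d:2d} (keep {n - d:2d} values): {delete_d_se(x, d):.4f}")
```

All four numbers should agree up to the noise from using a random sample of subsamples instead of all of them.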
@katxt: Thanks for your explanation. I guess you were right: it has something to do with the jackknife method.
When d = n/2, sqrt((n-d)/d) = 1, and I agree that bootstrapping without replacement is a good approximation of the jackknife method.
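For the sample mean, the factor can be checked directly. The following is a sketch that relies on the standard finite-population variance of a mean drawn without replacement; the notation is introduced only for this note: m = n - d is the subsample size, \bar{x}_m is the mean of one subsample, and s^2 is the sample variance with n - 1 in the denominator.

```latex
\operatorname{Var}(\bar{x}_m)
  = \frac{s^2}{m}\left(1 - \frac{m}{n}\right)
  = \frac{s^2}{n}\cdot\frac{n-m}{m}
  = \frac{s^2}{n}\cdot\frac{d}{n-d}
\quad\Longrightarrow\quad
\sqrt{\frac{n-d}{d}}\;\operatorname{SD}(\bar{x}_m) = \frac{s}{\sqrt{n}}
```

With d = n/2 the correction factor is exactly 1, so the spread of the half-sample means already equals the usual SE of the mean. This argument covers the mean only; for other statistics such as quantiles or regression coefficients it is at best an approximation.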