Hello,
I compared different ways to calculate confidence intervals. On the one hand I used the direct formula, on the other hand a percentile bootstrap method: calculate the statistic (e.g. the mean) on a number of subsamples and take the α/2 and 1 − α/2 percentiles of those values as the confidence interval.
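To make that concrete, here is a minimal sketch of what I mean (Python/NumPy; the function name and defaults are just illustrative, not my exact code):

```python
import numpy as np

def percentile_bootstrap_ci(data, stat=np.mean, n_subsamples=1000, alpha=0.05,
                            replace=True, subsample_size=None, rng=None):
    """Percentile bootstrap: evaluate `stat` on repeated subsamples and take
    the alpha/2 and 1 - alpha/2 percentiles of those values as the interval."""
    rng = np.random.default_rng(rng)
    size = len(data) if subsample_size is None else subsample_size
    values = [stat(rng.choice(data, size=size, replace=replace))
              for _ in range(n_subsamples)]
    return np.percentile(values, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```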
I noticed that the bootstrap with replacement and a subsample size equal to the original sample size leads to (almost) the same confidence interval as the one I calculated by formula. The interesting part is that when I resample without replacement with a subsample size equal to half the size of the original data, I also get (approximately) the same confidence interval.
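For the mean, the comparison looks roughly like this (reusing percentile_bootstrap_ci from the sketch above; the standard-normal input is just an example):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # one example input dataset
n, alpha = len(x), 0.05

# (1) direct formula for the mean: x_bar +/- z_{1-alpha/2} * s / sqrt(n)
half_width = norm.ppf(1 - alpha / 2) * x.std(ddof=1) / np.sqrt(n)
ci_formula = (x.mean() - half_width, x.mean() + half_width)

# (2) bootstrap with replacement, subsample size = n
ci_with = percentile_bootstrap_ci(x, replace=True, rng=rng)

# (3) subsampling without replacement, subsample size = n / 2
ci_without = percentile_bootstrap_ci(x, replace=False, subsample_size=n // 2, rng=rng)

print(ci_formula, ci_with, ci_without)   # all three come out very close
```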
Does anyone have an idea why both parametrizations of the bootstrap method lead to the same result? Or are the confidence intervals not actually equal?
More details and thoughts:
For my research I took several input datasets (n = 100,000) drawn from different distributions (standard normal, beta, Student's t, ...). Furthermore, I calculated several statistics (mean, quantiles, regression coefficients, ...). For every combination of dataset and statistic I calculated the confidence intervals (number of subsamples = 1,000). For each combination the three approaches (formula, bootstrap with replacement, subsampling without replacement) lead to very similar confidence intervals.
When I choose the method without replacement and a subsample size equal to the original dataset, there is no variation between the subsamples and the "confidence interval" collapses to a single point. I get the maximum variation when each subsample contains only one data point. So the right size must lie somewhere in between, but I wonder why it is always at 50% of the original sample size.
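This is the kind of sweep I mean, as a rough sketch (assuming the mean as the statistic and a smaller n than in my actual experiments so it runs quickly):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
n, B, alpha = len(x), 1000, 0.05

# width of the formula interval for the mean, for reference
formula_width = 2 * norm.ppf(1 - alpha / 2) * x.std(ddof=1) / np.sqrt(n)

# width of the without-replacement percentile interval as the subsample fraction grows
for frac in (0.01, 0.10, 0.25, 0.50, 0.75, 0.99):
    m = int(frac * n)
    means = [rng.choice(x, size=m, replace=False).mean() for _ in range(B)]
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    print(f"fraction {frac:.2f}: width {hi - lo:.4f}   (formula: {formula_width:.4f})")
```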
Thanks for your ideas.