Bootstrap with binary data

Hi All,

I am kind of stuck on where to start with bootstrapping binary data.

I have a large amount of data describing an activity within each area. Each record shows whether the activity succeeded or failed, coded as 1 for success and 0 for failure.

Example: in area A, we have 12 samples, of which 9 are successes, so the success rate is 75%.

Because quite a lot of the areas have a small sample size, we want to use the bootstrap on that data. I have looked at many references on how to do it. Some of them show that we can use the rbinom function in R.

Using the previous example, I ran this in R:

rbinom(50, 12, 0.75)
The idea is to run 50 repetitions, each consisting of 12 trials with a 75% probability of success.

Then we sum the results of the function and simply divide by 12 x 50 = 600.
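For reference, the steps just described can be written out in full. This is only a sketch of what they compute; the seed and the variable names are illustrative assumptions and not part of the original post. Since every draw is generated from the assumed probability 0.75, the pooled proportion can be expected to land close to 0.75 again.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

n_rep   <- 50      # number of repetitions
n_trial <- 12      # trials per repetition (the sample size in area A)
p_hat   <- 9 / 12  # observed success rate, 0.75

# Each element is the number of successes in one simulated set of 12 trials
sim_successes <- rbinom(n_rep, n_trial, p_hat)

# Pooled proportion over all 50 * 12 = 600 simulated trials
pooled <- sum(sim_successes) / (n_rep * n_trial)
print(pooled)
```

The simulated trials are generated from the estimate itself, so they carry no information beyond the original 12 observations.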

Honestly, I am wondering whether this is the right way to do it. Any suggestions?



Well-Known Member
Can you explain exactly what you want to do with your data? It seems that you are doing a simulation, rather than, say, a bootstrap test.
The bootstrap idea is to jumble the data that you have (resample it with replacement), do the same analysis many times, and collate the results. Usually a bootstrap method is used to find a confidence interval, or perhaps a p value, in a situation where the usual statistical methods aren't applicable for some reason.
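As a rough illustration of that idea, here is a minimal sketch of a nonparametric bootstrap for the area A example from the question (12 observations, 9 successes). The number of resamples, the seed, and the choice of a percentile interval are assumptions made for the example, not something stated in the thread.

```r
set.seed(1)  # arbitrary seed for repeatability

# Area A from the question: 9 successes (1) and 3 failures (0)
x <- c(rep(1, 9), rep(0, 3))

n_boot <- 2000  # number of bootstrap resamples (an assumed choice)

# Resample the 12 observations with replacement and
# recompute the success proportion each time
boot_props <- replicate(
  n_boot,
  mean(sample(x, size = length(x), replace = TRUE))
)

# Percentile interval: the middle 95% of the bootstrap proportions
ci <- quantile(boot_props, probs = c(0.025, 0.975))
print(ci)
```

The interval comes out wide because there are only 12 observations. The bootstrap describes that uncertainty; it does not reduce it.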
Hi @katxt, the idea is to make small-sample data "readable" by bootstrapping it.

In Area Z, we have 3 samples, 2 of which are successes, which gives about 67%. But because the sample is so small, we cannot use this data. So the idea is to bootstrap this kind of data with 50 repetitions. That makes the 3-sample data "likely" to become 3*50 = 150 samples. Of course the result will differ from 67%; it depends on the random binomial draws, I guess.

Is this method acceptable?


Well-Known Member
Unfortunately, you cannot get bigger samples just by repeating the data you have. If you could, nobody would do large experiments. Virtually all statistical formulas involve the sample size, and that is the number of independent data points you have.
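To make the role of the sample size concrete, the sketch below compares exact binomial confidence intervals for the same observed success rate of 2/3 at two sample sizes. The 100-out-of-150 case is purely hypothetical: it stands for 150 genuinely independent observations, which Area Z does not have. The helper name `width` is also just for illustration.

```r
# Same observed success rate (2/3), very different numbers of
# independent observations
small <- binom.test(x = 2, n = 3)      # Area Z as actually observed
large <- binom.test(x = 100, n = 150)  # hypothetical independent sample

# Width of the exact 95% confidence interval
width <- function(bt) diff(bt$conf.int)

print(small$conf.int)
print(large$conf.int)
print(c(small = width(small), large = width(large)))
```

The interval for 3 observations covers most of the range from 0 to 1, while the one for 150 independent observations is much narrower. Simulating 150 values from the 3 observed ones does not justify using the narrower interval.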