I work as an engineer at a large manufacturer, and we typically use high-stress tests on new products to discover failure modes. When a test produces failures, we attempt to fix the problem and then want to re-run the test to see whether the fix worked. The most obvious question at that point, and the one I get asked a lot, is "How many more units do I need to test to be sure the problem is fixed?" or, as I like to phrase it, "What sample size does the next test need in order to show that the new sample is significantly different from the first, assuming 0 failures in the new sample?" This is the bottom-line question I am trying to answer.

I have read two textbooks on statistics for engineers, and the closest thing I can find is to use a 2x2 contingency table and "reverse-engineer" the chi-squared test so that it yields a sample size (which enters the table as one of the cell counts). However, after more research on the internet, this does not seem accurate, since the table would contain cell counts below 5 (in fact, a count of 0, since we assume 0 failures in the second group). I have also researched Fisher's exact test extensively on the web, but I am hesitant to use it because it assumes that both the row and column totals are fixed. I cannot quite figure out whether that assumption is valid for my situation.
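To make concrete what I mean by "reverse-engineering" a test for a sample size, here is a minimal sketch that does it with a one-sided Fisher's exact test instead of chi-squared. It is only an illustration under assumptions: the function names and the example numbers (10 units in the first test, 5 failures) are made up and are not our real data, the test is one-sided (the fixed units fail less often), and it conditions on fixed row and column totals, which is exactly the assumption I am unsure about. With 0 failures in the second group, the one-sided p-value reduces to a single hypergeometric probability, so only the standard library is needed.

```python
from math import comb


def fisher_p_zero_failures(n1: int, f1: int, n2: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table
    [[f1, n1 - f1], [0, n2]] (rows: first test, retest;
    columns: failures, passes).

    Conditional on the margins, this is the hypergeometric probability
    that all f1 failures land in the first group of n1 units.
    """
    if not (0 < f1 <= n1) or n2 < 1:
        raise ValueError("need 0 < f1 <= n1 and n2 >= 1")
    return comb(n1, f1) / comb(n1 + n2, f1)


def min_retest_size(n1: int, f1: int, alpha: float = 0.05, max_n2: int = 10_000):
    """Smallest retest size n2 with 0 failures that gives p <= alpha.

    Returns None if no n2 up to max_n2 (an arbitrary search cap) works.
    """
    for n2 in range(1, max_n2 + 1):
        if fisher_p_zero_failures(n1, f1, n2) <= alpha:
            return n2
    return None


# Hypothetical example: 5 failures out of 10 units in the first test.
print(min_retest_size(10, 5))  # 7
```

In this made-up example, 6 failure-free retest units give p = 252/4368 (about 0.058) and 7 give p = 252/6188 (about 0.041), so 7 is the smallest size that crosses the 0.05 threshold. Whether this conditional test is the right model for my situation is the part I cannot settle.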

Overall, it seems to me that there is not much information on this, and what information there is appears to be highly debated among statisticians.

I guess I am looking for a current view on this topic, or to hear from anyone who has run into this situation before. Any help or insight is greatly appreciated.