I work as an engineer at a large manufacturer, and we typically run high-stress tests on new products to discover failure modes. When a test produces failures, we attempt to fix the problem and then want to re-run the test to see whether we were successful. The most obvious question, and the one I get a lot, is "How many more units do I need to test to be sure the problem is fixed?" Or, the way I prefer to phrase it: "What sample size does the next test need in order to show the new sample is significantly different from the first, assuming 0 failures?" That is the bottom-line question I am trying to answer.

I have read two statistics textbooks for engineers, and the closest thing I can find is a 2x2 contingency table, "reverse-engineering" the chi-squared test to solve for a sample size (one of the cell counts in the table). However, after more research on the internet, this does not seem valid, since we are dealing with cell counts less than 5 (in fact, a count of 0, since we assume 0 failures in the second group). I have also researched Fisher's exact test extensively on the web, but I am wary of using it because it assumes the row and column totals are fixed, and I cannot quite figure out whether that is valid for my situation.
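To make the calculation I have in mind concrete, here is a sketch using Fisher's one-sided exact test (which does condition on fixed margins, the very assumption I am unsure about). The numbers are hypothetical. When the second group has 0 failures, the one-sided hypergeometric p-value for the table [[f1, n1-f1], [0, n2]] reduces to C(n1, f1) / C(n1+n2, f1), so one can search for the smallest n2 that drops it below alpha:

```python
from math import comb

def fisher_zero_failure_n(f1, n1, alpha=0.05):
    """Smallest second-sample size n2 such that a one-sided Fisher
    exact test on the table [[f1, n1-f1], [0, n2]] is significant
    at level alpha. With 0 failures in row 2, the tail probability
    collapses to a single hypergeometric term:
        p = C(n1, f1) / C(n1 + n2, f1).
    """
    num = comb(n1, f1)
    n2 = 1
    while num / comb(n1 + n2, f1) >= alpha:
        n2 += 1
    return n2

# Hypothetical first test: 4 failures out of 20 units.
print(fisher_zero_failure_n(4, 20))  # -> 21
```

So under these (hypothetical) numbers, 21 failure-free units in the re-test would give a one-sided p below 0.05. Whether conditioning on the margins is appropriate here is exactly my question.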

Overall, it seems to me there is not much information on this, and what information there is seems to be debated among statisticians.

I guess I am looking for a current view on this topic, or to hear from anyone who has run into this situation before. Any help or insight is greatly appreciated.

"When an analysis model includes an interaction effect between two or more quantitative variables, it is important to center predictor variables at their means (i.e., subtract the mean from each score) prior to analyzing the data..."

That is the first time I had read that advice on interactions (they are talking about the analysis model, so it would apply even if you had no missing data; that is, to interactions generally).

I wondered what others thought of this.

I am also confused about what the author means by this:

"However, centering becomes difficult when one of the variables in the product term has missing data. One option is to center the variables prior to imputation, compute the necessary product term, and fill in the missing variables (including the product term) on their centered metrics. This approach requires estimates of the variable means, so maximum likelihood estimates (e.g., from an initial EM analysis) are a logical choice."

I assume the product term is the interaction term, but I am not at all sure how you are centering here or how you would use maximum likelihood.
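To check my reading of the quoted procedure, here is a sketch of what I think is meant, with toy data. I use `nanmean` as a crude stand-in for the ML/EM mean estimates the author mentions; `x` and `z` are hypothetical predictors:

```python
import numpy as np

# Toy predictors with missing values (np.nan marks the missingness).
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
z = np.array([2.0, np.nan, 3.0, 1.0, 4.0])

# Step 1: estimate each variable's mean. The author suggests maximum
# likelihood estimates (e.g., from an initial EM run); nanmean is just
# a simple stand-in here for illustration.
x_c = x - np.nanmean(x)
z_c = z - np.nanmean(z)

# Step 2: compute the product (interaction) term on the centered metric.
# It is missing wherever either component is missing.
xz = x_c * z_c

# Step 3: impute x_c, z_c, and xz as separate variables on these
# centered metrics, then fit the interaction model to the imputed data.
print(xz)
```

Is that the intended order of operations, i.e., center first, form the product, and only then impute, treating the product term as just another variable to fill in?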

It is roughly the budgets of ~1,000 entities over a 15-year time frame. Some of those financial entities experienced sharp budgetary spikes, where they saw a large increase in cash, while others saw budgetary falls.

The goal is to see how these budgetary windfalls and shortfalls affected a certain area of spending. These entities must spend a certain minimum amount on one area of their budget but are free to increase spending at their discretion.

My research question is whether windfalls translated into increased spending in this area of their budgets.

There are two complications. First, not all entities had budget windfalls and shortfalls, so is there a way to statistically define the degree of a windfall or shortfall and relate it to spending in this area? Second, the entities that did have windfalls and shortfalls had them at different times within the 15-year period, though they all share the 2008-2009 crunch.
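As a starting point, one simple (and admittedly ad hoc) way to define the "degree" of a windfall or shortfall would be the year-over-year percentage change in an entity's budget past some threshold. The threshold and the numbers below are hypothetical, just to illustrate the idea:

```python
def pct_changes(budget):
    """Year-over-year percentage change of a budget series."""
    return [(b2 - b1) / b1 for b1, b2 in zip(budget, budget[1:])]

def flag_shocks(budget, threshold=0.15):
    """Return (year_index, pct_change) pairs where the absolute
    year-over-year change exceeds the threshold. Positive changes
    are windfalls, negative ones shortfalls; the magnitude gives
    the 'degree' of the shock."""
    return [(i + 1, c) for i, c in enumerate(pct_changes(budget))
            if abs(c) > threshold]

# Hypothetical entity observed over 6 years.
budget = [100, 104, 150, 148, 95, 99]
print(flag_shocks(budget))  # windfall in year 2, shortfall in year 4
```

Each entity would then get its own set of shock years and magnitudes, which could enter a panel model as continuous treatment intensities at entity-specific times. Is something like this defensible, or is there a more standard definition?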

My initial thought is to use spline regressions, but I am unsure whether this is the right route to pursue or whether there is a simpler approach.