probability of failure in a multi-component system

I want to calculate the failure rate of a system that has multiple independent points of failure. If any one of these intermediate points fails, then the entire system fails. I can determine the failure rates of the intermediate steps but am not sure how to extend that to a statement about the system as a whole.

For example, user A wants to send an email to user B. The outcome that I want to describe is that user B either receives the email or does not. The goal is to define a rate of success that describes how likely it is that user B will receive an email from user A. Several things can go wrong along the way: one or more mail servers failing, routers not responding, power outages, floods, etc. Each of these intermediate points has a known failure rate and, as previously mentioned, it only takes one of them failing to prevent user B from receiving the email.

I was taking a Monte Carlo approach but was thinking that I may be missing a simpler and more fundamental approach to determining the likelihood of user B receiving that email. As you can tell, I’m not strong in stats so your help will be appreciated.



TS Contributor
This is an interesting question, but I think you have a problem.

you know a bunch of probs like
P( step i fails ), for i = 1,2,...

you want to know
P( no mail for you ) = P( union( step i fails ) ).

but to calc this you need to know the intersections too.

Problem = no experimental evidence on intersections.

or is there?
If the failure points really are independent then you do know the probabilities of intersections. If A and B are independent, then P(A^B)=P(A)*P(B).

So what you can do is draw a diagram of the system, which may contain components in series and in parallel, depending on how it's set up. I'd then convert the probabilities of failure to probabilities of success for each component. Then the probability that a series of independent components A1, A2, A3 is successful is just P(A1)*P(A2)*P(A3). For components in parallel, the message is delivered successfully if any one path works. So for parallel components B1, B2, and B3 you want to find P(B1 U B2 U B3). Recall that P(X U Y) is just P(X)+P(Y)-P(X intersect Y). With three or more independent paths it's easier to work from the complement: P(B1 U B2 U B3) = 1 - (1-P(B1))*(1-P(B2))*(1-P(B3)).
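The series and parallel rules can be sketched in a few lines of Python. This is only an illustration under the independence assumption: the function names `series_success` and `parallel_success` and all the numbers are made up for the example. The parallel case uses the complement (the block fails only if every path fails), which for two paths gives the same value as P(X)+P(Y)-P(X)*P(Y).

```python
from math import prod


def series_success(success_probs):
    """Series block: every component must work, so multiply the successes."""
    return prod(success_probs)


def parallel_success(success_probs):
    """Parallel block: works unless every path fails (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in success_probs)


# Hypothetical success probabilities, for illustration only.
a = [0.99, 0.98, 0.95]  # A1, A2, A3 in series
b = [0.90, 0.80, 0.70]  # B1, B2, B3 in parallel

print("series  :", series_success(a))
print("parallel:", parallel_success(b))
```

Note that adding components to a series block can only lower its success probability, while adding redundant paths to a parallel block can only raise it.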

The approach you'll want to take is to reduce all your series components to single probabilities first, so that you're left with just a bunch of parallel components, and then calculate that union. When you're done, you'll have the probability that the system works, so subtract that from 1 to find the overall probability of failure.
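For the email case as described in the question, where any single failure stops delivery, everything is in series, so the whole calculation collapses to one product. Here is a minimal sketch, assuming independent failures. The component names and failure rates below are invented placeholders, not real measurements, and `system_failure` is just a name chosen for this example.

```python
from math import prod


def system_failure(failure_probs):
    """P(at least one step fails) for independent steps in series."""
    successes = [1.0 - f for f in failure_probs]  # convert failure -> success
    return 1.0 - prod(successes)                  # 1 - P(every step works)


# Hypothetical failure rates for each step between user A and user B.
failure_rates = {
    "sender mail server": 0.01,
    "router": 0.005,
    "recipient mail server": 0.02,
    "power": 0.001,
}

p_fail = system_failure(failure_rates.values())
print(f"P(email not delivered) = {p_fail:.4f}")
print(f"P(email delivered)     = {1.0 - p_fail:.4f}")
```

When the individual rates are small, the result sits just below the plain sum of the failure rates, which makes for a quick sanity check.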

CRUCIALLY IMPORTANT POINT: This only works if the failure of the individual components is really independent. You'll want to think very carefully about this assumption in all cases before proceeding. If you can't defend it, you'll need to be able to say something about conditional probabilities. This may be hard to do, and there's a temptation to make the independence assumption because it seems like the only way to solve the problem. Please understand that if the assumption is inappropriate, you're not actually solving the problem--you're getting an answer, but it's wrong.
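To see how much the independence assumption can matter, here is a toy calculation with invented numbers. Suppose two redundant mail servers each fail on their own with probability `q`, but they also share a power supply that fails with probability `c` and takes both down. The setup, the variable names, and the values are all assumptions made up for this illustration.

```python
# Hypothetical numbers, for illustration only.
c = 0.01  # shared power supply fails (takes both servers down)
q = 0.05  # a server fails on its own, given that power is fine

# Marginal failure rate of one server: this is what you'd measure per server.
m = c + (1.0 - c) * q

# Wrong: treat the two servers as independent and multiply the marginals.
naive_both_fail = m * m

# Right: condition on the shared cause.
# Power down -> both fail for sure; power up -> each fails independently.
true_both_fail = c * 1.0 + (1.0 - c) * q * q

print(f"naive P(both fail) = {naive_both_fail:.6f}")
print(f"true  P(both fail) = {true_both_fail:.6f}")
```

In this setup the naive product always understates the chance that both servers are down, because the shared cause makes their failures positively correlated.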