How do risk models deal with state changes over time?

Let's say you are trying to predict whether a machine is going to explode (0) or successfully complete its job (1), and you want to determine the impact of certain states on that outcome.

So say the machine has states A, B, C, and D. The machine starts in state A. Then A->B, then B->C *or* B->D, then C->D. After D, the machine *may* successfully complete its job. The machine can explode in any of the states... but the likelihood of that happening is measurably different from state to state.
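To make the setup concrete, here's a minimal simulation of that state machine. Every probability in it (explosion risk per state, chance of staying put, chance of finishing from D) is a made-up placeholder purely for illustration, not anything estimated from real logs:

```python
import random

# Assumed per-state explosion risk -- placeholder values only.
P_EXPLODE = {"A": 0.05, "B": 0.10, "C": 0.20, "D": 0.02}
# Allowed transitions as described: A->B, B->C or B->D, C->D.
NEXT = {"A": ["B"], "B": ["C", "D"], "C": ["D"]}
P_STAY = 0.5    # assumed chance of lingering in the current state each step
P_FINISH = 0.5  # assumed chance of completing the job once in state D

def simulate(rng=random):
    """Run one machine through the state machine; return (state path, outcome)."""
    state, path = "A", ["A"]
    while True:
        if rng.random() < P_EXPLODE[state]:
            return path, "explode"
        if state == "D":
            if rng.random() < P_FINISH:
                return path, "success"
        elif rng.random() >= P_STAY:
            state = rng.choice(NEXT[state])
        path.append(state)  # repeats produce logs like A->A->B->B->...
```

Because each state can repeat, simulated paths look like the logged patterns below, and because explosion risk differs by state, outcomes depend on how far a machine gets.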

Say there are other "typical" variables at play, such as "job size" and "number of components", and you want to incorporate these into the prediction model as well.

Let's say you have a thousand machines that you have tracked in a log, and you see patterns like A->A->A->B->B->C->explode or A->A->A->B->B->C->D->D->D->success.
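A quick sketch of turning those log lines into labeled records (the log format is just the arrow-separated pattern above, with the outcome as the last token):

```python
# Hypothetical log lines: the state sequence ending in the observed outcome.
logs = [
    "A->A->A->B->B->C->explode",
    "A->A->A->B->B->C->D->D->D->success",
]

def parse(line):
    """Split a log line into its state path and a 0/1 completion label."""
    *states, outcome = line.split("->")
    return {"states": states, "completed": int(outcome == "success")}

records = [parse(line) for line in logs]
# records[0] -> {"states": ["A", "A", "A", "B", "B", "C"], "completed": 0}
```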

So for the model, you want to predict the likelihood that a machine will successfully complete its job given its current state, job size, and number of components.

So a couple of options I can think of:

1) Build a model for each state:
Model 1: Get all the machines that have been in State A and do a logistic regression.
Model 2: Get all the machines that have been in State B and do a logistic regression.
Model 3: Get all the machines that have been in State C and do a logistic regression.
Model 4: Get all the machines that have been in State D and do a logistic regression.

Upside: The data is nicely cross-sectional... no weird time-series issues.
Downside: Managing more and more models becomes really arduous as the number of states grows.
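Option 1 might look like the sketch below: a dict of per-state logistic regressions on the "typical" covariates. This assumes scikit-learn, and the training rows (one per machine-state visit, labeled with the machine's final outcome) are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical rows: (state, job_size, n_components, completed).
rows = [
    ("A", 10, 3, 1), ("A", 12, 5, 0), ("A", 8, 2, 1), ("A", 15, 6, 0),
    ("B", 10, 3, 1), ("B", 12, 5, 0), ("B", 9, 4, 1), ("B", 14, 6, 0),
]

# Option 1: one logistic regression per state, keyed by state name.
models = {}
for state in {r[0] for r in rows}:
    subset = [r for r in rows if r[0] == state]
    X = np.array([[job, comp] for _, job, comp, _ in subset])
    y = np.array([done for *_, done in subset])
    models[state] = LogisticRegression().fit(X, y)

# Predict P(success) for a machine currently in state B:
p = models["B"].predict_proba([[11, 4]])[0, 1]
```

The dict keeps the bookkeeping in one place, but you can see the downside: every new state means another subset, another fit, and another model to validate.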

2) Build one model that incorporates the states as binary indicators: if a machine has passed through a state, set that state's indicator to 1.

Upside: One model.
Downside: I think there are issues here with the training set. If you train the model using the data the machines *ended up with*, then you can only regress on the final end point for each machine... but in reality, we are estimating from "mid-life" snapshots at individual states. Alternatively, if you try to sample "mid-life" data points from the logs, you bias the model either way: (a) if you randomly sample from the log, you oversample machines that appear more often in it, or (b) if you sample one data point per machine, you overrepresent the machines that exploded (by the nature of the state transitions, machines explode more quickly than they successfully complete).
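For reference, option 2 could be sketched like this, again assuming scikit-learn, with invented feature rows. Each row is one sampled log entry with indicators for the states passed through so far, plus the covariates; how those rows are sampled is exactly the bias problem described above (one partial mitigation, if you go the sampling route, is passing per-row weights via `fit`'s `sample_weight` argument):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical "mid-life" rows. Columns: has_A, has_B, has_C, has_D
# (states passed through so far), job_size, n_components.
X = np.array([
    [1, 0, 0, 0, 10, 3],
    [1, 1, 0, 0, 12, 5],
    [1, 1, 1, 0,  9, 4],
    [1, 1, 1, 1, 14, 6],
    [1, 1, 0, 1, 11, 2],
    [1, 0, 0, 0, 15, 7],
])
y = np.array([1, 0, 1, 1, 1, 0])  # final outcome of each machine

model = LogisticRegression().fit(X, y)

# A machine that has passed through A, B, and C, with job size 11 and 4 components:
p_success = model.predict_proba([[1, 1, 1, 0, 11, 4]])[0, 1]
```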

Is this ringing any bells for anyone? Not sure where to go from here...