Assume that failures occur only in the working state. (Without this assumption the model becomes somewhat more complicated; which version to use depends on which fits your actual situation better.)

In the following I assume all systems work independently on different tasks (so that the start time, end time and failure time are all different for each system). If not, the question you ask may not make sense.

Here is one possible model:

Assume each system starts out serviceable and idle. It waits an exponentially distributed time with mean 23 hours until it receives a task requested by users, and then starts to work. If there is no failure, it works for an exponentially distributed time with mean 1 hour to finish the task, and is then switched off and becomes idle again. At the moment it starts to work, we also draw an exponentially distributed failure time with mean 900/24 = 37.5 hours (re-scaled because failures occur only while working, which is about 1 hour in every 24). If this failure time is less than the working time, a failure occurs and the system goes to repair for a deterministic 21*24 = 504 hours, returning to the idle state after repair. The cycle then repeats.

If this description is correct, then we can calculate the long-run fraction of time a particular system spends in the repair loop. In fact, since the systems are independent and have identically distributed random times, the number of systems in the repair loop (out of $n$ systems) follows a binomial distribution, so we just need to multiply that probability by $n$ to obtain the expectation.
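
As a rough sketch of that calculation under the model above, one can use a renewal-reward argument: the long-run fraction of time in repair is the expected repair time per cycle divided by the expected cycle length. The symbols here are my own notation, not part of the question: $W$ is the working time (mean 1), $F$ the failure time (mean 37.5), $p$ the probability that a task ends in failure, and $n$ the number of systems.

$$
p = P(F < W) = \frac{1/37.5}{1 + 1/37.5} = \frac{1}{38.5},
\qquad
E[\min(W, F)] = \frac{1}{1 + 1/37.5} = \frac{37.5}{38.5}
$$

$$
\text{fraction in repair}
= \frac{504\,p}{23 + E[\min(W, F)] + 504\,p}
= \frac{504}{23 \cdot 38.5 + 37.5 + 504}
= \frac{504}{1427} \approx 0.353
$$

So, if the model assumptions hold, the expected number of systems in repair at a given time in the long run would be about $0.353\,n$.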

If the model looks formidable to you, you can always use simulation first to estimate the probability and get a rough idea.
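
A minimal simulation sketch of the model described above, in Python. It simulates many idle/work/repair cycles for a single system and reports the fraction of total time spent in repair. The function name `repair_fraction`, the constants, the number of cycles and the seed are all illustrative choices of mine, and the code assumes exactly the distributions stated above (exponential idle, working and failure times, deterministic repair).

```python
import random

# Model parameters in hours, taken from the description above.
MEAN_IDLE = 23.0          # mean waiting time for a task
MEAN_WORK = 1.0           # mean time to finish a task
MEAN_FAIL = 900.0 / 24.0  # mean failure time while working (37.5)
REPAIR = 21.0 * 24.0      # deterministic repair time (504)


def repair_fraction(n_cycles=200_000, seed=0):
    """Estimate the long-run fraction of time one system spends in repair.

    Hypothetical helper: simulates n_cycles idle/work(/repair) cycles and
    returns (total repair time) / (total elapsed time).
    """
    rng = random.Random(seed)
    total_time = 0.0
    repair_time = 0.0
    for _ in range(n_cycles):
        # random.expovariate takes the rate, i.e. 1 / mean.
        idle = rng.expovariate(1.0 / MEAN_IDLE)
        work = rng.expovariate(1.0 / MEAN_WORK)
        fail = rng.expovariate(1.0 / MEAN_FAIL)
        if fail < work:
            # Failure happens before the task finishes: work stops at the
            # failure time and the system goes to repair.
            total_time += idle + fail + REPAIR
            repair_time += REPAIR
        else:
            # Task finishes normally and the system becomes idle again.
            total_time += idle + work
    return repair_time / total_time


if __name__ == "__main__":
    frac = repair_fraction()
    n_systems = 10  # illustrative number of systems, not from the question
    print("estimated fraction of time in repair:", frac)
    print("expected systems in repair out of", n_systems, ":", n_systems * frac)
```

For comparison, a renewal-reward calculation for this model gives 504/1427, roughly 0.353, so the estimate should land close to that if the assumptions are as described. Changing the repair time to a random one, or allowing failures while idle, only requires editing the body of the loop.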