Dear all,

I have never asked for help on a forum before, since I feel that the answer to a problem can often be found by reading books and searching the literature for long enough. But I would be very grateful if some of you could help me with the problem below.

The issue at hand is the following:

An organization wishes to analyze repair costs for different machines. At the time of the analysis, no data at all is available about the parts it will have to repair (age, hours of use, …), so a regression seems impossible to me, since the characteristics of the part to be repaired are not known.

However, the history of repair costs for such parts is known: both the average cost of repairing this kind of part and the individual expense for each past repair.

For example:

The number of past repairs for a part varies from zero to more than 100,000; in the table I arbitrarily show 4. It is also known how much of the total cost was spent on labor, on material, etc.

How could one calculate the probability that the repair costs exceed a certain value, e.g. for our example: “What is the probability that the repair costs more than 1000?” Would the best approach be to determine an empirical cumulative distribution function (ECDF) from the historical data and read the probability from it? This does not give me the results I need, because the ECDF is of limited use when only a few past values are available. In the example above, the ECDF would give the same probability for costs lower than 704 as for costs lower than 906, which does not make sense in my context.
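
To make the limitation concrete, here is a minimal sketch of the empirical exceedance probability, P(cost > c) = 1 - ECDF(c). The four cost values are hypothetical placeholders, not the figures from my table, and `exceedance_ecdf` is just an illustrative name.

```python
from bisect import bisect_right


def exceedance_ecdf(costs, threshold):
    """Empirical P(cost > threshold), i.e. 1 - ECDF(threshold)."""
    ordered = sorted(costs)
    # Number of past repairs that cost at most `threshold`.
    n_at_most = bisect_right(ordered, threshold)
    return 1.0 - n_at_most / len(ordered)


# Hypothetical past repair costs (placeholders, not real data).
past_costs = [450.0, 700.0, 910.0, 1300.0]

p_750 = exceedance_ecdf(past_costs, 750.0)
p_900 = exceedance_ecdf(past_costs, 900.0)
p_1000 = exceedance_ecdf(past_costs, 1000.0)
print(p_750, p_900, p_1000)
```

Because the ECDF is a step function that only jumps at observed values, any two thresholds that fall between the same pair of neighbouring observations (750 and 900 here) get exactly the same probability. With only four observations the steps are very wide.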

Besides fitting a distribution to the past values, would there be another appropriate solution for the issue I described? At roughly how many data points would it make sense to switch from a parametric to a non-parametric estimate of the CDF? And what is the ideal technique for fitting a distribution to cost data?
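
For reference, this is roughly what I mean by the parametric route. It is only a sketch under a loud assumption: I picked a log-normal distribution purely for illustration (costs are positive and usually right-skewed), and I am not claiming it is the right model. The data values and the function names `fit_lognormal` and `exceedance_lognormal` are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def fit_lognormal(costs):
    """Maximum-likelihood fit of a log-normal: mean and std of log(cost).

    Assumes every cost is strictly positive and the costs are not all equal.
    """
    logs = [math.log(c) for c in costs]
    mu = fmean(logs)
    # The ML estimate divides by n (not n - 1).
    sigma = math.sqrt(fmean([(v - mu) ** 2 for v in logs]))
    return mu, sigma


def exceedance_lognormal(mu, sigma, threshold):
    """P(cost > threshold) under the fitted log-normal."""
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(threshold))


# Hypothetical past repair costs (placeholders, not real data).
past_costs = [450.0, 700.0, 910.0, 1300.0]

mu, sigma = fit_lognormal(past_costs)
p_750 = exceedance_lognormal(mu, sigma, 750.0)
p_900 = exceedance_lognormal(mu, sigma, 900.0)
p_1000 = exceedance_lognormal(mu, sigma, 1000.0)
print(p_750, p_900, p_1000)
```

Unlike the ECDF, the fitted curve is smooth, so different thresholds get different probabilities. The price is that the result depends on the distributional assumption, and with very few observations the estimated parameters are themselves quite uncertain.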

Any help is appreciated,

Rashid