# Thread: Obtaining Probability for costs above a certain threshold

1. ## Obtaining Probability for costs above a certain threshold

Dear all,

I have never asked for help on a forum before, since I feel that the answer to a problem can often be found if one reads books and searches publications for long enough. But I would be very grateful if some of you could help with the following problem:

The issue at hand is the following:
An organization wishes to analyze repair costs for different machines. At the time of the analysis there is literally no data available about the parts the organization will have to repair (age, hours of use, …), so a regression seems impossible to me, since the characteristics of the part to be repaired are not known.

However, the history of repair costs for such parts is known. So the average cost to repair this kind of part is known, as well as the individual expense for each repair.

For example:
Code:
```
Name of part    Repair number    Costs to repair the part
X               1                1250
X               2                1359
X               3                907
X               4                703
```
The number of past repairs for a part varies from zero to more than 100,000; the four rows in the table are just an arbitrary example. Furthermore, it is known what share of the total cost was spent on labor, on material, etc.

How could one calculate the probability that the repair costs are higher than a certain value, e.g. for our example: "What is the probability that the repair costs more than 1000?" Would the best approach be to determine a cumulative distribution function based on the empirical data and use it to predict the probability? This is not getting me the results I need, since the ECDF is too coarse when only a few past values are available. In the example above the ECDF would give the same probability for costs lower than 704 as for costs lower than 906, which does not make sense in my context.
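The plateau described here can be reproduced with a minimal sketch in Python, using only the four example costs from the table. The helper name `ecdf` is just an illustration, not a reference to any particular library.

```python
from bisect import bisect_right

# The four example repair costs from the table above.
costs = [1250, 1359, 907, 703]


def ecdf(sample, x):
    """Empirical CDF: fraction of observations less than or equal to x."""
    ordered = sorted(sample)
    return bisect_right(ordered, x) / len(ordered)


# The ECDF is a step function, so it is flat between two observations.
for x in (704, 906, 1000):
    print(x, ecdf(costs, x))

# Empirical probability that a repair costs more than 1000.
p_exceed_1000 = 1 - ecdf(costs, 1000)
print("P(cost > 1000) =", p_exceed_1000)
```

With four observations the ECDF can only take the values 0, 0.25, 0.5, 0.75 and 1, which is why 704 and 906 get the same probability.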

Besides trying to fit a distribution to the past values, would there be another appropriate solution for the issue I described? At what number of data points would it make sense to switch from a parametric to a non-parametric analysis of the CDF? What is the ideal technique to fit a distribution to cost data?

Any help is appreciated,

Rashid

2. ## Re: Obtaining Probability for costs above a certain threshold

I think you have a good grasp of the issue. Non-parametric methods in general have less power than their parametric counterparts, which means they need an adequate amount of data.

When you do not have enough data, the first thing you can do is to consider whether there is a good parametric (or semi-parametric) model for your data. For example, you need to choose a distribution on the positive values for your cost, say log-normal or another. The second thing to consider is to try to obtain more data or estimates from outside sources. For example, can you access any data from organizations carrying out similar repairs? These are some suggestions to ease your problem.
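As a rough illustration of the first suggestion, the sketch below fits a log-normal distribution to the four example costs from the first post and evaluates the probability of a cost above 1000. Choosing the log-normal and estimating its parameters from the mean and sample standard deviation of the log costs are assumptions made for this example, not a recommendation for the real data.

```python
import math
from statistics import NormalDist, mean, stdev

# The four example repair costs from the first post.
costs = [1250, 1359, 907, 703]

# If cost is log-normal, then log(cost) is normal.
# Estimate the normal parameters on the log scale.
log_costs = [math.log(c) for c in costs]
mu = mean(log_costs)
sigma = stdev(log_costs)  # sample standard deviation (n - 1 denominator)

fitted = NormalDist(mu, sigma)


def prob_cost_above(threshold):
    """P(cost > threshold) under the fitted log-normal model."""
    return 1 - fitted.cdf(math.log(threshold))


for t in (704, 906, 1000):
    print(t, round(prob_cost_above(t), 3))
```

Unlike the ECDF, the fitted curve is smooth, so 704 and 906 get different probabilities. The price is that the answer now depends on the log-normal assumption being reasonable.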

3. ## The Following User Says Thank You to BGM For This Useful Post:

rashidjelzin (06-14-2016)

4. ## Re: Obtaining Probability for costs above a certain threshold

Dear BGM,

What do you mean by the sentence "non-parametric method has less power than the parametric counterparts"? That a parametric distribution gives an exact estimate for each F(x), with F(x) being the cumulative distribution function?

One of my main issues is how to determine what "an adequate amount of data" is for using a non-parametric model. Do you have any input for me on that topic?

I will not be able to obtain more data, and often I only have 5-10 data points for the value I want to model. I figure that the uncertainty almost makes it worthless to create a model for such a distribution, or am I mistaken?

Once again, thank you.

Rashid

5. ## Re: Obtaining Probability for costs above a certain threshold

Actually, all methods require an adequate amount of data. A parametric method makes modelling assumptions that restrict the model to a certain parametric form. So, let's say, if you have a one-parameter model, you can obtain a reasonable estimate without much data. This relies on whether you can make sure the assumptions fit the physical constraints and the data well. If the assumptions are wrong, then the estimate you obtain is likely to be useless. This is the modelling risk, and it is the trade-off: you make some strong assumptions to compensate for the lack of data.
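To make the modelling risk concrete, here is a small sketch that fits two different parametric models to the four example costs from the first post and compares the resulting probability of a cost above 1000. The exponential distribution is used only because it is a simple one-parameter model; both model choices are assumptions for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

# The four example repair costs from the first post.
costs = [1250, 1359, 907, 703]
threshold = 1000

# Model A: exponential, a one-parameter model.
# The maximum-likelihood estimate of its mean is the sample mean.
exp_mean = mean(costs)
p_exponential = math.exp(-threshold / exp_mean)

# Model B: log-normal, a two-parameter model fitted on the log scale.
log_costs = [math.log(c) for c in costs]
log_fit = NormalDist(mean(log_costs), stdev(log_costs))
p_lognormal = 1 - log_fit.cdf(math.log(threshold))

print("exponential:", round(p_exponential, 3))
print("log-normal: ", round(p_lognormal, 3))
```

The same four numbers lead to noticeably different answers depending on the assumed form, and the data alone are too few to say which assumption is closer to the truth.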

You do not need any golden rule of thumb here (I do not have one either), because you are lacking data, and in most cases we cannot estimate a function (the CDF) reasonably well in a non-parametric way with only 5-10 data points.

Some real-world industries, such as insurance companies, face a similar situation in their business. They need to model rare events, which are by definition rarely observed. If you are still interested, you could do some research on how they get around the problem.
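One way to see how little a handful of points says about the CDF is to bootstrap the non-parametric estimate. The sketch below resamples the four example costs from the first post and looks at the spread of the estimated probability of a cost above 1000. The number of resamples, the seed, and the simple percentile interval are arbitrary choices for this illustration.

```python
import random

# The four example repair costs from the first post.
costs = [1250, 1359, 907, 703]
threshold = 1000


def exceedance(sample, t):
    """Non-parametric estimate of P(cost > t): share of values above t."""
    return sum(1 for c in sample if c > t) / len(sample)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_resamples = 10_000

# Resample with replacement and recompute the estimate each time.
estimates = sorted(
    exceedance(rng.choices(costs, k=len(costs)), threshold)
    for _ in range(n_resamples)
)

# Simple 95% percentile interval from the bootstrap distribution.
lower = estimates[int(0.025 * n_resamples)]
upper = estimates[int(0.975 * n_resamples)]

print("point estimate:", exceedance(costs, threshold))
print("95% bootstrap interval:", (lower, upper))
```

With so few observations the interval covers almost the whole range between 0 and 1, which is another way of saying that the non-parametric estimate carries very little information at this sample size.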

6. ## Re: Obtaining Probability for costs above a certain threshold

I understand that when I use a parametric approach I take on a modelling risk and that it is a trade-off, but I often lack data.

Did I understand correctly that you are pointing out that with 5-10 data points a non-parametric model does not make much sense? Or did you say that it does not make much sense to create a model at all (be it parametric or non-parametric)?

Could you maybe provide a little more information about insurance companies and how they model rare events, or point me to any useful research publications?

Thank you!

Rashid
