# Thread: Fixing the low value and high value in a probabilistic simulation

1. ## Fixing the low value and high value in a probabilistic simulation

Hi,

When doing repeated random sampling in a Monte Carlo method, is there a defined procedure for fixing the low and high values of the samples?

More clearly, I have the mean and SD for ParameterA. When I create random samples, I know that some of them are too high or too low to be true for this parameter. Is there a good method to identify and remove them?

Thanks

2. ## Re: Fixing the low value and high value in a probabilistic simulation

If you put in that mean and SD, then whatever you got is a possible realization of a sample, given that those are the population parameters. So it seems like you are trying to bias the random nature of the procedure. Why not just accept the output?

Can you do, say, 10,000 runs and use that distribution? Perhaps you need to better describe your project and your purpose for using simulation.

PS: I believe you can simulate based on the mean and the 10th and 90th percentile values. Perhaps that is what you want.

3. ## Re: Fixing the low value and high value in a probabilistic simulation

In missing-data analysis, which uses random simulation, there are procedures for keeping values realistic (that is, values that are not realistic are replaced with others). That is how it works in SAS, and I assume other software does the same. Essentially you search for values beyond a certain range and replace them with a value you believe is reasonable.

Or I think there is; it's been a while. More generally, software has a way to find values and replace them (for that matter, you could do this in Excel). Which values count as too high, and what you replace them with, would be your expert judgment. A method I have seen used in outlier analysis is to decide what the most reasonable extreme values are (the highest and lowest values that are acceptable) and use them to replace values beyond that range. So if -20 is the lowest value you believe is reasonable and you get -21, you replace it with -20.
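The replace-with-the-bound idea described above can be sketched in a few lines of Python. This is only an illustration: the `clamp` helper, the sample values and the upper bound of 40 are made up for the example, and the bounds themselves still have to come from expert judgment.

```python
def clamp(value, low, high):
    """Replace a value outside [low, high] with the nearest bound."""
    return max(low, min(high, value))

# Hypothetical simulated values and hypothetical bounds chosen by judgment.
samples = [-21.0, -20.0, 3.5, 47.0]
low, high = -20.0, 40.0

clamped = [clamp(x, low, high) for x in samples]
print(clamped)  # [-20.0, -20.0, 3.5, 40.0]
```

Note that clamping piles probability mass up exactly at the two bounds, which may or may not be acceptable for the simulation.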

4. ## Re: Fixing the low value and high value in a probabilistic simulation

Originally Posted by hlsmith
If you put in that mean and SD, then whatever you got is a possible realization of a sample, given that those are the population parameters. So it seems like you are trying to bias the random nature of the procedure. Why not just accept the output?

Can you do, say, 10,000 runs and use that distribution? Perhaps you need to better describe your project and your purpose for using simulation.

PS: I believe you can simulate based on the mean and the 10th and 90th percentile values. Perhaps that is what you want.

Thanks for your reply. It would be clearer if I say exactly what I am working on. I have the mean and SD of the preferred walking speed of 30 people. When I try to generate samples from this data, the lowest samples come out negative, which is definitely not realistic, so I removed the negative samples in this case. But zero is not realistic either, and neither is 0.1 km/h. The same is true at the higher end: a walking speed of, say, 50 km/h is not realistic.

I can fix the lower and upper limits from the lowest and highest walking speeds among the 30 people (or from the lowest and highest speeds ever recorded). But that doesn't look sensible to me. Is there any statistical method to do that?

5. ## Re: Fixing the low value and high value in a probabilistic simulation

Thanks for your reply. My question now is how to find the most reasonable extreme value. Is there a statistical method that I can employ based on the available mean and SD?

6. ## Re: Fixing the low value and high value in a probabilistic simulation

There is no statistical method to determine what a reasonable value is. Statistics cannot tell you what is reasonable; that is where expertise (domain knowledge) comes into play. It is common to combine domain knowledge and statistics, however. Statistics can tell you, through outlier analysis, what an unlikely value is if you assume a specific distribution such as a normal one. But I don't think that is much help here. One possibility is to look at past analyses and see whether they report a reasonable lowest speed.

It is well accepted that you can remove mistakes from an analysis. Obviously people cannot walk at a negative speed, since that violates physics, so if you see one you can remove it. But replacing it will rely on domain knowledge or past analyses.
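To make the "unlikely value under an assumed distribution" point concrete, here is a small sketch using Python's standard `statistics.NormalDist`. The mean of 4.5 km/h and SD of 2.0 km/h are hypothetical, since the thread never gives the real numbers, and the normal model is itself an assumption.

```python
from statistics import NormalDist

# Hypothetical summary statistics in km/h (not from the thread).
mean_speed, sd_speed = 4.5, 2.0

# Assumed model: walking speed is normally distributed.
model = NormalDist(mu=mean_speed, sigma=sd_speed)

p_negative = model.cdf(0.0)          # share of draws expected below zero
p_too_fast = 1.0 - model.cdf(50.0)   # share of draws expected above 50 km/h

print(f"P(speed < 0)  = {p_negative:.4f}")
print(f"P(speed > 50) = {p_too_fast:.2e}")
```

This only says how often the assumed model produces such values. Whether a value is physically reasonable is still a domain question.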

7. ## Re: Fixing the low value and high value in a probabilistic simulation

Originally Posted by depakroshanblu
Hi,

When doing repeated random sampling in a Monte Carlo method, is there a defined procedure for fixing the low and high values of the samples?

More clearly, I have the mean and SD for ParameterA. When I create random samples, I know that some of them are too high or too low to be true for this parameter. Is there a good method to identify and remove them?

Thanks

Did you try to make use of the (rather crude) Chebyshev's inequality?
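For reference, Chebyshev's inequality states that for any distribution with finite mean and SD, P(|X - mean| >= k*SD) <= 1/k^2. A rough sketch of turning that into bounds follows; the function name `chebyshev_interval` and the numbers (mean 4.5 km/h, SD 1.0 km/h, at most 1% outside) are hypothetical.

```python
import math

def chebyshev_interval(mean, sd, max_outside_prob):
    """Interval mean +/- k*sd that, by Chebyshev's inequality, holds at least
    1 - max_outside_prob of any distribution with this mean and SD."""
    # Solve 1/k**2 = max_outside_prob for k.
    k = 1.0 / math.sqrt(max_outside_prob)
    return mean - k * sd, mean + k * sd

# Hypothetical inputs: mean 4.5 km/h, SD 1.0 km/h, at most 1% outside.
low, high = chebyshev_interval(4.5, 1.0, 0.01)
print(low, high)
```

With these inputs k is 10, so the lower bound is negative. That is the crudeness mentioned above: the bound holds for every distribution, so it is usually far too wide to serve as a realistic cutoff.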

8. ## Re: Fixing the low value and high value in a probabilistic simulation

Originally Posted by depakroshanblu

It would be clearer if I say exactly what I am working on. I have the mean and SD of the preferred walking speed of 30 people. When I try to generate samples from this data, the lowest samples come out negative, which is definitely not realistic, so I removed the negative samples in this case. But zero is not realistic either, and neither is 0.1 km/h. The same is true at the higher end: a walking speed of, say, 50 km/h is not realistic.

I think the problem is the shape of the distribution: your data is probably left skewed, so when you use the mean and SD from the sample with a normal distribution you get unrealistic values. IMO the cleaner solution would be to use a better-fitting model for the input data.

regards

9. ## Re: Fixing the low value and high value in a probabilistic simulation

I would guess that the user is sampling from a normal distribution and thereby sometimes gets negative values.

I suggest using a distribution that can only take positive values, like the gamma distribution, the Weibull distribution or the log-normal distribution.
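As a sketch of how a positive-only distribution could be matched to an observed mean and SD, the following uses the method of moments for the log-normal and gamma cases with Python's standard `random` module. The helper names and the inputs (4.5 km/h, 2.0 km/h) are hypothetical, and moment matching is just one possible way to choose the parameters.

```python
import math
import random

def lognormal_params(mean, sd):
    """(mu, sigma) of the underlying normal so that the log-normal
    variable has the requested mean and standard deviation."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def gamma_params(mean, sd):
    """(shape, scale) of a gamma distribution with the requested mean and SD."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

# Hypothetical summary statistics in km/h (not from the thread).
mean_speed, sd_speed = 4.5, 2.0
rng = random.Random(42)  # seeded so the run is repeatable

mu, sigma = lognormal_params(mean_speed, sd_speed)
shape, scale = gamma_params(mean_speed, sd_speed)

lognormal_draws = [rng.lognormvariate(mu, sigma) for _ in range(10_000)]
gamma_draws = [rng.gammavariate(shape, scale) for _ in range(10_000)]

print(min(lognormal_draws), min(gamma_draws))  # both are strictly positive
```

Both samplers reproduce the mean and SD on average but never return a negative speed. They differ in how heavy the upper tail is.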

If there are still a few values that are too large (after generating the distribution from the observed mean and standard deviation), I would suggest deleting the values that are larger than a maximum threshold. That is called truncating the distribution. Look for the fastest walker in the Olympics to get a maximum value.
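Truncation by deleting out-of-range draws amounts to rejection sampling. A minimal sketch follows, assuming hypothetical bounds of 0 and 15 km/h and, for simplicity, a plain normal sampler. `sample_truncated` is an illustrative helper, and any other sampler could be passed in as `draw`.

```python
import random

def sample_truncated(draw, low, high, n, max_tries=1_000_000):
    """Rejection sampling: keep drawing until n values fall inside [low, high]."""
    kept = []
    tries = 0
    while len(kept) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("the bounds reject almost every draw")
        x = draw()
        if low <= x <= high:
            kept.append(x)
    return kept

rng = random.Random(0)  # seeded so the run is repeatable
# Hypothetical model and bounds: normal(4.5, 2.0) km/h kept within [0, 15].
speeds = sample_truncated(lambda: rng.gauss(4.5, 2.0), 0.0, 15.0, 1000)
print(min(speeds), max(speeds))
```

Keep in mind that truncating shifts the mean and SD of the retained sample away from the values that were fed in, more so when the bounds cut off a large share of the draws.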

10. ## Re: Fixing the low value and high value in a probabilistic simulation

If you have too many extreme values, you could always Winsorize the data.
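Winsorizing replaces everything beyond chosen percentiles with the values at those percentiles. A small sketch with the standard library follows; the 5th/95th percentile cutoffs, the `winsorize` helper and the simulated data are assumptions made for illustration.

```python
import random
import statistics

def winsorize(values, lower_pct=5, upper_pct=95):
    """Replace values below/above the given percentiles with those percentiles."""
    # quantiles(n=100) returns 99 cut points; index i - 1 is the i-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

rng = random.Random(1)  # seeded so the run is repeatable
draws = [rng.gauss(4.5, 2.0) for _ in range(1000)]  # hypothetical simulated speeds
winsorized = winsorize(draws)

print(min(draws), max(draws))
print(min(winsorized), max(winsorized))
```

Unlike truncation, this keeps the sample size unchanged, but the cutoffs come from the simulated data itself rather than from any knowledge of what a realistic speed is.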

11. ## Re: Fixing the low value and high value in a probabilistic simulation

I imagine that any analytic approach is destined to be problematic. To get the extremes, you need to model the distribution and choose arbitrary extreme tail cutoff probabilities. What would they be? It's also hard to think of any real life data that can be modeled so precisely with a statistical distribution that you can rely on calculations of the extremes of the tails.
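A quick way to see this sensitivity is to compute upper cutoffs for a few tail probabilities under two different models that share the same mean and SD. The sketch below uses hypothetical values (mean 4.5 km/h, SD 1.0 km/h) and compares a normal model with a moment-matched log-normal one. Both models and all names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics in km/h (not from the thread).
mean_speed, sd_speed = 4.5, 1.0

# Model 1: normal with this mean and SD.
normal = NormalDist(mean_speed, sd_speed)

# Model 2: log-normal with the same mean and SD (method of moments),
# represented through the normal distribution of log(speed).
sigma2 = math.log(1.0 + (sd_speed / mean_speed) ** 2)
log_model = NormalDist(math.log(mean_speed) - sigma2 / 2.0, math.sqrt(sigma2))

def upper_cutoffs(tail):
    """Upper cutoff under each model for a given upper-tail probability."""
    return normal.inv_cdf(1.0 - tail), math.exp(log_model.inv_cdf(1.0 - tail))

cutoffs = {tail: upper_cutoffs(tail) for tail in (0.01, 0.001, 0.0001)}
for tail, (normal_cut, lognormal_cut) in cutoffs.items():
    print(f"tail={tail}: normal={normal_cut:.2f}, log-normal={lognormal_cut:.2f}")
```

The cutoff moves with the chosen tail probability and with the assumed model, and the two models drift further apart the deeper into the tail you go.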
