
Thread: Fixing the low value and high value in a probabilistic simulation

  1. #1
    Points: 12, Level: 1
    Level completed: 23%, Points required for next Level: 38

    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Fixing the low value and high value in a probabilistic simulation




    Hi,

    When doing repeated random sampling in a Monte Carlo method, is there a defined procedure for fixing the low value and high value of the samples?

    More precisely: I have the mean and SD for ParameterA. When I create random samples, I know that some of them are too high or too low to be true for this parameter. Is there a good method to identify and remove them?

    Thanks

  2. #2
    Omega Contributor
    Points: 38,374, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    6,998
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Fixing the low value and high value in a probabilistic simulation

    If you put in that mean and SD, then whatever you get is a possible realization of a sample, given that those are the population parameters. So it seems like you are trying to bias the random nature of the procedure. Why not just accept the output?

    Can you do, say, 10,000 runs and use that distribution? Perhaps you need to better describe your project and your purpose for using simulation.

    PS: I believe you can simulate based on the mean and the 10th and 90th percentile values. Perhaps that is what you want.
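
    A minimal sketch of the "do 10,000 runs and look at the distribution" suggestion, assuming a normal model. The mean and SD are hypothetical placeholders (the thread does not state the real ones), and `simulate_speeds` and `share_below` are illustrative helper names.

```python
import random
from statistics import NormalDist

# Hypothetical inputs in km/h: the thread does not give the actual mean and SD.
MEAN_KMH = 4.5
SD_KMH = 1.5


def simulate_speeds(n, mean=MEAN_KMH, sd=SD_KMH, seed=0):
    """Draw n values from a normal distribution with the given mean and SD."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


def share_below(samples, threshold):
    """Fraction of samples strictly below the threshold."""
    return sum(x < threshold for x in samples) / len(samples)


speeds = simulate_speeds(10_000)
observed = share_below(speeds, 0.0)
# Probability of a negative draw implied by the same normal model.
expected = NormalDist(MEAN_KMH, SD_KMH).cdf(0.0)
print(f"negative draws: observed {observed:.4%}, model {expected:.4%}")
```

    Comparing the observed share with the model probability shows whether impossible values are a rare nuisance or a sign that the normal model is a poor fit.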
    Stop cowardice, ban guns!

  3. #3
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Fixing the low value and high value in a probabilistic simulation

    In missing data analysis, which uses random simulation, there are procedures to keep the simulated values realistic (that is, values that are not realistic are replaced with others). That is the case in SAS, and I assume other software does the same. Essentially, you search for values beyond a certain range and replace them with a value you believe is reasonable.

    Or I think there are; it's been a while. More generally, software has a way to find values and replace them (for that matter, you could do this in Excel). Which value counts as too high, and what you replace it with, would be your expert judgment. A method I have seen used in outlier analysis is to decide on the most reasonable extreme values (the highest and lowest values that are acceptable) and use them to replace values beyond that range. So if -20 is the lowest value you believe is reasonable and you get -21, you replace it with -20.
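
    The replace-beyond-the-range rule can be sketched as below. The lower bound of -20 comes from the example above; the upper bound of 30 and the sample values are hypothetical, and `clamp` is an illustrative helper name.

```python
def clamp(value, low, high):
    """Replace a value outside [low, high] with the nearest bound."""
    return max(low, min(high, value))


# Bounds are expert judgment; the data cannot supply them.
LOW = -20.0   # lowest acceptable value, as in the example above
HIGH = 30.0   # hypothetical highest acceptable value

samples = [-21.0, -20.0, 3.5, 42.0]
clamped = [clamp(x, LOW, HIGH) for x in samples]
print(clamped)  # [-20.0, -20.0, 3.5, 30.0]
```

    Note that every replaced value lands exactly on a bound, so the simulated distribution gets a small spike at each limit.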
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  4. #4
    Points: 12, Level: 1
    Level completed: 23%, Points required for next Level: 38

    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Fixing the low value and high value in a probabilistic simulation

    Quote Originally Posted by hlsmith View Post
    If you put in that mean and SD, then whatever you get is a possible realization of a sample, given that those are the population parameters. So it seems like you are trying to bias the random nature of the procedure. Why not just accept the output?

    Can you do, say, 10,000 runs and use that distribution? Perhaps you need to better describe your project and your purpose for using simulation.

    PS: I believe you can simulate based on the mean and the 10th and 90th percentile values. Perhaps that is what you want.
    Thanks for your reply. It would be clearer if I say exactly what I am working on. I have the mean and SD of the preferred walking speed of 30 people. When I generate samples from these, the lowest samples are negative (definitely not realistic, so I removed the negative samples in this case). Values such as zero or 0.1 km/h are not realistic either. The same is true at the higher end: a walking speed of, say, 50 km/h is not realistic.

    I could fix the lower and upper limits at the lowest and highest walking speeds among the 30 people (or at the lowest and highest speeds ever recorded). But that doesn't look sensible to me. Is there a statistical method for doing this?

  5. #5
    Points: 12, Level: 1
    Level completed: 23%, Points required for next Level: 38

    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Fixing the low value and high value in a probabilistic simulation

    Thanks for your reply. My question now is how to find the most reasonable extreme value. Is there a statistical method that I can employ based on the available mean and SD?

    It would be clearer if I say exactly what I am working on. I have the mean and SD of the preferred walking speed of 30 people. When I generate samples from these, the lowest samples are negative (definitely not realistic, so I removed the negative samples in this case). Values such as zero or 0.1 km/h are not realistic either. The same is true at the higher end: a walking speed of, say, 50 km/h is not realistic.

    I could fix the lower and upper limits at the lowest and highest walking speeds among the 30 people (or at the lowest and highest speeds ever recorded). But that doesn't look sensible to me. Is there a statistical method for doing this?

  6. #6
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Fixing the low value and high value in a probabilistic simulation

    There is no statistical method to determine what a reasonable value is. Statistics cannot tell you what is reasonable; that is where expertise (domain knowledge) comes into play. It is common to combine domain knowledge and statistics, however. Statistics can tell you, through outlier analysis, what an unlikely value is if you assume a specific distribution such as the normal. But I don't think that is much help here. One possibility is to look at past analyses and see if there is a reasonable lowest speed.

    It is well accepted that you can remove mistakes from an analysis. Obviously people cannot walk at a negative speed (this violates physics), so if you see that, you can remove it. But replacing it will rely on domain knowledge or past analyses.
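
    As a rough illustration of what outlier analysis can say under an assumed normal distribution: the sketch below computes how improbable a value is, not whether it is reasonable. The mean, SD, and test value are hypothetical, and `tail_probability` is an illustrative helper name.

```python
from statistics import NormalDist


def tail_probability(value, mean, sd):
    """Two-sided probability of a normal draw at least as far from the mean as value."""
    z = abs(value - mean) / sd
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical walking-speed parameters in km/h.
p = tail_probability(9.0, mean=4.5, sd=1.5)
print(f"P(at least this extreme) = {p:.4f}")
```

    A small probability only flags the value as unlikely under the model; deciding to drop or replace it still rests on domain knowledge.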

  7. #7
    Super Moderator
    Points: 13,151, Level: 74
    Level completed: 76%, Points required for next Level: 99
    Dragan's Avatar
    Location
    Illinois, US
    Posts
    2,014
    Thanks
    0
    Thanked 223 Times in 192 Posts

    Re: Fixing the low value and high value in a probabilistic simulation

    Quote Originally Posted by depakroshanblu View Post
    Hi,

    When doing repeated random sampling in a Monte Carlo method, is there a defined procedure for fixing the low value and high value of the samples?

    More precisely: I have the mean and SD for ParameterA. When I create random samples, I know that some of them are too high or too low to be true for this parameter. Is there a good method to identify and remove them?

    Thanks

    Did you try making use of the (rather crude) Chebyshev's inequality?
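
    For reference, Chebyshev's inequality states that for any distribution with finite variance, P(|X - mean| >= k*SD) <= 1/k^2. A minimal sketch of turning that into limits follows, treating the sample mean and SD as if they were the population values; the numbers are hypothetical and `chebyshev_bounds` is an illustrative helper name.

```python
import math


def chebyshev_bounds(mean, sd, max_outside):
    """Interval holding at least 1 - max_outside of any distribution with this
    mean and SD, from Chebyshev: P(|X - mean| >= k*sd) <= 1/k**2."""
    k = 1.0 / math.sqrt(max_outside)
    return mean - k * sd, mean + k * sd


# Hypothetical walking-speed parameters in km/h; allow at most 5% outside.
low, high = chebyshev_bounds(4.5, 1.5, 0.05)
print(f"at least 95% of values lie in [{low:.2f}, {high:.2f}]")
```

    With these placeholder numbers the lower limit comes out negative, which illustrates how crude the bound is for this purpose.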

  8. #8
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: Fixing the low value and high value in a probabilistic simulation

    Quote Originally Posted by depakroshanblu View Post

    It would be clearer if I say exactly what I am working on. I have the mean and SD of the preferred walking speed of 30 people. When I generate samples from these, the lowest samples are negative (definitely not realistic, so I removed the negative samples in this case). Values such as zero or 0.1 km/h are not realistic either. The same is true at the higher end: a walking speed of, say, 50 km/h is not realistic.
    I think the problem is the shape of the distribution: your data are probably left-skewed, so when you use the mean and SD from the sample with a normal distribution, you get unrealistic values. IMO the cleaner solution would be to use a better-fitting model for the input data.

    regards

  9. #9
    Human
    Points: 12,676, Level: 73
    Level completed: 57%, Points required for next Level: 174
    Awards:
    Master Tagger
    GretaGarbo's Avatar
    Posts
    1,362
    Thanks
    455
    Thanked 462 Times in 402 Posts

    Re: Fixing the low value and high value in a probabilistic simulation

    I would guess that the user is sampling from a normal distribution and thereby sometimes gets negative values.

    I suggest using a distribution that can only take positive values, like the gamma, Weibull, or log-normal distribution.

    If there are still a few values that are too large (after generating the distribution from the observed mean and standard deviation), I would suggest deleting the values that are larger than a maximum threshold. That is called truncating the distribution. Look up the fastest walker at the Olympics to get a maximum value.
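
    As one possible illustration, a log-normal can be matched to a given mean and SD by the method of moments. The mean and SD below are hypothetical, `lognormal_params` is an illustrative helper name, and whether a log-normal actually fits walking speeds would need checking against the raw data.

```python
import math
import random


def lognormal_params(mean, sd):
    """Moment-match a log-normal: return (mu, sigma) of the underlying normal."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


# Hypothetical walking-speed parameters in km/h.
mu, sigma = lognormal_params(4.5, 1.5)
rng = random.Random(1)
speeds = [rng.lognormvariate(mu, sigma) for _ in range(10_000)]
sample_mean = sum(speeds) / len(speeds)
print(f"min = {min(speeds):.3f}, mean = {sample_mean:.3f}")
```

    Every draw is strictly positive by construction, so no negative speeds need to be removed afterwards.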
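
    Truncation at a maximum threshold can be sketched with rejection sampling: draw, and redraw whenever the value exceeds the limit. The example uses a moment-matched gamma distribution; the mean, SD, and the 15 km/h ceiling are hypothetical, and the function names are illustrative.

```python
import random

MAX_KMH = 15.0  # hypothetical ceiling; choose it from domain knowledge


def gamma_params(mean, sd):
    """Moment-match a gamma distribution: return (shape, scale)."""
    return (mean / sd) ** 2, sd ** 2 / mean


def truncated_gamma(n, mean, sd, upper, seed=0):
    """Draw n gamma values, redrawing any value above upper."""
    rng = random.Random(seed)
    shape, scale = gamma_params(mean, sd)
    kept = []
    while len(kept) < n:
        x = rng.gammavariate(shape, scale)
        if x <= upper:
            kept.append(x)
    return kept


# Hypothetical walking-speed parameters in km/h.
speeds = truncated_gamma(10_000, 4.5, 1.5, MAX_KMH)
print(f"min = {min(speeds):.3f}, max = {max(speeds):.3f}")
```

    Truncating shifts the mean and SD of the kept samples slightly away from the inputs; the effect is small when the threshold sits far out in the tail.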

  10. #10
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Fixing the low value and high value in a probabilistic simulation

    If you have too many extreme values, you could always Winsorize the data.
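
    A minimal sketch of Winsorizing: the most extreme fraction at each end is replaced by the nearest value that is kept. The data and the 10% fractions are made up for illustration, and `winsorize` here is a hand-written helper, not a library call.

```python
def winsorize(values, lower_frac=0.05, upper_frac=0.05):
    """Replace the lowest and highest fractions of values with the nearest
    retained value, keeping the original order."""
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int(lower_frac * n)]
    high = ordered[n - 1 - int(upper_frac * n)]
    return [min(max(x, low), high) for x in values]


# Made-up data with one extreme value at each end.
data = [-3.0, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 50.0]
result = winsorize(data, 0.1, 0.1)
print(result)  # [2.0, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 7.0]
```

    Unlike fixed expert limits, the bounds here come from the sample's own order statistics, so they move with the data.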

  11. #11
    Points: 1,974, Level: 26
    Level completed: 74%, Points required for next Level: 26

    Location
    New Zealand
    Posts
    227
    Thanks
    3
    Thanked 48 Times in 47 Posts

    Re: Fixing the low value and high value in a probabilistic simulation


    I imagine that any analytic approach is destined to be problematic. To get the extremes, you need to model the distribution and choose arbitrary cutoff probabilities for the extreme tails. What would they be? It's also hard to think of any real-life data that can be modeled so precisely with a statistical distribution that you can rely on calculations of the extremes of the tails.
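
    To make the arbitrariness concrete, the sketch below computes cutoffs from an assumed normal model for a few tail probabilities. The mean and SD are hypothetical, and `tail_cutoffs` is an illustrative helper name.

```python
from statistics import NormalDist


def tail_cutoffs(mean, sd, tail_prob):
    """Lower and upper cutoffs leaving tail_prob in each tail of a normal model."""
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(tail_prob), dist.inv_cdf(1.0 - tail_prob)


# Hypothetical walking-speed parameters in km/h.
for tail_prob in (0.05, 0.01, 0.001):
    low, high = tail_cutoffs(4.5, 1.5, tail_prob)
    print(f"tail {tail_prob}: [{low:.2f}, {high:.2f}]")
```

    The limits move noticeably with the chosen tail probability, and they are only as trustworthy as the model is in its tails.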
