How do you define a statistical way to ensure that the worst case is captured?

AZA

New Member
#1
I need a recommendation that addresses the variability associated with these results.

What I specifically need is help defining a statistical way to ensure that the worst case is captured, so that it can be used in future testing and we can be sure that we haven't under-tested by using a "less than worst case" condition.

I'm hoping you can help come up with something like this:
1. Take 3 independent samples
2. Average the 3 samples and multiply by 1.8
3. This calculated value is 99.9% likely to represent the worst-case streaming current

Please provide the numbers with the working shown. If you could look at a few cases, so we can see how the number of samples and the multiplication factor affect the statistical confidence, that would be great.

Assume that it follows a normal (Gaussian) distribution.
The maximum scatter is 5 x the mode.

Hope this makes sense. Having real trouble with this.

Any help is appreciated.
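
A rough sketch of how the "average of n samples times a factor" rule asked about here could be put into numbers, under strong assumptions: the readings are independent draws from one normal distribution with a positive mean, and the coefficient of variation (standard deviation divided by mean) is already known. The value cv = 0.2 and the function name `prob_factor_covers` are hypothetical, chosen only for illustration. Under those assumptions, the chance that k times the mean of n readings reaches the 99.9th percentile is Phi(sqrt(n) * (k - 1 - z*cv) / (k*cv)), where z is the standard normal quantile for 99.9%.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def prob_factor_covers(n, k, cv, coverage=0.999):
    """Chance that k * (mean of n readings) >= the `coverage` quantile.

    Assumes independent normal readings with a positive mean and a
    KNOWN coefficient of variation cv = sd / mean (an assumption).
    """
    z = STD_NORMAL.inv_cdf(coverage)
    return STD_NORMAL.cdf(sqrt(n) * (k - 1 - z * cv) / (k * cv))


# Hypothetical cv of 0.2; vary the sample count and the factor.
for n in (3, 5, 10):
    for k in (1.5, 1.8, 2.5):
        print(f"n={n:2d}  k={k}  P={prob_factor_covers(n, k, 0.2):.4f}")
```

This only holds if the coefficient of variation is known in advance; it is not something 3 readings can establish on their own.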
 

noetsi

Fortran must die
#2
I don't think there is any way you can statistically ensure that the "worst case" is captured in future samples, because 1) that is not a statistical concept and 2) it will always depend on the sample. You can say, for example, that you will look for values so many standard deviations from the mean, but that is only a worst case if you define it as such. Statistics cannot define it as such, and having 3 samples won't guarantee you will find an extreme value relative to future samples.
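
To put a number on why 3 samples are unlikely to contain an extreme value: assuming independent draws from the same distribution, the chance that at least one of n samples lands in the top 0.1% is 1 - 0.999^n. A minimal sketch (the function name `prob_at_least_one_extreme` is made up for illustration):

```python
def prob_at_least_one_extreme(n_samples, tail_prob=0.001):
    """Chance that at least one of n independent samples falls in a
    tail that each sample hits with probability `tail_prob`."""
    return 1.0 - (1.0 - tail_prob) ** n_samples


# How the chance grows with the number of samples.
for n in (3, 30, 300, 3000):
    print(f"n={n:4d}  P(at least one in top 0.1%) = "
          f"{prob_at_least_one_extreme(n):.4f}")
```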
 

AZA

New Member
#3
Thank you for your reply, really appreciate it. What if the scenario were like this:

Say I had an already existing normal curve, and I have 3 values that I need to ensure fall into the 99.9% region; however, these 3 values fall to the left of the mean on the normal curve. How would you go about shifting the values into the 99.9% region? I'm thinking that you would only need to multiply the values by a factor of the standard deviation, i.e. if one value fell on the -1 sigma point, then to get it to the 99.9% mark I would need to multiply that value by 4 sigma to shift it into the 99.9% region (this statement is only to give you an idea of what I'm trying to say). I hope this makes some sense.
 

noetsi

Fortran must die
#4
I suspect Miner would be of more help to you than I, because he has, I believe, worked a lot with Six Sigma. I don't understand what you mean by "shifted into the 99.9% region." If you are trying to find something in a certain region of a distribution in a sample, then of course you would simply look for a z score that indicated that it was to the left (or right, as the situation dictated) of that region. You don't need to multiply anything; you just look for a z score that indicates it is in the region you are interested in.

The same would be true of any future sample as well. If you want to identify an extreme value, then you just generate a z score and determine where it lies in the distribution. Just because a value lies in a certain region of a distribution in one sample does not mean it will in another, because the distribution could be different even if it is normal [two normal distributions can have very different means and standard deviations].
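
A small sketch of the z score check described above, using Python's standard library. The means, standard deviations, and readings are hypothetical numbers picked for illustration, and `in_upper_region` is an illustrative helper name. It also shows the last point: the same reading can be extreme under one normal distribution and unremarkable under another.

```python
from statistics import NormalDist


def z_score(value, mean, sd):
    """Number of standard deviations `value` lies above the mean."""
    return (value - mean) / sd


def in_upper_region(value, mean, sd, coverage=0.999):
    """True if `value` is at or beyond the one-sided `coverage` point."""
    z_crit = NormalDist().inv_cdf(coverage)  # about 3.09 for 99.9%
    return z_score(value, mean, sd) >= z_crit


# Hypothetical distribution A: mean 10, sd 2.
print(in_upper_region(17, mean=10, sd=2))  # z = 3.5
print(in_upper_region(12, mean=10, sd=2))  # z = 1.0

# Same reading of 17 under hypothetical distribution B: mean 12, sd 4.
print(in_upper_region(17, mean=12, sd=4))  # z = 1.25
```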
 

rogojel

TS Contributor
#5
hi,
just my quick 5 cents: if you already know the distribution, why do you need to measure at all? Why not just calculate the 99.9% value from the distribution formula?

If you need to identify the distribution first then 3 points would definitely not be enough.
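
If the distribution really is known, the calculation mentioned above is a one-liner. A minimal sketch, assuming a normal distribution with a hypothetical mean of 10 and standard deviation of 2 (neither number comes from this thread):

```python
from statistics import NormalDist


def upper_quantile(mean, sd, coverage=0.999):
    """Value that a normal(mean, sd) reading stays below with
    probability `coverage` (one-sided)."""
    return NormalDist(mean, sd).inv_cdf(coverage)


# Hypothetical parameters, for illustration only.
print(f"99.9% value: {upper_quantile(10, 2):.3f}")
print(f"99%   value: {upper_quantile(10, 2, coverage=0.99):.3f}")
```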

regards
rogojel
 

noetsi

Fortran must die
#6
I think rogojel and I are making the same basic point. What matters is not whether you include a point from a previous sample. It is that you identify a certain region of the curve you feel is too extreme (or that you are interested in) which can be done with z scores and statistical tables.
 

AZA

New Member
#7
Hey guys,

So statistically, is there a way to ensure a value is "worst case", i.e. in the 99.9% region for this case?
That is to say, a value I have must fall into the 99.9% region of a known normal curve.

I understand that you could compute a z score and see if it was >= the z score for 99%, i.e. z would be >= 2.326348, thus proving the value was in the 99% region.

But I'm curious to know: if, say, the value was found to have z = 1, would it be possible to make that same value fall into the 99% region?
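
For reference, a quick sketch that reproduces the critical z values and shows where z = 1 sits on a standard normal curve, using only Python's standard library (`percentile_of_z` is just an illustrative helper name). For a fixed, known curve, z = (value - mean) / sd, so a given value has exactly one z score.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided critical z values for the regions discussed.
for coverage in (0.99, 0.999):
    print(f"one-sided {coverage:.1%}: z >= {std_normal.inv_cdf(coverage):.6f}")


def percentile_of_z(z):
    """Fraction of a normal population lying below the given z score."""
    return std_normal.cdf(z)


print(f"z = 1 sits at the {percentile_of_z(1.0):.2%} point")
```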

By the way, I really am thankful for the feedback. So very helpful.
 

hlsmith

Less is more. Stay pure. Stay poor.
#8
If your data are really normally distributed, then you will almost always have some data outside of your confidence interval unless you make it uber wide.

So every population has a couple of very extreme values, and you want to know approximately how wide your range might be in a subsequent population? You know the full population for one group, and in the future you want to know whether you can use this to help ensure you don't miss an extreme value in a new population when you sample? If you randomly sample, say, 100 observations out of 1,000, every time you will have a 9 out of 10 chance of missing the extreme value (I believe). So you say: my data are normally distributed, so I know the approximate percentage of data points that will land within a given number of standard deviations. As mentioned earlier, if you are confident that your data are normally distributed, you just plug the respective value from the standard normal table into your confidence interval calculation, and then you have your potential coverage (99.9% or greater). The larger your sample, the more the law of large numbers comes into play, and the better your estimates, given that the data are normally distributed.
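
The "9 out of 10" figure can be checked with a short counting argument. A sketch, assuming simple random sampling without replacement; the function name `prob_miss_all` and the two-extreme-values case are added here purely for illustration:

```python
from math import comb


def prob_miss_all(population, sample_size, n_extreme=1):
    """Chance that a simple random sample (without replacement) contains
    none of `n_extreme` specific items from the population."""
    return (comb(population - n_extreme, sample_size)
            / comb(population, sample_size))


# Sampling 100 of 1,000: chance of missing the single most extreme value.
print(prob_miss_all(1000, 100))

# Chance of missing both of the two most extreme values.
print(prob_miss_all(1000, 100, n_extreme=2))
```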

It also comes down to how easy or costly it is for you to sample, which determines the strategies you can play around with. Also, since you know there can be a low value, this leads us to think that these data may be slightly negatively skewed. If you are able to confirm this, you may be able to put some kind of correction in place.
 

noetsi

Fortran must die
#9
There is a way to make a value show up so many standard deviations from the mean [which is what the 99.9% region really reflects]. Choose a mean and a standard deviation for a normal distribution that will result in that value being in that region. If you know what the value is, you can work backwards to determine what mean and standard deviation would result in that value being in the 99.9% region or beyond.
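
Working backwards looks roughly like this. A sketch only: the reading of 17 and the fixed mean and standard deviation are hypothetical, and `sd_needed` / `mean_needed` are illustrative names. Given a value, fix one parameter and solve z = (value - mean) / sd for the other.

```python
from statistics import NormalDist


def sd_needed(value, mean, coverage=0.999):
    """Standard deviation that puts `value` exactly at the one-sided
    `coverage` point of a normal distribution with the given mean."""
    if value <= mean:
        raise ValueError("value must lie above the mean")
    z = NormalDist().inv_cdf(coverage)
    return (value - mean) / z


def mean_needed(value, sd, coverage=0.999):
    """Mean that puts `value` exactly at the one-sided `coverage` point
    of a normal distribution with the given standard deviation."""
    z = NormalDist().inv_cdf(coverage)
    return value - z * sd


# Hypothetical reading of 17.
print(f"mean fixed at 10 -> sd   = {sd_needed(17, mean=10):.4f}")
print(f"sd fixed at 2    -> mean = {mean_needed(17, sd=2):.4f}")
```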

I suspect you really don't want to do that :p or perhaps more accurately I can't imagine why you would want to do that.