Probability highest value is not part of population

#1
Hi, I've got a very simple problem, but I don't know what test to use. I've been searching without success, so I'd very much appreciate some help.

I have 61 values, shown below. The average is 0.159 and the SD is 0.924.

What I want to know is the chance that the highest value (~2.46) is not part of the population. At around 2.5 standard deviations, only about 1% of samples should be as high as that. But as there are 61 samples, I guess there are 61 different samples that could be high. So does this mean the chance of the one that happens to be highest being abnormal is 61 x ~0.01? Or is there only a ~40% chance of it being abnormal?


-1.6345593829, -1.5581633336, -1.4823888212, -1.421146035, -1.2290139146, -1.0258771361, -1.0084030046, -0.8173114536, -0.8096347589, -0.6954133842, -0.6171224275, -0.5868395825, -0.5548036747, -0.5548036747, -0.5128185211, -0.4819947935, -0.4750909708, -0.3837028039, -0.3684129389, -0.3618601396, -0.3545792514, -0.2596575833, -0.2409973968, -0.2271637093, -0.1938046891, -0.1594514498, -0.141591803, -0.0946515456, -0.0560628385, -0.0418700087, 0.061089957, 0.1193727816, 0.2868669919, 0.2929861154, 0.3443860081, 0.3524546727, 0.3692278555, 0.3713252941, 0.4546186044, 0.4611046062, 0.5089340796, 0.5249520334, 0.576387555, 0.5781025167, 0.5970328258, 0.6902281938, 0.7235048414, 0.7382820554, 0.7517435571, 0.9195761695, 0.9525860839, 1.0181539803, 1.0790276194, 1.2044016597, 1.5313698078, 1.5602943241, 1.6178133402, 1.700087376, 1.7328513725, 1.7452288823, 1.8776671464, 2.4559432641
 
#4
...and why 10 figures after the decimal mark... (SCNR)
Thanks for the reply. I've spoken to two mathematicians (admittedly late at night) and neither could immediately tell me how to approach this. So I've just worked this through from what I believe are first principles.

To answer your questions: there just happen to be 61 figures (I hope they are all there), and I've just copied and pasted them from the spreadsheet after sorting (I thought real numbers would be useful and I don't know how to stop it pasting all the digits).

The question I'm trying to answer is this: "Is the highest figure within what would be expected?"

After thinking about it overnight I believe the answer can be derived as follows:

There's about a 99% chance of any one value being less than 2.5 standard deviations above the mean. So the chance that all 61 are less than 2.5 SD would be (0.99)^61 ~= 54%. So the chance of at least one being higher than 2.5 SD in a sample of 61 = 1 - p(all being lower) = 1 - 54% = 46%.

So there's a 46% chance of having at least one higher in a normal distribution (and a ~70% chance of having one either higher or lower than 2.5 SD).
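
The arithmetic above can be checked with a few lines of Python. This is only a sketch under the same assumptions as the post: the 61 values are treated as independent draws from a normal distribution, and the one-sided tail beyond 2.5 SD is taken as roughly 1%. The function name `prob_at_least_one_beyond` is made up for illustration.

```python
def prob_at_least_one_beyond(n, tail_prob):
    """Chance that at least one of n independent values lands in a tail
    whose per-value probability is tail_prob."""
    return 1.0 - (1.0 - tail_prob) ** n

n = 61  # sample size quoted in the post

# One-sided: at least one value above +2.5 SD, using the post's ~1% figure.
p_high = prob_at_least_one_beyond(n, 0.01)

# Two-sided: at least one value beyond 2.5 SD in either direction (~2%).
p_either = prob_at_least_one_beyond(n, 0.02)

print(f"at least one above 2.5 SD:            {p_high:.3f}")
print(f"at least one beyond 2.5 SD either way: {p_either:.3f}")
```

With these inputs the two probabilities come out at roughly 46% and 71%. The exact one-sided normal tail at 2.5 SD is closer to 0.6% than 1%, so plugging that in would lower the first figure somewhat.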

Is this the right way to tackle it?
 

hlsmith

Not a robit
#5
Let's see if I can still do the basics:

Data is normally distributed per multiple tests (plus sample > 30) and you have an x-bar = 0.172197 and sigma = 0.922818

Not sure if you can just do this but here I go:

(2.46 - 0.172197) / 0.922818 = 2.479, meaning the value is 2.48 standard deviations above the mean. This places the number to the right of 99.3% of the area under the standard normal distribution. Does this mean it is not a part of the population? Well, it is the most extreme value by quite a bit, but anomalies happen all of the time. Does it seem implausible to you?
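
As a rough check of the z-score above, here is a small Python sketch. It assumes the data really are normal with the quoted mean and SD, and it also looks at the chance that the largest of 61 independent values would be this extreme, which is the question asked at the top of the thread. The variable and function names are illustrative only.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mean, sd = 0.172197, 0.922818  # figures quoted in this post
x_max = 2.46                   # rounded highest value
n = 61                         # sample size quoted in the thread

z = (x_max - mean) / sd
p_single = upper_tail(z)               # one pre-chosen value being this high
p_max = 1.0 - (1.0 - p_single) ** n    # the largest of n values being this high

print(f"z = {z:.3f}")
print(f"P(single value >= x_max) = {p_single:.4f}")
print(f"P(max of {n} >= x_max)    = {p_max:.3f}")
```

Under these assumptions the tail for a single value is well below 1%, but the chance that the largest of 61 values is at least this extreme works out to roughly a third, so the maximum on its own is weak evidence of an outlier.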

Others, feel free to correct my above approach if it is not correct, thanks!
 

Dason

Ambassador to the humans
#6
Data is normally distributed per multiple tests (plus sample > 30)
No. The data doesn't change. A large sample doesn't lead to it being normally distributed. The CLT gives us that the sampling distribution of the mean will be approximately normally distributed when we take a large enough sample but the data doesn't magically become normal as you add more data.
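
A small simulation can illustrate the distinction. This is a sketch, not a proof: the exponential distribution, the sample size of 100 and the seed are arbitrary choices made for the example, and skewness is used as a crude stand-in for "how non-normal" something looks (a normal distribution has skewness 0).

```python
import random

def skewness(values):
    """Sample skewness: mean cubed deviation divided by variance ** 1.5."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    return sum((v - m) ** 3 for v in values) / (n * var ** 1.5)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Raw data from a skewed (exponential) distribution: even with a huge
# sample, the data themselves stay skewed.
raw = [rng.expovariate(1.0) for _ in range(100_000)]

# Means of 2,000 separate samples of size 100 from the same distribution.
means = [sum(rng.expovariate(1.0) for _ in range(100)) / 100
         for _ in range(2_000)]

skew_raw = skewness(raw)
skew_means = skewness(means)
print(f"skewness of raw data:     {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

The raw data should keep a skewness near 2 however many points are drawn, while the sample means should be much closer to symmetric, which is the CLT statement about the sampling distribution of the mean rather than about the data.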
 

hlsmith

Not a robit
#7
True, thanks.

Yeah, what I was trying to get at was sampling variation. The underlying distribution can be normal, but with a small sample the data may not reflect the true distribution, whereas with larger (randomly selected) samples the observed data converge to the true distribution.
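
One way to see this convergence is to measure the largest gap between a sample's empirical CDF and the true CDF as the sample grows. The sketch below assumes a standard normal population; the sample sizes and seed are arbitrary, and `max_cdf_gap` is a hypothetical helper name (the quantity it returns is the Kolmogorov-Smirnov distance).

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def max_cdf_gap(sample):
    """Largest vertical gap between the empirical CDF of the sample
    and the standard normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs(f - i / n), abs((i + 1) / n - f))
    return gap

rng = random.Random(1)  # fixed seed so the run is repeatable
gaps = {n: max_cdf_gap([rng.gauss(0.0, 1.0) for _ in range(n)])
        for n in (61, 1_000, 100_000)}

for n, g in gaps.items():
    print(f"n = {n:>7}: max gap = {g:.4f}")
```

The gap should shrink as n grows, roughly in proportion to 1/sqrt(n), while a sample of 61 can still sit noticeably away from the true curve.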