Computing how abnormal a data point is in a non-normal distribution

#1
Hi, folks. This is my first foray into posting on this forum, so apologies if there is a protocol I am unaware of.

Let's say I have a binomial data set: m data points with a value of 0 and m data points with a value of 1. I also have one data point (z) with a value of 0.5 (see below).

y                                      x
y                                      x
y                                      x
y                  z                   x
----------------------------------------
0                 0.5                  1

We want to recognize data point z as "maximally abnormal" (i.e. the most extreme outlier possible for the given distribution) because it is as far as it can be from both clusters.

How do we do this? In general, how do we take a data point and, without assuming a normal distribution, compute how "different" it is from the given distribution?

Thanks in advance.
 

Dason

Ambassador to the humans
#2
Your general question makes sense, but I have no idea what you're saying with your specific example. I don't understand the chart thing that you made, and if the data is binomial then it could only take integer values. Assuming you meant that the data was really either 0, 1, or 2 and you divided everything by 2 to get 0, 0.5, 1, the data still isn't very binomial, because you wouldn't see the pattern you describe in a binomial distribution. Could you describe your example some more? It's not clear to me what you're trying to say.
 
#3
OK, forget the notion of "binomial." More generally, my example is intended to ask: suppose I have a cluster of data points at one end of a spectrum and another cluster at the other end. How do I determine the uniqueness of a single data point somewhere in between?

Is this clearer?

Thanks,
B-
 
#4
Another way of putting this is: given sample s in population P, what kind of measures can be used to determine how "surprising" (or conversely, how "normal") s is given P?
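
To make the question concrete, here is a minimal sketch of one possible distribution-free measure: the distance from s to its k-th nearest neighbour in P. Nobody in this thread has proposed it; it is only an illustration. The function name `knn_distance_score`, the choice k=1, and the population of four 0s and four 1s (read off the chart in post #1) are all assumptions.

```python
def knn_distance_score(sample, population, k=1):
    """Return the distance from `sample` to its k-th nearest neighbour.

    A larger score means the sample sits further from the observed data,
    i.e. it is more "surprising". No distributional shape is assumed.
    """
    if not 1 <= k <= len(population):
        raise ValueError("k must be between 1 and len(population)")
    distances = sorted(abs(sample - other) for other in population)
    return distances[k - 1]


# Hypothetical population taken from the chart: four points at 0, four at 1.
population = [0.0] * 4 + [1.0] * 4

for s in (0.0, 0.25, 0.5, 1.0):
    print(s, knn_distance_score(s, population))
```

Under these assumptions the score is 0 for a sample sitting on either cluster and peaks at 0.5 for a sample exactly halfway between them, which matches the "maximally abnormal" intuition for z. Whether this is the right measure depends on the structure of the data.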
 

Dason

Ambassador to the humans
#5
Well, I'd say it really depends on the structure of your data. In your toy example at the top, the results you got wouldn't be surprising at all if the true probabilities were something like 4/9 for 0, 1/9 for 0.5, and 4/9 for 1.
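
The point about the 4/9, 1/9, 4/9 probabilities can be checked with a short calculation. The sketch below assumes the toy data set is nine independent draws (four 0s, one 0.5, four 1s, as read off the chart in post #1) and computes the multinomial probability of exactly those counts, plus the surprisal in bits of a single draw of 0.5. The helper name `multinomial_probability` is hypothetical.

```python
from fractions import Fraction
from math import factorial, log2


def multinomial_probability(counts, probs):
    """Exact probability of observing `counts` in sum(counts) independent draws."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # stays an exact integer at every step
    probability = Fraction(coefficient)
    for c, p in zip(counts, probs):
        probability *= Fraction(p) ** c
    return probability


# Assumed counts from the chart and the probabilities suggested above.
counts = [4, 1, 4]  # number of 0s, 0.5s, 1s
probs = [Fraction(4, 9), Fraction(1, 9), Fraction(4, 9)]

p_counts = multinomial_probability(counts, probs)
surprisal_bits = -log2(probs[1])  # information content of one draw of 0.5

print(float(p_counts))
print(surprisal_bits)
```

Under these assumptions the observed counts have probability of about 0.107, which is the single most likely outcome for nine draws from that distribution. A single draw of 0.5 carries about 3.17 bits of surprisal, so it is rare but not anomalous for that distribution.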