
finding non-randomness in binomial successes?

amarkb
01-02-2006, 02:10 PM
Hi,

I've only had one semester of Stats, so this is probably a real newbie question.

I'm having trouble deciding how to approach a specific probability problem. I'll describe the problem, what I've tried, and what I've got so far.

The question concerns categorical/binomial trials.

If I have:

n = number of trials
p = probability of success
q = 1-p = probability of failure

The probability of doing n trials with exactly x successes and (n-x) failures is
Pr(n,x) = n!/(x!*(n-x)!) * p^x * q^(n-x)
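As a minimal sketch (Python 3.8+ for `math.comb`; the function name `binom_pmf` is just an illustrative choice), the formula above translates directly:

```python
from math import comb

def binom_pmf(n: int, x: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials,
    each with success probability p: C(n, x) * p^x * q^(n - x)."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

# Exactly 6 successes in 10 trials with p = 0.5: 210 / 1024
print(binom_pmf(10, 6, 0.5))  # 0.205078125
```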

How can I determine the exact probability that a given number of successes out of a given number of trials indicates non-randomness? The equation above only gives me the probability of getting exactly x successes out of n trials.

What I would like to know is
1. If I actually get x out of n, what is the exact probability that these successes indicate non-randomness?
2. What is the required sample size to make the result in part 1 meaningful?

One approach I tried is null hypothesis testing for proportions.

Ex:
p = 50% - null hypothesis - randomness
x = number of successes
n = number of trials

z ~= ( x – np ) / ( squareRootOf ( np * (1-p) ) )

So, for example, with p = 0.50, x = 6, n = 10:

z ~= ( 60 – 100 * 0.50 ) / squareRootOf ( ( 10 * 0.50 ) * ( 1 – 0.50))
z ~= 1 / squareRootOf( 5 * 0.5)
z ~= 1 / squareRootOf(2.5)
z ~= 1 / 1.58113883 = 0.632455532
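The arithmetic (with numerator 6 - 10 * 0.50 = 1, as Mark notes later in the thread) can be checked with a short sketch. `z_score` is an illustrative name, and this is the usual normal approximation to the binomial, not an exact test:

```python
from math import sqrt

def z_score(x: int, n: int, p: float) -> float:
    """Normal-approximation z statistic for x successes in n trials
    under a null-hypothesis success probability p."""
    expected = n * p                  # np
    sd = sqrt(n * p * (1.0 - p))      # sqrt(np(1 - p))
    return (x - expected) / sd

print(z_score(6, 10, 0.5))  # about 0.6325
```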

The first thing I do is set the critical z to a value that lets my z fall in the rejection zone, thus rejecting the null hypothesis of a 50% chance probability. Looking this up in the z table, I see that a critical z of 0.63 (which is less than my z of 0.632) gives an area of 0.2357 between the mean and z on each side under the null hypothesis. So that would mean there is only a 47.14% (2*0.2357) chance that 6 correct out of 10 trials indicates non-randomness.
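A z table lists the area between the mean and z, which is where the 0.2357 comes from for z = 0.63. For comparison, here is a sketch that computes both that central area (between -z and +z) and the area in the two tails beyond |z|, which is the usual two-sided p-value under the normal approximation. It uses the standard normal CDF written with `math.erf`; the helper name is illustrative:

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

z = 0.63                                   # value read from the z table
central = 2.0 * normal_cdf(z) - 1.0        # area between -z and +z
two_tailed = 2.0 * (1.0 - normal_cdf(z))   # area beyond |z| in both tails

print(round(central, 4))     # about 0.4713
print(round(two_tailed, 4))  # about 0.5287
```

The two areas always add up to 1: the central part is where the null hypothesis would not be rejected, and the tails are the p-value.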

So... can anyone tell me whether this is the correct approach, or is it just a red herring?

Thanks,
Mark

JohnM
01-02-2006, 02:40 PM
Mark,

I think what you're really trying to determine is, for example, when you have 10 binomial trials and the probability of "success" is 50%, at what number of "successes" (or higher) do you get suspicious?

Looking at it a different way, you could ask the question:

If I have 10 binomial trials, each with a 50% chance of success, then what is the probability of getting x or more successes?

You can answer this by using the cumulative binomial distribution. The formula you used gives you the probability of one particular outcome (i.e., 6 out of 10, with p = 0.5), but with a cumulative distribution, you could determine the probability of getting 6 or more out of 10 (in other words, the probability of getting 6 or 7 or 8 or 9 or 10 out of 10).

All you need to do is compute the individual probabilities and add them up. If you have a large number of trials, you can use the Excel function =BINOMDIST. One of its arguments specifies whether you want the cumulative probability or just the probability of that particular outcome.
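Outside Excel, the "compute the individual probabilities and add them up" step is only a few lines. This sketch uses exact fractions so the result can be compared with a hand calculation; `upper_tail` is an illustrative name:

```python
from fractions import Fraction
from math import comb

def upper_tail(n: int, x: int, p: Fraction) -> Fraction:
    """P(X >= x) for X ~ Binomial(n, p): sum the exact probabilities
    of x, x + 1, ..., n successes."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(x, n + 1))

tail = upper_tail(10, 6, Fraction(1, 2))
print(tail, float(tail))  # 193/512 0.376953125
```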

Let me know if this helps,
John

amarkb
01-02-2006, 06:57 PM
Thanks John,

your answer is extremely helpful. Oddly, my textbook doesn't mention binomial cumulative distribution functions but I found lots of related tutorials on the net after reading your post. I've also got a couple of other statistics books on order.

I ran the numbers manually and with BINOMDIST() in Excel and OpenOffice. One thing to watch out for is that BINOMDIST() (with the cumulative argument set to TRUE) actually sums the probabilities of getting x or fewer successes out of 10, e.g. the probability of 6, 5, 4, 3, 2, 1 or 0 in this example. So you have to subtract the result from 1 and then add the probability for 6 back in.

The result is
1 - 0.82831 + 0.20508 = 0.37677. So if I understand things so far, this result tells me the probability that a person could get 6 or more out of 10 by chance alone. Now comes the question that will probably make everyone cringe! Since there is a 0.37677 probability this is chance, does that also mean that there is a (1 - 0.37677) or 0.62323 probability that this is not chance? Seems logical to me, but my intuition has a history of getting me in trouble with statistics!
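The complement trick can be checked exactly. With p = 0.5, the cumulative value P(X <= 6) is 848/1024 = 0.828125, so the 0.82831 above looks like a slip for 0.82813; with that value the sum becomes 1 - 0.82813 + 0.20508 = 0.37695, i.e. 386/1024. A sketch, with illustrative `pmf` and `cdf` helpers (the `cdf` plays the role of the cumulative mode of BINOMDIST):

```python
from fractions import Fraction
from math import comb

def pmf(n: int, k: int, p: Fraction) -> Fraction:
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def cdf(n: int, x: int, p: Fraction) -> Fraction:
    """Exact P(X <= x), the quantity a cumulative binomial function returns."""
    return sum(pmf(n, k, p) for k in range(0, x + 1))

n, x, p = 10, 6, Fraction(1, 2)
via_complement = 1 - cdf(n, x, p) + pmf(n, x, p)  # 1 - P(X <= 6) + P(X = 6)
simpler = 1 - cdf(n, x - 1, p)                    # same thing: 1 - P(X <= 5)

print(float(cdf(n, x, p)))    # 0.828125
print(float(via_complement))  # 0.376953125
```

Using 1 - P(X <= x - 1) directly avoids the subtract-then-add-back step.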

If I use the approach above to calculate that the probability of someone not getting 5 successes out of 5 trials by chance is 0.969, can I then claim with 96.9% confidence that this sequence of events had a non-random cause? Or is there a next step, something like developing a confidence interval?

Thanks,
Mark

By the way, I bungled the numbers in my first post. The line that reads
z ~= ( 60 – 100 * 0.50 ) / squareRootOf ( ( 10 * 0.50 ) * ( 1 – 0.50))
should read
z ~= ( 6 – 10 * 0.50 ) / squareRootOf ( ( 10 * 0.50 ) * ( 1 – 0.50))

JohnM
01-02-2006, 07:21 PM
Mark,

The .37677 is the probability of getting 6 or more successes by chance alone; in other words, how often that would happen over many, many repetitions of the experiment if only chance were at work. It's really not possible to judge, after the fact and just from the probability you calculate, whether it happened by chance (randomness) or through non-randomness. You have to use your judgment as to whether you believe it occurred by chance or because of some real effect.

In traditional hypothesis testing, you start with a null hypothesis, which is basically a statement that asserts the "absence of non-randomness." The alternative hypothesis, sometimes called the research hypothesis, is basically a statement that asserts the "presence of non-randomness." You then set the alpha level, which is basically the amount of risk you're willing to take in asserting "non-randomness" (the risk that your assertion is wrong).

Then you run an experiment, and if the probability of an outcome at least as extreme as the one you observed (assuming the null hypothesis is true) is less than or equal to your pre-set alpha level, you have evidence in support of non-randomness. But you really can't make a statement about the probability that the outcome is non-random.
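A minimal sketch of this procedure, assuming a one-sided alternative (more successes than chance would predict) and alpha = 0.05; both choices are illustrative, not prescribed. The p-value is the probability, under the null hypothesis, of a result at least as extreme as the one observed, and `critical_count` finds the smallest number of successes that would lead to rejection:

```python
from math import comb

def one_sided_p_value(x: int, n: int, p0: float = 0.5) -> float:
    """P(X >= x) under the null hypothesis X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

def critical_count(n: int, alpha: float, p0: float = 0.5) -> int:
    """Smallest success count whose one-sided p-value is <= alpha
    (returns n + 1 if no count qualifies)."""
    for x in range(n + 1):
        if one_sided_p_value(x, n, p0) <= alpha:
            return x
    return n + 1

alpha = 0.05
for x, n in [(6, 10), (5, 5)]:
    pv = one_sided_p_value(x, n)
    verdict = "reject null" if pv <= alpha else "do not reject null"
    print(f"{x}/{n}: p-value = {pv:.5f} -> {verdict}")

print(critical_count(10, alpha))  # 9
```

For 5 out of 5 the p-value is 1/32 = 0.03125, so the 0.969 in the earlier post is 1 minus this p-value. It says how unusual the result is under the null hypothesis, not the probability that the effect is real.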

In other words, you're basically "betting" that there is non-randomness in the experiment you observed. Obviously, the lower the probability of the outcome under the null hypothesis, the stronger the evidence for non-randomness....but after all, it's just one experiment....

Hope I didn't confuse you, but there's a limit to what you may conclude from the outcome of an experiment, which is only a sampling of an entire population...

John

amarkb
01-02-2006, 09:05 PM
John,

what you said isn't confusing at all. Actually, you say it much more clearly than my textbook does. I get what you're saying about drawing conclusions based on probability. I remember someone else in my class always saying “probability is not proof, it's just probability.”

One of the books I have on order is a stats book for the behavioral sciences. I'm guessing it will have a strong emphasis on hypothesis testing of experiments. Lots of interesting reading ahead.

Thanks again,
Mark

quark
01-02-2006, 09:16 PM
Since there is a 0.37677 probability this is chance, does that also mean that there is a (1 - 0.37677) or 0.62323 probability that this is not chance?

Mark,

I concur with John that it is not possible to judge randomness. The example below may help you understand. Suppose you have a box containing two red marbles and one blue marble, and you pick one marble at random from the box. In any single experiment, the probability of picking a red one is 2/3, and the process is random. This does not mean that the process would be non-random with a probability of 1/3. It will be random if you do it randomly.
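To illustrate that a probability describes the long run rather than any single draw, here is a small simulation sketch using Python's `random` module. It assumes the marble is put back after each draw; the seed and the numbers of draws are arbitrary choices:

```python
import random

def red_fraction(draws: int, seed: int = 1) -> float:
    """Draw one marble with replacement `draws` times from a box holding
    two red marbles and one blue marble; return the fraction of red draws."""
    rng = random.Random(seed)
    box = ["red", "red", "blue"]
    reds = sum(rng.choice(box) == "red" for _ in range(draws))
    return reds / draws

# Any single draw is just red or blue; only over many draws does the
# fraction of reds settle near 2/3.
for draws in (10, 1_000, 100_000):
    print(draws, red_fraction(draws))
```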

On the other hand, if you see a person picking marbles from the box and he tells you the outcome of 10 experiments, there is really not much information on whether the experiment was random or not.

Hope this helps.

amarkb
01-03-2006, 10:44 PM
Thanks Quark,

that does help, especially in helping me understand what my actual question is, which is probably a good first step to understanding the answers;)

Maybe the real question is “what exactly is the relationship between probability and randomness, if any?”

If you flip a coin, the outcome is random, with 50% probability for heads and 50% for tails. But let's say you have a toggle button with a light that goes on when you push it and goes off when you push it again; that is obviously not random.

Now, if the light is off and you ask me the probability of the light coming on with the next push, I would say 100%, based on my knowledge of the button's behavior and its current state. Actually, that wouldn't even be a probability; that would be a certainty.

But if you blindfold me so I don't know the current state, I can still say there is a 50% probability either way based just on the behavior. Or is it even correct to say that about a single sample? Can I only say that, if I pushed it many times, it would come on about 50% of the time over the course of many samples?

Now let's assume I know nothing about the behavior of the button or even its current state. If you ask me to push it several times in a row, I will quickly notice that it goes on – off – on – off – on – off... That would be a very good indicator that the behavior is probably not random. My textbook talks about the Wald-Wolfowitz one-sample runs test, which is based on this principle.
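The one-sample runs test counts runs (maximal blocks of identical consecutive outcomes) and compares that count with what a random ordering of the same outcomes would produce. Here is a sketch using the standard large-sample normal approximation for the number of runs (Python 3.9+ for the type hints). With only ten pushes the approximation is rough, and the names are illustrative:

```python
from math import sqrt

def runs_test_z(seq: list[str]) -> tuple[int, float]:
    """Wald-Wolfowitz runs test for a two-valued sequence.
    Returns (number of runs, z statistic via the normal approximation)."""
    values = sorted(set(seq))
    if len(values) != 2:
        raise ValueError("sequence must contain exactly two distinct values")
    n1 = seq.count(values[0])
    n2 = seq.count(values[1])
    n = n1 + n2
    # A new run starts at the first item and at every change of value.
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    return runs, (runs - mean) / sqrt(var)

pushes = ["on", "off"] * 5          # on - off - on - off ... (10 pushes)
runs, z = runs_test_z(pushes)
print(runs, round(z, 2))            # 10 2.68
```

A z of about 2.7 sits far out in the upper tail: there are too many runs to be comfortable with chance, which matches the intuition that strict alternation is not random.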

However, if someone else pushed the button many times and gave me the results without any information about the order in which the samples were acquired, all I would know is that they got ons about 50% of the time and offs about 50% of the time. There's nothing in the data that would tell me anything about randomness or non-randomness.
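The point about losing the ordering can be made concrete with two hypothetical push records that have the same number of ons and offs. Their counts are identical, so any count-based statistic treats them the same, while an order-based statistic such as the number of runs separates them. A sketch:

```python
from collections import Counter

def count_runs(seq: list[str]) -> int:
    """Number of maximal blocks of identical consecutive outcomes."""
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

alternating = ["on", "off"] * 5           # strict toggle behaviour
clustered = ["on"] * 5 + ["off"] * 5      # same totals, different order

print(Counter(alternating) == Counter(clustered))        # True
print(count_runs(alternating), count_runs(clustered))    # 10 2
```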

Mark

JohnM
01-04-2006, 07:04 AM
Maybe the real question is “what exactly is the relationship between probability and randomness, if any?”

Probability helps you explain the outcome of random events over the long run.

But if you blindfold me so I don't know the current state, I can still say there is a 50% probability either way based just on the behavior. Or is it even correct to say that about a single sample? Can I only say that, if I pushed it many times, it would come on about 50% of the time over the course of many samples?

Without knowing the on-off process, the first "guess" is just like a coin flip, but after that, there's no randomness at all.