Concept of P value

#1
Hi Guys

Recently I started working on improving my understanding of basic concepts in statistics and have read a few posts/books online about hypothesis testing and, more specifically, the p value. Different sources word it differently, but I am still really unclear about what the p value actually means.

I was wondering whether someone could explain this concept to me in layman's terms, please.

More specifically, I have trouble understanding the following sentence that I found in one of the sources: "the p value is the chance that someone may have gotten results as extreme as yours while the null hypothesis was still true".

I really don't understand what this sentence means. What extreme results is it referring to? How do I know whether results are extreme?

God help me !!1!!:yup:

Thanks Fellas
 

BGM

TS Contributor
#2
Let me try.

A hypothesis is a statement about unknown but fixed parameters. We can never know the true values of the parameters, but we do have observations of a random sample coming from a model related to these parameters. So we can make inferences about the parameters when we observe the sample.

One example is a location parameter, e.g. the mean of a normal distribution. Let's say the alternative hypothesis is that the mean is larger than a certain value. The key fact here is that you expect the observed sample values to be larger when the true parameter is larger. Therefore, a larger sample mean gives stronger evidence for the alternative hypothesis.

In the experiment you observe the random sample and compute the "observed test statistic". Using the example above, whether you reject the null hypothesis depends on how large your observed test statistic is. If it is large, then you will be more willing to reject the null hypothesis.

Now assume the null hypothesis is actually true. Under this assumption, imagine you could repeat the experiment; you would not be surprised to obtain a different result, because the test statistic, before it is realized, is still a random variable. The question is: what is the probability of observing a test statistic at least as extreme - i.e. at least as favourable to the alternative hypothesis, which in my example means at least as large as the actually observed test statistic? That is the p-value.
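
To make this concrete, here is a minimal simulation sketch of the idea above. Everything in it is assumed purely for illustration (the sample size, the known standard deviation, and the mean used to generate the "observed" sample are not from this post); it just shows that the p-value is the probability, under the null, of a test statistic at least as large as the one actually observed.

Code:
import numpy as np
from scipy import stats

# Illustrative one-sided test of a normal mean: H0: mu = 0 vs H1: mu > 0,
# with sigma assumed known. All numbers below are made up for the sketch.
rng = np.random.default_rng(0)
n, sigma = 25, 1.0

# One "observed" experiment (the true mean is hidden from the analyst)
observed_sample = rng.normal(loc=0.3, scale=sigma, size=n)
observed_z = observed_sample.mean() / (sigma / np.sqrt(n))  # observed test statistic

# Now assume H0 is true and "repeat the experiment" many times
reps = 100_000
null_z = rng.normal(loc=0.0, scale=sigma, size=(reps, n)).mean(axis=1) / (sigma / np.sqrt(n))

# p-value: probability, under H0, of a test statistic at least as extreme (here: as large)
p_sim = np.mean(null_z >= observed_z)
p_exact = stats.norm.sf(observed_z)  # same quantity from the normal distribution
print(f"observed z = {observed_z:.3f}, simulated p = {p_sim:.4f}, exact p = {p_exact:.4f}")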
 
#3
Thanks BGM - but I'm still confused. The p value is a conditional probability, but what is it telling us - the probability of what? And how does the word 'extreme' come into play?
 
#4
Here is another take on it: if the p value is, let's say, < 0.05, it is strong evidence against the null hypothesis and hence we reject the null hypothesis. The p value is the probability that an event happened by chance. Assuming that these two sentences are accurate, consider an example: a pizza chain says that on average their delivery time is 30 minutes or less (null). I believe that the delivery time is more than 30 minutes on average (alternative). So let's say after running the hypothesis test on the data gathered, I find the p value to be 0.001 - that means there is strong evidence against the null and hence we reject it, i.e. the delivery time is indeed not 30 minutes or less. But how do I connect the "probability that this event happened by chance" idea to this situation? Since the probability that this event (delivery time 30 minutes or less) happened by chance is very small (0.001), this is strong evidence that the delivery time is not chance but the truth. But then if it's the truth, why are we rejecting it? It doesn't make sense.
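
For concreteness, here is a minimal sketch of how the p value in this pizza example would actually be computed. The delivery times below are made up, and I'm using scipy's one-sample t-test with a one-sided alternative (the alternative= argument needs scipy 1.6 or later); it only illustrates the mechanics, not the interpretation question raised above.

Code:
import numpy as np
from scipy import stats

# H0: mean delivery time <= 30 minutes, H1: mean delivery time > 30 minutes.
# These delivery times are invented for illustration only.
delivery_times = np.array([31.2, 33.5, 29.8, 35.1, 32.4, 30.9, 34.2, 33.0,
                           31.7, 36.3, 32.8, 30.5, 34.9, 33.6, 31.1])

# One-sided, one-sample t-test against the null value of 30 minutes
t_stat, p_value = stats.ttest_1samp(delivery_times, popmean=30, alternative='greater')
print(f"mean = {delivery_times.mean():.1f} min, t = {t_stat:.2f}, one-sided p = {p_value:.4f}")

# A small p says: IF the true mean were 30, a sample mean this far above 30 would be unlikely.
# It does NOT give the probability that "delivery time is 30 minutes or less" is true.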
 

CowboyBear

Super Moderator
#5
The p value is the probability that an event happened by chance.
I know this is a common interpretation, but it's probably best avoided. The p value tells you the probability of observing the "event" by chance if the null hypothesis is true. It does not tell you the probability that the data that you have observed occurred because of "chance" (as opposed to because the null hypothesis is actually false).

Anyway, criticising is easy but explaining isn't, so here's my go: :)

----------------

Imagine we are interested in some parameter - say, a correlation.

We would like to know whether the value of this parameter in some population is zero or not. So we specify a couple of hypotheses about this parameter that we will test:
The null hypothesis: The parameter is exactly equal to zero in the population
The alternative hypothesis: The parameter is not equal to zero in the population

Although our interest is in the population, we don't have unlimited time and money, so can't get data from every member of the population. So we draw a sample from the population, and calculate a test statistic that is an estimate of the population parameter. For example, a Pearson product-moment correlation coefficient.

(Note: Chances are, even if the population parameter is exactly zero, our sample statistic would not be exactly zero, due to "chance" / sampling error).

We are then able to ask the following question:
IF the null hypothesis is actually true, what is the probability of observing a test statistic as far or further from zero, in our sample of data? This is the p value.

E.g., if the true value of the correlation between two variables in a population is actually zero, the probability of observing a sample correlation at least as far from zero as 0.2 (in either direction) in a sample of 30 people is 0.289.
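
(If you want to check that 0.289 figure, here is a minimal sketch using the standard t-transformation of a Pearson correlation under the null hypothesis of zero population correlation; nothing in it is specific to this thread beyond r = 0.2 and n = 30.)

Code:
import numpy as np
from scipy import stats

# Two-tailed p value for observing |r| >= 0.2 in a sample of n = 30
# when the population correlation is zero.
r, n = 0.2, 30
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)       # t statistic with n - 2 degrees of freedom
p_two_tailed = 2 * stats.t.sf(abs(t), df=n - 2)  # P(correlation as far or further from zero)
print(f"t = {t:.3f}, two-tailed p = {p_two_tailed:.3f}")  # about 0.289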

If the p value is "small" (usually the cutoff is 0.05) we say that we can reject the null hypothesis, and support the alternative hypothesis. If the p value is above 0.05, we cannot reject the null hypothesis. Note that a p value larger than the 0.05 cutoff is NOT evidence that the null hypothesis is true; it just means we haven't got enough evidence to reject it yet.

Essentially the logic here is that if the data would be unlikely if the null was true, we therefore think that the null hypothesis itself must be unlikely, and reject it. Formally, this is a logical fallacy known as the probabilistic modus tollens: just because a set of data would be unlikely if an hypothesis was true does not necessarily mean that the hypothesis itself is unlikely.

Now you may be left with questions such as:

-What good is a method that can only provide evidence in favour of one hypothesis, but not the other?
-A parameter such as a correlation could take any of an infinite number of values; why on earth would it be exactly zero? And if such an hypothesis isn't very plausible, why would we bother testing it?
-Why do we worry so much about whether a parameter is zero or not, instead of trying to figure out the most likely range of values for the parameter?
-Even if we did care about whether a parameter is zero or not, why don't we calculate the probability of the hypothesis given the data observed? Surely that's more interesting than the probability of the data observed given the hypothesis?
-Why is a statistical method that essentially relies on a simple logical fallacy so popular?

If you have concerns like this, you'd be in very good company.
 
#7
Dear CowboyBear,

Great job! And I agree with trinker: would you be so kind as to make this a FAQ?
Simply use the format I used in the other questions.

Thanks!

TE
 

Karabiner

TS Contributor
#8
-A parameter such as a correlation could take any of an infinite number of values; why on earth would it be exactly zero? And if such an hypothesis isn't very plausible, why would we bother testing it?
But we don't know the direction of the nonzeroness.

With kind regards

K.
 
#9
IF the null hypothesis is actually true, what is the probability of observing a test statistic as far or further from zero, in our sample of data? This is the p value.
Thanks CowboyBear - giving it further thought cleared things up. So the p value is the probability that our finding was due to chance error GIVEN that we are working on the assumption that the NULL hypothesis is true. And if the p value is high, it means that there is a higher probability that our finding is due to chance, and hence less evidence to reject the null hypothesis, so we say we FAIL to reject the null hypothesis. On the other hand, if the p value is low, we say that there is enough statistically significant evidence that our finding is not due to chance, and that is why we reject the null hypothesis in favor of the alternative hypothesis. Correct?

Also, the alternative hypothesis will essentially be the claim we are trying to test/prove. The null would be the opposite of it?

Thanks again
 

CowboyBear

Super Moderator
#10
Dear CowboyBear,

Great job! And I agree with trinker: would you be so kind as to make this a FAQ?
Thanks! Will do a bit later today - I will neaten up the formatting a bit.

But we don't know the direction of the nonzeroness.
Absolutely :) But let's say we actually just wish to determine the direction of an effect, and we're willing to say that an hypothesis of an exactly zero effect is implausible enough to disregard. Is a conventional 2-tailed significance test a good way to go about doing this?

So the p value is the probability that our finding was due to chance error GIVEN that we are working on the assumption that the NULL hypothesis is true.
Sounds good to me!

And if the p value is high, it means that there is a higher probability that our finding is due to chance
This isn't quite right. When you say "probability that our finding is due to chance", what you're really saying is "the probability that the null hypothesis is true". And that isn't what the test tells us (unfortunately). If the p value is high, it means that if the null hypothesis is true, then the finding we've come up with would not be too unlikely. (If by "finding" we mean a test statistic as or more extreme than that observed). So we don't reject the null hypothesis.

Also, the alternative hypothesis will essentially be the claim we are trying to test/prove. The null would be the opposite of it?
Kind of! Often the actual hypothesis implied by the theory the researcher is testing will be that the relationship or effect goes in a particular direction - e.g., that a correlation is positive rather than negative. So the hypothesis the researcher is actually trying to test will often not line up exactly with the default alternative hypothesis (that the relationship or effect is not zero, regardless of direction). And by extension the null hypothesis is not quite the opposite of the hypothesis actually implied by the researcher's theory.

This is not the only possible scenario though. In other cases the null hypothesis may actually be the hypothesis that the researcher favours. For example, a researcher doing structural equation modeling may test a null hypothesis that the covariance matrix implied by her model is exactly equal to the population covariance matrix. In this case she wants the p value to be greater than 0.05.
 

Karabiner

TS Contributor
#12
Originally Posted by Karabiner
But we don't know the direction of the nonzeroness.
Absolutely :) But let's say we actually just wish to determine the direction of an effect, and we're willing to say that an hypothesis of an exactly zero effect is implausible enough to disregard. Is a conventional 2-tailed significance test a good way to go about doing this?
I thought so. What would be the alternative?

With kind regards

K.
 
#13
FAQ'd

Suggestions for modifications are welcome :)
I would have liked it to be a little less "born-again Bayesian" ;-)... this is because I worry it may confuse people struggling with the concept of p-values: "If it's all so wrong then why on earth must I learn to do this?". However, ethically it is good that students learn the flaws in the methods early on and therefore I suggest you keep it as it is.

However, could you add the standard FAQ ending:

HTML:
[I]If you're still having trouble with this topic feel free to start a thread on the [URL="http://www.talkstats.com/index.php"]forum[/URL], and be sure [URL="http://www.talkstats.com/showthread.php/14960-Forum-Guidelines-Smart-posting-behavior-pays-off"]to check out our guidelines for efficient posting[/URL].[/I]
 

CowboyBear

Super Moderator
#14
I would have liked it to be a little less "born-again Bayesian" ;-)... this is because I worry it may confuse people struggling with the concept of p-values: "If it's all so wrong then why on earth must I learn to do this?".
Heh. It's a tricky one. I quite deliberately wanted to provide an explanation of a p value that is aimed at beginners and that notes the flaws in the system. Usually we only learn about the problems and misconceptions further down the track, and I'm not sure that's best - maybe it's one reason why there are so many misconceptions about p values floating around. But maybe I went too far the other way. The paragraph headed "Now you may be left with questions such as" could be deleted or softened, maybe?

It's not meant to be preaching the Bayesian message, btw ;) I think there are real problems with the specific form of significance testing that's dominant nowadays. But there are plenty of alternatives to it, some frequentist and some Bayesian, and it's an open question as to what the best replacement is (or whether there is a single best replacement)...

However, could you add the standard FAQ ending
Done :)
 

CowboyBear

Super Moderator
#15
I thought so. What would be the alternative?
Hi Karabiner, I think this is a really interesting issue.

So let's say we're interested in two competing hypotheses about a parameter:
H1: The parameter is positive
H2: The parameter is negative

Then the immediate problem with a 2-tailed null hypothesis significance test is that it conditions on an hypothesis (that the parameter is zero) that isn't one of the hypotheses we are testing. More practically, it doesn't give us a direct measure of comparative evidence for one hypothesis in comparison to the other. The upshot is that when we get a non-significant result, all we can say is "I don't know", when if we had a quantitative measure of evidence for one hypothesis we might well be able to say that one is actually quite a lot better supported than the other.

One alternative for this specific scenario could be to select a prior distribution, use Bayesian estimation, and then see what proportion of the posterior probability distribution falls within the positive range.
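
To make that concrete, here is a minimal sketch of the idea using a toy conjugate normal-normal model for an effect parameter. The prior, the "known" sigma, and the data are all assumed purely for illustration; a real analysis (e.g. of a correlation) would use an appropriate model and typically an MCMC sampler rather than a closed-form update.

Code:
import numpy as np
from scipy import stats

# Toy model: data ~ Normal(theta, sigma^2) with sigma known; prior theta ~ Normal(0, tau^2).
rng = np.random.default_rng(1)
sigma, tau, n = 1.0, 2.0, 20
data = rng.normal(loc=0.25, scale=sigma, size=n)  # invented observations

# Conjugate update: posterior for theta is Normal(post_mean, post_var)
post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
post_mean = post_var * (data.sum() / sigma**2)    # prior mean is 0, so no prior term

# Proportion of the posterior in the positive range = P(theta > 0 | data)
p_positive = stats.norm.sf(0, loc=post_mean, scale=np.sqrt(post_var))
print(f"posterior mean = {post_mean:.3f}, P(theta > 0 | data) = {p_positive:.3f}")

This gives a direct (model-dependent) probability that the effect is positive rather than negative, which is the comparative statement that H1 vs H2 asks for.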
 

Karabiner

TS Contributor
#16
The upshot is that when we get a non-significant result, all we can say is "I don't know",
That will happen if the population effect is small and/or the sample size is small. But will a Bayesian procedure produce more evidence under such circumstances? I could imagine that there have been direct comparisons / simulations / whatever between the approaches and their ability to produce useful results.
One alternative for this specific scenario could be to select a prior distribution, use Bayesian estimation, and then see what proportion of the posterior probability distribution falls within the positive range.
If this is more efficient than the frequentist hypothesis testing approach, then I hope that it will one day become standard practice.

With kind regards

K.
 

CowboyBear

Super Moderator
#17
That will happen if the population effect is small and/or the sample size is small. But will a Bayesian procedure produce more evidence under such circumstances?
I'm not quite sure if you could say that it produces 'more' evidence. But it would tell you what the evidence for each hypothesis actually is, instead of subjecting you to an arbitrary decision rule. :)
 

Karabiner

TS Contributor
#18
I'm not quite sure if you could say that it produces 'more' evidence. But it would tell you what the evidence for each hypothesis actually is, instead of subjecting you to an arbitrary decision rule. :)
I do not totally disagree, but I don't know if in some typical research situations the Bayesian approach is much less arbitrary. The smaller the sample size, the more the results depend on the prior distribution chosen?

Maybe one could say that, given the disasters within medicine, psychology, fMRI, genetics, etc., it would be better to honestly declare that a decision is not possible instead of presenting irreproducible "results".

With kind regards

K.
 
#19
A very quick analogy is with a jury trial. The prosecution presents evidence, perhaps disputed by the defense. The jury must then decide whether the defendant is guilty "beyond reasonable doubt." A p value is simply a measure of that doubt. At p = .08, the hypothesis is probably true (the defendant is probably guilty), but I've got reasonable doubt. At p = .000001, the hypothesis might be false (the defendant might possibly be innocent), but I do not have reasonable doubt.
 

CowboyBear

Super Moderator
#20
A very quick analogy is with a jury trial. The prosecution presents evidence, perhaps disputed by the defense. The jury must then decide whether the defendant is guilty "beyond reasonable doubt." A p value is simply a measure of that doubt. At p = .08, the hypothesis is probably true (the defendant is probably guilty), but I've got reasonable doubt. At p = .000001, the hypothesis might be false (the defendant might possibly be innocent), but I do not have reasonable doubt.
No, sorry, but this is completely wrong. A p value does not tell you the probability that the hypothesis is true. Please see the rest of the posts on this thread.