Preface

The explanation below originally comes from this thread, with some modifications. It describes "conventional" null hypothesis significance testing: i.e. a hybrid of Fisherian and Neyman-Pearson testing, with 2-tailed p values used for inference.

What is a p value?

Imagine we are interested in some parameter: say, a correlation.

We would like to know whether the value of this parameter in a particular population is zero or not. So we specify a couple of hypotheses about this parameter that we will test:

**The null hypothesis**: The parameter is *exactly* equal to zero in the population.

**The alternative hypothesis**: The parameter is not equal to zero in the population.

Although our interest is in the population, we don't have unlimited time and money, so we can't get data from every member of the population. So we draw a sample from the population, and calculate a **test statistic** that is an *estimate* of the population parameter: for example, a Pearson product-moment correlation coefficient.

Now chances are, even if the population parameter is exactly zero, our sample statistic would *not* be exactly zero, due to "chance" (or, more specifically, sampling error). So we ask the following question:

If the null hypothesis is actually true, what is the probability of observing a test statistic as far from zero as, or further from zero than, the one observed in our sample of data?

**This is the p value.**

An example of a p value: if the true correlation between two variables in a population is actually zero, the probability of observing a correlation of 0.2 or greater in magnitude in a random sample of 30 people from that population (the p value) is 0.289.
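This number can be checked by simulation, which also makes the definition concrete: we repeatedly draw samples from a world in which the null hypothesis is true, and count how often the sample correlation comes out at least as extreme as the one observed. A minimal sketch using only Python's standard library (the function names here are illustrative, not from any particular package):

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def simulated_p(r_obs=0.2, n=30, sims=10000, seed=1):
    """Monte Carlo p value: the fraction of null-world samples whose
    correlation is at least as far from zero as r_obs (two-tailed)."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # independent, so true r = 0
        if abs(pearson_r(x, y)) >= r_obs:
            extreme += 1
    return extreme / sims

print(simulated_p())  # close to the analytic value of 0.289
```

Since 0.289 is well above the conventional 0.05 cutoff, a correlation of 0.2 in a sample of 30 people would not be declared statistically significant.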

If the p value is "small" (usually the cutoff or "alpha level" is **0.05**) we say that we can *reject* the null hypothesis. In turn, this means that we can support the alternative hypothesis. When this happens, we often describe the finding as "statistically significant". On the other hand, if the p value is *above* 0.05, we cannot reject the null hypothesis. Note that a p value larger than the 0.05 cutoff is **not** evidence that the null hypothesis is true; it just means we haven't got enough evidence to reject it yet.

Essentially the logic here is that if the

*data* would be unlikely if the null hypothesis were true, we conclude that the null hypothesis itself must be unlikely, and reject it. (This logic is admittedly questionable: see Gill, 1999.)

Appendix 1: Things that a p value is *not*:

- The probability that the null hypothesis is true
- The probability that the alternative hypothesis is false
- The probability that the finding will be replicated
- The probability that the finding was 'due to chance'
- The probability of incorrectly rejecting the null hypothesis (i.e. the probability of a 'Type I error')
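The last point trips many people up: the Type I error rate is fixed in advance by the alpha level, not by the p value. Under a true null hypothesis, a test run at alpha = 0.05 rejects about 5% of the time, whatever individual p values turn up. A stdlib-only sketch of this, assuming samples of n = 30 and using the (approximate) fact that the two-tailed 5% critical value for a Pearson correlation with 28 degrees of freedom is |r| ≈ 0.361:

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def type_i_error_rate(n=30, r_crit=0.361, sims=10000, seed=2):
    """Fraction of null-true experiments in which H0 is (wrongly) rejected.
    r_crit ~ 0.361 is the approximate two-tailed 5% critical value for n = 30."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # independent: H0 is true
        if abs(pearson_r(x, y)) >= r_crit:
            rejections += 1
    return rejections / sims

print(type_i_error_rate())  # close to alpha = 0.05
```

The long-run rejection rate sits near 0.05 by construction of the cutoff; no individual p value tells you this rate.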

Appendix 2: Some papers critically discussing the use of p values

- The earth is round (p < .05) - Cohen, 1994
- The insignificance of statistical significance testing - Johnson, 1999
- The null ritual - Gigerenzer, Krauss & Vitouch, 2004
- Scientific method: Statistical errors - Nuzzo, 2014
- The ASA's statement on p-Values: Context, process, and purpose - Wasserstein & Lazar, 2016
- Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations - Greenland et al., 2016

And for balance, a paper providing an epistemological justification for significance tests and showing how they can be used as a severe test of hypotheses:

Error statistics - Mayo & Spanos, 2011

*If you're still having trouble with this topic feel free to start a thread on the forum, and be sure to check out our guidelines for efficient posting.*