The explanation below originally comes from this thread, with some modifications. It describes "conventional" null hypothesis significance testing: i.e. a hybrid of Fisherian and Neyman-Pearson testing, with 2-tailed p values used for inference.
What is a p value?
Imagine we are interested in some parameter: say, a correlation.
We would like to know whether the value of this parameter in a particular population is zero or not. So we specify a couple of hypotheses about this parameter that we will test:
- The null hypothesis: the parameter is exactly equal to zero in the population
- The alternative hypothesis: the parameter is not equal to zero in the population
Although our interest is in the population, we don't have unlimited time and money, so we can't get data from every member of the population. Instead, we draw a sample from the population and calculate a test statistic that is an estimate of the population parameter: for example, a Pearson product-moment correlation coefficient.
Now chances are, even if the population parameter is exactly zero, our sample statistic would not be exactly zero, due to "chance" (or, more specifically, sampling error). So we ask the following question:
If the null hypothesis is actually true, what is the probability of observing a test statistic as far from zero as, or further from zero than, the one observed in our sample of data? That probability is the p value.
An example of a p value: if the true value of the correlation between two variables in a population is actually zero, the probability of observing a sample correlation at least as far from zero as 0.2 (in either direction) in a random sample of 30 people from the population is 0.289. That is the p value.
If the p value is "small" (usually the cutoff or "alpha level" is 0.05) we say that we can reject the null hypothesis. In turn this means that we can support the alternative hypothesis. When this happens, we often describe the finding as "statistically significant". On the other hand, if the p value is above 0.05, we cannot reject the null hypothesis. Note that a p value larger than the 0.05 cutoff is not evidence that the null hypothesis is true; it just means we haven't got enough evidence to reject it yet.
Essentially the logic here is that if the data would be unlikely were the null hypothesis true, we conclude that the null hypothesis itself is probably false, and reject it. (This logic is admittedly questionable: see Gill, 1999).
Appendix 1: Things that a p value is not:
- The probability that the null hypothesis is true
- The probability that the alternative hypothesis is false
- The probability that the finding will be replicated
- The probability that the finding was 'due to chance'
- The probability of incorrectly rejecting the null hypothesis (i.e. the probability of a 'Type I error')
Appendix 2: Some papers critically discussing the use of p values
- The earth is round (p < .05) - Cohen, 1994
- The insignificance of statistical significance testing - Johnson, 1999
- The null ritual - Gigerenzer, Krauss & Vitouch, 2004
- Scientific method: Statistical errors - Nuzzo, 2014
- The ASA's statement on p-Values: Context, process, and purpose - Wasserstein & Lazar, 2016
- Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations - Greenland et al., 2016
And for balance, a paper providing an epistemological justification for significance tests and showing how they can be used as a severe test of hypotheses:
Error statistics - Mayo & Spanos, 2011
If you're still having trouble with this topic feel free to start a thread on the forum, and be sure to check out our guidelines for efficient posting.
Last edited by CowboyBear; 01-25-2017 at 05:00 PM. Reason: Fixed broken links, added article titles, added 3 new sources.