# Thread: Is it meaningful to calculate a p-value based on a convenience sample?

1. ## Is it meaningful to calculate a p-value based on a convenience sample?

Ignore the bootstrap as a possibility.

If the answer is no, then what is the point of stating the p-value (especially if it is, say, 0.10)?

2. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

It would depend on how biased the sample was or wasn't and whether you could speculate the directionality of the bias based on a validation sample using quantitative bias analysis.

3. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

I don't know if it is valid or not to calculate a p value. What is not valid is to say that a p value of a convenience sample tells you anything at all about a larger population. And usually that is what you are interested in.

People have a habit of confusing sampling with statistics like, say, p values. They really are very different topics.

4. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

Originally Posted by hlsmith
It would depend on how biased the sample was or wasn't and whether you could speculate the directionality of the bias based on a validation sample using quantitative bias analysis.
hl,

I'm assuming that there has been no validation sample and there is no information on bias. With that qualification, what do you think, and why?

5. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

Originally Posted by noetsi
What is not valid is to say that a p value of a convenience sample tells you anything at all about a larger population. And usually that is what you are interested in.
Can you clarify? Doesn't a p-value state the probability of getting a statistic more extreme than a certain value given the null hypothesis (assuming a classical model)? Perhaps I misunderstand your point.

6. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

Your interpretation of the p-value is correct, but I think Noetsi is referring to the generalizability of a conclusion based on the p-value to another sample (with a different sampling strategy) or to a larger population, with the possibility that they could have different attributes.

7. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

It depends what you're trying to use the p value for. p values aren't always used to make inferences about populations. They can also be used to make inferences about causal effects within a sample, with random assignment to conditions.

E.g., you might interpret a p value from a randomized experiment as the probability of obtaining a test statistic at least as extreme as that observed under a null hypothesis that the IV had no effect whatsoever on the sample (and with the auxiliary assumption of a lack of confounding).
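To make the within-sample interpretation concrete, here is a minimal sketch of a randomization (permutation) test. The function name `randomization_p_value` and the outcome numbers are made up for illustration; the only thing the p value relies on is the random assignment, not how the participants were recruited.

```python
import random
import statistics


def randomization_p_value(treated, control, n_shuffles=5000, seed=0):
    """Two-sided Monte Carlo randomization test for a difference in means.

    Under the null that the treatment had no effect on anyone in the sample,
    the group labels are arbitrary, so we reshuffle them and count how often
    the reshuffled difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(treated) - statistics.fmean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_treated]) - statistics.fmean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the Monte Carlo p value is never exactly zero.
    return (extreme + 1) / (n_shuffles + 1)


# Hypothetical outcomes from a convenience sample with random assignment.
treated = [12.1, 13.4, 11.8, 14.0, 12.9, 13.7]
control = [10.2, 11.1, 9.8, 10.9, 11.5, 10.4]
p_value = randomization_p_value(treated, control)
print(f"randomization p value: {p_value:.4f}")
```

Note that this says nothing about any larger population; it only addresses whether the assignment made a difference for the people who happened to be in the study.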

If you're using a p value to make inferences about a larger population, well, you can, but only if the assumptions of the statistical method you're using hold true. For example, let's say we're interested in testing a very simple hypothesis: That the mean of variable Y in population P takes a particular value (say, 10). Then a one-sample t-test allows us to test this hypothesis, and the p value will be "valid" provided that the following assumptions hold true:

1. Observations are independent
2. If we were to conduct repeated samplings, the sample means would have the same expected value as the population mean. (I.e., some sample means will be higher than the true population mean, and some lower, but in the long run over a very large* number of samplings, the average of the sample means would be the same as the population mean).

If you use random sampling, assumption (2) will be true (barring the presence of systematic measurement error). But with convenience sampling, there is no reason whatsoever to think that assumption (2) is true. In fact, it's barely even meaningful: What does it even mean to conduct repeated samplings, if the sampling method is not clearly defined?
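A rough simulation of the "repeated samplings" idea, as a sketch only. The population, the sample size of 30, and the convenience mechanism (units with larger Y are easier to reach) are all assumptions invented for the example; a real convenience sample rarely comes with a known mechanism, which is exactly the problem. The null hypothesis is true in both cases, so a valid test should reject about 5% of the time.

```python
import itertools
import math
import random
import statistics

T_CRIT_DF29 = 2.045  # two-sided 5% critical value of Student's t, 29 df
N = 30

rng = random.Random(0)
# Hypothetical population of Y values with mean close to 10.
population = [rng.gauss(10.0, 2.0) for _ in range(20000)]
true_mean = statistics.fmean(population)

# Assumed convenience mechanism: higher Y means more likely to be reached.
reach_weights = [math.exp(0.3 * (y - 10.0)) for y in population]
cum_weights = list(itertools.accumulate(reach_weights))


def random_sample():
    return rng.sample(population, N)


def convenience_sample():
    return rng.choices(population, cum_weights=cum_weights, k=N)


def rejects_null(sample, mu0):
    """One-sample t-test at the 5% level (equivalent to p < 0.05)."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return abs((statistics.fmean(sample) - mu0) / se) > T_CRIT_DF29


def rejection_rate(draw, mu0, reps=2000):
    return sum(rejects_null(draw(), mu0) for _ in range(reps)) / reps


# Test the TRUE population mean under both sampling schemes.
random_rate = rejection_rate(random_sample, true_mean)
convenience_rate = rejection_rate(convenience_sample, true_mean)
print(f"random sampling:      rejects a true null in {random_rate:.1%} of samples")
print(f"convenience sampling: rejects a true null in {convenience_rate:.1%} of samples")
```

Under random sampling the rejection rate sits near the nominal 5%; under the assumed convenience mechanism the same test rejects a true null most of the time, because the sample means no longer have the population mean as their expected value.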

TL;DR yes you can interpret p values based on convenience samples, but only by making assumptions, assumptions which are probably false and perhaps even meaningless in a convenience sample. But everyone does so anyway.

*Ok, infinite.

8. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

Originally Posted by CowboyBear
TL;DR yes you can interpret p values based on convenience samples, but only by making assumptions, assumptions which are probably false and perhaps even meaningless in a convenience sample. But everyone does so anyway.

COWBOY,

It appears that we agree as to the meaningless nature of a p-value derived from a convenience sample (in the absence of support for certain assumptions). How odd, that everyone does so anyway. As in, "Let's exercise precise reasoning based on unverifiable assumptions!" Whoo hoo!

9. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

I was referring to how generalizable your sample was to a larger sample (commonly a population). For a convenience sample you cannot generalize to anything outside the convenience sample. It's not that the p value is wrong, it's that the sample it's calculated on tells you nothing. So the p value tells you nothing (well, other than what the convenience sample tells you).

Personally I consider convenience samples worthless although they are commonly used. Systematic errors, for example people who care strongly about the topic are more likely to comment, are inherent in them.

10. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

Originally Posted by woodbomb
It appears that we agree as to the meaningless nature of a p-value derived from a convenience sample (in the absence of support for certain assumptions). How odd, that everyone does so anyway. As in, "Let's exercise precise reasoning based on unverifiable assumptions!" Whoo hoo!
Yeah, it's interesting. I suspect that a small part of the problem may be the fact that random sampling isn't explicitly an assumption of most statistical analyses, so it's easier to ignore. E.g., if we think of the assumptions of a linear model estimated via OLS:
1. Errors have conditional mean zero for any combination of values of the predictors
2. Errors have the same variance for any combination of values of the predictors
3. Error terms are independent
4. Error terms are normally distributed.

Random sampling doesn't appear anywhere on the list. But assumption 1 - by far the most important assumption - is almost certainly going to be breached if you don't have random sampling: It could well be the case that you systematically tend to select people who have positive errors for a particular combination of values of the predictors, or whatever. And then your estimates will be biased.
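Here is a small sketch of that selection problem. Everything in it is assumed for illustration: the true model (intercept 1, slope 2), the standard normal errors, and the selection rule (units with large x only make it into the sample if their error is positive). The helper `ols_slope` is just the textbook simple-regression formula.

```python
import random
import statistics


def ols_slope(xs, ys):
    """OLS slope for a simple linear regression of y on x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)
TRUE_SLOPE = 2.0

# Hypothetical population: y = 1 + 2x + e, with errors independent of x.
units = []
for _ in range(5000):
    x = rng.random()
    e = rng.gauss(0.0, 1.0)
    units.append((x, 1.0 + TRUE_SLOPE * x + e, e))

# Random sample: every unit has the same chance of selection.
random_units = rng.sample(units, 2000)

# Assumed convenience mechanism: for x > 0.5 we only reach units whose
# error is positive, so the errors no longer have conditional mean zero.
selected_units = [u for u in units if u[0] <= 0.5 or u[2] > 0]

slope_random = ols_slope([u[0] for u in random_units], [u[1] for u in random_units])
slope_selected = ols_slope([u[0] for u in selected_units], [u[1] for u in selected_units])
print(f"true slope: {TRUE_SLOPE}")
print(f"random sample slope:   {slope_random:.2f}")
print(f"selected sample slope: {slope_selected:.2f}")
```

The random sample recovers a slope near 2, while the selected sample overstates it noticeably, even though nothing about the model or the estimation method changed.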

But although people tend to have a hazy idea that random sampling is a good thing, that connection between sampling and assumptions isn't usually drawn explicitly in most texts, so it's easier to ignore.

On a different level, I think you can also look at significance testing as a kind of weird social practice. Gerd Gigerenzer calls significance testing "the null ritual", which makes a lot of sense to me. Most researchers aren't quite sure what a significance test actually tells them even when its assumptions are met, and wouldn't care about the answer even if they did; but they do significance tests anyway, because that's what you do when you do a statistical analysis. I.e., it's a ritual, not an investigation.

11. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

Random sampling, or generalizability, really has nothing to do with statistics per se. That is why it is not part of, say, the Gauss-Markov assumptions. All the assumptions address is whether the method estimates correctly on a given data set. They do not address (and statistics generally does not address) whether you can use a statistic calculated on one sample to draw conclusions about another sample or a population.

Sampling is a distinct field from statistics and has its own entirely separate set of assumptions. People tend to assume they are the same field when they are not.

12. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

Also for consideration is the size of the convenience sample given population, etc.

13. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

I am sure there is disagreement on this, but I don't think a convenience sample can ever be used to generalize to a larger population, regardless of n. Nor does having a larger n increase its usefulness (at least in theory). The key is how you sample, not how many you sample. If your sample is biased, having more biased people (in terms of estimating a true mean value, not that they have personal bias) does not make the situation better.
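A quick sketch of the "more n doesn't help" point. The numbers are assumptions for illustration: a population mean of 10, and a reachable subgroup whose mean is 11. The interval uses the normal approximation (1.96 standard errors).

```python
import random
import statistics

rng = random.Random(2)
POP_MEAN = 10.0        # hypothetical true population mean
REACHABLE_MEAN = 11.0  # hypothetical mean of the people we can conveniently reach


def convenience_sample(n):
    return [rng.gauss(REACHABLE_MEAN, 2.0) for _ in range(n)]


results = {}
for n in (50, 500, 5000):
    sample = convenience_sample(n)
    error = statistics.fmean(sample) - POP_MEAN
    half_width = 1.96 * statistics.stdev(sample) / n ** 0.5
    results[n] = (error, half_width)
    print(f"n={n:5d}  error in mean: {error:+.2f}  95% CI half-width: {half_width:.2f}")
```

The error in the estimated mean stays near the size of the bias however large n gets, while the interval keeps shrinking. A bigger biased sample just makes you more confident in the wrong answer.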

14. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

Equating a convenience sample with a biased sample is unfounded. It definitely has a tendency to be biased with regard to the population's characteristics. Remember you can generalize results to members of other samples that are comparable because they were collected with the same type of strategy. Also, data from convenience samples can provide good information and pvalues can be tested within them.

Say I was only able to sample wealthy people who work at a single type of business: does it hurt to examine these individuals and their traits?

15. ## Re: Is it meaningful to calculate a p-value based on a convenience sample?

Originally Posted by hlsmith
Also, data from convenience samples can provide good information and pvalues can be tested within them.
hl,

Not sure what you mean by "pvalues can be tested within them". A p-value can be used to decide whether to reject or accept a null hypothesis, if that's what you mean.

Do you disagree with me (and COWBOY, I think) that a p-value derived from a convenience sample (in the absence of support for certain assumptions) is meaningless (since there is no evidence that the data were obtained from a RANDOM sample)?
