#1
I was speaking with a classmate about the subject heading of this thread, and it left me a bit confused. Here are my thoughts:
The way I explain the p value: the probability of observing an effect when, in reality, there is no effect. This translates to the probability of the data given the null (null = there is no effect!). I was thinking that this is the same as saying "the probability of a false positive". But then I was told that this is wrong, because the p value is NOT the probability of an error. This is where I am confused. The Type 1 error rate is the probability of a false positive, and alpha is the accepted maximum level of Type 1 error.

With this definition of the p value, P(data | null), let's say we set our cutoff alpha at .05. We observe p = .04, so we say there is a 4% probability of observing our data when there isn't an effect (the null is true). I can reword this to say "there is a 4% probability of a false positive", or "in 4 out of every 100 times I run the experiment, I will find an effect of this size or more, supporting my alternative hypothesis".

To me, saying "p = P(data | null)" is exactly the same as saying p is the probability of a false alarm (which by definition is noise incorrectly identified as signal). What is my misunderstanding?
 
#2
Your definition of the p value is incorrect and conflates it with alpha. The p-value is the probability of obtaining a result at least as extreme as the observed one, assuming a true null hypothesis. This tells you, in simple terms, how much your observation disagrees with the null hypothesis. As you can see, it mentions nothing about errors or about observing something "fake". It's a summary of how compatible the evidence is with a particular null hypothesis.
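
If it helps, here is a rough simulation sketch of that definition (the numbers are made up, and I'm assuming a simple two-sided z-test): the p-value is just the fraction of null-world replications whose test statistic comes out at least as extreme as the one actually observed.

import numpy as np

rng = np.random.default_rng(0)

observed_z = 2.05                          # made-up observed test statistic
null_z = rng.standard_normal(1_000_000)    # test statistics when the null is true

# Two-sided p-value: how often does a null-world result come out
# at least as extreme as the observed one?
p_value = np.mean(np.abs(null_z) >= abs(observed_z))
print(p_value)                             # roughly 0.04 for z = 2.05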

Using this definition, what do you think now?
 
#3
Hmm, I think it's starting to clear up, but I'm still uncertain about some things. So if we had p = .04, for example, would we state that there is a 96% chance that, if you replicated the study, you would observe an effect at least this large? OR, in other words (for simplicity), if we ran the study 100 times, we would expect to find at least this effect 96 times?

Regarding the error thing, I think that makes sense now, because we can't use p to say anything about an error, as it is simply a probability. So in order to say something about an "error" or "false positive", we need to declare a significance threshold to compare our p value against? In which case, if p exceeds the threshold, our evidence is deemed too weak to reject the null. So then my question is: when p doesn't exceed alpha, what can we conclude about type 1 error in this case, and how meaningful is this given that alpha is set arbitrarily? I am trying to understand at what point we start to consider type 1 errors.
 

Karabiner

TS Contributor
#4
So if we had p = .04, for example, would we state that there is a 96% chance that, if you replicated the study, you would observe an effect at least this large?
If H0 is true, then 96% of replications would have results less extreme.
If H0 is not true, then you do not know the percentage of replications this extreme (or less extreme).

Unfortunately, we do not know whether H0 is true (ok, usually it is not exactly true, I suppose), or, if it is not true, how strong the actual effect is.
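
As a rough simulation sketch of the first point (a made-up two-group study, with the null true by construction): an observed p of .04 means about 96% of null replications come out less extreme.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# Replicate a two-group study many times; both groups share one population,
# so H0 is true by construction and every p-value comes from the null world.
p_values = np.array([
    ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
    for _ in range(20_000)
])

# Fraction of null replications less extreme than an observed p of .04.
# If H0 were false, this fraction would depend on the unknown true effect size.
print(np.mean(p_values > 0.04))   # roughly 0.96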

when p doesn't exceed alpha, what can we conclude about type 1 error in this case, and how meaningful is this given that alpha is set arbitrarily?
If by "doesn't exceed" you mean (e.g.) p > 0.05, then there's no type 1 error possible, since H0 is retained.

With kind regards

Karabiner
 

hlsmith

Omega Contributor
#5
I will also add what I usually tack on:

"probability of obtaining a result at least as extreme as the observed one, assuming the null hypothesis is true" and correct model specification.

Almost all models are assumed to be incorrect, but they can still be informative in some regards. Unless you know all of the factors affecting the outcome and their structures and relationships, you are fitting a functional form that you think is representative.
 
#6
if we had p = .04, for example, would we state that there is a 4% chance that, if you replicated the study, you would observe an effect at least this large?
I mistyped this; it should read as quoted now.

By "doesn't exceed" I meant p < .05.

I think it's beginning to make sense. The p value is the probability of obtaining a result at least as extreme as the one detected, given that the null is true. It doesn't say anything about error because there is not yet an alpha threshold applied to it. But then, when we do apply a threshold, why can we suddenly start talking about type 1 errors?


I will also add what I usually tack on:

"probability of obtaining a result at least as extreme as the observed one, assuming the null hypothesis is true" and correct model specification.

Almost all models are assumed to be incorrect, but they can still be informative in some regards. Unless you know all of the factors affecting the outcome and their structures and relationships, you are fitting a functional form that you think is representative.
Ok, so we accept that we may be biased in some unknown way, or that we may be unaware of some factor that we should include in our model, or do I misunderstand?

I apologize for being slow with all this, and many thanks for the discussion!
 
#7
I will also add what I usually tack on:

"probability of obtaining a result at least as extreme as the observed one, assuming the null hypothesis is true" and correct model specification.

Almost all models are assumed to be incorrect, but they can still be informative in some regards. Unless you know all of the factors affecting the outcome and their structures and relationships, you are fitting a functional form that you think is representative.
I figure that a true null encompasses all the assumptions, including model specification? If I test b1 = 0 for a slope in SLR, I'm assuming that b1 is the slope of a model in which only x1 appears in the true model. But yeah, it's less obvious that your model needs to be correct, and it really won't be in most cases.
 

hlsmith

Omega Contributor
#8
Closer to what I am getting at, using your example: say the SLR model does not meet its assumptions, and the true relationship is not a straight line but has curvature.

So for you, imagine fitting a linear model to the association between alcohol and heart disease, or maybe all-cause mortality, which may be best represented by a "J"-shaped curve and a cubic spline model. Given that misspecification, it becomes questionable whether the true effect is identifiable.
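
A toy sketch of what I mean (made-up numbers, not the actual alcohol data): the true relationship below is J-shaped, but we fit a straight line anyway, so the slope test ends up answering a question about the wrong functional form.

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(2)

# Made-up data whose true relationship is J-shaped, not linear.
x = rng.uniform(0, 10, 500)
y = (x - 3) ** 2 + rng.normal(0, 2, 500)

# Fit a straight line anyway and test its slope.
fit = linregress(x, y)
print(fit.slope, fit.pvalue)

# The slope comes out "significant", but a single straight-line slope hides
# the fact that the effect runs downward for small x and upward for large x.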
 
#9
It doesn't say anything about error because there is not yet an alpha threshold applied to it. But then, when we do apply a threshold, why can we suddenly start talking about type 1 errors?
It doesn't say anything about an error at any time, pre or post alpha, by itself. The statistical theory behind a p-value has nothing to do with error rates.
Once you make a decision based on alpha and a p-value, you're either correct or incorrect. In Frequentist statistics it doesn't matter that you usually don't know which: the probability that this particular decision is an error is either 0 or 1, and the fact that you don't know whether you made an error doesn't make that probability something in between.

You should dissociate the concepts of p-value and type I errors, in general. Type I errors are associated with the distinct concept of alpha.
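
To make that separation concrete, a toy sketch (simulated studies with no real effect, so every rejection is a type I error): the error rate only shows up as the long-run behaviour of the "reject if p < alpha" rule, and it moves with alpha even though the p-values themselves never change.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Two-sided p-values from many studies in which the null is true.
z = rng.standard_normal(100_000)          # null-world test statistics
p_values = 2 * norm.sf(np.abs(z))         # each study's p-value

# The type I error rate is a property of the "reject if p < alpha" rule,
# so it tracks the chosen alpha rather than any individual p-value.
print(np.mean(p_values < 0.05))   # about 0.05
print(np.mean(p_values < 0.01))   # about 0.01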