I found an old piece of code I wrote. I had a logistic model with a multiplicative interaction term consisting of two binary variables, plus model covariates. For simplicity, I will say the sample size was 200. The interaction term was approaching significance and made contextual sense.
I ran a bootstrap of a few thousand resamples and set the sampling rate to 300%, i.e., 600 observations per resample instead of the original 200. I then reran the model on each of the 3000 resamples and graphed all of the resulting interaction-term p-values. Next, I checked whether the bootstrap 95% CI excluded values > 0.05. I did all of this as a pseudo sample size calculation. I will mention that the resulting p-value distribution was not normally distributed. It was very positively skewed, with most values near 0 and a very few trailing off toward 1. Obviously, it was bounded. The mean of the distribution was 0.07 and the median 0.01.
I believe my rationale for this approach was that I had difficulty calculating the required sample size using more traditional approaches for a multiple logistic model. What are the critiques of this approach, and what are the alternatives?
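For readers who want to see the shape of this procedure, here is a minimal sketch in Python. It is not the original code: the logistic interaction model is swapped for a much simpler stand-in (a two-sided Wald z-test for a difference in proportions between two groups), and the data, the function names `two_prop_pvalue` and `oversampled_bootstrap_pvalues`, and the 1000 resamples are all assumptions chosen for illustration. Note that every resample is drawn from the observed data, so the observed effect is implicitly treated as the true effect.

```python
import math
import random
from statistics import NormalDist


def two_prop_pvalue(group_a, group_b):
    """Two-sided Wald z-test p-value for a difference in proportions."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        return 1.0  # a group is empty in this resample; no test possible
    p_a, p_b = sum(group_a) / n_a, sum(group_b) / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        return 1.0  # Wald statistic undefined; treated as non-significant here
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def oversampled_bootstrap_pvalues(rows, factor=3, n_boot=1000, seed=0):
    """Resample factor * len(rows) rows with replacement and re-test each time.

    rows: list of (group, outcome) pairs, both coded 0/1.
    """
    rng = random.Random(seed)
    size = factor * len(rows)
    p_values = []
    for _ in range(n_boot):
        sample = rng.choices(rows, k=size)  # sampling with replacement
        a = [y for g, y in sample if g == 0]
        b = [y for g, y in sample if g == 1]
        p_values.append(two_prop_pvalue(a, b))
    return p_values


# Hypothetical data: 200 rows, outcome rate 0.40 in group 0 and 0.52 in group 1.
rows = [(0, 1)] * 40 + [(0, 0)] * 60 + [(1, 1)] * 52 + [(1, 0)] * 48
p_values = oversampled_bootstrap_pvalues(rows)
```

With a real logistic model, the only change in structure would be replacing the stand-in test with a model fit that returns the interaction term's p-value.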
What exactly was your goal? And we would pretty much never expect a p-value distribution to be normally distributed.
I don't have emotions and sometimes that makes me very sad.
The goal was to see whether, if the sample size were three times larger, the SE would shrink and the same effect would become statistically significant. This is based on the idea that the SE should decrease with a larger sample while the effect size stays approximately the same.
I know the approach is flawed in some regard, but I need someone to point out the issues.
Some variables are not symmetric under the sampling distribution. Are p-values an issue since they are derived from another distribution and/or bounded?
Ah, the p-value is a random variable (which is conditional on the null hypothesis), which means it has a sampling distribution. I read that when the null is true it follows a uniform distribution; however, when the alternative is true it takes on a positively skewed distribution, like mine (see below). So what issues may arise from my approach in general? I am wondering about the sampling rate (e.g., 300%) and using a 95% interval on a non-normal distribution.
For fun, I log transformed the above distribution and got something close to normal. Though, as expected, that did not change my 2.5% and 97.5% estimates (when transforming back using e).
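The uniform-versus-skewed behaviour described here is easy to check by simulation. The sketch below is only an illustration under assumed settings: a one-sample, two-sided z-test with known standard deviation 1, n = 50 per simulated study, a true mean of 0.3 under the alternative, and a hypothetical helper named `simulate_pvalues`.

```python
import math
import random
from statistics import NormalDist, mean, median


def simulate_pvalues(true_mean, n=50, n_sims=2000, seed=1):
    """Two-sided one-sample z-test p-values (known sd = 1) for H0: mean = 0."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_sims):
        sample_mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)  # standard error is 1 / sqrt(n)
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


null_p = simulate_pvalues(true_mean=0.0)  # null is true
alt_p = simulate_pvalues(true_mean=0.3)   # alternative is true (assumed effect)

print("null: mean p =", round(mean(null_p), 3))
print("alt:  mean p =", round(mean(alt_p), 3), " median p =", round(median(alt_p), 3))
```

Under the null, the mean p-value should land near 0.5 and roughly 5% of values should fall below 0.05. Under the alternative, the median should sit well below the mean, which is the positive skew described above.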
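The reason the percentiles come back unchanged is that the log is a monotone transformation, so it preserves the ordering of the values and therefore the order statistics. A small sketch with made-up p-values, using a nearest-rank quantile (the helper `order_stat_quantile` is hypothetical):

```python
import math


def order_stat_quantile(values, q):
    """Quantile taken as an actual order statistic (nearest-rank method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Hypothetical, positively skewed p-values.
p_values = [0.001, 0.004, 0.01, 0.012, 0.03, 0.07, 0.2, 0.45, 0.9]

for q in (0.025, 0.975):
    direct = order_stat_quantile(p_values, q)
    via_log = math.exp(order_stat_quantile([math.log(p) for p in p_values], q))
    print(q, direct, via_log)
```

Quantile methods that interpolate between neighbouring order statistics can give slightly different values after back-transforming, because the interpolation happens on a different scale.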
I'm not entirely sure why you're worrying about quantiles here. Typically if you generate the p-value distribution for this reason you look at the estimated power. So you would want to look at what percentage of those p-values fall below your cutoff (0.05).
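In code, that estimate is just the proportion of generated p-values below the cutoff. A minimal sketch, with a hypothetical function name and made-up p-values:

```python
def estimated_power(p_values, alpha=0.05):
    """Share of p-values strictly below alpha."""
    p_values = list(p_values)
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(p < alpha for p in p_values) / len(p_values)


example = [0.001, 0.01, 0.03, 0.04, 0.2, 0.6]  # hypothetical p-values
print(estimated_power(example))  # 4 of the 6 values are below 0.05
```

This answers "how often would I reject at this sample size?", which the 2.5% and 97.5% quantiles of the p-value distribution do not address directly.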
EDITED: You told me this once before, 1.5 years ago - perhaps.
So I wrapped my head around this. Would I be concerned about Type I Error, and how do you address that?