Statistical comparisons for large sample sizes (n>1000)

I am comparing drug exposures between two groups, each consisting of 1000 simulated drug exposures. Drug exposures are continuous variables following a normal distribution.
I want to know whether different doses yield a statistically significant difference in mean drug exposure between the two groups. I observe that even if I "artificially" calibrate the doses to produce very similar mean exposures in both groups, every statistical test returns very low p-values despite the very small difference between the group means. I guess this is due to the large sample size (n = 1000 per group).
However, if I reduce the sample size (to, say, 50 virtual drug exposures), the mean exposure becomes very sensitive to the sampling procedure, because the samples are drawn from a distribution whose standard deviation is large relative to its mean. Repeating the same analysis on different datasets can give very different mean exposures.
Is this a case where I should focus on the "biological relevance" of the difference rather than on its statistical significance? Can you suggest a different approach for judging the relevance of the difference based on robust criteria?


Hi Javier,

You are heading in the right direction.
You should never look only at the p-value.

You wrote: "I want to know if different doses yield a statistically significant difference in mean drug exposure across the two groups."
This is not really what you want to know :)
The significance result is only part of the equation.

I don't know your area of research, but probably different doses produce different drug exposures that approach a constant (an asymptote).
So increasing the dose will increase the drug exposure, but each increase will be smaller and smaller, tending to zero.
So, at least up to a certain dose, the question is not whether the difference is statistically significant: the groups are different, and with a large enough sample size you will detect it.
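To make the sample-size point concrete, here is a minimal Python sketch (standard library only). It assumes the observed standardized difference equals a fixed true effect d and uses a normal approximation to a two-sample test, so it shows the expected p-value rather than what any particular dataset would give. The function name approx_p_value is hypothetical.

```python
import math

def approx_p_value(d, n_per_group):
    """Approximate two-sided p-value of a two-sample z-test, assuming the
    observed standardized difference equals d and both groups have
    n_per_group subjects (normal approximation)."""
    # Expected test statistic: z = d / sqrt(1/n + 1/n) = d * sqrt(n / 2)
    z = abs(d) * math.sqrt(n_per_group / 2)
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(z / math.sqrt(2))

# Same (small) effect size, increasing sample size
for n in (50, 1000, 10000):
    print(n, approx_p_value(0.1, n))
```

With d = 0.1 the approximate p-value drops from about 0.62 at 50 per group to about 0.025 at 1000 per group, although the effect itself has not changed at all.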

The solution is not to use a smaller sample size.
Usually, when a larger sample costs more, you choose the smallest sample size that is able to detect the required effect size.
But if you already have a large sample, you should use it.
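A rough sketch of the usual sample-size calculation, assuming a two-sided comparison of two means with equal group sizes and a normal approximation (an exact t-test calculation gives slightly larger numbers, e.g. 64 instead of 63 per group for d = 0.5). n_per_group is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a standardized
    effect size d with a two-sided, two-sample comparison of means."""
    std_normal = NormalDist()  # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    # Normal-approximation formula: n = 2 * (z_alpha + z_beta)^2 / d^2
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.5), n_per_group(0.2), n_per_group(0.1))
```

With alpha = 0.05 and 80% power this gives 63, 393 and 1570 subjects per group for d = 0.5, 0.2 and 0.1, so 1000 per group is more than enough to detect all but very small effects.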

The question is what the effect size is, and whether that effect is significant.
Please look at the standardized effect size, Cohen's d = (mean(x1) - mean(x2)) / S, where S is the pooled standard deviation.
Cohen also proposed verbal labels for the values of d (small, medium, large; later extended by Sawilowsky to range from "very small" to "huge"), but I assume what counts as relevant also depends on the area of research.
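A minimal sketch of Cohen's d with the pooled standard deviation, plus a rule-of-thumb labelling function. The cut-offs (0.2, 0.5, 0.8 from Cohen; 0.01, 1.2, 2.0 from Sawilowsky's extension) are conventions, and the "negligible" label below 0.01 is my own assumption; the thresholds that matter in practice are field-specific.

```python
import math
from statistics import mean, variance

def cohens_d(x1, x2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    # Pooled SD from the two sample variances (n - 1 denominators)
    s_pooled = math.sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2))
                         / (n1 + n2 - 2))
    return (mean(x1) - mean(x2)) / s_pooled

def label(d):
    """Rule-of-thumb verbal label for |d| (conventional cut-offs;
    'negligible' below 0.01 is an assumption)."""
    thresholds = [(2.0, "huge"), (1.2, "very large"), (0.8, "large"),
                  (0.5, "medium"), (0.2, "small"), (0.01, "very small")]
    for cutoff, name in thresholds:
        if abs(d) >= cutoff:
            return name
    return "negligible"

d = cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(d, 3), label(d))
```

Unlike the p-value, d does not shrink or grow systematically with n; a larger sample only estimates it more precisely.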

You may also decide yourself which unstandardized effect size (mean(x1) - mean(x2)) is too small to be meaningful, even if the result is statistically significant.
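One way to act on such a threshold, sketched under assumptions: compute a confidence interval for the raw difference in means and compare it with a relevance margin chosen on pharmacological grounds. The margin, the normal-approximation (Welch-type) interval, which is reasonable for n = 1000 per group, and the helper names mean_diff_ci and judge are all hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def mean_diff_ci(x1, x2, conf=0.95):
    """Difference in means with a normal-approximation confidence interval
    (unequal variances allowed)."""
    diff = mean(x1) - mean(x2)
    se = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return diff, diff - z * se, diff + z * se

def judge(x1, x2, margin, conf=0.95):
    """Compare the CI of the difference with a user-chosen relevance margin."""
    diff, lo, hi = mean_diff_ci(x1, x2, conf)
    if -margin < lo and hi < margin:
        return "negligible"    # whole CI inside +/- margin
    if lo > margin or hi < -margin:
        return "relevant"      # whole CI beyond the margin
    return "inconclusive"      # CI straddles the margin

print(judge([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], margin=5.0))
```

If the whole interval lies inside plus or minus the margin, the difference can be called negligible even when the p-value is tiny. This is closely related to equivalence testing (the TOST procedure at the 5% level corresponds to a 90% interval).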

You may look at the following example for the balance between p-value and effect size
For any trial, a sample size calculation determines the sample size that achieves the pre-specified power and type I error. Apparently n = 1000 per exposure group is too large, even for an observational study. Once the minimum sample size is reached, the more subjects are enrolled, the smaller the p-value tends to be. For a simulation study, you can choose an appropriate sample size for a pre-specified effect size (such as an odds ratio, risk ratio or difference in proportions), say 50 to 100 subjects per group, and then run the simulation 1000 times.
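A sketch of that simulation approach, assuming a standardized mean difference d for a continuous exposure (odds ratios and risk ratios apply to binary outcomes), normally distributed data with SD 1, and a two-sided z-test. simulated_power is a hypothetical name. With n = 64 per group and d = 0.5 the rejection rate should come out near 0.8, the nominal power for that design.

```python
import random
from statistics import NormalDist, mean, variance

def simulated_power(n, d, reps=1000, alpha=0.05, seed=0):
    """Fraction of simulated studies (n subjects per group) in which a
    two-sided z-test on the difference in means rejects H0 at level alpha."""
    rng = random.Random(seed)  # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # Hypothetical exposures: group 1 ~ N(d, 1), group 2 ~ N(0, 1)
        x1 = [rng.gauss(d, 1.0) for _ in range(n)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        se = (variance(x1) / n + variance(x2) / n) ** 0.5
        if abs(mean(x1) - mean(x2)) / se > z_crit:
            rejections += 1
    return rejections / reps

print(simulated_power(64, 0.5))
```

Collecting the 1000 simulated mean differences as well would show how much the estimate varies at a given n, which is the instability observed at n = 50.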