Comparison of two data sets

Dafrst

New Member
Hello,
I'm trying to compare two normally distributed data sets that have roughly equivalent standard deviations (they differ by at most a factor of 2). I'm using a two-sample t-test to check for a statistical difference between these populations, but the p-value being returned is often on the order of 10^-200 or smaller. Each data set has 300 elements, so there are 598 degrees of freedom. I've read that large data sets can lead to a Type I error, where a very small difference in some parameter shows up as highly significant. Can anyone suggest a way to show a statistical difference between two large populations that avoids such errors?
Thanks,
Dan

Miner

TS Contributor
You are confusing statistical significance with practical significance. Given a large enough data set, ANY nonzero difference, however small, will be statistically significant. What you want is to design a study in which a statistically significant result is also a practically significant one. The standard way to do that is to start by establishing delta. Delta is the smallest difference between the two population means that is of practical significance.

What is practical significance? It is the minimum difference that would generate enough interest that someone (other than the government) is willing to fund additional research. For example, a new therapy might reduce cancer by 0.004%. Ho hum. On the other hand, another might reduce it by 30%. Okay, NOW I'm interested!
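To see why a large enough data set makes any nonzero difference significant, here is a rough Python sketch. It assumes two groups of equal size n with a common, known standard deviation, and uses the normal approximation instead of the exact t distribution. The function names and the example numbers (a difference of 0.1 with a standard deviation of 1) are made up for illustration and are not taken from the data in this thread.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(diff, sigma, n):
    """Test statistic for a mean difference `diff` between two groups of
    size n each, assuming a common known standard deviation `sigma`."""
    standard_error = sigma * sqrt(2.0 / n)
    return diff / standard_error


def two_sided_p(z):
    """Two-sided p-value under the standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical fixed difference: 0.1 standard deviations.
# The statistic grows like sqrt(n), so the p-value keeps shrinking.
for n in (30, 300, 3000, 30000):
    z = two_sample_z(diff=0.1, sigma=1.0, n=n)
    print(n, round(z, 2), two_sided_p(z))
```

The difference itself never changes in this loop; only n does. That is why a tiny p-value by itself says nothing about whether the difference matters.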

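Once you have determined delta, establish alpha, power and the baseline standard deviation, then calculate the required sample size. With that sample size the test has the chosen power to detect a difference of delta, and it is much less likely to flag differences far smaller than delta, so statistical significance lines up with practical significance.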
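The sample size step can be sketched in Python as follows. This is only an approximation: it uses the common normal-approximation formula n = 2 * ((z_alpha + z_power) * sigma / delta)^2 per group for a two-sided, two-sample comparison with equal group sizes and a common standard deviation. The function name and the defaults (alpha = 0.05, power = 0.80) are my own choices for illustration. An exact t-based calculation usually gives a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect a mean
    difference `delta` with a two-sided test at level `alpha`,
    assuming a common standard deviation `sigma` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value
    z_power = NormalDist().inv_cdf(power)              # quantile for power
    return ceil(2.0 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical inputs: delta is half a standard deviation.
print(n_per_group(delta=0.5, sigma=1.0))  # 63
```

Note that the required n scales with (sigma / delta)^2, so halving delta roughly quadruples the sample size. If the data are already collected, as in the original question, the same delta can still be used as the yardstick for judging whether the observed difference is large enough to care about.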