Hi all, I hope all is well with everyone. I'm new here, so apologies for getting straight down to my query; I hope someone can point me in the right direction. Thank you.
So:
I have a prospect direct mail file and I'm testing a new message (one change). To ensure that it is the message change, and not differences between the files, that drives my results, I want to make sure the test and control files are similar (not significantly different from each other) on my key variable(s).
The test and control files will be split 50:50 at random, using a random number generator, from the total prospect file, which could range from 200 to 5,000 records.
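To make the split concrete, here is a minimal sketch of what I have in mind, in Python. The function name `random_split`, the seed, and the list of record IDs are placeholders I made up for illustration, not my real file.

```python
import random

def random_split(records, seed=None):
    """Shuffle a copy of the records and cut it in half: returns (test, control)."""
    rng = random.Random(seed)      # a fixed seed makes the split reproducible
    shuffled = list(records)       # copy, so the original file order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical prospect file: just record IDs 1..1000
prospects = list(range(1, 1001))
test_group, control_group = random_split(prospects, seed=42)
```

With an odd number of records the control group ends up one record larger.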
I believe I have two independent samples, and I want to test the difference in their means using a two-tailed test (H0: μ1 = μ2, H1: μ1 ≠ μ2). With sample sizes > 30, I plan to use a Z-test.
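In case it helps, this is roughly the check I would run, sketched in Python with the standard library only. It assumes the usual large-sample form of the test, with the standard error built from the two sample variances; `two_sample_z_test` and the simulated "key variable" values are made up for illustration.

```python
import math
import random
from statistics import NormalDist, mean, variance

def two_sample_z_test(x1, x2):
    """Two-tailed z-test for a difference in means of two independent samples.

    Returns (z, p_value). Uses sample variances, so it relies on both
    samples being large enough for the normal approximation.
    """
    n1, n2 = len(x1), len(x2)
    se = math.sqrt(variance(x1) / n1 + variance(x2) / n2)  # standard error of the difference
    z = (mean(x1) - mean(x2)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))           # two-tailed
    return z, p_value

# Made-up key variable (e.g. an age-like value) for 200 test and 200 control records
rng = random.Random(0)
test_values = [rng.gauss(45, 12) for _ in range(200)]
control_values = [rng.gauss(45, 12) for _ in range(200)]

z, p = two_sample_z_test(test_values, control_values)
print(f"z = {z:.3f}, p = {p:.3f}")
```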
Q1: How do I choose an acceptable critical value? How close counts as ‘similar’, and how much difference is acceptable?
As long as the p-value is greater than 0.05 (i.e. testing at the 95% confidence level), is that enough to accept H0, that there is no difference between the two means? Or, for this particular check, would a 90% level or lower be better, since files that pass would have closer means and so be less likely to differ? Thoughts?
Q2: With that in mind, do I only need to make sure the files are not significantly different from each other, or is there merit in forcing them to be as similar as possible (e.g. sorting on the key variable and selecting every 2nd record as the test 50%)? Does introducing such bias render the overall test meaningless, or would it still be valid?
Q3: Would stratified sampling be a better solution, maintaining the proportions within my key variables?
Many thanks in advance.
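By stratified sampling I mean something like the sketch below: group the records by the key variable, then split each group 50:50 at random. This is only my understanding of the idea; `stratified_split`, the `region` field, and the example records are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(records, key, seed=None):
    """Split records 50:50 at random within each stratum: returns (test, control)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)   # group records by their stratum value

    test, control = [], []
    for group in strata.values():
        rng.shuffle(group)
        half = len(group) // 2
        # Odd-sized stratum: give the extra record to a randomly chosen side
        if len(group) % 2 and rng.random() < 0.5:
            half += 1
        test.extend(group[:half])
        control.extend(group[half:])
    return test, control

# Hypothetical prospects: 100 in region "A", 60 in region "B"
prospects = [{"id": i, "region": "A" if i < 100 else "B"} for i in range(160)]
test_group, control_group = stratified_split(prospects, key=lambda r: r["region"], seed=1)
```

Each region then appears in the same proportion in both files, up to one record per stratum when a stratum has an odd count.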
Gary