I have a historical dataset of about 500 measurements. Each datapoint turned out to indicate either A or B, and for each of the two groups I computed the mean and standard deviation. A histogram of the measurements shows two partially overlapping Gaussian distributions. According to a t-test, the measurements that indicated A are significantly different from those that indicated B, with an extremely low p-value because of the large n.
I believe this tells me how accurately I can predict A or B from measurements. For future tests I cannot make 500 measurements; 5 to 10 is more realistic. They would all come from one sample that I know must indicate either A or B, but I don't know which. I would like to know how many measurements I need to make in order to predict whether that sample belongs to A or B with 95% confidence (p < 0.05).
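To make the question concrete, here is a rough sketch of one way it might be framed. It rests on assumptions that are not established above: both groups are Gaussian with the same standard deviation, the new measurements are independent, A and B are equally likely beforehand, and "95% confidence" is read as "at most a 5% chance of picking the wrong group". The rule is to average the n new measurements and pick whichever group mean is closer. The function names and the numbers in the example are placeholders, not values from my data.

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def error_rate(mu_a, mu_b, sigma, n):
    """Chance of picking the wrong group when the mean of n measurements
    is compared against the midpoint between mu_a and mu_b.

    Assumes both groups are Gaussian with the same sigma (an assumption).
    """
    d = abs(mu_a - mu_b) / sigma  # separation in units of sigma
    # The mean of n measurements has standard deviation sigma / sqrt(n),
    # so the midpoint lies d * sqrt(n) / 2 standard errors from the true mean.
    return NormalDist().cdf(-d * sqrt(n) / 2)


def measurements_needed(mu_a, mu_b, sigma, max_error=0.05):
    """Smallest n for which error_rate(...) is at most max_error."""
    d = abs(mu_a - mu_b) / sigma
    z = NormalDist().inv_cdf(1 - max_error)  # one-sided normal quantile
    return ceil((2 * z / d) ** 2)


# Placeholder values only: group means 10 and 12, shared std of 2.
n = measurements_needed(10.0, 12.0, 2.0)
print(n, error_rate(10.0, 12.0, 2.0, n))
```

If the two standard deviations differ noticeably, the midpoint rule is no longer the best threshold, so this would only be a starting point.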
How do I go about formulating this problem?
Please be kind... statistics is not my field... But I really need help! Thank you.