How many measurements are needed to distinguish between two outcomes?

I have a historical dataset of about 500 measurements. Each measurement turned out to indicate either A or B, and for each group I computed the mean and standard deviation. If I plot a histogram of these measurements, I see two partially overlapping Gaussian distributions. The measurements that indicated A are significantly different from those that indicated B by a t-test, with an extremely low p-value, owing to the large n.

I believe this tells me the accuracy with which I can predict A or B from measurements. For future tests, I cannot make 500 measurements; more like 5 to 10, from a sample that I know must indicate either A or B, but I don't know which. I would like to know how many measurements I need to make to predict whether those samples were taken from A or B, at 95% confidence (p < 0.05).

How do I go about formulating this problem?
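One way I have tried to sketch the formulation myself (with made-up means and standard deviation, not my real numbers): if each group's measurements are roughly normal with means mu_A, mu_B and a common sd sigma, then the average of n future measurements has sd sigma/sqrt(n), and I could classify by which side of the midpoint between the two means that average falls. The misclassification probability then shrinks as n grows:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def misclassification_prob(mu_a, mu_b, sigma, n):
    """P(the mean of n draws from group A lands on B's side of the
    midpoint threshold), assuming normality and a common sigma."""
    delta = abs(mu_b - mu_a)
    return norm_cdf(-math.sqrt(n) * delta / (2 * sigma))

def smallest_n(mu_a, mu_b, sigma, alpha=0.05, n_max=1000):
    """Smallest n whose misclassification probability drops below alpha."""
    for n in range(1, n_max + 1):
        if misclassification_prob(mu_a, mu_b, sigma, n) < alpha:
            return n
    return None

# Illustrative numbers only (means 10 vs 12, common sd 2):
print(smallest_n(10, 12, 2))
```

Is something like this the right way to think about it, or is power analysis the proper framing?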

Please be kind... statistics is not my field... But I really need help! Thank you.

Just to clarify, you have a:

Sample of 500 observations.
Each of the 500 observations can be classified as "A" or "B".
You stratified the sample into the two groups.
Then you compared the means of a different (continuous) variable between the two groups and found a significant difference.
So now you are thinking that the value of this continuous variable may provide information about whether an observation is an "A" or a "B"?

Is this correct? And you want to know, if you acquire subsequent small samples, how to tell whether they came from "A" or "B"? Can a sample contain some As and some Bs?

In my mind, I have been assuming that this post is related to your prior post, since you are using a sample of 500. Do you just want to see whether a value of the continuous variable predicts the outcome (A or B)? If so, this may get back to looking for a threshold for the continuous variable. Have you checked that the continuous variable is normally distributed and suitable for a t-test?
This post is related to my other question.
I was trying to simplify my question somewhat: basically, my method just needs to distinguish between maximum disease (for any given experiment, the control group that didn't receive treatment) and at least a 20% effect (a drop of at least 20% in the disease score).
I plotted the historical data that could easily be framed this way (not all experiments were designed this way) and asked: what scores did the new method give to samples whose gold-standard score corresponded to maximum disease, versus at least a 20% effect? This gave me two Gaussian distributions of new-method scores. They overlap quite a bit, but they are "normal" in appearance. I fitted a normal distribution to each in MATLAB and the fit looks good, maybe with a bit of a tail to the right.
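Regarding the "looks normal, maybe a right tail" part, here is the kind of formal check I could run (sketched in Python/NumPy rather than MATLAB, on simulated stand-in scores, not the real data): a Shapiro-Wilk test plus the sample skewness, which would pick up that right tail.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Stand-ins for the two groups of new-method scores (hypothetical values):
scores_a = rng.normal(50, 10, 250)   # "maximum disease" group
scores_b = rng.normal(40, 10, 250)   # "at least 20% effect" group

# Shapiro-Wilk: a small p-value is evidence against normality.
for name, x in [("A", scores_a), ("B", scores_b)]:
    res = stats.shapiro(x)
    print(name, "Shapiro-Wilk p =", round(res.pvalue, 3))

# A right tail would also show up as positive sample skewness:
print("skewness of A:", round(float(stats.skew(scores_a)), 2))
```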

So my question in this post is: how many samples do I need to be confident that the new method can distinguish between 100% and at most 80% disease? I wrote a simulation in MATLAB: I used the mu and std from the fitted Gaussians to generate two normally distributed random samples, one from the 100% group and one from the 80% group. I ran the t-test on these two groups (repeated 10,000 times) to find the number of samples for which the average p-value is just below 0.05.
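For reference, this is the same kind of simulation sketched in Python/NumPy instead of MATLAB, with placeholder mu/std rather than my fitted values. One difference from what I described: the conventional criterion in simulation-based power analysis is the fraction of experiments with p < 0.05 (the estimated power, usually required to reach 0.80), rather than the average p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_power(mu1, sd1, mu2, sd2, n, reps=10_000, alpha=0.05):
    """Fraction of simulated experiments in which a two-sample t-test
    rejects at level alpha -- the estimated power at sample size n."""
    hits = 0
    for _ in range(reps):
        a = rng.normal(mu1, sd1, n)
        b = rng.normal(mu2, sd2, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / reps

# Placeholder fitted values (means 10 vs 12, common sd 2):
for n in range(2, 31):
    if simulated_power(10, 2, 12, 2, n, reps=2000) >= 0.80:
        print("n per group:", n)
        break
```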

I think there may be a more formal way to do this using 'power analysis', but I can't find a formula, and I was wondering whether my simulation would be acceptable.
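The one closed-form candidate I have found is the standard normal-approximation formula for a two-sample comparison, n per group = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2, which a simulation like mine should roughly reproduce. A sketch (assuming equal group sizes, a common sd, two-sided alpha = 0.05, 80% power, and placeholder numbers):

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sample comparison:
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2, rounded up.
    delta is the difference in group means, sigma the common sd."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

# Placeholder: group means differ by 2 with common sd 2.
print(n_per_group(delta=2, sigma=2))
```

Would comparing my simulation's answer against this formula be a reasonable sanity check?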