# Probability on a normal curve

#### AJ Laura

##### New Member
Hey all - It's been a long time since I was in school looking at probability and statistics, and I have a question that is a bit difficult for me. I have two sets of data: Set A has a mean of 7.9 and a standard deviation of 0.79. The other set, Set B, has a mean of 10.1 and a standard deviation of 1.12. What I found interesting is that one standard deviation above the mean of Set A is 8.69, and one standard deviation below the mean of Set B is 8.98. If both sets are normally distributed and the results are random and independent, what is the probability that a number from Set A would be larger than a number from Set B? There is obviously some overlap between the two sets if they are normal curves (which I assume), but I am not sure how to figure out the probability that Set A would be larger than Set B. Thanks for any help.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You should investigate using the t-test. If the residuals (each observation's difference from its own group mean) are not normally distributed, the non-parametric equivalent to the t-test would be the Wilcoxon Rank Sum test. The former is fairly easy to calculate by hand.

#### AJ Laura

##### New Member
Thanks HL. I thought the t-test was used to determine whether the sets of data are statistically different from each other? I was going on the assumption that they are (although not totally correct, it works for what I am doing). Assuming that, is there an easy way to calculate the probability of a random number from Set A being larger than a random number from Set B?

#### BGM

##### TS Contributor
If you are talking about two independent normal random variables $$X, Y$$ (which then jointly follow a multivariate normal distribution), then you can definitely calculate

$$\Pr\{X < Y\} = \Pr\{X - Y < 0\}$$

as $$X - Y$$ just follows another normal distribution.

If you are talking about uniformly sampling a number from each of two sets, then it is a different story.

#### AJ Laura

##### New Member
BGM - That is exactly what I am looking for... the probability that X-Y<0. However, I don't know how to calculate that. Is there a simple way to do it? I see your point that it would follow a normal curve, but what would be the mean and std dev of this new curve, and how are they calculated? Or do I take my sample sets, subtract them, and just have Excel calculate? I feel like I am missing something.

#### BGM

##### TS Contributor
You just need to use the standard result for an affine transformation of a multivariate normal distribution.

If $$X \sim \mathcal{N}(\mu_X, \sigma^2_X), Y \sim \mathcal{N}(\mu_Y, \sigma^2_Y)$$ and $$X, Y$$ are independent, then

$$X - Y \sim \mathcal{N}(\mu_X - \mu_Y, \sigma^2_X + \sigma^2_Y)$$
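To make that concrete with the numbers from the first post, here is a minimal sketch that treats the quoted means and standard deviations as if they were the true parameters and assumes the two sets are independent and normal. The variable names are only for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics quoted in the first post, treated here as true parameters
mu_a, sd_a = 7.9, 0.79    # Set A
mu_b, sd_b = 10.1, 1.12   # Set B

# D = A - B is normal with mean mu_a - mu_b and variance sd_a^2 + sd_b^2
mu_d = mu_a - mu_b
sd_d = sqrt(sd_a**2 + sd_b**2)

# P(A > B) = P(D > 0) = 1 - P(D <= 0)
p_a_greater = 1.0 - NormalDist(mu_d, sd_d).cdf(0.0)

print(f"mean of A - B:    {mu_d:.2f}")
print(f"std dev of A - B: {sd_d:.4f}")
print(f"P(A > B):         {p_a_greater:.4f}")
print(f"P(B > A):         {1.0 - p_a_greater:.4f}")
```

Under these assumptions the difference has mean -2.2 and standard deviation of about 1.37, and Set A comes out larger only roughly 5% of the time. The same number should be obtainable in Excel with something like `=1-NORM.DIST(0, -2.2, 1.3706, TRUE)`.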

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Not trying to gum up the thread, but still a little unclear on your intentions. Side note, if you are trying to see if the differences between two matched variables are or are not equal to "0" you can use the Wilcoxon Signed Rank Test.

Otherwise, if not, I will let you follow the grand wisdom of BGM.

#### BGM

##### TS Contributor
Yes, it depends on the intention of the OP.

If he wants to compare the means/medians of the two populations, then it is hypothesis testing;

If he wants to calculate the probability based on a theoretical model, he can follow my approach;

If he wants to estimate the probability that the next observation from one set is larger than the next observation from the other, based on the data currently observed in the two sets, then he also needs to estimate the parameters first.
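For that last case, a rough sketch of the "estimate first, then plug in" idea could look like the following. The helper name `prob_a_greater` and the sample values are made up purely for illustration, and the calculation still assumes both sets are independent and normal. It also ignores the extra uncertainty that comes from estimating the parameters from small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def prob_a_greater(sample_a, sample_b):
    """Plug-in estimate of P(A > B), assuming independent normal data."""
    # Estimate the parameters of each set from the observations
    mu_d = mean(sample_a) - mean(sample_b)
    sd_d = sqrt(stdev(sample_a) ** 2 + stdev(sample_b) ** 2)
    # A - B is then treated as normal with the estimated mean and std dev
    return 1.0 - NormalDist(mu_d, sd_d).cdf(0.0)

# Hypothetical observations, NOT the OP's actual data
set_a = [7.1, 8.4, 7.9, 6.8, 8.8, 7.6, 8.1]
set_b = [9.2, 11.5, 10.4, 8.9, 10.9, 9.8, 10.3]

p = prob_a_greater(set_a, set_b)
print(f"Estimated P(A > B): {p:.3f}")
print(f"Estimated P(B > A): {1.0 - p:.3f}")
```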

#### AJ Laura

##### New Member
Thanks all. I guess I was not good at explaining. Assuming my two sets of data follow normal curves with different medians and std devs, I was just trying to figure out whether there is a way to determine the probability that a random point in Set A will be greater than a random point in Set B. My goal was to say (and I know this is not entirely correct) "From this data, a random number from Set B is higher than a random number from Set A x% of the time". It sounds like it can be done mathematically, but perhaps not easily in Excel or similar tools.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Read a description of the Wilcoxon Rank Sum test and see if this is close to what you want to do.
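The connection to the OP's question is that the rank sum test, in its Mann-Whitney U form, is built on a simple count: across all possible pairs (one value from each set), how often is the value from Set A the larger one? Dividing that count by the number of pairs gives a distribution-free estimate of the "x% of the time" figure, with no normality assumption. A small sketch follows, where the function name and the sample values are hypothetical and ties are counted as half a win.

```python
from itertools import product

def share_a_greater(sample_a, sample_b):
    """Fraction of all (a, b) pairs with a > b, counting ties as one half."""
    wins = 0.0
    for a, b in product(sample_a, sample_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(sample_a) * len(sample_b))

# Hypothetical observations, NOT the OP's actual data
set_a = [7.1, 8.4, 7.9, 6.8, 8.8, 7.6, 8.1]
set_b = [9.2, 11.5, 10.4, 8.9, 10.9, 9.8, 10.3]

share = share_a_greater(set_a, set_b)
print(f"A beats B in {share:.1%} of pairs")
print(f"B beats A in {1.0 - share:.1%} of pairs")
```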