Maximum of a set of independent normal random variables

Consider a set of random variables \( X_1, X_2, X_3, \ldots, X_n \) with unit variance but different means \( \mu_1, \mu_2, \mu_3, \ldots, \mu_n \). What is the probability that \( X_i \) will have the maximum value of the set?

My initial guess was:
\( P[X_i > X_{i'}|_{\forall i'\neq i}] = P[X_i>X_1]\cdot P[X_i>X_2]\cdots \)

But if one considers a simple case with 3 random variables, each with mean 0, then obviously the probability of any one of them being the maximum is 1/3.
Using the formula above, however, gives a probability of 0.5*0.5=0.25, since
\( P[X_1 > X_2] = 0.5 \) when \( \mu_1=0, \mu_2=0 \)

What am I doing wrong here? In the simple 3 variable example I suspect that the probabilities aren't independent or something, but I can't wrap my head around it.
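A quick Monte Carlo check (a minimal Python sketch, assuming numpy is available) makes the discrepancy concrete:

```python
# Symmetric 3-variable case: compare the true P(X1 is the max)
# against the (incorrect) product formula P(X1 > X2) * P(X1 > X3).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((200_000, 3))  # each row: one draw of (X1, X2, X3), all N(0, 1)

p_max = np.mean(np.argmax(x, axis=1) == 0)                           # true probability, ~ 1/3
p_product = np.mean(x[:, 0] > x[:, 1]) * np.mean(x[:, 0] > x[:, 2])  # product formula, ~ 0.25

print(round(p_max, 3), round(p_product, 3))
```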
Anyone have any thoughts?


TS Contributor
Here's my thinking:

pr( X1 > X2 AND X1 > X3 ) = pr( X1 > X2 > X3 ONION X1 > X3 > X2 )

which is the sum of the two disjoint events.

This accounts for two of the 3! equally likely outcomes of the experiment



Ninja say what!?!
LOL. It's a vegetable. Mightymergz, I'm a little confused about how you set up the problem, though. If the distribution of each X has a different mean, wouldn't the probability of selecting the same number from each X be different?

Going back to your example: if you have 3 random variables and, let's say, their means are 0, 1, 2, then the probability of each of them being the max would not be 1/3.


Ambassador to the humans
They were working on the simplified example where each has a mean of 0 to test his result against intuition.
Indeed, it is a simplified version of the main problem. Also, I've never heard of the ONION operator.. Onions are delicious, to be sure, but rather less useful for calculating probabilities.

In the simplified example all orderings are equally probable, and there are only \(3!=6\) possible orderings.

\(P[X_1>X_2,X_3]\) covers two of the 6 possibilities, so the probability is 2/6 = 1/3. Unfortunately this method doesn't work if the random variables have different means. Is there another method, or can I modify the above method to work when the orderings aren't all equally probable?

thanks for your help so far!


Ambassador to the humans
Can I ask what this is for? Just interested? Is it homework? Required for research?

Do we know anything else about the variables? What distribution they have maybe?
Hey Dason, ya, this problem is related to some research I'm involved with for the summer. I'm an undergrad co-op student on a summer research position :)

The problem I posted is just one way to see it, I suppose. The main problem is: consider a segment of a discrete signal from t=-T to T. The signal is f(t) = A(t) + y(t), where A(t) is some known function and y(t) is a zero-mean, unit-variance normal random variable.

The way I posed the problem was to take A(t) as the mean of the random variable at each point and then work out the statistics from there. We want to know with what probability each point in the time series will be the maximum: \(P[f(t)>f(t')|_{\forall t'\neq t}]\)

Some things I would expect: the point with the highest probability of being the maximum should be the point with the highest mean, i.e. the maximizer of A(t) (duh);
and normalization, since the peak must be somewhere (ties occur with probability zero for continuous variables, so the events partition the sample space):
\(\sum_{t=-T}^T P[f(t)>f(t')|_{\forall t'\neq t}] = 1\)

But the initial method I tried in the OP definitely doesn't satisfy the normalization requirement. I suppose I could just normalize by a constant, but that doesn't seem right at all..


Ninja say what!?!
Here's my reasoning... though I'd love for someone to chime in:
Given \(\mu _{1}<\mu _{2}<\mu _{3}\) and \(\sigma _{1}=\sigma _{2}=\sigma _{3}=\sigma\):

So, if the mu's were 0,1,2 and the sigma was 1, then we'd have
\(P(X_{3}>X_{2},X_{1}) = P(Z<\frac{x_{3}-1}{1})*P(Z<\frac{x_{3}-0}{1}) \)

If \(x_{3}=\mu_{3}=2\), then \(P(X_{3}>X_{2},X_{1}) \approx 0.841345 \cdot 0.977250 = 0.8222\)
Hi Link, Thanks for your thoughts. I think I follow your train of thought but I numerically calculated the probabilities for \(\mu\)'s equal to 0,1,2 and the probability for \(X_3\) to be the maximum is closer to 0.73ish
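For reference, the numerical calculation can be sketched like this (assuming scipy is available; it conditions on the value of \(X_3\) and integrates over it):

```python
# P(X3 > X1, X3 > X2) for independent normals with means 0, 1, 2 and unit variance:
# condition on X3 = x, multiply the CDFs of the other two, and integrate over x.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mus = [0.0, 1.0, 2.0]
integrand = lambda x: norm.pdf(x, loc=mus[2]) * norm.cdf(x, loc=mus[0]) * norm.cdf(x, loc=mus[1])
p, _ = quad(integrand, -np.inf, np.inf)
print(round(p, 3))  # ~ 0.73, not 0.8222
```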


Ninja say what!?!
Really? And that's with a sigma of 1? I'm having trouble seeing where I might have gone wrong. I relied on an online calculator for the p-values, but they don't look noticeably wrong to me.


Ambassador to the humans
I think I've got an answer. It's not necessarily pretty but I think I can do some work to simplify it. It matches intuition on that if all have the same mean and if there are 3 r.v.s then the probability is 1/3, if there are n then it's 1/n. But like I said when the means aren't the same it's not pretty so I'm going to see if it simplifies at all.


Ambassador to the humans
Hi Link, Thanks for your thoughts. I think I follow your train of thought but I numerically calculated the probabilities for \(\mu\)'s equal to 0,1,2 and the probability for \(X_3\) to be the maximum is closer to 0.73ish
I did some simulations and got a probability around .728, so I think something went wrong, Link. I think that P(X1 > X2) and P(X1 > X3) aren't necessarily independent, because if you know that X1 > X2 then it's probably more likely that X1 > X3. But that's just a thought.
Ya, with var/stddev of 1. Someone suggested to me that the two events in \(P(X_{3}>X_{2},X_{1})=P(X_1<x_3)\cdot P(X_2<x_3)\) aren't independent, i.e. the probability of the second event depends on the outcome of the first, so the probability should be more like:
\(P[X_3>X_2,X_1]=P[X_2<X_3]\cdot P[X_1<X_3 \mid X_2<X_3]\)
though I'm not sure how to calculate that conditional probability.. In the simplified example I know it should be 2/3, simply because \(P[X_2<X_3]=0.5\) and \(1/2 \cdot 2/3 = 1/3\), which is what we expect :D
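A simulation of the simplified example (a sketch, assuming numpy) backs this up: the conditional factor really is 2/3, not 1/2:

```python
# Estimate P(X2 < X3) and P(X1 < X3 | X2 < X3) for three i.i.d. N(0, 1) variables.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((200_000, 3))  # columns: X1, X2, X3

b = x[:, 1] < x[:, 2]  # event X2 < X3
a = x[:, 0] < x[:, 2]  # event X1 < X3

p_b = b.mean()                       # ~ 1/2
p_a_given_b = (a & b).mean() / p_b   # ~ 2/3: knowing X2 < X3 makes X1 < X3 more likely
print(round(p_b, 3), round(p_a_given_b, 3))
```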

Ya I think it is something like that


Ambassador to the humans
Not gonna go into the derivation but I did some simulations and it appears to match what we want. Maybe when I have some free time I'll type up how I did it but essentially I said let X1 be the variable of interest and we want to know P(X1 > max{X2,...,Xn})

I say that this probability is equal to:

let \( f_i(x) \) denote the pdf of the ith random variable
let \( F_i(x) \) denote the cdf of the ith random variable
\(P(X_1 > \max\{X_2,\ldots,X_n\}) = 1 - \int_{-\infty}^\infty \left[\, F_1(x)f_2(x)F_3(x)\cdots F_n(x) + F_1(x)F_2(x)f_3(x)F_4(x)\cdots F_n(x) + \ldots + F_1(x)F_2(x)\cdots F_{n-1}(x)f_n(x) \,\right] dx\)

You can simulate this pretty easily by noticing that each term in the integral is just an expected value. So you could estimate the first term by generating a bunch of observations from \(f_2(x)\), applying \(F_1(x)F_3(x)\cdots F_n(x)\) to each, and taking the mean. Do that for each term and you've got an estimate.
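That recipe can be sketched as follows (assuming scipy; `p_x1_is_max` is just an illustrative name):

```python
# Monte Carlo version of the term-by-term recipe: each term in the integral is
# E[prod of the other CDFs] under the density f_j, estimated by sampling.
import numpy as np
from scipy.stats import norm

def p_x1_is_max(mus, n_draws=100_000, seed=0):
    """Estimate P(X1 > max{X2,...,Xn}) for independent unit-variance normals."""
    rng = np.random.default_rng(seed)
    mus = np.asarray(mus, dtype=float)
    total = 0.0
    for j in range(1, len(mus)):              # one term per j = 2..n (0-indexed)
        x = rng.normal(mus[j], 1.0, n_draws)  # draws from f_j
        prod = np.ones(n_draws)
        for i in range(len(mus)):
            if i != j:
                prod *= norm.cdf(x, loc=mus[i])  # apply every other F_i to the draws
        total += prod.mean()                  # Monte Carlo estimate of the j-th term
    return 1.0 - total

p_sym = p_x1_is_max([0.0, 0.0, 0.0])
print(round(p_sym, 3))  # ~ 1/3 in the symmetric 3-variable case
```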

Notice that if all the r.v.s have a common mean then each of the \(F_i(x)\) is the same and each of the \(f_i(x)\) are the same so this simplifies to
\(P(X_1 > \max\{X_2,\ldots,X_n\}) = 1 - (n-1) \int_{-\infty}^\infty f_o(x)(F_o(x))^{n-1}\,dx\), where \( f_o(x) \) is the common pdf

but \(\int_{-\infty}^\infty f_o(x)(F_o(x))^{n-1}\,dx = \frac{1}{n}\), so the whole expression equals \(1-\frac{n-1}{n}=\frac{1}{n}\), which matches our intuition.
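That identity is easy to verify numerically (a sketch assuming scipy); it holds for any continuous distribution, since \(F_o(X)\) is uniform on (0, 1):

```python
# Check that the integral of f(x) * F(x)^(n-1) equals 1/n for the standard normal.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

vals = {}
for n in (2, 3, 5):
    vals[n], _ = quad(lambda x: norm.pdf(x) * norm.cdf(x) ** (n - 1), -np.inf, np.inf)

print({n: round(v, 4) for n, v in vals.items()})  # 1/n in each case
```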
AHA! I think I managed the derivation of your result with the help of the article above and, of course, your end result..
A moment please, and I shall post it here and you can let me know if I did it right :)


Given a set of non-identically distributed random variables \(X_1, X_2,\ldots,X_n\), what is the probability that \(X_i\) is the maximum?
\(P[X_i>Y_i]\) where \(Y_i = \max(X_{i'}|_{\forall i'\neq i})\)
\(P[X_i>Y_i]=P[X_i-Y_i>0]=1-F_{X_i-Y_i}(0)\), where \(F_{X_i-Y_i}\) is the CDF of \(X_i-Y_i\)

Take the convolution: \(F_{X_i-Y_i}(0)=\int_{-\infty}^{\infty} F_i(0-(-y))\cdot f_y(y)\,dy=\int_{-\infty}^{\infty} F_i(y)\cdot f_y(y)\,dy\), where \(F_i\) is the CDF of \(X_i\).
\(f_y(y)\) is the PDF of \(Y_i\), so we can calculate \(Y_i\)'s CDF and differentiate.

\(P[Y_i<a] = P[X_{i'}<a]\cdot P[X_{i'+1}<a]\cdots = \prod_{\forall i'\neq i}P[X_{i'}<a] = F_y(a)\)
\(F_y(y)=\prod_{\forall i'\neq i}F_{i'}(y)\)
\(f_y(y)=\frac{d}{dy} F_y(y)= \frac{d}{dy}\prod_{i'\neq i}F_{i'}(y) = \sum_{j\neq i} f_j(y)\prod_{i'\neq i,\,i'\neq j} F_{i'}(y)\) (product rule: one term per factor, with that factor differentiated)

OK, so my notation is a bit crazy... but I think it turns out the same as yours, Dason.
I'm just wondering now if there is a sneakier way to do it. I suppose it would depend on the distributions of the random variables. I could take a hint from the article posted for the normal-distribution case..
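An equivalent direct form (condition on \(X_i\) instead of convolving the difference) is easy to evaluate numerically; here is a sketch assuming scipy, with `p_is_max` just an illustrative name. It also lets us check the normalization requirement from earlier in the thread:

```python
# P(X_i > max of the others) = integral of f_i(y) * prod_{i' != i} F_{i'}(y) dy,
# obtained by conditioning on the value of X_i.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def p_is_max(i, mus):
    """Probability that X_i is the maximum of independent unit-variance normals."""
    def integrand(y):
        val = norm.pdf(y, loc=mus[i])
        for j, mu in enumerate(mus):
            if j != i:
                val *= norm.cdf(y, loc=mu)
        return val
    return quad(integrand, -np.inf, np.inf)[0]

mus = [0.0, 1.0, 2.0]
probs = [p_is_max(i, mus) for i in range(3)]
print([round(p, 3) for p in probs])  # X3 gets the highest probability
print(round(sum(probs), 6))          # the probabilities sum to 1, as required
```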
Last edited:


Ambassador to the humans
Looks pretty good. At first I thought of doing the convolution to look at the difference between X_i and Y_i, but then decided to just look at the double integral. Turns out pretty nice either way, it looks like.
Wonderful :) thanks guys!

But now I want to complicate the problem some more :)

Consider the same problem, but now the variables aren't independent; they're correlated. Say the correlation is described by some function \(A(\Delta t)\), where the correlation between two variables depends only on the distance between them, i.e. the \(\Delta t\) for variables \(X_i, X_j\) would be \(j-i\)
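As a starting point, the correlated case is easy to simulate; a sketch assuming numpy, where the exponential decay \(A(\Delta t)=e^{-|\Delta t|/2}\) is just one positive-definite example choice of correlation function:

```python
# Estimate P(X_i is the max) for jointly normal variables with unit variances
# and Corr(X_i, X_j) = corr_fn(|j - i|), by sampling from the joint distribution.
import numpy as np

def p_max_correlated(mus, corr_fn, n_sim=200_000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(mus)
    idx = np.arange(n)
    cov = corr_fn(np.abs(idx[:, None] - idx[None, :]).astype(float))  # unit diagonal
    x = rng.multivariate_normal(mus, cov, size=n_sim)
    counts = np.bincount(np.argmax(x, axis=1), minlength=n)
    return counts / n_sim

# Example: correlation decaying with separation, A(dt) = exp(-|dt| / 2).
probs = p_max_correlated([0.0, 1.0, 2.0], lambda d: np.exp(-d / 2.0))
print(np.round(probs, 3))  # still sums to 1; the highest-mean variable still dominates
```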

I'm going to start thinking about it now. If anyone has any insights they would be greatly appreciated!