Distribution of Extreme Spread for (n, sigma)

#1
Given a Rayleigh process \(R(s)\) generating samples \(X_i\) -- equivalently, the radii of a bivariate normal process with zero correlation and standard deviation \(s\) in each coordinate -- what is the distribution of the extreme spread of \(n\) samples?

Extreme spread: \( ES_n\{X\} \equiv \max_{i,j} |X_i - X_j| \)

I'm after \( \mathrm{E}[ES_n(\sigma)] \) and \( \Pr(ES_n(\sigma) > a) \). It seems like \( ES_n(\sigma) \) should have a chi(n)-type distribution, but I am not good enough to derive the exact relationship.

I did find this paper from 1975, which on p. 8 suggests as much, but it focuses on an empirical solution rather than on the pure math. In contrast, I want to take the parameters of the distribution of X as given and find a formula for the distribution of ES.
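For concreteness, here is a minimal Monte Carlo sketch of the quantity I mean (Python with NumPy; the function name `simulate_extreme_spread` is just illustrative), treating the \(X_i\) as the Rayleigh radii so the extreme spread reduces to the sample range:

```python
# A rough Monte Carlo sketch (assumption: the extreme spread of the Rayleigh
# radii X_i is just their sample range max - min).
import numpy as np

def simulate_extreme_spread(n, sigma, trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    samples = rng.rayleigh(scale=sigma, size=(trials, n))  # trials x n Rayleigh(sigma) draws
    return np.ptp(samples, axis=1)                         # per-trial range X_(n) - X_(1)

es = simulate_extreme_spread(n=5, sigma=1.0)
print("E[ES_5(1)] ~", es.mean())
print("Pr(ES_5(1) > 2) ~", (es > 2.0).mean())
```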

Any guidance appreciated!
 

BGM

TS Contributor
#2
It seems you want the CDF/pdf of the sample range (assuming the underlying sample \( X \) is a continuous random variable): the difference between the sample maximum and the sample minimum, \( X_{(n)} - X_{(1)} \).

http://en.wikipedia.org/wiki/Range_(statistics)

By elementary multinomial-type arguments, the joint pdf of \( (X_{(1)}, X_{(n)})\) is

\( f_{X_{(1)}, X_{(n)}}(x_1, x_n) \)

\( = \frac {n!} {0!1!(n-2)!1!0!} F_X(x_1)^0f_X(x_1)^1[F_X(x_n) - F_X(x_1)]^{n-2}f_X(x_n)^1[1 - F_X(x_n)]^0 \)

\( = n(n-1)f_X(x_1)f_X(x_n)[F_X(x_n) - F_X(x_1)]^{n-2}, \quad x_1 < x_n \)

where \( f_X, F_X \) are the pdf and CDF of the original, unordered random sample.

Using elementary transformation tricks, the pdf of the sample range is

\( f_R(r) = \int_{-\infty}^{+\infty} n(n-1)f_X(x)f_X(x+r)[F_X(x+r) - F_X(x)]^{n-2}\,dx, \quad r > 0 \)

By integration, the CDF is

\( F_R(r) = \int_{-\infty}^{+\infty} nf_X(x)[F_X(x+r) - F_X(x)]^{n-1}dx, r > 0 \)

as listed in the above wiki link.
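Even when this integral has no nice closed form, it is straightforward to evaluate numerically. A sketch (Python with scipy; `range_cdf` is just an illustrative name, not a library routine):

```python
# Numerically evaluate F_R(r) = n * int f_X(x) [F_X(x+r) - F_X(x)]^(n-1) dx
# for an arbitrary continuous pdf/CDF pair.
import numpy as np
from scipy.integrate import quad

def range_cdf(r, n, pdf, cdf, lower=-np.inf, upper=np.inf):
    integrand = lambda x: n * pdf(x) * (cdf(x + r) - cdf(x)) ** (n - 1)
    value, _ = quad(integrand, lower, upper)
    return value

# Example with Rayleigh(sigma), whose support is (0, +inf):
sigma = 1.0
pdf = lambda x: (x / sigma**2) * np.exp(-x**2 / (2 * sigma**2))
cdf = lambda x: 1.0 - np.exp(-x**2 / (2 * sigma**2))
print(range_cdf(r=2.0, n=5, pdf=pdf, cdf=cdf, lower=0.0))
```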

Therefore one just needs to apply the above integral to find the answer; however, many such integrals are quite ugly and have no closed-form solution. For your case, let's try to calculate the CDF:

\( F_R(r) = n\int_0^{+\infty} \frac {x} {\sigma^2} \exp\left\{-\frac {x^2} {2\sigma^2}\right\}
\left[\exp\left\{-\frac {x^2} {2\sigma^2}\right\}
- \exp\left\{-\frac {(x+r)^2} {2\sigma^2}\right\}\right]^{n-1} dx \)

\( = \frac {n} {\sigma^2} \int_0^{+\infty} x \exp\left\{-\frac {nx^2} {2\sigma^2}\right\} \left[1 - \exp\left\{-\frac {2rx + r^2} {2\sigma^2}\right\}\right]^{n-1}dx\)

\( = \frac {n} {\sigma^2} \int_0^{+\infty} x \exp\left\{-\frac {nx^2} {2\sigma^2}\right\} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \exp\left\{-\frac {2krx + kr^2} {2\sigma^2}\right\} dx\)

\( = \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \int_0^{+\infty} x \exp\left\{-\frac {nx^2 + 2krx + kr^2} {2\sigma^2}\right\} dx\)

\( = \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \) \( \int_0^{+\infty} x
\exp\left\{- \frac {n} {2\sigma^2} \left(x^2 + 2 \frac {kr} {n} x + \frac {k^2r^2} {n^2}\right)
+ \frac {1} {2\sigma^2} \left(\frac {k^2r^2} {n} - kr^2 \right)\right\} dx\)

\( = \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k \) \( \int_0^{+\infty} x
\exp\left\{- \frac {n} {2\sigma^2} \left(x^2 + 2 \frac {kr} {n} x + \frac {k^2r^2} {n^2}\right)
- \frac {1} {2\sigma^2} \left(kr^2 - \frac {k^2r^2} {n}\right)\right\} dx\)

\( = \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k
\exp\left\{-\frac {kr^2} {2\sigma^2} \left(1 - \frac {k} {n}\right) \right\} \) \( \int_0^{+\infty} x \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx \)

Finally we consider:

\( \int_0^{+\infty} \left(x + \frac {kr} {n} \right) \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx = \frac {\sigma^2} {n} \exp\left\{-\frac {k^2r^2} {2n\sigma^2}\right\} \)

and

\( \int_0^{+\infty} \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx \)

\( = \sqrt{2\pi\frac {\sigma^2} {n}} \int_0^{+\infty} \frac {1} {\sqrt{2\pi\frac {\sigma^2} {n}}} \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx\)

\( = \sqrt{2\pi\frac {\sigma^2} {n}} \Pr\left\{\frac {\sigma} {\sqrt{n}} Z - \frac {kr} {n} > 0 \right\} \)

\( = \sqrt{2\pi\frac {\sigma^2} {n}}\left[1 - \Phi\left(\frac {kr} {\sigma\sqrt{n}}\right)\right] \)

where \( Z \) is the standard normal random variable and \( \Phi \) is its CDF.
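If you want to double-check these two pieces numerically rather than trust my algebra, here is a quick Python/scipy sketch at arbitrary test values of \( n, k, r, \sigma \):

```python
# Numerical check of the two auxiliary integrals above (arbitrary test values).
import math
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

n, k, r, sigma = 5, 2, 1.3, 0.8
g = lambda x: math.exp(-n * (x + k * r / n) ** 2 / (2 * sigma**2))

lhs1, _ = quad(lambda x: (x + k * r / n) * g(x), 0, np.inf)
rhs1 = (sigma**2 / n) * math.exp(-k**2 * r**2 / (2 * n * sigma**2))

lhs2, _ = quad(g, 0, np.inf)
rhs2 = math.sqrt(2 * math.pi * sigma**2 / n) * (1 - norm.cdf(k * r / (sigma * math.sqrt(n))))

print(lhs1, rhs1)  # should agree up to quadrature error
print(lhs2, rhs2)  # should agree up to quadrature error
```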

As a result, the CDF is

\( = \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k
\exp\left\{-\frac {kr^2} {2\sigma^2} \left(1 - \frac {k} {n}\right) \right\} \) \( \int_0^{+\infty} \left(x + \frac {kr} {n} - \frac {kr} {n} \right) \exp\left\{- \frac {n} {2\sigma^2} \left(x + \frac {kr} {n}\right)^2 \right\}dx \)

\( = \frac {n} {\sigma^2} \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k
\exp\left\{-\frac {kr^2} {2\sigma^2} \left(1 - \frac {k} {n}\right) \right\} \left\{\frac {\sigma^2} {n} \exp\left\{-\frac {k^2r^2} {2n\sigma^2}\right\} - \frac {kr} {n} \sqrt{2\pi\frac {\sigma^2} {n}}\left[1 - \Phi\left(\frac {kr} {\sigma\sqrt{n}}\right)\right] \right\} \)

But as you see it is quite a mess.
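Still, it is easy to evaluate. Here is a sketch of the sum in Python (using scipy.stats.norm.cdf for \( \Phi \); the function name `rayleigh_range_cdf` is just for illustration), together with a Monte Carlo check:

```python
# Closed-form CDF of the range of n iid Rayleigh(sigma) samples, as derived above.
import math
import numpy as np
from scipy.stats import norm

def rayleigh_range_cdf(r, n, sigma=1.0):
    total = 0.0
    for k in range(n):
        sign_binom = math.comb(n - 1, k) * (-1) ** k
        prefactor = math.exp(-k * r**2 * (1 - k / n) / (2 * sigma**2))
        bracket = (math.exp(-k**2 * r**2 / (2 * n * sigma**2))
                   - (k * r / sigma) * math.sqrt(2 * math.pi / n)
                     * (1 - norm.cdf(k * r / (sigma * math.sqrt(n)))))
        total += sign_binom * prefactor * bracket
    return total

# Monte Carlo sanity check: the two printed numbers should be close.
rng = np.random.default_rng(0)
sim = np.ptp(rng.rayleigh(scale=1.0, size=(200_000, 5)), axis=1)
print(rayleigh_range_cdf(2.0, n=5), (sim <= 2.0).mean())
```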
 
#3
Wow. Just curious: are there any continuous distributions that have clean range distributions?

As for this case: Can you see any way to pull sigma outside of the outer sum?

I ask because in practice I will only be concerned with n in the range [2, 100], but for arbitrary sigma. If we can establish a relationship between the CDF (for given n) and sigma, then I could compute the values for each n at sigma = 1 just once and then use those reference values to compute the probability for a different sigma.

Or is it known that there's no better way to get values than to do the full sum you derived?
 
#4
Another way of putting my follow-up question: intuitively we expect some scale invariance of the extreme spread w.r.t. sigma. That is, holding n constant, if \(\mathrm{E}[ES_n(\sigma)] = X\) then I would expect something like \(\mathrm{E}[ES_n(a \sigma)] = aX\) or \(= \sqrt{a}\, X\).
 

BGM

TS Contributor
#5
Yes, you have raised a very good point which I missed: \( \sigma \) is the scale parameter.

We have the following:

If \( U \sim \text{Rayleigh}(1) \), then \( X = \sigma U \sim \text{Rayleigh}(\sigma) \)

Since the scaling does not affect the order (remember \( \sigma > 0 \)), we have

\( X_{(n)} - X_{(1)} = \sigma U_{(n)} - \sigma U_{(1)} = \sigma (U_{(n)} - U_{(1)}) \)

i.e. again \( \sigma \) is the scale parameter of the sample range.

Note that the distribution of \( U_{(n)} - U_{(1)} \) does not depend on \( \sigma \);

and we have the relationship

\( E[X_{(n)} - X_{(1)}] = \sigma E[U_{(n)} - U_{(1)}] \)

\( F_{X_{(n)} - X_{(1)}}(r) \)

\( = \Pr\{X_{(n)} - X_{(1)} \leq r\} \)

\( = \Pr\left\{U_{(n)} - U_{(1)} \leq \frac {r} {\sigma} \right\}\)

\( = F_{U_{(n)} - U_{(1)}}\left(\frac {r} {\sigma} \right) \)

and you can see this relationship in the answer of my previous post:

\( \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k
\exp\left\{-\frac {k} {2} \left(\frac {r} {\sigma}\right)^2
\left(1 - \frac {k} {n}\right) \right\}
\left\{\exp\left\{-\frac {k^2} {2n} \left(\frac {r} {\sigma}\right)^2\right\} - k \left( \frac {r} {\sigma} \right)\sqrt{\frac {2\pi} {n}}\left[1 - \Phi\left(\frac {k} {\sqrt{n}} \times \frac {r} {\sigma}\right)\right] \right\} \)

- it can be written as a function of \( \frac {r} {\sigma} \)
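A quick empirical illustration of the scaling (a Python/NumPy sketch, again treating the extreme spread as the sample range of the Rayleigh draws):

```python
# Empirical check that ES_n(a * sigma) is distributed like a * ES_n(sigma).
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000
base   = np.ptp(rng.rayleigh(scale=1.0, size=(trials, n)), axis=1)  # sigma = 1
scaled = np.ptp(rng.rayleigh(scale=3.0, size=(trials, n)), axis=1)  # sigma = 3

print(3.0 * np.quantile(base, [0.5, 0.9]))   # rescaled sigma = 1 quantiles
print(np.quantile(scaled, [0.5, 0.9]))       # should match up to Monte Carlo noise
```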
 
#6
Ah ha, I guess this makes sense. So there is scale invariance, but the probability I'm looking for is a function of both n and the spread measured in units of sigma. I.e., a lookup table would have to provide precomputed values over two dimensions: \((n, \frac{r}{\sigma})\).
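For example, a hypothetical sketch of such a two-dimensional table in Python (the helper names, the grid spacing, and the n range 2..10 are just illustrative; the sum is the one derived above with \( \sigma = 1 \)):

```python
import math
import numpy as np
from scipy.stats import norm

# Evaluate the derived CDF at sigma = 1 as a function of x = r / sigma.
def es_cdf_unit_sigma(x, n):
    return sum(math.comb(n - 1, k) * (-1) ** k
               * math.exp(-k * x**2 * (1 - k / n) / 2)
               * (math.exp(-k**2 * x**2 / (2 * n))
                  - k * x * math.sqrt(2 * math.pi / n)
                    * (1 - norm.cdf(k * x / math.sqrt(n))))
               for k in range(n))

grid = np.linspace(0.0, 8.0, 401)                                   # r / sigma grid
table = {n: np.array([es_cdf_unit_sigma(x, n) for x in grid]) for n in range(2, 11)}

def prob_es_at_most(r, n, sigma):
    return float(np.interp(r / sigma, grid, table[n]))              # lookup + interpolation

print(prob_es_at_most(r=6.0, n=5, sigma=3.0))
```

For n up near 100 the alternating sum involves very large binomial coefficients, so it may be worth spot-checking table entries against simulation.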

So to summarize
  • There's a separate CDF for each n
  • For a Rayleigh process the CDF of the extreme spread is the sum you derived:
    \( \sum_{k=0}^{n-1} \binom {n-1} {k} (-1)^k
    \exp\left\{-\frac {k} {2} \left(\frac {r} {\sigma}\right)^2
    \left(1 - \frac {k} {n}\right) \right\}
    \left\{\exp\left\{-\frac {k^2} {2n} \left(\frac {r} {\sigma}\right)^2\right\} - k \left( \frac {r} {\sigma} \right)\sqrt{\frac {2\pi} {n}}\left[1 - \Phi\left(\frac {k} {\sqrt{n}} \times \frac {r} {\sigma}\right)\right] \right\} \)
  • When looking for extreme spread probabilities from a Rayleigh process with n samples, once we compute a particular point of CDF(n), the same value applies for any sigma at the same scaled spread \(\frac{r}{\sigma}\).

Is this correct?
 

BGM

TS Contributor
#7
I think you got it. Hopefully there are no careless mistakes in the above calculations :p I did not check them many times.