# Calculating pairwise similarity between boxplots

#### gianmarco

##### TS Contributor
Hello,
I am trying to implement a measure of similarity between data distributions based on their boxplots and the related five-number summaries.
The distance is based on the differences in minimum, first quartile (Q1), median, third quartile (Q3), and maximum between variables (see attached picture).
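
Written out, the distance as coded in the `get.distance` function later in this post would look roughly like this. The notation ($Q_1$, $Q_3$, $\mathrm{Med}$) is chosen here only for illustration; the attached picture may use different symbols.

```latex
d(x, y) = \frac{1}{2}\Big(
      \lvert \min(x) - \min(y) \rvert
  + 2 \lvert Q_1(x) - Q_1(y) \rvert
  + 2 \lvert \mathrm{Med}(x) - \mathrm{Med}(y) \rvert
  + 2 \lvert Q_3(x) - Q_3(y) \rvert
  +   \lvert \max(x) - \max(y) \rvert
\Big)
```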

My issue is that I am having a hard time working out a viable strategy to calculate the distance pairwise across n variables.

Assuming I have a toy dataset as follows:
C-like:
a <- rnorm(30, 30, 5)
b <- rnorm(30, 40, 5)
c <- rnorm(30, 50, 5)

df <- data.frame(a = a, b = b, c = c)
What would be a way to work out the similarity index pairwise across the 3 variables?

Any pointer in the right direction would be appreciated.

Thank you
Gm

EDIT
I came up with some code to calculate the distance between a pair of variables:

C-like:
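get.distance <- function(x, y) {
  # weighted absolute differences between the two five-number summaries
  d <- 0.5 * (abs(min(x) - min(y)) +
    2 * abs(quantile(x, probs = 0.25) - quantile(y, probs = 0.25)) +
    2 * abs(median(x) - median(y)) +
    2 * abs(quantile(x, probs = 0.75) - quantile(y, probs = 0.75)) + abs(max(x) - max(y)))
  return(unname(d))  # drop the "25%" name inherited from quantile()
}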
Now I am wondering how I can use that to automatically work out the pairwise distances across any number of variables at hand.
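
One possible way to do this, sketched here under some assumptions, is to loop over all column pairs with `combn()` and fill a symmetric matrix. `pairwise.distance` is a hypothetical helper name, not something from the literature being replicated; `get.distance` is repeated from the post only so the block runs on its own, and the seed is added just for reproducibility.

```r
# Toy data as in the post (seed added only to make the sketch reproducible)
set.seed(1)
df <- data.frame(a = rnorm(30, 30, 5),
                 b = rnorm(30, 40, 5),
                 c = rnorm(30, 50, 5))

# Distance between two variables, same weights as in the post
get.distance <- function(x, y) {
  d <- 0.5 * (abs(min(x) - min(y)) +
    2 * abs(quantile(x, probs = 0.25) - quantile(y, probs = 0.25)) +
    2 * abs(median(x) - median(y)) +
    2 * abs(quantile(x, probs = 0.75) - quantile(y, probs = 0.75)) +
    abs(max(x) - max(y)))
  unname(d)
}

# Hypothetical helper: symmetric matrix of distances between all column pairs
pairwise.distance <- function(data, FUN = get.distance) {
  p <- ncol(data)
  m <- matrix(0, p, p, dimnames = list(names(data), names(data)))
  for (ij in combn(p, 2, simplify = FALSE)) {
    # each pair is computed once and mirrored across the diagonal
    m[ij[1], ij[2]] <- m[ij[2], ij[1]] <- FUN(data[[ij[1]]], data[[ij[2]]])
  }
  m
}

dmat <- pairwise.distance(df)
dmat           # full symmetric matrix, zeros on the diagonal
as.dist(dmat)  # compact lower-triangle view
```

The result can be passed through `as.dist()` to functions such as `hclust()` if the goal is to group variables with similar boxplots. The helper assumes a data frame with at least two numeric columns and no missing values.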

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I get that you have something in mind that you want to achieve, but quantile regression addresses this. In QR you can pick any percentile and it will give you the difference with confidence intervals. I also like plotting the cumulative distribution plots for two groups, which visualize the differences better. Boxplots are nice for quick comparisons, but there is a reason people don't traditionally do a lot more with them.
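
As a rough illustration of the cumulative-plot idea, here is a minimal base-R sketch that overlays the empirical CDFs of two groups and prints raw quantile differences. The data and seed are made up for the example, and the plain differences carry no confidence intervals; getting those would need an actual quantile regression fit (for instance `rq()` from the `quantreg` package, which is not assumed to be installed here).

```r
# Made-up data for two groups (seed only for reproducibility)
set.seed(1)
a <- rnorm(30, 30, 5)
b <- rnorm(30, 40, 5)

# Empirical cumulative distribution functions
Fa <- ecdf(a)
Fb <- ecdf(b)

# Overlay the two ECDFs on a common x-axis
plot(Fa, xlim = range(a, b), main = "Empirical CDFs",
     xlab = "value", col = "steelblue")
plot(Fb, add = TRUE, col = "firebrick")
legend("bottomright", legend = c("a", "b"),
       col = c("steelblue", "firebrick"), lty = 1)

# Raw differences at selected percentiles (no confidence intervals)
probs <- c(0.25, 0.5, 0.75)
qdiff <- quantile(b, probs) - quantile(a, probs)
qdiff
```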

#### gianmarco

##### TS Contributor
Thank you @hlsmith. I am just trying to replicate something I found in the literature.