# Thread: Comparison of one outcome to all the other outcomes

1. ## Comparison of one outcome to all the other outcomes

Here is some data to play with

Code:
``````    set.seed(1e4)
color = factor(c(rep("blue",20), rep("red",20), rep("yellow",150)), levels=c("red","blue","yellow"))
x1 = rnorm(20+20+150,12,3)
x2 = c(rnorm(20,5), rnorm(20,5), rnorm(150,7))
x3 = c(rnorm(20,7), rnorm(20,5), rnorm(150,5))
x4 = c(rnorm(20,5), rnorm(20,7), rnorm(150,5))
RedOrNot = ifelse(color=="red","Red","NotRed")

df = data.frame(
color = color,
RedOrNot = RedOrNot,
x1 = x1,
x2 = x2,
x3 = x3,
x4 = x4
)``````

We are interested in the color `red` in particular. We would like to understand which explanatory variables (which among `x1`,`x2`,`x3` and `x4`) affects the probability `red` "differentially that it affects the probability of outcome `blue` or `yellow`". The question might be a little unclear but I hope the following will make my point very obvious.

Below graphs show 95% bootstrap confidence intervals.

Code:
``````    p1 = ggplot(df,aes(x=color,y=x1)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p2 = ggplot(df,aes(x=color,y=x2)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p3 = ggplot(df,aes(x=color,y=x3)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p4 = ggplot(df,aes(x=color,y=x4)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
multiplot(p1,p2,p3,p4,layout=matrix(1:4,ncol=2))``````

"red" seem to differ from the two other groups only in respect to `x4`. I would be tempted to group `yellow` and `blue` into a single group called `NotRed` as shown below.

Code:
``````    p1 = ggplot(df,aes(x=RedOrNot,y=x1)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p2 = ggplot(df,aes(x=RedOrNot,y=x2)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p3 = ggplot(df,aes(x=RedOrNot,y=x3)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p4 = ggplot(df,aes(x=RedOrNot,y=x4)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
multiplot(p1,p2,p3,p4,layout=matrix(1:4,ncol=2))``````

Because we sampled way more `yellow` than `blue`, it seems that `Red` differ from the other colors for `x2` as well as `x4` but this is misleading. I would rather not consider `x2` as having an effect of interest when I see the first series of plots as I am looking for variables that make `red` different than both other groups.

Let's make a test with `color` as response variable.

Code:
``````    k = length(levels(df\$color))
I = diag(k-1)
J = matrix(rep(1, (k-1)^2), c(k-1, k-1))
priors = list(R = list(fix=1, V=(1/k) * (I + J), n = k - 1))
m = MCMCglmm(color ~  -1  + trait:(x1) + trait:(x2) + trait:(x3) + trait:(x4) , rcov = ~ us(trait):units, data = df, family = "categorical", prior = priors, verbose = FALSE, burnin = 10000, nitt = 60000, thin = 50)
summary(m)

post.mean l-95% CI u-95% CI eff.samp  pMCMC
traitcolor.blue:x1    -0.30261 -0.60335  0.02378    72.04  0.048 *
traitcolor.yellow:x1  -0.02728 -0.24525  0.19561   129.73  0.832
traitcolor.blue:x2    -0.60401 -1.20394  0.04258    32.72  0.118
traitcolor.yellow:x2   2.05721  1.35060  2.94990    26.03 <0.001 ***
traitcolor.blue:x3     2.27062  1.36630  3.27224    30.57 <0.001 ***
traitcolor.yellow:x3  -0.30359 -0.97326  0.27461    86.95  0.306
traitcolor.blue:x4    -1.28126 -2.11725 -0.55014    41.71 <0.001 ***
traitcolor.yellow:x4  -1.41233 -2.05106 -0.80523    68.91 <0.001 ***``````

Would it be wise to consider only the highest p.value for each explanatory variable to decide (relative to a threshold) whether a variable affects the probability of the outcome `red` differentially than both `blue` and `yellow`?

How can I go about testing what variables affect the probability of the outcome `red` "differentially than both `blue` and `yellow`"?

Similarly, when performing an LDA, I would have wished `x4` to pop up as important driver of the first axis but this is not made obvious due to `x2` again.

Code:
``````
lda(data=df,RedOrNot~x1+x2+x3+x4)\$scaling

LD1
x1  0.004390466
x2 -0.571945258
x3 -0.241371706
x4  0.784905799

lda(data=df,color~x1+x2+x3+x4)\$scaling

LD1         LD2
x1 -0.0361047  0.03607743
x2 -0.8110918 -0.09303029
x3  0.4472190 -0.69882190
x4  0.3820449  0.73839111``````

 Tweet