# Comparison of one outcome to all the other outcomes

#### Remi

##### New Member
Here is some data to play with

Code:
    set.seed(1e4)
color = factor(c(rep("blue",20), rep("red",20), rep("yellow",150)), levels=c("red","blue","yellow"))
x1 = rnorm(20+20+150,12,3)
x2 = c(rnorm(20,5), rnorm(20,5), rnorm(150,7))
x3 = c(rnorm(20,7), rnorm(20,5), rnorm(150,5))
x4 = c(rnorm(20,5), rnorm(20,7), rnorm(150,5))
RedOrNot = ifelse(color=="red","Red","NotRed")

df = data.frame(
color = color,
RedOrNot = RedOrNot,
x1 = x1,
x2 = x2,
x3 = x3,
x4 = x4
)

We are interested in the color red in particular. We would like to understand which explanatory variables (which among x1,x2,x3 and x4) affects the probability red "differentially that it affects the probability of outcome blue or yellow". The question might be a little unclear but I hope the following will make my point very obvious.

Below graphs show 95% bootstrap confidence intervals.

Code:
    p1 = ggplot(df,aes(x=color,y=x1)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p2 = ggplot(df,aes(x=color,y=x2)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p3 = ggplot(df,aes(x=color,y=x3)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p4 = ggplot(df,aes(x=color,y=x4)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
multiplot(p1,p2,p3,p4,layout=matrix(1:4,ncol=2))

"red" seem to differ from the two other groups only in respect to x4. I would be tempted to group yellow and blue into a single group called NotRed as shown below.

Code:
    p1 = ggplot(df,aes(x=RedOrNot,y=x1)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p2 = ggplot(df,aes(x=RedOrNot,y=x2)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p3 = ggplot(df,aes(x=RedOrNot,y=x3)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
p4 = ggplot(df,aes(x=RedOrNot,y=x4)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
multiplot(p1,p2,p3,p4,layout=matrix(1:4,ncol=2))

Because we sampled way more yellow than blue, it seems that Red differ from the other colors for x2 as well as x4 but this is misleading. I would rather not consider x2 as having an effect of interest when I see the first series of plots as I am looking for variables that make red different than both other groups.

Let's make a test with color as response variable.

Code:
    k = length(levels(df$color)) I = diag(k-1) J = matrix(rep(1, (k-1)^2), c(k-1, k-1)) priors = list(R = list(fix=1, V=(1/k) * (I + J), n = k - 1)) m = MCMCglmm(color ~ -1 + trait:(x1) + trait:(x2) + trait:(x3) + trait:(x4) , rcov = ~ us(trait):units, data = df, family = "categorical", prior = priors, verbose = FALSE, burnin = 10000, nitt = 60000, thin = 50) summary(m) post.mean l-95% CI u-95% CI eff.samp pMCMC traitcolor.blue:x1 -0.30261 -0.60335 0.02378 72.04 0.048 * traitcolor.yellow:x1 -0.02728 -0.24525 0.19561 129.73 0.832 traitcolor.blue:x2 -0.60401 -1.20394 0.04258 32.72 0.118 traitcolor.yellow:x2 2.05721 1.35060 2.94990 26.03 <0.001 *** traitcolor.blue:x3 2.27062 1.36630 3.27224 30.57 <0.001 *** traitcolor.yellow:x3 -0.30359 -0.97326 0.27461 86.95 0.306 traitcolor.blue:x4 -1.28126 -2.11725 -0.55014 41.71 <0.001 *** traitcolor.yellow:x4 -1.41233 -2.05106 -0.80523 68.91 <0.001 *** Would it be wise to consider only the highest p.value for each explanatory variable to decide (relative to a threshold) whether a variable affects the probability of the outcome red differentially than both blue and yellow? How can I go about testing what variables affect the probability of the outcome red "differentially than both blue and yellow"? Similarly, when performing an LDA, I would have wished x4 to pop up as important driver of the first axis but this is not made obvious due to x2 again. Code:  lda(data=df,RedOrNot~x1+x2+x3+x4)$scaling

LD1
x1  0.004390466
x2 -0.571945258
x3 -0.241371706
x4  0.784905799

lda(data=df,color~x1+x2+x3+x4)\$scaling

LD1         LD2
x1 -0.0361047  0.03607743
x2 -0.8110918 -0.09303029
x3  0.4472190 -0.69882190
x4  0.3820449  0.73839111

Last edited: