
Thread: Comparison of one outcome to all the other outcomes

    Here is some data to play with:


    Code: 
        set.seed(1e4)
        # 20 blue, 20 red and 150 yellow observations, in that order
        color = factor(c(rep("blue",20), rep("red",20), rep("yellow",150)), levels=c("red","blue","yellow"))
        x1 = rnorm(20+20+150,12,3)                      # same distribution in every group
        x2 = c(rnorm(20,5), rnorm(20,5), rnorm(150,7))  # yellow differs from blue and red
        x3 = c(rnorm(20,7), rnorm(20,5), rnorm(150,5))  # blue differs from red and yellow
        x4 = c(rnorm(20,5), rnorm(20,7), rnorm(150,5))  # red differs from both blue and yellow
        RedOrNot = ifelse(color=="red","Red","NotRed")
        
        df = data.frame(
          color = color,
          RedOrNot = RedOrNot,
          x1 = x1, 
          x2 = x2,
          x3 = x3,
          x4 = x4
         )

    We are interested in the color `red` in particular. We would like to understand which explanatory variables (among `x1`, `x2`, `x3` and `x4`) affect the probability of the outcome `red` differently from how they affect the probability of the outcomes `blue` or `yellow`. The question may sound a little unclear, but I hope the following makes my point obvious.

    The graphs below show group means with 95% bootstrap confidence intervals.

    Code: 
        library(ggplot2)   # mean_cl_boot also requires the Hmisc package to be installed
        # One panel per explanatory variable: group mean with 95% bootstrap CI per color
        p1 = ggplot(df,aes(x=color,y=x1)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
        p2 = ggplot(df,aes(x=color,y=x2)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
        p3 = ggplot(df,aes(x=color,y=x3)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
        p4 = ggplot(df,aes(x=color,y=x4)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
        # multiplot() is the helper function from the Cookbook for R
        multiplot(p1,p2,p3,p4,layout=matrix(1:4,ncol=2))
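
    If multiplot() is not at hand, gridExtra should give a similar 2-by-2 arrangement of the plots p1 to p4 created above (gridExtra is not used elsewhere in this post):

    Code: 
        library(gridExtra)
        grid.arrange(p1, p2, p3, p4, ncol = 2)  # panel order may differ from multiplot()'s column-wise layout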


    `red` seems to differ from the two other groups only with respect to `x4`. I would be tempted to pool `yellow` and `blue` into a single group called `NotRed`, as shown below.

    Code: 
        # Same panels, but with blue and yellow pooled into "NotRed"
        p1 = ggplot(df,aes(x=RedOrNot,y=x1)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
        p2 = ggplot(df,aes(x=RedOrNot,y=x2)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
        p3 = ggplot(df,aes(x=RedOrNot,y=x3)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
        p4 = ggplot(df,aes(x=RedOrNot,y=x4)) + stat_summary(fun.data=mean_cl_boot) + coord_flip()
        multiplot(p1,p2,p3,p4,layout=matrix(1:4,ncol=2))



    Because we sampled far more `yellow` than `blue`, it now looks as if `red` differs from the other colors on `x2` as well as `x4`, but this is misleading. Judging from the first series of plots, I would rather not treat `x2` as an effect of interest, since I am looking for variables that make `red` different from both of the other groups.
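
    To quantify what the plots show, the per-color means can be tabulated directly; a minimal sketch in base R on the df defined above:

    Code: 
        # Mean of each explanatory variable within each color group.
        # blue and red were both drawn around 5 on x2 while yellow was drawn
        # around 7, so pooling blue with yellow pulls the "NotRed" mean of x2
        # away from red even though blue and red do not differ on x2.
        aggregate(cbind(x1, x2, x3, x4) ~ color, data = df, FUN = mean)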


    Let's run a test with `color` as the response variable.

    Code: 
        library(MCMCglmm)
        # Fixed residual covariance for the categorical (multinomial) model:
        # the residual (co)variance is not identifiable here, so it is fixed at (1/k)*(I+J)
        k = length(levels(df$color))
        I = diag(k-1)
        J = matrix(rep(1, (k-1)^2), c(k-1, k-1))
        priors = list(R = list(fix=1, V=(1/k) * (I + J), n = k - 1))
        # Multinomial logit with "red" (the first level of color) as the reference category
        m = MCMCglmm(color ~ -1 + trait:(x1) + trait:(x2) + trait:(x3) + trait:(x4),
                     rcov = ~ us(trait):units, data = df, family = "categorical",
                     prior = priors, verbose = FALSE, burnin = 10000, nitt = 60000, thin = 50)
        summary(m)
    
                             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
        traitcolor.blue:x1    -0.30261 -0.60335  0.02378    72.04  0.048 *  
        traitcolor.yellow:x1  -0.02728 -0.24525  0.19561   129.73  0.832    
        traitcolor.blue:x2    -0.60401 -1.20394  0.04258    32.72  0.118    
        traitcolor.yellow:x2   2.05721  1.35060  2.94990    26.03 <0.001 ***
        traitcolor.blue:x3     2.27062  1.36630  3.27224    30.57 <0.001 ***
        traitcolor.yellow:x3  -0.30359 -0.97326  0.27461    86.95  0.306    
        traitcolor.blue:x4    -1.28126 -2.11725 -0.55014    41.71 <0.001 ***
        traitcolor.yellow:x4  -1.41233 -2.05106 -0.80523    68.91 <0.001 ***

    Would it be wise to consider only the highest p-value of each explanatory variable when deciding (relative to a threshold) whether that variable affects the probability of the outcome `red` differently from both `blue` and `yellow`?

    How can I go about testing which variables affect the probability of the outcome `red` differently from both `blue` and `yellow`?
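
    As an illustration of the "highest p-value per variable" rule above (an intersection-union style check, not necessarily the right test), using the fitted model m and the pMCMC column of its summary shown above:

    Code: 
        # For each explanatory variable, keep the larger of its two pMCMC values
        # (blue-vs-red and yellow-vs-red): a variable is flagged only if red
        # appears to differ from BOTH other colors.
        sol = summary(m)$solutions
        vars = c("x1", "x2", "x3", "x4")
        sapply(vars, function(v) max(sol[grepl(paste0(":", v, "$"), rownames(sol)), "pMCMC"]))

    With the output shown above, only x4 would come out small for both contrasts.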

    Similarly, when performing an LDA, I would have expected `x4` to show up as an important driver of the first axis, but again this is obscured by `x2`.

    Code: 
        library(MASS)
        # Two-group LDA: Red vs NotRed
        lda(data=df,RedOrNot~x1+x2+x3+x4)$scaling
        
                    LD1
        x1  0.004390466
        x2 -0.571945258
        x3 -0.241371706
        x4  0.784905799
    
        # Three-group LDA: red, blue, yellow
        lda(data=df,color~x1+x2+x3+x4)$scaling
    
                  LD1         LD2
        x1 -0.0361047  0.03607743
        x2 -0.8110918 -0.09303029
        x3  0.4472190 -0.69882190
        x4  0.3820449  0.73839111
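
    In the same spirit, one option would be to run the LDA pairwise, comparing red to each other color separately, and keep only the variables that load heavily in both comparisons; a sketch only, with pair_blue and pair_yellow being names introduced here for illustration:

    Code: 
        # Pairwise two-group LDAs: red vs blue and red vs yellow.
        # droplevels() removes the empty third level so lda() does not complain.
        pair_blue   = droplevels(subset(df, color %in% c("red", "blue")))
        pair_yellow = droplevels(subset(df, color %in% c("red", "yellow")))
        lda(color ~ x1 + x2 + x3 + x4, data = pair_blue)$scaling
        lda(color ~ x1 + x2 + x3 + x4, data = pair_yellow)$scaling

    Given how the data were generated, x4 would be expected to stand out in both scaling vectors, whereas x2 should matter mainly in the comparison against yellow.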
