dunn.test

#1
Hi everyone!!

I'm quite new with R, so I hope you can help me :)

I'm working with bacteria pyrosequencing data and I'm doing the statistical analysis with R. I have 21 samples and 7 different treatments. I loaded my data into R phyloseq obtaining:

> psR
phyloseq-class experiment-level object
otu_table() OTU Table: [ 7498 taxa and 21 samples ]
sample_data() Sample Data: [ 21 samples by 8 sample variables ]
tax_table() Taxonomy Table: [ 7498 taxa by 6 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 7498 tips and 7497 internal nodes ]


Since I found that there are statistically significant differences between the treaments (with the adonis function), I wanted to know which OTUs have a different abundance in the different treatments. To do that I used the function dunn.test (with Kruskal-Wallis test incorporated), swapping first the rows and the columns in the OTU table in order to apply the test:

swap_otu_table <- t(otu_table(psR))
treatment <- c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'E', 'E', 'E', 'F', 'F', 'F', 'G', 'G', 'G')
swap_otu_tableDF <- as.data.frame(swap_otu_table)
ncol(swap_otu_tableDF)
[1] 7498
lapply(swap_otu_tableDF[1:7498], function(x) kruskal.test(x ~ treatment, data=swap_otu_tableDF))


The output of this recursive function is quite difficult to read, especially for all 7498 OTUs.

Is there some way to apply a Kruskal-Wallis + Dunn's test in a recursive way that gives a table as an output, preferably in order of significance, and with not only the OTU code but also the taxonomic identification contained in tax_table(psR)?

Thank you very much!

Lidia
 

Lazar

Phineas Packard
#2
You will have to look at the elements that make up the kruskal.test:
Code:
x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects
y <- c(3.8, 2.7, 4.0, 2.4)      # with obstructive airway disease
z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis
test<- kruskal.test(list(x, y, z))
str(test) # suggests you need to extract to statistic, parameter, and p.value compnents
#e.g.
c(test$statistic, test$parameter, test$p.value)
So my guess would be something like:
Code:
sapply(swap_otu_tableDF[1:7498], function(x) {
             temp <- kruskal.test(x ~ treatment, data=swap_otu_tableDF)
            c(temp$statistic, temp$parameter, temp$p.value)
           }
)
Might give you what you want but I have not tested it. You can then order the data by p.value if you like.
 
#3
Thanks for your reply Lazar!

Unfortunately the code does not work with the function dunn.test... Following your suggestion I tried:

dunn.test.1 <- dunn.test(swap_otu_tableDF[,1], treatment, kw=TRUE, label=TRUE, method='bonferroni', alpha=0.05)
str(dunn.test.1)
List of 4
$ chi2 : num 4.46
$ Z : num [1:21] -1.08 0 1.08 -1.08 0 ...
$ P : num [1:21] 0.14 0.5 0.14 0.14 0.5 ...
$ P.adjusted: num [1:21] 1 1 1 1 1 1 1 1 1 1 ...
sapply(swap_otu_tableDF[1:7498], function(x) {
temp <- dunn.test( x ~ treatment, data=swap_otu_tableDF)
c(temp$p.value)
}
)

The big problem with this recursive dunn.test output is also that is somehow divided into two parts. Firstly there is the table for all the OTUs, without the code:

Kruskal-Wallis rank sum test

data: x and treatment
Kruskal-Wallis chi-squared = 12.0548, df = 6, p-value = 0.06


Comparison of x by treatment
(Bonferroni)
Row Mean-|
Col Mean | 0 B C CA L SO
---------+------------------------------------------------------------------
B | 0.000000
| 1.0000
|
C | -1.813193 -1.813193
| 0.7329 0.7329
|
CA | -0.604397 -0.604397 1.208795
| 1.0000 1.0000 1.0000
|
L | -2.417591 -2.417591 -0.604397 -1.813193
| 0.1640 0.1640 1.0000 0.7329
|
SO | 0.000000 0.000000 1.813193 0.604397 2.417591
| 1.0000 1.0000 0.7329 1.0000 0.1640
|
TR | -1.510994 -1.510994 0.302198 -0.906596 0.906596 -1.510994
| 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000


And then after all these tables (for all the OTUs) there are the values with the OTUs code:

$`C11||IVN7XNJ04I9G4B`
$`C11||IVN7XNJ04I9G4B`$chi2
[1] 12.05479

$`C11||IVN7XNJ04I9G4B`$Z
[1] 0.0000000 -1.8131937 -1.8131937 -0.6043979 -0.6043979
[6] 1.2087958 -2.4175915 -2.4175915 -0.6043979 -1.8131937
[11] 0.0000000 0.0000000 1.8131937 0.6043979 2.4175915
[16] -1.5109947 -1.5109947 0.3021989 -0.9065968 0.9065968
[21] -1.5109947

$`C11||IVN7XNJ04I9G4B`$P
[1] 0.500000000 0.034900979 0.034900979 0.272789571 0.272789571
[6] 0.113370658 0.007811802 0.007811802 0.272789571 0.034900979
[11] 0.500000000 0.500000000 0.034900979 0.272789571 0.007811802
[16] 0.065394899 0.065394899 0.381250205 0.182310020 0.182310020
[21] 0.065394899

$`C11||IVN7XNJ04I9G4B`$P.adjusted
[1] 1.0000000 0.7329206 0.7329206 1.0000000 1.0000000 1.0000000
[7] 0.1640478 0.1640478 1.0000000 0.7329206 1.0000000 1.0000000
[13] 0.7329206 1.0000000 0.1640478 1.0000000 1.0000000 1.0000000
[19] 1.0000000 1.0000000 1.0000000


The two parts are not connected and I can't understand why...