Dear all,

I have a situation in which I compare many genomic regions between two or more cell lines (CL). Each region is covered by n probes measuring the level of methylation (a continuous variable). The coverage (the value of n, i.e. the sample size per region, generally between 4 and 20) differs between regions (a property of the array and of the base pair sequence). However, within any one region, n is the same for all CLs.

For example, with two cell lines (A and B):

region 1 has 5 measurements in each cell line.

region 1, CL A: 23 9 80 62 31

region 1, CL B: -98 -65 -19 -95 -23


region 2 has 10 measurements within the region (bigger region and/or better coverage)

region 2, CL A: 66 7 31 89 100 81 63 93 33 0

region 2, CL B: 17 -50 -89 -46 -52 -80 -7 -26 -62 -26

The data have one additional property: the measurements WITHIN one cell line & region are correlated, e.g. if one of the measurements (one genomic locus) gives a high methylation value, the adjacent locus is more likely to be high as well. I have illustrated this by the sign of the numbers in the demo data. (There is no repeated measurement involved.)

There are several thousand regions of interest.

I want to do two things:

1. Test for each region whether A differs from B. Currently I use a
Mann-Whitney U test for this if there are two cell lines (A, B) or a
Kruskal-Wallis test if there are more than two (A, B, C, ...). I have
been told that, because of the correlation, the assumption of
independence of the measurements (= samples) fails and I should use a
permutation test. Is the approach as implemented in the "coin"
package in R applicable to this problem (conditional counterpart of
unconditional

2. Correct for multiple testing. As my sample sizes vary between the
regions, I am at a loss here. The p-values do not seem to be
comparable because of this, so applying a default FWER-based
correction might not be possible, right?
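
To make the two questions concrete, here is a minimal sketch of what I have in mind, applied to region 1 of the demo data. It is written in Python purely for illustration and is not the "coin" implementation. The function names (exact_permutation_pvalue, holm_adjust), the choice of the difference in means as test statistic, and the choice of Holm's step-down procedure as the FWER correction are all my own assumptions. Note that reshuffling individual measurements between A and B treats them as exchangeable, which is exactly the assumption I am unsure about given the correlation.

```python
from itertools import combinations


def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation p-value for the difference in means.

    Enumerates every way of relabelling the pooled measurements into
    groups of the original sizes (assumes exchangeability under H0).
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(a))
    n_splits = 0
    n_extreme = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


# Region 1 of the demo data (5 measurements per cell line).
region1_a = [23, 9, 80, 62, 31]
region1_b = [-98, -65, -19, -95, -23]
p_region1 = exact_permutation_pvalue(region1_a, region1_b)
print(p_region1)
```

With 5 measurements per cell line there are only 252 possible relabellings, so the smallest two-sided p-value this test can return is 2/252 (about 0.008), which is what region 1 gives because every A value exceeds every B value. With 4 measurements per cell line the floor would be 2/70. This dependence of the attainable p-values on n is part of why I doubt that the per-region p-values are comparable before a correction such as holm_adjust is applied.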

I would very much appreciate any insight. Thanks a lot.

Kind regards,