I have a situation in which I compare many genomic regions between two or more cell lines (CL). Each region is covered by n probes, each measuring the level of methylation (a continuous variable). The coverage (the value of n, i.e. the sample size per region, which generally varies between 4 and 20) differs between regions (a property of the array and of the base pair sequence). However, within any one region n is the same for all CLs.

For example, with two cell lines (A and B):

region 1 has 5 measurements in each cell line.

region 1, CL A: 23 9 80 62 31

region 1, CL B: -98 -65 -19 -95 -23

--

region 2 has 10 measurements within the region (bigger region and/or better coverage)

region 2, CL A: 66 7 31 89 100 81 63 93 33 0

region 2, CL B: 17 -50 -89 -46 -52 -80 -7 -26 -62 -26

The data have one additional property: the measurements WITHIN one cell line and region are correlated, e.g. if one of the measurements (one genomic locus) gives a high methylation value, the adjacent locus is more likely to be high as well. I have illustrated this by the sign of the numbers in the demo data. (There is no repeated measurement involved.)

There are several thousand regions of interest.

I want to do two things:

1. Test for each region whether A differs from B. Currently I use a Mann-Whitney U test for this when there are two cell lines (A, B) or a Kruskal-Wallis test when there are more than two (A, B, C, ...). I have been told that, because of the correlation, the assumption of independence of the measurements (= samples) fails and I should use a permutation test. Is the approach implemented in the "coin" package in R (http://cran.r-project.org/web/packages/coin/vignettes/coin.pdf), i.e. the conditional counterpart of unconditional tests, applicable to this problem?

2. Correct for multiple testing. As my sample sizes vary between the regions, I am at a loss here. The p-values do not seem to be comparable because of this, so applying a default FWER-based correction might not be possible, right?
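To make both points concrete, here is a minimal sketch in Python (standard library only) of what I understand by a permutation test followed by an FWER correction. It is not the coin implementation. The function names, the choice of the absolute difference in means as test statistic, and the use of Holm's method are my own assumptions for illustration. Note that shuffling the cell line labels treats the pooled values as exchangeable under the null, which is exactly the assumption I am unsure about given the correlation between adjacent loci.

```python
from itertools import combinations
from math import comb

def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation p-value for the difference in means.

    Enumerates every way of assigning len(a) of the pooled values to
    group A. Assumes the pooled values are exchangeable under the null.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(a))
    n_extreme = 0
    n_total = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_total += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (controls the FWER)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Demo data from the example above: region -> (CL A, CL B).
regions = {
    "region 1": ([23, 9, 80, 62, 31],
                 [-98, -65, -19, -95, -23]),
    "region 2": ([66, 7, 31, 89, 100, 81, 63, 93, 33, 0],
                 [17, -50, -89, -46, -52, -80, -7, -26, -62, -26]),
}

p_values = [exact_permutation_pvalue(a, b) for a, b in regions.values()]
adjusted = holm_adjust(p_values)

for (name, (a, _)), p, p_adj in zip(regions.items(), p_values, adjusted):
    # With equal group sizes, the observed labelling and its mirror image
    # always count as extreme, so p cannot fall below 2 / C(2n, n).
    floor = 2 / comb(2 * len(a), len(a))
    print(f"{name}: n={len(a)}  p={p:.3g}  smallest possible p={floor:.3g}  "
          f"Holm-adjusted p={p_adj:.3g}")
```

This also shows where my worry in point 2 comes from: with n = 5 per cell line there are only C(10,5) = 252 possible label assignments, so region 1 can never reach a p-value below 2/252 ≈ 0.008 even though A and B are completely separated, while larger regions can reach much smaller values. For n near 20 full enumeration is no longer feasible and the labels would have to be sampled at random instead.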

I would very much appreciate any insight. Thanks a lot.

Kind regards,

Martin