Compare independent gene lists after statistical test (limma)

#1
Dear all,
I have a general question about the statistical analysis approach that I intend to use, which basically boils down to "is this feasible to do or not?".

But first let me describe the setting.

I am currently analyzing 3 independent data sets that were derived from cancerous tissues. What I intend to do is analyze 5 biological subgroups within one data set and then compare/validate my results between the different data sets. 3 of these subgroups shall in turn be compared as a "metagroup" against the other 2 subgroups.

subgroups
cancer1, cancer2, cancer3, metastasis, normal
metagroup
cancer (comprises cancer1, cancer2 and cancer3)
comparisons
cancer1 vs cancer2
cancer1 vs cancer3
cancer2 vs cancer3
cancer vs normal
metastasis vs cancer

I implemented this by using two separate contrast matrices in limma, one for the cancer subgroups and one for the meta-comparisons.
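To make the setup concrete, here is a minimal Python sketch of what the two contrast matrices look like as coefficient vectors. This is not limma code. It assumes a design with one coefficient per group (no intercept) and that the metagroup is formed by averaging the three cancer subgroups; the names `subgroup_contrasts`, `meta_contrasts` and `as_matrix` are made up for illustration.

```python
from fractions import Fraction

# Group order assumed for the design matrix columns (one coefficient per group).
groups = ["cancer1", "cancer2", "cancer3", "metastasis", "normal"]

third = Fraction(1, 3)

# Pairwise comparisons between the cancer subgroups.
subgroup_contrasts = {
    "cancer1_vs_cancer2": {"cancer1": 1, "cancer2": -1},
    "cancer1_vs_cancer3": {"cancer1": 1, "cancer3": -1},
    "cancer2_vs_cancer3": {"cancer2": 1, "cancer3": -1},
}

# Meta-comparisons, with "cancer" taken as the average of the three subgroup means.
meta_contrasts = {
    "cancer_vs_normal": {"cancer1": third, "cancer2": third, "cancer3": third, "normal": -1},
    "metastasis_vs_cancer": {"metastasis": 1, "cancer1": -third, "cancer2": -third, "cancer3": -third},
}

def as_matrix(contrasts):
    """Return (contrast names, matrix) with one row per group and one column per contrast."""
    names = list(contrasts)
    matrix = [[Fraction(contrasts[c].get(g, 0)) for c in names] for g in groups]
    return names, matrix

sub_names, sub_matrix = as_matrix(subgroup_contrasts)
meta_names, meta_matrix = as_matrix(meta_contrasts)

for name, column in zip(meta_names, zip(*meta_matrix)):
    # Each contrast column should sum to zero.
    print(name, [str(w) for w in column], "sum =", sum(column))
```

Both matrices refer to the same fitted group means, so in this sketch splitting the contrasts over two matrices only changes which comparisons are reported together.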

My questions now are the following:
1. Can I compare the same group-comparisons (e.g. cancer vs normal) between different data sets directly?
2. Or should I rather create separate contrast matrices for each comparison and then compare the resulting lists?

I am asking this because one of the data sets is missing the metastasis samples, which means I would need to compare a list derived from a "pure" cancer vs normal contrast matrix with one derived from a contrast matrix that includes both meta-comparisons.
In my previous experience, building a metagroup comparison by averaging the subgroups gave different results from an analysis in which the samples were relabelled as a single group,
i.e. (cancer1+cancer2+cancer3)/3 vs normal
and cancer vs normal
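A small numeric illustration of why the two versions can disagree, written as a plain Python sketch rather than limma code. The expression values are invented and the subgroup sizes are deliberately unequal. Averaging the subgroup means gives each subgroup a weight of 1/3, while relabelling all samples as "cancer" weights each subgroup by its sample count.

```python
from statistics import mean

# Hypothetical log2 expression values for a single gene (not real data).
expr = {
    "cancer1": [7.5, 8.5, 8.0, 8.0],  # 4 samples, mean 8.0
    "cancer2": [9.0, 9.0],            # 2 samples, mean 9.0
    "cancer3": [10.0],                # 1 sample,  mean 10.0
    "normal": [6.0, 6.0, 6.0],        # 3 samples, mean 6.0
}
subgroups = ["cancer1", "cancer2", "cancer3"]

# (cancer1 + cancer2 + cancer3)/3 vs normal: every subgroup mean counts equally.
averaged = mean([mean(expr[g]) for g in subgroups]) - mean(expr["normal"])

# Relabelled "cancer" vs normal: every sample counts equally.
pooled_samples = [x for g in subgroups for x in expr[g]]
pooled = mean(pooled_samples) - mean(expr["normal"])

print("averaged subgroup contrast:", averaged)
print("relabelled group contrast: ", round(pooled, 3))
```

With these made-up numbers the averaged contrast is 3.0 and the relabelled one is about 2.57. The two estimates coincide only when the subgroups are the same size. Separately from the estimate, relabelling also moves the differences between subgroups into the residual variance, so the test statistics can change even with balanced groups.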

Therefore I expect the two approaches to return different results (different numbers of significant genes at p<0.05), and this will affect further processing.
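For the further processing, the kind of list comparison I have in mind could be sketched as follows. The gene IDs and fold changes are placeholders, and `compare_lists` is a made-up helper, not part of any package. It assumes each data set has already been reduced to its significant genes with their log fold changes.

```python
def compare_lists(a, b):
    """Compare two gene lists given as dicts mapping gene ID -> log fold change."""
    shared = set(a) & set(b)
    union = set(a) | set(b)
    # Shared genes whose fold change points in the same direction in both data sets.
    concordant = {g for g in shared if (a[g] > 0) == (b[g] > 0)}
    return {
        "shared": sorted(shared),
        "concordant": sorted(concordant),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Placeholder results for "cancer vs normal" in two data sets.
dataset_a = {"GENE_A": 2.1, "GENE_B": -1.4, "GENE_C": 0.9}
dataset_b = {"GENE_B": -1.1, "GENE_C": -0.7, "GENE_D": 1.8}

result = compare_lists(dataset_a, dataset_b)
print(result)
```

Since the overlap of thresholded lists depends on the cutoff, checking the direction of change for the shared genes seems more informative than the raw count alone.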

However, I am also not certain that using only subgroups of the whole data set is feasible, since all samples of one data set have been normalized as a batch, and I guess this would introduce additional bias or lower the statistical power of the analysis.
Thus, I assume it would be most reasonable to renormalize the selected samples for every comparison I conduct, though I would like to avoid this as much as possible.

I hope someone can give me recommendations on how to proceed or on which approach is most reasonable. Thank you very much in advance.

Best regards,
bontus