Dear all,

I was wondering whether you could help me with the following biostatistics question. I have consulted several people, who gave very different answers, so perhaps you could help me make a decision:

I am studying a cohort of obese women: one group with normal blood glucose levels (NGT) and one group with type 2 diabetes (T2D). In my experiments I found differences in fatty acid metabolism between NGT and T2D. As it happens, I have just found out that RNA sequencing was also performed on this same cohort, so gene expression levels are available for these subjects.

To verify my finding, I would now like to test whether the fatty acid metabolism pathway is also differentially expressed. I used KEGG to obtain the genes in the pathway (around 30 genes) and extracted their expression levels from the full set of measured genes. Should I now...

1) perform a repeated measures ANOVA with "DiabetesStatus" as the between-subjects factor and "GenesInPathway" as the repeated (within-subjects) factor, i.e. the expression levels of the 30 genes measured in each subject, to test whether "DiabetesStatus" has a significant effect. The problem is that I have never seen this done in publications. Also, how can I then tell whether the pathway expression as a whole is higher, equal, or lower in NGT than in T2D?

2) perform a t-test for each gene between NGT and T2D, correcting for the number of genes in the pathway (30), to identify the differentially expressed genes, and then use a chi-square test to check whether the proportion of differentially expressed genes in this pathway is higher than the proportion among all measured genes.
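
A note on option 1: in a split-plot (mixed) ANOVA with DiabetesStatus between subjects and gene within subjects, the DiabetesStatus main effect is tested against the subjects-within-groups error term. When every subject has a value for every pathway gene, that F equals the F of a one-way ANOVA on each subject's mean pathway expression, so the difference between the two group means of those per-subject averages shows whether the pathway as a whole is higher or lower in T2D. Below is a minimal plain-Python sketch, assuming complete data and already normalized expression values. The per-gene z-scoring (so that highly expressed genes do not dominate the average) is an added assumption, and the subject IDs and values are placeholders, not real data.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical layout: group -> subject ID -> expression values, one per pathway
# gene, in the same gene order for every subject (complete data assumed).
Groups = Dict[str, Dict[str, List[float]]]


def zscore_by_gene(groups: Groups) -> Groups:
    """Standardize each gene across all subjects so every gene is on a common scale."""
    rows = [row for subjects in groups.values() for row in subjects.values()]
    n_genes = len(rows[0])
    mus = [mean(r[j] for r in rows) for j in range(n_genes)]
    # A constant gene would give SD 0; fall back to 1.0 to avoid dividing by zero.
    sds = [pstdev([r[j] for r in rows]) or 1.0 for j in range(n_genes)]
    return {
        g: {s: [(v - mus[j]) / sds[j] for j, v in enumerate(row)] for s, row in subjects.items()}
        for g, subjects in groups.items()
    }


def status_main_effect(groups: Groups) -> Dict[str, float]:
    """F for the between-subjects factor of a split-plot ANOVA with complete data.

    Computed as a one-way ANOVA on per-subject means over genes; the factor
    n_genes multiplies both sums of squares in the full ANOVA and cancels in F.
    """
    subj_means = {g: [mean(row) for row in subjects.values()] for g, subjects in groups.items()}
    pooled = [m for ms in subj_means.values() for m in ms]
    grand = mean(pooled)
    ss_between = sum(len(ms) * (mean(ms) - grand) ** 2 for ms in subj_means.values())
    ss_within = sum((m - mean(ms)) ** 2 for ms in subj_means.values() for m in ms)
    df1, df2 = len(subj_means) - 1, len(pooled) - len(subj_means)
    result = {"F": (ss_between / df1) / (ss_within / df2), "df1": df1, "df2": df2}
    # Group means of the per-subject averages give the direction of the effect.
    result.update({f"mean_{g}": mean(ms) for g, ms in subj_means.items()})
    return result


if __name__ == "__main__":
    # Placeholder values for three pathway genes; not real data.
    groups = {
        "NGT": {"n1": [5.0, 2.0, 7.0], "n2": [5.2, 2.1, 7.1], "n3": [4.9, 1.9, 6.8]},
        "T2D": {"d1": [4.5, 1.6, 6.5], "d2": [4.4, 1.8, 6.6], "d3": [4.6, 1.5, 6.4]},
    }
    print(status_main_effect(zscore_by_gene(groups)))
```

A p-value would come from comparing F with an F(df1, df2) distribution, e.g. `scipy.stats.f.sf(F, df1, df2)` if SciPy is available. The GenesInPathway × DiabetesStatus interaction (not computed here) is what would indicate whether the group difference varies from gene to gene.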

Any help or suggestion would be greatly appreciated! Thanks