Dear all,

I have a question concerning statistical tests to compare two datasets without replicates.

I will explain my problem in more detail:

I have two large datasets (ca. 2000 genes per dataset) like this:
dataset1: control_gene_1 atgcggaggtttatgcgcaag... (up to 2000 genes)
dataset2: Treatment_gene_1 atgcaacgcgaaaaggagct... (up to 2000 genes)

Genes with the same number in the two datasets (e.g. gene_1) have nothing in common, i.e. the genes are not paired between datasets.

I have determined some features of each dataset, for example the average occurrence of A, T, G and C, giving me something like this:
Dataset1: A (20%), T (20%), G (30%), C (30%)
Dataset2: A (15%), T (15%), G (35%), C (35%)

I would like to compare these compositions statistically.
One option would be to determine the average occurrence of A, T, G and C for each gene and to treat each gene in a dataset as a replicate. However, the large number of genes makes almost any minute difference between the two datasets significant.
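
To make the per-gene option concrete, here is a minimal Python sketch of the calculation I have in mind. The function name base_fractions and the two toy sequences are made up for illustration only; they are not my real data, and this is just one way the per-gene values could be computed.

```python
from collections import Counter

def base_fractions(seq):
    """Return the fraction of A, T, G and C in one sequence (case-insensitive)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATGC")  # ignore any other symbols, e.g. N
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {b: counts[b] / total for b in "ATGC"}

# Toy sequences, invented for illustration (assumption, not real data).
control = {
    "control_gene_1": "atgcggagg",
    "control_gene_2": "atgaaattt",
}

# One composition vector per gene; each gene would then count as one replicate.
per_gene = {name: base_fractions(seq) for name, seq in control.items()}
for name, fractions in per_gene.items():
    print(name, {b: round(f, 3) for b, f in fractions.items()})
```

With ca. 2000 such per-gene values in each dataset, any test on them has very high power, which is exactly the problem described above.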

Question: is it allowed to divide each dataset randomly into 3 groups and to treat these groups as replicates?
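
The random split I am asking about could look like the following sketch. The function random_groups, the fixed seed and the placeholder gene names are my own assumptions for illustration; whether groups built this way may be treated as replicates is the open question.

```python
import random

def random_groups(gene_names, n_groups=3, seed=0):
    """Shuffle the gene names and deal them into n_groups groups of near-equal size."""
    names = list(gene_names)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(names)
    # Take every n_groups-th name, starting at offsets 0, 1, ..., n_groups - 1.
    return [names[i::n_groups] for i in range(n_groups)]

# Placeholder names standing in for the ca. 2000 genes of one dataset (assumption).
genes = [f"control_gene_{i}" for i in range(1, 2001)]
groups = random_groups(genes)
print([len(g) for g in groups])
```

The base composition would then be computed once per group, giving 3 values per dataset instead of ca. 2000.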

Best wishes,