    Hello biostatisticians,

    I have run a very standard RNA-Seq protocol (TopHat, Cufflinks, Cuffmerge, and Cuffdiff) to measure differential gene expression in muscle samples (4 groups, 3 replicates per group). Looking at the distribution of adjusted p-values, I am surprised to see mostly discrete (sparse) values rather than a continuous distribution, particularly among the small p-values. For instance, the lowest value is shared by more than 100 transcripts and corresponds to -log(p) = 2.32589977; then there is nothing until a group of 19 transcripts with -log(p) = 2.089811175, and so on. A picture of the graph is attached.
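
For anyone who wants to check the same thing on their own results, here is a minimal sketch of how the ties could be tabulated. It assumes the adjusted p-values have already been read into a plain Python list; the function name `tabulate_ties`, the `digits` parameter, and the example numbers are all hypothetical and are not taken from the real data.

```python
from collections import Counter

def tabulate_ties(adjusted_p, digits=9):
    """Return (value, count) pairs for each distinct adjusted p-value, smallest first."""
    # Round first so that tiny floating-point differences do not split a tie.
    counts = Counter(round(p, digits) for p in adjusted_p)
    return sorted(counts.items())

# Made-up values for illustration only (assumption, not real output).
example = [0.0047, 0.0047, 0.0047, 0.0081, 0.0081, 0.2]
for value, n in tabulate_ties(example):
    print(value, n)
```

A long run of transcripts sharing exactly the same value shows up as a single row with a large count.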

    Do you have any insight into, or explanation for, what is going on? Note: I am using Galaxy to perform my analysis, and I don't seem to have any control over which statistical test is used.

    Thanks a lot for any advice!
