I hope you're well. I have an elementary question that I hope someone can help me with.

I am reading this protocol:

Love MI, Anders S, Kim V and Huber W. **RNA-Seq workflow: gene-level exploratory analyses and differential expression [version 1; referees: 2 approved]**. *F1000Research* 2015, **4**:1070.

There, I have reached this paragraph:

"In high-throughput biology, we are careful to not use the p values directly as evidence against the null, but to correct for multiple testing. What would happen if we were to simply threshold the p values at a low value, say 0.05? There are 5722 genes with a p value below 0.05 among our 29391 genes [This result comes from an analysis they did previously], for which the test succeeded in reporting a p value.

Now, assume for a moment that the null hypothesis is true for all genes, i.e., no gene is affected by the treatment with dexamethasone. Then, by the definition of the p value, we expect up to 5% of the genes to have a p value below 0.05. This amounts to 1470 genes. If we just considered the list of genes with a p value below 0.05 as differentially expressed, this list should therefore be expected to contain up to 1470/5722 = 26% false positives."
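To check that I am reading the numbers correctly, here is a minimal Python sketch of the arithmetic in the quoted paragraph. The variable names are my own and do not come from the workflow; the counts and the 0.05 threshold are the ones quoted above.

```python
# Counts taken from the quoted paragraph; variable names are hypothetical.
n_tested = 29391       # genes for which the test reported a p value
n_significant = 5722   # genes observed with p < 0.05
alpha = 0.05           # p value threshold

# If the null were true for every gene, about alpha * n_tested genes
# would be expected to fall below the threshold by chance alone.
expected_null_hits = round(alpha * n_tested)

# The paper's ratio: expected chance hits divided by observed hits.
false_positive_fraction = expected_null_hits / n_significant

print(expected_null_hits)                 # 1470
print(f"{false_positive_fraction:.0%}")   # 26%
```

So I can reproduce both 1470 and 26%; my confusion is about the interpretation of the ratio, not the calculation.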

What I don't understand is the last result: why does 1470 (the number of genes expected to have p < 0.05 under the null) divided by 5722 (the number of genes observed with p < 0.05) give the false positive fraction? Intuitively, I would have thought that the false positives were the other 74%, the "non-expected" ones.

Can anyone help me understand this, please? I would be very thankful.

Thanks!!