# Thread: Test if non-negative measurements are significantly different from 0 (5 replicates)

1. ## Test if non-negative measurements are significantly different from 0 (5 replicates)

I have measurements that indicate how much a gene is used in specific individuals. You obtain them by counting how much gene product (the thing a gene produces) is present for each gene. However, you do not count all gene products: you sample X times from the total pool of products (e.g. 30 million times) and count how often each gene product was drawn (called the number of reads), which is always an integer. So you're basically sampling from a big bag of gene products X times. This lets you make relative comparisons between individuals, but not absolute ones.

As with all measurements, these are not perfect. Sometimes you measure a few reads but cannot say that the value is significantly different from zero. Since I measured 5 replicates for each individual, I can calculate a standard deviation for each gene per individual (I have ~30,000 genes and 9 individuals). What I want to do is determine whether the five replicate measurements I have for each individual are significantly different from zero.

I thought of doing a one-sample t-test (or a nonparametric alternative) to determine whether these values are significantly different from zero. However, I am not sure this is correct, since any deviation from zero will always be positive. I could also calculate confidence intervals for the measurements and check whether zero lies inside them, but I think this is basically the same thing as doing a one-sample t-test. Also, I do not think I can assume normality of my data, since the values cannot be negative.
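To make the t-test versus confidence interval comparison concrete, here is a minimal sketch for n = 5 replicates. The read counts are made up for illustration, and the critical value 2.776 is the two-sided 95% t quantile for 4 degrees of freedom, hardcoded so that only the standard library is needed. Zero falls outside the interval exactly when |t| exceeds that critical value, so the two approaches give the same decision.

```python
import math
import statistics

T_CRIT_DF4 = 2.776  # two-sided 95% critical value of Student's t, 4 degrees of freedom


def t_and_ci_vs_zero(reads):
    """One-sample t statistic against 0 and the matching 95% CI (assumes 5 replicates)."""
    if len(reads) != 5:
        raise ValueError("T_CRIT_DF4 is only valid for exactly 5 replicates")
    mean = statistics.mean(reads)
    se = statistics.stdev(reads) / math.sqrt(len(reads))  # standard error of the mean
    if se == 0:
        # All replicates identical: t is undefined, treat as 0 or infinite.
        t = 0.0 if mean == 0 else math.inf
    else:
        t = mean / se
    ci = (mean - T_CRIT_DF4 * se, mean + T_CRIT_DF4 * se)
    return t, ci


# Hypothetical replicate read counts for two genes in one individual.
low_gene = [0, 0, 0, 0, 3]
high_gene = [12, 9, 15, 11, 13]

for reads in (low_gene, high_gene):
    t, (lo, hi) = t_and_ci_vs_zero(reads)
    print(reads, "t =", round(t, 3), "CI =", (round(lo, 3), round(hi, 3)))
```

Note that for the low-count gene the interval extends below zero, which is impossible for read counts. That is one symptom of the normality concern raised above.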

To be clear: the decision whether the number of reads is significantly different from 0 should be made separately for each individual.

One other question I have: is it somehow possible to pool the standard deviations of the different genes? You do expect genes with a higher average read count to have a bigger standard deviation, but the standard deviation should scale with the read count (I think).

Any help would be greatly appreciated!

Cheers

2. ## Re: Test if non-negative measurements are significantly different from 0 (5 replicates)

This is an interesting problem.

If each sample has a certain probability of returning a successful read, then the total number of successful reads will follow a Poisson distribution. A Poisson distribution becomes normal in the limit of a large number of expected successful reads, but it is highly non-normal if that expected number is small. Given a histogram of the number of reads, it is quite straightforward to fit a Poisson distribution to that histogram, obtaining a best value of the probability of success P and a confidence interval around it. (See http://en.wikipedia.org/wiki/Poisson...mum_likelihood.)
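As a rough sketch of such a fit, assuming the replicate counts are independent draws from one Poisson distribution: the maximum-likelihood estimate of the Poisson mean is simply the sample mean, and an exact (Garwood-style) confidence interval can be found by numerically inverting the Poisson CDF of the summed count. The function names and the example counts below are made up for illustration, and the code estimates the expected number of reads per replicate rather than a per-draw probability.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term in log space."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    terms = (math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, sum(terms))


def _bisect(f, lo, hi, iters=200):
    """Root of a monotone function f on [lo, hi], found by bisection."""
    f_lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_rate_ci(counts, alpha=0.05):
    """MLE of the Poisson mean per replicate and an exact two-sided CI."""
    n = len(counts)
    total = sum(counts)  # the sum of n Poisson draws is Poisson with mean n * rate
    rate = total / n     # maximum-likelihood estimate
    bracket = total + 10 * math.sqrt(total + 1) + 20  # generous upper search limit
    if total == 0:
        lower = 0.0      # the interval reaches zero only when every count is zero
    else:
        lower = _bisect(lambda mu: (1 - poisson_cdf(total - 1, mu)) - alpha / 2, 0.0, bracket)
    upper = _bisect(lambda mu: poisson_cdf(total, mu) - alpha / 2, 0.0, bracket)
    return rate, lower / n, upper / n


# Hypothetical replicate read counts for one gene in one individual.
print(poisson_rate_ci([0, 1, 0, 2, 0]))
print(poisson_rate_ci([0, 0, 0, 0, 0]))
```

With this kind of interval the lower bound is strictly positive as soon as a single read is observed. Real read counts often vary more between replicates than a Poisson model allows, so treat the interval as optimistic.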

Now in an ideal world, that probability would exactly equal the fraction f of samples that contains the gene, so you could just look at the error bars on P from your fit and if they contained zero you would say that the gene might not be present. But life isn't quite so simple. The only way the confidence interval for P will contain zero is if all the counts are in the zero bin -- after all, any successful read at all is proof that there must be some non-zero probability of success. What is happening is that you actually have a (hopefully small) error rate e at which you get a successful read even if the gene is not present, and in fact P = f + e, so you are measuring the sum (f + e).

The only way to get f from (f + e) is to have a measurement for e. So repeat your procedure on a sample that you know does not contain the gene in question and measure the histogram. Fit a Poisson to that histogram (which will be mostly zeros but have a handful of non-zero counts) to get a value (and error bar) for P = e. Repeat the procedure many times to obtain a really high-count histogram and a really good value for e with a really small error bar. Then subtract that value from all the measurements of (f + e) from your other runs (keeping track of how error bars propagate) to obtain a value with error bar for f alone. If that error interval includes zero, then your measurement is compatible with that gene not being present.
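The subtraction step might look like the sketch below. It assumes Poisson counts (so the variance of a mean of n replicates is mean / n), the same sampling depth in the test and blank runs, and a normal approximation for the final interval, which is crude when counts are small. All counts and names are hypothetical.

```python
import math
from statistics import NormalDist, mean


def background_subtracted(signal_counts, background_counts, alpha=0.05):
    """Estimate f = (f + e) - e with a propagated, normal-approximation CI."""
    total_rate = mean(signal_counts)   # estimates f + e
    error_rate = mean(background_counts)  # estimates e from the blank runs
    # Poisson assumption: variance of each mean is mean / number of replicates.
    var_total = total_rate / len(signal_counts)
    var_error = error_rate / len(background_counts)
    f_hat = total_rate - error_rate
    # Variances of independent estimates add under subtraction.
    se = math.sqrt(var_total + var_error)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return f_hat, (f_hat - z * se, f_hat + z * se)


# Hypothetical data: 10 blank runs known not to contain the gene.
blank = [0, 1, 0, 0, 2, 0, 1, 0, 0, 1]

for reads in ([12, 9, 15, 11, 13], [1, 0, 0, 1, 0]):
    f_hat, (lo, hi) = background_subtracted(reads, blank)
    verdict = "compatible with absent" if lo <= 0 <= hi else "present"
    print(reads, round(f_hat, 2), (round(lo, 2), round(hi, 2)), verdict)
```

The more blank runs you collect, the smaller the second variance term becomes, which is why a high-count histogram for e is worth the effort.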
