# Thread: Counting Error (with a twist)

1. ## Counting Error (with a twist)

Hello all. I'd love some help with a problem I'm having, which is not for homework but for my research (I'm a new graduate student in biophysics). Here is the problem:

I'm measuring the number of times I "observe" a particular gene in an experiment. Let's say I observe it N times. I want to compute the error in the measurement. HOWEVER, there is a twist: each time I observe the gene, it gets a fractional score (<1) based on how certain I am that I saw that gene (this is due to the experimental details). So I have a series of scores s1,s2,s3,...,sN for each gene. What I want is the error in the total score S=s1+s2+...+sN.

The way I've been thinking about this: if I used a score of 1 for each measurement, it would be a simple counting-error problem: N measurements, sqrt(N) error. Should the error perhaps be S/sqrt(N) (the "standard" error)?
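To make the counting-error intuition concrete, here is a little simulation I ran (just a sketch; the rate and trial counts are made-up numbers): for Poisson-like counts with mean N, the spread of the count is about sqrt(N).

```python
import random

random.seed(0)

# Simulate many repeats of a counting experiment with true rate lam.
# For Poisson counts the standard deviation should be ~ sqrt(lam).
lam = 100.0     # hypothetical mean count
trials = 20000  # number of simulated experiments

counts = []
for _ in range(trials):
    # Draw a Poisson variate by counting exponential waiting times in [0, 1).
    n, t = 0, random.expovariate(lam)
    while t < 1.0:
        n += 1
        t += random.expovariate(lam)
    counts.append(n)

mean_n = sum(counts) / trials
var_n = sum((c - mean_n) ** 2 for c in counts) / (trials - 1)
print(mean_n, var_n ** 0.5)  # mean near 100, spread near sqrt(100) = 10
```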

Thank you very much for your help.

2. Let me assume that the s_i are independent. That assumption is far from a slam dunk.

Then S = n S_bar (n times the sample mean).

An estimate of the variance of S_bar is

var(s_i)_hat / n, where var(s_i)_hat is the square of the sample standard deviation of the s_i.

By the variance rules,

Var(n S_bar) = n^2 Var(S_bar) = n var(s_i)_hat.

---------------------------------------------------------

Of course, this has nothing to do with the actual number of genes, just the average value of your score.
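Here is a quick numerical check of that variance rule (my own sketch, with n fixed, the s_i iid, and a uniform score distribution chosen purely for illustration):

```python
import random

random.seed(1)

n = 50        # fixed number of observations per experiment
reps = 20000  # number of simulated experiments

# iid scores s_i in (0, 1); Uniform(0, 1) is just an illustrative choice
def one_total():
    return sum(random.random() for _ in range(n))

totals = [one_total() for _ in range(reps)]
mean_S = sum(totals) / reps
var_S = sum((S - mean_S) ** 2 for S in totals) / (reps - 1)

# Var(S) = n * Var(s_i); for Uniform(0, 1), Var(s_i) = 1/12,
# so Var(S) should come out near 50/12 ~ 4.17.
print(var_S)
```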

3. Originally Posted by fed1
Var(n S_bar) = n^2 Var(S_bar) = n var(s_i)_hat.
Thanks so much! However, I'm not sure this captures all the error I want it to. The error in S should include both the variance of the s_i and the counting error. So if all the s_i are 1 (the score for every measurement is the same), then the error in S should just be sqrt(N), where N is the number of scores; with the variance you wrote above, identical s_i would give an error in S of 0. Does that make sense?
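To show what I mean, here's a sketch (my own made-up rate): if the count N is itself random (say Poisson) and every score is exactly 1, then S = N and its spread is sqrt(N), even though the sample variance of the s_i is 0.

```python
import random

random.seed(2)

lam = 40.0    # hypothetical mean number of observations
reps = 20000  # number of simulated experiments

def poisson(lam):
    # Poisson draw via exponential waiting times (stdlib only)
    n, t = 0, random.expovariate(lam)
    while t < 1.0:
        n += 1
        t += random.expovariate(lam)
    return n

# Every score is exactly 1, so S = N and all the spread in S comes
# from the randomness of the count N itself, not from the scores.
totals = [float(poisson(lam)) for _ in range(reps)]
mean_S = sum(totals) / reps
var_S = sum((S - mean_S) ** 2 for S in totals) / (reps - 1)
print(var_S ** 0.5)  # close to sqrt(40) ~ 6.3, not 0
```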

4. Yeah, I see what you are saying.

Let's back up. Suppose that, instead of genes, we were counting heads in n coin flips, so

Y_i ~ Bernoulli(p), where Y_i is the indicator of heads on the ith flip.

Now we have these scores s_i assigned to each flip that tell us how sure we are that it is heads. Hopefully s_i depends on Y_i (it is useless if it doesn't), so

s_i = f(Y_i, Z_i), where Z_i is random.

Why introduce Z_i? Because otherwise s_i is deterministically related to Y_i, which is dumb. One model that is intuitively appealing is a mixture:

s_i = Y_i Z1_i + (1 - Y_i) Z2_i

Maybe choose Z1_i and Z2_i normal with some means?
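For instance, here's a sketch of that mixture with made-up parameters (p, the normal means, and sds are all just illustrative): the variance of S now picks up both the coin-flip (counting-like) spread through Y_i and the score noise through the Z's.

```python
import random

random.seed(3)

# Hypothetical parameters for the mixture s_i = Y_i*Z1_i + (1 - Y_i)*Z2_i
p = 0.7                 # P(heads)
mu1, sd1 = 0.9, 0.05    # score distribution when Y_i = 1
mu2, sd2 = 0.3, 0.05    # score distribution when Y_i = 0
n = 100                 # flips per experiment
reps = 5000             # number of simulated experiments

def score():
    y = 1 if random.random() < p else 0
    z1 = random.gauss(mu1, sd1)
    z2 = random.gauss(mu2, sd2)
    return y * z1 + (1 - y) * z2

totals = [sum(score() for _ in range(n)) for _ in range(reps)]
mean_S = sum(totals) / reps
var_S = sum((S - mean_S) ** 2 for S in totals) / (reps - 1)

# E[s_i] = p*mu1 + (1-p)*mu2 = 0.72, so mean_S ~ 72; Var(S) = n*Var(s_i),
# which for these numbers works out to about 7.8.
print(mean_S, var_S)
```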
