# Thread: How to calculate variance on pooled data

1. ## How to calculate variance on pooled data

I have an array of pairs.
Each pair represents a "bin" in a range.
Each pair stores a sum and a count.
Data is added to a pair/bin by adding the value to the sum and incrementing the count.
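
For concreteness, here is a minimal sketch of the structure described above. The names `make_bins` and `add_value` are hypothetical, and it assumes equal-width bins over a known range `[lo, hi)`, which the post does not state.

```python
def make_bins(num_bins):
    """Create num_bins empty bins, each stored as a [sum, count] pair."""
    return [[0.0, 0] for _ in range(num_bins)]


def add_value(bins, value, lo, hi):
    """Add value to the bin covering it (assumes equal-width bins over [lo, hi))."""
    if not lo <= value < hi:
        raise ValueError("value outside the binned range")
    index = int((value - lo) / (hi - lo) * len(bins))
    index = min(index, len(bins) - 1)  # guard against float rounding near hi
    bins[index][0] += value            # running sum of the values in this bin
    bins[index][1] += 1                # number of values in this bin


bins = make_bins(4)
for v in [0.5, 1.5, 1.75, 3.25]:
    add_value(bins, v, lo=0.0, hi=4.0)
# bins is now [[0.5, 1], [3.25, 2], [0.0, 0], [3.25, 1]]
```

Note that once a value is added, only the bin's sum and count remain; the individual values cannot be recovered.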

I need to calculate the variance of the whole data set. Is this possible?

Can I use the Weighted incremental algorithm from
http://en.wikipedia.org/wiki/Algorit...ating_variance

...
When the observations are weighted, West (1979) suggests this incremental algorithm:

    def weighted_incremental_variance(dataWeightPairs):
        n = 0
        mean = 0
        S = 0
        sumweight = 0
        for x, weight in dataWeightPairs:  # Alternately "for x, weight in zip(data, weights):"
            n = n + 1
            temp = weight + sumweight
            Q = x - mean
            R = Q * weight / temp
            S = S + sumweight * Q * R
            mean = mean + R
            sumweight = temp
        Variance = S * n / ((n-1) * sumweight)  # if sample is the population, omit n/(n-1)
        return Variance
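
One way to try the routine quoted above on binned data is to treat each bin's mean (`sum / count`) as the observation and its count as the weight. The sketch below does that under two assumptions that are not in the original post: it uses the population form (no `n/(n-1)` factor), and it skips empty bins, because a zero weight on the first update would divide by zero. The name `binned_variance` is hypothetical.

```python
def binned_variance(bins):
    """Population variance of the bin means, weighted by the bin counts.

    bins is a sequence of (sum, count) pairs. Empty bins are skipped.
    """
    mean = 0.0
    S = 0.0
    sumweight = 0
    for total, count in bins:
        if count == 0:
            continue                  # an empty bin carries no information
        x = total / count             # the bin mean stands in for every value in the bin
        temp = count + sumweight
        Q = x - mean
        R = Q * count / temp
        S += sumweight * Q * R
        mean += R
        sumweight = temp
    if sumweight == 0:
        raise ValueError("no data in any bin")
    return S / sumweight


# Bins built from the values 0.5, 1.5, 1.75, 3.25 with four unit-width bins.
bins = [(0.5, 1), (3.25, 2), (0.0, 0), (3.25, 1)]
result = binned_variance(bins)        # 0.9609375
```

The population variance of the four underlying values is 0.96875, so this result is slightly low. The spread of 1.5 and 1.75 around their shared bin mean is not visible from the sum and count alone.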

Felix

2. Bump.

No one has any idea?

3. I think this approach will underestimate the true variance. My reasoning: with a single bin, the calculated variance is zero. With an infinite number of bins, each distinct value gets its own bin, so the calculated variance should be exact. Can anyone give me a function for the error between my calculated variance and the true variance, with the number of bins as the variable?
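
The reasoning above matches the law of total variance: the total variance is the variance between bin means plus the count-weighted average of the variances within bins, and sums and counts only capture the first term. As a rough sketch, if the values are assumed to be spread about uniformly inside each bin of width `h`, the missing term is close to `h * h / 12` (the quantity used in Sheppard's correction), so with `k` equal-width bins over a range of width `W` the shortfall would be about `W * W / (12 * k * k)`. The check below uses hypothetical, evenly spaced test data; real data that is not uniform within bins will deviate from this.

```python
def population_variance(values):
    """Plain population variance of the raw values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def binned_variance(values, num_bins, lo, hi):
    """Variance recovered from per-bin sums and counts only."""
    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for v in values:
        i = min(int((v - lo) / (hi - lo) * num_bins), num_bins - 1)
        sums[i] += v
        counts[i] += 1
    n = sum(counts)
    grand_mean = sum(sums) / n
    # Count-weighted spread of the bin means around the overall mean.
    return sum(c * (s / c - grand_mean) ** 2
               for s, c in zip(sums, counts) if c) / n


# Hypothetical test data: 1200 evenly spaced values in [0, 1).
values = [(i + 0.5) / 1200 for i in range(1200)]
true_var = population_variance(values)

rows = []
for k in (1, 2, 4, 8, 16):
    shortfall = true_var - binned_variance(values, k, 0.0, 1.0)
    h = 1.0 / k                       # bin width
    rows.append((k, shortfall, h * h / 12))
    print(k, shortfall, h * h / 12)
```

For this data the shortfall is positive for every `k` and shrinks by roughly a factor of four each time the number of bins doubles.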
