I have an array of pairs.
Each pair represents a "bin" in a range.
Each pair stores a sum and a count.
Data is added to a pair/bin by adding the value to the sum and incrementing the count.
I need to calculate the variance of the whole data set. Is this possible?
Can I use the Weighted incremental algorithm from
http://en.wikipedia.org/wiki/Algorit...ating_variance
...
When the observations are weighted, West (1979) suggests this incremental algorithm:
def weighted_incremental_variance(dataWeightPairs):
n = 0
mean = 0
S = 0
sumweight = 0
for x, weight in dataWeightPairs: # Alternately "for x in zip(data, weight):"
n = n + 1
temp = weight + sumweight
Q = x - mean
R = Q * weight / temp
S = S + sumweight * Q * R
mean = mean + R
sumweight = temp
Variance = S * n / ((n-1) * sumweight) # if sample is the population, omit n/(n-1)
return Variance
Thanks for your help!
Felix





Reply With Quote