Thread: How to calculate variance on pooled data

  1. #1
    How to calculate variance on pooled data

    I have an array of pairs.
    Each pair represents a "bin" in a range.
    Each pair stores a sum and a count.
    Data is added to a pair/bin by adding the value to the sum and incrementing the count.
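The bin structure described above could be sketched as follows. This is only an illustration: the names `make_bins` and `add_value`, the equal-width bins over `[lo, hi)`, and the clamping of out-of-range values are all assumptions, since the post does not say how values are assigned to bins.

```python
def make_bins(num_bins):
    # Each bin is a [sum, count] pair, as described in the post.
    return [[0.0, 0] for _ in range(num_bins)]


def add_value(bins, value, lo, hi):
    # Assumption: equal-width bins over [lo, hi); values outside the
    # range are clamped into the first or last bin.
    width = (hi - lo) / len(bins)
    index = int((value - lo) / width)
    index = min(max(index, 0), len(bins) - 1)
    bins[index][0] += value  # add the value to the bin's sum
    bins[index][1] += 1      # increment the bin's count


bins = make_bins(4)
for v in [1.0, 2.0, 2.5, 7.0, 9.5]:
    add_value(bins, v, lo=0.0, hi=10.0)
```

Note that once a value has been added, only the bin's sum and count remain; the individual values inside a bin can no longer be told apart.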

    I need to calculate the variance of the whole data set. Is this possible?

    Can I use the weighted incremental algorithm from
    http://en.wikipedia.org/wiki/Algorit...ating_variance

    ...
    When the observations are weighted, West (1979) suggests this incremental algorithm:

    def weighted_incremental_variance(dataWeightPairs):
        n = 0          # number of (value, weight) pairs seen so far
        mean = 0       # running weighted mean
        S = 0          # running weighted sum of squared deviations
        sumweight = 0  # running total of the weights
        for x, weight in dataWeightPairs:  # Alternatively "for x, weight in zip(data, weights):"
            n = n + 1
            temp = weight + sumweight
            Q = x - mean
            R = Q * weight / temp
            S = S + sumweight * Q * R
            mean = mean + R
            sumweight = temp
        Variance = S * n / ((n-1) * sumweight)  # if sample is the population, omit n/(n-1)
        return Variance
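One way to apply this idea to the bins is to treat each bin's mean (sum / count) as the value and its count as the weight. The sketch below is a plain two-pass equivalent under that assumption, not the incremental algorithm itself; `binned_variance` is a hypothetical name that appears neither in the thread nor in the Wikipedia article. It returns the population form (no n/(n-1) factor) and skips empty bins, which would otherwise cause a division by zero.

```python
def binned_variance(bins):
    """Population variance of the data as seen through (sum, count) bins.

    Assumption: each bin's mean stands in for every value in that bin,
    with the bin's count used as its weight.
    """
    total_count = sum(count for _, count in bins)
    if total_count == 0:
        raise ValueError("no data in any bin")
    grand_mean = sum(s for s, _ in bins) / total_count
    # Each non-empty bin contributes count * (bin_mean - grand_mean)^2.
    squared_dev = sum(
        count * (s / count - grand_mean) ** 2
        for s, count in bins
        if count > 0
    )
    return squared_dev / total_count


# Bins holding the values [1.0, 2.0], [2.5], [7.0], [9.5] as (sum, count).
example_bins = [(3.0, 2), (2.5, 1), (7.0, 1), (9.5, 1)]
result = binned_variance(example_bins)
```

Because only the bin means enter the calculation, this measures the spread between bins and ignores any spread of the values inside a bin.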


    Thanks for your help!
    Felix

  2. #2
    Bump.

    No one has any idea?

  3. #3
    I think this approach will underestimate the true variance. My reasoning is this: with a single bin, every value is represented by the same bin mean, so the calculated variance is zero. With an infinite number of bins, each distinct value gets its own bin, so the calculated variance should be correct. Can anyone give me a function that calculates the error between my calculated variance and the true variance, with the number of bins as the variable?
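The reasoning above matches the usual decomposition of variance: the population variance equals the variance between the bin means plus the count-weighted average of the variances within the bins. The bin-based calculation only captures the first part, so its error is the within-bin part. That part depends on how the data is distributed, not only on the number of bins. As a rough guide, if the values are spread approximately uniformly inside each bin of width h, the within-bin variance is about h^2 / 12. The sketch below checks this on evenly spaced data; the function names, the equal-width bins and the test data are all assumptions made for illustration.

```python
def population_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_split(values, lo, hi, num_bins):
    """Return the (between_bin, within_bin) parts of the population variance.

    Assumption: equal-width bins over [lo, hi), out-of-range values clamped.
    """
    width = (hi - lo) / num_bins
    groups = [[] for _ in range(num_bins)]
    for v in values:
        index = min(max(int((v - lo) / width), 0), num_bins - 1)
        groups[index].append(v)
    n = len(values)
    grand_mean = sum(values) / n
    # Spread of the bin means around the overall mean (what the bins can see).
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups if g
    ) / n
    # Spread of the values inside each bin (what the bins lose).
    within = sum(len(g) * population_variance(g) for g in groups if g) / n
    return between, within


data = [i / 1000 for i in range(1000)]  # evenly spread over [0, 1)
true_var = population_variance(data)
for k in (1, 2, 5, 10, 100):
    between, within = variance_split(data, 0.0, 1.0, k)
    uniform_guess = (1.0 / k) ** 2 / 12  # h^2 / 12 with h = range / k
    print(k, between, within, uniform_guess)
```

If each bin also stored the sum of squared values, the within-bin part could be recovered exactly rather than estimated, at the cost of one extra number per bin.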
