+ Reply to Thread
Results 1 to 8 of 8

Thread: Measure Variability of data points

  1. #1
    Points: 134, Level: 2
    Level completed: 68%, Points required for next Level: 16

    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Measure Variability of data points



    I'm struggling to find the appropriate solution to a project that I'm working on. I am trying measure the variability of lead time on products that we purchase I can't figure out what an appropriate calculation would be. Below are some of the details.

    - I'm trying to compare vendors. Some tend to have short lead times (3-5 days) some are long (60-90 days). Whatever measure I use I need to be able to compare these vendors on the same scale.

    - The data tends to have a lot of outliers.

    My end goal is to be able to evaluate suppliers against each other. Are their lead times all over the place? Or do they tend to be relatively consistent.

    Sample A - 32, 6, 9, 16, 3, 30, 34, 36, 33, 35, 42, 25, 38, 10

    Sample B - 6, 7, 6, 6, 10, 8, 10, 8, 5, 8, 8, 7, 8, 6, 6, 6

    Sample C - 7, 35, 6, 33, 31, 6, 32, 35, 30

    Sample D - 28, 28, 20, 34, 16, 26, 27, 28, 15, 21, 23

    Sample E - 2, 2, 4, 2, 1, 3, 4, 7, 11, 2, 1, 3, 2, 2, 2, 5

    Any suggestions would be greatly appreciated!

  2. #2
    TS Contributor
    Points: 6,808, Level: 54
    Level completed: 29%, Points required for next Level: 142

    Posts
    775
    Thanks
    0
    Thanked 70 Times in 69 Posts

    Re: Measure Variability of data points

    Coefficient of Variation?

    Advantages
    Whereas the standard deviation of data must always be understood in the context of the mean of the data. Instead, the actual value of the CV is independent of the unit in which the measurement has been taken, so it is a dimensionless number. For comparison between data sets with different units or widely different means, one should use the coefficient of variation instead of the standard deviation.

    Disadvantages
    When the mean value is close to zero, the coefficient of variation will approach infinity and is hence sensitive to small changes in the mean. This is often the case if the values do not originate from a ratio scale.

  3. #3
    Points: 134, Level: 2
    Level completed: 68%, Points required for next Level: 16

    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Measure Variability of data points

    Thanks Mean Joe. It seems that the CV is heavily influenced by outliers. Do you know of a good way to handle that?

    Ex. Sample A - 75, 68, 61, 63,88, 84, 50, 77 - CV = .16929

    Sample B - 5, 5, 71, 3, 6 - CV = 1.4732

    Sample B is more consistent but because of the outlier they score very poorly.

    Thanks

  4. #4
    Points: 2,026, Level: 27
    Level completed: 18%, Points required for next Level: 124

    Posts
    80
    Thanks
    21
    Thanked 0 Times in 0 Posts

    Re: Measure Variability of data points

    Here are a bunch of ideas:

    1) Draw a histogram for each sample. If they are generally normally distributed and the outliers are visually separable (e.g. they all sit more than 2 Standard Deviations away from the mean), you can simply define a cutoff and exclude data beyond that cutoff (e.g. mean ± 2*SD). This is a simplified version of Chauvenet's criterion

    2) You can also have a look at Grubbs' test for outliers. This also assumes that your data is normally distributed.

    3) If your data is not normally distributed, there are also options. If you can get rid of outliers and the only concerns is regarding the statistical test that you are using for comparing the groups, then Central limit theorem is your friend; have a large number of data points in each group and you can assume noramlity. If you haven't gotten rid of outliers because of the weird distribution, you might look into the option converting the scales. Sometimes using log(x) instead of (x) gives you more felxibility.

  5. #5
    Banned
    Points: 3,528, Level: 37
    Level completed: 19%, Points required for next Level: 122
    GretaGarbo's Avatar
    Posts
    419
    Thanks
    130
    Thanked 139 Times in 122 Posts

    Re: Measure Variability of data points

    @mschweinzger.

    Is this real data? Or have you or a teacher or someone made it up, as an example?

    Are there any other explanatory factors, for example if some units were deliveries from a central inventory from far away? Like when the data goes from around 7 to suddenly 35.

    How did you make the selection to choose units? Randomisation? I guess the vendors delivers thousands of units.

  6. #6
    Points: 134, Level: 2
    Level completed: 68%, Points required for next Level: 16

    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Measure Variability of data points

    @GretaGarbo

    This is real data. There aren't any visible explanatory factors. The vendors delivery lead times simply vary. I'm trying to measure and evaluate vendors on how consistent (or inconsistent) the lead time is as this has significant implications on stock we keep on hand and the lead times we us when placing orders. I've attached a small sample of the data set that I'm working with. I'm trying to evaluate 12 months worth of deliveries. LT VAR SAMPLE.xls

  7. #7
    Banned
    Points: 3,528, Level: 37
    Level completed: 19%, Points required for next Level: 122
    GretaGarbo's Avatar
    Posts
    419
    Thanks
    130
    Thanked 139 Times in 122 Posts

    Re: Measure Variability of data points

    What does this expression mean (from your first post):
    “Are their lead times all over the place? “
    From the xl-file: what is “rec_amout”? Is that not an explanatory variable?

    What have you tried so far?

  8. #8
    Points: 134, Level: 2
    Level completed: 68%, Points required for next Level: 16

    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Measure Variability of data points


    What I meant by this was - Are their lead times highly variable?

    rec_amount is the number of pieces that we received in the shipment (each row represents a receipt).

    Currently, I take the median for each item number.

    I then determine the acceptable lead time range for each item number based on the median. For items with a median lead time less than 20 days I've defined the acceptable range as +/- 2 days. For items with a median lead time greater than 20 days I've defined the acceptable range as +/- 10%.

    For each receipt I then determine if the lead time fell inside or outside of the acceptable range. I then take the sum of all pieces received outside of the acceptable range / all pieces received total to determine the % of receipts out of range.

    Other than that I've tried coefficient of variation but the significant outliers skew the results to heavily.

+ Reply to Thread

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts








Advertise on Talk Stats