
Thread: Median or Mean vs Data distribution

  1. #1

    Median or Mean vs Data distribution




    Hi,

    I am stuck at this point in my thesis on choosing between the median and the mean.

    My data and work:

    I have a data set X, and for each item y in X I use two algorithms (A1, A2) to compute something, let's call it C. For each y I run A1 and A2 and store the time each of them takes to finish the computation of C.
    At this point I have a table with 3 columns (each row: y1, timeFor(A1,y1), timeFor(A2,y1)).
    • Column1: X (contains y's)
    • Column2: A1 time for each y
    • Column3: A2 time for each y

    Problem:

    For a few data points, one of the algorithms takes far too long, so the mean would not represent the typical running time. To fix this we use the median, but to justify that choice we first need to show that the data are not normally distributed. For this I am following these steps:
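The effect described above can be illustrated with a minimal sketch. The running times below are hypothetical (they are not the poster's data); the last two values stand in for the few very slow runs.

```python
from statistics import mean, median

# Hypothetical running times in seconds for one algorithm.
# The last two values play the role of the rare, very slow runs.
times = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 45.0, 60.0]

avg = mean(times)    # pulled upward by the two slow runs
mid = median(times)  # stays close to the typical run

print(f"mean   = {avg:.2f} s")
print(f"median = {mid:.2f} s")
```

With these made-up numbers the mean is roughly ten times the median, even though eight of the ten runs finish in about one second.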

    1- I am using skewness and kurtosis to decide whether the data are normally distributed. If skewness and kurtosis != 0, then the data are not normally distributed.

    2- I am using the Wilcoxon test to get a p-value (0.05 significance level) and to evaluate my hypothesis about using the mean or the median, and about whether my data are normally distributed.

    I don't know exactly how to interpret the results. To be specific:
    • Skewness and Kurtosis != 0 and Wilcoxon p-value > 0.05: then I know it is the median.
    • Skewness and Kurtosis != 0 and 0 < Wilcoxon p-value < 0.05: then I am not sure what to do next (median or mean).

    (!=) -> not equal
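The two steps above could be sketched as follows, using only the Python standard library. The data and all function names are hypothetical. Two caveats apply: sample skewness and excess kurtosis are almost never exactly 0 on real data, so "!= 0" alone is a weak criterion, and the Wilcoxon signed-rank test compares the paired A1 and A2 times rather than testing normality. The version here uses the plain normal approximation, so its p-value is only rough for small samples.

```python
from math import sqrt
from statistics import NormalDist, fmean

def skewness(xs):
    """Population skewness m3 / m2**1.5 (0 for a symmetric sample)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Population excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m4 = fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0

def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test on paired samples a and b.

    Returns (W+, p) using the normal approximation, without tie or
    continuity correction in the variance.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, p

# Hypothetical running times (seconds) of A1 and A2 on the same ten inputs.
a1_times = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 45.0, 60.0]
a2_times = [1.4, 1.5, 1.6, 1.3, 1.7, 1.9, 1.0, 2.0, 2.1, 2.2]

print("A1 skewness:       ", skewness(a1_times))
print("A1 excess kurtosis:", excess_kurtosis(a1_times))
print("Wilcoxon (W+, p):  ", wilcoxon_signed_rank(a1_times, a2_times))
```

Under these assumptions, a small p-value from the signed-rank test suggests that A1 and A2 differ in typical running time; it says nothing about whether either column is normally distributed.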

    Any help?

    Many thanks in advance,

  2. #2
    rogojel (TS Contributor)

    Re: Median or Mean vs Data distribution

    Hi,
    you could use a truncated mean, i.e. cut off everything below the 1st and above the 99th percentile, for example, and calculate the mean of the rest. Or simply use the median; there is no requirement that the data be non-normal before you can use the median.
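A truncated mean of this kind might look like the sketch below. The function name `truncated_mean` and the data are hypothetical, and the cut is made by count (dropping the lowest and highest 1% of the sorted values) rather than by interpolated percentile values, which is an assumption about what "cut off the 1st and 99th percentile" means here.

```python
from statistics import fmean

def truncated_mean(xs, proportion=0.01):
    """Mean after dropping the lowest and highest `proportion` of the values.

    The number of values dropped from each end is rounded down, so with
    fewer than 1 / proportion values nothing is removed.
    """
    ordered = sorted(xs)
    k = int(len(ordered) * proportion)  # values to drop from each end
    if 2 * k >= len(ordered):
        raise ValueError("proportion too large: no values left")
    return fmean(ordered[k:len(ordered) - k])

# Hypothetical running times: 99 ordinary runs and one very slow run.
times = [1.0] * 99 + [500.0]

plain = fmean(times)
trimmed = truncated_mean(times, proportion=0.01)

print(f"plain mean     = {plain:.2f} s")
print(f"truncated mean = {trimmed:.2f} s")
```

The cut-off proportion is a choice to report alongside the result; a larger proportion moves the truncated mean closer to the median.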

  3. #3

    Re: Median or Mean vs Data distribution

    Quote Originally Posted by rogojel View Post
    Hi,
    you could use a truncated mean, i.e. cut off everything below the 1st and above the 99th percentile, for example, and calculate the mean of the rest. Or simply use the median; there is no requirement that the data be non-normal before you can use the median.
    Thank you for the reply,

    What I need is a justification for using the median rather than the mean, which is why I am doing this analysis.
    The only problem is the case "Skewness and Kurtosis != 0, and 0 < Wilcoxon < 0.05", where I am not sure what to do next (median or mean).

    Thanks again,

  4. #4
    rogojel (TS Contributor)

    Re: Median or Mean vs Data distribution


    Hi,
    you do not need to base your decision on any measurement. If it is the sensitivity to outliers that is the problem, you can take either a truncated mean or the median; it is up to you and your audience.

    regards
