# Thread: Median or Mean vs Data distribution

1. ## Median or Mean vs Data distribution

Hi,

I am stuck at this point in my thesis on choosing between the median and the mean.

My data and work:

I have a data set X, and for each item y in X I use two algorithms (A1, A2) to compute something, let's call it C. For each y I run A1 and A2 and store the time each one takes to finish computing C.
At this point I have a table with 3 columns (each row: y1, timeFor(A1,y1), timeFor(A2,y1)).
• Column1: X (contains y's)
• Column2: A1 time for each y
• Column3: A2 time for each y

Problem:

There are a few data points for which one of the approaches takes far too much time, so if we use the mean, it will not represent the typical running time. To fix this we use the median, but to justify that choice we first need to show that the data are not normally distributed. For this I am following these steps:
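To make the concern concrete, here is a minimal sketch of how a single slow run shifts the mean but barely moves the median. The timing values are hypothetical, invented only for illustration; they are not the thesis data.

```python
from statistics import mean, median

# Hypothetical run times in seconds for one algorithm.
# The last value plays the role of a rare, very slow run.
times = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 60.0]

mean_time = mean(times)      # pulled upward by the single slow run
median_time = median(times)  # stays close to the typical run time

print(f"mean   = {mean_time:.2f} s")
print(f"median = {median_time:.2f} s")
```

With these made-up numbers the mean lands far above every typical run, while the median stays among them.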

1- I am using skewness and kurtosis to decide whether the data are normally distributed. If the values of skewness and kurtosis are != 0, then the data are not normally distributed.

2- I am using the Wilcoxon test to get a p-value (0.05 significance level) and to evaluate my hypothesis about using the mean or the median, and about whether my data are normally distributed or not.

I don't know exactly how to use these results. To be specific:
• If skewness and kurtosis != 0 and the Wilcoxon p-value > 0.05, then I know it is the median.
• If skewness and kurtosis != 0 and 0 < Wilcoxon p-value < 0.05, then I am not sure what to do next (median or mean).

(!=) -> not equal
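For reference, a rough sketch of how the two shape measures could be computed from the timing column, using the moment-based definitions (skewness m3 / m2^1.5, excess kurtosis m4 / m2^2 - 3). The function names and the data are assumptions made for illustration. Note that on real samples these values are almost never exactly 0, even for normal data, so a strict `!= 0` check will nearly always fire. The Wilcoxon signed-rank test compares two paired samples and is not a normality test.

```python
from statistics import mean

def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5; 0 for a perfectly symmetric sample."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis m4 / m2**2 - 3; 0 for a normal distribution."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

# Hypothetical run times in seconds; the last value is a rare slow run.
times = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 60.0]

skew = sample_skewness(times)
kurt = excess_kurtosis(times)
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

A large positive skewness here simply reflects the long right tail caused by the slow run.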

Any help?

2. ## Re: Median or Mean vs Data distribution

Hi,
you could use a truncated mean, i.e. cut off everything below the 1st and above the 99th percentile, for example, and calculate the mean of the rest. Or simply use the median; there is no requirement that the median be used only for non-normal data.
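A minimal sketch of the truncated mean described above, assuming a simple index-based cut on the sorted values. The helper name `truncated_mean`, its parameters, and the data are hypothetical.

```python
from statistics import mean, median

def truncated_mean(values, lower_pct=1.0, upper_pct=99.0):
    """Mean after dropping the values below lower_pct and above upper_pct.

    Assumption: percentiles are applied as simple counts on the sorted list,
    so with 100 values a 1% cut drops exactly one value from each end.
    """
    ordered = sorted(values)
    n = len(ordered)
    drop_low = int(n * lower_pct / 100.0)             # values dropped at the bottom
    drop_high = int(n * (100.0 - upper_pct) / 100.0)  # values dropped at the top
    return mean(ordered[drop_low:n - drop_high])

# Hypothetical data: 98 typical run times plus one very fast and one very slow run.
typical = [1.0 + 0.01 * (i % 10) for i in range(98)]
times = typical + [0.1, 50.0]

print(f"mean           = {mean(times):.3f} s")
print(f"truncated mean = {truncated_mean(times):.3f} s")
print(f"median         = {median(times):.3f} s")
```

With small samples a 1% cut drops nothing (for 10 values, `int(0.1)` is 0), so the cut-off percentage has to be chosen with the sample size in mind.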

3. ## Re: Median or Mean vs Data distribution

Originally Posted by rogojel
Hi,
you could use a truncated mean, i.e. cut off everything below the 1st and above the 99th percentile, for example, and calculate the mean of the rest. Or simply use the median; there is no requirement that the median be used only for non-normal data.

What I need is a justification for choosing the median and not the mean; this is why I am doing these analyses.
The only problem is the case "Skewness and Kurtosis != 0, and 0 < Wilcoxon < 0.05, then not sure what to do next (Median or Mean)."

Thanks again,

4. ## Re: Median or Mean vs Data distribution

Hi,
you do not need to base your decision on any measurement. If it is the sensitivity to outliers that is the problem, you can take either a truncated mean or the median; it is up to you and your audience.

regards

