Hi,
I am stuck at this point in my thesis, choosing between the median and the mean.
My data and work:
I have a data set X, and for each item y in X I use two algorithms (A1, A2) to compute something; let's call it C. For each y I run A1 and A2 and store the time each of them takes to finish the computation of C.
At this point I have a table with 3 columns (e.g. row 1: y1, timeFor(A1, y1), timeFor(A2, y1)):
- Column1: X (contains y's)
- Column2: A1 time for each y
- Column3: A2 time for each y
Problem:
There are a few data points for which one of the approaches takes far too much time, so if we use the mean, it will not represent the typical run time. To fix this we use the median, but to justify that we first need to show that the data are not normally distributed. For this I am following these steps:
1- I use skewness and kurtosis to decide whether the data are normally distributed. If skewness and kurtosis != 0, then the data are not normally distributed.
2- I use the Wilcoxon test to get a p-value (significance level 0.05) and to evaluate my hypothesis about using the mean or the median, and whether my data are normally distributed or not.
What I don't know exactly is how to use these results. To be specific, when:
- Skewness and kurtosis != 0, and Wilcoxon p-value > 0.05, then I know it is the median.
- Skewness and kurtosis != 0, and 0 < Wilcoxon p-value < 0.05, then I am not sure what to do next (median or mean).
(!=) -> not equal
Any help?
Many thanks in advance,
Hi,
you could use a truncated mean, i.e. cut off everything below the 1st and above the 99th percentile, for example, and calculate the mean of the rest. Or simply use the median; there is no requirement to use the median only for non-normal data.
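As a rough sketch of what that could look like in Python, using only the standard library: the run times below are made up for illustration, and `truncated_mean` is a hypothetical helper written for this post, not a library function. It assumes the cut points are the 1st and 99th percentiles computed with the "inclusive" interpolation method; other percentile definitions give slightly different cut points.

```python
import statistics

def truncated_mean(values, lower_pct=1, upper_pct=99):
    """Mean of the values between the lower and upper percentile (inclusive)."""
    # With n=100, quantiles() returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    kept = [v for v in values if low <= v <= high]
    return statistics.mean(kept)

# Made-up run times in seconds for one algorithm:
# 199 ordinary runs between 1.00 s and 2.98 s, plus one extremely slow run.
times_a1 = [1.0 + 0.01 * i for i in range(199)] + [500.0]

print("mean          :", statistics.mean(times_a1))
print("median        :", statistics.median(times_a1))
print("truncated mean:", truncated_mean(times_a1))
```

With these made-up numbers the single slow run pulls the plain mean to roughly 4.5 s, while the median and the truncated mean both stay close to 2 s, which is where almost all runs lie.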
Hi,
you do not need to base your decision on any measurement. If the sensitivity to outliers is the issue, you can take either a truncated mean or the median; it is up to you and your audience.
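On the skewness/kurtosis check from the question, here is a small sketch of how the two quantities could be computed. It assumes the simple population (biased) moment formulas; statistics packages often apply bias corrections, so their numbers may differ a little. The helper names and the sample values are made up for illustration. Keep in mind that for measured timings these statistics will practically never come out as exactly 0, even for normally distributed data, so "!= 0" on its own cannot separate normal from non-normal data.

```python
import statistics

def central_moment(values, k):
    """k-th central moment, dividing by n (population formula)."""
    mu = statistics.fmean(values)
    return sum((v - mu) ** k for v in values) / len(values)

def skewness(values):
    """m3 / m2^1.5; 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """m4 / m2^2 - 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Made-up samples: one symmetric, one with a single very slow run.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
with_slow_run = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 30.0]

for name, sample in [("symmetric", symmetric), ("with slow run", with_slow_run)]:
    print(name, "skewness:", skewness(sample),
          "excess kurtosis:", excess_kurtosis(sample))
```

The symmetric sample has skewness 0 but a clearly non-zero excess kurtosis, and the sample with one slow run has a large positive skewness, which is the outlier sensitivity discussed above.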
regards