increasing the statistical significance of outliers

#1
Can someone give me some assistance on the use case below?

Summary
The goal is to create a Key Performance Indicator (a percentage of server utilization, where 0% is the best and 100% the worst) that will alert a user when one or more servers in a 30-server environment need attention.

Details
Each server has three separate metrics, listed below, that will be combined as a weighted average to produce the final KPI. Each server reports its metrics at 5-minute intervals. If possible, I would like the formula to be flexible so that I can report the KPI over various time ranges (e.g. 1 hour to 7 days).

1. CPU Utilization (45% weight)
2. Memory Utilization (45% weight)
3. Disk Input/Output (10% weight)
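For reference, here is a minimal sketch of how the weighted per-server KPI could be computed. Only the three weights come from the list above; the function names, the dictionary keys, and the sample reading are hypothetical, and averaging over a window is just one possible way to handle the time ranges.

```python
# Weights from the list above; all names and sample values are illustrative.
WEIGHTS = {"cpu": 0.45, "memory": 0.45, "disk_io": 0.10}


def server_kpi(metrics):
    """Weighted average of one server's metrics, each a percent in 0-100."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


def window_kpi(samples):
    """Mean per-sample KPI over a time window (a list of 5-minute samples)."""
    return sum(server_kpi(s) for s in samples) / len(samples)


# Hypothetical reading for one server at one 5-minute interval.
sample = {"cpu": 100.0, "memory": 100.0, "disk_io": 20.0}
print(server_kpi(sample))
print(window_kpi([sample, {"cpu": 10.0, "memory": 20.0, "disk_io": 5.0}]))
```

Changing the length of the list passed to window_kpi (12 samples for 1 hour, 2016 for 7 days at 5-minute intervals) is what would make the time range flexible.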

I've tried straight averages, but then, for example, a single server at 100% CPU and memory quickly gets drowned out if the other 29 servers are performing optimally.

I've also tried using the range, but found that over larger time samples the range becomes too significant and can indicate a larger problem than really exists.
 

noetsi

Fortran must die
#2
I am not entirely sure what you are trying to do here. Do you simply want to know when one of the three metrics is far above the median/mean level for that metric? A Tukey boxplot works for that and has the advantage of being robust, simple to calculate, and widely used. It does have some problems with skewed data (there is a modification for that, but I lack the expertise to tell you whether it actually works). :p
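A rough sketch of the boxplot rule, assuming the usual 1.5 x IQR upper fence and Python's "inclusive" quartile method (other quartile conventions give slightly different fences). The server names and readings are made up for illustration.

```python
import statistics


def tukey_upper_fence(values, k=1.5):
    """Upper Tukey fence: Q3 + k * IQR, with k = 1.5 as the usual default."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def high_outliers(readings, k=1.5):
    """Return the servers whose reading lies above the upper fence."""
    fence = tukey_upper_fence(list(readings.values()), k)
    return {name: value for name, value in readings.items() if value > fence}


# Hypothetical CPU readings (percent) for six servers at one interval.
cpu = {"srv-a": 7.0, "srv-b": 5.0, "srv-c": 9.0,
       "srv-d": 6.0, "srv-e": 95.0, "srv-f": 8.0}
print(high_outliers(cpu))
```

Only the upper fence is checked here because low utilization is not a problem in this use case.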

This does not increase the "statistical significance" of an outlier. I am not sure how you would do that or why you would want to.
 
#3
I will show an example of what I'm trying to accomplish, with sample values. I have 10 systems with the following values:

System 1: 10% CPU
System 2: 5% CPU
System 3: 6% CPU
System 4: 8% CPU
System 5: 4% CPU
System 6: 6% CPU
System 7: 98% CPU
System 8: 6% CPU
System 9: 5% CPU
System 10: 4% CPU

If you take the average it's 15.2%, but the problem is that 98% is significant even if it only occurs on one system. How can I make the outlier (98%) more significant in my final metric?
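To make the numbers concrete, here is a quick check of the ten values above. It is only a sketch that compares the plain mean with the median and the maximum, not a proposed KPI; the variable names are arbitrary.

```python
import statistics

# CPU percentages for Systems 1-10, as listed above.
cpu = [10, 5, 6, 8, 4, 6, 98, 6, 5, 4]

mean = statistics.mean(cpu)      # pulled toward the single busy system
median = statistics.median(cpu)  # typical system, ignores the busy one
worst = max(cpu)                 # the reading that actually needs attention

print(f"mean={mean:.1f}% median={median:.1f}% max={worst}%")
```

The mean sits well above the median because of one system, yet it is still far too low to suggest that anything is at 98%.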
 

rogojel

TS Contributor
#4
hi,
the standard solution would be to use a so-called Xbar-S chart. You would pick groups of servers based on some rational criteria and some time interval (or several combinations of groups and intervals, depending on what you need), calculate the mean and the corresponding standard deviation over each group and interval, and present the pairs on a chart.

So you can see the average behavior for each group and also get some warning if some servers deviate, but you would not get spurious alarms for short peaks. Of course, this would have to be fine-tuned as to the periods and group sizes.
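A minimal sketch of the (mean, standard deviation) pairs that would go on such a chart. The grouping into one hypothetical "web" group at two intervals and the readings themselves are assumptions for illustration; control limits, which would normally be derived from the chart's own history, are not computed here.

```python
import statistics


def xbar_s_points(groups):
    """Map each group/interval label to its (mean, sample standard deviation)."""
    return {label: (statistics.mean(values), statistics.stdev(values))
            for label, values in groups.items()}


# Hypothetical CPU readings (percent) for one group of five servers
# at two consecutive intervals; the second contains one busy server.
groups = {
    "web 10:00": [10, 5, 6, 8, 4],
    "web 10:05": [6, 98, 6, 5, 4],
}

points = xbar_s_points(groups)
for label, (xbar, s) in points.items():
    print(f"{label}: mean={xbar:.1f} s={s:.1f}")
```

In the second interval the mean rises only moderately, but the standard deviation jumps, which is the signal that one server in the group is deviating from the rest.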

regards