Hello,
This is my first thread on the board. I hope I am posting it in the right place.
I was wondering if you could help me with a data analysis problem. I am still learning my way around statistics. Here is a description of the data and of the issue, and I am attaching a couple of descriptive diagrams.
I have information on over 150K incident tickets and I have done the required descriptive statistics. The data distribution is not normal, so I would probably need to use nonparametric statistics. A Poisson distribution might be one solution, but I am having a hard time figuring out how to use it in SAS. I would also like to investigate Kruskal-Wallis and Weibull. Does somebody have SAS code to do a distribution fit to check which distribution will work for my data?
My dependent variable is the duration or resolution time (time between ticket start and resolved times)
The data is not normal and I am seeing many outliers. I have thousands of outliers where it seems like the ticket was never closed, and they are definitely skewing my data. I have read a fair amount about outlier detection and deletion, but I am not sure if I can delete any of them.
I would appreciate any help on how to deal with outliers, especially ones like these that come from help desk tickets.
The mean ticket resolution time (duration) is over 10 days because of a few thousand outliers, some as high as 1000 days, while the median is less than one day (0.90). The standard deviation is 56. The skewness of the resolution time is 7, with a kurtosis of 62. The 95% quantile is 61 days, while the 100% (maximum) quantile is 1073 days. This shows the effect of the outliers.
I have many other ticket categories such as tier, responsible team, the ticket problem etc.
After I complete the statistical analysis, my goal is to model the results for process improvement research.
Thanks in advance for all your help.
Don't worry about distributions yet. You're getting way ahead of yourself.
What is your research question? You haven't actually told us what question/problem you're trying to solve.
Thanks Dason for the quick reply.
Sorry about the oversight. I am using Lean thinking methodologies to reduce help desk ticket resolution time, and my main research question centers around that. That is the problem I am trying to solve: reduce resolution time. I need to figure out, using statistical analysis, where the bottleneck is. I have computed the mean, mode and median of the dependent variable (resolution time) across many independent variables. I have also done a lot of descriptive statistics in order to see trend lines.
hi,
if you want to use Lean in this context, which is very sensible, you will only need some light-weight stats. Your data will be quite skewed, so you had better look at medians instead of means, then look at some grouping variables, aka segmentation factors, to see whether you can find groups that have a systematically longer median resolution time than other groups. (Examples of such factors could be: contractors vs. local guys, different hw (e.g. printers, iphones, laptops), hw vs. sw issues etc.) You don't even need statistical tests here really - if two groups have high median resolution times and are not statistically significantly different from each other it just means you can work on both in any order.
And the best advice I can give: talk to the people involved in the resolution - see what they think.
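The median-by-group comparison described here can be sketched in a few lines. This is only a minimal illustration in Python (the thread itself asks about SAS); the team names and durations are made up, and `median_by_group` is a hypothetical helper, not something from the original data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical tickets: (responsible_team, resolution_days). All values are invented.
tickets = [
    ("network", 0.5), ("network", 0.9), ("network", 120.0),
    ("desktop", 0.3), ("desktop", 0.4), ("desktop", 0.8),
    ("printing", 2.0), ("printing", 3.5), ("printing", 400.0),
]

def median_by_group(rows):
    """Return {group: median duration}, ordered with the slowest group first."""
    groups = defaultdict(list)
    for group, days in rows:
        groups[group].append(days)
    medians = {g: median(v) for g, v in groups.items()}
    return dict(sorted(medians.items(), key=lambda kv: kv[1], reverse=True))

result = median_by_group(tickets)
for team, med in result.items():
    print(f"{team:10s} median = {med:.2f} days")
```

Note that the single 400-day and 120-day tickets barely move the group medians, which is why the median is the more useful summary for data this skewed.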
regards & good luck
hayaag (07-15-2017)
Thanks Rogojel. Greatly appreciated!!
This is great help. Thanks.
A few final questions. How best to check whether two groups are statistically different? Since I am using medians, should I ignore tickets that have long resolution times, i.e. the possible outliers?
Finally, once I am done with the analysis, I am trying to model the results, since I don't have time to do a Lean pilot. Any help regarding these questions is greatly appreciated.
Thanks again.
hi,
you could use a Kruskal-Wallis test if you really care about statistical significance. Long resolution times are not ignored even though the distributions are skewed, but since the test works on ranks their impact is less than if you looked at the mean values.
Could you describe a bit what and why you want to simulate?
regards
Thanks again Rogojel. Kruskal-Wallis is what I was leaning towards, although my advisor mentioned I might want to use a Poisson distribution.
I need to come up with a practical solution to the long resolution times. I am at the stage of using medians and statistical analysis to find the issue, with the groups that you mentioned and others. After I complete this, I need to be able to prove my hypothesis, or the solution that I came up with. Usually in Lean you can do a pilot, but I don't have the time and bandwidth to do a pilot now, so I was wondering if a model or statistical simulation might do the trick. I haven't done all the research yet, so I might be using the wrong words. So now I am at the Analyze phase of the DMAIC framework and I am thinking ahead about what to use when I reach the Improve and Control phases.
Please let me know if this makes sense. Thanks again for all your help.
Regards
hi,
the benefit of the KW is that you need no assumptions about the distribution. Poisson is very probably not the right choice because it will not model the high variance. In my experience with ticket data even better models, like the negative binomial, will fail here. You could search for techniques under the heading "zero inflated regression" but this is probably overkill.
As for the modelling - I would wait until I had a good collection of root causes before trying to model the impact of improvements, though it will probably turn out to be something very simple. And BTW DMAIC as a methodology is quite far from lean thinking - I guess part of your problem is that you are trying to force DMAIC onto a problem where it is not suited to give you a solution.
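The point about variance can be checked with quick arithmetic. A Poisson distribution has variance equal to its mean, so a variance-to-mean ratio far above 1 signals that it will fit poorly. The sketch below plugs in the summary figures quoted earlier in the thread (mean of roughly 10 days, standard deviation of 56); `dispersion_index` is a hypothetical helper name, and since resolution time is continuous rather than a count, the ratio is only a rough indication.

```python
def dispersion_index(mean, std_dev):
    """Variance-to-mean ratio; equals 1 for a Poisson distribution."""
    return std_dev ** 2 / mean

# Figures from the original post: mean "over 10 days" (10 used as an approximation), SD 56.
ratio = dispersion_index(mean=10.0, std_dev=56.0)
print(f"variance / mean = {ratio:.1f}")  # prints "variance / mean = 313.6"
```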
regards
hayaag (07-17-2017)
Yeah, DMAIC is Lean Six Sigma, and I am definitely finding that DMAIC is not suitable for what I am trying to do. After doing some initial research, I definitely agree with you that KW is the way to go. I am almost done collecting a good set of root causes, and then I will investigate "zero inflated regression" and KW more. You definitely know your stuff.
You have been a great help and I greatly appreciate it.
Rogojel, I sent you a private message. Please take a look when you get a chance.
Thanks
Hi,
there is some problem with my messaging - or with the talkstats website. My email is rogojel@gmail.com, just send me a message. I tried to answer the private message but either I am missing something or it is not working.
Regards
Thanks. I just sent an email to your Gmail account.