Monte Carlo Simulation for Predicting Agile Stories Completed

trinker

ggplot2orBust
#1
A team at work saw this post on using Monte Carlo (MC) simulation to forecast the number of stories completed: http://scrumage.com/blog/2015/09/agile-project-forecasting-the-monte-carlo-method/

I have a series of questions, as I know of Monte Carlo simulation but have not used it.

1. Is the basic gist: Get mean and sd from data (can be very small n = 5). Assume the data is a normal distribution and draw n random samples from that distribution. Take the average of these draws and use quantiles to calculate a confidence interval?

2. Does the sampling distribution have to be normal? So for example the outcome for agile stories is number of stories for a sprint. This tells me the distribution is likely Poisson or negative binomial. Should I instead try to draw from one of these distributions (though I don't think I can get all the parameters from a very small data set to plug in to the sampling)?

3. What if I wanted to include months as a variable? Would I instead run 12 MC simulations, one for each month, and compute the forecast that way?

4. Given a small data set (~4 months or 8 sprints [8 observations]) and count outcome is there a better technique or other techniques that could be investigated?

5. It seems that MC is very similar to bootstrapping, except you draw from an assumed distribution rather than resampling from empirical data. Is it true that the techniques are similar except for what is being sampled? If so, it seems bootstrapping would need more data?
 
#2
Monte Carlo simulation (MC) is very similar to bootstrapping. In the bootstrap you simulate from your own sample (with replacement).
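
To make the bootstrap idea concrete for the sprint question, here is a minimal sketch. The eight sprint counts are made-up numbers, and `bootstrap_totals` and the 6-sprint horizon are just names and choices for illustration, not anything from the linked post.

```python
import random

# Hypothetical history: stories completed in each of 8 past sprints (made-up numbers).
past_sprints = [7, 9, 6, 11, 8, 10, 7, 9]

def bootstrap_totals(history, n_future, n_sims=10_000, seed=1):
    """Simulate totals for n_future sprints by resampling past sprints with replacement."""
    rng = random.Random(seed)
    return [sum(rng.choices(history, k=n_future)) for _ in range(n_sims)]

# Each element is one simulated total for the next 6 sprints.
totals = sorted(bootstrap_totals(past_sprints, n_future=6))

# A conservative forecast: the total that about 85% of simulated futures reach or exceed.
conservative = totals[int(0.15 * len(totals))]
print("conservative 6-sprint forecast:", conservative)
```

Note that every simulated sprint can only take a value that was already observed, which is one reason a very small sample limits what the bootstrap can tell you.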

In the usual simulation you simulate from one distribution (it can be any distribution: normal or not, Cauchy or whatever, or for example your table, or a mixture of two distributions so it becomes bimodal) and you have your favorite complicated formula (e.g. the correlation coefficient or your own "quality index") and then you want to evaluate the random properties of the formula.

So you generate some chi-squared numbers and compute the formula. You do that many times and then you can look at anything you are interested in: maybe the mean, the standard deviation, the whole histogram, whether there is a bias, whether the bias decreases as the sample size increases, and so on.
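
As a rough sketch of that recipe: below, the "formula" is the ordinary sample standard deviation, the data are chi-squared with 3 degrees of freedom (drawn via the standard library's gamma generator, since a chi-squared with k degrees of freedom is a gamma with shape k/2 and scale 2), and the question is how the bias changes with sample size. The choice of statistic, the degrees of freedom and the function name `simulate_bias` are all assumptions made for the example.

```python
import math
import random
import statistics

DF = 3                           # degrees of freedom (arbitrary choice for the example)
TRUE_SD = math.sqrt(2 * DF)      # standard deviation of a chi-squared(DF) variable

def simulate_bias(n, n_sims=2_000, seed=2):
    """Average (sample sd - true sd) over many simulated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        # chi-squared(DF) is gamma with shape DF/2 and scale 2
        sample = [rng.gammavariate(DF / 2, 2.0) for _ in range(n)]
        estimates.append(statistics.stdev(sample))
    return statistics.fmean(estimates) - TRUE_SD

bias = {n: simulate_bias(n) for n in (5, 20, 100)}
for n, b in bias.items():
    print(f"n={n:3d}  estimated bias of sample sd: {b:+.3f}")
```

The same loop works for any other formula: swap `statistics.stdev` for your own function and keep the list of estimates if you want the whole histogram rather than just the mean.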

Of course you can have x-variables (like months) but then the generated random variables are often the disturbance term (often normally distributed).
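
A small sketch of what that can look like: the x-variable (month) is held fixed, only the normal disturbance is generated, and the slope is re-estimated by least squares in each run. The intercept, slope, sigma and the two-sprints-per-month layout are invented for illustration. Also note that a normal disturbance produces non-integer values, so this treats the story count as roughly continuous.

```python
import random
import statistics

def simulate_slopes(x, intercept, slope, sigma, n_sims=5_000, seed=3):
    """Generate y = intercept + slope*x + normal disturbance, refit the OLS slope each time."""
    rng = random.Random(seed)
    x_bar = statistics.fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slopes = []
    for _ in range(n_sims):
        # Only the disturbance term is random; x stays fixed across simulations.
        y = [intercept + slope * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = statistics.fmean(y)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slopes.append(sxy / sxx)
    return slopes

# Hypothetical design: 4 months, 2 sprints per month.
months = [1, 1, 2, 2, 3, 3, 4, 4]
slopes = simulate_slopes(months, intercept=6.0, slope=0.5, sigma=2.0)

print("mean estimated slope:", round(statistics.fmean(slopes), 3))
print("sd of estimated slope:", round(statistics.stdev(slopes), 3))
```

With only 8 observations the spread of the estimated slope is large relative to the slope itself, which is worth seeing before trusting a month effect fitted to so little data.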
 

hlsmith

Omega Contributor
#3
Yes, there definitely are similarities to bootstrapping, but instead of creating the distribution out of your own data set resampled, you pre-specify your data generating function or source. Yup, then use percentile intervals for confidence.
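
For the count outcome in the original question, a pre-specified generator could be a Poisson with its rate set to the observed mean, followed by a percentile interval on the simulated totals. This is only a sketch under those assumptions: the sprint history is made up, the Poisson sampler is a hand-rolled version of Knuth's multiplication method (Python's standard library has no Poisson generator), and `poisson_draw` and `percentile` are illustrative helper names.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def percentile(sorted_values, q):
    """Simple empirical percentile: the value at position q in a sorted list."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]

# Hypothetical history: stories completed in each of 8 past sprints (made-up numbers).
past_sprints = [7, 9, 6, 11, 8, 10, 7, 9]
rate = statistics.fmean(past_sprints)   # estimated stories per sprint (8.375 here)

rng = random.Random(4)
n_future = 6
totals = sorted(
    sum(poisson_draw(rng, rate) for _ in range(n_future))
    for _ in range(10_000)
)

low, high = percentile(totals, 0.05), percentile(totals, 0.95)
print(f"90% percentile interval for the next {n_future} sprints: {low} to {high}")
```

The interval treats the estimated rate as if it were known exactly, so with only 8 sprints it probably understates the real uncertainty.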

And as Greta mentioned, your data generating function can be a mixture. The process is obviously also similar to using MC or MCMC in Bayesian statistics.
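
A mixture generator is easy to sketch: pick a component at random, then draw from it. In the example below the two components ("disrupted" and "normal" sprints), their means and the 20% mixing weight are invented purely to show a bimodal result.

```python
import random
import statistics

def draw_mixture(rng, p_disrupted=0.2):
    """Draw from a two-component normal mixture (all parameters are hypothetical)."""
    if rng.random() < p_disrupted:
        return rng.gauss(3.0, 1.0)   # e.g. a sprint hit by holidays or outages
    return rng.gauss(9.0, 1.0)       # a typical sprint

rng = random.Random(5)
draws = [draw_mixture(rng) for _ in range(20_000)]

mean_draw = statistics.fmean(draws)
share_low = sum(d < 6.0 for d in draws) / len(draws)
print("mean of mixture draws:", round(mean_draw, 2))
print("share of draws in the lower mode:", round(share_low, 3))
```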