# Which probability distribution do I use for this, and how?

#### danlightbulb

##### New Member
Hi all,

I'm not a statistician, although I have some reasonable education in mathematics.

This isn't a homework question (I'm 37 lol), it's just something I have been working on as a personal project lately.

The following graph represents the % payout of an online video slot game. Each data point covers a group of 25 spins on this slot, for which I recorded the return to player (RTP). This has then been charted, grouped into 10% bands using a PivotTable. There are 150 data points in this data set. As you can see, it has begun to resemble some kind of probability distribution. I was thinking initially of a Poisson distribution. However, my attempts to map a true Poisson curve to this data have failed.

So I would like to request some help to:
a) find out which type of probability distribution this resembles; and
b) help me estimate the parameters that will enable me to plot the theoretical probability curve onto this experimental data set.

To my mind, the curve I need will not be symmetrical. As you can see from the chart, there is a significant one-sided tail to the distribution. There is no tail on the left-hand side because obviously there is a hard threshold at zero RTP.

Any help appreciated.

Thanks
Dan

#### dthiaw

##### New Member
Can't you use the normal distribution by the central limit theorem, since your sample size is big enough? 150 data points, right?

#### rogojel

##### TS Contributor
hi,
this looks like a good candidate for a logistic or log-logistic distribution. You could run a distribution identification on it to identify the parameters. How to do that will depend on what kind of software you have.

regards
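
For anyone without a dedicated stats package, a rough version of a distribution identification can be sketched in plain Python: fit each candidate by maximum likelihood and compare the log-likelihoods (higher is better). This sketch compares only normal and lognormal, not the logistic family suggested above. The function names are made up for illustration, and the RTP values are placeholders, not the data discussed in this thread.

```python
import math
from statistics import NormalDist, fmean, pstdev

def normal_loglik(values):
    """Log-likelihood of values under a normal fitted by maximum likelihood."""
    dist = NormalDist(fmean(values), pstdev(values))
    return sum(math.log(dist.pdf(v)) for v in values)

def lognormal_loglik(values):
    """Log-likelihood of values under a lognormal fitted by maximum likelihood.

    Assumes every value is > 0; a group with zero RTP would need special handling.
    """
    logs = [math.log(v) for v in values]
    # The density of X is the normal density of ln(X) divided by X,
    # which contributes the extra "- sum(logs)" term.
    return normal_loglik(logs) - sum(logs)

# Placeholder RTP values in percent (hypothetical, for illustration only).
rtp = [35.0, 48.0, 60.0, 72.0, 85.0, 90.0, 110.0, 140.0, 190.0, 260.0]

print("normal    log-likelihood:", round(normal_loglik(rtp), 2))
print("lognormal log-likelihood:", round(lognormal_loglik(rtp), 2))
```

Both candidates have two parameters here, so the log-likelihoods can be compared directly; with candidates of different sizes a penalised measure such as AIC would be the usual choice.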

#### danlightbulb

##### New Member
I don't have any software other than Excel, unfortunately. I'm aware some software is free, like R, but I don't know how to use it.

I just had a quick look at lognormal and that appears to fit the overall shape I'm expecting. Is that what was meant by logistic?

#### katxt

##### Active Member
Perhaps you might want to post the raw data in Excel and we could have a fiddle with it.

#### danlightbulb

##### New Member
That would be cool, thanks. I have tried to attach the file; hopefully it works.

File contains 2 sheets. One for starburst which is a 'low variance' slot using casino terminology. That contains 150 data points. The 2nd sheet is still being compiled but is for a different slot classed as 'medium variance'. It displays (although still in development) a different shape of curve.

The key data is in the column labelled 'RTP'; the rest of the columns are the data collected to calculate what the RTP was for that 25 spin group.

#### katxt

##### Active Member
For the starburst sheet, if you eliminate points 3 and 11, the data is very well fitted by a log normal. For the twin spin data, it is log normal in the middle, not so much in the tails.
The two diagrams are normal probability plots of the logged data.
I think you would be lucky to find a two parameter distribution that does better.
Possibly there is a three parameter distribution which would fit the odd points in the tails.
And, as John von Neumann famously said, "With four parameters I can fit an elephant."
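
For putting a fitted log normal back onto the original 10% bands, here is a minimal sketch in plain Python. It assumes base-10 logs, all RTP values strictly positive, and bands running from 0% to 400%; the function names and the RTP values are hypothetical placeholders, not the attached data. The two parameters are simply the mean and standard deviation of the logged values, and the expected count in a band is the sample size times the probability the fitted curve assigns to that band.

```python
import math
from statistics import NormalDist, mean, stdev

def fit_lognormal(values):
    """Return (mu, sigma): mean and sample SD of log10(values); values must be > 0."""
    logs = [math.log10(v) for v in values]
    return mean(logs), stdev(logs)

def expected_band_counts(mu, sigma, n, band_width=10, upper=400):
    """Expected number of points falling in each band [lo, lo + band_width)."""
    dist = NormalDist(mu, sigma)
    counts = {}
    for lo in range(0, upper, band_width):
        hi = lo + band_width
        # P(lo <= X < hi) = Phi(log10(hi)) - Phi(log10(lo)); the lowest band starts at 0.
        p_lo = dist.cdf(math.log10(lo)) if lo > 0 else 0.0
        p_hi = dist.cdf(math.log10(hi))
        counts[lo] = n * (p_hi - p_lo)
    return counts

# Placeholder RTP values in percent (hypothetical, for illustration only).
rtp = [35.0, 48.0, 60.0, 72.0, 85.0, 90.0, 110.0, 140.0, 190.0, 260.0]

mu, sigma = fit_lognormal(rtp)
bands = expected_band_counts(mu, sigma, n=len(rtp))
for lo, count in bands.items():
    print(f"{lo:3d}-{lo + 10:3d}%: {count:.2f}")
```

The expected counts can be charted as a line over the observed band counts, so the theoretical curve and the sample data share the same 0-400% axis.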

#### danlightbulb

##### New Member
That's really cool, thanks!

Would you mind explaining what the axes represent in those charts please, as they have moved away from my 0-400% RTP. I would want to be able to plot the theoretical curve on the same axis as my sample data, so that it is easily interpreted visually.

#### katxt

##### Active Member
The vertical axis is the LOG() of the RTP, which in Excel is the base-10 log. Log(400) is about 2.6.
The horizontal axis is the standard normal, which goes from -3 to 3 because just about all of the standard normal values are between these two.
For each value in the set you are checking, its quantile (position in the whole set) is calculated and plotted against the corresponding quantile in the normal distribution. If the set is normal (more or less), the quantiles will match and the graph will be straight. Usually, of course, you are just checking a data set for normality, not the log of the set for log normality.
Google "normal probability plot" and look at images. The same idea can be used to check for distributions other than normal.
Raw Excel can't do normal probability plots, but I have a simple Excel sheet which can if you are interested. kat
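
The quantile matching described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the plotting position (i - 0.5)/n is one common convention among several, the function name is made up, and the RTP values are placeholders rather than the data from the thread.

```python
import math
from statistics import NormalDist

def normal_probability_points(values):
    """Return (z, y) pairs: standard normal quantile vs. sorted log10 value."""
    logs = sorted(math.log10(v) for v in values)  # values must all be > 0
    n = len(logs)
    std_normal = NormalDist()  # mean 0, SD 1
    points = []
    for i, y in enumerate(logs, start=1):
        p = (i - 0.5) / n  # fraction of the sample at or below the i-th value
        points.append((std_normal.inv_cdf(p), y))
    return points

# Placeholder RTP values in percent (hypothetical, for illustration only).
rtp = [35.0, 48.0, 60.0, 72.0, 85.0, 90.0, 110.0, 140.0, 190.0, 260.0]

for z, y in normal_probability_points(rtp):
    print(f"z = {z:6.3f}   log10(RTP) = {y:.3f}")
```

Plotting z on the horizontal axis against log10(RTP) on the vertical axis gives the kind of chart described above; points lying close to a straight line suggest the logged data is roughly normal. The same two columns could probably be built in a spreadsheet by ranking the logged values and applying an inverse standard normal function such as Excel's NORM.S.INV to (rank - 0.5)/n.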