1. ## Distribution of Data

Hello forum members,

I wonder if someone can kindly help. I have a dataset which I've uploaded and I'm trying to work out a sensible distribution. It represents the number of throws a darts player needs before he can aim for a double. The minimum possible is 8 and the maximum possible is theoretically infinite although good players would very rarely go beyond 30 or so.

I thought a lognormal distribution might fit best but would be very grateful for a second opinion. You will see that the data peaks on certain numbers of darts, presumably because certain scores (eg 180) are more common than others due to the fact that darts players have particular habits and scoring is not random.

Any thoughts/advice would be most welcome and appreciated.

2. ## Re: Distribution of Data

What is your ultimate goal? Why are you trying to fit a distribution to this?

3. ## Re: Distribution of Data

Can you tell us what dart game you are referencing and the general rules?

Not big on opening files. Can you post a histogram with an overlaid kernel density?

4. ## Re: Distribution of Data

Hi,

Thanks for the responses. OK maybe I should take a step back and give a bit more context.

My raw data is darts scores over the first 9 darts (or 3 visits). The maximum possible score is 501. The minimum score in the data is 75. The mean is 298.

My ultimate goal is to be able to say, given a player has a mean of x over the first 9 darts, what are the probabilities of him or her getting each score over the first 9 darts.

From there I hope to be able to calculate the probabilities for the numbers of darts a player would need to get within a double (ie scoring at least 461).

I'm now going to try to attach a couple of histograms of the raw data. Bear with me as I'm not a statistician and I also don't know how to embed images so a couple of hurdles to tackle!

5. ## Re: Distribution of Data

Hopefully this works. Simple histogram of raw scores:

6. ## Re: Distribution of Data

And here's the data grouped into bands of 10:

Hope this helps? Apologies if it's too basic, I'm working in Excel and not a statistician so am a little limited in what I can do. Happy to purchase software though if there's anything that people recommend and if anyone has any videos/articles/books they think would assist me in my task I'd also appreciate that.

7. ## Re: Distribution of Data

I'm doubting there is a simple parametric distribution that would meet your needs. You could just use your empirical distribution to make those calculations though I would think.
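As a rough illustration of what using the empirical distribution could look like, here is a minimal Python sketch. The scores in it are made up for the example (they are not the uploaded data), and the helper names `empirical_pmf` and `prob_at_least` are hypothetical. The idea is simply to treat the observed relative frequency of each score as its probability.

```python
from collections import Counter


def empirical_pmf(scores):
    """Return {score: relative frequency} for the observed scores."""
    counts = Counter(scores)
    n = len(scores)
    return {s: c / n for s, c in sorted(counts.items())}


def prob_at_least(pmf, threshold):
    """Empirical P(score >= threshold)."""
    return sum(p for s, p in pmf.items() if s >= threshold)


# Hypothetical nine-dart scores, for illustration only.
scores = [300, 260, 300, 341, 280, 300, 260, 401, 220, 300]

pmf = empirical_pmf(scores)
print(pmf[300])                  # share of legs scoring exactly 300
print(prob_at_least(pmf, 300))   # share of legs scoring 300 or more
```

With real data the same two functions would be applied to the full list of scores, or to the scores of one group of players at a time.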

8. ## Re: Distribution of Data

Thanks Dason.

Silly question perhaps but how would one go about that? Happy to read up on it/watch videos etc if you could point me in the right direction or give me something to start with?

9. ## Re: Distribution of Data

Thanks for the description, that helped.

General question: where is your data coming from? Also, it is assumed that a person only contributes one set of scores to the dataset, so that the scores are independent and not correlated within a person. Is this the case for your data?

10. ## Re: Distribution of Data

Hi,

Yes, the data is biased in the sense that it comes from multiple players, but some players feature more than others and obviously some are better than others, so it's not really uniform.

I have quite a lot of data so there would be scope to use a sub-sample if that would be advisable.

11. ## Re: Distribution of Data

Originally Posted by jazzfish
The minimum possible is 8 and the maximum possible is theoretically infinite although good players would very rarely go beyond 30 or so.
In the first post and the attached data the data seems to vary between 8 and 30.
But in the later shown histogram the data seems to be around 200 - 300.

Which one is correct? (And also which sheet is correct in the attached file?)

If you take your data and do (data - 8) so that the data can be 0 or larger, then maybe a Poisson model or a negative binomial distribution could be useful.
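A minimal sketch of that idea in Python, using made-up dart counts rather than the attached data. It shifts the counts by 8, uses the sample mean as the Poisson parameter, and compares the sample variance with the mean. A Poisson has variance equal to its mean, so a variance clearly larger than the mean would point towards the negative binomial instead. The function names here are hypothetical.

```python
import math


def shifted_moments(darts, shift=8):
    """Return (mean, sample variance) of y = darts - shift."""
    y = [d - shift for d in darts]
    if min(y) < 0:
        raise ValueError("a dart count is below the assumed minimum")
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return mean, var


def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


# Hypothetical numbers of darts needed, for illustration only.
darts = [12, 15, 14, 18, 13, 16, 15, 20, 14, 17]

mean, var = shifted_moments(darts)
print(mean, var)

# Under the fitted Poisson, P(darts = 15) = P(Y = 15 - 8).
print(poisson_pmf(15 - 8, mean))

# Variance well above the mean suggests a negative binomial instead.
print("overdispersed" if var > mean else "not overdispersed")
```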

12. ## Re: Distribution of Data

Yeah, I might go with Dason on this one. Though I wonder if this problem has already been solved somewhere given the longevity of darts.

My slight issue is the data generating process. I would imagine if people are just trying to get the biggest score, there are certain numbers that are targeted, which have other numbers right next to them. So I am guessing those that miss 20 get 1, etc. I also would imagine that left-handed versus right-handed throwers have different strategies given English or variability tendencies. I know that if I miss it is more likely to float a certain direction. Also, you have the issue of certain scores being more probable. If you have a whole lot of data, you could just parcel out each person's data to themselves. So my prior scores function to predict my future scores!

13. ## Re: Distribution of Data

Hi GretaGarbo,

The original file had the raw scores over the first darts converted to estimate the number of darts required to get to within range of a double (ie the number of darts required to score 461 or more) rounded up to the nearest integer. The second file stripped it back to the raw scores as I was worried that the rounding might distort things. However, using the number of darts rather than raw scores does make the data look a bit more normal. Please see below for the distribution of darts rather than scores. Perhaps this is a better way to go after all?

14. ## Re: Distribution of Data

Hismith

I did think about your idea but for some players data will be very limited.

I wonder if a hybrid approach would work where players of a certain standard are grouped. If I did that, how should I go about adjusting for players within each group (i.e. if they are slightly better or worse than the group average)?

I guess what I'm saying is, when you create your own distribution, how do you calculate the distribution of players that do not adhere to the mean of that group?

Any thoughts much appreciated as always.

15. ## Re: Distribution of Data

If you take your data and do (data - 8) so that the data can be 0 or larger, then maybe a Poisson model could be useful.

y = data - 8

Let's assume that you have players in four levels, x = 1, 2, 3 and 4.

Then you can do a Poisson regression with y as the dependent variable and x as the independent variable. Each x will give you a new mu (expected value) and therefore also the distribution of y at that skill level x.
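To make this concrete, here is a small self-contained Python sketch of such a Poisson regression. It assumes the usual log link, log(mu) = b0 + b1 * x, treats x as a numeric level, and fits the two coefficients by Newton-Raphson. The data are invented for the example and `fit_poisson_regression` is a hypothetical helper; in practice a statistics package would do the fitting.

```python
import math


def fit_poisson_regression(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 information matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: coefficients += inverse(information) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: x is the skill level, y = darts - 8.
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [14, 12, 13, 10, 11, 9, 8, 7, 8, 5, 6, 6]

b0, b1 = fit_poisson_regression(x, y)
for level in (1, 2, 3, 4):
    mu = math.exp(b0 + b1 * level)
    print(level, mu)  # fitted expected value of y at this skill level
```

Each fitted mu then defines a Poisson distribution for y at that level, so the probability of needing 8 + k darts is the Poisson probability of k with that mu.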
