
Thread: Distribution of Data

  1. #1
    Points: 190, Level: 3
    Level completed: 80%, Points required for next Level: 10

    Posts
    16
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Distribution of Data




    Hello forum members,

    I wonder if someone can kindly help. I've uploaded a dataset and I'm trying to work out a sensible distribution for it. It represents the number of throws a darts player needs before he can aim for a double. The minimum possible is 8 and the maximum possible is theoretically infinite, although good players would very rarely go beyond 30 or so.

    I thought a lognormal distribution might fit best but would be very grateful for a second opinion. You will see that the data peaks at certain numbers of darts, presumably because certain scores (e.g. 180) are more common than others: darts players have particular habits, so scoring is not random.

    Any thoughts/advice would be most welcome and appreciated.

    Thanks in advance.
    Attached Files

  2. #2
    Devorador de queso
    Points: 97,539, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Posting Award, Community Award, Discussion Ender, Frequent Poster, Activity Award
    Dason's Avatar
    Location
    Tampa, FL
    Posts
    12,987
    Thanks
    309
    Thanked 2,640 Times in 2,255 Posts

    Re: Distribution of Data

    What is your ultimate goal? Why are you trying to fit a distribution to this?
    I don't have emotions and sometimes that makes me very sad.

  3. #3
    Omega Contributor
    Points: 39,045, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,070
    Thanks
    402
    Thanked 1,192 Times in 1,153 Posts

    Re: Distribution of Data

    Can you tell us what dart game you are referencing and the general rules?

    I'm not big on opening files; can you post a histogram with an overlaid kernel density?
    Stop cowardice, ban guns!

  4. #4
    jazzfish

    Re: Distribution of Data

    Hi,

    Thanks for the responses. OK maybe I should take a step back and give a bit more context.

    My raw data is darts scores over the first 9 darts (or 3 visits). The maximum possible score is 501. The minimum score in the data is 75. The mean is 298.

    My ultimate goal is to be able to say, given a player has a mean of x over the first 9 darts, what are the probabilities of him or her getting each score over the first 9 darts.

    From there I hope to be able to calculate the probabilities for the number of darts a player would need to get within range of a double (i.e. scoring at least 461).

    I'm now going to try to attach a couple of histograms of the raw data. Bear with me, as I'm not a statistician and I also don't know how to embed images, so there are a couple of hurdles to tackle!

  5. #5
    jazzfish

    Re: Distribution of Data

    Hopefully this works. Simple histogram of raw scores:


    [Attached image: Frequency 9 Dart Scores.jpg]

  6. #6
    jazzfish

    Re: Distribution of Data

    And here's the data grouped into bands of 10:

    [Attached image: Frequency 9 Dart Scores Grouped.jpg]

    I hope this helps. Apologies if it's too basic; I'm working in Excel and I'm not a statistician, so I'm a little limited in what I can do. I'm happy to purchase software if there's anything people recommend, and if anyone knows of videos, articles or books that would help with this task I'd appreciate that too.

    Thanks in advance.

  7. #7
    Dason

    Re: Distribution of Data

    I doubt there is a simple parametric distribution that would meet your needs. I would think you could just use your empirical distribution to make those calculations, though.
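
For anyone unsure what using the empirical distribution means in practice, here is a minimal Python sketch. The scores in `sample_scores` are made up purely for illustration (they are not the poster's data), and the function names are hypothetical. The idea is simply to treat the observed relative frequencies as the probabilities.

```python
from collections import Counter

def empirical_pmf(scores):
    """Return {score: relative frequency} for the observed scores."""
    counts = Counter(scores)
    n = len(scores)
    return {s: c / n for s, c in sorted(counts.items())}

def prob_at_least(pmf, threshold):
    """P(score >= threshold) under the empirical distribution."""
    return sum(p for s, p in pmf.items() if s >= threshold)

# Made-up 9-dart scores, for illustration only.
sample_scores = [300, 280, 300, 340, 260, 300, 360, 280, 320, 300]

pmf = empirical_pmf(sample_scores)
print(pmf[300])                 # share of observations equal to 300
print(prob_at_least(pmf, 320))  # share of observations of 320 or more
```

To condition on a player's standard, one option (again an assumption, not something proposed in the thread) would be to build the table only from legs by players whose 9-dart mean is close to the value of interest.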

  8. #8
    jazzfish

    Re: Distribution of Data

    Thanks Dason.

    Silly question perhaps but how would one go about that? Happy to read up on it/watch videos etc if you could point me in the right direction or give me something to start with?

  9. #9
    hlsmith

    Re: Distribution of Data

    Thanks for the description, that helped.


    General question: where is your data coming from? Also, it is assumed that each person contributes only one set of scores to the dataset, so that the scores are independent and not correlated within a person. Is this the case for your data?

  10. #10
    jazzfish

    Re: Distribution of Data

    Hi,

    Yes, the data is biased in the sense that it comes from multiple players, but some players feature more than others and obviously some are better than others, so it's not really uniform.

    I have quite a lot of data so there would be scope to use a sub-sample if that would be advisable.

  11. #11
    Human
    Points: 12,955, Level: 74
    Level completed: 27%, Points required for next Level: 295
    Awards:
    Master Tagger
    GretaGarbo's Avatar
    Posts
    1,401
    Thanks
    460
    Thanked 474 Times in 414 Posts

    Re: Distribution of Data

    Originally posted by jazzfish:
    The minimum possible is 8 and the maximum possible is theoretically infinite although good players would very rarely go beyond 30 or so.
    In the first post and the attached data the data seems to vary between 8 and 30.
    But in the later shown histogram the data seems to be around 200 - 300.

    Which one is correct? (And also which sheet is correct in the attached file?)

    If you take your data and do (data - 8) so that the data can be 0 or larger, then maybe a Poisson model or a negative binomial distribution could be useful.

  12. #12
    hlsmith

    Re: Distribution of Data

    Yeah, I might go with Dason on this one. Though I wonder if this problem has already been solved somewhere given the longevity of darts.


    My slight issue is the data generating process. I would imagine if people are just trying to get the biggest score, there are certain numbers that are targeted, which have other numbers right next to them. So I am guessing those that miss 20 get 1, etc. I also would imagine that left-handed versus right-handed throwers have different strategies given English or variability tendencies. I know that if I miss it is more likely to float a certain direction. Also, you have the issue of certain scores being more probable. If you have a whole lot of data, you could just parcel out each person's data to themselves. So my prior scores function to predict my future scores!

  13. #13
    jazzfish

    Re: Distribution of Data

    Hi GretaGarbo,

    The original file had the raw scores over the first 9 darts converted to an estimate of the number of darts required to get within range of a double (i.e. the number of darts required to score 461 or more), rounded up to the nearest integer. The second file stripped it back to the raw scores, as I was worried that the rounding might distort things. However, using the number of darts rather than raw scores does make the data look a bit more normal. Please see below for the distribution of darts rather than scores. Perhaps this is a better way to go after all?

    [Attached image: Frequency Darts to Range.jpg]
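
One way to sketch the score-to-darts conversion described in this post, assuming the player keeps scoring at the same per-dart average as over the first nine darts. That constant-rate assumption and the function name are illustrative only, and the poster's own conversion may well differ.

```python
TARGET = 461  # score needed to be within range of a double, per the thread

def darts_to_range(nine_dart_score, target=TARGET):
    """Estimate the darts needed to reach `target`, rounded up.

    Assumes (hypothetically) a constant per-dart average equal to
    nine_dart_score / 9, so darts = ceil(9 * target / nine_dart_score).
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if nine_dart_score <= 0:
        raise ValueError("score must be a positive integer")
    return (9 * target + nine_dart_score - 1) // nine_dart_score

print(darts_to_range(298))  # the mean 9-dart score quoted in the thread
```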

  14. #14
    jazzfish

    Re: Distribution of Data

    hlsmith,

    I did think about your idea, but for some players the data will be very limited.

    I wonder if a hybrid approach would work, where players of a certain standard are grouped. If I did that, how should I go about adjusting for players within each group (i.e. if they are slightly better or worse than the group average)?

    I guess what I'm asking is: when you create your own distribution for a group, how do you calculate the distribution for players who do not sit at the mean of that group?

    Any thoughts much appreciated as always.

  15. #15
    GretaGarbo

    Re: Distribution of Data


    If you take your data and do (data - 8) so that the data can be 0 or larger, then maybe a Poisson model could be useful.

    y = data - 8

    Let's assume that you have players in four levels, x = 1, 2, 3 and 4.

    Then you can do a Poisson regression with y as the dependent variable and x as the independent variable. Each x will give you a new mu (expected value), and therefore also the distribution of y at that skill level x.
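
A minimal sketch of that suggestion in plain Python, under several assumptions: x is treated as a numeric predictor with a log link, the fit uses a hand-rolled Newton-Raphson rather than a statistics package, and the data are made up (here a higher level means more darts needed). The function names are hypothetical.

```python
import math

def fit_poisson_regression(xs, ys, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu          # score (gradient) for b0
            g1 += x * (y - mu)    # score (gradient) for b1
            h00 += mu             # entries of the information matrix
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step via 2x2 inverse
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def darts_pmf(darts, mu, offset=8):
    """P(darts needed = darts) when (darts - offset) ~ Poisson(mu)."""
    k = darts - offset
    if k < 0:
        return 0.0
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Made-up data: skill level x (1 to 4) and y = darts - 8.
xs = [1, 1, 2, 2, 3, 3, 4, 4]
ys = [0, 2, 1, 3, 3, 5, 6, 10]

b0, b1 = fit_poisson_regression(xs, ys)
for level in (1, 2, 3, 4):
    mu = math.exp(b0 + b1 * level)
    # Fitted mean of y and the implied probability of needing 10 darts.
    print(level, round(mu, 3), round(darts_pmf(10, mu), 3))
```

If the levels are better treated as unordered categories, the fitted mu for each level is just the mean of y within that level, so no iterative fitting is needed.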
