Thread: Data distribution

  1. #1
    Points: 32, Level: 1
    Level completed: 64%, Points required for next Level: 18

    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Data distribution




    I have a single column of data: the measured weights of 100 individuals. How would I test what sort of distribution the weights follow, and how well they fit that distribution? Would the test results depend on how the data is grouped?

    Also, what other sorts of analysis would be important to carry out on this single column of data?

    Thank You

  2. #2
    Omega Contributor
    Points: 38,253, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    6,989
    Thanks
    397
    Thanked 1,185 Times in 1,146 Posts

    Re: Data distribution

    You should look at basic descriptive statistics (mean, std, var, median, Q1, Q3, min, max). You should also look at the data "moments". You can examine the shape of the distribution with a histogram. I believe the program R has a package that examines data distributions. Weights are usually right-skewed and are often brought closer to normal with a log transformation.
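    A rough sketch of those summaries in Python, using only the standard library. The `weights` list is simulated stand-in data (an assumption, since the real column is not shown in the thread), and the helper names `skewness`, `excess_kurtosis` and `describe` are made up for illustration. Skewness and excess kurtosis here are the third and fourth standardized moments.

```python
import math
import random
import statistics

# Hypothetical stand-in data: 100 right-skewed "weights".
# Replace this list with the real column of measurements.
random.seed(1)
weights = [random.lognormvariate(3.5, 0.5) for _ in range(100)]


def skewness(values):
    """Moment-based sample skewness (third standardized moment)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based sample kurtosis minus 3 (0 for a normal distribution)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def describe(values):
    """Return the basic descriptive statistics as a dictionary."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "var": statistics.variance(values),
        "median": median,
        "Q1": q1,
        "Q3": q3,
        "min": min(values),
        "max": max(values),
        "skewness": skewness(values),
        "excess_kurtosis": excess_kurtosis(values),
    }


raw_summary = describe(weights)
log_summary = describe([math.log(w) for w in weights])

# Print the raw and log-transformed summaries one after the other.
for name, summary in (("raw", raw_summary), ("log", log_summary)):
    print(name)
    for key, value in summary.items():
        print(f"  {key:16s} {value:10.3f}")
```

    If the log column shows a skewness much closer to 0 than the raw column, that is consistent with the log transformation having made the data more symmetric.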
    Stop cowardice, ban guns!

  3. #3

    Re: Data distribution

    I plotted the histogram. What does this tell me about the distribution, other than that it is positively skewed? I've also log-transformed the data and plotted the resulting histogram. Both are attached.
    Attached Images  

  4. #4
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: Data distribution

    hi,
    for the histogram of untransformed weights you might want to choose smaller bins. I would guess that you will have fewer really small values and somewhat more that are small but not that small (I suspect this because of the shape of the log-transformed data). That would hint at a skewed distribution like the lognormal. Does your system have something like an "identification of the distribution" option?

    Regards

  5. #5

    Re: Data distribution

    Hi,
    I have grouped the data into smaller bins. Is there anything that can tell me for sure what the distribution is, rather than just looking at the graph? I have tried the 'Individual Distribution Identification' option in Minitab, but none of the distributions returns a significant result. Is there a way I could try this test in R that you know of?
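    One point worth keeping in mind: in a goodness-of-fit test the null hypothesis is that the data do follow the candidate distribution, so a non-significant result means that candidate was not rejected, not that it failed. As a minimal sketch (in Python rather than R, standard library only), the code below compares two candidates by their Kolmogorov-Smirnov distance, the largest gap between the empirical CDF and the fitted CDF. The `weights` list is simulated stand-in data and `ks_distance` is a hypothetical helper. No p-value is computed, because the usual KS tables do not apply when the parameters are estimated from the same data.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

# Hypothetical stand-in data; replace with the real column of weights.
random.seed(2)
weights = [random.lognormvariate(3.5, 0.5) for _ in range(100)]


def ks_distance(values, cdf):
    """Largest gap between the empirical CDF of `values` and a fitted CDF."""
    ordered = sorted(values)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


# Candidate 1: normal, parameters estimated from the raw data.
normal = NormalDist(fmean(weights), stdev(weights))

# Candidate 2: lognormal, i.e. a normal fitted to the log-weights.
logs = [math.log(w) for w in weights]
log_normal = NormalDist(fmean(logs), stdev(logs))

distances = {
    "normal": ks_distance(weights, normal.cdf),
    "lognormal": ks_distance(weights, lambda x: log_normal.cdf(math.log(x))),
}

# Smaller distance means the fitted curve tracks the data more closely.
for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name:10s} KS distance = {d:.4f}")
```

    Because the distance is computed from the individual values, it does not depend on how the data are binned for a histogram.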

    Thanks
    Attached Images  

  6. #6
    hlsmith

    Re: Data distribution

    Why do you want to know the distribution? Most tests place their distributional assumptions on the error terms rather than on the raw variable.

    What is being weighed? Is there a mechanism that puts the weights right around 25? What is the data-generating mechanism, given that the weights seem to be bounded at a value near 10?

  7. #7

    Re: Data distribution

    The weights are just from a random survey, so there is no reasoning behind the weights.

    I have just this single column of results and was looking at various things I could investigate about the data. Besides descriptive statistics, is there anything else I should do?

    Thank you

  8. #8
    hlsmith

    Re: Data distribution

    Is this for a class or work? A boxplot would also help convey the skewness.
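    For reference, the numbers behind a boxplot can be worked out without any plotting library. This is a sketch under assumptions: `weights` is simulated stand-in data, `boxplot_stats` is a hypothetical helper, and the whiskers use the common 1.5 * IQR fence convention (plotting programs may use slightly different quartile rules).

```python
import random
import statistics

# Hypothetical stand-in data; replace with the real column of weights.
random.seed(3)
weights = [random.lognormvariate(3.5, 0.5) for _ in range(100)]


def boxplot_stats(values):
    """Quartiles, whisker ends and points beyond the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


box = boxplot_stats(weights)
print(f"lower half of box: {box['median'] - box['q1']:.2f}")
print(f"upper half of box: {box['q3'] - box['median']:.2f}")
print(f"lower whisker:     {box['q1'] - box['whisker_low']:.2f}")
print(f"upper whisker:     {box['whisker_high'] - box['q3']:.2f}")
print(f"points beyond the fences: {[round(v, 1) for v in box['outliers']]}")
```

    Right skew typically shows up as an upper box half and upper whisker that are longer than the lower ones, often with flagged points only on the high side.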

  9. #9
    rogojel

    Re: Data distribution

    Originally posted by 101tai101:
    Hi,
    I have grouped the data into smaller bins.

    hi,
    the new graph looks as if the bins got larger: originally a bin width of 20, now 30. That does not affect the distribution identification, but it would be interesting to see what the histogram looks like with bins of width 10 or 5.
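    To see how much the picture changes with the bin width, a quick text histogram is enough. The sketch below assumes simulated stand-in data in `weights` and a made-up helper `histogram`; bins start at 0 and are closed on the left.

```python
import math
import random

# Hypothetical stand-in data; replace with the real column of weights.
random.seed(4)
weights = [random.lognormvariate(3.5, 0.5) for _ in range(100)]


def histogram(values, width, start=0.0):
    """Count values in bins [start + k*width, start + (k+1)*width)."""
    counts = {}
    for v in values:
        k = math.floor((v - start) / width)
        counts[k] = counts.get(k, 0) + 1
    first, last = min(counts), max(counts)
    # Include empty bins so gaps in the data remain visible.
    return [(start + k * width, counts.get(k, 0)) for k in range(first, last + 1)]


# Same data, four bin widths: only the display changes, not the values.
for width in (30, 20, 10, 5):
    print(f"bin width {width}")
    for left, count in histogram(weights, width):
        print(f"  {left:6.0f} to {left + width:6.0f} | {'#' * count}")
```

    Wide bins can hide the thin left tail of a right-skewed sample, while narrow bins show it at the cost of a noisier outline.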

    regards

  10. #10
    hlsmith

    Re: Data distribution

    I would disagree with "The weights are just from a random survey so there is no reasoning behind the weights": there is a reasoning behind them. Your "random" sample is a cross-sectional realization of a super-population, and some physical limitation must explain why the values have a particular underlying distribution. If they are weights of living organisms, then age, diet, living environment, etc. all shape the underlying distribution. That is why certain phenomena can be modeled and predicted: there is an underlying data-generating function. I can predict a person's weight given age, sex, ancestry, etc., since it is a mix of nurture and nature plus processes. When you see a threshold or bound, you should always wonder why. Something dictates it.

  11. #11
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Data distribution

    I think using QQ plots with different theoretical distributions is the best way to go, based on the literature I have seen.
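    A Q-Q plot pairs the sorted data with the quantiles a candidate distribution would predict; points close to a straight line suggest a good fit. The sketch below builds those pairs for two candidates and summarizes straightness by the correlation of the points. Assumptions: `weights` is simulated stand-in data, the helper names are invented, and the plotting positions (i - 0.5)/n are one common choice among several.

```python
import math
import random
from statistics import NormalDist, fmean

# Hypothetical stand-in data; replace with the real column of weights.
random.seed(5)
weights = [random.lognormvariate(3.5, 0.5) for _ in range(100)]


def normal_qq_points(values):
    """Pair each sorted value with the standard normal quantile at (i - 0.5)/n."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # enumerate starts at 0, so (i + 0.5)/n matches (rank - 0.5)/n.
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


def correlation(points):
    """Pearson correlation of the (theoretical, observed) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Normal Q-Q plot of the raw weights.
r_normal = correlation(normal_qq_points(weights))
# A lognormal Q-Q plot is a normal Q-Q plot of the log-weights.
r_lognormal = correlation(normal_qq_points([math.log(w) for w in weights]))

print(f"normal Q-Q correlation:    {r_normal:.4f}")
print(f"lognormal Q-Q correlation: {r_lognormal:.4f}")
```

    The pairs returned by `normal_qq_points` can be passed to any plotting tool; the correlation is only a one-number summary and does not replace looking at where the points bend away from the line.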
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  12. The Following User Says Thank You to noetsi For This Useful Post:

    hlsmith (08-08-2017)

  13. #12
    rogojel

    Re: Data distribution


    hi,
    when running the distribution identification in Minitab we get all the Q-Q plots as well. The p-values in the table shown above are, at least for me, a better indication of the fit.
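    For anyone without Minitab, a p-value for "is this sample compatible with a normal distribution whose mean and sd are estimated from the data" can be approximated by simulation. This is not Minitab's procedure; it is a Lilliefors-style sketch built on the Kolmogorov-Smirnov distance, with simulated stand-in data in `weights` and hypothetical helper names. Testing the log-weights for normality corresponds to testing the raw weights for lognormality.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_distance_to_fitted_normal(values):
    """KS distance between the data and a normal with estimated mean and sd."""
    fitted = NormalDist(fmean(values), stdev(values))
    ordered = sorted(values)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        f = fitted.cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


def simulated_p_value(values, n_sim=500, seed=0):
    """Share of simulated normal samples whose KS distance is at least as
    large as the observed one; parameters are re-estimated for each sample."""
    rng = random.Random(seed)
    observed = ks_distance_to_fitted_normal(values)
    n = len(values)
    at_least = 0
    for _ in range(n_sim):
        # The statistic is location-scale invariant, so N(0, 1) suffices.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if ks_distance_to_fitted_normal(sample) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (n_sim + 1)


# Hypothetical stand-in data; replace with the real column of weights.
random.seed(6)
weights = [random.lognormvariate(3.5, 0.5) for _ in range(100)]

p_raw = simulated_p_value(weights)
p_log = simulated_p_value([math.log(w) for w in weights])
print(f"normal fit to raw weights: p ~ {p_raw:.3f}")
print(f"normal fit to log-weights: p ~ {p_log:.3f}")
```

    A small p-value is evidence against the candidate distribution; a large one only says the data are compatible with it, which is why the p-values and the Q-Q plots are best read together.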
    regards
