I have a single column of data: the measured weights of 100 individuals. How would I test what sort of distribution the weights follow, and how well they fit that distribution? Would the test results depend on how the data are grouped?

Also, what other sorts of analysis would be worth carrying out on this single column of data?

Thank You

You should look at basic descriptive statistics (mean, standard deviation, variance, median, Q1, Q3, min, max). You should also look at the moments of the data, such as skewness and kurtosis. You can examine the shape of the distribution with a histogram. I believe R has a package that examines data distributions. Weights are usually right-skewed and are often brought close to normal by a log transformation.
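As a rough illustration of the summary above, here is a minimal Python sketch that computes the descriptive statistics and the moment-based skewness and kurtosis, before and after a log transformation. The data are NOT the poster's: `weights` is a hypothetical, idealized sample built from evenly spaced lognormal quantiles with arbitrary parameters (`MU`, `SIGMA`), and `describe` is just a helper name chosen for this example.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical stand-in for the 100 weights (NOT the poster's data):
# evenly spaced quantiles of a lognormal with arbitrary parameters.
MU, SIGMA, N = 3.4, 0.5, 100
weights = [math.exp(MU + SIGMA * NormalDist().inv_cdf((i + 0.5) / N))
           for i in range(N)]

def describe(x):
    """Return descriptive statistics plus moment-based skewness and excess kurtosis."""
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)                   # sample standard deviation (n - 1)
    q1, _, q3 = statistics.quantiles(x, n=4)   # first and third quartiles
    # Central moments (divide by n) used for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "n": n, "mean": mean, "sd": sd, "var": sd ** 2,
        "min": min(x), "q1": q1, "median": statistics.median(x),
        "q3": q3, "max": max(x),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }

raw = describe(weights)
logged = describe([math.log(v) for v in weights])

for name in ("mean", "median", "sd", "skewness", "excess_kurtosis"):
    print(f"{name:>16}: raw={raw[name]:8.3f}  log={logged[name]:8.3f}")
```

For right-skewed data the mean sits above the median and the skewness is positive. If the log transformation suits the data, the skewness of the logged values should be close to zero.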

I plotted the histogram. What does this tell me about the distribution, other than that it is positively skewed? I've also log-transformed the data and plotted the resulting histogram. Both are attached.

hi,
for the histogram of untransformed weights you might want to choose smaller bins. I would guess that you will have fewer really small values and somewhat more values that are small but not that small (I suspect this from the shape of the log-transformed data). That would hint at a skewed distribution like the lognormal. Does your system have something like an "identification of the distribution" option?

Regards

Hi,
I have grouped the data into smaller bins. Is there anything that can tell me for sure what the distribution is, rather than just looking at the graph? I have tried Minitab's 'Individual Distribution Identification' option, but none of the distributions returns a significant result. Is there a way I could try this test in R that you know of?

Thanks
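One way to put a number on the fit, sketched here in Python rather than R, is the Kolmogorov-Smirnov distance between the empirical CDF and a fitted candidate distribution (base R offers `ks.test` and `shapiro.test` for similar purposes). This is a sketch under stated assumptions: `weights` is a hypothetical, idealized lognormal sample and NOT the survey data, and `ks_statistic` is a helper written for this example. Only the statistic is computed. The standard KS p-value is not valid when the parameters are estimated from the same data, so no p-value is given.

```python
import math
from statistics import NormalDist, fmean, stdev

# Hypothetical stand-in for the 100 weights (NOT the poster's data):
# evenly spaced quantiles of a lognormal with arbitrary parameters.
MU, SIGMA, N = 3.4, 0.5, 100
weights = [math.exp(MU + SIGMA * NormalDist().inv_cdf((i + 0.5) / N))
           for i in range(N)]

def ks_statistic(x, cdf):
    """Largest gap between the empirical CDF of x and the candidate CDF."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = cdf(v)
        # The empirical CDF jumps from i/n to (i+1)/n at v; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Candidate 1: normal fitted to the raw weights.
normal_fit = NormalDist(fmean(weights), stdev(weights))
# Candidate 2: lognormal, i.e. a normal fitted to the log weights.
logs = [math.log(v) for v in weights]
lognormal_fit = NormalDist(fmean(logs), stdev(logs))

d_normal = ks_statistic(weights, normal_fit.cdf)
d_lognormal = ks_statistic(weights, lambda v: lognormal_fit.cdf(math.log(v)))

print(f"KS distance, normal fit:    {d_normal:.4f}")
print(f"KS distance, lognormal fit: {d_lognormal:.4f}")
```

A smaller distance means the candidate tracks the data more closely. Because the statistic uses the individual values, it does not depend on how the data are binned, unlike a chi-square test on grouped counts.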

Why do you want to know the distribution? Most tests place their distributional assumptions on the error terms.

What is being weighed? Is there a mechanism that puts the weights right around 25? What is the data-generating mechanism, given that the weights are bounded at a value near 10?

The weights are just from a random survey so there is no reasoning behind the weights.

I have just this single column of results and was looking at various things I could investigate about the data. Besides descriptive statistics, is there anything else I should do?

Thank you

Is this for a class or work? A boxplot would also help convey the skewness.

Originally Posted by 101tai101
Hi,
I have grouped the data into smaller bins.
hi,
the new graph looks as if the bins are larger: originally a bin width of 20, now 30. This does not affect the distribution identification, but it would be interesting to see what the histogram looks like with bins of width 10 or 5.

regards
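To see how much the bin width changes the picture, here is a small Python sketch that prints text histograms at widths 30, 20, 10 and 5. It is only an illustration: `weights` is a hypothetical, idealized lognormal sample (NOT the data from this thread), and `bin_counts` is a helper name made up for the example.

```python
import math
from collections import Counter
from statistics import NormalDist

# Hypothetical stand-in for the 100 weights (NOT the poster's data):
# evenly spaced quantiles of a lognormal with arbitrary parameters.
MU, SIGMA, N = 3.4, 0.5, 100
weights = [math.exp(MU + SIGMA * NormalDist().inv_cdf((i + 0.5) / N))
           for i in range(N)]

def bin_counts(x, width):
    """Count values in bins [k*width, (k+1)*width), keyed by left edge.

    Empty bins between the smallest and largest occupied bin are kept,
    so gaps in the data stay visible.
    """
    counts = Counter(int(v // width) for v in x)
    lo, hi = min(counts), max(counts)
    return {k * width: counts.get(k, 0) for k in range(lo, hi + 1)}

for width in (30, 20, 10, 5):
    print(f"bin width {width}")
    for left, count in bin_counts(weights, width).items():
        print(f"  {left:>4} to {left + width:<4} {'#' * count}")
```

With wide bins most of the sample lands in one or two bars. The narrower widths show more of the shape near the lower bound and along the right tail.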

I would disagree with "The weights are just from a random survey so there is no reasoning behind the weights": there is a reasoning behind them. Your "random" sample is a cross-sectional realization of a super-population, and some physical limitation must lie behind why the values have a particular underlying distribution. If they are weights of living organisms, then age, diet, living environment, etc. all dictate the underlying distribution. That is why certain phenomena can be modeled and predicted: there is an underlying data-generating function. I can predict a person's weight given age, sex, ancestry, etc., since it is a mix of nature and nurture plus processes. When you see a threshold or bound, you should always wonder why. Something dictates it.

I think using QQ plots with different theoretical distributions is the best way to go, based on the literature I have seen.
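A Q-Q comparison can also be summarized numerically, by the correlation between the sorted data and the theoretical quantiles: the closer to 1, the straighter the Q-Q plot. The Python sketch below compares a normal Q-Q plot of the raw values with a normal Q-Q plot of the logged values (which is the lognormal case). The assumptions are loud ones: `weights` is a hypothetical sample built from exact lognormal quantiles (NOT the poster's data), so the lognormal correlation is essentially 1 by construction. The plotting position `(i + 0.5) / n` is one common convention among several, and `qq_correlation` is a name invented for this example.

```python
import math
from statistics import NormalDist, fmean

# Hypothetical stand-in for the 100 weights (NOT the poster's data):
# evenly spaced quantiles of a lognormal with arbitrary parameters.
MU, SIGMA, N = 3.4, 0.5, 100
weights = [math.exp(MU + SIGMA * NormalDist().inv_cdf((i + 0.5) / N))
           for i in range(N)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def qq_correlation(x):
    """Correlation between sorted x and standard-normal quantiles.

    These are the points of a normal Q-Q plot; a straight line gives 1.
    """
    n = len(x)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return pearson(theoretical, sorted(x))

r_normal = qq_correlation(weights)                             # raw values vs normal
r_lognormal = qq_correlation([math.log(v) for v in weights])   # log values vs normal

print(f"Q-Q correlation, normal:    {r_normal:.4f}")
print(f"Q-Q correlation, lognormal: {r_lognormal:.4f}")
```

With real data neither value will be exactly 1, and the correlation is only a summary. Plotting the points themselves shows where the fit breaks down, for example in the right tail.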

hi,
when running the distribution identification in Minitab we get all the Q-Q plots as well. The p-values in the table shown above are, at least for me, a better indication of the fit.
regards

