# Thread: Struggling with finding a distribution...

1. ## Struggling with finding a distribution...

Hi All,

When I plot a histogram of my data, I get the following (see age.png).

I'm trying to figure out which distribution the data follow.

From visual inspection, I assumed it would be lognormal: most of the data sit toward the left (with a long right tail), and there are no other peaks.

However, in R I tried the following:

```r
# u and s are the mean and (population) variance of log(age + 0.1)
u <- sum(log(age + .1)) / length(age)
s <- sum((log(age + .1) - u)^2) / length(age)
# rlnorm() expects the standard deviation on the log scale, so pass sqrt(s)
my_lnorm <- rlnorm(length(age), meanlog = u, sdlog = sqrt(s))
qqplot(my_lnorm, age)
```

(I add 0.1 because some of the ages are 0, and log(0) returns -Inf.)

See qqplot.png for the result.

So, according to the Q-Q plot, the data clearly do not match a lognormal distribution.

Is there a better way to determine which distribution fits my data?
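One possible refinement of the check above, sketched under assumptions: comparing the data with a single random `rlnorm()` draw adds simulation noise to the Q-Q plot, so one can instead plot the sorted data against theoretical lognormal quantiles from `qlnorm()`. `MASS::fitdistr()` returns the maximum likelihood estimates, which for the lognormal are the mean and the population standard deviation of the logged values (the same quantities as `u` and the square root of `s` in the code above). The simulated `age` vector is a hypothetical stand-in for the real data.

```r
library(MASS)  # fitdistr(); MASS ships with standard R installations

set.seed(1)
# Hypothetical stand-in for the real 'age' vector (may include zeros)
age <- round(rgamma(500, shape = 2, rate = 0.08))

x <- age + 0.1                      # shift so log() is finite, as in the post
fit <- fitdistr(x, "lognormal")     # MLE of meanlog and sdlog
meanlog <- fit$estimate[["meanlog"]]
sdlog   <- fit$estimate[["sdlog"]]  # a standard deviation, not a variance

# Compare sorted data with theoretical lognormal quantiles instead of a
# random rlnorm() sample, so the plot carries no simulation noise
theo <- qlnorm(ppoints(length(x)), meanlog = meanlog, sdlog = sdlog)
qqplot(theo, x, xlab = "Theoretical lognormal quantiles",
       ylab = "Sample quantiles")
abline(0, 1)                        # points close to this line suggest a good fit
```

Systematic curvature away from the line in the upper tail is usually the most informative part of such a plot for right-skewed data like ages.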

2. It looks lognormal to me. To test that, take the logarithm, then do a test of normality for the transformed data.
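The reasoning behind this suggestion: a variable is lognormal exactly when its logarithm is normal, so testing `age` for lognormality amounts to testing `log(age)` for normality. A minimal sketch in base R, using a simulated lognormal sample as a hypothetical stand-in for the data, with the Shapiro-Wilk test (`shapiro.test()`, which accepts between 3 and 5000 values) and a normal Q-Q plot:

```r
set.seed(2)
# Hypothetical data: a genuinely lognormal sample
y <- rlnorm(300, meanlog = 3, sdlog = 0.5)

log_y <- log(y)               # y is lognormal exactly when log(y) is normal
sw <- shapiro.test(log_y)     # Shapiro-Wilk test; needs 3 to 5000 values
print(sw)

qqnorm(log_y)                 # normal Q-Q plot of the transformed data
qqline(log_y)                 # line through the first and third quartiles

# A small p-value is evidence against normality of log(y), i.e. against
# lognormality of y; a large one only means no departure was detected
```

With the thread's data this would be `shapiro.test(log(age + 0.1))`. For large samples, normality tests flag even small departures, so the Q-Q plot is worth reading alongside the p-value.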

3. Originally Posted by squareandrare
> It looks lognormal to me. To test that, take the logarithm, then do a test of normality for the transformed data.

Thanks for the response.

I have to admit I don't understand the rationale behind transforming the data to normal...

But even when I do so, the Q-Q plot doesn't appear to fit the data...

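I tried the following in R:

```r
# Mean and (population) variance of the log-transformed ages
u <- sum(log(age + .1)) / length(age)
s <- sum((log(age + .1) - u)^2) / length(age)
# Standardize with the standard deviation sqrt(s), not the variance s
age_trans <- (log(age + .1) - u) / sqrt(s)
qqnorm(age_trans)   # normal Q-Q plot of the standardized log ages
qqline(age_trans)   # reference line through the first and third quartiles
```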

And I get the results in "age_transformed.png".

It still doesn't seem to fit the distribution all that well... What should I interpret from this?

4. Originally Posted by nami1234
> What should I interpret from this?

That it isn't lognormal.

First, I would try the exponential distribution. Then, if that doesn't look good, maybe try the Weibull. You could also try a Box-Cox transformation. There should be some documentation online on how to calculate the maximum likelihood estimates of the parameters of the exponential and Weibull distributions and of the Box-Cox transformation.

The reality is that your data might not fit any well-known distributions. The distribution is what it is.
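A minimal sketch of these maximum likelihood fits, assuming the MASS package (shipped with standard R installations): `fitdistr()` fits each candidate distribution, `AIC()` compares them on the same data, and `boxcox()` profiles the log-likelihood over the Box-Cox parameter lambda. The simulated `age` vector is hypothetical, and the 0.1 shift follows the original post because the Weibull, lognormal and Box-Cox fits require strictly positive values.

```r
library(MASS)  # fitdistr() and boxcox(); MASS ships with standard R installations

set.seed(3)
# Hypothetical stand-in for the real 'age' vector (may include zeros)
age <- round(rweibull(400, shape = 1.4, scale = 30))
# Shift as in the original post: Weibull, lognormal and Box-Cox need x > 0
x <- age + 0.1

# Maximum likelihood fits of the candidate distributions
fits <- list(
  exponential = fitdistr(x, "exponential"),
  weibull     = fitdistr(x, "weibull"),
  lognormal   = fitdistr(x, "lognormal")
)
aic <- sapply(fits, AIC)   # all fitted to the same x, so AICs are comparable
print(aic)                 # smaller is better

# Box-Cox: profile log-likelihood over lambda for an intercept-only model
bc <- boxcox(lm(x ~ 1, y = TRUE), lambda = seq(-2, 2, by = 0.05),
             plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Apply the chosen transformation (log when lambda is 0) and recheck normality
x_bc <- if (abs(lambda_hat) < 1e-8) log(x) else (x^lambda_hat - 1) / lambda_hat
qqnorm(x_bc)
qqline(x_bc)
```

A lower AIC only says which candidate fits best among those tried; it does not show that any of them fits well, so a Q-Q plot against the winning fit is still worthwhile.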

5. Originally Posted by nami1234
> ...It still doesn't seem to fit the distribution all that well...

You might want to consider fitting a generalized lambda distribution (GLD) to your data. See, for example, the links below (there are a number of others):

http://www.jstatsoft.org/v21/i09/paper

http://www.algorithmics.com/EN/media...3-3_lambda.pdf
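As a minimal, self-contained sketch (not the estimation methods described in the linked papers), the FKML parameterization of the GLD can be fitted in base R by least squares between the sorted data and the GLD quantile function at plotting positions, a simple quantile-matching approach. The helper names `qgld_fkml` and `fit_gld_ls`, the starting values, and the simulated `age` vector are all hypothetical.

```r
# FKML parameterization of the generalized lambda distribution (GLD):
# Q(u) = l1 + ((u^l3 - 1)/l3 - ((1 - u)^l4 - 1)/l4) / l2, with l2 > 0
qgld_fkml <- function(u, l1, l2, l3, l4) {
  # (v^l - 1)/l, replaced by its limit log(v) when l is (numerically) 0
  bc_term <- function(v, l) if (abs(l) < 1e-8) log(v) else (v^l - 1) / l
  l1 + (bc_term(u, l3) - bc_term(1 - u, l4)) / l2
}

# Hypothetical helper: least squares between the sorted data and the GLD
# quantiles at plotting positions (quantile matching, not the papers' method)
fit_gld_ls <- function(x) {
  xs <- sort(x)
  u <- ppoints(length(xs))            # probabilities strictly inside (0, 1)
  sse <- function(p) {
    q <- qgld_fkml(u, p[1], exp(p[2]), p[3], p[4])   # exp() keeps l2 > 0
    sum((xs - q)^2)
  }
  start <- c(median(xs), log(2 / IQR(xs)), 0.1, 0.1)
  opt <- optim(start, sse, control = list(maxit = 5000))   # Nelder-Mead
  list(lambda = c(opt$par[1], exp(opt$par[2]), opt$par[3], opt$par[4]),
       sse = opt$value, start_sse = sse(start),
       convergence = opt$convergence)
}

set.seed(4)
age <- round(rgamma(500, shape = 2, rate = 0.08))   # hypothetical data
gfit <- fit_gld_ls(age)
print(gfit$lambda)

# Visual check: sorted data against the fitted GLD quantiles
lam <- gfit$lambda
qqplot(qgld_fkml(ppoints(length(age)), lam[1], lam[2], lam[3], lam[4]), age,
       xlab = "Fitted GLD quantiles", ylab = "Sample quantiles")
abline(0, 1)
```

Because the GLD has four parameters, it can match a wide range of shapes, so a close fit here says more about flexibility than about the data coming from any particular named family.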
