I am trying to fit a distribution to some data that consist of measurements of dollar amounts. More than 99% of the measurements fall between $0 and $300,000, although some exceed this range. The summary stats for the data look like this:
Summary Stats:
Length: 32015
Missing Count: 0
Mean: 18002.581787
Minimum: 0.000000
1st Quartile: 137.880000
Median: 3146.500000
3rd Quartile: 14274.605000
Maximum: 6331830.630000
Type: Float64
The 99th percentile is $206,143 and a histogram of the data looks like this:
As you can see, the data are largely bunched up in the $0 - $10,000 range. I tried to fit a truncated normal distribution to the data, which looks like this:
But when I do a quantile-quantile plot to check how well the data fit this distribution, it looks like this:
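For reference, the points behind a Q-Q plot against a truncated normal can be computed by hand, which makes it easier to see where the fit breaks down. The following is a minimal sketch using only the Python standard library. The function names, the toy sample, and the `mu` and `sigma` values are illustrative assumptions, not the actual data or fitted parameters.

```python
import math
from statistics import NormalDist


def truncated_normal_quantile(p, mu, sigma, lower=0.0, upper=math.inf):
    """Quantile of a Normal(mu, sigma) truncated to [lower, upper], for 0 < p < 1."""
    base = NormalDist(mu, sigma)
    cdf_lower = base.cdf(lower)
    cdf_upper = 1.0 if upper == math.inf else base.cdf(upper)
    # Map p into the slice of the untruncated CDF that survives truncation.
    return base.inv_cdf(cdf_lower + p * (cdf_upper - cdf_lower))


def qq_points(data, mu, sigma, lower=0.0, upper=math.inf):
    """Return (theoretical quantile, observed value) pairs for a Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    return [
        (truncated_normal_quantile((i + 0.5) / n, mu, sigma, lower, upper), x)
        for i, x in enumerate(xs)
    ]


# Toy sample (hypothetical values, not the real measurements).
toy_sample = [0.0, 120.0, 950.0, 3100.0, 7800.0, 14500.0, 42000.0, 210000.0]
# Placeholder parameters; a real check would use the fitted values.
points = qq_points(toy_sample, mu=18000.0, sigma=50000.0)
for theoretical, observed in points:
    print(f"{theoretical:12.2f}  {observed:12.2f}")
```

If the distribution fits, the pairs lie close to the line y = x. With strongly right-skewed data, the observed values in the upper tail would be expected to pull well away from that line.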
I'm trying to figure out what kind of distribution to use to represent these data and could use some feedback! I was reading about Gamma and Pareto distributions, but it seems that those won't work because the mode of my data is 0. Any ideas?
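It may be worth testing the mode-at-0 concern before ruling Gamma out. For a shape parameter below 1, the Gamma density is highest near zero and decreases from there. The sketch below evaluates the density on a coarse grid to illustrate this. The shape and scale values are arbitrary placeholders, not parameters fitted to the data, and `is_decreasing` is a hypothetical helper.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x > 0."""
    return (
        x ** (shape - 1.0)
        * math.exp(-x / scale)
        / (math.gamma(shape) * scale ** shape)
    )


def is_decreasing(shape, scale, grid):
    """True if the density strictly decreases along the given grid of x values."""
    values = [gamma_pdf(x, shape, scale) for x in grid]
    return all(a > b for a, b in zip(values, values[1:]))


# Dollar amounts to evaluate at (illustrative grid, not the data).
grid = [100.0, 1_000.0, 10_000.0, 100_000.0]

# Placeholder parameters: shape < 1 puts the most density near zero,
# while shape > 1 moves the mode away from zero.
print(is_decreasing(0.5, 36_000.0, grid))
print(is_decreasing(2.0, 36_000.0, grid))
```

One caveat: for shape below 1 the density is unbounded as x approaches 0, so any measurements that are exactly 0 (the minimum here is 0.0) would likely need separate handling.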