I'm working on a recommendation system that recommends football match lottery bets to users. Users can choose a match and bet an arbitrary amount of money, e.g. $2, $4, or even $20,000. I want to test whether two recommendation algorithms differ (or whether one outperforms the other) in terms of the average bet amount per user. The problem is that the distribution of bet amounts looks like a power-law (long-tailed) distribution. Nearly half of the users don't buy at all, which gives $0, and another 30% bet the minimum amount ($2). The larger the bet amount, the fewer users there are.
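A minimal sketch of this comparison, assuming each group's per-user total bets are stored in a NumPy array with $0 for users who did not buy. `summarize_bets`, `welch_t_test` and the toy numbers are hypothetical. The t-test uses SciPy's `ttest_ind` with `equal_var=False` (Welch's variant, chosen here because the two groups need not have equal variances). It is the naive approach, and it is only justified if the central limit theorem applies to the data.

```python
import numpy as np
from scipy import stats


def summarize_bets(bets, min_bet=2.0):
    """Describe per-user bet amounts; users who did not buy are recorded as 0."""
    bets = np.asarray(bets, dtype=float)
    return {
        "mean_per_user": bets.mean(),           # the metric being compared
        "median": np.median(bets),
        "share_zero": np.mean(bets == 0),       # fraction of users who did not buy
        "share_min": np.mean(bets == min_bet),  # fraction who bet the minimum
    }


def welch_t_test(bets_a, bets_b):
    """Welch's t-test on the mean bet per user (does not assume equal variances)."""
    a = np.asarray(bets_a, dtype=float)
    b = np.asarray(bets_b, dtype=float)
    res = stats.ttest_ind(a, b, equal_var=False)
    return a.mean() - b.mean(), res.statistic, res.pvalue


# Toy numbers for illustration only, not real data.
print(summarize_bets([0, 0, 2, 2, 4, 20000]))
print(welch_t_test([0, 2, 2, 4], [0, 0, 2, 2]))
```

In the toy summary, a single $20,000 bet pulls the mean up to about $3,334.67 while the median stays at $2, which is exactly the sensitivity that makes a t-test on raw means fragile here.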

A direct solution is to use a t-test, relying on the central limit theorem, which says that the sample mean tends to a normal distribution when the sample size is large. However, I found the following in the Wikipedia article on the central limit theorem:

> The central limit theorem states that the sum of a number of independent and identically distributed random variables with finite variances will tend to a normal distribution as the number of variables grows. A generalization due to Gnedenko and Kolmogorov states that the sum of a number of random variables with a power-law tail (Paretian tail) distributions decreasing as $|x|^{-\alpha-1}$ where $0 < \alpha < 2$ (and therefore having infinite variance) will tend to a stable distribution $f(x;\alpha,0,c,0)$ as the number of summands grows.[9][10] If $\alpha > 2$ then the sum converges to a stable distribution with stability parameter equal to 2, i.e. a Gaussian distribution.
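For a Pareto tail specifically, the regimes in this passage correspond to which moments are finite. A minimal sketch, assuming the Pareto density $\alpha x_m^\alpha / x^{\alpha+1}$ for $x \ge x_m$ (so the tail decays as $x^{-\alpha-1}$, matching the quote); the function name and the default $x_m = 2$ (the minimum bet) are illustrative:

```python
import math


def pareto_moments(alpha, x_m=2.0):
    """Mean and variance of a Pareto(alpha, x_m) distribution.

    Assumes the density alpha * x_m**alpha / x**(alpha + 1) for x >= x_m.
    The mean is finite only for alpha > 1, the variance only for alpha > 2.
    """
    mean = alpha * x_m / (alpha - 1) if alpha > 1 else math.inf
    var = (x_m ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))
           if alpha > 2 else math.inf)
    return mean, var


for a in (3.0, 1.5, 0.8):
    print(a, pareto_moments(a))
```

With $\alpha \le 2$ the variance is infinite, so the ordinary CLT behind the t-test does not apply; with $\alpha \le 1$ even the mean is infinite under the model.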

The quoted passage says that if the distribution has a power-law tail with 0 < α < 2 (infinite variance), the sum, and hence the sample mean, tends to a stable distribution instead of a normal distribution. I found a paper on a z-test for stable distributions:

*z Test for the significance of the mean of a stable probability distribution with 1<α≤2*

However, when I fit a Pareto distribution to my data using MLE, the estimated α is less than 1, which falls outside the range 1 < α ≤ 2 that this test covers.
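The post doesn't specify how the MLE was set up. One common version, sketched here under the assumption that the scale is fixed at the minimum bet $x_{\min} = 2$ and that non-buyers ($0) are excluded, has the closed form $\hat\alpha = n / \sum_i \ln(x_i / x_{\min})$. The function name is hypothetical, and because bets are discrete with many values exactly at $2, a continuous Pareto fit is only an approximation.

```python
import numpy as np


def pareto_alpha_mle(bets, x_min=2.0):
    """MLE of the Pareto shape alpha with the scale fixed at x_min.

    alpha_hat = n / sum(log(x_i / x_min)), computed over bets >= x_min only.
    Returns inf if every remaining bet equals x_min (the sum of logs is 0).
    """
    x = np.asarray(bets, dtype=float)
    x = x[x >= x_min]                      # drop non-buyers ($0) and anything below x_min
    log_sum = np.sum(np.log(x / x_min))
    return np.inf if log_sum == 0 else len(x) / log_sum


print(pareto_alpha_mle([0, 2, 4, 8]))      # toy input, illustration only
```

Under this model, $\hat\alpha < 1$ would imply an infinite population mean, so the sample mean bet per user would not settle down as more users are added.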

I then turned to a nonparametric test, the Mann-Whitney U test. It seems to work, but it gives large p-values (> 0.6). My guess is that this is caused by the many tied values in the two groups: lots of users bet $2 or $4, which results in many tied ranks.
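A minimal sketch of how this test might be run with SciPy, assuming SciPy 1.7 or later for the `method` argument of `mannwhitneyu`. Tied values receive average ranks, and the asymptotic method applies a tie correction to the variance. The hypothetical helper also reports the probability of superiority, $P(A > B) + 0.5\,P(A = B)$, which $U_A / (n_A n_B)$ estimates; it makes explicit that a tie counts as half a "win" for each side rather than being discarded.

```python
import numpy as np
from scipy import stats


def mann_whitney_with_ties(bets_a, bets_b):
    """Two-sided Mann-Whitney U test plus the probability of superiority."""
    a = np.asarray(bets_a, dtype=float)
    b = np.asarray(bets_b, dtype=float)
    # Asymptotic (normal) approximation; SciPy corrects its variance for ties.
    res = stats.mannwhitneyu(a, b, alternative="two-sided", method="asymptotic")
    # U for sample a from average (mid) ranks: U_a = R_a - n_a (n_a + 1) / 2.
    ranks = stats.rankdata(np.concatenate([a, b]))  # tied values share their average rank
    n_a, n_b = len(a), len(b)
    u_a = ranks[:n_a].sum() - n_a * (n_a + 1) / 2
    # Estimate of P(A > B) + 0.5 * P(A = B).
    prob_superiority = u_a / (n_a * n_b)
    return res.pvalue, prob_superiority


print(mann_whitney_with_ties([0, 2, 4], [0, 0, 2]))  # toy input, illustration only
```

Note that the Mann-Whitney U test asks whether values in one group tend to be larger than in the other; it does not compare means, so it does not directly test the average bet per user.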

Are there any other statistical tests suited to this problem? Any ideas would be appreciated.