I'm currently working on a computer science project that tells a user how good a particular phrase is for a given product. The product is the same for now; let's assume it's t-shirts for this post.

I have successfully gathered and/or calculated the following statistics for each phrase:

Total Number of M = total number of shirts

Total Number of All Products

P(M) = The number of shirts for a given phrase / total products for a given phrase

P(S|M) = Probability of a sale for a shirt given a phrase

P(S) = Probability of a sale for all products given a particular phrase

M Sales Rank for a phrase = a whole number between 0 and the total number of all products not inclusive of shirts; the lower the better. The sales rank is calculated by averaging the sales ranks of up to 50 shirts for each phrase.

Total Sales Ranks = the same as M Sales Rank, but averaged over 50 random products, which may include shirts.


What I am trying to do is merge the aforementioned statistics into a whole number between 1 and 10 that ranks each phrase for the product, so it can be shown to a user. I come from a computer science background, so I understand basic statistics and can program.

One idea that comes to mind is to take all the M Sales Ranks and calculate the percentile of each one relative to the whole set, then subtract that percentile from 1, because lower ranks are better. I would then multiply the result by P(M) * P(S|M) * P(S), calculate the percentile of each of these products relative to the whole set, and convert it to a whole number with ceil(percentile * 10).
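To make the idea concrete, here is a rough Python sketch of it. The function names, the dictionary keys ("rank", "p_m", "p_s_given_m", "p_s"), and the choice of "fraction of values less than or equal to this one" as the percentile definition are all my own assumptions for illustration; other percentile definitions would give slightly different scores.

```python
from bisect import bisect_right


def counts_at_or_below(values):
    """For each value, count how many values in the list are <= it."""
    ordered = sorted(values)
    return [bisect_right(ordered, v) for v in values]


def phrase_scores(phrases):
    """Score each phrase from 1 to 10 using the percentile idea above.

    `phrases` is a list of dicts with the hypothetical keys
    'rank' (M Sales Rank), 'p_m', 'p_s_given_m' and 'p_s'.
    """
    n = len(phrases)

    # Step 1: percentile of each M Sales Rank, assumed to be count(<= rank) / n.
    rank_counts = counts_at_or_below([p["rank"] for p in phrases])

    raw = []
    for p, k in zip(phrases, rank_counts):
        # Step 2: subtract from 1 because a lower sales rank is better.
        goodness = 1.0 - k / n
        # Step 3: multiply by the three probabilities.
        raw.append(goodness * p["p_m"] * p["p_s_given_m"] * p["p_s"])

    # Step 4: percentile of each raw score, then ceil(percentile * 10).
    # Integer arithmetic (ceil(10 * k / n)) avoids floating-point rounding.
    raw_counts = counts_at_or_below(raw)
    return [-(-10 * k // n) for k in raw_counts]


example = [
    {"rank": 100, "p_m": 0.5, "p_s_given_m": 0.4, "p_s": 0.3},
    {"rank": 200, "p_m": 0.5, "p_s_given_m": 0.4, "p_s": 0.3},
    {"rank": 300, "p_m": 0.5, "p_s_given_m": 0.4, "p_s": 0.3},
    {"rank": 400, "p_m": 0.5, "p_s_given_m": 0.4, "p_s": 0.3},
]
print(phrase_scores(example))  # [10, 8, 5, 3]
```

One thing I noticed while sketching this: with this percentile definition, the phrase with the worst sales rank always gets a raw score of 0 regardless of its probabilities, and tied raw scores share the same final number.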

Excuse my loose use of statistics terminology; I took basic stats ages ago. Any help would be most appreciated.