Statistics question for algorithm development



mcmurphy510
08-02-2010, 05:33 PM
Hi,

This isn't a homework help question. I'm actually trying to figure out some formulas for algorithm programming. I'm trying to calculate probabilities of votes in a 1 to 10 rating system, and it's been years since I've used any advanced statistics. If there's a better place to post this, let me know.

To make things easier, I'm going to use a simplified example:

Let's say I have a 0 to 10 rating system (where zero is lowest and 10 is highest) where users are presented with different kinds of food and are then asked to rate the food based on how much they like it. Let's also say that over time these kinds of food have achieved the following ratings:

Ice Cream: 8
Mashed Potatoes: 5
Brussels Sprouts: 3

Let's also say that, for whatever reason, I don't have access to the underlying data that generated those numbers. So I don't know (nor can I calculate) the standard deviation or any other properties of these distributions. I only know the mean. For all I know, the score of 8 for Ice Cream could have been generated by just one vote of 8, or by a thousand votes of 8 (or by 500 10's and 500 6's). I really don't KNOW anything about it other than that 8 is the average score.
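To illustrate why the mean alone says so little, here is a small Python sketch. The vote histories in it are made up for illustration and are not real data; each one averages exactly 8, yet the spreads differ.

```python
from statistics import mean, pstdev

# Made-up vote histories (assumptions for illustration only).
# Every one of them has a mean of exactly 8.
vote_sets = {
    "one vote of 8": [8],
    "a thousand votes of 8": [8] * 1000,
    "500 tens and 500 sixes": [10] * 500 + [6] * 500,
}

for label, votes in vote_sets.items():
    # pstdev is the population standard deviation of the votes.
    print(f"{label}: mean={mean(votes)}, std dev={pstdev(votes)}")
```

The first two histories have no spread at all, while the third has a standard deviation of 2, so the same average is compatible with very different chances of seeing a 10 next.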

So basically what I want to do is create a formula that will tell me the probability that a particular food will receive a particular vote based on what its current average is. In other words, in the case of Ice Cream above, what is the probability that the next vote will be a 10, 9, 8, etc.

Since I only have the mean, I'm making the following assumptions:
1. The mean and the mode are the same.
2. Foods that have scored 5 have normal distributions.
3. Low scores, whose distributions are piled up toward the low end of the scale, are as normal as possible and are skewed just enough to account for their mean. The same goes for high scores.

So my questions are:

Am I working with good assumptions, or are they flawed?
To find a distribution I need to assume a standard deviation. What's the best way to do that? Or should I just assume that 0 to 10 spans x standard deviations and divide 10 by x?
Is there a formula where I can plug in a range of possible scores (0 to 10), a mean score (e.g., 8 for Ice Cream), and a score to test against, to determine the probability that any given vote will be that score?
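
To make the last question concrete, a rough Python sketch of one such formula might look like the following. It is only a sketch under stated assumptions, not a settled answer: the function name `vote_probabilities` and the parameter `spans_sd` (the "x" above, arbitrarily defaulted to 6) are hypothetical, and the approach simply evaluates a normal curve centred on the mean at each whole-number score and rescales so the probabilities sum to 1.

```python
import math

def vote_probabilities(mean_score, low=0, high=10, spans_sd=6.0):
    """Hypothetical helper: probability of each integer vote from low to high.

    Assumes a normal curve centred on mean_score whose standard deviation
    is (high - low) / spans_sd, evaluated at each integer score and then
    renormalised so the probabilities sum to 1.
    """
    sigma = (high - low) / spans_sd
    weights = {
        score: math.exp(-0.5 * ((score - mean_score) / sigma) ** 2)
        for score in range(low, high + 1)
    }
    total = sum(weights.values())
    return {score: w / total for score, w in weights.items()}

# Example: Ice Cream, with an average score of 8.
ice_cream = vote_probabilities(8)
for score in range(10, -1, -1):
    print(f"P(vote = {score}) = {ice_cream[score]:.3f}")
```

One caveat with this sketch: cutting the curve off at 0 and 10 keeps the most likely vote at the stated average, but the average of the resulting distribution drifts toward the middle of the scale (below 8 for Ice Cream), so it only approximates the "skewed just enough to account for the mean" idea.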

Hope I'm explaining this well enough.

Thanks for all your help.