What are you trying to do?
I have collected final sales data for over 12,000 sales of a single particular item. The prices range from $0.01 to a little over $4.00. I calculate a mean of $0.60. My problem is that my standard deviation is $0.58. That doesn't make sense to me. If I move 2 SD to the left of the mean I am at a negative number, and none of my pricing data is negative (obviously something sells for a positive value). What could I be doing wrong? I have calculated my mean and stdev in both Excel and Access and get the same numbers. If I am not doing anything wrong, can I normalize the data in some way? Help!
PS: I removed several upper-level outliers, and this only reduced my stdev by about $0.10. Two SD below the mean is still negative.
What are you trying to do?
The price data is actually auction data. What I would like to do is predict the probability of a particular closing price (based on historical data). It just seems that something is wrong if 2 stdev to the left of the mean is negative when none of my data is negative. Can this be?
Min $0.01
Max $4.70
Mean $0.60
Stdev $0.57
Of course it can be. Your data is an example of it happening. The problem is that you're trying to use a normal distribution when it doesn't make sense for this data. The normal distribution takes values on the entire real line (your data doesn't), and it is symmetric (your summaries tell me that your data is skewed), so the normal is a bad distribution to use for modeling.
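To make this concrete, here is a minimal Python sketch (standard library only) that plugs the posted summaries (mean $0.60, stdev $0.57) into a normal model and asks how much probability that model puts on negative prices. `MEAN` and `SD` are just illustrative constants copied from the figures above.

```python
from statistics import NormalDist

# Summary statistics posted above (dollars).
MEAN = 0.60
SD = 0.57

normal_model = NormalDist(mu=MEAN, sigma=SD)

# Two standard deviations below the mean is already a negative price.
lower_2sd = MEAN - 2 * SD

# Probability the normal model assigns to an impossible (negative) price.
p_negative = normal_model.cdf(0.0)

print(f"mean - 2sd = {lower_2sd:.2f}")
print(f"P(price < 0) under the normal model = {p_negative:.3f}")
```

With these numbers the normal model puts roughly 15% of its probability on prices below zero, which is another way of saying it is the wrong model for this data, not that the mean or stdev was computed incorrectly.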
May I ask: if you have 12,000 observations, why not just use the empirical distribution to calculate your probabilities?
I thought the empirical distribution was a normal distribution? I thought that was the bell-shaped curve, no?
How can I do this, and/or what is the best way to model this data?
Last edited by derreckn; 02-20-2011 at 03:10 PM. Reason: added data attachment
No, your distribution is not normal; it's approximately exponential.
Anyway, the quickest way to compute the empirical distribution is to use your ranking routine to rank the data points from 1 to n, so the smallest data point gets rank 1, and so on. Then divide the ranks by your sample size, n = 12128.
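One quick sanity check on the exponential claim: for an exponential distribution the mean and the standard deviation are equal (both are 1/λ), and the posted mean ($0.60) and stdev ($0.57) are nearly equal. The sketch below, with hypothetical helper names, estimates the rate as 1 / mean (the maximum-likelihood estimate for an exponential) and ignores the fact that prices start at $0.01 rather than $0.00, which is a simplifying assumption.

```python
from statistics import mean, stdev

def fit_exponential_rate(prices):
    """Maximum-likelihood estimate of the exponential rate: lambda = 1 / sample mean."""
    return 1.0 / mean(prices)

def mean_sd_ratio(prices):
    """For exponential data this ratio should be close to 1 (mean == sd == 1/lambda)."""
    return mean(prices) / stdev(prices)

# Rough check using only the posted summaries (no raw data needed):
posted_mean, posted_sd = 0.60, 0.57
print(f"mean/sd ratio = {posted_mean / posted_sd:.2f}")  # near 1 is consistent with exponential
print(f"estimated rate lambda = {1 / posted_mean:.3f} per dollar")
```

On the full data set you would call `fit_exponential_rate(prices)` and `mean_sd_ratio(prices)` with your own list of prices.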
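The ranking recipe above can be sketched in Python like this. `empirical_cdf_by_rank` follows the rank / n description literally; `ecdf` is an equivalent helper (both names are hypothetical) that returns the fraction of observations at or below any price, which also handles tied prices cleanly.

```python
from bisect import bisect_right

def empirical_cdf_by_rank(prices):
    """Rank the sorted data 1..n and divide by n.

    Returns (price, rank/n) pairs. With tied prices, only the last pair
    for a given price equals the share of sales at or below that price.
    """
    ordered = sorted(prices)
    n = len(ordered)
    return [(p, rank / n) for rank, p in enumerate(ordered, start=1)]

def ecdf(prices):
    """Return F where F(x) = fraction of observed prices <= x."""
    ordered = sorted(prices)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted prices are <= x
        return bisect_right(ordered, x) / n

    return F
```

For example, with `F = ecdf(prices)`, the share of sales in ($0.34, $1.00] is `F(1.00) - F(0.34)`.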
I think their issue is that they've probably never been exposed to an empirical distribution in a formal setting (although it's quite intuitive). But they have probably heard of the 'empirical rule', which does have to do with the normal distribution. That's not related at all to what I was talking about.
Is there a way I can determine the probability of a certain range of points (in this situation, a price range)? For instance, in a normal distribution I know that roughly 68% of the data falls within 1 stdev of the mean. Is there anything similar for an exponential distribution, e.g. that every unit increase on the x axis means an "x%" increase or decrease in probability? It seems most of the data should fall near the mean, with the extremes at the ends (min and max).
Also, Excel has an exponential distribution function, =EXPONDIST(x, lambda, cumulative), but I'm not sure how to use it. Does it return the probability of a particular value? If I summed a range of these probabilities, would that give me the probability of that range?
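For a true exponential distribution there is a fixed analog of the 68-95-99.7 rule: because the mean and the standard deviation are both 1/λ, the share within k standard deviations of the mean does not depend on λ. The sketch below (hypothetical helper name) computes it, clipping the interval at zero since prices cannot be negative. It works out to about 86%, 95%, and 98% for k = 1, 2, 3. It also shows the per-unit behavior asked about: each extra dollar multiplies P(X > x) by the same factor e^(-λ) (the memoryless property). Note that an exponential density is highest at zero, not at the mean. The rate 1/0.60 is a rough fit from the posted mean, an assumption.

```python
import math

def exp_within_k_sd(k):
    """P(|X - mean| <= k*sd) for any exponential distribution.

    Mean and sd are both 1/lambda, so the interval is
    [(1-k)/lambda, (1+k)/lambda], clipped at 0 because X cannot be negative.
    Working in units of 1/lambda makes the result independent of lambda.
    """
    lower = max(0.0, 1.0 - k)
    upper = 1.0 + k
    return math.exp(-lower) - math.exp(-upper)

for k in (1, 2, 3):
    print(f"within {k} sd: {exp_within_k_sd(k):.4f}")

# Memoryless property: each extra dollar multiplies P(X > x) by exp(-lambda).
rate = 1 / 0.60  # rough rate from the posted mean (assumption)
print(f"P(X > x + 1) / P(X > x) = {math.exp(-rate):.3f}")
```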
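On the EXPONDIST question: with cumulative = TRUE it returns P(X <= x), and with cumulative = FALSE it returns the density, which is not a probability, so summing densities does not give the probability of a range. The probability of a range is a difference of two cumulative values, i.e. in Excel `=EXPONDIST(b, lambda, TRUE) - EXPONDIST(a, lambda, TRUE)`. The Python below is a hypothetical mirror of that behavior, not Excel itself, and the rate 1/0.60 is an assumption based on the posted mean.

```python
import math

def expondist(x, lam, cumulative):
    """Mimics Excel's EXPONDIST(x, lambda, cumulative) for x >= 0.

    cumulative=True  -> CDF, P(X <= x) = 1 - exp(-lam * x)
    cumulative=False -> density lam * exp(-lam * x) (not a probability)
    """
    if x < 0 or lam <= 0:
        raise ValueError("need x >= 0 and lam > 0")
    if cumulative:
        return 1.0 - math.exp(-lam * x)
    return lam * math.exp(-lam * x)

def prob_between(a, b, lam):
    """P(a < X <= b): difference of two cumulative values, not a sum of densities."""
    return expondist(b, lam, True) - expondist(a, lam, True)

lam = 1 / 0.60  # rough rate from the posted mean (assumption)
print(f"P($0.35 < price <= $1.00) under the exponential model: {prob_between(0.35, 1.00, lam):.3f}")
```

Comparing model probabilities like this against the observed shares is one way to judge how well the exponential fits.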
There are ways to do what you want. The real question is whether you want to do this for some theoretical distribution or based on data you actually observed. If you have observed data, want to assume that it came from an exponential distribution, and want to make statements similar to the ones you described, then it's a different story.
If you take the log of 30 randomly selected data samples, or otherwise transform the samples, do they better approximate a normal distribution?
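A minimal sketch of the log-transform check suggested above: draw 30 prices at random and compare skewness before and after taking logs. Skewness near 0 indicates a more symmetric sample; it is only a rough check, not a full normality test, and 30 points is a small sample. The helper names and the fixed `seed` are illustrative assumptions; all prices must be strictly positive for the log.

```python
import math
import random
from statistics import mean, pstdev

def sample_skewness(values):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    m = mean(values)
    s = pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def compare_log_transform(prices, sample_size=30, seed=0):
    """Return (skewness of raw sample, skewness of log sample).

    Requires strictly positive prices, since log(0) is undefined.
    """
    rng = random.Random(seed)
    sample = rng.sample(prices, sample_size)
    return sample_skewness(sample), sample_skewness([math.log(p) for p in sample])
```

Calling `compare_log_transform(prices)` with the real price list shows whether the log moves the skewness toward 0.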
The 12,000+ observations were collected over the past few months and are real observations. What I am trying to do is give my members a price range in which to place bids where they will be most likely to win.
Originally, one idea was to use the price range between the first and second stdev (assuming a normal distribution). Since past observations show that an auction has a point where the bidding drops off, I figured this would give my users a good chance of winning an auction that has made it to this price range.
So is there something like a 68-95-99.7 rule for an exponential distribution? What I did was count the frequency of every value in my data set. It shows that 43.3% of all observations fall between $0.03 and $0.34. Then it drops off, and 34% fall between $0.35 and $1.00.
Instead, I looked at the frequency of each price in the dataset. Then I divided each frequency by the total number of observations to get the probability of each sale price, and used those to determine the price-range probabilities.
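The "between the first and second stdev" idea can be checked against the data directly. The sketch assumes the band meant is mean + 1 sd to mean + 2 sd (about $1.17 to $1.74 with the posted summaries), since the band below the mean runs into negative prices. Under a normal model about 13.6% of sales would land there; under an exponential with rate 1/0.60 (a rough fit, an assumption) only about 8.7% would. `empirical_share` is a hypothetical helper for measuring the actual share in your data.

```python
import math
from statistics import NormalDist

MEAN, SD = 0.60, 0.57                 # posted summaries (dollars)
low, high = MEAN + SD, MEAN + 2 * SD  # the assumed 1-to-2 stdev band

# Share of prices in the band if prices were normal: Phi(2) - Phi(1).
normal_model = NormalDist(MEAN, SD)
p_normal = normal_model.cdf(high) - normal_model.cdf(low)

# Share if prices were exponential with rate 1/MEAN (rough fit, an assumption).
rate = 1 / MEAN
p_exponential = math.exp(-rate * low) - math.exp(-rate * high)

def empirical_share(prices, a, b):
    """Fraction of observed prices with a < price <= b."""
    return sum(a < p <= b for p in prices) / len(prices)
```

`empirical_share(prices, low, high)` on the real list shows which model, if either, matches what your auctions actually did.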
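The frequency-table approach described above is the empirical distribution of the prices, and it can be sketched as follows (helper names are hypothetical). Here summing per-price probabilities over a range is valid, because each observed price is a discrete outcome with an actual probability, unlike the densities returned by EXPONDIST with cumulative = FALSE. Prices are assumed to be stored as integer cents so that equal prices compare exactly.

```python
from collections import Counter

def price_probabilities(prices):
    """Frequency of each sale price divided by the number of observations."""
    n = len(prices)
    return {price: count / n for price, count in Counter(prices).items()}

def range_probability(pmf, low, high):
    """Probability that a sale lands in [low, high]: sum of per-price probabilities."""
    return sum(p for price, p in pmf.items() if low <= price <= high)

def mode_price(prices):
    """Most frequent sale price."""
    return Counter(prices).most_common(1)[0][0]
```

With prices in cents, `range_probability(price_probabilities(prices), 3, 34)` gives the share of sales between $0.03 and $0.34.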
Based on the above, my data shows:
43.46% of all data falls within the range of $0.03 - $0.34.
The most frequent value (mode) is $0.17
After that, 34.48% of the data falls within the range $0.35 - $1.00
I'm not sure if this is the best way to go about this.
Hey, guess what: you're using the empirical distribution I suggested before!