# Thread: Normal Distribution & possible Standard Deviation error

1. ## Normal Distribution & possible Standard Deviation error

I have collected final sales data for over 12,000 sales of a single item. The prices range from \$0.01 to a little over \$4.00, and I calculate a mean of \$0.60. My problem is that my standard deviation is \$0.58, which doesn't make sense to me: if I move 2 stdev to the left of the mean I am at a negative number, yet none of my prices are negative (obviously everything sells for a positive value). What could I be doing wrong? I have calculated the mean and stdev in both Excel and Access and get the same numbers. If I am not doing anything wrong, can I normalize the data in some way? Help!

PS: I removed several outliers at the upper end, and this only reduced my stdev by about \$0.10. Two stdev below the mean is still negative.

2. ## Re: Normal Distribution & possible Standard Deviation error

What are you trying to do?

3. ## Re: Normal Distribution & possible Standard Deviation error

The price data is actually auction data. What I would like to do is predict the probability of a particular closing price (based on historical data). It just seems wrong that 2 stdev to the left of the mean is negative when none of my data is negative. Can this be?

- Min: \$0.01
- Max: \$4.70
- Mean: \$0.60
- Stdev: \$0.57

4. ## Re: Normal Distribution & possible Standard Deviation error

Of course it can be; your data is an example of it happening. The problem is that you're trying to use a normal distribution when it doesn't make sense for this data. The normal distribution takes values on the entire real line (your data doesn't), and it is symmetric (your summaries tell me your data is skewed: the mean of \$0.60 sits near the minimum of \$0.01, while the maximum of \$4.70 is more than seven standard deviations above the mean). So the normal is a bad distribution to use for modeling this data.
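With the thread's own numbers, 0.60 - 2 × 0.57 = -0.54, so "mean minus 2 stdev" is negative even though every price is positive. A minimal sketch of the same effect on a small, made-up set of right-skewed prices (the values are hypothetical, not the poster's data):

```python
import statistics

# Hypothetical right-skewed closing prices (NOT the poster's data):
# many cheap sales plus a few expensive ones, all strictly positive.
prices = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.60, 1.20, 3.00]

mean = statistics.mean(prices)
sd = statistics.stdev(prices)        # sample standard deviation, like Excel's STDEV
median = statistics.median(prices)

print(f"min={min(prices):.2f} mean={mean:.3f} median={median:.3f} sd={sd:.3f}")
print(f"mean - 2*sd = {mean - 2 * sd:.3f}")   # negative despite every price > 0
print("mean > median (long right tail):", mean > median)
```

The long right tail inflates the standard deviation, and because the normal curve is symmetric, stepping the same distance to the left lands below zero.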

May I ask: if you have 12,000 observations, why not just use the empirical distribution to calculate your probabilities?

5. ## Re: Normal Distribution & possible Standard Deviation error

I thought that the empirical distribution was a normal distribution? I thought that was the bell-shaped curve, no?

How can I do this, and/or what is the best way to model this data?

6. ## Re: Normal Distribution & possible Standard Deviation error

Originally Posted by derreckn
I thought that the empirical distribution was a normal distribution? I thought that was the bell-shaped curve, no?

How can I do this, and/or what is the best way to model this data?

No, your distribution is not normal; rather, it's approximately exponential.

Anyway, the quickest way to compute the empirical distribution is to use your ranking routine to rank the data points from 1 to n, so that the smallest data point gets rank 1, the next smallest rank 2, and so on. Then divide the ranks by your sample size, n = 12128.
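A minimal Python sketch of the rank-and-divide recipe, assuming the prices are already in a list. The helper names `ecdf` and `prob_at_most` are illustrative, not from any library. For tied prices the sketch counts every value less than or equal to x (the highest rank among the ties), so it returns the empirical CDF F(x) = (number of prices ≤ x) / n.

```python
from bisect import bisect_right


def ecdf(prices):
    """Return a hypothetical helper F(x) = (number of prices <= x) / n.

    Sorting and finding the position of x is equivalent to ranking the data
    1..n (ties get the highest rank) and dividing by the sample size n.
    """
    data = sorted(prices)
    n = len(data)

    def prob_at_most(x):
        return bisect_right(data, x) / n

    return prob_at_most


# Hypothetical example prices, not the poster's data.
F = ecdf([0.17, 0.17, 0.25, 0.40, 0.60, 1.20, 3.00, 0.05])
print(F(0.34))            # 0.5: share of sales that closed at or below $0.34
print(F(1.00) - F(0.34))  # 0.25: share above $0.34 and at or below $1.00
```

Differences of F give the probability of any price range directly from the observed data, with no distributional assumption.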

7. ## Re: Normal Distribution & possible Standard Deviation error

I think their issue is that they've probably never been exposed to an empirical distribution in a formal setting (although it's quite intuitive). They probably have heard of the "empirical rule" (the 68-95-99.7 rule), which does have to do with the normal distribution, but that's not related at all to what I was talking about.

8. ## Re: Normal Distribution & possible Standard Deviation error

Is there a way I can determine the probability of a certain range of points (in this situation, a price range)? For instance, in a normal distribution I know that roughly 68% of the data falls within 1 stdev of the mean. Is there anything similar for an exponential distribution, e.g. does every unit increase on the x axis mean an "x%" increase or decrease in probability? It seems most of the data should fall near the mean, with the extremes (min and max) at the ends.

Also, Excel has an exponential distribution function, =EXPONDIST(X,Lambda,Cumulative), but I'm not sure how to use it. Does it return the probability of a particular value? If I summed a range of probabilities, would that give me the probability of that range?
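On the Excel question: with the cumulative argument TRUE, EXPONDIST returns P(X ≤ x) = 1 - e^(-λx); with FALSE it returns the density λe^(-λx), which is not a probability, so summing densities does not give a range probability. The probability of a range is the difference of two cumulative values. A minimal sketch, assuming the prices really are roughly exponential and estimating λ as 1 / sample mean (the maximum-likelihood estimate) from the thread's \$0.60; the function names are illustrative:

```python
import math


def expon_cdf(x, lam):
    """P(X <= x) for an exponential distribution with rate lam.

    Same quantity as Excel's EXPONDIST(x, lam, TRUE).
    """
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)


def expon_range_prob(a, b, lam):
    """P(a <= X <= b): a difference of two CDF values, not a sum of densities."""
    return expon_cdf(b, lam) - expon_cdf(a, lam)


# Assumption: prices are roughly exponential, lambda = 1 / sample mean ($0.60).
lam = 1 / 0.60
print(f"P($0.35 <= price <= $1.00) = {expon_range_prob(0.35, 1.00, lam):.3f}")
```

In Excel the same range probability would be `=EXPONDIST(1.00, 1/0.60, TRUE) - EXPONDIST(0.35, 1/0.60, TRUE)`.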

9. ## Re: Normal Distribution & possible Standard Deviation error

There are ways to do what you want to do. The real question is whether you want to do this for some theoretical distribution or based on data you actually observed. If you have observed data, want to assume that it came from an exponential distribution, and want to say something similar to what you described, then it's a different story.

10. ## Re: Normal Distribution & possible Standard Deviation error

If you take the log of 30 randomly selected data samples or otherwise transform the samples do they better approximate a normal distribution?
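One rough way to try this is to compare the sample skewness before and after taking logs; a value closer to 0 means a more symmetric, more normal-looking shape (a histogram or normal probability plot is a better visual check). A sketch, assuming a hypothetical list of strictly positive prices, since the log of 0 is undefined; `skewness` is an illustrative helper using the simple moment formula, not a library function:

```python
import math
import random


def skewness(values):
    """Moment coefficient of skewness: 0 if symmetric, > 0 for a long right tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical prices standing in for the real 12,000 observations.
all_prices = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.60, 1.20, 3.00]

# Draw up to 30 observations at random, as suggested in the post.
sample = random.sample(all_prices, min(30, len(all_prices)))
logged = [math.log(p) for p in sample]   # requires every price > 0

print(f"skewness of raw prices: {skewness(sample):.2f}")
print(f"skewness of log prices: {skewness(logged):.2f}")
```

If the logged prices do look normal, the 68-95-99.7 rule describes the log prices; interval ends computed on the log scale can be turned back into dollar amounts with `math.exp`.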

11. ## Re: Normal Distribution & possible Standard Deviation error

The 12,000+ observations were collected over the past few months and are real observations. What I am trying to do is give my members a price range in which to place bids where they will be most likely to win.

Originally, one idea was to use the price range between the first and second stdev (assuming a normal distribution). Since past observations show that an auction has a point where the bidding drops off, I figured this would give my users a good chance of winning an auction that has reached this price range.

So is there something like a 68-95-99.7 rule for an exponential distribution? What I did was count the frequency of every value in my data set. It shows that 43.3% of all observations fall between \$0.03 and \$0.34. Then it drops off, and 34% fall between \$0.35 and \$1.00.
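There is an analogue, but only if the data really are exponential. For an exponential distribution the mean and the standard deviation are both 1/λ, so for k ≥ 1 the interval "mean ± k stdev" starts below zero and effectively covers [0, (k + 1) × mean], giving P = 1 - e^(-(k + 1)) whatever λ is. A minimal sketch under that assumption (`share_within_k_sd` is an illustrative name):

```python
import math


def share_within_k_sd(k):
    """Share of an exponential distribution within k stdev of its mean.

    Assumes exponential data, where mean = stdev. Interval ends are measured
    in units of the mean; the lower end is clipped at 0 because prices
    cannot be negative.
    """
    lo = max(0.0, 1 - k)
    hi = 1 + k
    return math.exp(-lo) - math.exp(-hi)


for k in (1, 2, 3):
    print(f"within {k} stdev of the mean: {share_within_k_sd(k):.1%}")
```

This prints roughly 86.5%, 95.0% and 98.2%, an exponential counterpart to the normal's 68-95-99.7.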

12. ## Re: Normal Distribution & possible Standard Deviation error

Originally Posted by Outlier
If you take the log of 30 randomly selected data samples or otherwise transform the samples do they better approximate a normal distribution?

I'll give it a shot. If it does fit a normal distribution, would the 68-95-99.7 rule apply?

13. ## Re: Normal Distribution & possible Standard Deviation error

Originally Posted by derreckn
If it does fit a normal distribution, would the 68-95-99.7 rule apply?

I assume so, but with this transformation stuff I am at one of the limits of my stats knowledge.

14. ## Re: Normal Distribution & possible Standard Deviation error

Instead, what I did was look at the frequency of each price in the dataset. Then I divided each frequency by the total number of observations to get the probability of each sale price, and used those to work out the price-range probabilities.

Based on the above, my data shows:

- 43.46% of all data falls within the range \$0.03 to \$0.34.
- The most frequent value (mode) is \$0.17.
- After that, 34.48% of the data falls within the range \$0.35 to \$1.00.
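The frequency-table approach described in this post can be sketched as follows, assuming prices are already rounded to whole cents so identical sales compare equal. The sales list is hypothetical, and `price_probabilities` / `range_probability` are illustrative names:

```python
from collections import Counter


def price_probabilities(prices):
    """Relative frequency of each distinct sale price: count / total observations."""
    n = len(prices)
    return {price: count / n for price, count in Counter(prices).items()}


def range_probability(probs, lo, hi):
    """Add up the relative frequencies of every price in [lo, hi]."""
    return sum(p for price, p in probs.items() if lo <= price <= hi)


# Hypothetical sales, rounded to cents; real data would be the ~12,000 prices.
sales = [0.17, 0.17, 0.17, 0.25, 0.34, 0.50, 0.80, 1.00, 2.10, 4.00]
probs = price_probabilities(sales)

print(range_probability(probs, 0.03, 0.34))  # share of sales from $0.03 to $0.34
print(range_probability(probs, 0.35, 1.00))  # share of sales from $0.35 to $1.00
print(max(probs, key=probs.get))             # most frequent price (mode)
```

Summing relative frequencies over a range gives the same answer as differencing the empirical CDF, because both count the observed sales that fall in that range.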

15. ## Re: Normal Distribution & possible Standard Deviation error

Hey, guess what: you're using the empirical distribution I suggested before!
