1. ## Normal distribution

Hi guys,

I'm a newbie here. My question is the following.

Is the following set of values normally distributed?
26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34

The values above come from the link below:
https://www.mathsisfun.com/data/stan...tribution.html

They go on to compute the mean and standard deviation and the corresponding z scores assuming they are normally distributed.

However, when I plotted the values on a histogram using Excel, I got the following chart (attached image), which shows positive skewness. We know that a normal distribution has no skewness at all, i.e. it is perfectly symmetrical.

Do we need to transform the dataset so that it is normally distributed before calculating the mean, standard deviation, and z scores? In real-world situations datasets may not be normally distributed, so how do we go about performing statistical tests on them?
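For reference, the skew seen in the histogram can be put into a number. This is only a rough sketch, not something from the linked article: it assumes the population SD (dividing by n) and the simple moment-based skewness formula, and the helper name `moment_skewness` is made up for illustration.

```python
import statistics

# The 20 values quoted in the question.
values = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
          26, 37, 43, 62, 35, 38, 45, 32, 28, 34]


def moment_skewness(xs):
    """Moment skewness: third central moment divided by the SD cubed."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)  # population SD (divide by n), an assumption
    third_moment = sum((x - mean) ** 3 for x in xs) / len(xs)
    return third_moment / sd ** 3


mean = statistics.fmean(values)
sd = statistics.pstdev(values)
skew = moment_skewness(values)

# A skewness above 0 means the right tail is longer than the left.
print(f"mean={mean:.1f}, sd={sd:.1f}, skewness={skew:.2f}")
```

With these values the mean comes out at 38.8 and the population SD at 11.4, and the skewness is positive, which agrees with the shape of the histogram.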

2. ## Re: Normal distribution

Way to question the establishment!

Well, I know what you mean, the histogram is somewhat skewed. I ran normality tests on the data, and these data pass 3 of the standard tests, and the 4th was very close to passing. Ideally you want a pretty symmetric shape, but that is never really the case. You could have some fun and try to transform these data if you wanted. The issue, as you mentioned, can come from placing confidence intervals or interpreting dispersion measures, such as saying 68% of the data fall within +/- 1 standard deviation, 95% within +/- 2 SD, etc. These data are probably not egregious. You can also look at the QQ plot, which also shows departures from normality.
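One quick way to see how far the 68%/95% rule of thumb is off for these data is to count directly. A minimal sketch, assuming the population SD (dividing by n); the helper `share_within` is a hypothetical name, not from any library.

```python
import statistics

# The 20 values from the original question.
values = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
          26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mean = statistics.fmean(values)
sd = statistics.pstdev(values)  # population SD, an assumption


def share_within(xs, center, spread, k):
    """Fraction of xs lying within k * spread of center."""
    inside = [x for x in xs if abs(x - center) <= k * spread]
    return len(inside) / len(xs)


within_1sd = share_within(values, mean, sd, 1)
within_2sd = share_within(values, mean, sd, 2)

# Compare against the normal-theory figures of roughly 68% and 95%.
print(f"within 1 SD: {within_1sd:.0%}, within 2 SD: {within_2sd:.0%}")
```

Here 14 of the 20 values fall within 1 SD and 18 of 20 within 2 SD. With only 20 observations each value shifts the share by 5 percentage points, so small gaps from 68% and 95% do not say much on their own.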

3. ## Re: Normal distribution

I agree these data are not terribly skewed (although they have a kind of odd shape, which is uniform for much of the range).

But if they converted ALL the values to z scores, were they actually interested in the percentiles and such (68% etc.)? Or was the purpose to standardize the distribution to a mean of 0 and SD of 1? The standardization will always work if you convert a set of data to z scores. It does NOT make them any more normally distributed. But it sure will give them a mean of 0 and SD of 1, which can be useful for comparison to other datasets that may be in different units, say, even if neither is terribly normal.
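The point about standardization can be checked on the posted values. A small sketch under the same assumptions as before (population SD, moment-based skewness, hypothetical helper name `moment_skewness`): the z scores end up with mean 0 and SD 1, but the skewness is exactly what it was before.

```python
import statistics

# The 20 values from the original question.
values = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
          26, 37, 43, 62, 35, 38, 45, 32, 28, 34]


def moment_skewness(xs):
    """Moment skewness: third central moment divided by the SD cubed."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - mean) ** 3 for x in xs) / len(xs) / sd ** 3


mean = statistics.fmean(values)
sd = statistics.pstdev(values)  # population SD, an assumption

# Standardize: subtract the mean, divide by the SD.
z_scores = [(x - mean) / sd for x in values]

z_mean = statistics.fmean(z_scores)
z_sd = statistics.pstdev(z_scores)

print(f"z mean={z_mean:.3f}, z sd={z_sd:.3f}")
print(f"skewness before={moment_skewness(values):.3f}, "
      f"after={moment_skewness(z_scores):.3f}")
```

Standardizing is a linear rescaling, so it moves and stretches the histogram without changing its shape.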

OK I just clicked the link (should have done that first) and they actually ARE using data like these (but a different example) for individual decisions, like failing a student who gets below 1 SD from the mean. I don't use that rule and would be uncomfortable with it unless the data was fairly normal. The article doesn't say why the first dataset is being standardized -- they just do it.

4. ## Re: Normal distribution

Originally Posted by hlsmith
Way to question the establishment!

Well, I know what you mean, the histogram is somewhat skewed. I ran normality tests on the data, and these data pass 3 of the standard tests, and the 4th was very close to passing. Ideally you want a pretty symmetric shape, but that is never really the case. You could have some fun and try to transform these data if you wanted. The issue, as you mentioned, can come from placing confidence intervals or interpreting dispersion measures, such as saying 68% of the data fall within +/- 1 standard deviation, 95% within +/- 2 SD, etc. These data are probably not egregious. You can also look at the QQ plot, which also shows departures from normality.
For my information, can you tell me which tests you used to determine normality?

5. ## Re: Normal distribution

There are formal tests for skew. Plus, it would probably help to run a QQ plot.

None of the formal tests for normality are very good. They all have power issues. A QQ plot is the best I have found to assess this.
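For anyone who wants to build the QQ plot by hand, the points are just the sorted data paired with the matching standard normal quantiles. This is a rough sketch using only the Python standard library; the plotting positions (i - 0.5) / n are one common convention and an assumption here, and `normal_qq_points` and `pearson_r` are hypothetical helper names.

```python
import math
import statistics

# The 20 values from the original question.
values = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
          26, 37, 43, 62, 35, 38, 45, 32, 28, 34]


def normal_qq_points(xs):
    """Pair each sorted observation with a standard normal quantile."""
    n = len(xs)
    std_normal = statistics.NormalDist()  # mean 0, SD 1
    # Plotting positions (i - 0.5) / n: one common choice, assumed here.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(xs)))


def pearson_r(pairs):
    """Pearson correlation between the two coordinates of the pairs."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


qq_points = normal_qq_points(values)
r = pearson_r(qq_points)

# Plot these pairs as a scatter; normal data fall close to a straight line.
for q, x in qq_points:
    print(f"{q:6.2f}  {x}")
print(f"correlation of QQ points: {r:.3f}")
```

Points that bend away from a straight line at the upper end indicate a long right tail. The correlation is only a crude numeric summary of straightness; the plot itself is what to look at.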

6. ## Re: Normal distribution

Originally Posted by noetsi
There are formal tests for skew. Plus, it would probably help to run a QQ plot.

None of the formal tests for normality are very good. They all have power issues. A QQ plot is the best I have found to assess this.
Ah yes, the fun of the theoretical Gaussian distribution...

7. ## Re: Normal distribution

The fun of coming up with a test you get credit for that does not in fact work, but gets used all the time in both statistics and the workplace. Over the years, as a data analyst rather than a statistician, I have grown increasingly concerned about just how often well-known tests have serious flaws, flaws that are commonly known only to the statistical community (which does not include many who use statistics).

The Durbin-Watson test (in its most common form) is another example of this. It's used all the time and it has serious issues, which I suspect many who use it have never heard of.

8. ## Re: Normal distribution

This issue is compounded further by the fact that widely used statistical programs use generalized estimators due to computing efficiency. You sacrifice some measure of accuracy but save computing power.

But yeah, it's a well-known issue within the field of statistics: the limitations of quite a few of the current common modeling procedures.

The truth is, though, modern statistics has become fairly sophisticated in dealing with those issues. The concern is that there is a lag between what is going on in the world of academic statistics and what is going on in industry. In general, my estimate is that there is about a 20-year gap between the stats department of an R1 university and other departments (or even greater in some cases). This gap is even greater between academic statistics and widespread industry. My guess is probably about 50 years for the vast majority of companies using statistics.

The other issue is that, due to this gap, a lot of the underlying "why we do it" is lost on the way from academia to industry, where it becomes "how we do it".
