skewness issue



cmb
02-11-2006, 10:47 PM
I'm trying to find the skewness for this distribution using SPSS:

Bin      Freq
0-10     78
10-20    2589
20-30    922
30-40    11


So, all I know is how many values are in each bin (not what the values are themselves). Does that mean I have to type in the midpoint value of the 2nd bin 2589 times in order for SPSS to understand what I'm looking for? Anyone know of a shorter way? Thanks.

JohnM
02-11-2006, 11:14 PM
I don't think it can be done with "grouped" data in SPSS, but here's a way that will be pretty close (you could do this in Excel):

skewness = m3 / [ m2 * sqrt(m2) ]

where m2 and m3 are the second and third central "moments" of the distribution

m2 = summation of [ f * (x - mean)^2 ] / n

m3 = summation of [ f * (x - mean)^3 ] / n

the x's would be the bin mid-points (5, 15, 25, 35), f would be the frequency of each bin, the mean would be the mean of the grouped data (summation of x * p(x), where p(x) = f / n), and n would be the sum of the frequencies

for the mean, you should get 17.4 and n = 3600
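
If you would rather script this than build it in Excel, here is a minimal Python sketch of the same calculation, assuming each bin midpoint is weighted by its bin frequency. The function name grouped_skewness is just an illustrative choice, not something from SPSS or Excel.

```python
import math

def grouped_skewness(midpoints, freqs):
    """Moment-based skewness for binned data, using bin midpoints as the x values."""
    n = sum(freqs)  # total number of observations
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Second and third central moments, each midpoint weighted by its frequency
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    m3 = sum(f * (x - mean) ** 3 for x, f in zip(midpoints, freqs)) / n
    return n, mean, m3 / (m2 * math.sqrt(m2))

n, mean, skew = grouped_skewness([5, 15, 25, 35], [78, 2589, 922, 11])
print(n, round(mean, 1), round(skew, 2))  # 3600 17.4 0.69
```

Because every observation is replaced by its bin midpoint, the result is only an approximation of the skewness of the raw data.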

Hope this helps.

cmb
02-14-2006, 06:35 PM
Hi John, thanks for the reply. I tried that method and it worked well (I had never even heard of moments before). I have another distribution that has 12 bins. When I calculated the skewness for it using that method, I got a crazy large number that doesn't seem correct. For 12 bins, would I use the same method as you outlined, or do the 'moment' equations change?

JohnM
02-14-2006, 07:58 PM
It depends on how badly skewed the data is - I don't think the skewness index has an upper or lower limit.

Go ahead and post the 12-bin data set and I'll take a look. If you got a very high number then it should be obvious from looking at the frequency distribution.

Moments come from mechanics and physical masses or bodies - the first moment is associated with the center of gravity, which is the mean of a probability distribution. The second moment is associated with gyration, which is analogous to variation. The third moment is associated with skewness, and the fourth moment is associated with kurtosis.

cmb
02-14-2006, 08:24 PM
Ok, I'm going to try it again by hand to see what kind of answer I get. I might've screwed up with the negatives.

Here's the distribution:

Midpt.   Freq
2.75     216
2.25     178
1.75     383
1.25     542
0.75     682
0.25     646
-0.25    508
-0.75    226
-1.25    116
-1.75    39
-2.25    17
-2.75    47

JohnM
02-14-2006, 09:52 PM
You should get -16.79.

cmb
02-16-2006, 01:59 PM
Yep, that's what I got when I did it manually. However, when I typed in all the values into Excel and used the skew function, it gave a much much smaller negative number.

I've found this equation for skewness too that would be interesting to try and compare:

Skewness = M3 / M2^(3/2)

where Mn is the average of (x - {x})^n

and {x} is the expected number.

Would the 'expected number' be the same as the mean?

JohnM
02-16-2006, 02:03 PM
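Yes, the "expected number" is the mean. Note that M2^(3/2) is the same thing as M2 * sqrt(M2), so that equation is the same moment ratio as before, just written differently.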

When you say you typed in all the numbers - the individual data points or the bin frequencies?

cmb
02-16-2006, 02:56 PM
I typed in all the individual data points (i.e. 216 cells with "2.75" and 178 cells with "2.25", etc.). The skewness it reported was -0.26.

JohnM
02-16-2006, 03:30 PM
Well, actually what you did was put in the midpoints of the bins and "weight" them. Excel uses a completely different formula for skewness, so I'm not surprised by the difference.
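
One way to see where the gap between -16.79 and -0.26 comes from is to run the numbers both ways. The Python sketch below is only an illustration, and the names moment_skew, unweighted, weighted and adjusted are made up for it. It computes the moment ratio m3 / [ m2 * sqrt(m2) ] once with every midpoint counted a single time and once with every midpoint weighted by its bin frequency. It then applies the bias-adjusted sample formula n / ((n-1)(n-2)) * summation of ((x - mean)/s)^3, which is the formula Excel documents for its SKEW function.

```python
import math

midpoints = [2.75, 2.25, 1.75, 1.25, 0.75, 0.25,
             -0.25, -0.75, -1.25, -1.75, -2.25, -2.75]
freqs = [216, 178, 383, 542, 682, 646, 508, 226, 116, 39, 17, 47]

n = sum(freqs)  # 3600 observations in total
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

def moment_skew(weights):
    """m3 / (m2 * sqrt(m2)), with each midpoint counted `weights` times in the sums."""
    m2 = sum(w * (x - mean) ** 2 for x, w in zip(midpoints, weights)) / n
    m3 = sum(w * (x - mean) ** 3 for x, w in zip(midpoints, weights)) / n
    return m3 / (m2 * math.sqrt(m2))

# Each midpoint counted once (frequencies left out of the sums)
unweighted = moment_skew([1] * len(midpoints))

# Each midpoint counted once per observation in its bin
weighted = moment_skew(freqs)

# Bias-adjusted sample skewness on the frequency-weighted data
s = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1))
adjusted = n / ((n - 1) * (n - 2)) * sum(
    f * ((x - mean) / s) ** 3 for x, f in zip(midpoints, freqs)
)

print(round(unweighted, 2), round(weighted, 2), round(adjusted, 2))  # -16.79 -0.26 -0.26
```

With n = 3600 the adjusted formula and the frequency-weighted moment ratio agree to about three decimal places, so on this data the gap appears to come from leaving the frequencies out of the sums rather than from Excel's formula.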