skewness issue

cmb

New Member
#1
I'm trying to find the skewness for this distribution using SPSS:

Bin..........Freq
0-10........78
10-20......2589
20-30......922
30-40......11


So, all I know is how many values are in each bin (not what the values are themselves). Does that mean I have to type in the midpoint value of the 2nd bin 2589 times in order for SPSS to understand what I'm looking for? Anyone know of a shorter way? Thanks.
 

JohnM

TS Contributor
#2
I don't think it can be done with "grouped" data in SPSS, but here's a way that will be pretty close (you could do this in Excel):

skewness = m3 / [ m2 * sqrt(m2) ]

where m2 and m3 are "moments" of a distribution

m2 = summation of [ f * (x - mean)^2 ] / n

m3 = summation of [ f * (x - mean)^3 ] / n

the x's would be the bin mid-points (5, 15, 25, 35), the f's would be the bin frequencies, the mean would be the mean of the grouped data (summation of x * p(x), where p(x) = f / n), and n would be the sum of the frequencies

for the mean, you should get 17.4 and n = 3600
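
If Excel isn't handy, the same calculation can be sketched in a few lines of Python. This is only an illustration of the moment formula above applied to the four bins from the first post; the function name `grouped_skewness` is made up for this example, and the bin midpoints stand in for the unknown individual values.

```python
import math

# Bin midpoints and frequencies from the first post.
midpoints = [5, 15, 25, 35]
freqs = [78, 2589, 922, 11]

def grouped_skewness(xs, fs):
    """Return (n, mean, skewness) for binned data, using
    skewness = m3 / (m2 * sqrt(m2)) with frequency-weighted moments."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    # Second and third central moments, each bin weighted by its frequency.
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n
    m3 = sum(f * (x - mean) ** 3 for x, f in zip(xs, fs)) / n
    return n, mean, m3 / (m2 * math.sqrt(m2))

n, mean, skew = grouped_skewness(midpoints, freqs)
print(n, round(mean, 1), round(skew, 2))
```

This should reproduce n = 3600 and a mean of 17.4, with a skewness of roughly 0.69.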

Hope this helps.
 

cmb

New Member
#3
Hi John, thanks for the reply. I tried that method and it worked well (I had never even heard of moments before). I have another distribution that has 12 bins. When I calculated the skewness for it using that method, I got a crazy large number that doesn't seem correct. For 12 bins, would I use the same method as you outlined, or do the 'moment' equations change?
 

JohnM

TS Contributor
#4
It depends on how badly skewed the data is - I don't think the skewness index has an upper or lower limit.

Go ahead and post the 12-bin data set and I'll take a look. If you got a very high number then it should be obvious from looking at the frequency distribution.

Moments come from mechanics and physical masses or bodies - the first moment is associated with the center of gravity, which is the mean of a probability distribution. The second moment is associated with gyration, which is analogous to variation. The third moment is associated with skewness, and the fourth moment is associated with kurtosis.
 

cmb

New Member
#5
Ok, I'm going to try it again by hand to see what kind of answer I get. I might've screwed up with the negatives.

Here's the distribution:

Midpt. Freq
2.75.......216
2.25.......178
1.75.......383
1.25.......542
0.75.......682
0.25.......646
-0.25.....508
-0.75.....226
-1.25.....116
-1.75......39
-2.25......17
-2.75......47
 

cmb

New Member
#7
Yep, that's what I got when I did it manually. However, when I typed in all the values into Excel and used the skew function, it gave a much much smaller negative number.

I've found this equation for skewness too that would be interesting to try and compare:

Skewness = M3 / M2^(3/2)

where Mn = average of (x - {x})^n

where {x} is the expected number.

Would the 'expected number' be the same as the mean?
 

JohnM

TS Contributor
#8
Yes, the "expected number" is the mean.

When you say you typed in all the numbers - the individual data points or the bin frequencies?
 

cmb

New Member
#9
I typed in all the individual data points (i.e. 216 cells with "2.75" and 178 cells with "2.25", etc.) The skewness it reported was -0.26.
 

JohnM

TS Contributor
#10
Well, actually what you did was put in the midpoints of the bins and "weight" them by their frequencies. Excel's SKEW function uses a different (sample-adjusted) formula for skewness, so I wouldn't expect the two numbers to match exactly.
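
To get a feel for how far apart the two formulas are on the 12-bin data, here is a rough Python sketch. It assumes Excel's SKEW is the adjusted formula n / ((n-1)(n-2)) * sum(((x - mean) / s)^3), with s the sample standard deviation; the variable names are just for illustration.

```python
import math

# 12-bin distribution from post #5: midpoints and frequencies.
midpoints = [2.75, 2.25, 1.75, 1.25, 0.75, 0.25,
             -0.25, -0.75, -1.25, -1.75, -2.25, -2.75]
freqs = [216, 178, 383, 542, 682, 646, 508, 226, 116, 39, 17, 47]

n = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Frequency-weighted central moments (the "moment" method).
m2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
m3 = sum(f * (x - mean) ** 3 for x, f in zip(midpoints, freqs)) / n
moment_skew = m3 / (m2 * math.sqrt(m2))

# Sample-adjusted skewness (assumed to be what Excel's SKEW computes).
s = math.sqrt(m2 * n / (n - 1))  # sample standard deviation
adjusted_skew = n / ((n - 1) * (n - 2)) * sum(
    f * ((x - mean) / s) ** 3 for x, f in zip(midpoints, freqs)
)

print(round(moment_skew, 4), round(adjusted_skew, 4))
```

With n = 3600 the adjustment factor is very close to 1, so both values should come out near -0.26. A hand-calculated figure far from that may mean the bin frequencies weren't applied as weights in the sums.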