Chi-square test followed by testing with subgroups

#1
Just checking that I'm thinking about this right...

I have the number of units sold for 3 years by season:

Spring 3000
Summer 3500
Autumn 4000
Winter 5500

If I run a chi-square test (strictly a goodness-of-fit test against equal sales in every season, since there is only one row of counts) with:
alpha = 0.05, and expected value for each cell = 4000,
I would find that the number of units sold is not independent of season.
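For reference, the test described above can be sketched in plain Python. With a single row of counts and a fixed expected value per cell, this is the one-way (goodness-of-fit) form of the chi-square test. The helper name `chi2_sf_df3` is made up for this sketch and relies on a closed form that only holds for 3 degrees of freedom; `scipy.stats.chisquare` should give the same statistic if SciPy is available.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Hypothetical helper for this sketch; the closed form is valid for df = 3 only.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Units sold by season, taken from the post.
observed = {"Spring": 3000, "Summer": 3500, "Autumn": 4000, "Winter": 5500}

total = sum(observed.values())
expected = total / len(observed)  # 4000 per season if season had no effect

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((n - expected) ** 2 / expected for n in observed.values())
df = len(observed) - 1
p_value = chi2_sf_df3(chi2)

print(f"chi2 = {chi2:.1f}, df = {df}, p = {p_value:.3g}")
print("reject equal sales across seasons at alpha = 0.05:", p_value < 0.05)
```

The statistic works out to 875 on 3 degrees of freedom, far beyond the 5% critical value of about 7.815.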

Now I want to look at a specific location A where the units were sold. Which is the correct approach? Or does it depend on the question asked?

a) For this particular location, are the sales independent of season?
The same kind of one-way chi-square (goodness-of-fit) test, run on this subset of data. For example:

Spring 275
Summer 325
Autumn 425
Winter 575
(expected value for each cell = 400)

OR

b) Is this location's seasonal pattern of sales significantly different from that of the other locations?

Chi-square test with a contingency table:
Row1: location A, Spring 275, Summer 325, Autumn 425, Winter 575
Row2: location not(A), Spring 2725, Summer 3175, Autumn 3575, Winter 4925
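To make the difference between a) and b) concrete, here is a minimal sketch that runs both on the numbers above using only the standard library. The helper `chi2_sf_df3` is a made-up name and only works because both tests happen to have 3 degrees of freedom here; `scipy.stats.chi2_contingency` would be the more usual tool for b).

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability for chi-square with 3 df (hypothetical helper, df = 3 only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Counts from the post, in the order Spring, Summer, Autumn, Winter.
loc_a = [275, 325, 425, 575]
others = [2725, 3175, 3575, 4925]

# a) One-way test: are location A's sales spread evenly over the seasons?
exp_a = sum(loc_a) / len(loc_a)  # 400 per season
chi2_a = sum((n - exp_a) ** 2 / exp_a for n in loc_a)
p_a = chi2_sf_df3(chi2_a)

# b) 2 x 4 contingency test: does A's seasonal pattern differ from everyone else's?
table = [loc_a, others]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

chi2_b = 0.0
for i, row in enumerate(table):
    for j, n in enumerate(row):
        # Expected count if season and location were independent.
        expected = row_totals[i] * col_totals[j] / grand
        chi2_b += (n - expected) ** 2 / expected
df_b = (len(table) - 1) * (len(loc_a) - 1)
p_b = chi2_sf_df3(chi2_b)

print(f"a) chi2 = {chi2_a:.2f}, df = 3, p = {p_a:.3g}")
print(f"b) chi2 = {chi2_b:.2f}, df = {df_b}, p = {p_b:.3g}")
```

With these numbers a) gives 131.25, while b) gives about 7.30, just under the 5% critical value of about 7.815. So location A is clearly seasonal on its own, yet its pattern is not shown to differ from the other locations at alpha = 0.05. The two tests really do answer different questions.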

But then I'd want to test this for several locations, and so repeat a) or b) for each.

Am I missing something? I know that subtle things lurk in the land of stats.
 

rogojel

TS Contributor
#2
hi,
I would consider a two-way chi-squared test, e.g. create a table with the rows being the locations and the columns being the seasons. This way you can analyse both effects in one test and get an idea of possible interactions as well, such as locations doing differently in different seasons.

Regards
rogojel
 
#3
Thanks for your response. This answers part of my question:
The two-way test would show whether the locations have different seasonal dependencies. So the result could be significant even if one of the locations has no seasonal dependency on its own.

I guess what I am after is knowing:
1) which locations have seasonal dependency and which ones don't
2) for the ones that have seasonal dependency, which ones are similar to each other

There are many different locations: A, B, C, D, E, ... I'm reluctant to do repeated two-way tests because that would be like sampling too many times out of the same bin (the multiple-comparisons problem). Any suggestions?
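One common way to handle the repeated-testing worry for question 1) is a Bonferroni correction: run the one-way test per location, but compare each p-value to alpha divided by the number of locations. The sketch below is only an illustration of that idea. Location A uses the counts from the first post, while the counts for B and C are made up, and `chi2_sf_df3` is a hypothetical helper that is valid for 3 degrees of freedom only.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability for chi-square with 3 df (hypothetical helper, df = 3 only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Spring, Summer, Autumn, Winter counts per location.
sales = {
    "A": [275, 325, 425, 575],  # from the thread
    "B": [410, 390, 405, 395],  # MADE-UP counts, illustration only
    "C": [200, 260, 240, 300],  # MADE-UP counts, illustration only
}

alpha = 0.05
threshold = alpha / len(sales)  # Bonferroni: split alpha across the tests

seasonal = {}
for name, counts in sales.items():
    expected = sum(counts) / len(counts)  # equal sales in every season
    chi2 = sum((n - expected) ** 2 / expected for n in counts)
    p = chi2_sf_df3(chi2)
    seasonal[name] = p < threshold
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.3g}, seasonal = {seasonal[name]}")
```

Bonferroni is conservative; with many locations a step-down procedure such as Holm's keeps the same guarantee with somewhat more power.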
 
#4
Aha! I found my answer here: http://www.biostat.umn.edu/~dipankar/bmtry711.11/lecture_10.pdf
Lecture 10: Partitioning Chi Squares and Residual Analysis by Dipankar Bandyopadhyay

I'm quoting from one of the slides:
"Motivation for this:
• If you reject the H0 and conclude that X and Y are dependent, the next question could be ‘Are there individual comparisons more significant than others?’.
• Partitioning (or breaking a general I × J contingency table into smaller tables) may show the association is largely dependent on certain categories or groupings of categories."

He then states the rules for partitioning a contingency table and provides an example.
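As a rough illustration of partitioning, the sketch below splits the 2 × 4 table from the first post (location A vs. all other locations) into three 2 × 2 tables, each with 1 degree of freedom. The particular split (Spring vs Summer, then Spring+Summer vs Autumn, then the first three seasons vs Winter) is an assumption chosen for this example, not taken from the slides. It uses the likelihood-ratio statistic G² rather than Pearson's chi-square, because G² components from this kind of nested split add up exactly to the G² of the full table.

```python
import math

def g2(table):
    """Likelihood-ratio chi-square statistic G^2 for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:  # a zero cell contributes nothing to G^2
                expected = row_totals[i] * col_totals[j] / grand
                stat += 2 * n * math.log(n / expected)
    return stat

def collapse(table, k):
    """Build a 2-column table: columns 0..k-1 summed together vs. column k."""
    return [[sum(row[:k]), row[k]] for row in table]

# Rows: location A, all other locations. Columns: Spring, Summer, Autumn, Winter.
table = [
    [275, 325, 425, 575],
    [2725, 3175, 3575, 4925],
]

parts = {
    "Spring vs Summer": g2(collapse(table, 1)),
    "Spring+Summer vs Autumn": g2(collapse(table, 2)),
    "Spring+Summer+Autumn vs Winter": g2(collapse(table, 3)),
}
overall = g2(table)

for label, value in parts.items():
    print(f"{label}: G2 = {value:.3f} (1 df)")
print(f"sum of parts = {sum(parts.values()):.3f}, overall G2 = {overall:.3f} (3 df)")
```

Each component can be compared against a chi-square distribution with 1 degree of freedom to see which contrast carries most of the association.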