# Is data aggregation proper for chi-squared tests?

#### tgooberbutt

##### New Member
I'm looking into the Attraction or Asymmetrically Dominated Decoy Effect (ADE). To test the effect, most experiments compare a choice share between a control and a test condition. Participants are randomly assigned to either the control or test condition. Within each condition, each participant answers several questions: they choose products from, let's say, 5 choice sets (orange juice, cars, TVs, and so on), all very different products. The choice sets are always presented in the same order.

Let's say a study has 20 participants. 10 get assigned to the decoy condition and 10 to the control condition. Each of the control condition participants makes 5 choices, same with the decoy condition. The study then reports 10*5=50 observations in the control, and 50 in the decoy condition. The observations are then subjected to a Chi-squared analysis that has a total cell count of 100 (50 ctrl, 50 decoy).

Does this method of aggregation violate independence? If so, what are the negative repercussions of combining the data in this way?

There is psych research suggesting there may be individual cognitive differences in how people react to and choose in decoy scenarios, and there is certainly literature on how the decision-making process differs depending on product characteristics as well, so choosing orange juice may trigger different cognitive processes than choosing a car.

Do these individual and product differences affect independence of observations when aggregating data in the manner described above? Since participants were randomly assigned to either control or test conditions, does that 'independence' carry through and allow me to say that the analysis had independent observations? My inclination is that counting each participant as five separate observations (one for each choice they made) is wrong. By that logic, couldn't I just run an experiment with 2 participants, randomly assign one to the control condition and one to the test condition, and have each make 50 choices? By the same method, I could report n=100, no? But I don't know if this extreme reasoning is correct, or how to describe it in technical/statistical terms.
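To make the worry concrete, here is a rough simulation sketch in Python. Everything in it is an assumption for illustration: there is no true decoy effect, each participant has a personal tendency `p_i` to pick the target option (drawn uniformly around 0.5, with `spread` controlling how much individuals differ), and all choices are pooled into one 2x2 chi-squared test without continuity correction. The function names and parameter values are hypothetical, not taken from any study.

```python
import math
import random

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p_value

def false_positive_rate(n_per_group=10, n_choices=5, spread=0.45,
                        n_sims=4000, alpha=0.05, seed=1):
    """Share of simulated null experiments that the pooled test calls significant."""
    rng = random.Random(seed)
    rejections = 0
    total = n_per_group * n_choices
    for _ in range(n_sims):
        yes_counts = []
        for _group in range(2):  # control and decoy, both with NO true effect
            yes = 0
            for _person in range(n_per_group):
                # Assumed individual difference: each person has their own p_i
                p_i = 0.5 + rng.uniform(-spread, spread)
                yes += sum(rng.random() < p_i for _ in range(n_choices))
            yes_counts.append(yes)
        _, p = chi2_2x2(yes_counts[0], total - yes_counts[0],
                        yes_counts[1], total - yes_counts[1])
        rejections += p < alpha
    return rejections / n_sims

independent = false_positive_rate(spread=0.0)                    # everyone identical
clustered = false_positive_rate(spread=0.45)                     # 10 people x 5 choices
extreme = false_positive_rate(n_per_group=1, n_choices=50, spread=0.45)  # 1 person x 50

print(f"no individual differences: {independent:.3f}")
print(f"10 participants x 5 choices: {clustered:.3f}")
print(f"1 participant x 50 choices: {extreme:.3f}")
```

Under these assumptions the pooled test should stay close to the nominal 5% only when participants are identical; with individual differences it rejects a true null more often, and the 2-participant design is the worst case of the same problem.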

Thanks. I'm very stuck at this point.

#### Karabiner

##### TS Contributor
> But I don't know if this extreme reasoning is correct, or how to describe it in technical/statistical terms.

It isn't extreme; the reasoning is correct. One can analyse such data using a multilevel model (complicated), or some kind of repeated-measures analysis, or aggregate each individual's 5 responses in an appropriate way and use the aggregated measure as the dependent variable.

With kind regards

K.

#### tgooberbutt

##### New Member
Hi Karabiner,

Thank you for your time in looking at this. Can you point me in the right direction on what type of aggregation is "appropriate?" Perhaps some stat terms/streams of literature that I can look up?

The aggregation I see in the attraction effect literature just counts each of a participant's five choices as five separate observations in a chi-squared test. These studies were not designed as repeated-measures experiments, nor to detect within-participant differences across the five choices. Thanks in advance.

#### Karabiner

##### TS Contributor
> Thank you for your time in looking at this. Can you point me in the right direction on what type of aggregation is "appropriate?"

That depends on the nature of the measurements and on the research question.

E.g., if one has 5 yes/no responses from each participant, then it could be useful just to count the number of "yes" responses and use that count in the analysis. I am not sure what exactly your response variable is (or response variables are) here, but maybe you can aggregate the 5 responses in such a way that it suits your purposes.
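A minimal Python sketch of that idea, assuming each participant's five choices are coded as a string of `1` (yes) and `0` (no). The data are made up for illustration, and the permutation test on mean "yes" counts is just one possible way to compare the two conditions with the participant as the unit of analysis; a rank test or t-test on the counts would be alternatives.

```python
import random

# Hypothetical data: one string per participant, one character per choice set
control = ["10010", "01100", "00010", "11010", "00000",
           "10100", "01001", "00110", "10000", "01010"]
decoy = ["11010", "01110", "11111", "10110", "01011",
         "11100", "00110", "11011", "10101", "01111"]

def yes_counts(group):
    """Collapse each participant's responses into a single 'yes' count."""
    return [responses.count("1") for responses in group]

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in mean counts (y minus x)."""
    rng = random.Random(seed)
    observed = sum(y) / len(y) - sum(x) / len(x)
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign participants to conditions at random
        diff = sum(pooled[n_x:]) / len(y) - sum(pooled[:n_x]) / n_x
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    return observed, (hits + 1) / (n_perm + 1)

control_counts = yes_counts(control)
decoy_counts = yes_counts(decoy)
observed_diff, p_value = permutation_test(control_counts, decoy_counts)
print(f"mean difference in yes counts: {observed_diff:.2f}, p = {p_value:.4f}")
```

Here n is the number of participants (20), not the number of choices (100), which is the point of aggregating within person first.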

Alternatively, if the response sets differ from each other and all 5 sets are the same for all participants, it might be feasible to perform 5 analyses, but I am of course not sure about that because I am not familiar with this area of research.
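As a sketch of the "5 analyses" option in Python: within a single choice set each participant contributes exactly one response, so the observations in each 2x2 table are independent. The counts below are hypothetical, Fisher's exact test is used here as an assumption because 10 participants per condition gives small cells, and the Bonferroni correction for running five tests is likewise one choice among several.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

N_PER_GROUP = 10  # participants per condition, as in the example above

# Hypothetical counts of "yes" (target chosen): (control, decoy) per choice set
yes_by_set = {
    "orange juice": (3, 6),
    "cars": (4, 8),
    "TV": (5, 5),
    "set 4 (hypothetical)": (2, 7),
    "set 5 (hypothetical)": (6, 7),
}

adjusted = {}
for name, (ctrl_yes, decoy_yes) in yes_by_set.items():
    p = fisher_exact_two_sided(ctrl_yes, N_PER_GROUP - ctrl_yes,
                               decoy_yes, N_PER_GROUP - decoy_yes)
    adjusted[name] = min(1.0, p * len(yes_by_set))  # Bonferroni for 5 tests
    print(f"{name}: p = {p:.4f}, Bonferroni-adjusted p = {adjusted[name]:.4f}")
```

The price of this approach is low power per test, since each analysis only has 20 observations.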

With kind regards

K.

#### tgooberbutt

##### New Member
Thanks Karabiner - I think I might have found a direction...

The response sets are all binary, and I'm looking at the proportions of yes and no. The literature I'm working in simply adds up the yeses and nos across all responses from all participants. I did more digging, and it looks like this aggregation does indeed violate the iid assumption, because multiple measures (responses) from the same participant are inherently not independent. The advantage of aggregating is a larger 'n' and apparent power, but the downside is a higher risk of Type I errors on top of the iid violation. My conclusion is that the data need to be clustered by participant. It's not optimal, but the observations can still be aggregated, provided the chi-squared statistic is adjusted for the clustering.
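For what such an adjustment can look like, here is a simplified Python sketch, not necessarily the procedure used in the articles listed below. It assumes equal cluster sizes (m = 5 choices per participant), estimates the intraclass correlation with a one-way ANOVA estimator pooled over the two conditions, and divides the pooled Pearson chi-squared by the design effect 1 + (m - 1) * ICC. The per-participant "yes" counts are hypothetical.

```python
import math

M = 5  # choices per participant (cluster size)

# Hypothetical number of "yes" responses out of M for each participant
control_yes = [1, 0, 2, 5, 1, 3, 0, 2, 4, 1]
decoy_yes = [4, 5, 2, 5, 3, 1, 5, 4, 2, 5]

def pooled_chi2(groups, m):
    """Naive Pearson chi-squared treating every single choice as independent."""
    (a, b), (c, d) = [(sum(g), len(g) * m - sum(g)) for g in groups]
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def anova_icc(groups, m):
    """One-way ANOVA estimate of the intraclass correlation, truncated at 0."""
    k = sum(len(g) for g in groups)  # total number of participants
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        p_group = sum(g) / (len(g) * m)
        for y in g:
            ss_between += m * (y / m - p_group) ** 2
            ss_within += y * (m - y) / m  # binary within-person sum of squares
    ms_between = ss_between / (k - len(groups))
    ms_within = ss_within / (k * m - k)
    denom = ms_between + (m - 1) * ms_within
    if denom == 0:
        return 0.0
    return max(0.0, (ms_between - ms_within) / denom)

groups = [control_yes, decoy_yes]
naive = pooled_chi2(groups, M)
icc = anova_icc(groups, M)
design_effect = 1 + (M - 1) * icc
adjusted = naive / design_effect
p_adjusted = math.erfc(math.sqrt(adjusted / 2))  # chi-squared upper tail, 1 df

print(f"naive chi2 = {naive:.2f}, ICC = {icc:.3f}, design effect = {design_effect:.2f}")
print(f"adjusted chi2 = {adjusted:.2f}, p = {p_adjusted:.4f}")
```

When responses within a participant are uncorrelated the design effect is 1 and nothing changes; the more a participant's answers resemble each other, the more the statistic shrinks toward what the number of participants alone would support.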

I found the following articles on this subject useful:

Garson, G.I. & Moser, E.B. (1995). Aggregation and the Pearson chi-square statistic for homogeneous proportions and distributions in ecology. Ecology, 76(7), 2258-2269.
Donald, A. & Donner, A. (1987). Adjustments to the Mantel-Haenszel chi-square statistic and odds ratio variance estimators when the data are clustered. Statistics in Medicine, 6, 491-499.