Statistics for categorical data in linguistics use

Hi there! I'm really hoping that somebody is kind enough to help me sort this out.

I'm researching turn-taking in conversation. At a change of turn the speaker sometimes uses a name, sometimes a question, and sometimes (rarely!) both a name and a question, but most often there is no name or question to mark the changeover. There are four different conversations, each with between 3 and 5 participants.

The question is, are there significant differences in the frequencies of use of nothing, name, question and name & question when taking turns between the four groups?

The data looks like this

Discussion 1
Speaker pair     Total  Nothing  Name  Question
Person A to B       75       49    18        11
Person A to C       12        4     6         5
Person B to A       71       62     4         5
Person B to C       18       16     0         2
Person C to A       10       10     0         0
Person C to B        4        4     0         0

And so on for Discussions 2, 3 and 4.

From my understanding, Pearson's Chi square test would be the way to go. Does that make sense for this data and what I'm trying to do?

Any advice would be gratefully received!



Well-Known Member
Hi Sivodsi,

I'm not sure what the goal of the research is. What is the difference between the groups? :)

I also think the Chi-square test for goodness of fit is appropriate.
If I understand you correctly, you can run the Chi-square test for the following table:

Group Nothing Name Question
Thanks for responding, Obd,

The groups have different sets of people in them, with different communication styles. I want confirmation from this statistic that they are significantly different from each other (not that I mind if it finds they are all the same!).

Chi Square seems to be the way to go, but I have two questions:
1) A condition is that no cell may have an expected frequency lower than 5, but I have several cells where nobody contributes (in the above table, 'Person C to A' asked no questions).

2) I can't see how the data can be entered. The table I posted above is from one entire Discussion, and I have three more similar to it (but some with different numbers of people). From the example Chi Square analyses I've seen in textbooks and websites, there always seems to be less data.

For example, you suggest

Group Nothing Name question

but each group has data for 3, 4, or 5 people. Where do I put all the data?
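On the first question, the usual rule of thumb is about expected counts, not observed ones, so an observed zero is not by itself the problem. A minimal sketch, assuming the Discussion 1 table is tested at the speaker-pair level and that row totals are recomputed from the three category columns (they do not all match the Total column as posted):

```python
# Discussion 1 counts as posted: (Nothing, Name, Question) per speaker pair.
observed = {
    "A to B": [49, 18, 11],
    "A to C": [4, 6, 5],
    "B to A": [62, 4, 5],
    "B to C": [16, 0, 2],
    "C to A": [10, 0, 0],
    "C to B": [4, 0, 0],
}
categories = ["Nothing", "Name", "Question"]

# Assumption: totals are rebuilt from the three category columns.
row_totals = {pair: sum(counts) for pair, counts in observed.items()}
col_totals = [
    sum(counts[j] for counts in observed.values())
    for j in range(len(categories))
]
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    pair: [row_totals[pair] * col_totals[j] / grand_total
           for j in range(len(categories))]
    for pair in observed
}

# List every cell whose expected count falls below the rule-of-thumb value 5.
low_cells = [
    (pair, categories[j], round(expected[pair][j], 2))
    for pair in observed
    for j in range(len(categories))
    if expected[pair][j] < 5
]
for cell in low_cells:
    print(cell)
```

With this many sparse speaker pairs, a lot of cells fall under 5, which is one practical argument for pooling rows before testing.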

Really looking forward to any advice on this one!


Well-Known Member
Hi Sivodsi,

You wrote that you want to compare the groups on "Nothing, Name, Question, name & question".
So why do you want the data at the resolution of the individual person?

If your example is G1, can you collapse the data in your example into one row?

Group Nothing Name Question name & question
G1 190 145 28 0
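Collapsing a discussion into one row is just a column sum over the speaker pairs. A small sketch using the Discussion 1 figures as posted; the variable names are only illustrative:

```python
# Discussion 1 rows as posted: (Total, Nothing, Name, Question).
discussion_1 = [
    (75, 49, 18, 11),  # Person A to B
    (12, 4, 6, 5),     # Person A to C
    (71, 62, 4, 5),    # Person B to A
    (18, 16, 0, 2),    # Person B to C
    (10, 10, 0, 0),    # Person C to A
    (4, 4, 0, 0),      # Person C to B
]
columns = ["Total", "Nothing", "Name", "Question"]

# zip(*rows) transposes the table so each column can be summed.
g1_row = dict(zip(columns, (sum(col) for col in zip(*discussion_1))))
print(g1_row)
```

From these sums it appears that the figures 190, 145 and 28 in the G1 row correspond to Total, Nothing and Name, so the column labels are worth double-checking before running a test on the collapsed table.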
Hi Obd,
Ah! Interesting suggestion!

I guess the issue with it is that the data for each person represent interactions - where a person asks a question or uses a name or just finishes a turn indicates a different kind of turn-taking transaction. Two groups would have vastly different patterns of interaction if it went like this:

G1: PersonA------10----------0
G1: PersonB------ 0----------10

G1: PersonA------5----------5
G1: PersonB------5----------5

Adding each group together loses those distinctions between groups and runs the risk of boiling everything down to mud.
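The concern can be made concrete with the two patterns above. A minimal sketch, where the two unlabeled count columns are treated as two arbitrary categories (an assumption, since the post does not name them):

```python
# The two hypothetical interaction patterns from the post.
# Each tuple is (count in first column, count in second column).
pattern_1 = {"PersonA": (10, 0), "PersonB": (0, 10)}
pattern_2 = {"PersonA": (5, 5), "PersonB": (5, 5)}

def collapse(group):
    """Sum the per-person counts into a single row of group totals."""
    return tuple(sum(col) for col in zip(*group.values()))

print(collapse(pattern_1))
print(collapse(pattern_2))
print(collapse(pattern_1) == collapse(pattern_2))
```

Both patterns collapse to the same group totals, so any test run on the totals alone cannot tell them apart.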

However, I see that it could work and so will give it a shot when I have time later on - thank you so much for the idea.

In the meantime, if you can think of any alternatives that would allow group differences to be preserved, I would be very grateful to hear them!



Well-Known Member
Hi David,

If you build a three-dimensional table (group / person / interaction) you will miss the point of the experiment; it will just tell you that not all persons are equal ...

I think I understand what you want. I suggested comparing the totals, whereas you want something like comparing the variances,
or maybe comparing the maximum and/or the minimum of each group.
The maximum is probably better, as you don't want empty groups.

I will let you know if I think of a better idea.

Hi there,
Thank you for your help, it's much appreciated.

Okay, so after searching the internet and textbooks I've found that the way to run the chi-square test is to run it three times, comparing the counts that I got with their expected values in Excel.

So, I had four groups, each with three categories: the total number of 'nothing', the total number of 'names', and the total number of 'questions'. In the first two cases, the p figure was tiny, and for the questions it was p = 0.0402, so still significant at 0.05. The null hypothesis is rejected and we can state that the groups were significantly different from each other.

Does this sound alright? Or is there a more elegant way of doing it?
Sure, I made a table like this

where 220 is the expected value if they were all the same

I then used the Excel function =CHISQ.TEST(observed range, expected range), which returns the minuscule figure of 7.95483E-14.

I did the same procedure with the other categories, 'name' and 'question'.
I could not find an example of how to do this in SPSS.
For example, I was trying to follow this but did something wrong and the numbers came out weird.
I thought that maybe this was because my data would not suit. Alternatively, (and it seems, much more likely) I set it up wrong and pressed the wrong buttons.

I've attached a zipped .sav file so you can see how I set it up.



Well-Known Member
Hi David,

I don't use SPSS, I use R.
I'm sure you can do it in SPSS. It is actually the same idea as the regular chi-square goodness-of-fit test, the one you used, but since it is a 2D table
the df is (rows-1)*(columns-1).
I can check how to do it in R (but only on Tuesday) if you want, or there are also plenty of online calculators like
PS, did you check the sample size before doing the experiment?
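The 2D version of the test can be sketched in a few lines of plain Python. This is only an illustration of the mechanics: the function name is made up, and the example table uses two rows from Discussion 1 as posted (Person A to B and Person B to A) rather than the four-group table, which is not shown in the thread.

```python
def chi_square_independence(table):
    """Return (statistic, df) for Pearson's chi-square test on an r x c table.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Columns: Nothing, Name, Question.
example = [
    [49, 18, 11],  # Person A to B
    [62, 4, 5],    # Person B to A
]
stat, df = chi_square_independence(example)
print(round(stat, 2), df)
```

For the actual comparison, the table would have one row per discussion and one column per category, giving df = (4-1)*(3-1) = 6 for four groups and three categories.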
Hi obh,
Thanks, I used the website's calculator and got myself a p statistic - fantastic. I can now muck around with SPSS until I can duplicate the p value.

The sample size shouldn't be a problem, as the expected frequencies are all above 5 (one of the assumptions of chi-square) - I assume that's what you mean by checking the sample size?
The chi-square statistic is 32.1024 and the p-value is .000016
- it's a bit of a non-result because it was quite obvious that the nature of the turn-taking depends on the participants.
But it's nice to have some statistics to support us!
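As a sanity check, the reported p-value can be reproduced from the statistic by hand. A minimal sketch, assuming the table had four groups and three categories, so df = (4-1)*(3-1) = 6; the helper below is illustrative and uses the closed-form chi-square tail probability that holds only for even df:

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, valid for even df only.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


statistic = 32.1024            # value reported in the post
df = (4 - 1) * (3 - 1)         # assumption: 4 groups x 3 categories
p_value = chi2_sf_even_df(statistic, df)
print(df, p_value)
```

The result agrees with the reported .000016 to the precision given, which suggests the online calculator treated the data as a 4 x 3 contingency table.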

I can replicate the site's results with the chi-square formula in excel, but my one concern is that I am unable to replicate the results from the online Chi-square in SPSS. Oh well!

Thanks a lot for your help, I really appreciate it.