I work at a UK university. Each year, finalist students take part in the National Student Survey, responding to 30 questions about aspects of their experience on a five-point Likert scale ranging from "strongly agree" to "strongly disagree"; the convention is to report the percentage of respondents who answer "agree" or "strongly agree". We do not see the results by individual student, only by subject, and we can compare each subject's percentage agreeing with the corresponding sector figure.

We report the results by subject (there are approximately 30 subjects) for each of the 30 questions. I have been asked to indicate which year-on-year changes are statistically significant. My approach was simply to use chi-square tests – or more exactly Fisher's exact test – for each subject and each question, comparing the number agreeing in one year with the number agreeing in the year before; I group any response that is not "strongly agree" or "agree" as "not agree", giving a 2×2 table of year by agree/not-agree. I report a two-sided Fisher exact p-value of less than 0.05 as statistically significant. And that is my problem – I am told my approach is not sound because it does not address the multiple testing problem. I understand that at the 0.05 level each individual test has a one-in-20 chance of a false positive when there is no real change, and since I am performing a large number of tests – 30 for each of 30 questions – this alone is likely to produce many spurious "significant" results.
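For concreteness, one such per-subject, per-question comparison can be sketched as below (using `scipy.stats.fisher_exact`; the counts are made-up illustrative numbers, not real NSS data):

```python
# A single year-on-year comparison for one subject and one question:
# a two-sided Fisher exact test on a 2x2 table of year by response.
from scipy.stats import fisher_exact

# rows = years, columns = (agree, not agree); hypothetical counts
table = [[80, 20],   # last year: 80 of 100 respondents agreed
         [60, 35]]   # this year: 60 of 95 respondents agreed

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, two-sided p = {p_value:.4f}")
```

Looping this over every subject × question pair is what generates the pile of p-values that the multiple testing concern is about.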

What I need is advice on how to adjust appropriately for multiple testing. Applying Bonferroni to maintain an overall significance level across all the year-on-year comparisons – every subject crossed with every question – would mean dividing the 0.05 threshold by 900 (30 subjects × 30 questions), which seems to me a huge over-correction. Is there a legitimate approach to reducing the number of tests that need to be performed? One suggestion I received was to first perform an 'omnibus' chi-square/Fisher exact test for each subject, comparing all answers across all questions between the two years, before doing the subject-by-question tests. The advantage would be that I could then exclude any subject where the omnibus test indicates no statistically significant year-on-year difference, so that the Bonferroni divisor shrinks by 30 (one per question) for each subject excluded.
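The two ideas above might be sketched as follows. This is only an illustration under stated assumptions: the "omnibus" test here is one possible reading of the suggestion (pooling agree/not-agree counts across all questions into a single year-by-response 2×2 table for a subject), and all counts and the number of surviving subjects are hypothetical:

```python
# Sketch of (a) a pooled omnibus screen for one subject and
# (b) the Bonferroni threshold before and after screening.
from scipy.stats import chi2_contingency

alpha = 0.05
n_subjects, n_questions = 30, 30

# (a) Omnibus screen: agree / not-agree counts summed over all 30
# questions for one subject, compared between the two years.
pooled = [[2100, 900],    # last year: agree, not agree (hypothetical)
          [1950, 1050]]   # this year
chi2, p_omnibus, dof, expected = chi2_contingency(pooled)

# (b) Bonferroni: divide alpha by the number of per-question tests.
threshold_all = alpha / (n_subjects * n_questions)   # all 900 tests
n_kept = 12                                          # hypothetical number of subjects passing the screen
threshold_screened = alpha / (n_kept * n_questions)  # 360 tests

print(f"omnibus p = {p_omnibus:.4g}")
print(f"threshold, no screen: {threshold_all:.2e}")
print(f"threshold, after screen: {threshold_screened:.2e}")
```

Note the screened threshold is less severe, but whether a screen-then-test procedure controls the overall error rate as intended is exactly the kind of question I would like an answer to.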

My other question is: at what level should I seek to maintain an overall significance level? Could I maintain it at the question level only and state this in my report? After all, if my report were looking only at year-on-year changes for one particular question, that would seem legitimate.

I would be really grateful if someone could suggest an approach to adjusting for the multiple testing problem – the simplest approach is best for me, i.e., one based on performing chi-square/Fisher exact tests.

Thanks very much,

Steve