# Thread: Significance and multiple outcomes - are we guilty of a form of subgroup analysis?

1. ## Significance and multiple outcomes - are we guilty of a form of subgroup analysis?

In a piece of previous research we wanted to see if there was evidence that people's ability to perform CPR declined with time. To test this, we had a series of people perform CPR for 5 minutes each and, for each person, measured the following indicators of quality every minute:
1. Mean depth of chest compressions during the previous minute
2. Mean rate of compressions during the previous minute
3. Mean volume of air used in rescue breaths during the previous minute

We were interested in a decline in any of those areas, so a decline in any single area was interpreted as a "positive" result.

My problem is this:
• We calculated the required sample size and subsequent statistical significance using normal 95% confidence interval-style techniques
• These assume you are concerned about the reliability of a single test but we did 3
• We would have taken decline in any single measure as a "positive" result
...so, surely we were more likely to see a "positive" result by chance (i.e. when the null hypothesis is true) than our naive calculations led us to believe?

If this is the case, how do we counter this?

Any help much appreciated.

2. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

You may have issues with your sample size calculation, but it seems you have already performed the study, correct?

How, specifically, did you define a decline?

Seems like you are converting perfectly good continuous data into binary data.

You may be able to plot time against each measure and look at correlations. You would use Pearson's correlation (if the data were normally distributed) or Spearman's rank correlation (if not).
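A minimal sketch of that correlation approach, assuming SciPy (`scipy.stats.pearsonr` and `scipy.stats.spearmanr`). The data are simulated purely for illustration. Note that pooling every person's five readings treats them as independent, which they are not, so the p-values are only rough.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated data (NOT study data): 20 people, compression depth
# recorded at minutes 1-5, with a small built-in downward drift plus noise.
rng = np.random.default_rng(seed=1)
n_people, minutes = 20, np.arange(1, 6)
depth_mm = 55 - 1.5 * minutes + rng.normal(0, 4, size=(n_people, len(minutes)))

# Flatten to (time, measure) pairs; row-major order matches np.tile.
# Caveat: each person contributes five correlated values.
time = np.tile(minutes, n_people)
depth = depth_mm.ravel()

r, p_r = stats.pearsonr(time, depth)        # linear association, assumes normality
rho, p_rho = stats.spearmanr(time, depth)   # rank-based, no normality assumption

print(f"Pearson r = {r:.2f} (p = {p_r:.3g})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3g})")
```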

Side question: did they know how long they would have to perform CPR (so they may have paced themselves)?

3. ## The Following User Says Thank You to hlsmith For This Useful Post:

DrDogg (10-03-2012)

4. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

Hi hlsmith,

I think I have probably overcomplicated what I am asking by including the time dimension.

A simpler study with the same problem would be to have a series of people perform CPR for 5 minutes each and, for each person, measure the following indicators of quality at the beginning and at the end:
1. Mean depth of chest compressions - at the beginning and end
2. Mean rate of compressions - at the beginning and end
3. Mean volume of air used in rescue breaths - at the beginning and end

Assessing significance...
I want to ignore clinical significance for the moment because that's quite a complex topic on its own and doesn't really change the problem.

The analysis we would have performed involves:
1. Plotting a graph of rate (plus 95% CIs) at the start and the end
2. Using a t-test to calculate a p-value (or a chi-squared test for rate)
...and repeating this for depth and volume.

We would have said a statistically significant decline (at the 95% level) had occurred in any of the three if:
1. The confidence intervals didn't overlap
2. P < 0.05
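The procedure above runs three separate tests, each at α = 0.05. A sketch of that, assuming SciPy and a paired design (the same people are measured at the start and the end, so `scipy.stats.ttest_rel` is used rather than an independent-samples t-test); all numbers are simulated and hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated data (NOT study data): 30 people measured at the
# start and at the end of 5 minutes of CPR.
rng = np.random.default_rng(seed=2)
n = 30
start = {
    "depth_mm": rng.normal(55, 5, n),
    "rate_per_min": rng.normal(110, 8, n),
    "breath_volume_ml": rng.normal(600, 80, n),
}
# Assumed per-person changes: a clear drop in depth, little change otherwise.
end = {
    "depth_mm": start["depth_mm"] - rng.normal(5, 3, n),
    "rate_per_min": start["rate_per_min"] - rng.normal(1, 6, n),
    "breath_volume_ml": start["breath_volume_ml"] - rng.normal(0, 60, n),
}

alpha = 0.05  # applied separately to each of the three tests
p_values = {}
for outcome in start:
    # Paired t-test, because the same person is measured at start and end.
    t_stat, p = stats.ttest_rel(start[outcome], end[outcome])
    p_values[outcome] = p
    verdict = "significant" if p < alpha else "not significant"
    print(f"{outcome}: t = {t_stat:.2f}, p = {p:.3g} ({verdict})")
```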

My question is:
• Is this wrong?
It seems the same as the whole "performing multiple t-tests when we should be using ANOVA" thing (but obviously ANOVA solves a different problem)
• How do we fix it?
i.e. How do we calculate significance accounting for the fact that we did three t-tests, not just one? (And how would we do sample size calculations in the future?)

Finally...
Yes, they did know how long they would be doing CPR for. It's a good point, but unfortunately they were volunteers and the data were all collected late in the evening when they all wanted to go home, so we needed to tell them how long they'd be hanging around for.

5. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

I would plot the data for illustration, but I would use the Wilcoxon signed-rank test. There are several tests with similar names, so make sure you look up the right one. This test is fairly straightforward: you subtract your paired data (the initial and final measure for each person) and test whether the differences are centred on zero, i.e. no difference.
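A sketch of the signed-rank test with `scipy.stats.wilcoxon`, on simulated compression depths (hypothetical, not the study's data). Passing the per-person differences or the two paired arrays gives the same test.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated data (NOT study data): compression depth in mm for
# 15 people at the first and the last minute.
rng = np.random.default_rng(seed=3)
first = rng.normal(55, 5, 15)
last = first - rng.normal(3, 3, 15)

# Subtract each person's last value from their first value...
diff = first - last
# ...and test whether the differences are centred on zero.
res = stats.wilcoxon(diff)  # equivalently: stats.wilcoxon(first, last)
print(f"W = {res.statistic}, p = {res.pvalue:.3g}")
```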

The way you used the 95% CIs works to some extent, but it is crude. Checking whether two CIs overlap is not equivalent to a formal test: it gives a rough interpretation, but it does not always agree with the result of the corresponding statistical test, presumably because of how each is calculated. Maybe someone else can provide an example.
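One such example, as a sketch using hypothetical summary statistics for two independent groups (a simplification: the CPR data are paired, and separate CIs also ignore that pairing). The two 95% CIs overlap, yet a two-sample t-test gives p < 0.05, so "CIs overlap" does not imply "not significant"; the overlap rule is conservative.

```python
import math
from scipy import stats

# Hypothetical summary statistics (NOT study data): two independent groups
# of 30 with SD 5 whose means differ by 3.
n, sd = 30, 5.0
mean_a, mean_b = 50.0, 53.0

# 95% CI for each mean separately: mean +/- t(0.975, n-1) * SD / sqrt(n).
half_width = stats.t.ppf(0.975, n - 1) * sd / math.sqrt(n)
ci_a = (mean_a - half_width, mean_a + half_width)
ci_b = (mean_b - half_width, mean_b + half_width)
cis_overlap = ci_a[1] > ci_b[0]

# Two-sample (pooled-variance) t-test on the same summary statistics.
res = stats.ttest_ind_from_stats(mean_a, sd, n, mean_b, sd, n, equal_var=True)

print(f"CI A = ({ci_a[0]:.2f}, {ci_a[1]:.2f}), CI B = ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
print(f"CIs overlap: {cis_overlap}; t-test p = {res.pvalue:.3f}")
```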

How many subjects did you have?

6. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

Thanks for the suggestion. That's a useful test to know about.

That's not the issue I'm trying to get at though.

My issue is this:
• There are probably any number of relationships in our data that aren't real - they are just random variation.
• That's what we do significance tests for
• If there was no real change and we had only looked at one outcome and repeated the test 20 times, we would expect to see evidence of a decline (significant at the 95% level) about once just by chance
• We actually looked at three outcomes; if we repeated the test 20 times we would expect to see evidence of a decline (significant at the 95% level) about three times
Each outcome might generate a false positive once

In other words, our chance of seeing what looks like significant evidence of a decline that is actually just random variation could be as high as 15% (3 out of 20) rather than the 5% (1 out of 20) we hope for.
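The arithmetic behind these figures, as a short Python check. The 15% figure is an upper bound (the Bonferroni or union bound); if the three outcomes were independent, the exact familywise error rate would be 1 - 0.95^3, about 14.3%, and positively correlated outcomes would typically give somewhat less.

```python
# Familywise error rate (FWER): the chance of at least one false positive
# among m tests when every null hypothesis is true.
alpha, m = 0.05, 3

# Exact value if the m tests are independent.
fwer_independent = 1 - (1 - alpha) ** m
# Bonferroni (union) bound: holds whatever the dependence between tests.
fwer_upper_bound = min(1.0, m * alpha)

print(f"FWER if independent: {fwer_independent:.4f}")   # 0.1426
print(f"Upper bound: {fwer_upper_bound:.2f}")            # 0.15

# Expected number of false positives over 20 repeats of the whole study.
print(f"Expected false positives in 20 studies: {20 * m * alpha:.1f}")  # 3.0
```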

This doesn't seem very good to me.

My question was how do we get around this?
i.e. how do we calculate a big enough sample size to account for this?
How do we test significance accounting for this?

Hopefully that's clearer.
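Two standard corrections are Bonferroni (test each outcome at α/3) and Holm's step-down procedure, which gives the same familywise guarantee and always rejects at least as much as Bonferroni. For sample size, one simple approach is to plan each comparison at the Bonferroni-adjusted α. A sketch in plain Python, assuming a paired design and a normal approximation (which slightly underestimates the n a t-based calculation gives); the effect size of 0.5 and the p-values are hypothetical.

```python
import math
from statistics import NormalDist

def holm(p_values, alpha=0.05):
    """Holm step-down procedure: returns a reject flag for each p-value,
    keeping the familywise error rate at or below alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject

def paired_sample_size(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size for a two-sided paired comparison.
    effect_size = mean change / SD of the per-person changes (assumed known)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / effect_size) ** 2)

# Hypothetical p-values for depth, rate and breath volume (NOT study results).
p = [0.01, 0.04, 0.03]
print("Bonferroni rejects:", [pi <= 0.05 / len(p) for pi in p])
print("Holm rejects:      ", holm(p))

# Sample size for an assumed effect size of 0.5, unadjusted vs Bonferroni-adjusted.
print("n, one outcome: ", paired_sample_size(0.5))
print("n, alpha / 3:   ", paired_sample_size(0.5, alpha=0.05 / 3))
```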

7. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

So your question is: a decline in any of these three individual endpoints results in an overall binary response of "decline" for the person, so how can you correct for the chance that any one of the three endpoints declines, given that you are observing three measures?

I understand this question. However, I am still not sure what you want to report at the end and how you are going to do it. Knowing that may help tease out which significance level you need to correct.

8. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

Edit: For some reason my post got truncated. Fixed now.

Yes, that's exactly what I'm trying to figure out.

Let me better explain what the results might look like...

What comparisons might we make?
What constitutes effective CPR is contained in guidelines published by the European Resuscitation Council. The guidelines are a target and lots of research has been done to see how achievable those targets are by real rescuers.

In 2010, the guidelines changed:
1. Compressions should now be deeper than previously suggested
2. Compressions should be faster than previously suggested
3. Breaths haven't changed

So, we have reasonable grounds to think that people might well get tired more quickly. We can't do a direct head-to-head comparison because nowhere will agree to randomly assign people to the old or new guidelines.

What are we looking for?
The simplest example is rate of compressions. The sort of result we would look for is either:
• Rate is 110 bpm at the start and 105 bpm at the end
Even if it's a real decline, who cares? Interpretation: guidelines are manageable
• Rate is 110 bpm at the start but 55 bpm at the end
Even if the CPR is otherwise fine, such a low rate is unlikely to be effective. Interpretation: guidelines might be unachievable

A concluding sentence might look like:
"Our data shows that depth declines but rate and breaths don't. We therefore recommend that interventions to target improving depth performance specifically are implemented."

Does that help?

9. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

Originally Posted by DrDogg
"Our data shows that depth declines but rate and breaths don't. We therefore recommend that interventions to target improving depth performance specifically are implemented."
Nice response. However, I still think you are getting hung up on the composite endpoint. I would just report the overall endpoint as a count with a percentage (e.g., 56 (60%) had a decline in one of the three CPR components). I don't think you need any test related to this number. Then perform your comparisons of the two time points for each of the 3 measures: a Wilcoxon signed-rank test for compression depth and breaths, and perhaps also for number of compressions. Or you could do a one-sample t-test on the per-person decrease in the number of compressions, comparing the mean decrease against 5 (from the 110 to 105 example).
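A sketch of both suggestions on simulated data: the composite endpoint reported as a count with a percentage (no test), and a one-sample t-test (`scipy.stats.ttest_1samp`) of the per-person drop in rate against 5 per minute. The "decline" thresholds and all numbers are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical per-person declines (start minus end), simulated, NOT study data.
rng = np.random.default_rng(seed=4)
n = 93
rate_drop = rng.normal(6, 8, n)        # compressions per minute
depth_drop = rng.normal(2, 5, n)       # mm
volume_drop = rng.normal(0, 60, n)     # ml

# Composite endpoint: did the person decline on at least one component?
# The thresholds below are assumed for illustration only.
any_decline = (rate_drop > 5) | (depth_drop > 5) | (volume_drop > 100)
count = int(any_decline.sum())
print(f"{count} ({100 * count / n:.0f}%) declined on at least one component")

# One-sample t-test: does the mean drop in rate differ from 5 per minute?
res = stats.ttest_1samp(rate_drop, popmean=5)
print(f"mean rate drop = {rate_drop.mean():.1f}, t = {res.statistic:.2f}, p = {res.pvalue:.3g}")
```

For the one-sided question "is the drop bigger than 5?", recent SciPy versions accept `alternative="greater"` in `ttest_1samp`.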

10. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

As it so often turns out once you've figured out how to ask the right question, the answer is already half formed:

As soon as I Googled "Multiple Outcomes Analysis" a few useful resources surfaced.

In particular, this one is of relevance to health research:
Clinical Trials with Multiple Outcomes: A Statistical Perspective on their Design, Analysis, and Interpretation

Ah well, you live and learn I guess.

11. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

I only skimmed the document, but I am not sure it gets at your issue. My reasoning (which I keep coming back to) is that you only have one group, compared against its own earlier measurements, so the comparison is correlated. No intervention was implemented. The paper did bring up the limitation of using multiple markers for a single overall outcome (emphasizing that the markers may not all be equally important). I am still thinking about this scenario and will probably post again shortly.

12. ## Re: Significance and multiple outcomes - are we guilty of a form of subgroup analysis

There may be a straightforward procedure to tackle your problem (I can't think of one), but I would report the statistical tests that I proposed. You could also present descriptive data, stating counts and percentages of people who were ? standard deviations (Chebyshev inequality) below the minute-1 mean in their last minute for each of the three measures (done independently), then report the count and percentage of people who had at least one decline, etc.
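A sketch of that descriptive summary in NumPy. The number of standard deviations is left as "?" in the post, so `K = 2` here is purely an assumption, and the data are simulated. Chebyshev's inequality says that for any distribution at most 1/K² of values lie K or more SDs from the mean (in either direction), which gives a rough, distribution-free sense of how often such a drop could occur by chance.

```python
import numpy as np

K = 2.0  # assumed number of SDs; the post leaves this choice open

rng = np.random.default_rng(seed=5)
n = 40
# Hypothetical simulated (NOT study data) minute-1 and last-minute values.
minute1 = {
    "depth": rng.normal(55, 5, n),
    "rate": rng.normal(110, 8, n),
    "volume": rng.normal(600, 80, n),
}
last = {
    "depth": minute1["depth"] - rng.normal(5, 4, n),
    "rate": minute1["rate"] - rng.normal(2, 6, n),
    "volume": minute1["volume"] - rng.normal(0, 60, n),
}

flags = {}
for name in minute1:
    # Threshold: K sample SDs below the minute-1 mean, one measure at a time.
    threshold = minute1[name].mean() - K * minute1[name].std(ddof=1)
    flags[name] = last[name] < threshold
    c = int(flags[name].sum())
    print(f"{name}: {c} ({100 * c / n:.0f}%) more than {K:g} SD below the minute-1 mean")

any_flag = flags["depth"] | flags["rate"] | flags["volume"]
c = int(any_flag.sum())
print(f"at least one measure: {c} ({100 * c / n:.0f}%)")
print(f"Chebyshev bound per measure: {1 / K**2:.0%}")
```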

Please post what you decide to use, or if you find an alternative method.

On a problem with a comparable structure, I reported the individual events along with a one-sided Cochran-Armitage trend test, then the composite with no test, just a descriptive presentation of the count.
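For reference, a minimal plain-Python sketch of the Cochran-Armitage trend test using the standard asymptotic formula, applied to hypothetical counts of people below a target in each minute. It assumes independent groups; with the same people measured every minute that assumption is questionable, so treat the p-value as approximate.

```python
import math
from statistics import NormalDist

def cochran_armitage(successes, totals, scores=None):
    """Cochran-Armitage test for a linear trend in binomial proportions.

    successes[i] events out of totals[i] in ordered group i, with scores[i]
    (default 0, 1, 2, ...). Returns (z, one-sided p-value for an increasing trend).
    Assumes the groups are independent samples.
    """
    if scores is None:
        scores = list(range(len(totals)))
    N = sum(totals)
    p_bar = sum(successes) / N
    # Score-weighted excess of observed over expected events.
    T = sum(t * (r - n * p_bar) for t, r, n in zip(scores, successes, totals))
    s1 = sum(n * t for t, n in zip(scores, totals))
    s2 = sum(n * t * t for t, n in zip(scores, totals))
    var_T = p_bar * (1 - p_bar) * (s2 - s1 ** 2 / N)
    z = T / math.sqrt(var_T)
    return z, 1 - NormalDist().cdf(z)

# Hypothetical counts (NOT study data): people below a target in minutes 1-5,
# out of 40 per minute.
below_target = [4, 6, 9, 12, 15]
z, p_one_sided = cochran_armitage(below_target, [40] * 5)
print(f"z = {z:.2f}, one-sided p = {p_one_sided:.2g}")
```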
