Wilcoxon test.

#1
Good afternoon. Firstly, I am not a statistician, so apologies if what I am asking isn't clear.
The RALES study, "The Effect of Spironolactone on Morbidity and Mortality in Patients with Severe Heart Failure", published in 1999, is highly regarded and has guided the management of heart failure since its publication.

The journal can be found Here if you're interested.

I take no issue with the study, but can't quite get my head around the Wilcoxon test used below:

"Three categories were used to assess changes in the symptoms of heart failure: improvement, no change, and worsening or death. The condition of patients who were in NYHA class III at base line was considered to have improved if they were in NYHA class I or II at the end of the study and considered to have worsened if they were in NYHA class IV (or had died). The condition of patients who were in NYHA class IV at base line was considered to have improved if they were in NYHA class I, II, or III at the end of the study; other patients in NYHA class IV at base line either had no change at the end of the study or died. In the placebo group, the condition of 33 percent of the patients improved; it did not change in 18 percent, and it worsened in 48 percent. In the spironolactone group, the condition of 41 percent of the patients improved; it did not change in 21 percent, and it worsened in 38 percent. The difference between groups was significant (P<0.001 by the Wilcoxon test)"

Is the Wilcoxon test the most appropriate test for these data, given that it was used to compare the two groups as described above?
Is there a way to quantify the robustness of the result? And is there anything else I should consider when drawing conclusions from this paragraph?


Thanks, Dan.
 

fed2

Active Member
#2
I think it is basically an OK analysis. It's not the only possible analysis. I don't like the term 'appropriate'; it has started to irk me after a while in stats, since 'appropriateness' is not really a well-defined thing. It makes it sound like we are selecting the correct fork for dessert at a fancy restaurant or something.

I think the main critique is the use of change scores in the first place. I have seen it argued that a better analysis would be ordinal logistic regression with baseline status as a covariate. If they give a shift table for the NYHA class you could try it yourself. I'd expect a high level of agreement between this and the Wilcoxon test, although I'm not sure of the exact relationship between the two tests. Nonetheless, change scores are widely used in this type of study.
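
For anyone who wants to see what the Wilcoxon (rank-sum) test does with three ordered categories, here is a minimal sketch in plain Python. The counts are HYPOTHETICAL: they are the percentages from the quoted paragraph applied to an assumed 800 patients per arm, not the paper's actual denominators, and the function name rank_sum_test is my own. It uses midranks for the tied categories and the normal approximation with a tie correction, so treat it as an illustration rather than a reproduction of the published analysis.

```python
import math

# HYPOTHETICAL counts: quoted percentages applied to an assumed 800 per arm.
# Category order (worst to best): worsened or died, no change, improved.
placebo = [384, 144, 264]  # 48%, 18%, 33% (quoted figures sum to 99%)
spiro = [304, 168, 328]    # 38%, 21%, 41%


def rank_sum_test(counts_a, counts_b):
    """Wilcoxon rank-sum test for two groups on the same ordered categories.

    Returns (z, two_sided_p) from the normal approximation with tie correction.
    A negative z means group A tends to fall in lower (worse) categories.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    rank_sum_a = 0.0
    tie_term = 0.0
    below = 0  # number of observations in all lower categories
    for a, b in zip(counts_a, counts_b):
        t = a + b  # everyone in this category is tied
        if t == 0:
            continue
        midrank = below + (t + 1) / 2  # average rank shared by the tied group
        rank_sum_a += a * midrank
        tie_term += t**3 - t
        below += t
    mean = n_a * (n + 1) / 2
    var = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_a - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


z, p = rank_sum_test(placebo, spiro)
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

Note that this only gives a test statistic and a p-value. It does not adjust for baseline NYHA class the way an ordinal logistic regression with a baseline covariate would.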
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
To piggyback on @fed2: the Wilcoxon test typically only kicks out a p-value, which is pretty much meh. It tells you that the proportions differ between the groups, but neglects to give you an effect estimate with a precision bound. In 1999 this would have been the status quo, but nowadays, if randomization was successful, ordinal logistic regression or a simple comparison of rates with a correction for the three outcome categories would be more informative.
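
As a rough illustration of the "simple comparison of rates" idea, here is a sketch that reports the difference in proportions for each of the three categories with a Wald confidence interval, Bonferroni-adjusted for three comparisons. The counts are HYPOTHETICAL (the quoted percentages applied to an assumed 800 patients per arm), and the helper name risk_difference_ci is mine, so the numbers it prints are not the study's results.

```python
import math
from statistics import NormalDist

# HYPOTHETICAL counts: quoted percentages applied to an assumed 800 per arm.
categories = ["worsened or died", "no change", "improved"]
placebo = [384, 144, 264]  # 48%, 18%, 33% (quoted figures sum to 99%)
spiro = [304, 168, 328]    # 38%, 21%, 41%


def risk_difference_ci(x_a, n_a, x_b, n_b, alpha=0.05, n_comparisons=1):
    """Difference in proportions (B minus A) with a Wald confidence interval.

    The interval is widened by a Bonferroni correction for n_comparisons.
    Returns (difference, lower, upper).
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * n_comparisons))
    return diff, diff - z_crit * se, diff + z_crit * se


n_placebo, n_spiro = sum(placebo), sum(spiro)
results = {}
for name, x_p, x_s in zip(categories, placebo, spiro):
    # Spironolactone minus placebo, adjusted for the three categories compared.
    results[name] = risk_difference_ci(
        x_p, n_placebo, x_s, n_spiro, n_comparisons=len(categories)
    )

for name, (diff, lo, hi) in results.items():
    print(f"{name}: difference = {diff:+.3f} (adjusted CI {lo:+.3f} to {hi:+.3f})")
```

The point is just that each comparison comes with an effect size and an interval, rather than a single p-value.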
 

fed2

Active Member
#4
Concerning the 'robustness' of the result, I think the main support is that all-cause mortality showed a clear effect. If NYHA class weren't different between the groups, it would sort of imply that NYHA class is not associated with mortality, which would not make it a good score, it seems to me.