Testing for significance: Disease Prevalence vs Disease Severity

Hi Folks,

This is my first post and, lo and behold, it's a help request. I am a relative stats beginner and I have been searching the net for clues to help me determine the correct statistical test to use. I have been unable to work this out; perhaps I am just not used to the language. I was hoping someone may be able to give me some pointers.

I am investigating a disease condition at five geographical locations. As part of this, I have recorded the prevalence and the proportion of each severity stage of the condition (stage 0-5) for each site. My observation is that severity appears to increase where prevalence is higher. My null hypothesis is that there is no association between location prevalence and severity stage.

I was hoping someone could point me in the right direction concerning the correct test to undertake. Based on some previous work I have done, I was thinking something like an odds ratio but, as I said, I am a rookie :)

Could any of you kind bunch offer some pointers?

Many thanks


Hmmmm, I am guessing it's a comparison :confused:

Essentially, I have five locations ranging from relatively low to high prevalence. A 100% stacked bar chart appears to show that as prevalence increases, the proportion of the higher severity stages also increases. I wanted to see if this is statistically significant.

Does that help? Or am I truly dumb! :(


Omega Contributor
So to make sure I have this correct: for five locations you have the prevalence of the condition (e.g., 25%), and you also have the breakdown of disease severity for those people (i.e., the breakdown of severity among the 25% with the disease).

I will be honest, a simple solution to this question is not jumping out for me. My issue is that those people not included in the prevalence do not have a severity score. Not sure how to incorporate that information.

A very basic thing you could do is conduct the Kruskal-Wallis test with the locations as the groups and severity as the dependent variable. Then, if the test was significant, you would perform pairwise Wilcoxon rank-sum tests to determine which locations were different (controlling for multiplicity). Then you could crudely look at the prevalences for those locations and perhaps speculate about a relationship.
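
To make that procedure concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the helper names (`average_ranks`, `kruskal_h`, `permutation_p`), the three sites, and the severity scores are made up and are not data from this thread. It swaps in a Monte Carlo permutation p-value instead of the usual chi-square approximation; in practice a library routine such as `scipy.stats.kruskal` would be the more common choice.

```python
import random
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def permutation_p(groups, n_perm=2000, seed=0):
    """Return (H, Monte Carlo p-value) by shuffling scores across groups."""
    rng = random.Random(seed)
    observed = kruskal_h(groups)
    pooled = list(chain.from_iterable(groups))
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for s in sizes:
            shuffled.append(pooled[start:start + s])
            start += s
        if kruskal_h(shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Made-up severity scores (1-5) for affected individuals at three
# hypothetical sites; these are NOT the poster's data.
sites = {
    "A": [1, 1, 2, 1, 2, 1, 3, 2],
    "B": [2, 3, 2, 1, 3, 2, 4, 3],
    "C": [3, 4, 5, 3, 4, 2, 5, 4],
}

h, p = permutation_p(list(sites.values()))
print("Kruskal-Wallis H:", round(h, 3), "permutation p:", round(p, 4))

# Pairwise follow-up with a Bonferroni adjustment. With two groups the
# H statistic is equivalent to a two-sided rank-sum test.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
for a, b in pairs:
    _, p_pair = permutation_p([sites[a], sites[b]])
    print(a, "vs", b, "adjusted p:", round(min(1.0, p_pair * len(pairs)), 4))
```

Bonferroni is only one way to control for multiplicity and is on the conservative side; it is used here because it is the simplest to show.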

However, there is probably a better way than this.
Yes, you are correct.

I agree with your comments concerning unaffected individuals. After all, if you don't have the disease, you don't have a corresponding severity score.

My colleague believed two tests were probably the way forward too. I will crunch some figures :)

Many thanks for your input. I REALLY appreciate it.



TS Contributor
You have n=5 centres and for each centre you have
the median of the severity (median of the affected
individuals in each centre, that is) and the prevalence
rate. So you could do a Spearman rank correlation
between median severity and prevalence. The sample
size is a bit small, though.
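
As a rough illustration of that suggestion, the sketch below computes Spearman's rho by hand and gets an exact two-sided p-value by enumerating every permutation, which is feasible at n=5. The function names and the five prevalence/median-severity pairs are invented for the example and are not the poster's data; `scipy.stats.spearmanr` would be the usual library route.

```python
from itertools import permutations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / (sxx * syy) ** 0.5


def spearman_exact(x, y):
    """Return (rho, exact two-sided permutation p-value); for small n only."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = pearson(rx, ry)
    extreme = total = 0
    for perm in permutations(ry):
        total += 1
        if abs(pearson(rx, perm)) >= abs(rho) - 1e-12:
            extreme += 1
    return rho, extreme / total


# Invented values for five hypothetical centres (NOT real data).
prevalence = [0.10, 0.18, 0.26, 0.34, 0.46]
median_severity = [1, 2, 2, 3, 4]

rho, p = spearman_exact(prevalence, median_severity)
print("Spearman rho:", round(rho, 3), "exact p:", round(p, 4))
```

The small sample shows up directly in the exact test: with n=5 there are only 120 permutations, so even a perfectly monotone relationship cannot give a two-sided p-value below 2/120 (about 0.017).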

With kind regards

Hi Karabiner. Sorry, I had to park this work for a bit as we investigate additional aspects. Many thanks for your input.
I am not sure if I mentioned this, but each site had 50 individuals investigated for the disease of interest (BTW it was 6 sites, not 5; my bad).
So we have 300 individuals with (a) a presence/absence score and (b) a qualitative severity score where 0 = absent and 5 = severe. However, because '1' and '0' (i.e. present/absent) are not exactly informative, we have to use prevalence, which reduces n from 300 to 6.
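
For anyone following along, the reduction from individual records to one row per site could look something like the sketch below. The record layout (site label, severity score 0-5), the `site_summary` helper, and the handful of records are all assumptions made up for illustration; the real data would have 50 individuals at each of the 6 sites.

```python
from collections import defaultdict
from statistics import median


def site_summary(records):
    """Collapse (site, severity) records into per-site prevalence and
    median severity among affected individuals (severity > 0)."""
    by_site = defaultdict(list)
    for site, score in records:
        by_site[site].append(score)
    summary = {}
    for site, scores in by_site.items():
        affected = [s for s in scores if s > 0]
        summary[site] = {
            "n": len(scores),
            "prevalence": len(affected) / len(scores),
            # No median is defined if nobody at the site is affected.
            "median_severity": median(affected) if affected else None,
        }
    return summary


# Tiny made-up example: severity 0 means the condition is absent.
records = [
    ("Site1", 0), ("Site1", 0), ("Site1", 1), ("Site1", 2),
    ("Site2", 0), ("Site2", 3), ("Site2", 4), ("Site2", 5),
]

for site, row in site_summary(records).items():
    print(site, row)
```

Each site ends up as a single (prevalence, median severity) pair, which is why the site-level analysis has n equal to the number of sites and not the number of individuals.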