Choosing the correct post-hoc test for my data in SPSS

#1
Hey everyone!

I'm currently working on an observational study looking at differences in heart rate and blood pressure variability in heart failure patients. It's a very large study with 500+ patients, and we are comparing the heart rate and blood pressure between the sexes, between ethnicities, between diabetics and non-diabetics, smokers and non-smokers etc. The groups are of unequal sizes. I've conducted normality tests for the heart rate and blood pressure data using Shapiro-Wilk and found that my data was non-parametric. As such I've done Kruskal-Wallis and Mann-Whitney U tests on the data, and would now like to do post-hoc testing on the results I have. I've spent the past week trying to get my head around post-hoc, but there seems to be about a million different ways to go with this. My impression is that Bonferroni corrections are overly conservative and not the optimal type of post-hoc I should use. I would be extremely grateful if someone could point me in the right direction about what type of post-hoc is most suitable for my data (ideally available on SPSS, which is the program I'm using).

Many thanks :)
 
#2
I've conducted normality tests for the heart rate and blood pressure data using Shapiro-Wilk and found that my data was non-parametric.
I must humbly say that I have never heard of any non-parametric data, only of non-parametric methods. I guess that the data were non-normally distributed.

But as has been said on this site a million times, it is not the dependent variable itself, but the dependent variable conditional on the independent variables, i.e. the residuals, that needs to be normally distributed in analysis of variance (ANOVA).
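
To illustrate the point, here is a minimal sketch in Python (not SPSS) on simulated numbers, not the study's data. It assumes two hypothetical groups with normally distributed errors and different mean heart rates. The pooled values are clearly non-normal (bimodal), while the residuals around the group means are not. Excess kurtosis is used only as a crude stand-in for a formal test such as Shapiro-Wilk.

```python
import random
from statistics import fmean

def excess_kurtosis(xs):
    """Population excess kurtosis: close to 0 for normally distributed data."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m4 = fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0

random.seed(1)
group_means = {"A": 60.0, "B": 90.0}   # hypothetical group means (beats/min)
n_per_group = 250                       # hypothetical group size

# Each observation = group mean + normal error (SD 5, also hypothetical)
data = [(g, random.gauss(mu, 5.0))
        for g, mu in group_means.items()
        for _ in range(n_per_group)]

raw = [y for _, y in data]
fitted = {g: fmean([y for gg, y in data if gg == g]) for g in group_means}
residuals = [y - fitted[g] for g, y in data]

kurt_raw = excess_kurtosis(raw)          # strongly negative: two separate humps
kurt_resid = excess_kurtosis(residuals)  # near zero: consistent with normality
print(f"raw data:  {kurt_raw:.2f}")
print(f"residuals: {kurt_resid:.2f}")
```

A normality test on the raw column would reject here even though the model assumptions hold exactly, which is why the check belongs on the residuals.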

Or else the dependent variable conditional on the independent variables needs to have a known distribution, e.g. the gamma distribution in generalized linear models (GLM).

It's a very large study with 500+ patients, and we are comparing the heart rate and blood pressure between the sexes, between ethnicities, between diabetics and non-diabetics, smokers and non-smokers etc.
There seem to be four factors (“the sexes” and so on). [Although I always find it strange when someone involves “ethnicities”.]

I would simply run these three (or four) factors in an ANOVA, try a transformation if the residuals are non-normal, and go to a GLM if that does not work. The factorial structure would give parameter estimates directly for the three factors (or four, if you include that strange race factor that can’t be clearly “measured”). The parameter estimates would be approximately normally distributed by the central limit theorem, since there are so many observations.
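
A rough sketch of what such a main-effects model estimates, written in plain Python rather than SPSS. Everything in it is hypothetical: the simulated data, the effect sizes, the factor names and the helper functions `solve` and `fit_main_effects`. It fits ordinary least squares with 0/1-coded factors and no interactions, which is one simple way to get one estimate per factor.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_main_effects(X, y):
    """Least squares via the normal equations; each row of X starts with 1."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

random.seed(0)
names = ["intercept", "sex", "ethnicity", "diabetes", "smoker"]
true_beta = [70.0, 3.0, -2.0, 5.0, 4.0]   # hypothetical effects, not study results

X, y = [], []
for _ in range(500):                       # roughly the study size
    row = [1.0] + [float(random.randint(0, 1)) for _ in range(4)]
    X.append(row)
    y.append(sum(b * v for b, v in zip(true_beta, row)) + random.gauss(0.0, 8.0))

beta_hat = fit_main_effects(X, y)
# These residuals are what a normality check should be run on
residuals = [yi - sum(b * v for b, v in zip(beta_hat, row)) for row, yi in zip(X, y)]

for name, b in zip(names, beta_hat):
    print(f"{name:10s} {b:7.2f}")
```

Each of the four slope estimates uses all 500 observations, which is the advantage over splitting the sample into small cells.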


On the multiple inference problem: isn’t it easier to look at four estimated parameters based on 500 observations than at 16 combinations (4 factors, each with 2 levels, give 2^4 = 16), each based on about 30 observations (500/16)? These 16 combinations give 16*15/2 = 120 pairs to compare. So the nominal critical p-value would be quite low (0.05/120 ≈ 0.0004) with a Bonferroni correction (or at the first step of the Bonferroni-Holm method). The non-parametric methods seem quite hopeless to me, in my humble opinion.
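
The counting argument can be checked in a few lines of Python. The family-wise level of 0.05 is an assumption (it is what reproduces the 0.0004 figure), and 500 stands in for the "500+" patients.

```python
from math import comb

n_factors = 4            # sex, ethnicity, diabetes, smoking
levels_per_factor = 2
n_patients = 500         # "500+" in the original post
alpha = 0.05             # assumed family-wise error rate

n_cells = levels_per_factor ** n_factors     # factor combinations
n_pairs = comb(n_cells, 2)                   # pairwise comparisons between cells
per_cell = n_patients / n_cells              # average observations per cell
bonferroni_threshold = alpha / n_pairs       # per-comparison critical p-value

print(n_cells, n_pairs, per_cell, round(bonferroni_threshold, 6))
# → 16 120 31.25 0.000417
```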

No, I would go for the parametric multifactor method (and then use the Bonferroni-Holm method on the four estimated main-effect parameters).
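
With only four p-values, the Holm step-down adjustment is short enough to do outside SPSS. A minimal sketch in Python, where `holm_adjust` is a hypothetical helper name and the p-values are made up for illustration:

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p never falls below an earlier one
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Made-up p-values for four main effects (not results from the study)
raw = [0.012, 0.020, 0.004, 0.200]
holm = holm_adjust(raw)
bonferroni = [min(1.0, len(raw) * p) for p in raw]

print([round(p, 3) for p in holm])        # → [0.036, 0.04, 0.016, 0.2]
print([round(p, 3) for p in bonferroni])  # → [0.048, 0.08, 0.016, 0.8]
```

At a 0.05 level, Holm keeps three of these four effects while plain Bonferroni keeps two, which shows why Holm is never more conservative than Bonferroni.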
 
#3
Thanks for getting back to me. I'm afraid working with statistics is completely new to me, so I apologise for any confusion "non-parametric data" may have caused. [And looking into the effects of ethnicity is not "strange" but actually very interesting, because in a way we're extrapolating the effects of various gene pools, not making political statements. I suppose you're not a biologist.]

I read about the Holm sequential Bonferroni procedure, though I'm afraid it doesn't exist in SPSS (and with the volume of data I have, I don't think my supervisor would expect me to do it manually).
 
#4
I understand that the factor “the sexes” takes the values “female” and “male”, and I understand the factor with levels “diabetics” and “non-diabetics”.

But, just out of curiosity, what levels does the factor “ethnicities” take? And how is it measured?

I do apologize for my lack of understanding of biology.

(I hope you can use my suggestion about linear models.)
 
#5
Thanks for the advice - I'm gonna dig my head into GLM in the next few days and see what I come out with, I'll keep you posted on how I get along!

For ethnicities we're mainly comparing a Northern European (=0) with a South Asian population (=1), so it's measured much like the other data.