My research paper was recently rejected, and some of the feedback concerned the statistical tests I did or did not run. I would like help clarifying what I could do differently, as the feedback was not very informative.

I am trying to see which baseline characteristics (my independent variables) can predict who will improve the most in my dependent variable after an intervention. As the work is not published yet I won’t give too many details, but a similar example would be trying to decide whether any baseline characteristics in humans (such as muscle mass, age, gender, alcohol use, pulse rate etc.) can predict improvement in 100 m foot race times after a strength exercise program. I have a cohort of about 100 individuals, all undergoing the same intervention.

To test this I collected data on all my baseline values and measured participants’ 100 m times before and after the strength exercise program. I then built a multiple regression model in which I included previously known confounders and my baseline characteristics of interest, and used backward stepwise removal of non-significant regressors to end up with a model of 3 independent variables significantly associated with improvement in 100 m race times. For the sake of argument, let’s make up the following: gender, thigh muscle mass and smoking status (yes/no).

I was asked about/critiqued on the following (again, the examples are made up):

1. Type of sport shoe is a well-known determinant of 100 m race times. Was improvement in race times still associated with baseline thigh muscle mass after adjusting for choice of sport shoe?

- Type of sport shoe was one of the independent variables included in my multiple regression model. However, it was not significant when included with the other independent variables, so it was removed in the backward stepwise removal process. Is there a more appropriate statistical test to run?
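To make the question concrete, here is a minimal sketch of what I think "adjusting for sport shoe" could mean: keeping shoe type in the model regardless of its p-value and checking whether the thigh muscle mass coefficient changes. This is my assumption about what the reviewer wants, not something they spelled out. The data are simulated and all variable names and effect sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # roughly my cohort size

# Simulated (made-up) baseline data; all names and numbers are hypothetical.
thigh_mass = rng.normal(10.0, 2.0, n)             # kg
shoe_type = rng.integers(0, 2, n).astype(float)   # 0/1 indicator
# Improvement in seconds: here it depends on thigh mass but not on shoe type.
improvement = 0.5 + 0.10 * thigh_mass + rng.normal(0.0, 0.3, n)

def ols(y, *predictors):
    """Ordinary least squares with an intercept; returns the coefficients."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Model after stepwise removal (shoe type dropped).
beta_reduced = ols(improvement, thigh_mass)
# Model with shoe type forced in as an adjustment variable.
beta_adjusted = ols(improvement, thigh_mass, shoe_type)

print(f"thigh mass slope, shoe type dropped:   {beta_reduced[1]:.3f}")
print(f"thigh mass slope, shoe type forced in: {beta_adjusted[1]:.3f}")
```

If the two slopes are close, I could presumably report the adjusted estimate and state that the association persisted after adjustment.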

2. Could collinearity explain the results, as several of the independent variables are likely to be similar?

- I ran collinearity diagnostics in SPSS and did not get any VIF values over 4 (only one independent variable had a VIF of 4; the rest were under 3).
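For anyone who wants to reproduce this kind of check outside SPSS, the following is a minimal sketch of how I understand the VIF to be computed: each predictor is regressed on all the others, and VIF = 1 / (1 - R²). The function name `vif` and the simulated predictors are my own illustration, not my actual data.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated predictors: c is deliberately correlated with a, b is independent.
rng = np.random.default_rng(1)
n = 100
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.5 * rng.normal(size=n)

vifs = vif(np.column_stack([a, b, c]))
print(np.round(vifs, 2))
```

In this toy setup the correlated pair should show clearly inflated values while the independent predictor stays near 1.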

3. Discuss regression to the mean as an explanation for my results.

- I concede that regression to the mean likely plays a part in which individuals improved the most/least, but I don’t see how it affects the baseline characteristics in any significant way, other than that these individuals are given greater weight in the results because they show the biggest change. I divided my cohort into tertiles based on improvement in race times and did not find that the tertiles differed in baseline values for any of my independent variables of interest.
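To check my own understanding of the reviewer's point, here is a small simulation sketch of regression to the mean. It assumes each runner has a stable "true" time and that both measurements add independent noise, with no real intervention effect at all. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

true_time = rng.normal(14.0, 1.0, n)         # stable "true" 100 m time (s)
pre = true_time + rng.normal(0.0, 1.0, n)    # noisy measurement before
post = true_time + rng.normal(0.0, 1.0, n)   # noisy measurement after, no real effect
improvement = pre - post                     # positive = faster afterwards

# Correlation between baseline time and apparent improvement.
r = np.corrcoef(pre, improvement)[0, 1]
print(f"corr(baseline time, improvement) = {r:.2f}")

# Mean apparent improvement in the slowest vs fastest third at baseline.
order = np.argsort(pre)
fastest_third = improvement[order[: n // 3]].mean()
slowest_third = improvement[order[-(n // 3):]].mean()
print(f"fastest third: {fastest_third:.2f} s, slowest third: {slowest_third:.2f} s")
```

If I read this correctly, the runners who were slowest at baseline appear to improve the most even though nothing changed, which may be why the reviewer wants the baseline race time accounted for in the model.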

Any input is much appreciated!