Searching for variables that could cause a certain effect

We have measured the range of movement of a hip joint (in degrees) before and after surgery, and we have plenty of variables that could explain the differences in the results. Some of them are factors (sex, type of prosthesis, bearing surfaces...) and some are continuous (age, weight, height...). In total, more than 40 variables could have an influence on the results.
At first I thought about testing one variable after another to see its influence on the range of movement, and then building a regression model. But then I thought that by doing that I would lose the main advantage of the study: we have paired measurements before and after surgery. So I decided to go for a repeated measures GLM... but I am not sure how to do that. I understand that a GLM lets you include a factor and several continuous variables... but in our case we do not have a grouping variable. We have more than 20 factors and more than 20 continuous variables that could have an influence. If I am not mistaken, a GLM is more about understanding whether the variables interact, but here we would like to decide which ones influence the results and which do not. It is not really about finding interaction effects.
Any ideas?
Thanks in advance.


TS Contributor
If you only have before and after measurements (as opposed to after_time_Zero, after_1_day, after_2_days, etc.), you can calculate the difference (after - before) for each subject and use those differences as the response. That would give you more options for the analysis.
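A minimal sketch of the change-score approach, assuming NumPy is available: compute after - before per patient, then fit an ordinary least squares regression of that difference on the candidate predictors. The patients, `age`, and the 0/1 dummy-coded `sex` below are hypothetical illustration data, not values from the study.

```python
import numpy as np

def change_scores(before, after):
    """Per-subject difference (after - before), used as the response."""
    return np.asarray(after, dtype=float) - np.asarray(before, dtype=float)

def fit_ols(y, X):
    """Ordinary least squares with an added intercept column; returns coefficients."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical range of movement (degrees) for 6 patients
before = [80, 85, 90, 70, 75, 95]
after = [100, 110, 105, 95, 92, 115]
age = [60, 55, 70, 65, 72, 50]   # hypothetical continuous predictor
sex = [0, 1, 0, 1, 1, 0]         # hypothetical factor, dummy coded 0/1

diff = change_scores(before, after)
coef = fit_ols(diff, np.column_stack([age, sex]))
print("differences:", diff)
print("intercept, age, sex:", coef)
```

A factor with more than two levels (e.g. type of prosthesis) would need one dummy column per non-reference level.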


Well-Known Member
One of the big problems with having so many predictor variables is that some things are likely to produce false positives just by chance. One way of protecting yourself is to set your critical p value much lower than usual.
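To put a number on the false-positive risk, here is a small sketch assuming 40 independent tests, each at the usual 0.05 level, with every null hypothesis true. The Bonferroni correction (divide alpha by the number of tests) is one common way to lower the critical p value; it is a swapped-in example, not the only option.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test critical p value that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests

def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among n independent tests when
    every null hypothesis is true and each test uses the unadjusted alpha."""
    return 1 - (1 - alpha) ** n_tests

n_predictors = 40
print(familywise_error(0.05, n_predictors))      # about 0.87
print(bonferroni_threshold(0.05, n_predictors))  # 0.05 / 40, i.e. 0.00125
```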
If you have plenty of cases, you could try an exploratory run with some of the data and then use the rest to test whether your significant findings are found again.
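A rough sketch of that exploratory/confirmatory split, using only the standard library. The 50/50 split, the seed, the predictor names, and the confirmation p-values are all hypothetical; the p-values would come from whatever model is fitted on each half.

```python
import random

def split_sample(subject_ids, explore_fraction=0.5, seed=2024):
    """Randomly split subjects into an exploratory set and a confirmation set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]

def confirmed_predictors(explore_hits, confirm_pvalues, alpha=0.05):
    """Keep predictors flagged in the exploratory half that are significant
    again in the confirmation half (Bonferroni over the re-tested set)."""
    if not explore_hits:
        return []
    threshold = alpha / len(explore_hits)
    return [name for name in explore_hits if confirm_pvalues[name] < threshold]

# Usage with hypothetical subject IDs and hypothetical p-values
explore_ids, confirm_ids = split_sample(range(1, 101))
hits = ["age", "prosthesis_type", "weight"]  # hypothetical exploratory finds
p_confirm = {"age": 0.004, "prosthesis_type": 0.30, "weight": 0.012}
print(confirmed_predictors(hits, p_confirm))
```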

Thank you for your kind answers, but... are you sure that using the post - pre differences as the dependent variable will not give different results from a repeated measures method? Or from difference-in-differences?
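For the simplest case (two time points, no predictors), the two approaches agree exactly: the F statistic for the time effect in a one-way repeated measures ANOVA equals the square of the one-sample t statistic on the differences. The sketch below checks this on hypothetical numbers; with predictors added, regressing the difference on a predictor corresponds to that predictor's interaction with time in the repeated measures model.

```python
import math
import statistics

def t_on_differences(before, after):
    """One-sample t statistic on the per-subject differences (after - before)."""
    d = [a - b for a, b in zip(after, before)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

def rm_anova_time_F(before, after):
    """F for the time effect in a one-way repeated measures ANOVA (2 levels)."""
    rows = list(zip(before, after))
    n, k = len(rows), 2
    grand = (sum(before) + sum(after)) / (n * k)
    time_means = [statistics.mean(before), statistics.mean(after)]
    subj_means = [sum(r) / k for r in rows]
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_time - ss_subj  # subject-by-time residual
    df_time, df_error = k - 1, (n - 1) * (k - 1)
    return (ss_time / df_time) / (ss_error / df_error)

# Hypothetical range of movement (degrees) before and after surgery
before = [80, 85, 90, 70, 75, 95]
after = [100, 110, 105, 95, 92, 115]
t = t_on_differences(before, after)
F = rm_anova_time_F(before, after)
print(t ** 2, F)
```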


TS Contributor
How large is your sample size?

And what is the context of the study, and what will the results be used for? Is it not possible to reduce the number of predictors, based on previous knowledge and on the literature, and focus on only some of the

With kind regards