The results are the opposite of what one would expect on a biological basis, and I believe this is due to several confounding variables that were deemed not significant and were therefore excluded from the final model by AIC-based selection. However, although these variables were excluded on statistical grounds, there is a strong theoretical basis for including at least some of them. For example, the outcome is disease activity, and one group had more patients on high-efficacy therapy (which is given to prevent disease activity), so it seems logical to include this variable despite its lack of statistical significance.
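For context, here is a sketch of why a categorical variable with many levels can be penalized heavily under AIC. It assumes the standard definition of AIC and reference (dummy) coding, where a variable with c categories adds c - 1 parameters; the exact variant used in the selection step may differ. Here k is the number of estimated parameters and \hat{L} is the maximized likelihood:

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}
$$

$$
\Delta\mathrm{AIC} = \mathrm{AIC}_{\text{full}} - \mathrm{AIC}_{\text{reduced}} = 2(c-1) - 2\left(\ln\hat{L}_{\text{full}} - \ln\hat{L}_{\text{reduced}}\right)
$$

The variable lowers AIC (so is kept) only if \ln\hat{L}_{\text{full}} - \ln\hat{L}_{\text{reduced}} > c - 1. For a 6-category variable, that means the log-likelihood must rise by more than 5, which can be hard to achieve with a small sample even when the variable is a real confounder.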

Is this method appropriate with a small sample size and categorical variables (some with up to 6 categories)? And are there any other drawbacks to this method that might explain these opposite results? I appreciate any feedback. Thanks!