How shall the ‘rule of 10’ be interpreted in stepwise forward multiple logistic regression?

#1
I have a set of 20 cases and 80 controls and 5 independent variables that are all significantly associated with case status in univariate logistic regression analysis. Is it correct to add the four weaker variables one at a time to the strongest variable and choose the combination that decreases the deviance the most, thus creating a model with 2 variables (provided the deviance decreases significantly)? Or does this overfit the model, because I would need 50 cases (50/10 = 5) since I'm exploring 5 variables, even though only 2 are in the final model?
 
#2
Hi,

Partial answer:

1. I think the rule of thumb for the sample size in logistic regression is max(10k/p, 100), not 10k: https://www.researchgate.net/post/W..._sample_size_for_running_logistic_regression2
This probably applies when the number of predictors is known in advance.

2. The stepwise method should only give direction when choosing predictors; you should also base the choice on methodological considerations and/or other research.

3. I think that to choose the correct predictors with the stepwise method you need a bigger sample size than the final model alone would require, since you need to check every candidate predictor (at least in linear regression ...). So as a rule of thumb, I would probably count all 5 IVs, unless you choose the 2 IVs by some means other than the stepwise method.
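
To put numbers on the max(10k/p, 100) rule from point 1, here is a rough sketch in Python. It rests on assumptions the post does not spell out: k is read as the number of predictors considered, and p as the proportion of the less frequent outcome (here 20 cases out of 100 subjects). The function name `min_sample_size` and its parameters are made up for illustration, so treat this as a back-of-the-envelope check rather than a definitive calculation.

```python
def min_sample_size(k: int, events: int, total: int, floor: int = 100) -> int:
    """Rule-of-thumb minimum N = max(10 * k / p, floor), with p = events / total.

    Assumption: `events` counts the less frequent outcome (cases or controls).
    Integer arithmetic is used so the result is rounded up exactly.
    """
    if not 0 < events <= total:
        raise ValueError("events must be between 1 and total")
    needed = -(-10 * k * total // events)  # ceiling of 10 * k / p
    return max(needed, floor)


cases, controls = 20, 80
total = cases + controls
events = min(cases, controls)  # the less frequent outcome

# Counting all 5 candidate variables screened by the stepwise procedure
n_all_candidates = min_sample_size(5, events, total)

# Counting only the 2 variables that end up in the final model
n_final_model = min_sample_size(2, events, total)

# Events per variable under each way of counting
epv_all_candidates = events / 5
epv_final_model = events / 2

print(f"N needed counting 5 candidates: {n_all_candidates}")
print(f"N needed counting 2 final IVs:  {n_final_model}")
print(f"Events per variable: {epv_all_candidates} (5 IVs), {epv_final_model} (2 IVs)")
```

Under these assumptions, the 100 subjects in the question would meet the rule only if the 2 final variables are counted, and fall well short if all 5 screened variables are counted.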
 