I am currently in the process of performing multicollinearity diagnostics for a logistic regression model, using tolerance and VIF calculations based on the recommendations in Allison (2012) (Logistic Regression Using SAS: Theory and Application, Second Edition).
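
For reference, the two diagnostics are tied together by the standard definitions below, where $R_j^2$ is the R-squared obtained by regressing predictor $j$ on all the other predictors in the model (the symbols are notation chosen for this note, not taken from the book):

```latex
\mathrm{TOL}_j = 1 - R_j^2,
\qquad
\mathrm{VIF}_j = \frac{1}{\mathrm{TOL}_j} = \frac{1}{1 - R_j^2}
```

A VIF of 4 therefore corresponds to a tolerance of 0.25, and a VIF of 10 to a tolerance of 0.1.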

In my model I include three sets of fixed effects. Specifically, they cover origin countries (25) and industries (22), in total 45 dummy variables. I have tried calculating VIF values with and without these dummies. The sample size is around 6000. It's a cross-sectional analysis (not panel data).

When calculating VIF without the dummies, tolerances are acceptable, with all VIF < 4. However, when adding the dummies, the tolerance values of some predictors drop to very low values (even the lowest VIF is > 10), indicating that the predictor variables I want to use for inference are basically useless (at least for explanatory analysis of these, as I understand it). The non-dummy variables (besides a single pair) are not correlated:

Using SAS PROC REG and PROC GLM, respectively (PROC REG cannot create the fixed-effect dummies automatically):

I have tried adding and removing variables to see whether the problem was due to correlations between the predictor variables (not the dummies), but I can conclude that the VIF inflation is caused by the dummies.
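
To illustrate how dummies alone can produce this pattern, here is a minimal Python sketch on made-up data (not my actual data set). It assumes a hypothetical predictor `x` that varies mostly between five groups, a second hypothetical predictor `x2` that is uncorrelated with it, and a hand-written `vif` helper that regresses one predictor on the others by ordinary least squares. None of these names come from SAS or from Allison (2012).

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def vif(target, others):
    """VIF of `target`: 1 / (1 - R^2) from regressing it on `others` plus an intercept."""
    n = len(target)
    cols = [[1.0] * n] + others
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, target)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    mean = sum(target) / n
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean) ** 2 for t in target)
    return 1.0 / (1.0 - (1.0 - ss_res / ss_tot))


# Synthetic data (an assumption for illustration): 5 groups with 4 observations each.
group = [g for g in range(5) for _ in range(4)]
within = [-1.0, 1.0, -1.0, 1.0] * 5                 # small within-group variation
x = [10.0 * g + w for g, w in zip(group, within)]   # mostly between-group variation
x2 = [-1.0, -1.0, 1.0, 1.0] * 5                     # uncorrelated with x

# Four dummies for five groups; group 0 is the reference category.
dummies = [[1.0 if g == k else 0.0 for g in group] for k in range(1, 5)]

vif_without = vif(x, [x2])
vif_with = vif(x, [x2] + dummies)
print("VIF of x without dummies:", round(vif_without, 2))
print("VIF of x with dummies:   ", round(vif_with, 2))
```

In this toy setup the VIF of `x` is about 1 without the dummies and about 201 with them, even though `x` and `x2` are uncorrelated: the dummies absorb the between-group variation, which is nearly all the variation `x` has.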

Theoretically, it makes sense to add the dummies to account for unmeasured country-, industry- and time-dependent effects. The included variables also make theoretical sense, although they may be correlated. For example, the number of foreign subsidiaries of a firm is often correlated with its age and size, but theoretically these variables still have different effects on the dependent variable. Is it possible to create a model in which I can validly explain relationships between the independent variables and the dependent variable if VIF values are this high? If not, what can I do besides omitting important predictors?