Firth regression and AIC-based model selection


New Member
Dear all, when using Firth's bias reduction method (Firth 1993) to fit logistic regressions, is it possible (sensible) to compare models using AIC?

In R, two main functions ("logistf" and "brglm") can handle Firth's method. MuMIn can extract AIC(c) values from models fitted with either function, so my first guess would be "yes".

However, quoting from the "brglm" vignette:

The use of Akaike’s information criterion (AIC) for model selection when method = "" is controversial. AIC was developed under the assumptions that (i) estimation is by maximum likelihood and (ii) that estimation is carried out in a parametric family of distributions that contains the “true” model. At least the first assumption is not valid when using method = "". However, since the MLE is asymptotically unbiased, asymptotically the modified-scores approach is equivalent to maximum likelihood. A more appropriate information criterion seems to be Konishi’s generalized information criterion (see Konishi & Kitagawa, 1996, Sections 3.2 and 3.3), which will be implemented in a future version.
Furthermore, when trying to fit the same model using "logistf" and "brglm", parameter estimates are similar, but the AIC values are very different. Specifically, "brglm" returns a value of AIC similar to that of "glm", while "logistf" returns a much lower AIC value.

Any clue why that happens?


Less is more. Stay pure. Stay poor.
Hmm, I hadn't thought about this. I would think that if every model is fit using the same procedure/package (with the correction), it would be like comparing a bunch of people's heights with a wonky ruler: all would be biased comparably. However, this may not hold if you are weighing people with a wonky scale whose spring has a non-linear bias, so that light people are slightly skewed but heavier people are really skewed.

If they are nested models, would -2 log-likelihood work, or would it have the same issues? So are you fitting them all with the same procedure, and they are just nested models? Also, I hate to be daft, but is there already a clear winner for best fit (e.g., by AUC), so that you are just grasping at straws?

Are you using the correction because of sparse data?

There is an excellent paper, in press with either "Epidemiology" or "American Journal of Epidemiology", that talks about sparse data in logistic regression. It doesn't address this topic per se, but it is a fun read. One of the authors was Sander Greenland.

P.S. Do you have the information available to calculate the KIC or the proposed measure?


New Member
Thanks for the reply!

My dataset consists of 200 records with a strongly unbalanced binary response (85% zeros vs. 15% ones). This means some 30 "1s", which is a fairly low number. To be fair, the models fitted with GLM and a binomial structure do not give any warnings of sparse data, so a model selection on binomial GLMs might be OK. For the sake of comparison, I also fitted the same set of models (which are not necessarily nested) with Firth's regression (using "logistf"). The model selections with GLM and Firth's regression return very similar results (as expected, the parameters of Firth's regression are less inflated).
This would support your suggestion of the "comparably biased ruler".

Recently, however, I tried to refit the same set of models using "brglm": while the parameter estimates are the same as with "logistf", the AIC values are not. This resulted in a selection of different competing models. This in turn would support the hypothesis of the "non-linearly biased ruler".

The AUC is similar for GLM and "brglm". "logistf" is not supported by the function predict, so I did not calculate AUC (though I guess it might be possible, with some algebra?).

So, all in all, given the absence of warnings when fitting binomial GLMs, and the very similar model selection results between GLMs and "logistf", I would say the use of AIC with "logistf" might be sensible.
Yet I do not understand the difference between "brglm" and "logistf"... I am wondering if, perhaps, the bias reduction is coded differently?


Less is more. Stay pure. Stay poor.
Yeah, I am not a regular R user, but I skimmed the documentation. Well, logistf seems to be the traditional Firth correction based on the Jeffreys prior.

brglm is also associated with the Jeffreys prior, but I am unsure whether there is some kind of carry-over from its "generalized" approach or whether that gets deactivated. Either way, I would imagine both approaches should be very comparable. Your question was whether you can compare AIC between the two approaches; to be safe, I would just commit myself to one of them. Though, as you noted, sparsity may not be an issue, and I am guessing you have finite confidence intervals from the non-corrected approaches.
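
One possible reason for the AIC gap, offered as an assumption and not something confirmed in this thread: an AIC built from Firth's penalized log-likelihood (the ordinary log-likelihood plus 0.5 * log det of the Fisher information) will differ from an AIC built from the ordinary binomial log-likelihood, even at identical coefficients. The Python sketch below only illustrates that arithmetic for a model with an intercept and one covariate. The data, the coefficients, and the function names are made up for illustration, and it does not reproduce the internals of either package; whether "logistf" and "brglm" really differ in this way would need to be checked against their documentation.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def loglik_and_penalty(beta, xs, ys):
    """Return (ordinary log-likelihood, Firth penalty) for the model
    logit(p_i) = beta[0] + beta[1] * x_i.
    The penalty is 0.5 * log det(X' W X), with W = diag(p_i * (1 - p_i))."""
    b0, b1 = beta
    loglik = 0.0
    i00 = i01 = i11 = 0.0  # entries of the 2x2 Fisher information X' W X
    for x, y in zip(xs, ys):
        p = inv_logit(b0 + b1 * x)
        loglik += y * math.log(p) + (1 - y) * math.log(1 - p)
        w = p * (1 - p)
        i00 += w
        i01 += w * x
        i11 += w * x * x
    det_info = i00 * i11 - i01 * i01
    return loglik, 0.5 * math.log(det_info)

def aic(loglik, k):
    return 2 * k - 2 * loglik

# Hypothetical toy data and coefficients (not fitted, not from the thread).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
beta = (-3.0, 1.4)

ll, pen = loglik_and_penalty(beta, xs, ys)
aic_plain = aic(ll, k=2)            # from the ordinary log-likelihood
aic_penalized = aic(ll + pen, k=2)  # from the penalized log-likelihood

print("AIC (ordinary log-likelihood): ", aic_plain)
print("AIC (penalized log-likelihood):", aic_penalized)
print("difference = log det(I):       ", aic_plain - aic_penalized)
```

The two values differ by exactly log det(I), which changes from model to model, so a ranking based on one version of the AIC need not match a ranking based on the other.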

You could probably use "pROC" on all models, scoring the dataset with each model's coefficients to calculate the AUCs. But it all comes down to whether there really are two competitive models that may best explain the data generating function.
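
As a rough illustration of that scoring step (written in Python and not with "pROC", purely to show the computation): predicted probabilities come from the inverse logit of the linear predictor, and the AUC is the share of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as one half. The coefficients, data, and function names below are hypothetical.

```python
import math

def predict_prob(coefs, row):
    """Inverse logit of the linear predictor; coefs[0] is the intercept."""
    z = coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def auc(scores, labels):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney statistic)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical coefficients (intercept, slope) and data, for illustration only.
coefs = [-3.0, 1.4]
rows = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

scores = [predict_prob(coefs, r) for r in rows]
print("AUC:", auc(scores, labels))
```

Because the AUC depends only on the ordering of the scores, it can be computed the same way for any fitted model once its coefficients are available, whichever function produced them.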

I imagine you are fine, you seem to already be doing your due diligence. One can always find things to overly debate on, but an easy fix is to eventually understand all the approaches you are using and marry yourself to an a priori data analysis plan for the project. That puts many trivial debates to rest and takes any unconscious investigator bias away.


New Member
Yep, the fact that I get finite CIs even with simple GLMs without correction is somewhat reassuring!
Also agree on the trivial debates part ;-)
Thanks a lot for the feedback!