One issue is that not every sample has every independent variable coded. For the sake of discussion, let's assume no imputation of missing independent variables: samples are simply dropped from any model that requires a variable they lack. In other words, the nested models also have nested samples, with each larger model fit to a subset of the samples used by the smaller models nested within it.

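As a result, the AIC (and the log-likelihood ratio) would differ between models even if they fit equally well, simply because of how AIC and the LR statistic are calculated: both are built from the log-likelihood, which is a sum over observations, so a likelihood computed on fewer samples is not on the same scale as one computed on more.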
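To make the scale issue concrete, here is a minimal sketch (Python, standard library only) assuming the simplest possible model, a normal distribution fit by maximum likelihood, on simulated data that is not from my actual problem. The same model, fit to data from the same process, gets an AIC roughly proportional to the number of samples it sees.

```python
import math
import random


def gaussian_aic(values):
    """AIC of a normal model N(mu, sigma^2) fit by maximum likelihood."""
    n = len(values)
    mu = sum(values) / n
    sigma2 = sum((v - mu) ** 2 for v in values) / n  # MLE divides by n, not n - 1
    # The log-likelihood is a sum of n per-observation terms; at the MLE it
    # simplifies to -n/2 * (log(2*pi*sigma2) + 1), so it scales with n.
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)
    k = 2  # free parameters: mu and sigma^2
    return 2 * k - 2 * log_lik


random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]

aic_all = gaussian_aic(data)           # fit to all 2000 samples
aic_subset = gaussian_aic(data[:500])  # same model, same process, 500 samples
print(f"AIC, n=2000: {aic_all:.1f}")
print(f"AIC, n=500:  {aic_subset:.1f}")
```

Both fits describe the data about equally well per observation, yet their AIC values come out in roughly a 4:1 ratio, matching the ratio of sample sizes.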

I am tempted to correct for this by dividing each model's AIC by the number of sampled individuals used to fit that model, and comparing models on this "AIC per sample". Intuitively, this feels acceptable, a bit like studentizing residuals (though obviously a totally different thing). But I have no theoretical justification for it.
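For concreteness, a minimal sketch of how the proposed "AIC per sample" would be computed, assuming OLS models with Gaussian errors and simulated data. The variable names (x1, x2), the coefficients, and the roughly 40% missingness in x2 are all made up for illustration. Each model drops the samples missing any variable it uses (complete cases, no imputation), so the two nested models are fit to nested samples, and each AIC is divided by that model's own sample size.

```python
import numpy as np


def ols_aic(X, y):
    """AIC of an OLS fit with Gaussian errors (MLE of the error variance)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = float(np.sum((y - X @ beta) ** 2)) / n
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return 2 * (p + 1) - 2 * log_lik  # p coefficients + 1 variance parameter


rng = np.random.default_rng(42)
n = 1000
x1 = rng.normal(size=n)
x2_true = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.8 * x2_true + rng.normal(size=n)
# Hypothetical missingness: x2 is uncoded (NaN) for roughly 40% of samples.
x2 = np.where(rng.random(n) < 0.4, np.nan, x2_true)

designs = {
    "y ~ x1": np.column_stack([np.ones(n), x1]),
    "y ~ x1 + x2": np.column_stack([np.ones(n), x1, x2]),
}

results = {}
for name, X in designs.items():
    rows = ~np.isnan(X).any(axis=1)  # complete cases for this model only
    aic = ols_aic(X[rows], y[rows])
    n_used = int(rows.sum())
    results[name] = {"n": n_used, "aic": aic, "aic_per_n": aic / n_used}
    print(f"{name:12s} n={n_used:4d}  AIC={aic:8.1f}  AIC/n={aic / n_used:.3f}")
```

In this simulated setup both the raw AIC and AIC/n favor the larger model, which is expected because x2 really does affect y; the sketch only shows how the quantity would be computed, not whether comparing it across different samples is justified.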

Any thoughts or feedback? ("That is a terrible idea" is a fine thought, too...)