I would really appreciate some pointers on this:

I have come across log-linear models that produce a better fit with fewer terms. E.g. for a three-way table (factors A, B and C) I have
Model 1: A+B+C
Model 2: AB+C

And in some cases I find that Model 1 has the better fit. (Just to be clear, by better fit I mean it has a higher p-value, not a lower chi-square value. I do realise the chi-square values are not directly comparable, since the two models have different degrees of freedom.)
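To make the situation concrete, here is a minimal sketch in Python (NumPy and SciPy) of the kind of comparison I mean. The 2x2x2 counts are made up for illustration and are not my actual data, the variable names are my own, and I am assuming the likelihood-ratio statistic G² with the closed-form fitted values for the two models.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 2x2x2 table of counts, indexed [A, B, C] (illustrative only).
obs = np.array([[[13, 8], [12, 8]],
                [[8, 12], [8, 12]]], dtype=float)
n = obs.sum()

# Marginal totals needed for the closed-form fitted values.
a = obs.sum(axis=(1, 2))   # n_i++
b = obs.sum(axis=(0, 2))   # n_+j+
c = obs.sum(axis=(0, 1))   # n_++k
ab = obs.sum(axis=2)       # n_ij+

# Model 1 (A+B+C): mutual independence.
fit_m1 = np.einsum("i,j,k->ijk", a, b, c) / n**2
# Model 2 (AB+C): A and B jointly independent of C.
fit_m2 = np.einsum("ij,k->ijk", ab, c) / n


def g2(observed, fitted):
    """Likelihood-ratio statistic G^2 (assumes no zero cells)."""
    return 2.0 * np.sum(observed * np.log(observed / fitted))


I, J, K = obs.shape
df_m1 = I * J * K - 1 - (I - 1) - (J - 1) - (K - 1)  # 4 for a 2x2x2 table
df_m2 = (I * J - 1) * (K - 1)                        # 3 for a 2x2x2 table

g2_m1, g2_m2 = g2(obs, fit_m1), g2(obs, fit_m2)
p_m1, p_m2 = chi2.sf(g2_m1, df_m1), chi2.sf(g2_m2, df_m2)

print(f"Model 1 (A+B+C): G2={g2_m1:.3f}, df={df_m1}, p={p_m1:.3f}")
print(f"Model 2 (AB+C):  G2={g2_m2:.3f}, df={df_m2}, p={p_m2:.3f}")
```

With these counts Model 2 has the slightly smaller G², as it must because the models are nested, yet Model 1 has the higher p-value. As far as I can tell this is because the AB term uses up one degree of freedom while explaining almost nothing in this table.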

Somehow I had always assumed that a "higher" model (one with more terms) would always improve the fit. The fact that this is the case in all the examples I can find in my books may have had something to do with it.

Any pointers to literature that touches upon such issues would be extremely welcome.