Hey y'all,

Yesterday I was running a stacked ensemble model (a weighted combination of base machine learners) in R with H2O AutoML, and I noticed that the top contributing model had an accuracy of 99.995%. It was a gradient boosted model (picked by random grid search) for a classification problem. I thought, hey, maybe the model really is that good, since I am not overly familiar with GBM. So today I wanted to check it out, ran a logistic regression on the same problem, and got a complete separation error. I am on a laptop and using R, and I am not great with either the small machine or R. So I toyed around with the model by excluding a single covariate at a time from the logistic regression. I noticed that when either X1 or X5 was excluded, the model would run without error. I then generated the three different 3-way contingency tables (the variables Y, X1, and X5 are all binary) and did not find any empty cells in the tables.
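For anyone who wants to poke at this, here is a rough base-R sketch of the kind of checks described above, run on toy data that is entirely made up (it is not my dataset). The data frame `toy`, the extra covariate `x3`, and the helper `glm_warnings` are all hypothetical names used only for illustration. It shows one way the pattern could arise, under the assumption that the separation comes from a linear combination of X1, X5, and some other covariate: the Y by X1 by X5 table has no empty cells, the full logistic regression still separates, and dropping either X1 or X5 makes the warnings go away.

```r
# Toy data, invented for illustration; x3 stands in for "some other covariate".
toy <- expand.grid(x1 = 0:1, x5 = 0:1, x3 = -2:3)
toy <- toy[rep(seq_len(nrow(toy)), times = 5), ]  # replicate each row 5 times

# Assumption: y is fully determined by a linear combination of all three covariates.
toy$y <- as.integer(2 * toy$x1 + 2 * toy$x5 + toy$x3 >= 3)

# 3-way contingency table of Y, X1, X5: look for empty cells.
tab <- table(y = toy$y, x1 = toy$x1, x5 = toy$x5)
print(ftable(tab))

# Fit a logistic regression and return the text of any warnings glm() raised.
glm_warnings <- function(formula, data) {
  msgs <- character(0)
  withCallingHandlers(
    glm(formula, family = binomial, data = data),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  msgs
}

full_w    <- glm_warnings(y ~ x1 + x5 + x3, toy)  # all covariates
drop_x1_w <- glm_warnings(y ~ x5 + x3, toy)       # X1 excluded
drop_x5_w <- glm_warnings(y ~ x1 + x3, toy)       # X5 excluded

print(full_w)
print(drop_x1_w)
print(drop_x5_w)
```

In this toy setup the full model collects separation-type warnings from `glm()` while the two reduced models collect none, even though the 3-way table is fully populated. Whether that is what is happening in the real data is a guess; it would depend on which other covariates are in the model.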

How would you all go about investigating this?

P.S. Side note: the model is actually modeling the missingness of a variable in the dataset, i.e., prob(missing (y/n) | **X**).