Well, I will start off by applauding your awareness of a potential issue and your approaches. I am not a master of multilevel models by any stretch; I am self-taught.

I believe the first step is to run an empty model with no predictors but with the random effects (random intercepts in your case). If that model shows that a meaningful share of the outcome variance lies between hospitals, then you account for clustering in the models you build iteratively from there.
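As a rough illustration of what the empty model is checking, here is a minimal sketch that estimates the intraclass correlation (the share of variance sitting between hospitals) with the classic one-way ANOVA estimator. This is a stand-in, not a fitted mixed model: in practice you would fit the null random-intercept model in your stats package and read off the variance components. The data are simulated, the function name `icc_oneway` is my own, and the estimator as written assumes every hospital has the same number of patients.

```python
import random
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA estimate of ICC(1); assumes equal-sized groups."""
    k = len(groups)            # number of clusters (hospitals)
    n = len(groups[0])         # observations per cluster
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch only handles balanced groups")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Mean squares between and within clusters
    ms_between = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Simulated (hypothetical) data: 30 hospitals, 20 patients each.
# Hospital effects have SD 1 and patient noise has SD 2,
# so the true ICC is 1 / (1 + 4) = 0.2.
random.seed(0)
hospitals = []
for _ in range(30):
    hospital_effect = random.gauss(0, 1)
    hospitals.append([hospital_effect + random.gauss(0, 2) for _ in range(20)])

icc = icc_oneway(hospitals)
print(f"estimated ICC: {icc:.3f}")
```

An ICC near zero suggests the hospitals barely differ and clustering matters little; a clearly positive one is the signal to keep the random intercepts in the later models.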

The reason your first model is different comes down to the variance it neglects to measure: the between-hospital effects. Model two explains the outcome using both within- and between-facility variance. The last model uses robust standard errors, which will also make finding an effect more difficult, since they widen your confidence intervals by comparison.
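To get a feel for how much the intervals can widen, here is a small sketch using the standard design-effect approximation, deff = 1 + (m - 1) * ICC, where m is the cluster size. It is only a back-of-the-envelope device that assumes equal cluster sizes and applies most directly to means or hospital-level predictors; it is not what your software computes for robust or multilevel standard errors. The estimate, standard error, cluster size, and ICC below are made-up numbers, and both function names are my own.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def adjusted_ci(estimate, naive_se, cluster_size, icc, z=1.96):
    """Widen a naive 95% CI by the square root of the design effect."""
    se = naive_se * math.sqrt(design_effect(cluster_size, icc))
    return estimate - z * se, estimate + z * se


# Hypothetical numbers: an effect of 0.5 with a naive SE of 0.2,
# 20 patients per hospital, and an ICC of 0.1.
naive_low, naive_high = 0.5 - 1.96 * 0.2, 0.5 + 1.96 * 0.2
adj_low, adj_high = adjusted_ci(0.5, 0.2, cluster_size=20, icc=0.1)

print(f"naive CI:    ({naive_low:.3f}, {naive_high:.3f})")
print(f"adjusted CI: ({adj_low:.3f}, {adj_high:.3f})")
```

With these made-up inputs the naive interval excludes zero while the adjusted one does not, which is the kind of shift you can see between a model that ignores hospitals and one that accounts for them.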

The literature says that not addressing clustering can lead to Type I errors, that is, saying there is a difference when there isn't. I would examine how much of the variance you can explain by controlling for clustering. I recall that when I ran my first multilevel model, I thought it would shrink my confidence intervals and make significance easier to find. You are on the right path; just think about meta-analyses. You can't simply pool the results from two studies together; you have to control for study differences, much as you may have to control for hospital differences.