ichbin, this would be graphing the observed versus the expected for the deciles?

Yes. With N=1^5, I would probably use 100 bins instead of just 10.
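
For anyone who wants to try this, here is a minimal sketch of the binned observed-versus-expected comparison. It is only an illustration under assumptions: the data are simulated (not the ED visit data from this thread), about 100,000 observations are assumed, and `calibration_table` is a made-up helper name. Each row could then be plotted as mean predicted probability against observed event rate.

```python
import math
import random

def calibration_table(probs, outcomes, n_bins=100):
    """Sort cases by predicted probability, split them into n_bins
    near-equal groups, and return (mean predicted, observed rate, count)
    for each group."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows

# Simulated stand-in data (an assumption, not real output): predicted
# probabilities from a logistic transform, outcomes drawn from them.
random.seed(0)
n = 100_000
probs = [1 / (1 + math.exp(-random.gauss(-1.0, 1.0))) for _ in range(n)]
outcomes = [1 if random.random() < p else 0 for p in probs]

table = calibration_table(probs, outcomes, n_bins=100)
for mean_pred, obs_rate, count in table[:3]:
    print(f"predicted={mean_pred:.3f} observed={obs_rate:.3f} n={count}")
```

If the points fall close to the diagonal, the model is reasonably calibrated; systematic departures show where it over- or under-predicts.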

Originally Posted by ichbin
You are asserting that your model (plus perfectly Gaussian noise)
There is no Gaussian noise assumption for logistic regression - I'm guessing you were just talking about linear models in general?

Originally Posted by ichbin
Yes. With N=1^5, I would probably use 100 bins instead of just 10.
Annnd I'm guessing you meant 10^5. 1^5 isn't as exciting.

Dason: Correct on both counts. Thanks for the helpful editing.

Thanks for the info ichbin.

Don't worry, Dason is not trying to be a super automated pain. It's just trying to make this forum the cleanest, most accurate resource for community-generated statistical conversations.

You are asserting that your model (plus perfectly Gaussian noise) is the complete and exact description of how the world works,
I know that is a classical way of looking at models, but I don't think any real-world model fulfills it (a point made by William Berry among others). Models are simplifications at best of the real world. In the real world there would likely be hundreds of variables, at least, influencing the results, and pathways between dependent and independent variables would commonly flow both ways and through independent variables. None of this is normally modeled in regression or ANOVA, for example.

Models are gross simplifications at best to make it easy to think through issues. They never truly model the real world, although we tend to overlook that in practice.

I totally agree noetsi, and it seems to me that this fact is a good reason not to rely on the Hosmer-Lemeshow test.

Hi again. Thanks a lot for your posts... they're more than I could ask for...
I do not have any variable which is exclusively related to one group (like mortality rate or hospitalization rate, which would naturally be more related to the more severe visits).
The ratio between the 2 groups is almost 1/2 (30000 severe visits, 70000 less severe).
I used interactions because we should expect (and it does happen) a differential effect across economic status (the year*economic status interaction) and between the 3 levels of ED care we have (since the payments are lowest for the less differentiated emergency rooms and highest for the central ER). I used all those interactions.
I will post the output in a minute. Thanks!!

Yes, let's see the output - it will definitely help us understand the model!!

Here it is. I just had to change the variable names to English so it was easier to give the output as a png...

Two interactions were not significant (NS): YEAR*ECONOMIC_STATUS and YEAR*ECONOMIC_STATUS*DISTANCE
I still included them when calculating the effect of the change (year 1 vs year 0), since I read here in the forum that I should include NS interactions when higher-order interactions and main effects are significant.

I used the interaction between Year, ED level and economic status since we should see a differential effect of the political change (Year) across the type of ED (since each ED had a different fee amount) and across economic status (users with low resources were even exempted from payment). I included distance because there could be an effect of the price of transportation to the ED (higher for central hospitals, and for people with low resources).
Thanks a lot!!!

Jake, I think people rely on H&L (even when they know there are problems with the artificial number of categories and it's not even certain what distribution it has) because they want a goodness of fit test to say their model is "good" and there are no real alternatives. The fact that there is no true R squared value for logistic regression makes this pull even stronger. It also helps that Hosmer and Lemeshow are probably the best known of the writers on logistic regression.
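
To make the point about the arbitrary number of categories concrete, here is a rough sketch of the Hosmer-Lemeshow statistic computed for several group counts. The data are simulated and `hosmer_lemeshow_stat` is a hypothetical helper written for this post, not a library function. The statistic is the usual sum over groups of (observed - expected)^2 / (n_g * p_g * (1 - p_g)), which is conventionally compared against a chi-square with g - 2 degrees of freedom.

```python
import math
import random

def hosmer_lemeshow_stat(probs, outcomes, n_groups=10):
    """Hosmer-Lemeshow chi-square statistic using n_groups near-equal
    groups formed by sorting on predicted probability."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        size = len(chunk)
        if size == 0:
            continue
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return stat

# Simulated stand-in data (an assumption, not the thread's model output).
random.seed(1)
n_obs = 20_000
true_p = [1 / (1 + math.exp(-random.gauss(-1.0, 1.0))) for _ in range(n_obs)]
y = [1 if random.random() < p else 0 for p in true_p]

# The same predictions give a different statistic for each grouping choice.
for g in (5, 10, 20, 100):
    print(g, round(hosmer_lemeshow_stat(true_p, y, n_groups=g), 2))
```

The statistic, and therefore the p-value, moves with the choice of g even though the model and data are unchanged, which is one reason to look at a binned calibration plot rather than a single pass/fail number.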