Logistic Regression: Model Validation


I am validating a logistic regression model for the first time, using the split-sample method. I randomly split the data into two parts: 70% development and 30% validation (70:30). I then run logistic regression on the development data set in SAS, rank the predicted probabilities in descending order, split the data into 10 groups (deciles), and check the percentage of responses captured in the upper deciles. My question: is there a rule of thumb for assessing the model based on the percentage of responses in the top 3-4 deciles? Someone said the top 3 deciles should cover at least 65% of responses. Is that correct? I have also checked the Hosmer and Lemeshow goodness-of-fit test.
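
For concreteness, here is a minimal sketch of the development steps described above. It is an illustration rather than my exact code: the names full_data, split_data, development_Set, dev_scored, dev_ranked, resp (assumed to be a 0/1 response flag), pred_prob and decile, as well as the seed, are placeholders, and the predictor list is simply the set of variables that appear in the scoring equation further down.

```sas
/* Assumed input: full_data with a 0/1 response variable named resp (hypothetical) */

/* 70:30 random split; OUTALL keeps every row and adds a Selected flag (1 = sampled) */
proc surveyselect data=full_data out=split_data
                  method=srs samprate=0.7 seed=12345 outall;
run;

data development_Set validation_Set;
  set split_data;
  if Selected = 1 then output development_Set;  /* about 70% of rows */
  else output validation_Set;                   /* remaining 30% */
run;

/* Fit on the development set; DESCENDING models P(resp = 1),
   LACKFIT requests the Hosmer and Lemeshow goodness-of-fit test */
proc logistic data=development_Set descending;
  model resp = A_Flg B_Flg C D E / lackfit;
  output out=dev_scored p=pred_prob;
run;

/* Deciles of predicted probability: decile 0 holds the highest scores */
proc rank data=dev_scored out=dev_ranked groups=10 descending;
  var pred_prob;
  ranks decile;
run;
```

Note that PROC RANK numbers the groups from 0 to 9, so with the DESCENDING option the "top 3 deciles" are groups 0, 1 and 2.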

I then built a scoring equation from the intercept and coefficients estimated on the development data set and applied it to the validation data set. The code is shown below for reference:

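Data validation_Output;
Set validation_Set;
/* Linear predictor (log-odds) built from the development-set coefficients */
resp_xb1 = -1.3844 + (1.4708)*A_Flg + (2.9829)*B_Flg + (.0317)*C + (-0.2372)*D + (-0.3359)*E;
resp_p1 = 1 / (1 + exp(-resp_xb1)); /* predicted probability; resp_p1 is a new, illustrative name */
Run;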

I then run PROC RANK and PROC SQL to calculate deciles on the validation data set.
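
Roughly, the decile step looks like the sketch below. It assumes the validation data carries the actual 0/1 response in a variable called resp; that name, along with val_ranked, val_gains, decile and the output column names, is a placeholder for illustration. Ranking on resp_xb1 gives the same deciles as ranking on the predicted probability, because the logistic transform is monotonic.

```sas
/* Rank validation scores into 10 groups: decile 0 = highest scores */
proc rank data=validation_Output out=val_ranked groups=10 descending;
  var resp_xb1;
  ranks decile;
run;

/* Per-decile counts, response rate, and share of all responders */
proc sql;
  create table val_gains as
  select decile,
         count(*)  as n_obs,
         sum(resp) as n_resp,
         mean(resp) as resp_rate format=percent8.2,
         sum(resp) / (select sum(resp) from val_ranked)
                   as pct_of_resp format=percent8.2
  from val_ranked
  group by decile
  order by decile;

  /* Share of all responders captured by the top 3 deciles (0, 1, 2) */
  select sum(resp) / (select sum(resp) from val_ranked)
         as top3_capture format=percent8.2
  from val_ranked
  where decile in (0, 1, 2);
quit;
```

The top3_capture figure is the number I would compare against the "65% in the top 3 deciles" rule I was told about, and against the same figure from the development set.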

I now have decile scores on both the development and validation sets. Should the significant variables be the same in both data sets, or should I be comparing the percent concordant instead?