# SAS Logistic Regression - High Wald Chi-Square for a Binary Variable

#### RamShan

##### New Member
I am working on an analytical task where I need to explain why one particular binary independent variable has a very high Wald chi-square (approx. 10,000) compared with the other variables (in the hundreds) in a logistic regression model.

Can I simply conclude that the response variable is strongly dependent on that binary variable, supported by a cross-tab or a chi-square test of independence, or is there a better way I could explain this to a few statisticians here?

Ram

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I would also just present the basic 2x2 table (not controlling for anything). I would imagine the condition is almost mutually exclusive with one of the dependent variable's groups. There are probably some more complex things you could do, but if you just want to convey what is going on, that should do it.

You may also do ROC curves with and without the variable.
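As a rough illustration of the unadjusted 2x2 table idea: with a single binary predictor, the Wald chi-square from an unadjusted logistic regression can be worked out from the four cell counts alone, since the coefficient estimate is the log odds ratio and its standard error is `sqrt(1/a + 1/b + 1/c + 1/d)`. The sketch below is in Python rather than SAS, the function name is made up for illustration, and the counts are hypothetical (they are not from this thread). It also shows that, for the same odds ratio, the statistic grows in proportion to the sample size, which is one reason a value near 10,000 can appear in a large dataset.

```python
import math


def wald_chi_square_2x2(a, b, c, d):
    """Wald chi-square for the log odds ratio of a 2x2 table.

    Cell layout (all counts must be > 0):
        a: x = 1, y = 1    b: x = 1, y = 0
        c: x = 0, y = 1    d: x = 0, y = 0
    """
    log_odds_ratio = math.log((a * d) / (b * c))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (log_odds_ratio / standard_error) ** 2


# Hypothetical counts, for illustration only.
small_sample = wald_chi_square_2x2(90, 50, 100, 760)
# Same proportions, 100 times as many records.
large_sample = wald_chi_square_2x2(9000, 5000, 10000, 76000)

print(f"Wald chi-square, n = 1,000:   {small_sample:.1f}")
print(f"Wald chi-square, n = 100,000: {large_sample:.1f}")
```

In an adjusted model the statistic will differ because the other covariates are held fixed, so this only approximates what the full model reports.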

#### Mean Joe

##### TS Contributor
I suspect your binary dependent variable is unevenly distributed in "success"/"failure", perhaps only in some cross-section of your data.

#### RamShan

##### New Member
A few notes:

1. I am not sure that it is unevenly distributed: its mean is 0.14 and its standard deviation is 0.3499 (I interpret this as close to evenly distributed; correct me if I am wrong).

2. It is populated 14% of the time (i.e., for every 100 records, 14 have this binary variable = 1).

3. There is no quasi-complete separation either.
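A quick check on the first note, as a minimal sketch in Python (the function name is made up for illustration): for a 0/1 variable the standard deviation is fully determined by the mean, `sqrt(p * (1 - p))`, so it carries no extra information about how evenly the variable is split. A mean of 0.14 corresponds to a 14% / 86% split, and an even split would have a mean of 0.5.

```python
import math


def bernoulli_sd(p):
    """Population standard deviation of a 0/1 variable with mean p."""
    return math.sqrt(p * (1 - p))


# Mean of 0.14 as reported in the note above.
print(f"SD at p = 0.14: {bernoulli_sd(0.14):.4f}")
# An evenly split binary variable, for comparison.
print(f"SD at p = 0.50: {bernoulli_sd(0.50):.4f}")
```

The value at 0.14 comes out close to 0.35, in line with the reported 0.3499 once rounding of the mean is allowed for.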

Let me know if you need any more stats.

Thanks!
Ram

#### RamShan

##### New Member
When I search for SAS logistic regression diagnostics, I frequently come across the "Influence" and "Lackfit" options. Do you think these will help me dive deeper into the data and come up with some answers to explain the high Wald chi-square?

#### noetsi

##### Loves R
Lackfit can be useful. From my very cursory review of Influence, it provides a tremendous amount of information and it is not easy to tell what is useful and what is not (unless you have a very deep statistical background, anyway). I believe it deals in part with Cook's D and leverage (or whatever the equivalents are in logistic regression).

You might want to look at Paul Allison's "Logistic Regression Using SAS" (2nd ed) which is a very good review of this topic in the context of SAS.
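For context on what Lackfit reports: it requests the Hosmer-Lemeshow goodness-of-fit test, which sorts records by predicted probability, splits them into groups, and compares observed with expected event counts in each group. The Python sketch below shows the idea only. The function name and the demo data are hypothetical, and SAS handles ties and group sizes somewhat differently, so the numbers will not match PROC LOGISTIC output exactly.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square statistic and degrees of freedom.

    y: observed 0/1 outcomes
    p: predicted event probabilities from a fitted model
    Assumes each group's mean predicted probability is strictly between 0 and 1.
    """
    # Sort by predicted probability only (stable sort keeps input order on ties).
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    used_groups = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        group_size = len(chunk)
        observed = sum(outcome for _, outcome in chunk)
        expected = sum(prob for prob, _ in chunk)
        mean_prob = expected / group_size
        statistic += (observed - expected) ** 2 / (
            group_size * mean_prob * (1 - mean_prob)
        )
        used_groups += 1
    return statistic, used_groups - 2


# Hypothetical data: observed event rates match the predictions in both groups.
demo_p = [0.2] * 10 + [0.8] * 10
demo_y = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
statistic, df = hosmer_lemeshow(demo_y, demo_p, groups=2)
print(statistic, df)
```

A large statistic relative to a chi-square distribution with the returned degrees of freedom suggests poor calibration. It says nothing directly about why one coefficient has a large Wald chi-square.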

#### RamShan

##### New Member
Yep, you are right: "Influence" generates a ton of statistics and it is hard to interpret. Let me take a look at the book you suggested on logistic regression and see what I can get from it.

Thanks!
Ram

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Why not just conclude that this is a highly associated variable?