Interpretation of Logistic Regression with Python, Pandas & Statsmodels

Hi guys,

I am new to regression analysis and I am trying to figure out how to interpret my results. I basically did a logit regression in Python and I am wondering how I can interpret the "coef" and "z-value" for example.

My analysis is about how the number of tweets, promos, fb_updates etc. affects whether a business ends up being successful (i.e. the final outcome is either 1 for success or 0 for failure).

>>> print(result.summary())

Logit Regression Results
Dep. Variable:    success             No. Observations:    2780
Model:            Logit               Df Residuals:        2776
Method:           MLE                 Df Model:            5
Date:             Tue, 23 May 2017    Pseudo R-squ.:       0.1952
Time:             21:48:54            Log-Likelihood:      -1665.6
converged:        True                LL-Null:             -2069.5
                                      LLR p-value:         2.202e-172

                 coef    std err          z      P>|z|     [0.025     0.975]
tweets         0.0022      0.000      4.903      0.000      0.001      0.003
fb_updates     0.2344      0.014     16.492      0.000      0.207      0.262

I did some research on what this means:
1. The minus under "coef" means there is an inverse relationship. The coef predicts the dependent variable from the independent variable. However, all my "coef" values are very low except for fb_updates - what does this mean?
2. The std error is low for all variables, so the parameters are statistically different from 0?? Not sure how to interpret this.
3. No idea what "z" means
4. My p-values are all under 0.01 except for promos, so they are all statistically significant except for promos?
5. [0.025 0.975] --> confidence interval for coefficient?

I also did the "odds ratio"
>>> print(np.exp(result.params))
tweets 1.002240
fb_updates 1.264199
faq 1.083017
promos 0.988451
images 1.017253
dtype: float64

I read a bunch of websites trying to figure out how to interpret the results, but I am still lost...any help would be appreciated!


Omega Contributor
You should slap confidence intervals on the odds ratios and I will show you how to interpret those results. Your post seems right on so far.
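
One way to get those intervals, sketched here using only the rounded numbers printed in the summary: exponentiate the coefficient and both ends of its [0.025, 0.975] interval. The name `summary_rows` is made up for this illustration. With the fitted model at hand, `np.exp(result.conf_int())` should give the same intervals at full precision for every variable.

```python
import math

# Rounded values copied from the summary table: (coef, ci_low, ci_high).
# Only the two rows shown in the post are included here.
summary_rows = {
    "tweets": (0.0022, 0.001, 0.003),
    "fb_updates": (0.2344, 0.207, 0.262),
}

# Exponentiating moves each value from the log-odds scale to the odds-ratio scale.
odds_ratios = {
    name: tuple(math.exp(value) for value in row)
    for name, row in summary_rows.items()
}

for name, (or_hat, ci_low, ci_high) in odds_ratios.items():
    print(f"{name}: OR = {or_hat:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

An interval that stays entirely above 1 (or entirely below 1) corresponds to a predictor that is significant at the 5% level.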


Omega Contributor
Can you define how these are formatted: tweets, promos, fb_updates? Are they continuous variables or categorical (binary)?
Yes, tweets, promos, fb_updates etc. are ALL continuous variables, e.g. there could be 0 tweets, or 10 tweets, or even 150 tweets. Does this make a difference when interpreting the results?


TS Contributor
As hlsmith said, your model seems to be OK. As for the interpretation: exp(coef) is the corresponding odds ratio, so if the coefs are low, this is not such a great problem.

In your case there might be another issue as well: your units might be too small. Take tweets, for example: the odds ratio for one additional tweet is very close to 1, since one tweet more or less makes very (very) little difference. To look at something of practical interest, you can calculate the odds ratio for an increase of, say, 50 tweets, which would be exp(50*0.0022)=1.116. So an increase of 50 tweets increases the odds by about 12%. Same logic for the other coefficients.
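
The rescaling arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `odds_ratio_for_increase` is made up here, and 0.0022 is the rounded tweets coefficient from the summary.

```python
import math

def odds_ratio_for_increase(coef, units):
    """Odds ratio for a `units`-sized increase in a predictor whose log-odds coefficient is `coef`."""
    return math.exp(coef * units)

# Rounded tweets coefficient from the summary, scaled to a 50-tweet increase.
or_50_tweets = odds_ratio_for_increase(0.0022, 50)

# Percent change in the odds implied by that odds ratio.
pct_change = (or_50_tweets - 1) * 100

print(f"OR for +50 tweets: {or_50_tweets:.3f} ({pct_change:.1f}% higher odds)")
```

Note that the effect is multiplicative: the odds ratio for 100 tweets is the 50-tweet odds ratio squared, not doubled.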



Omega Contributor
Each coefficient is the change in the log odds of the outcome (success) for a 1-unit increase in the variable, holding the other variables fixed. The SEs measure the variability of each estimate across its sampling distribution; they are used in the statistical tests and confidence intervals.
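
To show how the SE feeds into the "z", "P>|z|" and interval columns, here is a rough sketch of the usual Wald calculation: z is the coefficient divided by its standard error, the p-value is the two-sided tail of a standard normal, and the 95% interval is coef ± 1.96 × SE. The helper name `wald_summary` is made up, and the inputs are the rounded fb_updates values from the table, so z comes out a little different from the printed 16.492 (presumably because the table rounds std err to three decimals).

```python
import math

def wald_summary(coef, std_err, z_crit=1.96):
    """Return (z, two-sided p-value, ci_low, ci_high) for one coefficient."""
    z = coef / std_err
    # Two-sided p-value under a standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci_low = coef - z_crit * std_err
    ci_high = coef + z_crit * std_err
    return z, p_value, ci_low, ci_high

# Rounded fb_updates row from the summary table.
z, p_value, ci_low, ci_high = wald_summary(0.2344, 0.014)

print(f"z = {z:.2f}, p = {p_value:.3g}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

A large |z| means the estimate sits many standard errors away from 0, which is what produces the tiny p-value.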

Odds of business success are 0.2% (95% CI: 0.1%, 0.3%) greater for every tweet. So you would likely want to change units to something other than 1 unit if it represents a tweet.
That is very helpful! One more question: How would I interpret an odds ratio under 1?

promos 0.988451
1 - 0.988451 = 0.011549

Odds of business success decrease by 1.15% for every promo? Does that seem right?
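
The same conversion works for every odds ratio, above or below 1: subtract 1 and multiply by 100 to get the percent change in the odds per one-unit increase. A small sketch using the odds ratios printed earlier (the helper name `pct_change_in_odds` is made up for illustration):

```python
def pct_change_in_odds(odds_ratio):
    """Percent change in the odds for a one-unit increase; negative means lower odds."""
    return (odds_ratio - 1) * 100

# Odds ratios copied from the np.exp(result.params) output.
odds_ratios = {
    "tweets": 1.002240,
    "fb_updates": 1.264199,
    "faq": 1.083017,
    "promos": 0.988451,
    "images": 1.017253,
}

pct_changes = {name: pct_change_in_odds(value) for name, value in odds_ratios.items()}

for name, pct in pct_changes.items():
    print(f"{name}: {pct:+.2f}% change in odds per one-unit increase")
```

This only describes the point estimate; whether the change is distinguishable from zero still depends on the p-value or confidence interval.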


Omega Contributor
If significant, odds ratios below 1 are associated with the "failure" outcome group. For example: people who eat carrots have X times lower odds of heart disease. So as promos go up your odds of success go down, BUT that is a non-significant predictor, so you can't say whether the odds go up or down given your model and controlling for other variables.