
Thread: Interpretation of Logistic Regression with Python, Pandas & Statsmodel

  1. #1

    Interpretation of Logistic Regression with Python, Pandas & Statsmodel




    Hi guys,

    I am new to regression analysis and I am trying to figure out how to interpret my results. I basically did a logit regression in Python and I am wondering how I can interpret the "coef" and "z-value" for example.

    My analysis is about how the number of tweets, promos, fb_updates etc. affects whether a business ends up being successful (i.e. the final outcome is either 1 for success or 0 for failure).

    >>> print(result.summary())

                               Logit Regression Results
    ==============================================================================
    Dep. Variable:                success   No. Observations:                 2780
    Model:                          Logit   Df Residuals:                     2776
    Method:                           MLE   Df Model:                            5
    Date:                Tue, 23 May 2017   Pseudo R-squ.:                  0.1952
    Time:                        21:48:54   Log-Likelihood:                -1665.6
    converged:                       True   LL-Null:                       -2069.5
                                            LLR p-value:                2.202e-172
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    tweets         0.0022      0.000      4.903      0.000       0.001       0.003
    fb_updates     0.2344      0.014     16.492      0.000       0.207       0.262
    faq            0.0798      0.017      4.704      0.000       0.047       0.113
    promos        -0.0116      0.010     -1.178      0.239      -0.031       0.008
    images         0.0171      0.005      3.232      0.001       0.007       0.027
    ==============================================================================

    I did some research on what this means:
    1. The minus under "coef" means there is an inverse relationship. The coef predicts the dependent variable from the independent variable. However, all my "coef" values are very low except for fb_updates - what does this mean?
    2. The std error is low for all variables, so the parameters are statistically different from 0?? Not sure how to interpret this.
    3. No idea what "z" means
    4. My P-values are all under 0.01 except for promos...so they are all statistically significant except for Promos?
    5. [0.025 0.975] --> confidence interval for coefficient?


    I also did the "odds ratio"
    >>> print(np.exp(result.params))
    tweets 1.002240
    fb_updates 1.264199
    faq 1.083017
    promos 0.988451
    images 1.017253
    dtype: float64

    I read a bunch of websites trying to figure out how to interpret the results, but I am still lost...any help would be appreciated!

  2. #2
    hlsmith

    Re: Interpretation of Logistic Regression with Python, Pandas & Statsmodel

    You should slap confidence intervals on the odds ratios and I will show you how to interpret those results. Your post seems right on so far.
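
A minimal sketch of what confidence intervals on the odds ratios could look like, using only the rounded coefficients and interval bounds printed in the summary table. Exponentiating a coefficient and both ends of its interval moves everything from the log-odds scale to the odds-ratio scale. The promos values are entered as negative here, which is an assumption based on its odds ratio being below 1, and `summary_rows` is just an illustrative name. With the fitted statsmodels result available, `np.exp(result.conf_int())` should give the unrounded equivalent.

```python
import math

# (coef, lower 2.5% bound, upper 97.5% bound), copied from the rounded summary table.
# promos is entered with negative signs: an assumption based on its reported
# odds ratio of 0.988451 being below 1.
summary_rows = {
    "tweets":     (0.0022, 0.001, 0.003),
    "fb_updates": (0.2344, 0.207, 0.262),
    "faq":        (0.0798, 0.047, 0.113),
    "promos":     (-0.0116, -0.031, 0.008),
    "images":     (0.0171, 0.007, 0.027),
}

# Exponentiate the log-odds values to get odds ratios and their intervals.
odds_ratios = {
    name: tuple(math.exp(value) for value in row)
    for name, row in summary_rows.items()
}

for name, (ratio, low, high) in odds_ratios.items():
    print(f"{name:<11} OR = {ratio:.3f}  (95% CI: {low:.3f}, {high:.3f})")
```

An interval that sits entirely above 1 points to higher odds of success, and one that contains 1 (as for promos) is compatible with no effect.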
    Stop cowardice, ban guns!

  3. #3

    Re: Interpretation of Logistic Regression with Python, Pandas & Statsmodel

    What does it mean when my coef is low? And what does the std error indicate?

  4. #4
    hlsmith

    Re: Interpretation of Logistic Regression with Python, Pandas & Statsmodel

    Can you define how these are formatted: tweets, promos, fb_updates? Are they continuous variables or categorical (binary)?

  5. #5

    Re: Interpretation of Logistic Regression with Python, Pandas & Statsmodel

    Yes, tweets, promos, fb_updates etc. are ALL continuous variables, e.g. there could be 0 tweets, or 10 tweets or even 150 tweets. Does this make a difference when interpreting the results?

  6. #6
    rogojel

    Re: Interpretation of Logistic Regression with Python, Pandas & Statsmodel

    hi,
    as hlsmith said, your model seems to be ok. As for the interpretation: exp(coef) is the corresponding odds ratio, so low coefs are not such a great problem.

    In your case there might be another issue as well: your units might be too small. E.g. for tweets, the odds ratio for an increase of one tweet is very close to 1, as one tweet more or less makes very (very) little difference. To look at something of practical interest you can calculate the odds ratio for an increase of 50 tweets, say, which would be exp(50*0.0022)=1.116. So, an increase of 50 tweets increases the odds by about 12%. Same logic for the other coefficients.
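
The rescaling can be sketched in a few lines. This assumes the rounded tweets coefficient of 0.0022 from the summary table; the helper name `odds_ratio_for_increase` is made up for illustration. An odds ratio of about 1.116 corresponds to odds roughly 11.6% higher.

```python
import math

def odds_ratio_for_increase(coef, units):
    """Odds ratio implied by a logit coefficient for an increase of `units`."""
    return math.exp(coef * units)

tweets_coef = 0.0022  # rounded value from the summary table

or_1 = odds_ratio_for_increase(tweets_coef, 1)
or_50 = odds_ratio_for_increase(tweets_coef, 50)

# (OR - 1) * 100 turns an odds ratio into a percent change in the odds.
print(f"+1 tweet:   OR = {or_1:.4f}  ({(or_1 - 1) * 100:.1f}% change in odds)")
print(f"+50 tweets: OR = {or_50:.4f}  ({(or_50 - 1) * 100:.1f}% change in odds)")
```

Note that the effect multiplies rather than adds: 50 tweets gives exp(0.0022) raised to the 50th power, not 50 times 0.22%.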

    regards

  7. #7
    hlsmith

    Re: Interpretation of Logistic Regression with Python, Pandas & Statsmodel

    The coefficients are the change in the log odds of the outcome (success) for a 1 unit increase in the variable. SEs measure the variability of each estimate over its sampling distribution; they are used in the statistical tests and confidence intervals.
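
As a rough sketch of how the SE feeds into the test and the interval: the "z" column is the coefficient divided by its standard error, the p-value is the two-sided tail of a standard normal at that z, and the 95% interval is approximately coef ± 1.96 × SE. The helper name `wald_summary` is made up, and the inputs are the rounded values from the summary table (promos assumed negative), so the results will only approximately match the printed columns.

```python
import math

def wald_summary(coef, std_err):
    """z statistic, two-sided p-value and 95% interval for one coefficient."""
    z = coef / std_err
    # Standard normal CDF evaluated at |z|, via the error function.
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - phi)
    half_width = 1.96 * std_err
    return z, p_value, (coef - half_width, coef + half_width)

# Rounded values from the summary table; the sign of promos is an assumption.
for name, coef, se in [("fb_updates", 0.2344, 0.014), ("promos", -0.0116, 0.010)]:
    z, p, (low, high) = wald_summary(coef, se)
    print(f"{name:<11} z = {z:.2f}  p = {p:.3f}  95% CI = ({low:.3f}, {high:.3f})")
```

A large |z| means the estimate is many standard errors away from 0, which is why fb_updates is clearly significant and promos is not.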


    Odds of business success are 0.2% (95% CI: 0.1%, 0.3%) greater for every additional tweet. So you would likely want to change the units to something other than 1 if a unit represents a single tweet.

  8. #8

    Re: Interpretation of Logistic Regression with Python, Pandas & Statsmodel

    That is very helpful! One more question: How would I interpret an odds ratio under 1?

    promos 0.988451
    1 - 0.988451 = 0.011549

    Odds of business success decrease by 1.15% for every promo? Does that seem right?

  9. #9
    hlsmith

    Re: Interpretation of Logistic Regression with Python, Pandas & Statsmodel


    If significant, odds ratios below 1 mean that higher values of the predictor are associated with the "failure" outcome group. For example: people who eat carrots have X times lower odds of heart disease. So as promos go up your odds of success go down, BUT promos is a non-significant predictor, so given your model and controlling for the other variables you cannot say whether the odds go up or down.
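
A small sketch of both points for promos: the percent change implied by the odds ratio, and a check of whether the 95% interval on the odds-ratio scale contains 1. The interval bounds are the rounded values from the summary table with the lower bound taken as negative (an assumption consistent with the odds ratio below 1), and `percent_change_in_odds` is a made-up helper name.

```python
import math

def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percent change in the odds."""
    return (odds_ratio - 1.0) * 100.0

promos_or = 0.988451  # from np.exp(result.params)

# 95% interval bounds on the log-odds scale, exponentiated to the OR scale.
# The negative lower bound is an assumption, see the note above.
ci_low, ci_high = math.exp(-0.031), math.exp(0.008)

change = percent_change_in_odds(promos_or)
spans_one = ci_low < 1.0 < ci_high

print(f"Point estimate: {change:.2f}% change in odds per promo")
print(f"95% CI for OR: ({ci_low:.3f}, {ci_high:.3f}); includes 1: {spans_one}")
```

Because the interval includes 1, the data are compatible with promos slightly lowering, slightly raising, or not changing the odds.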
