Thread: Logistic Regression

  1. #1

    Logistic Regression


    I'm currently working on a churn modelling exercise that uses logistic regression. The challenge I'm facing is that the model is not returning good results. My hypothesis is that the churn rate in my data set is too low for a logistic regression to pick up anything meaningful.

    Some context:
    - The sample population is 250,000, with about 0.3% churners (approx. 700) each month over a 3-month period
    - Predictors include education, marriage, gender, income level etc., most of which require dummy coding

    My questions are therefore:
    - Is the number of churners in the data set too small for any meaningful logistic regression?
    - If that's the case, would it be OK to remove some of the non-churner records from the sample population to create a data set that contains a larger proportion of churners?
    - Is there an ideal ratio of churners to non-churners within the data set that allows for a meaningful logistic regression run?

    Help much appreciated!

  2. #2
    rogojel

    Re: Logistic Regression

    I think you could use the idea of a case-control study.

    Others here might be able to help you set this up correctly.
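As a rough illustration of the case-control idea applied to this churn problem, the sketch below keeps every churner and a random fraction of the non-churners, then shows how predictions from a model fitted on that subsample could be mapped back to the full population. It relies on the standard result that sampling non-events at rate f leaves the slope coefficients unchanged but shifts the intercept by -ln(f). The function names, the 700/249,300 split and the 2% keep fraction are assumptions made for the example, not anything stated in the thread, and the model fitting itself is left out.

```python
import math
import random


def downsample_non_events(labels, keep_fraction, seed=0):
    """Return indices of all events (label 1) plus a random share of non-events (label 0)."""
    rng = random.Random(seed)
    kept = []
    for i, y in enumerate(labels):
        # Every churner is kept; each non-churner is kept with probability keep_fraction.
        if y == 1 or rng.random() < keep_fraction:
            kept.append(i)
    return kept


def correct_intercept(intercept_sub, keep_fraction):
    """Shift an intercept fitted on the subsample back to the population scale."""
    # Subsample odds are population odds divided by keep_fraction,
    # so the fitted intercept is too high by -ln(keep_fraction).
    return intercept_sub + math.log(keep_fraction)


def correct_probability(p_sub, keep_fraction):
    """Convert a probability predicted on the subsample to a population probability."""
    odds_pop = p_sub / (1.0 - p_sub) * keep_fraction
    return odds_pop / (1.0 + odds_pop)


# Hypothetical data: 700 churners among 250,000 customers, keeping 2% of non-churners.
labels = [1] * 700 + [0] * 249300
kept = downsample_non_events(labels, keep_fraction=0.02, seed=1)
n_events = sum(labels[i] for i in kept)
print("events kept:", n_events, "non-events kept:", len(kept) - n_events)
print("p=0.5 in the subsample maps to", correct_probability(0.5, 0.02))
```

Without the correction, predicted churn probabilities from the subsample model would be far too high for the full customer base, although the ranking of customers by risk would be unaffected.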


  3. #3
    Karabiner

    Re: Logistic Regression

    I guess that this is a "logistic regression for rare events" problem.
    I don't know whether a rate of 0.3% is ever a problem in itself,
    but I do know that the number of events, at least, is crucial for
    logistic regression. As long as you do not use too many
    covariates in your model, you'll probably be ok with n=700
    (or 3*700?) events. http://statisticalhorizons.com/logis...or-rare-events.
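To make the "number of events versus number of covariates" point concrete, here is a small sketch that counts the coefficients produced by dummy coding and divides the number of events by that count. The level counts per predictor are invented for illustration, since the thread does not give them. A commonly cited rule of thumb asks for roughly 10 or more events per estimated coefficient, though that figure is a heuristic rather than a hard threshold.

```python
def dummy_parameter_count(levels_per_predictor):
    """Coefficients needed after dummy coding: k - 1 per predictor with k levels."""
    return sum(k - 1 for k in levels_per_predictor.values())


def events_per_parameter(n_events, n_parameters):
    """Average number of events available per estimated coefficient."""
    return n_events / n_parameters


# Hypothetical level counts; the real data set may differ.
levels = {"education": 5, "marriage": 3, "gender": 2, "income_level": 6}
n_params = dummy_parameter_count(levels)  # 4 + 2 + 1 + 5 = 12

for n_events in (700, 3 * 700):
    ratio = events_per_parameter(n_events, n_params)
    print(f"{n_events} events / {n_params} coefficients = {ratio:.1f} events per coefficient")
```

With these assumed level counts, either 700 or 2,100 events sits comfortably above the rule of thumb, which is in line with the suggestion that the event count is probably adequate as long as the covariate list stays modest.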

    Just my 2pence

