
Thread: Logistic Regression Problem

  1. #1

    Question Logistic Regression Problem




    Hi!

    My colleague is developing a logistic regression to predict the probability of a customer taking up a particular credit card.

    Let's call the credit card the gold card, so his dependent variable on the left side is GoldCard = 0,1. On the right side of his equation he is also including an overall credit card indicator that equals 1 if the customer has ANY credit card. Therefore, for any observation where GoldCard=1, the overall credit card indicator also equals 1.

    It seems to me that this would result in an artificially high accuracy because the two variables are perfectly correlated. A comparable example would be developing a model that predicts gender and then including a gender variable on the right side. And you'd never do that, right?

    Anyways, I'm pretty sure it's incorrect to include the overall card indicator but I'm not knowledgeable or articulate enough to explain why. Any info/insight would be greatly appreciated.

    Thanks,
    Ben

  2. #2
    Fortran must die
    noetsi

    Re: Logistic Regression Problem

    The two variables won't be perfectly correlated. That would only be true if there were only one credit card available. Many of the zeros on the left side will be ones on the right side (that is, on the predictor variable); these reflect people who had a credit card other than the gold card. If the variables were perfectly correlated, then there would be no valid regression equation and many software packages would refuse to run (this is perfect collinearity).
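
    A minimal sketch of this point, using made-up counts (the numbers and names such as `counts` and `phi_coefficient` are hypothetical, not taken from the campaign data). Cross-tabulating the any-card indicator against GoldCard leaves one empty cell (no card but a gold card), yet the correlation stays well below 1 because many cardholders do not hold the gold card.

```python
from math import sqrt

# Hypothetical customer counts, keyed by (any_card, gold_card).
# The (0, 1) cell is empty by construction: a gold-card holder always has a card.
counts = {(0, 0): 400, (0, 1): 0, (1, 0): 500, (1, 1): 100}

def phi_coefficient(counts):
    """Pearson correlation between two 0/1 variables, computed from a 2x2 table."""
    n00, n01 = counts[(0, 0)], counts[(0, 1)]
    n10, n11 = counts[(1, 0)], counts[(1, 1)]
    row0, row1 = n00 + n01, n10 + n11   # totals for any_card = 0 and 1
    col0, col1 = n00 + n10, n01 + n11   # totals for gold_card = 0 and 1
    return (n11 * n00 - n10 * n01) / sqrt(row0 * row1 * col0 * col1)

phi = phi_coefficient(counts)
print(f"phi = {phi:.3f}")
```

    With these assumed counts the correlation is positive but far from 1. As a side note, a predictor that perfectly predicts the outcome is usually described as complete separation, while collinearity refers to relationships among predictors; the empty cell in a table like this one is the milder quasi-complete case.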

    I would agree that it might artificially inflate the prediction (it might also reduce variability, which causes attenuation of the slope in extreme cases). But the real question, I think, is not the methods issue; it is the theoretical reason for including this variable. Obviously, people who have chosen to hold a credit card are more likely to hold one specific card than people who hold none. But what does that tell you? It is a bit like predicting eating by using whether one is hungry as a predictor. It might predict it, but you have learned very little.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  3. #3

    Re: Logistic Regression Problem

    Thanks for the input, noetsi.

    You're right that it's not perfect collinearity. We're back-testing the model on previous campaigns right now, so it'll be interesting to see whether it predicts enough of the actual responses, or whether it only picks up the small percentage of customers in the campaign who already had a card and took the gold card as their second card.

  4. #4
    Fortran must die
    noetsi

    Re: Logistic Regression Problem

    I would be interested in hearing what you find, including any methods issues you run into. I have spent a lot of time working with logistic regression in the context of SAS, but I rarely get to use it or see it applied to practical issues.

    One thing of note: you will probably get faster responses here if you post this type of thread in the regression forum rather than the probability forum.

  5. #5

    Re: Logistic Regression Problem

    I ran the model against a previous campaign and it did a terrible job of discerning take-up. It had essentially the same response rate in the 0.00 - 0.10 score range as in the 0.90 - 1.00 range.
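
    For anyone wanting to run the same kind of check, here is a rough sketch of a response-rate-by-score-band table. The data and the helper name `response_rate_by_band` are made up for illustration; they are not the actual campaign results.

```python
from collections import defaultdict

def response_rate_by_band(scores, outcomes, n_bands=10):
    """Return {band_index: (n_customers, response_rate)} for equal-width score bands."""
    totals = defaultdict(int)
    responders = defaultdict(int)
    for score, outcome in zip(scores, outcomes):
        # A score of exactly 1.0 is placed in the top band.
        band = min(int(score * n_bands), n_bands - 1)
        totals[band] += 1
        responders[band] += outcome
    return {b: (totals[b], responders[b] / totals[b]) for b in sorted(totals)}

# Hypothetical back-test data: model score and actual take-up (1 = took the card).
scores = [0.05, 0.08, 0.02, 0.07, 0.95, 0.91, 0.99, 0.93]
outcomes = [0, 1, 0, 0, 0, 1, 0, 0]

n_bands = 10
table = response_rate_by_band(scores, outcomes, n_bands)
for band, (n, rate) in table.items():
    low, high = band / n_bands, (band + 1) / n_bands
    print(f"{low:.2f} - {high:.2f}: n={n}, response rate={rate:.2f}")
```

    A model that discriminates well should show the response rate climbing from the lowest band to the highest. In this toy data the bottom and top bands have the same rate, which is the pattern described above.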

  6. #6
    TS Contributor
    rogojel

    Re: Logistic Regression Problem


    hi,
    I think, if I understood the case correctly, that it only makes sense to look at customers with the value 1 in the ANY credit card column. I mean, even a turtle can predict that if a customer has NO credit card at all then he does not have a Gold card either.

    But in this case you get a column of 1s which will not help in the prediction at all.
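
    A tiny sketch of what I mean, with invented rows (the names `customers`, `cardholders` and `any_card_column` are just for illustration): once you keep only the customers who hold some card, the indicator column has zero variance.

```python
from statistics import pvariance

# Hypothetical rows: (any_card, gold_card)
customers = [(0, 0), (1, 0), (1, 1), (0, 0), (1, 0), (1, 1)]

# Keep only customers who already hold some card.
cardholders = [row for row in customers if row[0] == 1]
any_card_column = [row[0] for row in cardholders]

# A constant column has zero variance, so it duplicates the intercept
# and cannot help distinguish gold-card holders from the rest.
print(pvariance(any_card_column))
```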

    Am I missing something here?

    regards
    rogojel
