
Thread: Logistic regression - different cutpoints for classification and probability?

  1. #1
    I was reading an article on logistic regression tonight, and I noticed the following:

    "Logistic regression can be used to classify observations as events or nonevents
    as was done in discriminant classification analysis. ... To use this information you would search through the classification table to find the probability cut-off point that produces the best classification performance. In our example a probability of .22 to .28 produces rules that have the highest overall successful classification rate."

    Are they saying that it could possibly be appropriate to say: your probability of having cancer is 25% by our model; however, based on our cutpoint, we believe you have cancer.

    I'm confused, because I was of the opinion that you should be consistent with your classifier - that is, if you change your cutpoint to like 25%, then you should rescale your probabilities around this point.
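    To make the distinction concrete, here is a minimal sketch (the coefficients are made up, not from any real fit). The model's predicted probability is one output; the classification at a chosen cutpoint is a separate, second step:

```python
import math

def predict_prob(x, beta0, beta1):
    """Predicted probability from a fitted logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Illustrative coefficients, not from any real fit
p = predict_prob(2.0, beta0=-2.0, beta1=0.5)  # p ≈ 0.27

# The probability statement and the classification are separate outputs:
for cutoff in (0.5, 0.25):
    label = "event" if p >= cutoff else "nonevent"
    print(f"p = {p:.2f}, cutoff = {cutoff:.2f} -> classified as {label}")
```

    Note that lowering the cutoff to 0.25 flips the classification to "event" while the stated probability stays at 0.27 — which is exactly the situation the article seems to describe.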

  2. #2
    vinux
    Quote Originally Posted by james
    I'm confused, because I was of the opinion that you should be consistent with your classifier - that is, if you change your cutpoint to like 25%, then you should rescale your probabilities around this point.
    The cut-off point depends on the event rate (event %).
    Cut-offs around this rate produce the rules with the highest overall successful classification rate.
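    The search the article describes can be sketched like this (simulated data with illustrative coefficients; the true model probabilities stand in for a fitted model's output). You walk the classification table over a grid of cutoffs and keep the one with the best overall rate:

```python
import math
import random

random.seed(0)

# Simulated data; coefficients are illustrative. Each observation is scored
# with its true model probability, standing in for a fitted model's output.
def simulate(n=2000, beta0=-2.0, beta1=1.0):
    data = []
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        data.append((p, 1 if random.random() < p else 0))
    return data

data = simulate()
event_rate = sum(y for _, y in data) / len(data)

# Overall correct-classification rate at a given cutoff
def overall_rate(data, cutoff):
    return sum((p >= cutoff) == (y == 1) for p, y in data) / len(data)

# Walk the classification table over a grid of cutoffs
best_rate, best_cutoff = max(
    (overall_rate(data, c / 100.0), c / 100.0) for c in range(5, 96)
)
print(f"event rate {event_rate:.2f}; "
      f"best overall rate {best_rate:.3f} at cutoff {best_cutoff:.2f}")
```

    Which cutoff wins depends on how "best performance" is scored: with well-calibrated probabilities, plain overall accuracy tends to favor cutoffs near 0.5, while criteria that balance sensitivity and specificity tend to push the cutoff toward the event rate.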
    In the long run, we're all dead.

  3. #3

    Quote Originally Posted by vinux
    The cut-off point depends on the event rate (event %).
    Cut-offs around this rate produce the rules with the highest overall successful classification rate.
    Now, what you say makes perfect sense to me. But it seems that you have to make a trade-off: either set your cutpoint so that your relative odds work out appropriately, or set your cutpoint to maximize your classification rate.

    By maximizing your "relative odds", I mean that you could optimize for the "correctness" of the odds. So the best model would be one in which 1 out of every 10 packets given a 10% chance of being 'signal' is truly signal; 2 out of 10 packets given an 80% chance of being 'signal' are truly not signal; and so on.

    But it seems that by doing that, you don't simultaneously optimize your cutpoint for maximal classification. Empirically, I have demonstrated this to myself with dozens of models fit to my data. I can have either excellent binary classification or good "meaning" in the predictions (again, where a 20% prediction really corresponds to 2 in 10 of those packets being signal), but not both.
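    A toy illustration of the two criteria (the probabilities and labels below are made up, chosen only to mimic the trade-off): classification accuracy at a cutoff, versus how well the predicted probabilities track observed event rates within bins:

```python
# Toy fitted probabilities and true labels (made up for illustration)
probs  = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.45, 0.60, 0.75, 0.90]
labels = [0,    0,    0,    0,    1,    1,    1,    1,    1,    1]

def accuracy(probs, labels, cutoff):
    """Overall correct-classification rate at a given cutoff."""
    return sum((p >= cutoff) == (y == 1)
               for p, y in zip(probs, labels)) / len(probs)

def calibration_table(probs, labels, edges=(0.0, 0.25, 0.50, 0.75, 1.01)):
    """Mean predicted probability vs. observed event rate per bin."""
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        idx = [i for i, p in enumerate(probs) if lo <= p < hi]
        if idx:
            rows.append((lo, hi,
                         sum(probs[i] for i in idx) / len(idx),
                         sum(labels[i] for i in idx) / len(idx)))
    return rows

print("accuracy at 0.50:", accuracy(probs, labels, 0.50))  # 0.7
print("accuracy at 0.25:", accuracy(probs, labels, 0.25))  # 0.9
for lo, hi, mean_p, obs in calibration_table(probs, labels):
    print(f"bin [{lo:.2f}, {hi:.2f}): "
          f"mean predicted {mean_p:.2f}, observed rate {obs:.2f}")
```

    In this toy data the 0.25 cutoff classifies better than 0.50, yet the middle bins are poorly calibrated (mean prediction around 0.38 where the observed rate is 1.0) — one shape the trade-off you describe can take.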
