Thread: R caret package: problems understanding the functions in confusionMatrix

  1. #1
    Hey,

    I posted earlier about the problematic interpretation of the PPV (positive predictive value). As there seems to be no easy solution for that, I tried to understand what the PPV can tell me and, especially, what it cannot.

    Now I was calculating confusion matrices for the predictions of the model I built. While doing that I came across the caret package, which gives you, along with the confusion matrix, measures such as sensitivity, specificity and PPV.

    The problem I have now is with the formulas for calculating PPV, precision, sensitivity and recall. (The formulas from the documentation are quoted at the end of this post.)

    The first problem is with PPV and precision:

    I learned to calculate the PPV as

    true positives / (true positives + false positives)

    The package calls this formula precision (which I always thought was the same as PPV) and calculates the PPV from sensitivity, specificity and prevalence.

    In my case the prevalence is taken from the model data itself, so PPV and precision come out identical.
    So it looks to me as if precision and PPV only differ if I use the actual prevalence in the population rather than the prevalence in my sample.
    Is this always the case when someone refers to these two measures?

    The second problem is with sensitivity and recall:

    The formulas for sensitivity and recall appear to be identical. Is there any difference between these two measures?

    Could someone maybe help me out of this confusion?
    I never really got the hang of statistics, and the longer I try to sort this problem out, the more confusing it gets.

    Thanks a lot!


    From documentation of caret package:

    The functions requires that the factors have exactly the same levels.
    For two class problems, the sensitivity, specificity, positive predictive value and negative predictive
    value is calculated using the positive argument. Also, the prevalence of the "event" is computed
    from the data (unless passed in as an argument), the detection rate (the rate of true events also
    predicted to be events) and the detection prevalence (the prevalence of predicted events).

    Suppose a 2x2 table with notation

                        Reference
    Predicted     Event     No Event
    Event         A         B
    No Event      C         D

    The formulas used here are:

    Sensitivity = A/(A + C)
    Specificity = D/(B + D)
    Prevalence = (A + C)/(A + B + C + D)
    PPV = (sensitivity * prevalence)/((sensitivity * prevalence) + ((1 - specificity) * (1 - prevalence)))
    NPV = (specificity * (1 - prevalence))/(((1 - sensitivity) * prevalence) + ((specificity) * (1 - prevalence)))
    Detection Rate = A/(A + B + C + D)
    Detection Prevalence = (A + B)/(A + B + C + D)
    Balanced Accuracy = (sensitivity + specificity)/2
    Precision = A/(A + B)
    Recall = A/(A + C)
    F1 = (1 + beta^2) * precision * recall/((beta^2 * precision) + recall)
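
To check the quoted formulas by hand, here is a minimal Python sketch that simply transcribes them. It is not caret's implementation; the function name `confusion_measures`, its arguments and the example counts are made up for illustration. With the prevalence taken from the table itself, PPV and precision should agree, and sensitivity and recall are the same expression.

```python
import math


def confusion_measures(a, b, c, d, prevalence=None, beta=1.0):
    """Transcribe the quoted formulas for a 2x2 table.

    a: predicted event,    reference event     (true positives)
    b: predicted event,    reference no event  (false positives)
    c: predicted no event, reference event     (false negatives)
    d: predicted no event, reference no event  (true negatives)
    """
    n = a + b + c + d
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    # Default: prevalence computed from the table, as the quoted text describes.
    if prevalence is None:
        prevalence = (a + c) / n
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    )
    npv = (specificity * (1 - prevalence)) / (
        (1 - sensitivity) * prevalence + specificity * (1 - prevalence)
    )
    precision = a / (a + b)
    recall = a / (a + c)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "prevalence": prevalence,
        "ppv": ppv,
        "npv": npv,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }


# Hypothetical counts, chosen only for illustration.
m = confusion_measures(a=40, b=10, c=20, d=130)
print(math.isclose(m["ppv"], m["precision"]))  # PPV vs. precision, sample prevalence
print(m["sensitivity"] == m["recall"])         # same formula, two names
```

Passing a different `prevalence` value to this sketch changes `ppv` and `npv` but leaves `precision` and `recall` untouched, which is the only place the two pairs of terms can diverge.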

  2. #2
    hlsmith

    Please rephrase your question a little if I don't exactly answer it.


    Yeah, I believe you have everything right. Terms such as recall and precision come from the classification and machine learning fields. They are just synonyms for the terms you are used to, and since the caret package also covers other machine learning procedures, I believe it defaults to those terms.


    Not sure why they don't just use A / (A + B) for the PPV, as they do for precision, but from a superficial look at the formula above it seems fine.
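
For what it is worth, the two expressions can be checked against each other on paper. The following is a short derivation sketch, assuming the prevalence is computed from the same 2x2 table, using the A, B, C, D notation from the quoted documentation and writing N = A + B + C + D:

```latex
\begin{align*}
\text{sensitivity} \cdot \text{prevalence}
  &= \frac{A}{A + C} \cdot \frac{A + C}{N} = \frac{A}{N}, \\
(1 - \text{specificity}) \cdot (1 - \text{prevalence})
  &= \frac{B}{B + D} \cdot \frac{B + D}{N} = \frac{B}{N}, \\
\text{PPV}
  &= \frac{A/N}{A/N + B/N} = \frac{A}{A + B} = \text{precision}.
\end{align*}
```

So with the sample prevalence the longer formula collapses to A / (A + B); it only gives a different number when a prevalence from outside the table is plugged in.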

  3. #3
    Hey,

    Thanks a lot. Sorry for the late reply, I was on holiday.

    Yes, it is actually a machine learning package, so it seems that I have to get used to the different terms.

    The only thing I am still wondering about is the formula for PPV and precision.

    I learned the formula (the two being the same measure) as:

    PPV = A/(A + B)

    but it seems that the formula they use is also commonly used for calculating the PPV:

    PPV = (sensitivity * prevalence)/((sensitivity * prevalence) + ((1 - specificity) * (1 - prevalence)))

    Is there any rule for when to use which formula?

    Thank you for your reply!
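
One way to look at the question of which formula to use is to try both on a sample whose prevalence differs from that of the target population. The sketch below is only an illustration under stated assumptions: the helper `ppv_from_rates`, the counts and the 5% population prevalence are all hypothetical. It suggests that A/(A + B) describes the sample at hand, while the longer formula lets you carry sensitivity and specificity over to a population with a different prevalence.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """PPV via the sensitivity/specificity/prevalence formula quoted above."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical balanced sample: 100 events and 100 non-events.
a, b, c, d = 90, 20, 10, 80
sensitivity = a / (a + c)
specificity = d / (b + d)
sample_prevalence = (a + c) / (a + b + c + d)

# With the sample prevalence the formula matches A / (A + B).
ppv_sample = ppv_from_rates(sensitivity, specificity, sample_prevalence)

# Assumed population prevalence of 5% (an invented figure for illustration).
ppv_population = ppv_from_rates(sensitivity, specificity, 0.05)

print(round(ppv_sample, 3), round(a / (a + b), 3))
print(round(ppv_population, 3))
```

In this made-up case the PPV drops sharply once the lower population prevalence is used, even though sensitivity and specificity are unchanged.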
