I posted earlier about the problematic interpretation of the PPV (positive predictive value). Since there seems to be no easy solution to that, I tried to understand what the PPV can tell me and, especially, what it cannot.

I was recently calculating confusion matrices for predictions of a model I built. Doing so, I came across the caret package, which conveniently reports, along with the confusion matrix, measures such as sensitivity, specificity, and PPV.

My problem now is with the formulas used to calculate PPV, precision, sensitivity, and recall. (The actual descriptions of the formulas are quoted at the end of this post.)

The first problem is with PPV and precision:

I learned the calculation of the PPV as

true positives / (true positives + false positives)

The package calls this formula precision (which I always thought was the same thing as PPV) and instead calculates the PPV from sensitivity, specificity, and prevalence.

In my case the prevalence is taken from the model data itself, so PPV and precision come out identical.

So it looks to me as if precision and PPV only differ when I use the actual prevalence of the population rather than the prevalence of my sample.

Is this always the distinction people mean when they refer to these two measures?
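To check my reasoning numerically, here is a small Python sketch (the counts are made up for illustration) comparing caret's PPV formula against the textbook one I learned:

```python
# Hypothetical confusion-matrix counts: A = TP, B = FP, C = FN, D = TN.
A, B, C, D = 40, 10, 20, 130
n = A + B + C + D

sensitivity = A / (A + C)
specificity = D / (B + D)
prevalence = (A + C) / n  # sample prevalence, which caret uses by default

# PPV computed from sensitivity, specificity, and prevalence
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)

# The textbook formula: TP / (TP + FP)
precision = A / (A + B)

print(ppv, precision)  # identical when the sample prevalence is used

# With an external (population) prevalence the two measures diverge:
pop_prev = 0.05
ppv_pop = (sensitivity * pop_prev) / (
    sensitivity * pop_prev + (1 - specificity) * (1 - pop_prev)
)
print(ppv_pop)  # no longer equal to precision
```

With the sample prevalence, sensitivity * prevalence reduces to A/n and (1 - specificity) * (1 - prevalence) reduces to B/n, so the ratio is exactly A/(A + B); the formulas only give different numbers when an external prevalence is supplied.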

The second problem is with sensitivity and recall:

The formulas for sensitivity and recall appear to be identical. Is there any difference between these two measures?

Could someone help me clear up this confusion?

I never really got the hang of statistics, and the longer I try to sort this out, the more confusing it gets.

Thanks a lot!

From the documentation of the caret package:

The functions require that the factors have exactly the same levels.

For two-class problems, the sensitivity, specificity, positive predictive value and negative predictive value are calculated using the positive argument. Also, the prevalence of the "event" is computed from the data (unless passed in as an argument), as are the detection rate (the rate of true events also predicted to be events) and the detection prevalence (the prevalence of predicted events).

Suppose a 2x2 table with notation:

                        Reference Event    Reference No Event
    Predicted Event           A                    B
    Predicted No Event        C                    D

The formulas used here are:

Sensitivity = A / (A + C)

Specificity = D / (B + D)

Prevalence = (A + C) / (A + B + C + D)

PPV = (sensitivity * prevalence) / ((sensitivity * prevalence) + ((1 - specificity) * (1 - prevalence)))

NPV = (specificity * (1 - prevalence)) / (((1 - sensitivity) * prevalence) + (specificity * (1 - prevalence)))

Detection Rate = A / (A + B + C + D)

Detection Prevalence = (A + B) / (A + B + C + D)

Balanced Accuracy = (sensitivity + specificity) / 2

Precision = A / (A + B)

Recall = A / (A + C)

F1 = (1 + beta^2) * precision * recall / ((beta^2 * precision) + recall)
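For my own understanding, here are those same formulas written out as a small Python sketch (the counts are invented), which also shows that recall and sensitivity are literally the same formula:

```python
# Hypothetical counts: A = TP, B = FP, C = FN, D = TN.
A, B, C, D = 40, 10, 20, 130
n = A + B + C + D

sensitivity = A / (A + C)
specificity = D / (B + D)
prevalence = (A + C) / n
precision = A / (A + B)
recall = A / (A + C)
detection_rate = A / n
detection_prevalence = (A + B) / n
balanced_accuracy = (sensitivity + specificity) / 2

beta = 1  # with beta = 1 the F-measure is the usual F1 score
f1 = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

assert recall == sensitivity  # identical formulas give identical values
print(recall, sensitivity, f1)
```

So sensitivity and recall are the same quantity under two names: "sensitivity" is the usual term in medicine and epidemiology, "recall" in information retrieval and machine learning.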