How Are Precision and Recall Calculated?
In data classification, precision (also called positive predictive value) is the fraction of retrieved instances that are actually relevant, while recall (also known as sensitivity) is the fraction of all relevant instances that are retrieved. Both measures are grounded in an underlying notion of relevance.
Let us imagine there are 100 positive cases among 10,000 cases. You want to predict which ones are positive, and you pick 200 to have a better chance of catching many of the 100 positive cases. You record the IDs of your predictions, and when you get the actual results you sum up how many times you were right or wrong. There are four ways of being right or wrong:
- TN / True Negative: case was negative and predicted negative
- TP / True Positive: case was positive and predicted positive
- FN / False Negative: case was positive but predicted negative
- FP / False Positive: case was negative but predicted positive
Makes sense so far? Now you count how many of the 10,000 cases fall in each bucket, say:
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Negative Cases | TN: 9,760 | FP: 140 |
| Positive Cases | FN: 40 | TP: 60 |
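The four buckets can be tallied directly from paired labels; here is a minimal Python sketch (the function name and the toy 0/1 label lists are mine, purely for illustration):

```python
# Tally the four confusion-matrix buckets from paired true/predicted labels.
# Labels are 0/1 integers, with 1 meaning "positive".

def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical toy example: 6 cases, 3 of them positive.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # (2, 2, 1, 1)
```

Note that the four counts always sum to the total number of cases, so once you know any three, the fourth is determined.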
There are three main questions to ask about this contingency table (or confusion matrix):

- What percent of your predictions were correct?
  You answer: the “accuracy” was (9,760 + 60) out of 10,000 = 98.2%
- What percent of the positive cases did you catch?
  You answer: the “recall” was 60 out of 100 = 60%
- What percent of positive predictions were correct?
  You answer: the “precision” was 60 out of 200 = 30%
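The three answers above can be reproduced directly from the four counts; a short Python sketch (variable names are mine, values taken from the worked example):

```python
# Confusion-matrix counts from the worked example above.
tn, fp, fn, tp = 9760, 140, 40, 60

total = tn + fp + fn + tp
accuracy = (tp + tn) / total   # fraction of all predictions that were correct
recall = tp / (tp + fn)        # fraction of the positive cases you caught
precision = tp / (tp + fp)     # fraction of positive predictions that were correct

print(f"accuracy={accuracy:.1%} recall={recall:.0%} precision={precision:.0%}")
# accuracy=98.2% recall=60% precision=30%
```

This also illustrates why accuracy alone is misleading here: predicting “negative” for every case would score 99% accuracy while catching none of the positives.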
The full confusion-matrix table derives several more metrics from the same four counts. In hypothesis-testing terms, a true positive corresponds to statistical power, a false positive is a Type I error, and a false negative is a Type II error. The main derived metrics are:

- Prevalence = Σ Condition positive / Σ Total population
- Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
- Positive predictive value (PPV), Precision = Σ True positive / Σ Predicted condition positive
- False discovery rate (FDR) = Σ False positive / Σ Predicted condition positive
- False omission rate (FOR) = Σ False negative / Σ Predicted condition negative
- Negative predictive value (NPV) = Σ True negative / Σ Predicted condition negative
- True positive rate (TPR), Recall, Sensitivity, probability of detection = Σ True positive / Σ Condition positive
- False positive rate (FPR), Fall-out, probability of false alarm = Σ False positive / Σ Condition negative
- Positive likelihood ratio (LR+) = TPR / FPR
- Diagnostic odds ratio (DOR) = LR+ / LR−
- F1 score = 2 / (1/Recall + 1/Precision)
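A few of these derived metrics, computed for the same worked example (a sketch; the variable names are mine, and F1 follows the harmonic-mean formula from the table):

```python
# Derived metrics for the worked example: 9,760 TN, 140 FP, 40 FN, 60 TP.
tn, fp, fn, tp = 9760, 140, 40, 60

tpr = tp / (tp + fn)                # recall / sensitivity = 0.6
fpr = fp / (fp + tn)                # fall-out: 140 / 9,900
precision = tp / (tp + fp)          # PPV = 0.3
f1 = 2 / (1 / tpr + 1 / precision)  # harmonic mean of recall and precision
lr_pos = tpr / fpr                  # positive likelihood ratio

print(round(fpr, 4), round(f1, 2), round(lr_pos, 2))
# 0.0141 0.4 42.43
```

Because F1 is a harmonic mean, it sits closer to the worse of the two values (0.4 is nearer to the 30% precision than to the 60% recall), which is exactly why it is preferred over a plain average when the two disagree.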
Sources:
https://www.kdnuggets.com/faq/precision-recall.html
http://www.wiki-zero.net/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUHJlY2lzaW9uX2FuZF9yZWNhbGw