Test 6. Performance Metrics
Define accuracy. Write its formula and explain in which scenarios accuracy is a suitable metric and when it can be misleading.
Accuracy measures the proportion of correct predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN).
- Suitable when classes are balanced and all errors carry similar cost.
- Misleading on imbalanced data: a model that always predicts the majority class can have high accuracy despite poor performance on the minority class (demonstrated in the sketch below).
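A minimal sketch in plain Python (the toy labels are hypothetical) showing how a majority-class predictor earns high accuracy on imbalanced data:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
y_majority = [0] * 100  # always predicts the majority class

print(accuracy(y_true, y_majority))  # 0.95 despite never finding a positive
```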
Define precision, recall, and F1‑score. Provide their formulas. Given a confusion matrix, explain how these metrics reflect different aspects of classification performance.
- Precision (positive predictive value): Precision = TP / (TP + FP)
- Recall (sensitivity or true positive rate): Recall = TP / (TP + FN)
- F1‑score (harmonic mean of precision and recall): F1 = 2 · Precision · Recall / (Precision + Recall)
- Interpretation:
  - Precision focuses on the correctness of positive predictions.
  - Recall focuses on coverage of actual positives.
  - F1 balances both, penalizing extreme imbalance between precision and recall (see the sketch below).
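A short sketch of these three formulas in plain Python; the counts tp, fp, fn are assumed inputs, and the zero-denominator guards are a common convention rather than part of the definitions:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical counts: 40 TP, 10 FP, 20 FN.
p, r, f = precision_recall_f1(tp=40, fp=10, fn=20)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# precision=0.80 recall=0.67 f1=0.73
```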
Explain the confusion matrix for a binary classifier. Label its four cells and describe how you derive accuracy, precision, recall, and specificity from it.
A binary confusion matrix:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
- Accuracy = (TP + TN) / total
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- Specificity (true negative rate) = TN / (TN + FP); all four metrics are derived from raw labels in the sketch below.
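A sketch, assuming paired lists of true and predicted labels (both hypothetical), that tallies the four cells and derives each metric above:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, FP, FN, TN for one positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        tp += (t == positive and p == positive)
        fp += (t != positive and p == positive)
        fn += (t == positive and p != positive)
        tn += (t != positive and p != positive)
    return tp, fp, fn, tn

# Hypothetical labels.
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

print("accuracy   :", (tp + tn) / (tp + tn + fp + fn))  # 0.625
print("precision  :", tp / (tp + fp))                   # ~0.667
print("recall     :", tp / (tp + fn))                   # 0.5
print("specificity:", tn / (tn + fp))                   # 0.75
```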
Define Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). Provide their formulas and explain what each metric conveys about regression performance.
- MSE: average squared difference between predictions and targets: MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², where n is the number of samples, yᵢ the true value, and ŷᵢ the prediction.
- RMSE: square root of the MSE, expressed in the same units as the target: RMSE = √MSE
- Interpretation:
  - MSE penalizes larger errors more heavily because errors are squared.
  - RMSE is more interpretable, reflecting the typical error magnitude (see the sketch below).
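A minimal sketch of both formulas; the target and prediction values are made up for illustration:

```python
import math

def mse(y_true, y_pred):
    """Mean of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of MSE, in the target's own units."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical regression targets and predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mse(y_true, y_pred))   # (0.25 + 0.0 + 4.0 + 1.0) / 4 = 1.3125
print(rmse(y_true, y_pred))  # ≈ 1.15, same units as the target
```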
Describe the ROC curve and AUC. Define true positive rate and false positive rate. Explain how to construct the ROC curve and interpret the AUC value.
- True Positive Rate (TPR) = Recall = TP / (TP + FN)
- False Positive Rate (FPR) = FP / (FP + TN)
- ROC curve: plot TPR against FPR as the classification threshold sweeps from its highest to its lowest value; each threshold yields one (FPR, TPR) point, tracing the curve from (0, 0) to (1, 1).
- AUC (Area Under the Curve): the probability that a randomly chosen positive is ranked higher than a randomly chosen negative.
  - AUC = 1.0: perfect separation.
  - AUC = 0.5: no better than random guessing.
- Use: compare classifiers' discrimination ability independent of any single threshold or of class balance (the sketch below computes AUC from this pairwise definition).
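A sketch, assuming hypothetical scores and labels, that builds ROC points by sweeping the threshold over the observed scores and computes AUC directly from the pairwise-ranking definition above (ties count half):

```python
def roc_points(y_true, scores, positive=1):
    """(FPR, TPR) pairs from thresholding at each distinct score."""
    pts = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(y == positive and s >= thr for y, s in zip(y_true, scores))
        fn = sum(y == positive and s < thr for y, s in zip(y_true, scores))
        fp = sum(y != positive and s >= thr for y, s in zip(y_true, scores))
        tn = sum(y != positive and s < thr for y, s in zip(y_true, scores))
        pts.append((fp / (fp + tn), tp / (tp + fn)))
    return pts

def auc(y_true, scores, positive=1):
    """P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(roc_points(y_true, scores))
print(auc(y_true, scores))  # 8 / 9 ≈ 0.89
```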