Calculate accuracy, precision, recall, F1, and MCC from a confusion matrix, along with measurement error metrics (MAE, RMSE, MAPE) and simple proportion accuracy with confidence intervals.
Accuracy measures how close a result is to the true value. In classification, it's the proportion of correct predictions. In measurement, it's how close measured values are to actual values. This calculator handles both contexts plus simple proportion accuracy.
For classification tasks, enter a confusion matrix (TP, FP, FN, TN) and get accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient, and more. For measurement accuracy, enter paired actual and measured values to get MAE, RMSE, MAPE, and R². For simple accuracy, enter correct/total counts with a confidence interval.
Accuracy analysis is essential in medical diagnostics (test accuracy), machine learning (model evaluation), quality control (measurement precision), survey research (response accuracy), and any domain where quantifying correctness matters. Before reporting results, verify the output against a known reference case and confirm that rounding and units match your reporting standards.
Overall accuracy alone can be misleading, especially with imbalanced classes. This calculator provides a suite of metrics — precision, recall, F1, MCC, balanced accuracy — that together give a fuller picture. For measurement tasks, it distinguishes bias (mean error) from dispersion (RMSE), helping identify systematic vs. random errors.
Classification Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision = TP / (TP + FP)
Recall (Sensitivity) = TP / (TP + FN)
Specificity = TN / (TN + FP)
F1 = 2 × Precision × Recall / (Precision + Recall)
MCC = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
MAE = (1/n) Σ|yᵢ − ŷᵢ|
RMSE = √((1/n) Σ(yᵢ − ŷᵢ)²)
MAPE = (100/n) Σ|yᵢ − ŷᵢ|/|yᵢ|
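The classification formulas above can be sketched as a small helper — a minimal illustration, not the calculator's actual implementation; the function name and the zero-denominator fallbacks are assumptions:

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics per the formulas above.

    Illustrative sketch; returns 0.0 when a denominator is zero
    (a common convention, assumed here).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "mcc": mcc}

# Worked example from the text: TP=90, FP=10, FN=5, TN=895
m = classification_metrics(90, 10, 5, 895)
print(f"accuracy={m['accuracy']:.4f} f1={m['f1']:.3f} mcc={m['mcc']:.3f}")
```

Running it on the worked example below reproduces accuracy 0.985 and F1 0.923.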
Result: Accuracy = 98.50%, F1 = 0.923
With 90 true positives, 10 false positives, 5 false negatives, and 895 true negatives out of 1,000 observations, accuracy is 98.5%. Precision is 90% (90/100), recall is 94.7% (90/95), and F1 score is 0.923. MCC is 0.915, indicating excellent classification performance.
Classification accuracy counts discrete correct/incorrect predictions. Measurement accuracy quantifies how close continuous predictions are to true values. The metrics differ fundamentally: classification uses counts (TP, FP, FN, TN) while measurement uses deviations (errors). Both are called "accuracy" but require different evaluation frameworks.
In a dataset where 99% of samples are negative, a classifier that always predicts "negative" achieves 99% accuracy. This is the accuracy paradox — high accuracy despite useless predictions. Balanced accuracy, F1, and MCC all address this by accounting for both positive and negative class performance.
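The accuracy paradox is easy to reproduce numerically. A short sketch with the same 99%-negative scenario (the counts are illustrative):

```python
# Always-"negative" classifier on 1,000 samples, 10 of which are positive:
tp, fp, fn, tn = 0, 0, 10, 990

accuracy = (tp + tn) / (tp + fp + fn + tn)       # high: 990/1000
recall = tp / (tp + fn)                          # zero: every positive missed
specificity = tn / (tn + fp)                     # perfect, trivially
balanced_acc = (recall + specificity) / 2        # 0.5: no better than chance

print(accuracy, recall, balanced_acc)            # 0.99 0.0 0.5
```

Balanced accuracy averages recall over both classes, so the useless classifier scores 0.5 despite its 99% raw accuracy.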
Always report multiple metrics. No single number captures all aspects of performance. Pair accuracy with precision/recall for classification, or MAE with MAPE for measurement. Use confusion matrix visualization to identify specific error patterns. Consider the costs of different error types in your domain.
Accuracy is the overall proportion of correct predictions (both positive and negative). Precision is the proportion of positive predictions that are actually positive. With a rare disease, a test that always says "negative" has high accuracy but zero recall, and its precision is undefined because it makes no positive predictions.
Use F1 when classes are imbalanced or when false positives and false negatives have different costs. Use accuracy when classes are balanced and both types of errors are equally important.
Matthews Correlation Coefficient considers all four confusion matrix cells and produces a balanced measure even with imbalanced datasets. Unlike accuracy or F1, it's high only if the classifier does well on both positive and negative classes.
MAE gives equal weight to all errors. RMSE squares errors first, penalizing large errors disproportionately. If large errors are particularly undesirable, use RMSE. If all errors matter equally, use MAE.
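The MAE/RMSE distinction is clearest with two error profiles that share the same total absolute error. A minimal sketch (the data is illustrative):

```python
import math

def mae(actual, predicted):
    """Mean absolute error: equal weight to every error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: large errors dominate via squaring."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

even  = ([0, 0, 0, 0], [1, 1, 1, 1])   # four errors of size 1
spiky = ([0, 0, 0, 0], [0, 0, 0, 4])   # one error of size 4

print(mae(*even), rmse(*even))     # 1.0 1.0
print(mae(*spiky), rmse(*spiky))   # 1.0 2.0
```

MAE is 1.0 for both profiles, but RMSE doubles for the spiky one — exactly the penalty on large errors described above.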
R² = 1 means perfect prediction (all measured values exactly match actual). R² = 0 means the model is no better than predicting the mean. R² can be negative if predictions are worse than the mean.
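All three R² regimes can be demonstrated in a few lines — a sketch with made-up data:

```python
def r_squared(actual, predicted):
    """R² = 1 − SS_res / SS_tot, relative to predicting the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]
print(r_squared(actual, actual))                # 1.0: perfect prediction
print(r_squared(actual, [2.5] * 4))             # 0.0: predicting the mean
print(r_squared(actual, [4.0, 3.0, 2.0, 1.0]))  # negative: worse than the mean
```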
The confidence interval half-width is z × √(accuracy × (1 − accuracy) / n). For a 95% CI within ±2%, you need roughly n = accuracy × (1 − accuracy) × (1.96/0.02)² ≈ 2,400 at 50% accuracy, and fewer for higher accuracy.
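That sample-size rule of thumb is easy to turn into a helper. This assumes the normal-approximation (Wald) interval; the function name is illustrative:

```python
import math

def required_n(p, z=1.96, half_width=0.02):
    """Smallest n so the Wald CI half-width z*sqrt(p(1-p)/n)
    is at most half_width, at anticipated accuracy p."""
    return math.ceil(p * (1 - p) * (z / half_width) ** 2)

print(required_n(0.5))   # 2401: worst case, p(1-p) is maximal at 0.5
print(required_n(0.9))   # 865: fewer samples needed at higher accuracy
```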