Calculate sensitivity, specificity, PPV, NPV, likelihood ratios, and Youden's J from a confusion matrix. Includes PPV/NPV table at different prevalence levels.
Sensitivity and specificity are the two fundamental measures of diagnostic test accuracy. Sensitivity (true positive rate) measures how well the test detects the condition — a sensitive test rarely misses positive cases. Specificity (true negative rate) measures how well the test identifies healthy individuals — a specific test rarely gives false alarms.
This calculator takes a 2×2 confusion matrix (TP, FP, FN, TN) and computes a complete diagnostic analysis: sensitivity, specificity, positive/negative predictive values, likelihood ratios, Youden's J index, diagnostic odds ratio, and confidence intervals. It also shows how PPV and NPV change with prevalence — a critical consideration in screening programs.
These metrics are essential in medical diagnostics, laboratory testing, quality control, machine learning model evaluation, and any binary classification context. Before reporting results, check the worked example with realistic values, use the calculation steps shown to verify rounding and units, and cross-check the output against a known reference case; the example pattern is also useful when troubleshooting unexpected results.
A single accuracy number hides crucial information about test performance. This calculator reveals the full picture: how the test performs on positive vs negative cases, how results change with disease prevalence, and whether the test is useful for ruling in or ruling out a condition. The prevalence-adjusted PPV/NPV table is especially valuable for clinical decision-making.
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
LR+ = Sensitivity / (1 − Specificity)
LR− = (1 − Sensitivity) / Specificity
Prevalence-adjusted PPV = (Sens × Prev) / (Sens × Prev + (1 − Spec) × (1 − Prev))
Youden's J = Sensitivity + Specificity − 1
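The formulas above can be implemented directly as a sanity check. A minimal sketch (the function name and return structure are illustrative, not this calculator's actual code), using the worked example's counts:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute core diagnostic-test metrics from a 2x2 confusion matrix."""
    sens = tp / (tp + fn)        # sensitivity (true positive rate)
    spec = tn / (tn + fp)        # specificity (true negative rate)
    ppv = tp / (tp + fp)         # positive predictive value at study prevalence
    npv = tn / (tn + fn)         # negative predictive value
    lr_pos = sens / (1 - spec)   # positive likelihood ratio
    lr_neg = (1 - sens) / spec   # negative likelihood ratio
    youden_j = sens + spec - 1   # Youden's J index
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv,
            "npv": npv, "lr+": lr_pos, "lr-": lr_neg, "youden_j": youden_j}

# Worked example from this page: TP=85, FP=5, FN=15, TN=895
m = diagnostic_metrics(85, 5, 15, 895)
print(round(m["sensitivity"], 3))  # 0.85
print(round(m["specificity"], 4))  # 0.9944
print(round(m["lr+"], 1))          # 153.0
```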
Result: Sensitivity = 85.0%, Specificity = 99.4%
With 85 true positives, 5 false positives, 15 false negatives, and 895 true negatives: sensitivity is 85% (catches 85% of diseased patients) and specificity is 99.4% (correctly identifies 99.4% of healthy patients). LR+ = 0.85 / (5/900) = 153.0, indicating a positive result is very informative.
When screening for rare conditions (low prevalence), even highly accurate tests produce more false positives than true positives. If a disease affects 1 in 1,000 people and the test has 99% sensitivity and 99% specificity, a positive result still only means ~9% chance of disease (PPV ≈ 9%). This counter-intuitive result is the base rate fallacy, and the prevalence table in this calculator makes it explicit.
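The base-rate arithmetic in the 1-in-1,000 example can be reproduced with the prevalence-adjusted PPV formula. A short sketch (the function name is illustrative):

```python
def ppv_at_prevalence(sens, spec, prev):
    """Prevalence-adjusted PPV via Bayes' theorem."""
    return (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))

# Disease affects 1 in 1,000; test is 99% sensitive and 99% specific
print(round(ppv_at_prevalence(0.99, 0.99, 0.001), 3))  # 0.09, i.e. ~9% PPV
```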
Sensitivity and specificity depend on the chosen diagnostic threshold. Lowering the threshold increases sensitivity but decreases specificity (more false alarms). The ROC curve plots all possible sensitivity-specificity pairs. Youden's J maximizes the sum of sensitivity and specificity, providing one optimal threshold. Other criteria weight false positives and false negatives differently.
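One way to pick a threshold by Youden's J is to sweep candidate cutoffs and keep the one with the largest J. An illustrative sketch with made-up scores and labels (not output from this calculator):

```python
def best_threshold_by_youden(scores, labels):
    """Return (cutoff, J) maximizing J, classifying positive if score >= cutoff."""
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1  # Youden's J at this cutoff
        if best is None or j > best[1]:
            best = (cut, j)
    return best

# Hypothetical test scores: higher score = more likely diseased
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   0,    1,   0,   1,   1,   1]
print(best_threshold_by_youden(scores, labels))  # (0.4, 0.75)
```

Each candidate cutoff corresponds to one point on the ROC curve; the maximizing cutoff is the point farthest above the diagonal.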
In practice, single tests are often insufficient. A common strategy uses a sensitive screening test followed by a specific confirmatory test. The first test catches most cases (high sensitivity); the second test weeds out false positives (high specificity). Serial testing multiplies specificities and reduces overall false positive rate at the cost of some sensitivity.
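Under the common independence assumption, serial testing (a case counts as positive only if both tests are positive) multiplies sensitivities and pushes combined specificity toward 1. A sketch under that assumption; real tests are rarely fully independent, and the example numbers are hypothetical:

```python
def serial_both_positive(sens1, spec1, sens2, spec2):
    """Combined accuracy when a case is positive only if BOTH tests agree."""
    sens = sens1 * sens2                  # a true case must pass both tests
    spec = 1 - (1 - spec1) * (1 - spec2)  # a false positive must fool both
    return sens, spec

# Hypothetical 95%-sensitive screen followed by a 99%-specific confirmatory test
s, p = serial_both_positive(0.95, 0.90, 0.90, 0.99)
print(round(s, 3), round(p, 4))
```

Note the trade described in the text: combined sensitivity drops below either test's alone, while combined specificity exceeds both.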
Sensitivity = P(test positive | disease present) — how well the test detects disease. PPV = P(disease present | test positive) — how likely disease is given a positive test. They answer different questions and are affected differently by prevalence.
In a low-prevalence population, most people are healthy. Even a specific test will produce many false positives from the large healthy population, diluting the true positives. At 0.1% prevalence, a 99% specific test has a PPV below 10% (about 9% even with perfect sensitivity).
LR+ tells you how much to increase your estimate of disease probability after a positive test. LR− tells you how much to decrease it after a negative test. They're prevalence-independent, making them more generalizable than PPV/NPV.
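Likelihood ratios update pre-test odds multiplicatively: post-test odds = pre-test odds × LR, then convert back to a probability. A small sketch of that update (function name is illustrative):

```python
def post_test_probability(pretest_prob, lr):
    """Convert probability to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# 2% pre-test probability, then a positive test with LR+ = 10
print(round(post_test_probability(0.02, 10), 3))  # 0.169, i.e. ~17%
```

Because the update works on odds, the same LR applies at any prevalence, which is why LRs generalize better than PPV/NPV.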
It depends on the clinical context. Screening tests prioritize high sensitivity (>95%) to catch all cases. Confirmatory tests prioritize high specificity (>99%) to avoid false diagnoses. The optimal trade-off depends on the costs of false positives vs false negatives.
A perfect test (sensitivity = specificity = 100%) is rare in practice. Most tests trade off between the two. The ROC curve plots this trade-off at different thresholds, and the area under the ROC curve (AUC) summarizes overall discriminating ability.
DOR = (TP × TN) / (FP × FN). It combines sensitivity and specificity into a single measure. DOR > 1 means the test discriminates better than chance. DOR > 100 indicates excellent discrimination. It's useful for comparing tests in meta-analyses.
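The DOR for the worked example (TP=85, FP=5, FN=15, TN=895) can be checked directly; it also equals LR+ divided by LR−, which this quick sketch verifies:

```python
tp, fp, fn, tn = 85, 5, 15, 895
dor = (tp * tn) / (fp * fn)                           # diagnostic odds ratio
sens, spec = tp / (tp + fn), tn / (tn + fp)
lr_ratio = (sens / (1 - spec)) / ((1 - sens) / spec)  # LR+ / LR-
print(round(dor, 1), round(lr_ratio, 1))  # 1014.3 1014.3
```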