Calculate Youden's J statistic from sensitivity and specificity or a 2×2 table. Includes ROC space visualization, PPV/NPV, DOR, LR+/LR−, MCC, and a 13-metric performance dashboard.
Youden's Index (J) is one of the most informative single-number summaries of a diagnostic (or classification) test. Defined as Sensitivity + Specificity − 1, it ranges from −1 to +1, where 0 means no better than chance and 1 means perfect discrimination; any useful test has J > 0. A test with J = 0.84 captures 84 percentage points more correct classifications than random chance.
This calculator computes J from either raw sensitivity/specificity percentages or a 2×2 contingency table (TP, FP, FN, TN). Beyond J itself, the dashboard reports 13 performance metrics: PPV, NPV, accuracy, balanced accuracy, diagnostic odds ratio, likelihood ratios, F1 score, Matthews correlation coefficient, and the number needed to screen.
The ROC space visualization plots the test's operating point and shows J as the vertical distance from the chance line — the same quantity maximized when finding the optimal ROC cutoff. The quality gauge maps J to interpretive bands (Uninformative through Excellent) for quick assessment. Check the example with realistic values before reporting.
Youden's Index distills a diagnostic test to its essence: how much better is it than guessing? This calculator goes further, computing 13 metrics alongside a visual ROC space plot, so you can evaluate a test from every angle — discrimination, prediction, odds ratios, and classification quality.
The preset library includes real-world medical tests, making it easy to benchmark your test. The contingency table mode accepts raw counts, for cases where you have experimental data rather than published rates. The J quality gauge provides immediate visual feedback.
J = Sensitivity + Specificity − 1 = TPR − FPR. Equivalently, J = (TP × TN − FP × FN) / ((TP + FN)(FP + TN)). Ranges from −1 to +1; meaningful tests have J > 0.
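The two forms of the formula can be checked against each other with a short script (the function names here are illustrative, not part of the calculator):

```python
def youden_j(tp, fp, fn, tn):
    """Youden's J from a 2x2 contingency table, via sensitivity and specificity."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity + specificity - 1

def youden_j_closed(tp, fp, fn, tn):
    """Equivalent closed form: (TP*TN - FP*FN) / ((TP+FN)(FP+TN))."""
    return (tp * tn - fp * fn) / ((tp + fn) * (fp + tn))

# Example table: 85 TP, 5 FP, 15 FN, 995 TN -> sens 0.85, spec 0.995
j1 = youden_j(85, 5, 15, 995)        # 0.845
j2 = youden_j_closed(85, 5, 15, 995)
assert abs(j1 - j2) < 1e-12
```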
Result: J = 0.8450, Quality: Excellent
J = 0.85 + 0.995 − 1 = 0.845. The test captures 84.5 percentage points more correct classifications than random assignment. At 5% prevalence, PPV ≈ 89.9% and NPV ≈ 99.2%. LR+ = 170, indicating strong positive discrimination.
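The prevalence-adjusted predictive values follow from Bayes' rule; a minimal sketch of the computation (the helper name `predictive_values` is ours, not the calculator's):

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV via Bayes' rule at a given disease prevalence."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

sens, spec, prev = 0.85, 0.995, 0.05
j = sens + spec - 1                # 0.845
lr_pos = sens / (1 - spec)         # ≈ 170
ppv, npv = predictive_values(sens, spec, prev)
print(f"J={j:.3f}  LR+={lr_pos:.0f}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

Note that PPV and NPV shift with prevalence even though J does not.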
When plotting an ROC curve from continuous test results, each possible cutoff gives a different (FPR, Sensitivity) pair. Youden's Index identifies the optimal cutoff — the point on the curve farthest from the chance diagonal. This maximum-J cutoff maximizes the sum of sensitivity and specificity simultaneously, providing a principled and widely cited selection criterion.
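Given continuous scores and binary labels, the maximum-J cutoff can be found by brute force over the observed thresholds; a sketch using NumPy with synthetic data (variable and function names are illustrative):

```python
import numpy as np

def best_cutoff(scores, labels):
    """Return the threshold maximizing J = TPR - FPR over all observed cutoffs."""
    pos, neg = labels == 1, labels == 0
    best_t, best_j = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tpr = pred[pos].mean()   # sensitivity at this cutoff
        fpr = pred[neg].mean()   # 1 - specificity at this cutoff
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t, best_j

# Toy data: diseased scores tend to run higher than healthy scores
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(2, 1, 200), rng.normal(0, 1, 200)])
labels = np.concatenate([np.ones(200, dtype=int), np.zeros(200, dtype=int)])
t, j = best_cutoff(scores, labels)
```

With two unit-variance normals separated by 2, the optimal cutoff lands near the midpoint and J comes out around 0.6–0.7.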
J assumes equal weight for sensitivity and specificity, which isn't always appropriate. Screening for a lethal cancer demands high sensitivity (catching all cases) even at the cost of specificity. In contrast, confirmatory tests must have high specificity. The full metric dashboard in this calculator — DOR, LR+, LR−, PPV, NPV — helps evaluate the test for your specific clinical scenario.
In binary classification, J appears as "informedness" or "bookmaker informedness." It's equivalent to balanced accuracy × 2 − 1 and closely related to Matthews Correlation Coefficient and Cohen's Kappa. When class imbalance makes accuracy misleading, J (and its relatives) provide a more honest assessment of classifier performance.
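The balanced-accuracy identity is easy to verify numerically; the sketch below also computes MCC on an imbalanced table where raw accuracy (95%) flatters a nearly useless test (the `metrics` helper and the counts are illustrative):

```python
import math

def metrics(tp, fp, fn, tn):
    """J, balanced accuracy, and MCC from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    j = sens + spec - 1
    bal_acc = (sens + spec) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return j, bal_acc, mcc

# Imbalanced data: 950/1000 correct (95% accuracy), yet sensitivity is only 10%
j, ba, mcc = metrics(tp=5, fp=5, fn=45, tn=945)
assert abs(j - (2 * ba - 1)) < 1e-12   # J = 2 * balanced accuracy - 1
```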
It summarizes a diagnostic test's discriminatory ability in a single number. It's most commonly used to find the optimal cutoff on an ROC curve — the point where J is maximized. It also enables quick comparison of different tests: higher J means better overall discrimination.
Geometrically, J is the maximum vertical distance between the ROC curve and the diagonal chance line. The optimal cutoff for a test is the threshold that maximizes this distance, balancing sensitivity and specificity.
J depends only on sensitivity and specificity, which are properties of the test itself (conditional on disease status). Unlike PPV and NPV, J doesn't change with disease prevalence. This makes it suitable for comparing tests across populations with different disease rates.
J ≥ 0.8 is excellent, 0.6–0.8 is good, 0.4–0.6 is fair, 0.2–0.4 is poor, and < 0.2 is essentially uninformative. However, what’s "good enough" depends on context — screening tests for serious diseases need high sensitivity even if J isn't perfect.
Yes, J ranges from −1 to +1. A negative J means the test performs worse than random — it's systematically wrong. This usually indicates the test labels are inverted or there's a fundamental methodological error.
J weights sensitivity and specificity equally and is prevalence-independent. F1 is the harmonic mean of precision (PPV) and sensitivity, making it prevalence-dependent. J is preferred in medical diagnostics; F1 is more common in machine learning classification tasks.
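The contrast is easy to demonstrate: holding sensitivity and specificity fixed while varying prevalence moves F1 (through PPV) but leaves J unchanged. A sketch with illustrative numbers:

```python
def f1_and_j(sens, spec, prevalence):
    """F1 depends on prevalence through PPV; J does not."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    f1 = 2 * ppv * sens / (ppv + sens)   # harmonic mean of precision and sensitivity
    j = sens + spec - 1
    return f1, j

for prev in (0.01, 0.10, 0.50):
    f1, j = f1_and_j(0.85, 0.995, prev)
    print(f"prevalence={prev:.0%}  F1={f1:.3f}  J={j:.3f}")
```

As prevalence rises, PPV climbs and F1 climbs with it, while J stays fixed at 0.845.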