Residual Analysis Calculator

Compute residuals, standardized residuals, leverage, Cook's distance, Durbin-Watson, skewness, kurtosis, and outlier detection for regression diagnostics.

About the Residual Analysis Calculator

Fitting a regression line is only half the job — checking whether that line is trustworthy is the other half. Residual analysis reveals problems invisible in summary statistics: non-linearity, heteroscedasticity, autocorrelation, outliers, and influential points. This calculator provides a comprehensive residual diagnostic suite.

For each data point, we compute raw residuals, standardized (internally studentized) residuals, leverage values h_ii, and Cook's distance. Global diagnostics include the Durbin-Watson statistic for autocorrelation, a runs test for randomness, and skewness/kurtosis of the residual distribution.
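The runs test for randomness can be sketched as follows. This is a minimal illustration that assumes the common Wald-Wolfowitz test on residual signs with its normal approximation. The page does not document the calculator's exact variant, and the function name `runs_test` is hypothetical. Too few runs (negative z) suggests the residuals cluster by sign, as happens with a missed curve or autocorrelation. Too many runs (positive z) suggests the signs alternate.

```python
import math

def runs_test(residuals):
    """Wald-Wolfowitz runs test on residual signs (normal approximation).

    Hypothetical helper: returns (runs, z); z is None when the test is undefined.
    """
    signs = [r > 0 for r in residuals if r != 0]  # drop residuals that are exactly zero
    if not signs:
        return 0, None
    n1 = sum(signs)            # number of positive residuals
    n2 = len(signs) - n1       # number of negative residuals
    # A new run starts whenever the sign changes between neighbours.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        return runs, None      # all residuals share one sign
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var == 0:
        return runs, None      # too few points for the approximation
    return runs, (runs - mean) / math.sqrt(var)

print(runs_test([1.0, -0.5, 0.8, -1.2, 0.3, -0.7]))  # alternating signs: 6 runs, positive z
print(runs_test([1.0, 0.5, 0.8, -1.2, -0.3, -0.7]))  # clustered signs: 2 runs, negative z
```

Under the usual two-sided reading, |z| > 1.96 is significant at the 5% level.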

Load the preset datasets to see healthy vs. pathological residual patterns. The "Heteroscedastic" preset shows residual spread that increases with X, which violates the constant-variance assumption of OLS. The "Nonlinear Pattern" preset shows residuals that follow a systematic curve, a sign that the straight-line model is fundamentally wrong for the data.

Why Use This Residual Analysis Calculator?

Many analysts run a regression, report R², and stop. But a high R² with violated assumptions produces misleading confidence intervals, incorrect p-values, and unreliable predictions. Residual analysis is the safety check.

This calculator puts the standard post-regression diagnostics in one place: outlier detection via standardized residuals, influence via Cook's distance, autocorrelation via Durbin-Watson, and normality via skewness and kurtosis. The residual bars show patterns, such as curvature or a widening spread, that summary numbers alone can hide.

How to Use This Calculator

  1. Enter X values and corresponding Y values (comma-separated).
  2. Or click a preset to load diagnostic scenarios.
  3. Set the outlier threshold for standardized residuals (default 2).
  4. Review the diagnostic output cards (RMSE, Durbin-Watson, etc.).
  5. Examine the residual table for outliers and influential points.
  6. Check Cook's distance — values > 1.0 indicate highly influential observations.
  7. Use the diagnostic reference table to interpret each metric.

Formula

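Residual: eᵢ = yᵢ − ŷᵢ, where ŷᵢ = b₀ + b₁xᵢ is the fitted value.
Standardized residual: eᵢ* = eᵢ / (s√(1−hᵢᵢ)), where s = √(Σeᵢ²/(n−p)) is the residual standard error.
Leverage (simple linear regression): hᵢᵢ = 1/n + (xᵢ−x̄)²/Sxx, where Sxx = Σ(xᵢ−x̄)².
Cook's distance: Dᵢ = eᵢ*²·hᵢᵢ / (p(1−hᵢᵢ)), where p is the number of model parameters (p = 2 for a straight line).
Durbin-Watson: d = Σᵢ₌₂ⁿ(eᵢ−eᵢ₋₁)² / Σᵢ₌₁ⁿ eᵢ².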
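These formulas translate directly into code. The sketch below is in pure Python and covers simple linear regression with p = 2 parameters (intercept and slope). It assumes at least three points, a fit that is not perfect (s > 0), and no point with leverage exactly 1. The name `residual_diagnostics` is hypothetical; this is not the calculator's own code.

```python
import math

def residual_diagnostics(x, y):
    """Per-point and global residual diagnostics for y = b0 + b1*x (p = 2)."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]           # raw residuals
    sse = sum(ei ** 2 for ei in e)
    s = math.sqrt(sse / (n - p))                                 # residual standard error
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]             # leverage h_ii
    r = [ei / (s * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]   # standardized residuals
    d = [ri ** 2 * hi / (p * (1 - hi)) for ri, hi in zip(r, h)]  # Cook's distance
    dw = sum((e[i] - e[i - 1]) ** 2 for i in range(1, n)) / sse  # Durbin-Watson
    return {"b0": b0, "b1": b1, "residuals": e, "leverage": h,
            "standardized": r, "cooks_d": d, "durbin_watson": dw}

diag = residual_diagnostics([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(diag["b0"], 2), round(diag["b1"], 2))  # 0.05 1.99
```

The returned dictionary holds the same per-point quantities as the residual table, plus the Durbin-Watson value. The residuals in this small made-up example alternate in sign, so its Durbin-Watson statistic lands well above 2.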

Example Calculation

Result: R² = 0.9997, RMSE = 0.117, Durbin-Watson = 2.14, all |std. residuals| < 2.0, max Cook's D = 0.32

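The residuals show no pattern. The Durbin-Watson statistic is near 2.0, which gives no evidence of first-order autocorrelation. No point exceeds the outlier or Cook's distance thresholds. Taken together, the diagnostics give no sign that the regression assumptions are violated.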

Tips & Best Practices

Regression Assumptions and Residuals

OLS regression assumes: (1) Linearity: the true relationship is linear. (2) Independence: the errors are uncorrelated. (3) Homoscedasticity: the error variance is constant. (4) Normality: the errors are normally distributed. The errors themselves are unobserved, so each assumption is checked through the residuals, which estimate them. Each assumption maps to specific diagnostic tests.

Linearity: plot the residuals against the predicted values. Random scatter around zero is consistent with a linear relationship. A curved pattern suggests adding polynomial terms. Independence: the Durbin-Watson statistic tests for first-order serial correlation. Homoscedasticity: look for a fan shape in the residual plot. Normality: check the skewness and kurtosis of the residuals.
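The following sketch shows what a curved residual pattern looks like. It fits a straight line to made-up data generated from y = x². The data and the helper name `line_residuals` are illustrative. The residuals come out positive at both ends and negative in the middle. That U-shape signals a missing quadratic term.

```python
def line_residuals(x, y):
    """Fit y = b0 + b1*x by least squares and return the raw residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]          # the true relationship is quadratic
print(line_residuals(x, y))        # [2.0, -1.0, -2.0, -1.0, 2.0]
```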

Influential Points vs. Outliers

An outlier has a large residual — the model predicts poorly for that point. A high-leverage point has an extreme X value. An influential point changes the regression substantially when removed. A point can be high-leverage without being influential (if it falls on the trend), or an outlier without being influential (if leverage is low). Cook's distance captures the combined effect.
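The sketch below illustrates the difference between leverage and influence with made-up data. A point at x = 20 has high leverage in both cases. In the first case it sits exactly on the line fitted to the other five points. In the second it sits well below that line. The helper `cooks_distance` is hypothetical. It uses the form Dᵢ = eᵢ²hᵢᵢ / (p·s²·(1−hᵢᵢ)²), which follows algebraically from Cook's distance written with standardized residuals.

```python
def cooks_distance(x, y):
    """Cook's D for each point of a simple linear regression (p = 2)."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # raw residuals
    s2 = sum(ei ** 2 for ei in e) / (n - p)              # residual variance
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]     # leverage
    return [ei ** 2 * hi / (p * s2 * (1 - hi) ** 2) for ei, hi in zip(e, h)]

base_x = [1, 2, 3, 4, 5]
base_y = [2.1, 3.9, 6.2, 7.8, 10.1]     # least-squares line: y = 0.05 + 1.99x

on_trend = cooks_distance(base_x + [20], base_y + [0.05 + 1.99 * 20])
off_trend = cooks_distance(base_x + [20], base_y + [20.0])
print(on_trend[-1], off_trend[-1])      # essentially 0 vs. far above 1
```

The off-trend point drags the fitted line toward itself. Its own raw residual therefore stays below 1 in absolute value, yet its Cook's distance is far above 1. This is why raw residuals alone can miss influential points.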

What To Do When Diagnostics Fail

Non-linearity: Add polynomial terms or transform variables. Heteroscedasticity: Use weighted least squares or robust standard errors. Autocorrelation: Use generalized least squares or add lag terms. Non-normality: Transform Y (log, sqrt) or use robust regression. Outliers: Investigate data quality, use robust methods (LAD, Huber), or report with and without.

Frequently Asked Questions

What's the difference between raw and standardized residuals?

Raw residuals (eᵢ = yᵢ − ŷᵢ) are in the units of Y. A standardized residual divides each raw residual by its estimated standard deviation, s√(1−hᵢᵢ), which accounts for leverage. The result is unit-free, and values beyond ±2 indicate potential outliers.
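The leverage term matters because high-leverage points pull the fitted line toward themselves. Their residuals therefore have a smaller variance, s²(1−hᵢᵢ). As a result, the same raw residual counts as more extreme at a high-leverage point. The numbers below are hypothetical, and `standardize` is an illustrative name.

```python
import math

def standardize(e, s, h):
    """Internally studentized residual: e / (s * sqrt(1 - h))."""
    return e / (s * math.sqrt(1 - h))

# Same raw residual (1.0) and residual standard error (0.5), different leverage.
low = standardize(1.0, 0.5, 0.1)    # about 2.11
high = standardize(1.0, 0.5, 0.6)   # about 3.16
print(low, high)
```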

What does the Durbin-Watson statistic mean?

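DW tests for first-order autocorrelation in the residuals and ranges from 0 to 4. It is approximately 2(1 − r), where r is the lag-1 autocorrelation of the residuals. DW ≈ 2 means no first-order autocorrelation. DW well below 2 suggests positive autocorrelation, where consecutive residuals tend to share the same sign. DW well above 2 suggests negative autocorrelation, where consecutive residuals tend to alternate in sign.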
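Two made-up residual sequences show both extremes. The function name `durbin_watson` is illustrative. It applies d = Σ(eᵢ−eᵢ₋₁)²/Σeᵢ² directly.

```python
def durbin_watson(e):
    """d = sum over i >= 2 of (e_i - e_{i-1})^2, divided by the sum of e_i^2."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    return num / sum(ei ** 2 for ei in e)

clustered = [1, 1, 1, -1, -1, -1]     # one sign change: positive autocorrelation
alternating = [1, -1, 1, -1, 1, -1]   # sign flips every step: negative autocorrelation
print(durbin_watson(clustered))       # 4/6, well below 2
print(durbin_watson(alternating))     # 20/6, well above 2
```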

When is Cook's distance concerning?

The traditional rule treats Cook's D > 1 as influential. A stricter rule flags D > 4/n. Investigate high-Cook's-D points before removing any. They may be data errors, outliers, or genuinely different observations that shouldn't be modeled together.
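The two cutoffs can flag different points. The sketch below applies both to a list of hypothetical Cook's distance values. The function name `flag_influential` is illustrative.

```python
def flag_influential(cooks_d):
    """Return indices flagged by the D > 1 rule and by the stricter D > 4/n rule."""
    n = len(cooks_d)                       # one Cook's D per observation
    traditional = [i for i, d in enumerate(cooks_d) if d > 1]
    strict = [i for i, d in enumerate(cooks_d) if d > 4 / n]
    return traditional, strict

# Hypothetical Cook's D values for n = 10 points (4/n = 0.4).
d = [0.02, 0.05, 0.45, 0.01, 1.3, 0.03, 0.10, 0.02, 0.04, 0.20]
print(flag_influential(d))   # ([4], [2, 4])
```

Point 2 is flagged only by the stricter rule. It is a candidate for a closer look, not for automatic removal.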

What does high leverage mean?

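In simple linear regression, leverage measures how far xᵢ is from x̄: hᵢᵢ = 1/n + (xᵢ−x̄)²/Sxx. Points with extreme X values have high leverage, which gives them outsized potential to pull the regression line. The leverages sum to p, the number of parameters, so the average leverage is p/n. High leverage isn't always bad. Compare Cook's D to see whether the point actually affects the regression.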
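The sketch below computes leverage for made-up X values with one extreme point. It confirms that the leverages sum to p = 2. It also applies the common 2p/n rule of thumb for "high" leverage. That cutoff is a widely used convention and is not stated elsewhere on this page. The function name `leverage` is illustrative.

```python
def leverage(x):
    """h_ii = 1/n + (x_i - xbar)^2 / Sxx for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

x = [1, 2, 3, 4, 5, 20]
h = leverage(x)
p = 2                                                # intercept and slope
cutoff = 2 * p / len(x)                              # 2p/n rule of thumb
print(round(sum(h), 6))                              # 2.0: leverages sum to p
print([i for i, hi in enumerate(h) if hi > cutoff])  # [5]: only x = 20 is flagged
```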

What if residuals aren't normally distributed?

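Non-normal residuals don't bias the OLS coefficient estimates, but they can make confidence intervals and p-values inaccurate, especially in small samples. Check the skewness, which should be near 0. Also check the excess kurtosis, which should be near 0 (equivalently, a kurtosis near 3). With larger samples (n > 30 is a common rule of thumb), the Central Limit Theorem makes the usual inference approximately valid.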
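The sketch below computes both shape statistics from the residuals. It assumes the plain moment-based estimators. The calculator may instead use bias-corrected versions, which give somewhat different values in small samples. The function name `skew_excess_kurtosis` is illustrative.

```python
def skew_excess_kurtosis(residuals):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

print(skew_excess_kurtosis([-2, -1, 0, 1, 2]))   # skewness 0.0, excess kurtosis about -1.3
print(skew_excess_kurtosis([0, 0, 0, 0, 10]))    # one large positive residual: positive skew
```

The evenly spread residuals in the first call are symmetric but flatter than a normal distribution, which gives negative excess kurtosis.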

How do I detect heteroscedasticity?

Look for a fan or funnel shape in the residual visual — residuals getting larger (or smaller) as X increases. Our visual bars show this pattern clearly. Formal tests include Breusch-Pagan and White's test.
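The sketch below illustrates one formal test: the Koenker (studentized) form of the Breusch-Pagan test for a single predictor. It regresses the squared residuals on X and computes LM = n·R². That value is compared with a chi-square distribution with 1 degree of freedom, whose 5% critical value is about 3.841. The function names and the fan-shaped toy data are illustrative. The page does not say that the calculator runs this test. The sketch also assumes the squared residuals are not all equal.

```python
def ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

def breusch_pagan_lm(x, y):
    """Koenker's n*R^2 statistic from regressing squared residuals on x."""
    b0, b1 = ols(x, y)
    e2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]   # squared residuals
    a0, a1 = ols(x, e2)                                       # auxiliary regression
    mean_e2 = sum(e2) / len(e2)
    ss_tot = sum((v - mean_e2) ** 2 for v in e2)
    ss_res = sum((v - a0 - a1 * xi) ** 2 for xi, v in zip(x, e2))
    return len(x) * (1 - ss_res / ss_tot)

x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [2, 0, 4, 0, 6, 0, 8, 0]          # spread around y = x grows with x
lm = breusch_pagan_lm(x, y)
print(lm, lm > 3.841)                 # about 7.75, True
```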
