Plot data points interactively and compute Pearson correlation, linear regression, R², residuals, and standard error, with automatic outlier flagging.
A scatter plot is the starting point of virtually every bivariate data analysis. It reveals the relationship between two variables at a glance — positive or negative trend, linear or curved, tight or dispersed, with or without outliers. Pair it with a linear regression line and correlation statistics and you have a powerful analysis toolkit.
This calculator lets you enter data as x,y pairs and instantly draws the scatter plot. It computes the least-squares regression line (y = mx + b), the Pearson correlation coefficient (r), the coefficient of determination (R²), and the standard error, and it flags as potential outliers any points lying more than 2 standard errors from the line. A full residuals table shows each point's predicted value and deviation from the line with a visual bar.
Whether you are analyzing lab results, economic data, survey responses, or homework problems, this tool gives you a complete regression analysis in seconds. Use the presets to explore classic data patterns — strong positive, negative, no correlation, quadratic, and outlier scenarios — before entering your own data.
Data visualization and regression analysis are core skills in every quantitative field — from science and engineering to business and social sciences. This tool combines the scatter plot, correlation coefficient, regression line, residual analysis, and outlier detection into a single interactive experience.
It is ideal for students learning statistics, professionals doing quick data explorations, and anyone who wants to check the strength of a relationship between two variables without opening a spreadsheet or writing code.
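Let Sxx = Σ(xᵢ−x̄)², Syy = Σ(yᵢ−ȳ)², and Sxy = Σ(xᵢ−x̄)(yᵢ−ȳ). Slope: m = Sxy / Sxx. Intercept: b = ȳ − m·x̄. Pearson r = Sxy / √(Sxx·Syy). R² = r². Standard error: SE = √(SSE/(n−2)), where SSE = Σ(yᵢ−ŷᵢ)² is the sum of squared residuals and ŷᵢ = m·xᵢ + b.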
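The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation; the function name `linear_fit` and the sample data are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line, Pearson r, R^2 and standard error for paired data."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each take at least two distinct values")
    m = sxy / sxx                      # slope
    b = y_bar - m * x_bar              # intercept
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))      # standard error of the regression
    return {"slope": m, "intercept": b, "r": r, "r_squared": r * r, "std_error": se}

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
fit = linear_fit(xs, ys)
print(fit)
```

For this small data set, x̄ = 3, ȳ = 4, Sxx = 10, Sxy = 6 and Syy = 6, so the slope is 0.6, the intercept is 2.2, and R² is 0.6.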
Result: r = 0.9863, R² = 0.9728, y = 0.9879x + 0.6121
A very strong positive linear relationship — about 97% of the variance in Y is explained by X.
The absolute value of r indicates strength: |r| > 0.9 is very strong, 0.7–0.9 is strong, 0.5–0.7 is moderate, 0.3–0.5 is weak, and < 0.3 is very weak or no linear relationship. However, even a moderate r can be practically significant in some fields (e.g., psychology often considers r = 0.3 meaningful), while a high r can be trivial if the variables are measured redundantly.
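As a rough sketch, the strength bands above can be turned into a lookup. The function name `describe_strength` is hypothetical, and the treatment of values that fall exactly on a boundary (0.9, 0.7, 0.5, 0.3) is an assumption, since the text does not say which band they belong to.

```python
def describe_strength(r):
    """Map a correlation coefficient to the rule-of-thumb labels in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    a = abs(r)
    if a > 0.9:
        return "very strong"
    if a >= 0.7:   # assumption: exact boundaries go to the stronger band below 0.9
        return "strong"
    if a >= 0.5:
        return "moderate"
    if a >= 0.3:
        return "weak"
    return "very weak or no linear relationship"

print(describe_strength(0.9863))
print(describe_strength(-0.75))
```

These labels are only conventions; as noted above, what counts as a meaningful r depends on the field.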
In 1973, Francis Anscombe constructed four datasets with nearly identical summary statistics (mean, variance, r, regression line) but wildly different scatter plots: one is an ordinary noisy linear trend, one follows a clear non-linear curve, one is perfectly linear except for a single outlier, and one has identical x-values except for a single high-leverage point that alone determines the line. The lesson: never skip the scatter plot. This tool makes plotting so easy that there's no excuse for relying on numbers alone.
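The point can be checked numerically. The sketch below fits a line to each of the four datasets and prints the slope, intercept, and r. The data values are the commonly reproduced quartet figures (for example, the `anscombe` dataset shipped with R); they are typed in here by hand, so verify them against a trusted copy before relying on them. The helper name `fit_line` is hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for a least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_bar - m * x_bar, sxy / math.sqrt(sxx * syy)

# Datasets I-III share the same x-values; dataset IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {name: fit_line(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (m, b, r) in summary.items():
    print(f"{name}: slope={m:.2f} intercept={b:.2f} r={r:.2f}")
```

All four lines come out with a slope near 0.5, an intercept near 3, and r near 0.82, even though the plots look nothing alike.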
Simple linear regression (one predictor, one response) is the foundation, but real analysis often involves multiple regression (many predictors), polynomial regression (curved fits), logistic regression (binary outcomes), or machine learning models. This tool covers the foundational case; understanding it well is essential before tackling more complex methods.
The Pearson correlation coefficient r ranges from −1 to +1. Values near ±1 indicate a strong linear relationship; 0 means no linear relationship. It doesn't capture non-linear patterns, so r near 0 does not rule out a strong curved relationship.
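A small sketch of that blind spot: the points below lie exactly on the parabola y = x², so y is completely determined by x, yet r is zero because the x-values are symmetric around 0. The data and the helper name `pearson_r` are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect quadratic, symmetric around x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r = pearson_r(xs, ys)
print(r)
```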
R² is the coefficient of determination. R² = 0.85 means 85% of the variance in Y is explained by the linear fit on X. For simple linear regression, it equals the square of the correlation coefficient r.
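To illustrate the "variance explained" reading, the sketch below computes R² as 1 − SSE/SST (where SST is the total sum of squares around ȳ, a symbol introduced here for illustration) and checks that it matches r² for a simple linear fit. The data are hypothetical.

```python
import math

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sst = sum((y - y_bar) ** 2 for y in ys)          # total variation in y

m = sxy / sxx
b = y_bar - m * x_bar
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained variation

r = sxy / math.sqrt(sxx * sst)
r_squared_from_variance = 1 - sse / sst
r_squared_from_r = r * r
print(r_squared_from_variance, r_squared_from_r)
```

Here SST = 6 and SSE = 2.4, so both routes give R² = 0.6.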
A residual is the difference between an observed y-value and the value ŷ predicted by the regression line: residual = y − ŷ. Ideally, residuals are randomly scattered around zero with no visible pattern.
Points whose residuals exceed 2 standard errors in absolute value, that is, points lying more than 2 standard errors above or below the regression line, are flagged as potential outliers. This is a simple rule of thumb; more rigorous methods exist.
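A minimal sketch of the residual table and the 2-standard-error rule might look like the following. It assumes the standard error is SE = √(SSE/(n−2)) as given earlier; the function name `flag_outliers`, the `threshold` parameter, and the data are hypothetical. Note that the suspect point is included in the fit, so it inflates SE and makes the rule more lenient in small samples.

```python
import math

def flag_outliers(xs, ys, threshold=2.0):
    """Return (residuals, std_error, indices of points beyond threshold * SE)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]   # residual = y - y_hat
    se = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    flagged = [i for i, e in enumerate(residuals) if abs(e) > threshold * se]
    return residuals, se, flagged

# Hypothetical data: roughly y = x, with the point at x = 6 pushed upward.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 9.0, 7.1, 7.9, 9.2, 9.8]

residuals, se, flagged = flag_outliers(xs, ys)
for i, (x, y, e) in enumerate(zip(xs, ys, residuals)):
    mark = "  <- potential outlier" if i in flagged else ""
    print(f"x={x:>2} y={y:>4} residual={e:+.3f}{mark}")
```

In this example only the point at x = 6 is flagged: its residual is about +2.68, while 2·SE is about 2.03.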
This tool fits a linear model. If your data is curved, a straight line will fit poorly: R² is often low, and even when R² is high the residuals show a systematic pattern instead of random scatter. Consider transforming your data (log, square root) or using polynomial regression for non-linear patterns.
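As an illustration of the transformation idea, the sketch below generates hypothetical data from an exponential curve, y = 2·e^(0.5x), and compares a straight-line fit on the raw values with a fit on ln(y). The curve, its constants, and the helper name `fit_r_squared` are assumptions made for this example; real data will rarely linearize this cleanly.

```python
import math

def fit_r_squared(xs, ys):
    """Return (slope, intercept, R^2) for a least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_bar - m * x_bar, (sxy * sxy) / (sxx * syy)

# Hypothetical exponential data: y = 2 * exp(0.5 * x), no noise.
xs = list(range(9))
ys = [2 * math.exp(0.5 * x) for x in xs]

_, _, r2_raw = fit_r_squared(xs, ys)
slope_log, intercept_log, r2_log = fit_r_squared(xs, [math.log(y) for y in ys])

print(f"raw fit:  R^2 = {r2_raw:.3f}")
print(f"log fit:  R^2 = {r2_log:.3f}, slope = {slope_log:.3f}, "
      f"exp(intercept) = {math.exp(intercept_log):.3f}")
```

After the log transform the relationship is exactly linear, so the fitted slope recovers the growth rate 0.5 and exp(intercept) recovers the multiplier 2.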
At least 2 data points are required mathematically to fit a line, and at least 3 to compute the standard error, which divides by n − 2. Meaningful correlation analysis needs 10+ points. With very few points, random patterns can produce misleadingly high r values.
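A quick simulation can make the small-sample warning concrete. The sketch below draws x and y independently (so the true correlation is zero) and counts how often |r| still exceeds 0.9. The sample sizes, trial count, seed, and function names are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def share_of_high_r(n_points, trials, rng, cutoff=0.9):
    """Fraction of trials where unrelated x and y still give |r| > cutoff."""
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]  # independent of xs
        if abs(pearson_r(xs, ys)) > cutoff:
            hits += 1
    return hits / trials

rng = random.Random(42)  # fixed seed so the run is repeatable
share_small = share_of_high_r(3, 2000, rng)
share_large = share_of_high_r(30, 2000, rng)
print(f"n=3:  {share_small:.1%} of unrelated samples have |r| > 0.9")
print(f"n=30: {share_large:.1%} of unrelated samples have |r| > 0.9")
```

With 3 points, a sizable share of purely random samples look "very strong"; with 30 points this essentially never happens.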