Scatter Plot & Regression Calculator

Plot data points, compute Pearson correlation, linear regression, R², residuals, standard error, and outlier detection interactively.

About the Scatter Plot & Regression Calculator

A scatter plot is the starting point of virtually every bivariate data analysis. It reveals the relationship between two variables at a glance — positive or negative trend, linear or curved, tight or dispersed, with or without outliers. Pair it with a linear regression line and correlation statistics and you have a powerful analysis toolkit.

This calculator lets you enter data as x,y pairs, instantly visualizes the scatter plot, computes the least-squares regression line (y = mx + b), Pearson correlation coefficient (r), coefficient of determination (R²), standard error, and flags outliers more than 2 standard errors from the line. A full residuals table shows each point's predicted value and deviation from the line with a visual bar.

Whether you are analyzing lab results, economic data, survey responses, or homework problems, this tool gives you a complete regression analysis in seconds. Use the presets to explore classic data patterns — strong positive, negative, no correlation, quadratic, and outlier scenarios — before entering your own data.

Why Use This Scatter Plot & Regression Calculator?

Data visualization and regression analysis are core skills in every quantitative field — from science and engineering to business and social sciences. This tool combines the scatter plot, correlation coefficient, regression line, residual analysis, and outlier detection into a single interactive experience.

It is ideal for students learning statistics, professionals doing quick data explorations, and anyone who wants to check the strength of a relationship between two variables without opening a spreadsheet or writing code.

How to Use This Calculator

Enter data points as x,y pairs separated by semicolons (e.g., 1,2;3,4;5,6).
Use presets to load example datasets for different correlation patterns.
Toggle the regression line on or off to compare visual impressions.
Read the output cards for r, R², the regression equation, slope, and standard error.
Examine the scatter plot for patterns, clusters, and outliers (red dots).
Review the residuals table to see how each point deviates from the fit.
Check summary statistics for descriptive measures of X and Y.
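
The input format in the first step can be parsed in a few lines. The sketch below is only an illustration in Python; the tool's own parser is not shown here, and the function name parse_pairs and its error handling are assumptions.

```python
def parse_pairs(text):
    """Parse 'x1,y1;x2,y2;...' into a list of (x, y) float tuples.

    Hypothetical helper: the calculator's real parsing rules may differ.
    """
    points = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:  # tolerate a trailing semicolon or empty segment
            continue
        parts = chunk.split(",")
        if len(parts) != 2:
            raise ValueError(f"expected 'x,y' but got {chunk!r}")
        points.append((float(parts[0]), float(parts[1])))
    return points


print(parse_pairs("1,2;3,4;5,6"))  # [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
```

Rejecting malformed pairs with an error, rather than skipping them silently, keeps a typo from quietly changing the fitted line.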

Formula

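Let Sxx = Σ(xᵢ−x̄)², Syy = Σ(yᵢ−ȳ)², and Sxy = Σ(xᵢ−x̄)(yᵢ−ȳ). Slope: m = Sxy / Sxx. Intercept: b = ȳ − m·x̄. Pearson r = Sxy / √(Sxx·Syy). R² = r². Standard error: SE = √(SSE/(n−2)), where SSE = Σ(yᵢ−ŷᵢ)² is the sum of squared residuals and ŷᵢ = m·xᵢ + b is the predicted value.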
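
As a rough sketch, the formulas above translate directly into Python. The function name linear_regression and the five-point dataset are made up for illustration; they are not the calculator's code or its example data.

```python
import math


def linear_regression(xs, ys):
    """Least-squares line plus r, R² and standard error, per the formulas above."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired points (SE divides by n - 2)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx                      # slope
    b = y_bar - m * x_bar              # intercept
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))      # standard error of the regression
    return {"slope": m, "intercept": b, "r": r, "r2": r * r, "se": se}


# Illustrative data only (an assumption, not the tool's preset).
fit = linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({name: f"{value:.4f}" for name, value in fit.items()})
```

For this small dataset the slope is 0.6, the intercept 2.2, and R² 0.6. The function raises an error when all x-values are identical only indirectly (division by zero), so a production version would want an explicit check.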

Example Calculation

Result: r = 0.9863, R² = 0.9728, y = 0.9879x + 0.6121

A very strong positive linear relationship — about 97% of the variance in Y is explained by X.

Tips & Best Practices

Always look at the scatter plot before trusting the correlation — Anscombe's quartet shows why.
A high R² does not imply causation — it only measures linear association.
Check the residuals for patterns (curves, fans) that indicate the linear model is inadequate.
Outliers can drastically affect r and slope — try removing them to see the impact.
For prediction, only interpolate within the range of your data — extrapolation is risky.
Enter data with semicolons between pairs: "x1,y1;x2,y2;x3,y3".

Understanding Correlation Strength

The absolute value of r indicates strength: |r| > 0.9 is very strong, 0.7–0.9 is strong, 0.5–0.7 is moderate, 0.3–0.5 is weak, and < 0.3 is very weak or no linear relationship. These cutoffs are conventions, not rules. Even a moderate r can be practically significant in some fields (e.g., psychology often considers r = 0.3 meaningful), while a high r can be trivial if the two variables measure essentially the same thing.
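
The bands above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and assigning each boundary value (0.3, 0.5, 0.7, 0.9) to a particular band is an assumption, since the ranges in the text share their endpoints.

```python
def correlation_strength(r):
    """Map a Pearson r to the verbal label used in the text above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    a = abs(r)
    if a > 0.9:
        return "very strong"
    if a >= 0.7:   # boundary placement is an assumption
        return "strong"
    if a >= 0.5:
        return "moderate"
    if a >= 0.3:
        return "weak"
    return "very weak or none"


print(correlation_strength(0.9863))  # very strong
print(correlation_strength(-0.75))   # strong
```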

Anscombe's Quartet

In 1973, Francis Anscombe constructed four datasets with nearly identical summary statistics (mean, variance, r, regression line) but wildly different scatter plots: one is an ordinary noisy linear trend, one follows a clear non-linear curve, one is perfectly linear except for a single outlier, and one has the same x-value for every point but one, so that single point alone determines the fitted line. The lesson: never skip the scatter plot. This tool makes plotting so easy that there's no excuse for relying on numbers alone.
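
The point is easy to check numerically. The sketch below uses the published quartet values and a hypothetical helper, summarize, to compute the means, r, and the fitted line for each dataset.

```python
import math

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I-III
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (mean_x, mean_y, r, slope, intercept) for one dataset."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return x_bar, y_bar, sxy / math.sqrt(sxx * syy), slope, y_bar - slope * x_bar


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (xb, yb, r, m, b) in SUMMARIES.items():
    print(f"{name:>3}: mean_x={xb:.2f} mean_y={yb:.2f} r={r:.3f} line: y={m:.3f}x+{b:.2f}")
```

All four rows come out nearly the same (means near 9 and 7.5, r about 0.82, a line close to y = 0.5x + 3), which is exactly why the numbers alone cannot tell the datasets apart.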

Beyond Simple Regression

Simple linear regression (one predictor, one response) is the foundation, but real analysis often involves multiple regression (many predictors), polynomial regression (curved fits), logistic regression (binary outcomes), or machine learning models. This tool covers the foundational case; understanding it well is essential before tackling more complex methods.

Frequently Asked Questions

What does the Pearson correlation r tell me?

r ranges from −1 to +1. Values near ±1 indicate a strong linear relationship; 0 means no linear relationship. It doesn't capture non-linear patterns.

What is R²?

The coefficient of determination. R² = 0.85 means 85% of the variance in Y is explained by X. It equals the square of the correlation coefficient for simple linear regression.
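
A quick way to see that "variance explained" and r² agree is to compute both. In this sketch the function name and the sample data are assumptions for illustration; the first value uses 1 − SSE/SST, where SST = Σ(yᵢ−ȳ)² is the total variation in Y.

```python
def r_squared_two_ways(xs, ys):
    """Return (1 - SSE/SST, r squared) for a simple least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)          # total sum of squares
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / sst, sxy ** 2 / (sxx * sst)


# Illustrative data only.
explained, r_squared = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"{explained:.4f} {r_squared:.4f}")  # 0.6000 0.6000
```

The two agree only for a least-squares line with one predictor and an intercept; with other models the identity does not hold in general.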

What is a residual?

The difference between an observed y-value and the predicted ŷ from the regression line. Residual = y − ŷ. Ideally, residuals are randomly scattered around zero.

How are outliers detected?

Points whose residuals exceed 2 standard errors in absolute value (|y − ŷ| > 2·SE) are flagged as potential outliers. This is a simple rule of thumb; more rigorous methods exist.
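
The rule of thumb can be sketched as follows. The function name flag_outliers, the threshold parameter, and the ten-point dataset are assumptions for illustration, not the calculator's implementation.

```python
import math


def flag_outliers(xs, ys, threshold=2.0):
    """Return indices of points whose |residual| exceeds threshold * SE."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate SE")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    se = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return [i for i, e in enumerate(residuals) if abs(e) > threshold * se]


# Made-up data: y = x everywhere except the fifth point.
xs = list(range(1, 11))
ys = [1, 2, 3, 4, 15, 6, 7, 8, 9, 10]
print(flag_outliers(xs, ys))  # [4]
```

Note that the outlier itself inflates SE and pulls the line toward it, so in small samples this rule can miss points that look extreme by eye.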

Can I use this for non-linear data?

This tool fits a linear model. If your data is curved, a straight line will fit poorly: R² is often low, and even when it is not, the residuals show a systematic pattern instead of random scatter. Consider transforming your data (log, square root) or using polynomial regression for non-linear patterns.
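
As an illustration of the transformation approach, the sketch below fits a line to synthetic exponential data twice: once on the raw y-values and once on log(y). The data and the helper fit are assumptions; a log transform only applies when every y is positive.

```python
import math


def fit(xs, ys):
    """Return (slope, intercept, r_squared) for a least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy ** 2 / (sxx * syy)


# Synthetic, noise-free data following y = 2 * exp(0.5 * x).
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [2 * math.exp(0.5 * x) for x in xs]

raw_slope, raw_intercept, raw_r2 = fit(xs, ys)
log_slope, log_intercept, log_r2 = fit(xs, [math.log(y) for y in ys])

print(f"raw fit: R^2 = {raw_r2:.4f}")
print(f"log fit: R^2 = {log_r2:.4f}, slope = {log_slope:.4f}")
```

After the transform the fit is essentially exact: the slope recovers the growth rate 0.5 and the intercept recovers log(2). Predictions from the transformed model are in log units and must be exponentiated back to the original scale.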

How many data points do I need?

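At least 2 points are needed to define a line, but a line through two points fits perfectly, so r and R² carry no information, and the standard error cannot be computed because its n−2 divisor is zero. Meaningful correlation analysis needs 10+ points. With very few points, random patterns can produce misleadingly high r values.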