Calculate the best-fit line (OLS), slope, intercept, R², correlation, residuals, and prediction confidence intervals from X/Y data points.
Linear regression is the foundation of predictive analytics — and our calculator makes it accessible to everyone. Enter your X and Y data points, and instantly get the best-fit regression line, slope, intercept, R² (coefficient of determination), Pearson correlation, standard error, and full residual analysis.
The calculator uses ordinary least squares (OLS) to minimize the sum of squared residuals, producing the mathematically optimal straight line through your data. Beyond the equation, it provides a prediction tool: enter any X value to get the predicted Y with confidence intervals (90%, 95%, or 99%).
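The prediction step can be sketched in a few lines. Below is a minimal Python sketch using a small made-up dataset (hours studied vs. test score, an assumption for illustration) and a hard-coded t critical value for 3 degrees of freedom; the calculator itself may compute these quantities differently:

```python
import math

# Hypothetical dataset (hours studied vs. test score) -- for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 65, 70]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = sxy / sxx                      # slope
b0 = y_bar - b1 * x_bar             # intercept

ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
se = math.sqrt(ss_res / (n - 2))    # standard error of the estimate

x0 = 3.5
y_hat = b0 + b1 * x0

# 95% prediction interval for a new observation at x0:
#   y_hat ± t * se * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx)
t_crit = 3.182                      # t(0.975) for df = n - 2 = 3
margin = t_crit * se * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

print(f"Y = {b1:.2f}X + {b0:.2f}")
print(f"Predicted Y at X = {x0}: {y_hat:.2f} ± {margin:.2f}")
```

The `sqrt(1 + 1/n + ...)` factor is what makes this a prediction interval for a single new observation; dropping the leading 1 gives the narrower confidence interval for the mean response.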
Preset datasets let you explore real-world relationships — study hours vs grades, age vs income, ad spend vs revenue. The residuals table shows how far each data point deviates from the line, and the R² interpretation guide helps you assess the practical strength of your model. Together, these take you from raw paired data to a readable model summary without skipping basic statistical diagnostics.
Understanding relationships between variables is central to data-driven decision-making. Does increased ad spend actually boost revenue? Do more study hours improve grades? Linear regression quantifies these relationships with mathematical precision.
This calculator eliminates the need for Excel or statistical software for basic regression tasks. The complete output — equation, coefficients, diagnostics, predictions — provides everything needed for reports, homework, and quick analyses.
Slope b₁ = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²). Intercept b₀ = Ȳ − b₁·X̄. R² = 1 − SS_res/SS_tot. r = ±√R², taking the sign of the slope. Standard Error = √(SS_res/(n−2)).
Result: Y = 1.96X + 0.10, R² = 0.9993, r = 0.9997
Near-perfect linear relationship. Each unit increase in X predicts a 1.96 increase in Y. R² of 0.9993 means 99.93% of Y's variance is explained by X.
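The raw-sum formulas above can be applied directly in a few lines of Python. The dataset here is an assumption chosen to sit near Y ≈ 2X; it is not the data behind the worked example:

```python
# Raw-sum OLS formulas, applied to a small assumed dataset
# (not the data behind the worked example above).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
b0 = sy / n - b1 * sx / n                        # intercept

ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - sy / n) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot
r = (1 if b1 >= 0 else -1) * r2 ** 0.5           # r takes the sign of the slope
se = (ss_res / (n - 2)) ** 0.5                   # standard error of the estimate

print(f"Y = {b1:.3f}X + {b0:.3f}, R² = {r2:.4f}, r = {r:.4f}, SE = {se:.3f}")
```

For this assumed dataset the slope comes out near 2 and R² close to 1, mirroring the shape (though not the exact numbers) of the example above.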
OLS regression finds slope b₁ and intercept b₀ that minimize Σ(yᵢ − ŷᵢ)², the sum of squared residuals. The closed-form solution yields b₁ = (n·ΣXY − ΣX·ΣY)/(n·ΣX² − (ΣX)²) and b₀ = Ȳ − b₁·X̄. This is computationally efficient and produces the unique global minimum for linear models.
The choice to minimize squared (not absolute) residuals gives OLS desirable statistical properties: unbiased estimates, minimum variance among linear estimators (Gauss-Markov theorem), and equivalence to Maximum Likelihood Estimation under normally distributed errors.
Linear regression assumes: (1) Linear relationship between X and Y, (2) Independent observations, (3) Homoscedasticity (constant residual variance), (4) Normally distributed residuals. Violations don't invalidate the regression but affect confidence intervals and hypothesis tests. The residuals table helps diagnose issues — look for patterns, increasing spread, or extreme outliers.
This calculator handles simple (one X) linear regression. Real-world problems often involve multiple predictors (multiple regression): Y = b₀ + b₁X₁ + b₂X₂ + ... + bₖXₖ. The concept is identical — OLS minimizes squared residuals — but the math uses matrix algebra. Tools like R, Python, and Excel handle multiple regression; our calculator provides the foundational understanding.
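To make the matrix-algebra step concrete, here is a minimal pure-Python sketch of two-predictor OLS via the normal equations (XᵀX)b = Xᵀy. The data are fabricated so that the true coefficients are exactly b₀ = 1, b₁ = 2, b₂ = 3:

```python
# A minimal sketch of multiple regression via the normal equations
# (X^T X) b = X^T y, using a tiny Gaussian-elimination solver.
# The dataset is an assumption constructed for illustration.

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [v] for row, v in zip(a, b)]     # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Two predictors: each design-matrix row is [1, x1, x2].
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 5]]
y = [9, 8, 19, 18, 26]              # generated from y = 1 + 2*x1 + 3*x2

k = len(X[0])
xtx = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(k)]
       for i in range(k)]
xty = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(k)]

b = solve(xtx, xty)                  # [b0, b1, b2]
print([round(v, 3) for v in b])
```

In practice, libraries use numerically stabler factorizations (QR, SVD) rather than forming XᵀX directly, but the idea — minimize squared residuals over all coefficients at once — is the same.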
R² measures how much of Y's variation is explained by the model. 0.90+ is very strong (common in physical sciences). In social sciences, 0.30-0.50 is typical and often useful. There's no universal "good" value — it depends on your field.
r (Pearson correlation) measures the direction and strength of linear association (−1 to +1). R² is r² and measures the proportion of variance explained (0 to 1). R² doesn't indicate direction; r does.
When the relationship is clearly nonlinear (try polynomial or exponential regression). When there are significant outliers (use robust regression). When X and Y aren't truly related (spurious correlation). When data isn't independent (time series need special methods).
It's roughly the typical distance data points fall from the regression line, measured in Y units. Smaller SE means a tighter fit. If the residuals are approximately normal, about 68% of points fall within ±1 SE of the line and about 95% within ±2 SE.
Minimum 3 for any meaningful regression, but 20-30+ is recommended for reliable R² and confidence intervals. With just 2 points, R² is always 1.0 (a line passes exactly through any two points) and the standard error is undefined, since n − 2 = 0.
Residuals = actual Y − predicted Y. They should be randomly scattered around zero. Patterns in residuals (curves, fans, clusters) suggest the linear model is inadequate and you may need a different approach.
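To see what a residual pattern looks like in practice, here is a small sketch using an assumed, deliberately nonlinear dataset: fitting a straight line to y = x² leaves a U-shaped residual pattern rather than random scatter.

```python
# Fitting a line to clearly nonlinear data (assumed dataset, y = x^2)
# leaves a tell-tale curved pattern in the residuals.
xs = [1, 2, 3, 4, 5]
ys = [x * x for x in xs]            # 1, 4, 9, 16, 25

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print([round(res, 2) for res in residuals])
# Positive at both ends, negative in the middle -- a U-shape,
# the signature of missed curvature, not random scatter.
```

A residual list like this, read left to right, is exactly what the calculator's residuals table lets you scan for by eye.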