Calculate covariance, Pearson correlation, R², and regression line for paired data. Includes scatter plot, cross-product table, and correlation gauge.
The covariance calculator measures how two variables change together. Positive covariance means they increase together; negative means one decreases when the other increases. This tool computes both the covariance and its standardized version — the Pearson correlation coefficient (r) — along with R², regression parameters, and a visual scatter plot.
Understanding covariance is essential for portfolio diversification in finance, feature selection in machine learning, and hypothesis testing in science. This calculator makes the computation transparent with a step-by-step cross-product deviations table, a color-coded correlation gauge, and an SVG scatter plot with regression line.
Enter paired X and Y values, select population or sample, and get the complete bivariate analysis with interpretation. Check the example with realistic values before reporting. Use the steps shown to verify rounding and units. Cross-check this output using a known reference case. Use the example pattern when troubleshooting unexpected results. Validate that outputs match your chosen standards.
Understanding how two variables relate is fundamental to data analysis. This calculator provides the complete bivariate toolkit: covariance for direction/magnitude, Pearson r for standardized strength, R² for explained variance, regression for prediction, and a scatter plot for visual validation.
Whether you're diversifying a portfolio, running a scientific experiment, or building a predictive model, covariance analysis is your starting point.
Cov(X,Y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n−1). Pearson r = Cov(X,Y) / (sₓ × sᵧ). R² = r². Regression: y = a + bx where b = Cov(X,Y) / sₓ² and a = ȳ − b × x̄.
Result: Covariance = 122.14, r = 0.997
Height (X) and weight (Y) have a very strong positive correlation (r = 0.997). The covariance of 122.14 indicates they increase together, with R² = 0.994 meaning 99.4% of weight variance is explained by height in this sample. Regression line: y = −165.8 + 1.31x.
Harry Markowitz's Modern Portfolio Theory uses covariance matrices to quantify diversification. If two assets have negative covariance, combining them reduces portfolio risk. The optimal portfolio minimizes variance for a given expected return — all based on the covariance structure of the assets.
Principal Component Analysis (PCA) begins by computing the covariance matrix of all variables, then finds the eigenvectors (principal components) that capture the most variance. The first principal component points in the direction of maximum covariance. This technique powers dimensionality reduction in machine learning.
Pearson covariance/correlation is sensitive to outliers. Alternatives include: Spearman rank correlation (based on ranks, not values), Kendall tau (based on concordant/discordant pairs), and the Minimum Covariance Determinant estimator. For non-linear relationships, consider mutual information or distance correlation.
Covariance measures the direction (positive/negative) and magnitude of the linear relationship, but its value depends on the scales of X and Y. Correlation standardizes covariance by dividing by the product of standard deviations, giving a dimensionless value between −1 and +1. Use correlation to compare relationships across different datasets.
R² (coefficient of determination) is the proportion of variance in Y explained by the linear relationship with X. An R² of 0.80 means 80% of the variation in Y can be predicted from X. The remaining 20% is unexplained variance (noise, other factors, or non-linear effects).
Yes — zero covariance means no linear relationship between X and Y. However, there could still be a non-linear relationship (like a U-shape or circle). Always check the scatter plot. Note that independent variables always have zero covariance, but zero covariance does not guarantee independence.
Use sample covariance (n−1 denominator) in almost all cases — whenever your data is a subset of a larger population. Use population covariance (n denominator) only when you have data for the entire population. The difference matters most for small sample sizes.
In finance, covariance between asset returns determines portfolio diversification benefits (Markowitz theory). In machine learning, the covariance matrix drives PCA (principal component analysis). In science, it quantifies how two measurements relate. In quality control, it helps identify related process variables.
The regression line y = a + bx is the best-fit straight line through the scatter plot (minimizing squared vertical distances). The slope (b) tells you: for each 1-unit increase in X, Y changes by b units. The intercept (a) is the predicted Y when X = 0.