RSQ function
The RSQ function in Excel calculates the coefficient of determination () for the relationship between two sets of data. This statistic is a measure of how well the variation in one dataset can be explained by the variation in another dataset, often used in regression analysis.
Syntax
=RSQ(known_y's, known_x's)
Parameters
- known_y’s (required): The dependent data points (Y-values).
- known_x’s (required): The independent data points (X-values).
Key Points
- (the coefficient of determination) is the square of the correlation coefficient (r).
- Values range from 0 to 1:
- 0: No relationship between the variables.
- 1: Perfect linear relationship.
- Both arrays (known_y’s and known_x’s) must have the same length and contain numeric values.
- Non-numeric values, empty cells, or logical values in the arrays are ignored.
Calculation Formula
The value is calculated as:
Where:
- SSreg: Regression sum of squares (explained variance).
- SStotal: Total sum of squares (total variance).
Alternatively, is the square of the Pearson correlation coefficient ():
Examples
1. Simple Calculation
Suppose you have the following datasets:
- Y-values (dependent) in A1:A5:
{10, 20, 30, 40, 50} - X-values (independent) in B1:B5:
{1, 2, 3, 4, 5}
To calculate :
=RSQ(A1:A5, B1:B5)
Result: 1
- This indicates a perfect linear relationship between X and Y.
2. Non-Perfect Relationship
For Y-values {3, 7, 8, 12, 15} in C1:C5 and X-values {1, 2, 3, 4, 5} in D1:D5:
=RSQ(C1:C5, D1:D5)
Result: 0.986 (approximately)
- This indicates a strong, but not perfect, linear relationship.
3. Handling Unequal Data
If Y-values {10, 20, 30} and X-values {1, 2, 3, 4}, Excel will return a #N/A error because the arrays are of unequal length.
4. Data with Noise
For Y-values {5, 15, 25, 35, 50} and X-values {1, 2, 3, 4, 5}:
=RSQ(E1:E5, F1:F5)
Result: A value less than 1, reflecting the presence of noise in the data.
Notes
- Error Handling:
- If arrays have different lengths, Excel returns a
#N/Aerror. - If the arrays are empty or contain fewer than two data points, Excel returns a
#DIV/0!error.
- If arrays have different lengths, Excel returns a
- Use Cases:
- Assessing the goodness-of-fit for linear regression.
- Evaluating the strength of the relationship between two variables.
- Compatible with Charts: Use with scatter plots and trendlines to display the value visually.
Related Functions
CORREL: Calculates the correlation coefficient ().LINEST: Provides a more detailed linear regression analysis.SLOPE: Returns the slope of the regression line.INTERCEPT: Returns the intercept of the regression line.FORECAST.LINEAR: Predicts future values based on a linear trend.
The RSQ function is particularly helpful for determining how well one variable predicts another, making it a key tool in statistical and predictive analysis.