PEARSON function
The PEARSON function in Excel is used to calculate the Pearson correlation coefficient between two sets of data. The Pearson correlation coefficient (denoted as ) measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, where:
- 1 indicates a perfect positive linear relationship,
- -1 indicates a perfect negative linear relationship,
- 0 indicates no linear relationship.
Syntax:
PEARSON(array1, array2)
Arguments:
- array1: Required. The first set of data values (the independent variable).
- array2: Required. The second set of data values (the dependent variable).
Both arrays must have the same number of data points, and the data points should correspond to each other (i.e., the -th data point of array1 should be paired with the -th data point of array2).
Output:
The function returns a single numeric value, which is the Pearson correlation coefficient between the two sets of data.
How It Works:
- The Pearson correlation coefficient is calculated using the following formula: Where:
- and are the individual data points from
array1andarray2, - and are the means of
array1andarray2, respectively.
- and are the individual data points from
Example 1: Finding the Pearson Correlation Coefficient
Suppose you have two sets of data:
array1: [1, 2, 3, 4, 5]array2: [2, 4, 6, 8, 10]
To calculate the Pearson correlation coefficient between these two arrays:
Use the formula:
=PEARSON(A1:A5, B1:B5)
Where A1:A5 contains the data from array1 and B1:B5 contains the data from array2.
Since these two arrays have a perfect positive linear relationship, the result will be 1.
Example 2: Negative Correlation
Suppose you have two sets of data:
array1: [1, 2, 3, 4, 5]array2: [10, 8, 6, 4, 2]
Use the formula:
=PEARSON(A1:A5, B1:B5)
The result will be -1, indicating a perfect negative linear relationship.
Example 3: No Correlation
Suppose you have two sets of data:
array1: [1, 2, 3, 4, 5]array2: [2, 3, 1, 5, 4]
Use the formula:
=PEARSON(A1:A5, B1:B5)
The result will be a value closer to 0, indicating that there is no linear correlation between these two sets of data.
Key Points:
- The Pearson correlation coefficient quantifies the linear relationship between two variables.
- 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship.
- The function is only appropriate for detecting linear relationships; it does not capture non-linear relationships.
Use Cases:
- Statistics: Determine the strength and direction of the relationship between two variables.
- Finance: Analyze the correlation between two financial assets (e.g., stock prices or returns).
- Research: Identify correlations between variables in scientific and social science studies.
- Quality control: Measure the correlation between variables such as input factors and output results.
Notes:
- The
PEARSONfunction works only for linear correlations. It does not assess non-linear relationships. - The data arrays must have the same number of data points. If they don’t, the function will return an error.
- If either of the arrays has constant values (i.e., no variation), the function will return a #DIV/0! error, as the correlation cannot be calculated for constant data.