How to Find the Correlation Coefficient

Assemble your data. Calculate the mean of x. Find the mean of y. Determine the standard deviation of x. Calculate the standard deviation of y. Review the basic formula for finding a correlation coefficient. Find the correlation coefficient...


Step-by-Step Guide

  1. Step 1: Assemble your data.

    To begin calculating a correlation coefficient, first examine your data pairs.

    It is helpful to put them in a table, either vertically or horizontally.

    Label each row or column x and y. For example, suppose you have four data pairs for x and y.

    Your table may look like this:

    x || y
    1 || 1
    2 || 3
    4 || 5
    5 || 7
  2. Step 2: Calculate the mean of x.

    To calculate the mean, add all the values of x, then divide by the number of values.

    In the example above, you have four values for x, so add them and divide by 4. Your calculation would look like this:

    $$\mu_x = (1+2+4+5)/4$$
    $$\mu_x = 12/4$$
    $$\mu_x = 3$$

  3. Step 3: Find the mean of y.

    To find the mean of y, follow the same steps: add all the values of y together, then divide by the number of values.

    In the example above, you also have four values for y. Add these values, then divide by 4. Your calculations would look like this:

    $$\mu_y = (1+3+5+7)/4$$
    $$\mu_y = 16/4$$
    $$\mu_y = 4$$

  4. Step 4: Determine the standard deviation of x.

    Once you have your means, you can calculate the standard deviation. To do so, use the formula:

    $$\sigma_x = \sqrt{\frac{1}{n-1} \Sigma (x-\mu_x)^2}$$

    With the sample data, your calculations should look like this:

    $$\sigma_x = \sqrt{\frac{1}{4-1} \left((1-3)^2+(2-3)^2+(4-3)^2+(5-3)^2\right)}$$
    $$\sigma_x = \sqrt{\frac{1}{3}(4+1+1+4)}$$
    $$\sigma_x = \sqrt{\frac{1}{3}(10)}$$
    $$\sigma_x = \sqrt{\frac{10}{3}}$$
    $$\sigma_x \approx 1.83$$

  5. Step 5: Calculate the standard deviation of y.

    Using the same basic steps, find the standard deviation of y. You will use the same formula with the y data points.

    With the sample data, your calculations should look like this:

    $$\sigma_y = \sqrt{\frac{1}{4-1} \left((1-4)^2+(3-4)^2+(5-4)^2+(7-4)^2\right)}$$
    $$\sigma_y = \sqrt{\frac{1}{3}(9+1+1+9)}$$
    $$\sigma_y = \sqrt{\frac{1}{3}(20)}$$
    $$\sigma_y = \sqrt{\frac{20}{3}}$$
    $$\sigma_y \approx 2.58$$

  6. Step 6: Review the basic formula for finding a correlation coefficient.

    The formula for calculating a correlation coefficient uses means, standard deviations, and the number of pairs in your data set (represented by n).

    The correlation coefficient itself is represented by the lower-case letter r or the lower-case Greek letter rho, ρ.

    For this article, you will use the formula known as the Pearson correlation coefficient, shown below:

    $$\rho = \left(\frac{1}{n-1}\right) \Sigma \left(\frac{x-\mu_x}{\sigma_x}\right) * \left(\frac{y-\mu_y}{\sigma_y}\right)$$

    You may notice slight variations in the formula, here or in other texts. For example, some will use the Greek notation with rho and sigma, while others will use r and s.

    Some texts may show slightly different formulas, but they will be mathematically equivalent to this one.

  7. Step 7: Find the correlation coefficient.

    You now have the means and standard deviations for your variables, so you can proceed to use the correlation coefficient formula.

    Remember that n represents the number of data pairs you have. You have already worked out the other relevant information in the steps above.

    Using the sample data, you would enter your data in the correlation coefficient formula and calculate as follows:

    $$\rho = \left(\frac{1}{n-1}\right) \Sigma \left(\frac{x-\mu_x}{\sigma_x}\right) * \left(\frac{y-\mu_y}{\sigma_y}\right)$$
    $$\rho = \left(\frac{1}{3}\right) * \left(\frac{(1-3)(1-4)+(2-3)(3-4)+(4-3)(5-4)+(5-3)(7-4)}{1.83 * 2.58}\right)$$
    $$\rho = \left(\frac{1}{3}\right) * \left(\frac{6+1+1+6}{4.721}\right)$$
    $$\rho = \left(\frac{1}{3}\right) * 2.965$$
    $$\rho = \frac{2.965}{3}$$
    $$\rho \approx 0.988$$

    This calculation uses the standard deviations rounded to two decimal places, so the result is slightly low. With unrounded standard deviations, the coefficient is about 0.990.

  8. Step 8: Interpret your result.

    For this data set, the correlation coefficient is 0.988.

    This number tells you two things about the data. Look at the sign of the number and the size of the number.

    Because the correlation coefficient is positive, you can say there is a positive correlation between the x-data and the y-data. This means that as the x values increase, you expect the y values to increase also.

    Because the correlation coefficient is very close to +1, the x-data and y-data have a very strong linear relationship.

    If you were to graph these points, you would see that they form a very good approximation of a straight line.
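
The steps above can also be checked with a short program. The following is a minimal Python sketch, not part of the original guide: the function names `mean`, `sample_std`, and `pearson` are chosen here for illustration, and the code assumes the same 1/(n-1) formulas and the four sample data pairs used in the steps.

```python
import math

# Sample data pairs from the table in Step 1.
xs = [1, 2, 4, 5]
ys = [1, 3, 5, 7]


def mean(values):
    """Add all the values, then divide by the number of values."""
    return sum(values) / len(values)


def sample_std(values):
    """Standard deviation using the 1/(n-1) formula from Step 4."""
    mu = mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))


def pearson(xs, ys):
    """Correlation coefficient using the formula reviewed in Step 6."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mu_x, mu_y = mean(xs), mean(ys)
    s_x, s_y = sample_std(xs), sample_std(ys)
    # Sum the products of the standardized x and y deviations.
    total = sum(((x - mu_x) / s_x) * ((y - mu_y) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


print(mean(xs), mean(ys))                                  # 3.0 4.0
print(round(sample_std(xs), 2), round(sample_std(ys), 2))  # 1.83 2.58
print(round(pearson(xs, ys), 3))                           # 0.99
```

The program prints 0.99 rather than 0.988 because it keeps the standard deviations unrounded, whereas the hand calculation rounds them to 1.83 and 2.58 before dividing. Note that this sketch does not handle a data set in which every x (or every y) is identical: the standard deviation is then zero and the division fails.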

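
As one illustration of a "mathematically equivalent" variation, the 1/(n-1) factor cancels against the same factors inside the two standard deviations, leaving a form that needs only the sums of deviations. The worked equation below is a sketch using the sample data from this guide, in the guide's own symbols:

```latex
\rho = \frac{\Sigma (x-\mu_x)(y-\mu_y)}{\sqrt{\Sigma (x-\mu_x)^2 \, \Sigma (y-\mu_y)^2}}
     = \frac{6+1+1+6}{\sqrt{(4+1+1+4)(9+1+1+9)}}
     = \frac{14}{\sqrt{10 \cdot 20}}
     = \frac{14}{\sqrt{200}}
     \approx 0.990
```

This form avoids rounding the standard deviations along the way, which is why it gives 0.990 instead of the 0.988 obtained in the hand calculation with 1.83 and 2.58.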

About the Author

Helen Knight

Committed to making creative arts accessible and understandable for everyone.
