What is Correlation?
In statistics, correlation measures the strength of a linear relationship between two variables. In other words, if the two variables were plotted against each other, correlation would measure how close they come to making a straight line.
Chances are you have been asked to draw a line of best fit on a scatter diagram before. Most of the time the plotted points do not lie exactly on a straight line: there are usually gaps between the line of best fit and the points.
[Scatter diagrams: one showing a strong negative correlation, one showing a weak positive correlation]
If there are a lot of large gaps, the correlation is said to be weak. On the other hand, if the gaps are few and small, the correlation is said to be strong. Note that strength does not indicate whether the linear relationship is positive or negative. The correlation is positive if the line of best fit has a positive gradient and negative if the line of best fit has a negative gradient. Note also that correlation says nothing about how steep the line of best fit is.
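The idea of gaps around a line of best fit can be sketched in Python. This is only an illustration under stated assumptions: the line is fitted by ordinary least squares, each gap is measured vertically, and the function names and data sets are invented for the example.

```python
def best_fit_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line of best fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def gaps(xs, ys):
    """Vertical distance from each plotted point to the line of best fit."""
    gradient, intercept = best_fit_line(xs, ys)
    return [y - (gradient * x + intercept) for x, y in zip(xs, ys)]


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
tight = [2.1, 3.9, 6.2, 7.8, 10.0]      # points hug a rising line
scattered = [5.0, 1.0, 8.0, 3.0, 9.0]   # points stray far from a rising line
falling = [9.8, 8.1, 5.9, 4.2, 2.0]     # points hug a falling line

for name, ys in [("tight", tight), ("scattered", scattered), ("falling", falling)]:
    gradient, _ = best_fit_line(xs, ys)
    largest_gap = max(abs(g) for g in gaps(xs, ys))
    direction = "positive" if gradient > 0 else "negative"
    print(f"{name}: {direction} gradient, largest gap {largest_gap:.2f}")
```

In this sketch the sign of the gradient gives the direction of the correlation, while the size of the gaps gives a rough feel for its strength. The tight and scattered sets both rise, but only the tight one stays close to its line.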
Correlation is measured by the product moment correlation coefficient, or PMCC. The PMCC is a number between -1 and 1 inclusive. Firstly, if the PMCC is close to 1, this is a good indication that there is a strong positive linear relationship between the two variables. Secondly, if the PMCC is close to -1, this is a good indication that the linear relationship between the two variables is strong and negative. Finally, if the PMCC is close to 0, this is an indication that there is little or no linear relationship between the two variables, although they may still be related in a non-linear way.
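As a rough sketch, the PMCC can be computed from the standard formula r = S_xy / sqrt(S_xx * S_yy), where S_xy, S_xx and S_yy are sums of products of deviations from the means. The function name `pmcc` and the data sets below are made up for illustration.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("PMCC is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data, for illustration only.
hours = [1, 2, 3, 4, 5]
perfect_line = [2 * h + 1 for h in hours]   # exactly on a rising straight line
falling = [10, 8, 7, 4, 1]                  # close to a falling straight line
curve = [(h - 3) ** 2 for h in hours]       # symmetric U shape, not a line

print(pmcc(hours, perfect_line))
print(pmcc(hours, falling))
print(pmcc(hours, curve))
```

The first data set lies exactly on a rising line, so its PMCC is 1. The second gives a value close to -1. The third gives a PMCC of 0 even though the second variable is completely determined by the first, which shows that the PMCC only picks up linear relationships.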
Occasionally, there is a point that seems to be unusually far away from the line of best fit. This might indicate that there is an outlier in the original data.