Scatter Diagrams

Bivariate data/Explanatory (Independent) v Response (Dependent) – TBC

What is Correlation?

In statistics, correlation measures the strength of a linear relationship between two variables. In other words, if the two variables were plotted against each other, correlation would measure how close the plotted points come to forming a straight line.
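To put a number on "how close to a straight line", one common choice is Pearson's correlation coefficient, usually written r. The text above does not name a specific measure, so treat the choice of r as an assumption here. The sketch below computes it from scratch; the function name `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative (invented) data
hours = [1, 2, 3, 4, 5, 6]
exact = [2 * h + 1 for h in hours]   # every point lies on one straight line
noisy = [3, 6, 6, 10, 10, 14]        # roughly linear, with some scatter

print(pearson_r(hours, exact))  # at (or within rounding of) 1
print(pearson_r(hours, noisy))  # a little below 1
```

A value of r near 1 or -1 means the points lie close to a straight line, while a value near 0 means there is little linear relationship.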

You have probably been asked to draw a line of best fit on a scatter diagram before. Most of the time the plotted points do not lie exactly on a straight line: there are usually gaps between the line and the points.
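A line of best fit drawn by eye can also be calculated. The sketch below assumes the usual least-squares method, which the text does not specify, and then measures the vertical gap between each point and the line. The function name `line_of_best_fit` and the data are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x  # the line passes through (mean_x, mean_y)
    return gradient, intercept

# Illustrative (invented) data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

gradient, intercept = line_of_best_fit(xs, ys)

# Vertical gap between each plotted point and the line
gaps = [y - (gradient * x + intercept) for x, y in zip(xs, ys)]

print(gradient, intercept)
print(gaps)
```

Some gaps are positive (point above the line) and some negative (point below it). For a least-squares line they always sum to zero, apart from rounding.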


Figure: strong, negative correlation

Correlation does not imply causation – TBC


Figure: weak, positive correlation

If there are many large gaps, the correlation is said to be weak. On the other hand, if the gaps are few and small, the correlation is said to be strong. Strength alone does not indicate whether the linear relationship is positive or negative. The correlation is positive if the line of best fit has a positive gradient, and negative if the line of best fit has a negative gradient. Note that correlation says nothing about how steep the line of best fit is.
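The last two points can be checked numerically. The sketch below assumes a least-squares line of best fit and Pearson's r as the measure of correlation, neither of which the text names. The helper `gradient_and_r` and the data are hypothetical.

```python
import math

def gradient_and_r(xs, ys):
    """Return (gradient of the least-squares line, Pearson's r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Illustrative (invented) data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

steeper = [10 * y for y in ys]   # same pattern, stretched vertically
falling = [-y for y in ys]       # same pattern, flipped upside down

g1, r1 = gradient_and_r(xs, ys)
g2, r2 = gradient_and_r(xs, steeper)
g3, r3 = gradient_and_r(xs, falling)

print(g1, r1)  # positive gradient, positive r
print(g2, r2)  # gradient ten times larger, r unchanged
print(g3, r3)  # negative gradient, r changes sign only
```

Stretching the data makes the line ten times steeper but leaves r the same, and flipping the data changes the sign of both the gradient and r without changing how strong the correlation is.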

Occasionally, a point lies unusually far from the line of best fit. This might indicate that there is an outlier in the original data.
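One rough way to spot such a point is to compare its gap from the line with the typical size of all the gaps. The sketch below flags any point whose gap is more than twice the root-mean-square gap. The factor of 2 is an arbitrary rule of thumb chosen for illustration, not a definition from this text, and the function name `flag_possible_outliers` and the data are invented.

```python
import math

def flag_possible_outliers(xs, ys, factor=2.0):
    """Return the points whose gap from the least-squares line is unusually large."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x

    # Vertical gap between each point and the line of best fit
    gaps = [y - (gradient * x + intercept) for x, y in zip(xs, ys)]
    # Typical gap size: root mean square of the gaps
    typical = math.sqrt(sum(g ** 2 for g in gaps) / n)

    # 'factor' is an assumed threshold, not a standard definition
    return [(x, y) for x, y, g in zip(xs, ys, gaps) if abs(g) > factor * typical]

# Illustrative (invented) data: one point sits well above the general trend
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 20.0, 14.1, 16.0]

flagged = flag_possible_outliers(xs, ys)
print(flagged)
```

A flagged point is only a candidate. Whether it really is an outlier depends on the data, for example whether it was recorded incorrectly. The outlier also pulls the line of best fit towards itself, so this simple check can miss outliers in small data sets.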


Use of interpolations and dangers of extrapolation – TBC