Measures of Variation
In addition to measures of central tendency, which indicate the central value of a dataset, there are measures of variation, which indicate how spread out a dataset is. For this reason, measures of variation are also called measures of spread or dispersion. Standard deviation and variance are measures of spread that pair naturally with the mean. The range, interquartile range and interpercentile ranges, on the other hand, go with the median. There is no measure of spread that naturally pairs with the mode.
Variance and Standard Deviation
Variance and standard deviation (which is the square root of variance) make use of how far each data point is away from the mean.
Ungrouped Data
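Essentially, the variance is the average of the squared differences from the mean (squared so that positive and negative differences do not cancel each other out) and is given by:
\[ \sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2 \]
The second formula can be thought of as 'the mean of the squares minus the square of the mean'. The variance is often written in terms of a summary statistic, as $\sigma^2 = S_{xx}/n$, where the summary statistic is given by
\[ S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{\left( \sum x \right)^2}{n} \]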
This summary statistic is given in the formula booklet for the Edexcel A-Level Maths syllabus.
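The standard deviation is hence given by
\[ \sigma = \sqrt{\frac{S_{xx}}{n}} = \sqrt{\frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2} \]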
Because the differences are squared when calculating variance, the result is in squared units and so can seem an unnatural measure of spread. Since standard deviation is the square root of variance, it has the same units as the original data points and gives a more natural measure of spread.
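As a quick check that the two forms of the variance agree, here is a minimal Python sketch. The function names and the data values are made up for illustration, and the variance is the version that divides by n, as in this section.

```python
from math import sqrt

def variance(data):
    """Variance as the mean of the squared differences from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_shortcut(data):
    """Same quantity: mean of the squares minus the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean is 5
var = variance(data)
sd = sqrt(var)  # standard deviation is the square root of the variance

print(var, variance_shortcut(data), sd)  # 4.0 4.0 2.0
```

The standard deviation of 2 is in the same units as the data, whereas the variance of 4 is in squared units.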
Grouped Data
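Estimates for the variance and standard deviation can be obtained using midpoints when grouped data is given, in much the same way as when estimating the mean for grouped data. The formulae change slightly to include the frequencies (f), and the x are now the midpoints of the intervals:
\[ \sigma^2 = \frac{\sum f (x - \bar{x})^2}{\sum f} = \frac{\sum f x^2}{\sum f} - \left( \frac{\sum f x}{\sum f} \right)^2, \qquad \sigma = \sqrt{\sigma^2} \]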
These formulae also apply to ungrouped data that is given in frequency tables.
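The midpoint method can be sketched in Python as follows. The class intervals and frequencies are hypothetical, and `grouped_variance` is simply an illustrative name.

```python
from math import sqrt

def grouped_variance(intervals, freqs):
    """Estimate the variance of grouped data using interval midpoints."""
    mids = [(low + high) / 2 for low, high in intervals]
    total_f = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, mids))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
    # mean of the squares minus the square of the mean, weighted by frequency
    return sum_fx2 / total_f - (sum_fx / total_f) ** 2

intervals = [(0, 10), (10, 20), (20, 30)]  # hypothetical classes
freqs = [2, 5, 3]                          # hypothetical frequencies

est_var = grouped_variance(intervals, freqs)
est_sd = sqrt(est_var)
print(est_var, est_sd)  # 49.0 7.0
```

For ungrouped data in a frequency table, the same calculation works if each "interval" is replaced by the data value itself, for example `(3, 3)` for the value 3.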
Ranges
Range
You have probably met the range before. It is simply given as follows:
\[ \text{range} = \text{largest value} - \text{smallest value} \]
Evidently, the range is affected by extreme values.
Quartiles and Percentiles
Just as the median, also known as Quartile 2 (Q2), marks the value below which 50% of the data values lie, Quartile 1 (Q1) and Quartile 3 (Q3) mark the values below which the first 25% and 75% of the data values lie respectively. All three quartiles can be seen on a box plot. See below for how to find quartiles. Similarly, percentiles indicate where any percentage of the data points lie. For example, $P_{10}$ represents the value that 10% of the data points are less than. Percentiles can be identified on cumulative frequency diagrams (see the exam-style example here) or by using interpolation.
Finding the quartiles for discrete data: In some examples, such as Example 1 on the box plots page, it is easy to locate the quartiles after listing the data in order. Otherwise, in general, Q1 can be found by listing the data in order and dividing the number of data points n by 4. If n/4 is a whole number, then Q1 is halfway between the (n/4)th data point and the one above it. If n/4 is not a whole number, round up to the next whole number and take that data point as Q1. For Q3, do the same thing but with 3n/4.
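The rule above can be sketched in Python. The function name `quartile` and the data are illustrative. Allowing k = 2 (using 2n/4) is an assumption that goes slightly beyond the text, although it does reproduce the usual median.

```python
def quartile(data, k):
    """k-th quartile (k = 1, 2 or 3) using the k*n/4 rule for discrete data."""
    values = sorted(data)
    n = len(values)
    pos, remainder = divmod(k * n, 4)  # whole part and remainder of k*n/4
    if remainder == 0:
        # whole number: halfway between this data point and the one above
        return (values[pos - 1] + values[pos]) / 2
    # not a whole number: round up to the next data point
    # (1-based position pos + 1 is index pos in a 0-based list)
    return values[pos]

seven = [7, 1, 3, 9, 5, 11, 13]      # n/4 = 1.75, so Q1 is the 2nd value
eight = [1, 3, 5, 7, 9, 11, 13, 15]  # n/4 = 2, so Q1 is between 2nd and 3rd

print(quartile(seven, 1), quartile(seven, 3))  # 3 11
print(quartile(eight, 1), quartile(eight, 3))  # 4.0 12.0
```

Other conventions exist, so library routines may give slightly different quartiles for the same data.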
Finding the quartiles for grouped continuous data: similarly to finding a mean for grouped data, the quartiles for grouped data will be estimates. Q1, Q2 and Q3 can be found by estimating the (n/4)th, (n/2)th and (3n/4)th values respectively. This can be done using interpolation. Note that there are different conventions for finding quartiles and so answers within a certain interval are usually accepted.
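Linear interpolation for grouped data can be sketched as follows. This is one convention among several. It assumes the values are spread evenly within each class, and the class boundaries and frequencies are hypothetical.

```python
def interpolate_quantile(boundaries, freqs, fraction):
    """Estimate the (fraction * n)th value of grouped data by interpolation.

    boundaries holds the class boundaries, so it has one more entry
    than freqs.
    """
    n = sum(freqs)
    target = fraction * n  # e.g. n/4 for Q1
    cumulative = 0
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower, upper = boundaries[i], boundaries[i + 1]
            # move through the class in proportion to how far the target
            # lies into the class's frequency
            return lower + (target - cumulative) * (upper - lower) / f
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

boundaries = [0, 10, 20, 30, 40]  # hypothetical class boundaries
freqs = [4, 10, 4, 2]             # hypothetical frequencies, n = 20

q1 = interpolate_quantile(boundaries, freqs, 0.25)
q2 = interpolate_quantile(boundaries, freqs, 0.50)
q3 = interpolate_quantile(boundaries, freqs, 0.75)
print(q1, q2, q3)  # 11.0 16.0 22.5
```

The same function estimates percentiles, for example `interpolate_quantile(boundaries, freqs, 0.10)` for the 10th percentile.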
Interquartile and Interpercentile Ranges
As well as the range, which is calculated by subtracting the smallest data value from the largest, there are the interquartile and interpercentile ranges. The interquartile range (IQR) is given by
\[ \text{IQR} = Q_3 - Q_1 \]
The IQR is not affected by extreme values, but it only takes into account the spread of the middle 50% of the data.
There are multiple interpercentile ranges. For example, the 10th to 90th interpercentile range, $P_{90} - P_{10}$, shows the spread of the middle 80% of the data. Since this considers more data than the IQR and still isn't affected by extreme values, it is often favoured over the IQR.
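To illustrate how an extreme value affects the range but not the IQR, here is a small Python sketch. The quartiles use the discrete n/4 rule described earlier. The helper names and the data are made up for illustration.

```python
def quartile(data, k):
    """k-th quartile (k = 1 or 3) using the k*n/4 rule for discrete data."""
    values = sorted(data)
    pos, remainder = divmod(k * len(values), 4)
    if remainder == 0:
        return (values[pos - 1] + values[pos]) / 2  # halfway between two points
    return values[pos]  # round up to the next data point

def data_range(data):
    return max(data) - min(data)

def iqr(data):
    return quartile(data, 3) - quartile(data, 1)

data = [3, 5, 6, 7, 8, 9, 10, 12]
with_extreme = [3, 5, 6, 7, 8, 9, 10, 120]  # largest value replaced by 120

print(data_range(data), iqr(data))                  # 9 4.0
print(data_range(with_extreme), iqr(with_extreme))  # 117 4.0
```

The range jumps from 9 to 117 while the IQR stays at 4, because the quartiles depend only on the middle of the ordered data.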