Measures of Central Tendency
Measures of central tendency, or averages, are single values that approximate the central value of a univariate dataset (a list of single numbers as opposed to a list of pairs, for example). The three most common measures of central tendency, informally called averages, are the mean, the median and the mode.
Non-Grouped Data
The best way to illustrate the differences between the mean, median and mode for non-grouped data is to work through a specific set of numbers. For example, consider the following dataset: 3, 5, 1, 4, 5
Mean
The mean is found by adding all of the data values together and dividing by how many there are. For the set of numbers given above the mean is given by: $\frac{3+5+1+4+5}{5} = \frac{18}{5} = 3.6$. The mean of the set is 3.6.
The mean is often denoted $\bar{x} = \frac{\sum x}{n}$, i.e. the sum of the data points ($x$) divided by how many there are ($n$).
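As a quick check of the arithmetic, the calculation can be sketched in a few lines of Python. The variable names `data` and `mean` are chosen here for illustration only.

```python
# The non-grouped dataset from the text.
data = [3, 5, 1, 4, 5]

# Mean = sum of the data points divided by how many there are.
mean = sum(data) / len(data)

print(mean)  # 3.6
```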
Median
The median is the middle data entry once all of the data values have been put into ascending order: 1, 3, 4, 5, 5
The middle data point is 4 and so the median of the set is 4. See the example below for when there is an even number of data entries.
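A minimal sketch of the procedure in Python follows. The function name `median` is illustrative, and the even-length branch assumes the usual convention of taking the mean of the two middle values; the four-value list in the second call is a made-up example.

```python
def median(values):
    """Return the middle value of the data once sorted in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of entries: a single middle value exists.
        return ordered[mid]
    # Even number of entries (assumed convention): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 5, 1, 4, 5]))  # 4
print(median([3, 5, 1, 4]))     # 3.5
```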
Mode
The mode is the most common number in the given set of numbers. For the set above, the most common number is 5, hence, the mode is 5.
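One way to find the mode programmatically is to count occurrences, sketched below with Python's `collections.Counter`. The helper name `modes` is hypothetical; it returns a list because a dataset can have more than one most common value.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([3, 5, 1, 4, 5]))  # [5]
```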
These averages may also be found in a similar way if the data is given in a frequency table. See the example directly below.
Grouped Data
The following grouped dataset is used to demonstrate how to estimate the mean and how to find the median class and the modal class. It shows the number of people in each age range who work at a particular supermarket:
Age (yrs) | 16-25 | 26-35 | 36-45 | 46-55 | 56-65 | 66-75 |
Frequency | 6 | 9 | 10 | 15 | 12 | 8 |
Mean
An estimate for the mean can be found by approximating the combined age of all employees and dividing by how many employees there are. Since we don't know the exact ages, we estimate using the midpoint of each interval (see note below). The combined age is approximately: $6 \times 21 + 9 \times 31 + 10 \times 41 + 15 \times 51 + 12 \times 61 + 8 \times 71 = 2880$. There are 60 employees in total, hence the mean is estimated as $2880/60 = 48$ years.
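The midpoint estimate can be reproduced with a short Python sketch. The variable names are illustrative, and the midpoints assume ages are recorded in completed years, so that the class 16-25 runs from 16 up to just under 26.

```python
# (lowest age, highest age) for each class, with the matching frequencies.
classes = [(16, 25), (26, 35), (36, 45), (46, 55), (56, 65), (66, 75)]
frequencies = [6, 9, 10, 15, 12, 8]

# Assumption: a class "16-25" covers ages from 16 up to (not including) 26,
# so its midpoint is (16 + 26) / 2 = 21.
midpoints = [(low + high + 1) / 2 for low, high in classes]

# Approximate combined age: each person is treated as being at the midpoint.
combined_age = sum(f * m for f, m in zip(frequencies, midpoints))
total_people = sum(frequencies)

estimated_mean = combined_age / total_people
print(combined_age, total_people, estimated_mean)  # 2880.0 60 48.0
```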
Median
The dataset contains 60 people in total, already listed in age order, so the median lies at the halfway point, around the 30th person. There are 25 people in the first three age intervals and 40 in the first four, so the 30th person falls in the fourth interval. It follows that the median class is 46-55 years.
Interpolation may be used to find a more precise estimate for the median.
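The sketch below shows one common way to do this, linear interpolation within the median class. It rests on assumptions not stated in the text: ages are taken to be spread evenly within each class, class boundaries are taken as 46 up to 56 (ages in completed years), and the median position is taken as half the total frequency. The function name `interpolated_median` is hypothetical.

```python
# Class boundaries, assuming "46-55" spans 46 up to (but not including) 56.
lower_bounds = [16, 26, 36, 46, 56, 66]
upper_bounds = [26, 36, 46, 56, 66, 76]
frequencies = [6, 9, 10, 15, 12, 8]

def interpolated_median(lower_bounds, upper_bounds, frequencies):
    """Estimate the median by linear interpolation inside the median class."""
    half = sum(frequencies) / 2
    cumulative = 0
    for low, high, f in zip(lower_bounds, upper_bounds, frequencies):
        if f > 0 and cumulative + f >= half:
            # Move through the class in proportion to how far the halfway
            # point lies beyond the cumulative frequency before the class.
            return low + (half - cumulative) / f * (high - low)
        cumulative += f
    raise ValueError("frequencies must contain at least one positive count")

median_estimate = interpolated_median(lower_bounds, upper_bounds, frequencies)
print(round(median_estimate, 2))  # 49.33
```

Here the halfway point is 30, the cumulative frequency before the median class is 25, and the class holds 15 people across a width of 10 years, giving 46 + (5/15) × 10.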
Mode
For grouped data, the mode is not a single number. Instead, we look for the most common interval or class, known as the modal class. In this example, the modal class is also 46-55 years, since the largest number of people (15) fall within this age bracket.
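Picking out the modal class amounts to finding the largest frequency, as in this short Python sketch. The names are illustrative, and comparing raw frequencies like this is only reasonable because every class here has the same width.

```python
labels = ["16-25", "26-35", "36-45", "46-55", "56-65", "66-75"]
frequencies = [6, 9, 10, 15, 12, 8]

# Collect every class whose frequency equals the maximum (ties are possible).
highest = max(frequencies)
modal_classes = [label for label, f in zip(labels, frequencies) if f == highest]

print(modal_classes)  # ['46-55']
```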
NOTE: the midpoint of the interval 16-25 is 21, for example, since the interval includes all ages from 16 right up until the day before someone's 26th birthday, and $(16 + 26)/2 = 21$.
Each of the measures of central tendency has its own advantages and disadvantages. The mean uses all of the data but can be skewed by extreme values. The median avoids this skewing but doesn't necessarily give a true picture of the data, since it depends only on the middle of the ordered values. The mode can be useful if many data points take the same value, but it is not particularly useful when frequencies are low or nearly all the same.
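The effect of an extreme value can be seen in a small Python sketch using the standard `statistics` module. The second list is a hypothetical variation of the earlier dataset, with one 5 replaced by 50 purely for illustration.

```python
from statistics import mean, median

original = [3, 5, 1, 4, 5]
with_outlier = [3, 5, 1, 4, 50]  # hypothetical: one value replaced by an extreme one

# The mean shifts noticeably, while the median stays where it was.
print(mean(original), median(original))          # 3.6 4
print(mean(with_outlier), median(with_outlier))  # 12.6 4
```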
In addition to measures of central tendency that indicate the central value of a dataset, there are also measures of variation that indicate how spread out a dataset is. Standard deviation is the measure of spread that pairs naturally with the mean. The interquartile range, on the other hand, goes with the median. There is no measure of spread that naturally pairs with the mode.
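Both measures of spread can be computed with Python's standard `statistics` module, as in the sketch below. Two choices here are assumptions rather than anything fixed by the text: `pstdev` treats the data as a whole population (use `stdev` for a sample), and `quantiles` uses the module's default method for locating quartiles, which may differ from the convention in a given textbook.

```python
from statistics import pstdev, quantiles

data = [3, 5, 1, 4, 5]

# Population standard deviation: spread of the data about the mean.
standard_deviation = pstdev(data)

# Quartiles split the ordered data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = quantiles(data, n=4)
interquartile_range = q3 - q1

print(standard_deviation, interquartile_range)
```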