Measures of Central Tendency

Averages, or measures of central tendency, are single values that approximate the central value of the numbers in a univariate dataset (a list of single numbers as opposed to a list of pairs, for example). The three most common measures of central tendency (informally called averages) are the mean, median and mode.


Non-Grouped Data

The best way to illustrate the differences between the mean, median and mode for non-grouped data is to consider a given set of numbers. For example, consider the following dataset: 3, 5, 1, 4, 5

Mean

The mean is found by adding all of the data values together and dividing by how many there are. For the set of numbers given above, the mean is given by \frac{3+5+1+4+5}{5}=\frac{18}{5}=3.6. The mean of the set is 3.6.

The mean is often denoted \bar{x} and is given by \bar{x}=\frac{\sum x}{n}, i.e. the sum of the data points (x) divided by how many there are (n).
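The formula can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean` is chosen here for illustration and is not part of the notes above.

```python
def mean(values):
    """Return the sum of the values divided by how many there are."""
    return sum(values) / len(values)


data = [3, 5, 1, 4, 5]
x_bar = mean(data)  # (3 + 5 + 1 + 4 + 5) / 5
print(x_bar)
```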

Median

The median is the middle data entry once all of the data values have been put into ascending order: 1, 3, 4, 5, 5

The middle data point is 4 and so the median of the set is 4. See the example below for when there is an even number of data entries.
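The same steps (sort, then pick the middle) can be sketched in Python. The helper name `median` is an illustrative choice. For an even number of entries the sketch takes the value halfway between the two middle entries, which is the convention used in the worked example later on.

```python
def median(values):
    """Return the middle value of the sorted data.

    With an even number of entries, return the value halfway
    between the two middle entries.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [3, 5, 1, 4, 5]
print(sorted(data), median(data))
```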

Mode

The mode is the most common number in the given set of numbers. For the set above, the most common number is 5, hence, the mode is 5.
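Counting how often each value occurs can be sketched with Python's `collections.Counter`. The helper name `modes` is hypothetical, and returning a list is a design choice made here so that ties (more than one most common value) are reported rather than hidden.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


data = [3, 5, 1, 4, 5]
print(modes(data))
```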


These averages may also be found in a similar way if the data is given in a frequency table. See the first example under Examples below.

Grouped Data

The following grouped dataset is used to demonstrate how to estimate the mean and how to find the median class and the modal class. The dataset shows the number of people in each age range who work at a particular supermarket:

Age (yrs)  | 16-25 | 26-35 | 36-45 | 46-55 | 56-65 | 66-75
Frequency  |   6   |   9   |  10   |  15   |  12   |   8

Mean

An estimate for the mean can be found by approximating the combined age of all employees and dividing by how many employees there are. Since we don't know the exact ages, we estimate by using the midpoint of each interval (see note below). The combined age is approximately 6\times 21+9\times 31+10\times 41+15\times 51+12\times 61+8\times 71=2880. Hence, the mean is estimated as \frac{2880}{60}=48.
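The midpoint calculation can be reproduced with a short Python sketch. The names `classes` and `age_midpoint` are illustrative, and the midpoint rule assumes, as in the note further down, that ages are recorded in completed years.

```python
# Each class is (lower age, upper age, frequency), copied from the table above.
classes = [(16, 25, 6), (26, 35, 9), (36, 45, 10),
           (46, 55, 15), (56, 65, 12), (66, 75, 8)]


def age_midpoint(lower, upper):
    # Assumption: ages are in completed years, so the class 16-25
    # runs from 16 up to (but not including) 26.
    return (lower + (upper + 1)) / 2


total_people = sum(f for _, _, f in classes)
combined_age = sum(age_midpoint(lo, hi) * f for lo, hi, f in classes)
estimated_mean = combined_age / total_people
print(total_people, combined_age, estimated_mean)
```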

Median

The total number of people in the dataset is 60, half of which is 30, and the classes are already listed in order. There are 25 people in the first three age intervals and 40 in the first four intervals, so the 30th person falls in the fourth interval. It follows that the median class is 46-55 years.

Interpolation may be used to find a more precise estimate for the median.
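One common convention is linear interpolation, which assumes the people in the median class are spread evenly across it. The sketch below is illustrative only: the function name is hypothetical, the class is taken to run from 46 to 56 (consistent with the midpoint note), and the target position is taken as n/2 = 30. Other textbooks use slightly different conventions.

```python
def interpolated_median(lower_bound, class_width, freq_in_class,
                        cum_freq_before, total):
    """Lower bound of the median class plus a fraction of its width."""
    position = total / 2
    fraction = (position - cum_freq_before) / freq_in_class
    return lower_bound + fraction * class_width


# 25 people are younger than the median class, which holds 15 people.
estimate = interpolated_median(lower_bound=46, class_width=10,
                               freq_in_class=15, cum_freq_before=25,
                               total=60)
print(round(estimate, 2))
```

Under these assumptions the estimate is 46 + (5/15) × 10, or roughly 49.3 years, which sits inside the median class as expected.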

Mode

For grouped data, rather than the mode, which is a single number, we are looking for the most common interval or class. This is known as the modal class. In this example, the modal class is also 46-55 years, since the largest number of people fall within this age bracket.

NOTE: the midpoint of the interval 16-25 is 21, for example, since the interval includes all ages from 16 right up until the day before someone's 26th birthday. It therefore effectively runs from 16 to 26, and \frac{16+26}{2}=21.


Each measure of central tendency has its own advantages and disadvantages. The mean uses all of the data but can be skewed by extreme values. The median avoids this skewing but doesn't necessarily give a true picture of the data, as it depends only on the middle value (or the middle two values). The mode can be useful if many data points take the same value, but it is not particularly useful when frequencies are low or nearly all the same.

In addition to measures of central tendency that indicate the central value of a dataset, there are also measures of variation that indicate how spread out a dataset is. Standard deviation is the measure of spread that pairs naturally with the mean. The interquartile range, on the other hand, goes with the median. There is no measure of spread that naturally pairs with the mode.

Examples

The following table shows the number of eggs laid by a single chicken each day over a given time period.

No. of eggs | 0  | 1 | 2
Frequency   | 15 | x | 3

1. Given that the mean number of eggs laid per day throughout this time was 0.6, find x.
2. Find the median and mode.

Solution:

1. The mean is given by \bar{x}=\frac{0\times 15+1\times x+2\times 3}{15+x+3}=0.6, or x+6=0.6(18+x). Expanding gives x+6=10.8+0.6x, so 0.4x=4.8 and x=12.

2. With x=12 there are 30 days in total: the chicken lays 0 eggs on 15 days and either 1 or 2 eggs on the other 15 days. Listed in order, the 15th and 16th measurements are 0 and 1 respectively. Hence, the median is halfway between them at 0.5. The mode is 0, as this is the most frequent value.
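The solution can be checked with a short Python sketch using the standard `fractions` and `statistics` modules. The variable names are illustrative. The rearrangement in the comment is a general form of the equation solved in part 1, not a method taken from the notes.

```python
from fractions import Fraction
import statistics

# Known rows of the frequency table: value -> frequency.
# The frequency for 1 egg is the unknown x.
known = {0: 15, 2: 3}
unknown_value = 1
target_mean = Fraction(6, 10)

# Rearranging (S + v*x) / (N + x) = m gives x = (m*N - S) / (v - m).
N = sum(known.values())
S = sum(value * freq for value, freq in known.items())
x = (target_mean * N - S) / (unknown_value - target_mean)

# Expand the completed table into one entry per day.
days = [0] * 15 + [1] * int(x) + [2] * 3
print(x, statistics.mean(days), statistics.median(days),
      statistics.mode(days))
```

Using `Fraction` keeps the arithmetic exact, so x comes out as exactly 12 rather than a nearby floating-point value.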

The following table shows the weekly wages (to the nearest pound) earned by 17 employees in a given office:

Wage (£)   | 251-275 | 276-300 | 301-350 | 351-400 | 401-450 | 451-600 | 601-650
Frequency  |    2    |    4    |    3    |    3    |    4    |    0    |    1

1. Find an estimate for the mean weekly wage amongst these employees. Explain why it is an estimate.
2. Find the median and modal classes. What is the best average to use to describe central tendency in this example?

Solution:
1. As above, we use the midpoints to approximate the mean. An estimate for the total salaries is given by:

263\times 2+288\times 4 + 325.5\times 3+ 375.5\times 3+425.5\times 4+525.5\times 0 +625.5\times 1=6108.5
Hence, the mean weekly wage is estimated as \bar{x}=\frac{\sum x}{n}=\frac{6108.5}{17}\approx £359.32. This is an estimate because we have used midpoints. We do not know the exact weekly wage of each employee, so we assume, for example, that each employee in the £251-£275 bracket earns the midpoint wage of £263. In reality, these wages could be higher or lower, and hence the mean calculation, though it might be close, is not exact.

2. There are 17 employees, so the middle employee is the 9th. The cumulative frequencies are 2, 6 and 9, so the 9th person is in the £301-£350 weekly wage bracket and this is the median class. The modal classes are £276-£300 and £401-£450, as these intervals both have the highest frequency (4). For this example, the best average would be the median class. This is because there is a high wage skewing the mean upwards and, since there are two modal classes, the mode doesn't really suggest a central tendency.
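Both parts of this solution can be reproduced with a short Python sketch. The variable names are illustrative. Since wages are given to the nearest pound, the midpoint of each class is taken as the average of its two end values, matching the midpoints used above.

```python
# (lower wage, upper wage, frequency) for each class in the table above.
classes = [(251, 275, 2), (276, 300, 4), (301, 350, 3), (351, 400, 3),
           (401, 450, 4), (451, 600, 0), (601, 650, 1)]

n = sum(f for _, _, f in classes)

# Estimated total wages: midpoint of each class times its frequency.
total = sum((lo + hi) / 2 * f for lo, hi, f in classes)
estimated_mean = total / n

# Median class: the class containing the middle, (n + 1) / 2 th, employee.
middle = (n + 1) / 2
running = 0
for lo, hi, f in classes:
    running += f
    if running >= middle:
        median_class = (lo, hi)
        break

# Modal classes: every class sharing the highest frequency.
top = max(f for _, _, f in classes)
modal_classes = [(lo, hi) for lo, hi, f in classes if f == top]

print(n, total, round(estimated_mean, 2), median_class, modal_classes)
```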