## Interpolation

It was mentioned on the Measures of Variation page that the quartiles and percentiles for grouped continuous data can be found using linear interpolation (or simply interpolation). We identify the quartiles according to the following:

• ${\text{Q1}} =\frac{n}{4}{\text{th value}}$
• ${\text{Q2}} =\frac{n}{2}{\text{th value}}$
• ${\text{Q3}} =\frac{3n}{4}{\text{th value}}$

We assume that the values are evenly distributed within each interval, and so we split each interval evenly according to its frequency.
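As a rough summary, this assumption can be written as a single formula. The letters are labels chosen for this sketch, not notation taken from the page: the interval containing the $k$th value has lower boundary $L$, width $w$ and frequency $f$, and $F$ is the total frequency of the earlier intervals. Assuming the first value in an interval sits on its lower boundary, the estimate is:

$$k\text{th value} \approx L + (k - F - 1)\times \frac{w}{f}$$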

Consider the following example. The table shows the prices, in millions of pounds, of 100 houses in a given postcode:

| House Price, P (£m) | Frequency |
|---|---|
| $0.2\leq P\hspace{2pt}\textless\hspace{2pt} 0.3$ | $21$ |
| $0.3\leq P\hspace{2pt}\textless\hspace{2pt} 0.4$ | $37$ |
| $0.4\leq P\hspace{2pt}\textless\hspace{2pt} 0.5$ | $24$ |
| $0.5\leq P\hspace{2pt}\textless\hspace{2pt} 0.6$ | $11$ |
| $0.6\leq P\hspace{2pt}\textless\hspace{2pt} 0.7$ | $7$ |

Hence, the quartiles are identified as:

• ${\text{Q1}} =\frac{100}{4}{\text{th value = 25th value}}$ – the first interval holds 21 values, so the 25th value is the 4th value in the second interval. Split the interval evenly between its 37 values and let the first value be 0.3. The 4th value is then $0.3+3\times \frac{0.1}{37}\approx 0.3081$. Hence, the lower quartile is approximately £308,100.
• ${\text{Q2}} =\frac{100}{2}{\text{th value = 50th value}}$ – the first interval holds 21 values, so the 50th value is the 29th value in the second interval. The 29th value is $0.3+28\times \frac{0.1}{37}\approx 0.3757$. Hence, the median is approximately £375,700.
• ${\text{Q3}} =\frac{300}{4}{\text{th value = 75th value}}$ – the first two intervals hold 58 values, so the 75th value is the 17th value in the third interval. Split the interval evenly between its 24 values and let the first value be 0.4. The 17th value is then $0.4+16\times \frac{0.1}{24}\approx 0.4667$. Hence, the upper quartile is approximately £466,700.
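The three calculations above can be reproduced with a short Python sketch. The function name `kth_value` and its argument names are made up for this illustration. It assumes the same convention as the worked example: the first value in an interval sits on the lower boundary and the rest are spaced `width / f` apart.

```python
# Class boundaries and frequencies from the house-price table (prices in £m).
lower_bounds = [0.2, 0.3, 0.4, 0.5, 0.6]
widths = [0.1, 0.1, 0.1, 0.1, 0.1]
freqs = [21, 37, 24, 11, 7]


def kth_value(k, lower_bounds, widths, freqs):
    """Estimate the k-th value (counting from 1) by linear interpolation.

    Hypothetical helper: the first value in an interval is placed on the
    lower boundary and consecutive values are width / f apart.
    """
    if not 1 <= k <= sum(freqs):
        raise ValueError("k must lie between 1 and the total frequency")
    cumulative = 0  # total frequency of the intervals already passed
    for lower, width, f in zip(lower_bounds, widths, freqs):
        if k <= cumulative + f:
            position = k - cumulative  # position inside this interval
            return lower + (position - 1) * width / f
        cumulative += f


n = sum(freqs)  # 100 houses in total
q1 = kth_value(n / 4, lower_bounds, widths, freqs)
q2 = kth_value(n / 2, lower_bounds, widths, freqs)
q3 = kth_value(3 * n / 4, lower_bounds, widths, freqs)
print(round(q1, 4), round(q2, 4), round(q3, 4))
```

Rounded to four decimal places, the results agree with the values found by hand for Q1, Q2 and Q3.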

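The percentiles can be found in a similar way. For example, $P_{5}$ and $P_{95}$ can be found by locating the 5th and 95th values. The 5th value is the 5th value in the first interval, and since the first four intervals hold 93 values, the 95th value is the 2nd value in the last interval. The 5th and 95th values are therefore $0.2+4\times \frac{0.1}{21}\approx 0.2190$ and $0.6+1\times \frac{0.1}{7}\approx 0.6143$ respectively. Hence $P_{5}\approx\pounds 219,000$ and $P_{95}\approx\pounds 614,300$.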

As another example, if there were 70 values, $P_{10}$ and $P_{90}$ would be found from the 7th and 63rd values, since 10% of 70 is 7 and 90% of 70 is 63.
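The same idea can be sketched for percentiles in Python. The names `percentile_position` and `percentile` are invented for this illustration. The sketch assumes the $p$th percentile is the $\frac{pn}{100}$th value, with the first value in each interval placed on the lower boundary, as in the examples above.

```python
# Class boundaries and frequencies from the house-price table (prices in £m).
lower_bounds = [0.2, 0.3, 0.4, 0.5, 0.6]
widths = [0.1, 0.1, 0.1, 0.1, 0.1]
freqs = [21, 37, 24, 11, 7]


def percentile_position(p, n):
    """Position of the p-th percentile among n values (assumed rule: p*n/100)."""
    return p * n / 100


def percentile(p, lower_bounds, widths, freqs):
    """Estimate the p-th percentile of grouped data by linear interpolation."""
    k = percentile_position(p, sum(freqs))
    if not 1 <= k <= sum(freqs):
        raise ValueError("position must lie between 1 and the total frequency")
    cumulative = 0  # total frequency of the intervals already passed
    for lower, width, f in zip(lower_bounds, widths, freqs):
        if k <= cumulative + f:
            # (k - cumulative) is the position inside this interval
            return lower + (k - cumulative - 1) * width / f
        cumulative += f


p5 = percentile(5, lower_bounds, widths, freqs)
p95 = percentile(95, lower_bounds, widths, freqs)
print(round(p5, 4), round(p95, 4))

# With 70 values, the 10th and 90th percentiles sit at the 7th and 63rd values.
print(percentile_position(10, 70), percentile_position(90, 70))
```

For the house-price table this gives roughly 0.2190 and 0.6143, which correspond to £219,000 and £614,300.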

Note that, for listed data, the quartiles are found in a more specific way, because we know all of the data points exactly. For grouped data, we don’t know the values exactly, so we can only obtain estimates, which is why we locate the quartiles according to the bullet points above. Also note that, when we estimate the mean for grouped data, spreading the values evenly across each interval has the same effect as assigning the midpoint to each member, because the values above and below the midpoint balance out.

## Coding

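Coding is used to make values easier to handle. Once the mean and standard deviation of the coded data have been found, they are changed back to the mean and standard deviation of the original data using the coding formulae. The coefficients of the coding may or may not be given in the question.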
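As a minimal sketch of how changing back works, suppose the coding is $y=\frac{x-a}{b}$. This form, the coefficients $a=1000$ and $b=10$, and the data below are all assumptions made for illustration and do not come from the page. Then $\bar{x}=a+b\bar{y}$ and $\sigma_x=|b|\,\sigma_y$: subtracting $a$ shifts the mean but not the spread, while dividing by $b$ scales both.

```python
from statistics import fmean, pstdev

# Hypothetical data, chosen only for illustration (not from the text).
x = [1010, 1020, 1030, 1040, 1050]

# Assumed coding y = (x - a) / b with illustrative coefficients.
a, b = 1000, 10
y = [(xi - a) / b for xi in x]  # coded values: 1.0, 2.0, 3.0, 4.0, 5.0

# Work with the simpler coded values.
mean_y = fmean(y)
sd_y = pstdev(y)  # population standard deviation

# Change back to the original scale.
mean_x = a + b * mean_y  # the shift and the scale both affect the mean
sd_x = abs(b) * sd_y     # only the scale affects the standard deviation

print(mean_x, sd_x)
```

Computing `fmean(x)` and `pstdev(x)` directly on the original data gives the same results, which is a quick way to check the conversion.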