The Normal Distribution

Continuous Random Variables

Recall that in AS Maths you learned about discrete random variables and their probability distributions. For example, we saw that the number of heads resulting from ten coin tosses was a binomially distributed variable where each toss had a probability of success of \frac{1}{2}. That is to say, if we let X be the number of heads in ten coin tosses then X\sim B(10,\frac{1}{2}). X is a discrete random variable because it can only take on certain values and nothing in between.

On the other hand, a continuous random variable can take on any value within a range. Examples include weight and height, although these are often rounded when measured. Unlike a discrete random variable, the probability of a continuous random variable being an exact value is 0. We saw that the probability of tossing 4 heads out of 10, i.e. P(X=4), was 0.2051. However, the probability that a person has a weight of exactly 70kg is zero. Instead, for continuous variables, we specify a range of values, such as a weight between 69.5kg and 70.5kg, so that the probability is non-zero.

For any given population, it is possible to collect the weights of everyone, record the frequency of weights occurring in each of a set of intervals, and plot a histogram of relative frequency density. The total area of the bars of such a histogram is equal to 1. Narrowing the interval widths smooths out the histogram until, in the limit where the interval widths are infinitely small, a probability density function is obtained. The area beneath a probability density function is equal to 1.
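The binomial probability quoted above can be checked with a few lines of Python. This is only a minimal sketch: the helper name binomial_pmf is an illustrative choice, not something from these notes.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 4 heads in 10 fair coin tosses.
p_four_heads = binomial_pmf(4, 10, 0.5)
print(round(p_four_heads, 4))  # 0.2051

# A discrete distribution puts all of its probability on the listed values.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(total)  # 1.0 (up to floating-point rounding)
```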

The Normal Distribution

Weight and height are examples of random variables that are normally distributed. This means that their probability density functions are bell-shaped: they have a smooth peak, they tail off in both directions and they are symmetrical. It can be shown that the probability density function is given by f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-0.5\left(\frac{x-\mu}{\sigma}\right)^2} where \mu is the mean and \sigma^2 is the variance. Recall that the variance is a measure of spread away from the mean, and if \sigma^2 is the variance, then \sigma is the standard deviation. If a random variable X is normally distributed with mean \mu and variance \sigma^2, then we write X\sim N\left(\mu,\sigma^2\right). For a normally distributed variable, around 68% of the data lies within 1 standard deviation of the mean, around 95% within 2 standard deviations and almost 100% (about 99.7%) within 3 standard deviations.

Facts about the probability density function for a normally distributed variable:

  • For a normal distribution mean = median = mode and they all occur where the probability density function has its peak. The mode (or the infinitely small interval that contains it) occurs with the highest probability density, i.e. at the peak. Since the probability density function is symmetrical, the mean will be in the middle, as will the median, and so they all occur in the same place.
  • The standard deviation determines how wide the probability density function is. A narrow probability density function has a small standard deviation – values remain close to the mean, whereas a wide probability density function has a large standard deviation and the spread of the data is vast.
  • Recall that the points of inflection of a curve are the points where the curve changes from convex to concave or vice versa. The probability density function for a normal distribution has inflection points when x=\mu\pm\sigma as can be seen from the first figure.
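The density formula and the percentages quoted earlier can be checked numerically. The sketch below uses Python's standard statistics.NormalDist. The values mean 10 and standard deviation 3 are illustrative assumptions, and normal_pdf is a hypothetical helper written out from the formula for f(x).

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density f(x) of N(mu, sigma^2), written out from the formula."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


mu, sigma = 10, 3  # illustrative values only
X = NormalDist(mu, sigma)

# The hand-written formula agrees with the library's density.
print(abs(normal_pdf(11.2, mu, sigma) - X.pdf(11.2)) < 1e-12)  # True

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma) for k in (1, 2, 3)}
for k, area in within.items():
    print(k, round(area, 4))  # 0.6827, 0.9545 and 0.9973 respectively

# The density is symmetric about the mean and highest at the mean.
print(normal_pdf(mu - 2, mu, sigma), normal_pdf(mu + 2, mu, sigma))
print(normal_pdf(mu, mu, sigma) > normal_pdf(mu + 0.1, mu, sigma))  # True
```

The areas do not depend on the particular mean and standard deviation chosen, which is why the same percentages apply to every normal distribution.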

We mentioned earlier that for X\sim N\left(\mu,\sigma^2\right), the probability density function cannot be used to calculate a probability for a single value of X but it can be used to find probabilities of certain ranges, i.e. the area beneath the curve between two X-values. There is a function on your calculator that calculates these probabilities using the normal cumulative distribution function – see Example 1. Conversely, given a probability, there is also a function on your calculator to find the X-values that correspond to those probabilities using the inverse normal distribution – see Example 2.

The Standard Normal Distribution

The standard normal distribution is the normal distribution that has mean 0 and variance 1. We often call a random variable with a standard normal distribution Z and so Z\sim N(0,1^2). Probabilities for the standard normal distribution only are included in the Edexcel Formula Booklet (where P(Z<a) is written as \Phi(a)) but they can also be calculated on your calculator as in Examples 1 and 2.

Any normal distribution can be converted to a standard normal distribution using the coding formula Z=\frac{X-\mu}{\sigma}. This is useful for finding missing statistical values such as \mu, \sigma or both where the table in the Edexcel Formula Booklet must be used – see Example 3. Recall that when we learned about coding in AS Maths, the mean of the coded variable Z would be \frac{\mu-\mu}{\sigma}=0 and the standard deviation would be \frac{\sigma}{\sigma}=1.
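As a rough illustration of the coding formula, the following Python sketch checks that a probability for X matches the corresponding probability for Z. The values mean 10, standard deviation 3 and a = 13 are assumptions chosen for the example.

```python
from statistics import NormalDist

mu, sigma = 10, 3            # illustrative values only
X = NormalDist(mu, sigma)    # X ~ N(10, 3^2)
Z = NormalDist()             # standard normal: mean 0, standard deviation 1

a = 13
z = (a - mu) / sigma         # coded value, here z = 1.0

# P(X < a) and P(Z < z) agree, so tables for Z can be used for any X.
print(X.cdf(a), Z.cdf(z))
```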

Approximating Binomial Distributions

We met the discrete binomial distribution in AS Maths and we saw above that histograms plotted against relative frequency density become probability density functions when the interval widths become infinitely small. In a similar way, the probabilities of a binomially distributed random variable X\sim B(n,p) begin to resemble a normal probability density function for large n and p\approx \frac{1}{2}. Here n must be large enough to smooth out the probabilities and p must be approximately a half so that the distribution is symmetrical.

It can be difficult to calculate binomial probabilities when n is large – for instance, the Edexcel Formula Booklet doesn’t give binomial probabilities for n> 50. In these cases, it is desirable to use a normal approximation where we can set \mu = np and \sigma=\sqrt{np(1-p)}. Note that since a binomial distribution is discrete and a normal distribution is continuous, a continuity correction is required – see Example 4.

Hypothesis Testing

In AS Maths we learned how to apply hypothesis testing for binomial probabilities. These concepts are extended in A2 Maths where we conduct hypothesis tests for correlation between two variables and we also learn how to conduct hypothesis tests on a sample from a normal distribution.

If a random variable X is normally distributed such that X\sim N\left(\mu,\sigma^2\right), then the mean of any sample taken from the population is also normally distributed with the same mean but with a scaled standard deviation, that is \bar{X}\sim N\left(\mu, \frac{\sigma^2}{n}\right), where \bar{X}=\frac{\sum x}{n}, x denotes the observed values in the sample and n is the size of the sample. This is a result that follows from finding the variance of a transformed variable:

\text{Var}\left(\bar{X}\right)=\text{Var}\left(\frac{\sum x}{n}\right)=\frac{1}{n^2}\text{Var}\left(\sum x\right)=\frac{1}{n^2}\times n \text{Var}(X)=\frac{\sigma^2}{n}
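The result can also be checked by simulation. The sketch below is only an illustration: the population parameters, sample size, number of trials and seed are all arbitrary assumptions. It draws many samples of size n and compares the variance of the sample means with \frac{\sigma^2}{n}.

```python
import random
from statistics import fmean, variance

random.seed(1)  # fixed seed so that the run is repeatable

mu, sigma, n = 10, 3, 25   # illustrative population parameters and sample size
trials = 20_000            # number of samples drawn

# Each entry is the mean of one sample of n normally distributed values.
sample_means = [
    fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(trials)
]

print(fmean(sample_means))     # close to mu = 10
print(variance(sample_means))  # close to sigma**2 / n = 0.36
```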

It is possible to perform hypothesis tests on the mean of a sample to make inferences on the population mean or find critical regions for a sample mean – see Example 5.


Example 1

A random variable X is normally distributed with mean 10 and standard deviation 3, that is X\sim N\left(10,3^2\right).

a) Without a calculator, calculate:
i) P(X=10)
ii) P(X\leq 10)
iii) P(X>13)
iv) P(X\leq 16)
v) P(13<X<16)

b) With a calculator, calculate:
i) P(12.4<X<13.6)
ii) P(X<11.5)
iii) P(X\geq 8.7)
iv) P(X<9.2\text{ or }X>10.9)

i) P(X=10)=0. As seen in the notes above, the probability of a continuous random variable being an exact number is always zero.
ii) P(X\leq 10)=0.5. Since the mean is 10, and normal distributions are symmetric about the mean, the probability of being less than 10 (as well as greater than 10, in fact) is a half.
iii) P(X>13)=0.16. We saw in the notes that 68% of the data lies within one standard deviation of the mean. This means that 68% of the data lies between X=7 and X=13. The remaining 32% lies outside this range and, by symmetry, half of this, that is 16%, is above 13. Note that also P(X\geq 13)=0.16 since P(X=13)=0 so it does not matter if you include equality or not.
iv) P(X\leq 16)=0.975. Similarly to part iii), 95% of the data lies within two standard deviations of the mean. This means that 95% of the data lies between X=4 and X=16. The remaining 5% lies outside this range and half of this, that is 2.5%, is above 16. Hence, 97.5% of the data lies below 16. Again, it does not matter if the inequality includes 16 or not.
v) P(13<X<16)=P(X<16)-P(X<13)=0.975-0.84=0.135. The probability that X is between 13 and 16 can be found by subtracting the probability that X is less than 13 from the probability that X is less than 16. The probability that X is less than 13 is 1-P(X>13)=1-0.16=0.84.

b) There are a number of ways to answer these questions. We shall use the CLASSWIZ Casio calculator. Press MENU and 7 to select probability distribution functions and then 2 for the normal cumulative distribution (normal CD) function.
i) Enter a value of 12.4 for the lower limit, 13.6 for the upper limit, 3 for the standard deviation and 10 for the mean. Pressing enter gives P(12.4<X<13.6)=0.09679 to 4 significant figures.
ii) Again, there are a number of ways to answer this question but we will use the CLASSWIZ Casio normal cumulative distribution function, this time with a lower limit of 10 and an upper limit of 11.5 (press ‘=’ to get back to the normal CD function). This gives us P(10<X<11.5)=0.1915 to 4 decimal places. We add 0.5 since we know that P(X\leq 10)=0.5 and so P(X\leq 11.5)=0.6915 to 4 decimal places.
iii) We answer this one by choosing a lower bound of 8.7 and a very large upper bound, say 10^{10}. This works since almost all of the data is within 3 standard deviations of the mean, so the probability of exceeding such a large upper bound is negligible. It follows that P(X\geq 8.7)=0.668 to 3 decimal places.
iv) The simplest way to answer this question is to calculate 1-P(9.2\leq X\leq 10.9). That is, P(X<9.2\text{ or }X>10.9)=1-0.223=0.777 to 3 decimal places.

Press MENU and 1 to go back to arithmetic mode.
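For readers without the calculator to hand, the probabilities in part b) can be reproduced in Python. This is a minimal sketch using statistics.NormalDist from the standard library; the variable names are illustrative. The cumulative distribution function is used directly, so no artificial lower or upper limit is needed.

```python
from statistics import NormalDist

X = NormalDist(mu=10, sigma=3)  # X ~ N(10, 3^2)

p_i = X.cdf(13.6) - X.cdf(12.4)          # P(12.4 < X < 13.6)
p_ii = X.cdf(11.5)                       # P(X < 11.5)
p_iii = 1 - X.cdf(8.7)                   # P(X >= 8.7)
p_iv = 1 - (X.cdf(10.9) - X.cdf(9.2))    # P(X < 9.2 or X > 10.9)

print(round(p_i, 4))    # 0.0968
print(round(p_ii, 4))   # 0.6915
print(round(p_iii, 3))  # 0.668
print(round(p_iv, 3))   # 0.777
```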

Example 2

The birth weight of babies in the US can be modelled with the normal distribution W\sim N\left(7.5,1.25^2\right) where measurements are given in pounds (lbs).

a) Find the value of the weight w_1 given that P(W<w_1)=0.75.
b) Find the interquartile range of birth weights.
c) Given that 60% of birth weights are above w_2 lbs, find the value of w_2.
d) Find the value of w_3 such that P(w_3<W<8)=0.35.

For these questions we use the inverse normal distribution function on the CLASSWIZ calculator. As with the normal CD function, we press MENU then 7, but this time we press 3 for the inverse normal function.
a) Enter 0.75 for the area, 1.25 for standard deviation and 7.5 for the mean. This gives a value of w_1=8.343 to 3 decimal places.
b) Press ‘=’ to get back to the inverse normal function and this time enter 0.25 for the area. This gives a lower quartile value of 6.657 to 3 decimal places. Press MENU and 1 to get back to arithmetic mode. The interquartile range is 8.343-6.657=1.686 to 3 decimal places.
c) If 60% of birth weights are above w_2, then 40% are below. Using the inverse normal function with an area of 0.4 gives w_2=7.183 to 3 decimal places.
d) We find that P(W<8)=0.655 to 3 decimal places using the normal CD function as in Example 1. It follows that P(W<w_3)=0.655-0.35=0.305. Using the inverse normal function this gives w_3=6.862 to 3 decimal places.
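The same answers can be sketched in Python, where NormalDist.inv_cdf plays the role of the calculator's inverse normal function. The variable names are illustrative. Because the code does not round P(W<8) before subtracting 0.35, its value of w_3 may differ slightly in the last decimal places from the working above.

```python
from statistics import NormalDist

W = NormalDist(mu=7.5, sigma=1.25)  # W ~ N(7.5, 1.25^2), weights in lbs

w1 = W.inv_cdf(0.75)                # a) upper quartile
lower_quartile = W.inv_cdf(0.25)
iqr = w1 - lower_quartile           # b) interquartile range
w2 = W.inv_cdf(0.40)                # c) 60% above means 40% below
w3 = W.inv_cdf(W.cdf(8) - 0.35)     # d) P(W < w3) = P(W < 8) - 0.35

print(round(w1, 3))    # 8.343
print(round(iqr, 3))   # 1.686
print(round(w2, 3))    # 7.183
print(round(w3, 2))    # 6.86
```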

Example 3

a) Intelligence Quotient (IQ) can be modelled as a normally distributed random variable I\sim N\left(100,15^2\right). Find the minimum IQ of a person who is considered, according to IQ, to be in the top 10% of most intelligent people.

b) A particular brand of battery has a life modelled as a normal distribution with mean \mu hours and standard deviation \sigma hours. Given that 15% of batteries last fewer than 35 hours and 5% last longer than 52 hours, find the values of \mu and \sigma.

a) We can use the ‘Percentage Points of the Normal Distribution’ table from the Edexcel Formula Booklet for this example. According to this table, the probability of exceeding a z-value of 1.2816 is 10%. Since this table is for a standard normally distributed variable, we need to convert back to the distribution given. The variables are related by the equation Z=\frac{X-\mu}{\sigma} and so the corresponding x-value satisfies 1.2816=\frac{x-100}{15} and so x=119.224. The top 10% most intelligent people, according to IQ, have an IQ of over 119.224.

b) We again use the ‘Percentage Points of the Normal Distribution’ table from the Edexcel Formula Booklet for this example. The table shows that 5% of the data will exceed a z-value of 1.6449 and 15% of the data will exceed 1.0364. By the symmetry of the standard normal distribution around 0, the latter means that 15% will be below -1.0364. This leads to the simultaneous equations 1.6449=\frac{52-\mu}{\sigma} and -1.0364=\frac{35-\mu}{\sigma}. Eliminating \sigma gives \frac{52-\mu}{1.6449}=\frac{35-\mu}{-1.0364}, which can be solved to give \mu=41.57 to 2 decimal places. It follows that \sigma=6.34 to 2 decimal places.
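Both parts can be sketched in Python, with NormalDist().inv_cdf standing in for the percentage points table. The variable names are illustrative. The code uses unrounded z-values, so answers may differ from table-based working in the last decimal place.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# a) Top 10%: the z-value with 90% of the distribution below it.
z_top10 = Z.inv_cdf(0.90)            # about 1.2816
iq_cutoff = 100 + 15 * z_top10       # undo the coding Z = (X - 100) / 15
print(round(iq_cutoff, 1))           # 119.2

# b) 15% below 35 hours and 5% above 52 hours give two equations:
#    35 = mu + z_low * sigma   and   52 = mu + z_high * sigma
z_low = Z.inv_cdf(0.15)              # about -1.0364
z_high = Z.inv_cdf(0.95)             # about  1.6449

# Subtracting the equations eliminates mu.
sigma = (52 - 35) / (z_high - z_low)
mu = 52 - z_high * sigma
print(round(mu, 2), round(sigma, 2))  # 41.57 6.34
```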

Example 4

The probability of getting an even number when rolling a biased dice is 0.524. Let E be the number of times an even number is rolled when the dice is rolled 150 times. E is binomially distributed such that E\sim B(150,0.524).
a) Explain why using a normal distribution to approximate E is appropriate and state the values of \mu and \sigma for this approximation.
b) Use this normal distribution to estimate:
i) the probability that the number of times an even number is rolled is less than 90,
ii) the probability that the number of times an even number is rolled is between 60 and 80 inclusive.

a) An approximation with a normal distribution is appropriate since n is large and p is close to a half. In this case \mu=np=150\times 0.524=78.6 and \sigma=\sqrt{np(1-p)}=\sqrt{78.6\times 0.476}=6.1 to 1 decimal place. The approximating distribution is Y\sim N(78.6, 37.4), where 37.4 is the variance np(1-p) to 1 decimal place.
b) For these questions we must apply a continuity correction.
i) P(E<90)\approx P(Y<89.5). This is because, for a discrete variable, less than 90 implies any value that is 89 or less – for a continuous variable, this includes anything up to 89.5. P(Y<89.5)=0.963 to 3 decimal places, using the normal CD function on the calculator.
ii) P(60\leq E\leq 80)\approx P(59.5<Y<80.5). This is because, for a continuous variable, we must include 0.5 above and below the range given for the discrete variable. P(59.5<Y<80.5)=0.621 to 3 decimal places, again using the normal CD function on the calculator.
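To see how good the approximation is, the sketch below compares the continuity-corrected normal estimates for P(E<90) and P(60\leq E\leq 80) with exact binomial probabilities. It is a hedged illustration only: binomial_cdf is a hypothetical helper that sums the binomial probabilities directly.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 150, 0.524
mu = n * p                       # 78.6
sigma = sqrt(n * p * (1 - p))    # about 6.1
Y = NormalDist(mu, sigma)        # approximating normal distribution


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(E <= k) for E ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# i) P(E < 90) = P(E <= 89), approximated by P(Y < 89.5)
approx_i = Y.cdf(89.5)
exact_i = binomial_cdf(89, n, p)

# ii) P(60 <= E <= 80), approximated by P(59.5 < Y < 80.5)
approx_ii = Y.cdf(80.5) - Y.cdf(59.5)
exact_ii = binomial_cdf(80, n, p) - binomial_cdf(59, n, p)

print(round(approx_i, 3), round(exact_i, 3))
print(round(approx_ii, 3), round(exact_ii, 3))
```

The two columns printed should be close to each other, which is the sense in which the normal distribution approximates the binomial here.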

Example 5

  1. A company claims that the bags of sugar that they sell weigh 1kg. The amount of sugar in a bag follows the normal distribution with mean 1kg and standard deviation 15g. A sample of 20 bags of sugar is taken to test a suspicion that the bags are underweight. The mean amount of sugar in this sample is 0.974kg. Perform a hypothesis test at the 1% significance level to test whether or not there is evidence to suggest the company is underfilling the bags.
  2. A coffee machine makes cups of coffee that follow a normal distribution with mean 200ml and standard deviation 10ml. After a part in the machine is replaced, the owner of the machine wants to see if the mean pour has changed from 200ml. A sample of 10 coffees is taken and found to have a mean of 211.4ml. Find the critical region for this test, at the 5% significance level, stating your hypotheses clearly and comment on the findings of the sample.

  1. You should have noticed that we should change 1kg to 1000g to start. The null hypothesis and alternative hypothesis are H_0: \mu=1000 and H_1: \mu<1000 respectively, since we are testing to see if the population mean is less than 1000. If we let S be the amount of sugar in a bag, then S\sim N\left(1000,15^2\right). It follows that the mean of a sample of size 20, \bar{S}, follows the normal distribution \bar{S}\sim N\left(1000,\frac{15^2}{20}\right), that is \bar{S}\sim N(1000,11.25). Note that 11.25 is the variance, so the standard deviation to enter into the calculator is \sqrt{11.25}=3.354 to 3 decimal places. The sample mean of 974 is more than 7 standard deviations below 1000 and so P(\bar{S}<974) is 0 to 4 decimal places. This is smaller than 0.01 and so, at the 1% significance level, there is sufficient evidence to reject the null hypothesis. In other words, a sample mean this low would be extremely unlikely if the population mean were 1000, which suggests that the company is underfilling the bags.
  2. We perform a two-tailed test in this example as we are testing to see if the mean value has changed either way. Hence, H_0: \mu=200 and H_1: \mu\neq 200. Let C\sim N\left(200,10^2\right) be the distribution for the coffees poured from the machine. The mean of a sample of size 10, \bar{C}, is normally distributed such that \bar{C}\sim N\left(200, \frac{10^2}{10}\right) or \bar{C}\sim N(200, 10), so its standard deviation is \sqrt{10}=3.162 to 3 decimal places. The standard normal distribution can be used to find the critical region but it is simpler to do this using the inverse normal function on the calculator. Splitting the 5% equally between the two tails, we find that the value of c such that P(\bar{C}<c)=0.025 is c=193.8 to 1 decimal place. By symmetry of the normal distribution it follows that the critical region is \bar{C}\leq 193.8 or \bar{C}\geq 206.2. Since the sample mean is 211.4, it falls within the critical region and there is sufficient evidence to reject the null hypothesis at the 5% significance level. In other words, there is evidence that the new part in the machine has changed the mean volume of coffee poured.
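Both tests can be sketched in Python as a check on the working. NormalDist takes the standard deviation rather than the variance, so the code passes \frac{\sigma}{\sqrt{n}}, the square root of \frac{\sigma^2}{n}, for the distribution of the sample mean. The variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Question 1: H0: mu = 1000, H1: mu < 1000 (grams), 1% significance level.
sugar_mean = NormalDist(mu=1000, sigma=15 / sqrt(20))  # distribution of the sample mean
p_value = sugar_mean.cdf(974)                          # P(sample mean < 974)
reject_sugar = p_value < 0.01
print(p_value, reject_sugar)

# Question 2: H0: mu = 200, H1: mu != 200 (ml), 5% level split over two tails.
coffee_mean = NormalDist(mu=200, sigma=10 / sqrt(10))
lower = coffee_mean.inv_cdf(0.025)   # lower critical value
upper = coffee_mean.inv_cdf(0.975)   # upper critical value
in_critical_region = 211.4 <= lower or 211.4 >= upper
print(round(lower, 1), round(upper, 1), in_critical_region)  # 193.8 206.2 True
```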
