Hypothesis Testing for Correlation
We learned how to conduct hypothesis tests for binomial probabilities in AS Maths. In A2 Maths, we extend the ideas of hypothesis testing to normal distributions and to correlation (hypothesis testing for normal distributions is covered on a separate page). On this page we learn how to conduct hypothesis tests for correlation. That is, we use the product moment correlation coefficient of a paired sample to decide whether there is evidence of a linear relationship between two variables in the whole population. First, we recall a couple of other things we learned in AS Maths.
Product Moment Correlation Coefficient (PMCC)
When studying correlation and regression in AS Maths, we briefly learned about the Product Moment Correlation Coefficient (PMCC). It is a number between -1 and 1, calculated from two related sets of data (paired data) measuring two variables. The PMCC measures how closely the data points resemble a straight line when plotted on a scatter diagram. From here on we will denote the PMCC by $r$.
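The following is a minimal sketch of how the PMCC $r$ is calculated. It uses the standard summary quantities $S_{xx}$, $S_{yy}$ and $S_{xy}$ (sums of squared deviations and of cross products about the means) together with $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$. The function name `pmcc` is our own. In practice, a calculator or Python's `statistics.correlation` gives the same value.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired lists of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


print(pmcc([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```

For these made-up data, $S_{xx} = 10$, $S_{yy} = 10$ and $S_{xy} = 8$, so $r = 8/10 = 0.8$.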
- Positive Correlation – If $r$ is close to 1, then the data points lie close to a straight line with a positive gradient, and we say that the variables are ‘positively correlated’. This means that as one variable increases, the other tends to increase too. If $r$ is exactly 1, then the data points lie exactly on a straight line with positive gradient.
- Negative Correlation – Similarly, if $r$ is close to -1, the data points lie close to a straight line with negative gradient, and we say that the variables are ‘negatively correlated’. In this case, as one variable increases, the other tends to decrease. If $r$ is exactly -1, then the data points lie exactly on a straight line with negative gradient.
- Zero Correlation – If $r$ is close to 0, then there is no indication that the variables are linearly related at all.
For values of $r$ between these cases, we say that the variables are weakly correlated, as in the images above. For example, a PMCC of 0.5 suggests that the variables are weakly positively correlated.
Note that the value of $r$ measures the strength of the correlation and whether it is positive or negative. It does not, however, tell you the gradient of the straight line the data points resemble. The gradient is found with regression.
A value of $r$ close to 0 suggests that the variables do not have a linear relationship. However, there may still be some other (non-linear) relationship between the variables, or a linear relationship may be revealed by transforming the variables. See the next section.
Exponential Curves
Recall that when studying Exponential & Logarithmic Graphs in AS Maths, we learned that performing a log transformation on curves of the form $y = ax^n$ or $y = kb^x$ produces a straight line:

- For $y = ax^n$, the log rules give $\log y = \log a + n \log x$. We leave the base out of these equations since it works for any choice of base, provided it is used consistently. It follows that if $Y = \log y$ and $X = \log x$, then there is a linear relationship between $X$ and $Y$, namely $Y = nX + \log a$. Plotting $\log y$ against $\log x$ will show a straight line with gradient $n$ and y-intercept $\log a$.
- For $y = kb^x$, the log rules also give $\log y = \log k + x \log b$. It follows here that if $Y = \log y$, then there is a linear relationship between $x$ and $Y$, namely $Y = (\log b)x + \log k$. Plotting $\log y$ against $x$ will show a straight line with gradient $\log b$ and y-intercept $\log k$.
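If a hypothesis test reveals evidence of correlation between $\log y$ and $\log x$, this suggests a model of the form $y = ax^n$. If it reveals evidence of correlation between $\log y$ and $x$, this suggests a model of the form $y = kb^x$.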





Hypothesis Testing for No Correlation
We learned how to conduct one-tailed and two-tailed hypothesis tests for binomial probabilities in AS Maths, so you should revise this first. We can now look at hypothesis testing with the PMCC. To make inferences about correlation in a population, we take a sample from it and calculate $r$ from the sample. Performing a hypothesis test on $r$ tells us about the PMCC of the whole population, which we will call $\rho$. The null hypothesis is that there is no correlation in the population, $H_0: \rho = 0$.


For a one-tailed test, the alternative hypothesis is either $H_1: \rho > 0$ (positive correlation) or $H_1: \rho < 0$ (negative correlation).



For a two-tailed test, the alternative hypothesis suggests that there is some correlation, positive or negative: $H_1: \rho \neq 0$.
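The hypotheses are tested using the value of $r$ found from the sample. We compare it with the critical values in the Product Moment Correlation Coefficient table, as supplied in the Formula Booklet. For $H_1: \rho > 0$ we reject $H_0$ if $r$ is greater than or equal to the critical value. For $H_1: \rho < 0$ we reject $H_0$ if $r$ is less than or equal to minus the critical value. The table's values are for one-tailed tests, so for a two-tailed test we use the critical value at half the significance level and reject $H_0$ if $|r|$ is at least that value.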
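The comparison with the critical value can be written as a small decision function. This is a sketch under our own conventions: the function name `reject_h0` and the `tail` labels are hypothetical. The critical value is the positive number read from the table. The example figures 0.5494 and 0.6319 are the values for a sample of size 10 at the 5% and 2.5% one-tailed levels, and the sample value $r = 0.62$ is made up.

```python
def reject_h0(r, critical_value, tail):
    """Decide whether to reject H0: rho = 0.

    r              -- PMCC of the sample
    critical_value -- positive critical value read from the PMCC table
    tail           -- "upper" for H1: rho > 0, "lower" for H1: rho < 0,
                      "two" for H1: rho != 0 (use the entry for half the level)
    """
    if not 0 < critical_value <= 1:
        raise ValueError("critical value should be between 0 and 1")
    if tail == "upper":
        return r >= critical_value
    if tail == "lower":
        return r <= -critical_value
    if tail == "two":
        return abs(r) >= critical_value
    raise ValueError('tail must be "upper", "lower" or "two"')


# Hypothetical sample of size 10 with r = 0.62, tested at the 5% level.
print(reject_h0(0.62, 0.5494, "upper"))  # True: evidence of positive correlation
print(reject_h0(0.62, 0.6319, "two"))    # False: insufficient evidence of correlation
```

The same sample gives a different conclusion depending on $H_1$. A two-tailed test at the same overall level needs a larger $|r|$.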

Note that the table lists critical values by sample size and significance level, so both must be known before you can read off a critical value. See Example 1 for testing a linear relationship between two variables and Example 2 for testing a linear relationship after a transformation of variables.
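As an aside, the table's critical values can be reproduced numerically. Assume the population is bivariate normal, the usual assumption behind the PMCC test. Then, under $H_0: \rho = 0$, the sample PMCC from $n$ pairs has density $f(r) = \frac{(1 - r^2)^{(n-4)/2}}{B\left(\frac12, \frac{n-2}{2}\right)}$ for $-1 < r < 1$. The sketch below integrates this density with Simpson's rule and uses bisection to find the smallest $r$ whose upper-tail probability is at most the significance level. Its helper names are hypothetical and it uses only the standard library. It is restricted to $n \ge 4$ so that the density stays finite at $r = \pm 1$. In an exam you simply read the value from the booklet.

```python
import math


def null_density(r, n):
    """Density of the sample PMCC when rho = 0, for a bivariate normal population (n >= 4)."""
    # log B(1/2, (n-2)/2) via log-gamma to avoid overflow for large n
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2)
    return max(0.0, 1 - r * r) ** ((n - 4) / 2) / math.exp(log_beta)


def upper_tail(c, n, steps=2000):
    """P(R >= c) under H0, by Simpson's rule on [c, 1] (steps must be even)."""
    h = (1 - c) / steps
    total = null_density(c, n) + null_density(1.0, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * null_density(c + i * h, n)
    return total * h / 3


def critical_value(n, alpha, tol=1e-6):
    """Smallest c in [0, 1] with P(R >= c) <= alpha, found by bisection (one-tailed)."""
    if n < 4:
        raise ValueError("this sketch needs n >= 4")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail(mid, n) > alpha:
            lo = mid  # tail probability too large: critical value lies further right
        else:
            hi = mid
    return hi


print(round(critical_value(10, 0.05), 4))  # 0.5494
```

For a two-tailed test at 5%, `critical_value(n, 0.025)` gives the matching one-tailed entry at half the level.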













