Hypothesis Testing for Correlation
We learned how to conduct hypothesis tests for binomial probabilities in AS Maths. In A2 Maths, we extend the ideas of hypothesis testing to normal distributions and to correlation (hypothesis testing for normal distributions is covered on its own page). On this page we learn how to conduct hypothesis tests for correlation, that is, to decide whether there is evidence of a linear relationship between two variables in the whole population by looking at the product moment correlation coefficient of a paired sample. First, we recall a couple of other things we learned in AS Maths.
Product Moment Correlation Coefficient (PMCC)
When studying correlation and regression in AS Maths, we briefly learned about the Product Moment Correlation Coefficient (PMCC). It is a number that lies between -1 and 1 and is calculated from two related sets of data (or paired data) measuring two variables. The PMCC is a measure of how closely the data points resemble a straight line when plotted on a scatter diagram. From here on we will denote the PMCC by the letter $r$.
- Positive Correlation – If $r$ is close to 1, then the data points will be close to a straight line that has a positive gradient, and we say that the variables are ‘positively correlated’. This means that as one of the variables goes up, the other variable tends to go up too. If $r$ is exactly 1, then the data points will sit exactly on a straight line with positive gradient.
- Negative Correlation – Similarly, if $r$ is close to -1, the data points will be close to a straight line with negative gradient and we say that the variables are ‘negatively correlated’. In this case, as one of the variables goes up, the other variable tends to go down. If $r$ is exactly -1, then the data points will sit exactly on a straight line with negative gradient.
- Zero Correlation – If $r$ is close to 0, then there is no indication that the variables are linearly related at all.
For mid-range values of $r$, we can say that the variables are weakly correlated, as in the images above. For example, a PMCC of 0.5 suggests that the variables are weakly positively correlated.
Note that the value of $r$ measures the strength of the correlation and whether it is positive or negative. It does not, however, tell you the gradient of the straight line that the data points resemble. The gradient is found using regression.
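To illustrate that the PMCC measures how closely points follow a straight line rather than how steep that line is, here is a minimal Python sketch. It assumes the standard formula $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$; the function name `pmcc` and the three data sets are made up for illustration and are not taken from this page.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data.

    Uses r = S_xy / sqrt(S_xx * S_yy), where the S terms are sums of
    squared (or cross) deviations from the means.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: three perfectly straight lines with different gradients.
xs = [1, 2, 3, 4, 5]
shallow = [0.5 * x + 1 for x in xs]   # gradient 0.5
steep = [10 * x - 3 for x in xs]      # gradient 10
falling = [-2 * x + 7 for x in xs]    # gradient -2

r_shallow = pmcc(xs, shallow)
r_steep = pmcc(xs, steep)
r_falling = pmcc(xs, falling)
print(r_shallow, r_steep, r_falling)
```

Up to floating-point rounding, the two rising lines both give a PMCC of 1 even though their gradients differ by a factor of 20, and the falling line gives -1.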
A value of $r$ close to 0 suggests that the variables do not have a linear relationship. However, it is possible that there is another relationship between the variables, or that a linear relationship can be revealed by a transformation of the variables. See the next section.
Exponential Curves
Recall that when studying Exponential & Logarithmic Graphs in AS Maths, we learned that performing a log transformation on curves of the form $y = ax^n$ and $y = kb^x$ results in a linear relationship between the transformed variables:
- For $y = ax^n$, it follows that $\log y = \log a + n \log x$ using the log rules. We leave the base out of these equations since it works for any choice of base, provided it is used consistently. It follows that if $y = ax^n$, then there is a linear relationship between $\log x$ and $\log y$. Plotting $\log y$ against $\log x$ will show a straight line with gradient $n$ and y-intercept $\log a$.
- For $y = kb^x$, it follows that $\log y = \log k + x \log b$, also using the log rules. It follows here that if $y = kb^x$, then there is a linear relationship between $x$ and $\log y$. Plotting $\log y$ against $x$ will show a straight line with gradient $\log b$ and y-intercept $\log k$.
If a hypothesis test reveals evidence to suggest there is a correlation between $\log x$ and $\log y$, then there is evidence to suggest that the variables have a polynomial relationship of the form $y = ax^n$. Similarly, if a hypothesis test reveals evidence to suggest there is a correlation between $x$ and $\log y$, then there is evidence to suggest that the variables have an exponential relationship of the form $y = kb^x$.
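The effect of the log transformation can be sketched in Python. This example assumes a relationship of the form $y = ax^n$ with the made-up values $a = 3$ and $n = 2$, base-10 logs, and a least squares line for the gradient and intercept. The names `pmcc`, `coeff` and `power` are illustrative only.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data generated exactly from y = 3 * x^2 (no noise).
coeff, power = 3.0, 2.0
xs = [1, 2, 3, 4, 5, 6]
ys = [coeff * x ** power for x in xs]

# Transform both variables; any base works if used consistently.
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

r_raw = pmcc(xs, ys)            # high, but below 1 because the curve bends
r_loglog = pmcc(log_x, log_y)   # 1 up to rounding: the points are collinear

# Least squares line of log y on log x: the gradient estimates the power n
# and the intercept estimates log a.
mean_lx = sum(log_x) / len(log_x)
mean_ly = sum(log_y) / len(log_y)
s_xy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(log_x, log_y))
s_xx = sum((u - mean_lx) ** 2 for u in log_x)
gradient = s_xy / s_xx
intercept = mean_ly - gradient * mean_lx
a_estimate = 10 ** intercept    # undo the base-10 log to recover a

print(r_raw, r_loglog, gradient, a_estimate)
```

With real, noisy data the transformed points will not be exactly collinear, which is why a hypothesis test on the PMCC of the transformed data is needed.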
Hypothesis Testing for No Correlation
We learned how to conduct one-tailed and two-tailed hypothesis tests for binomial probabilities in AS Maths, and you should revise this first. We can now look at hypothesis testing with the PMCC. In order to make inferences about correlation in a population, we take a sample from it and calculate $r$ from the sample. Performing a hypothesis test on $r$ will tell us about the PMCC of the whole population, which we will call $\rho$. The null hypothesis states that there is no correlation in the population, that is to say $\rho = 0$:
$$H_0: \rho = 0$$
For a one-tailed test, the alternative hypothesis suggests that either $\rho$ is positive or $\rho$ is negative:
$$H_1: \rho > 0 \quad \text{or} \quad H_1: \rho < 0$$
For a two-tailed test, the alternative hypothesis suggests that $\rho$ is different from 0:
$$H_1: \rho \neq 0$$
The hypotheses are tested using the sample PMCC $r$: we compare it against the critical values given in the Product Moment Coefficient table, as supplied in the Formula Booklet.
Note that this table lists critical values by significance level and sample size, so both must be known beforehand. See Example 1 for testing a linear relationship between two variables and Example 2 for testing a linear relationship after a transformation of variables.
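The comparison step can be sketched in Python as follows. This is an illustrative outline rather than a prescribed method: the function name `reject_null` is made up, the sample value $r = 0.62$ is hypothetical, and the critical values 0.5494 and 0.6319 are what we believe the standard table gives for a sample of size 10 at the 5% level (one-tailed and two-tailed respectively), so check them against your own Formula Booklet. The sketch also assumes that a value of $r$ exactly equal to the critical value does not lead to rejection.

```python
def reject_null(r, critical_value, alternative):
    """Decide whether to reject H0: rho = 0 given the sample PMCC r.

    critical_value is the positive value read from the table for the
    sample size and significance level in use. For a two-tailed test,
    look up half the significance level in each tail.
    alternative is one of:
      "positive"   for H1: rho > 0
      "negative"   for H1: rho < 0
      "two-tailed" for H1: rho != 0
    """
    if not 0 < critical_value < 1:
        raise ValueError("critical value should lie strictly between 0 and 1")
    if alternative == "positive":
        return r > critical_value
    if alternative == "negative":
        return r < -critical_value
    if alternative == "two-tailed":
        return abs(r) > critical_value
    raise ValueError("alternative must be 'positive', 'negative' or 'two-tailed'")


# Hypothetical sample of size 10 with r = 0.62, tested at the 5% level.
r_sample = 0.62

# One-tailed test, H1: rho > 0, assumed table value 0.5494.
one_tailed_result = reject_null(r_sample, 0.5494, "positive")

# Two-tailed test, H1: rho != 0, assumed table value 0.6319 (2.5% per tail).
two_tailed_result = reject_null(r_sample, 0.6319, "two-tailed")

print(one_tailed_result, two_tailed_result)
```

In this made-up case the same sample value leads to rejecting the null hypothesis in the one-tailed test but not in the two-tailed test, which shows why the alternative hypothesis must be fixed before looking up the critical value.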