Non-linear Regression
What is non-linear regression?
- You have already seen that linear regression is when you can use a straight line to fit bivariate data
- Non-linear regression is when you can use a curve (rather than a straight line) to fit bivariate data
- In your exam the regression could be:
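- Linear: y = ax + b
- Quadratic: y = ax² + bx + c
- Cubic: y = ax³ + bx² + cx + d
- Exponential: y = ab^x or y = ae^(bx)
- Power: y = ax^b
- Sine: y = a sin(bx + c) + d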
How do I find the equation of the non-linear regression model?
- Using your GDC:
- Type the two sets of data into your GDC
- Select the relevant model
- The exam question will tell you which model to use
- Your GDC will calculate the constants
- You can use logarithms to linearise exponential and power relationships
- Power: y = ax^b then ln y = ln a + b ln x
- ln y and ln x will have a linear relationship
- Exponential: y = ab^x then ln y = ln a + x ln b
- ln y and x will have a linear relationship
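The linearisation idea can be tried outside a GDC. The sketch below is only an illustration: the function names (`fit_line`, `fit_power`, `fit_exponential`) and the data are hypothetical, and it assumes all values are positive so that the logarithms exist. It fits a straight line to the transformed data and then converts the gradient and intercept back into the constants a and b.

```python
import math

def fit_line(us, vs):
    """Least squares line v = m*u + c through the points (u, v)."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    s_uu = sum((u - mean_u) ** 2 for u in us)
    m = s_uv / s_uu
    c = mean_v - m * mean_u
    return m, c

def fit_power(xs, ys):
    """Fit y = a * x**b by regressing ln y on ln x (needs x > 0 and y > 0)."""
    b, ln_a = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b

def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing ln y on x (needs y > 0)."""
    ln_b, ln_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), math.exp(ln_b)

# Hypothetical data generated exactly from y = 3x^2 and y = 5 * 2^x
xs = [1, 2, 3, 4, 5]
power_a, power_b = fit_power(xs, [3 * x ** 2 for x in xs])
exp_a, exp_b = fit_exponential(xs, [5 * 2 ** x for x in xs])
print(power_a, power_b)  # constants of the power model
print(exp_a, exp_b)      # constants of the exponential model
```

Because the straight line is fitted to ln y rather than to y, the constants may differ slightly from those of a fit that minimises the residuals in y directly.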
Exam Tip
- You can use your GDC to plot the scatter diagram and include the graph of a regression model
- This will allow you to get a sense of how well the model fits the data
Worked Example
Scarlett and Violet collect data on the length of a film ( minutes) and the audience rating ( %).
| Length (minutes) | 75 | 93 | 101 | 107 | 115 | 124 | 132 | 140 | 171 |
| Rating (%) | 83 | 75 | 51 | 38 | 47 | 56 | 76 | 91 | 70 |
a) Scarlett claims that there is a cubic relationship. Find the equation of a cubic regression model of the form .
b) Violet claims that there is a sine relationship. Find the equation of a sine regression model of the form .
c) Whose model predicts a higher audience rating for a film which is 100 minutes long?
Least Squares Regression Curves
What is a residual?
- Given a set of n pairs of data and a regression model y = f(x)
- A residual is the actual y-value (from the data) minus the predicted y-value (using the regression model)
- The sum of the square residuals is denoted by SS_res
- If you have two regression models using the same data then the one with the smaller SS_res fits the data better
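As a rough illustration of residuals and their sum of squares, the sketch below compares two candidate models on the same data. The data values, the two models and the function names are all hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def residuals(xs, ys, f):
    """Actual y-value minus predicted y-value for each pair of data."""
    return [y - f(x) for x, y in zip(xs, ys)]

def ss_res(xs, ys, f):
    """Sum of the square residuals for the model y = f(x)."""
    return sum(r ** 2 for r in residuals(xs, ys, f))

# Hypothetical data
xs = [1, 2, 3, 4]
ys = [2.5, 4.5, 10.5, 16.5]

def model_a(x):
    return x ** 2 + 1   # hypothetical quadratic model y = x^2 + 1

def model_b(x):
    return 5 * x - 4    # hypothetical linear model y = 5x - 4

ss_a = ss_res(xs, ys, model_a)
ss_b = ss_res(xs, ys, model_b)
better = "model_a" if ss_a < ss_b else "model_b"
print(ss_a, ss_b, better)  # 1.0 5.0 model_a
```

Both models are tested on the same four pairs of data, so the smaller sum identifies the better fit.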
What is a least squares regression curve?
- The least squares regression curve can be thought of as a “curve of best fit” y = f(x)
- For a given type of model the least squares regression curve minimises the sum of the square residuals
- Your GDC calculates the constants for the least squares regression curves
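To see what "minimises" means here, the following sketch takes a linear model as the given type of model, finds the least squares constants with the standard formulas, and then nudges each constant slightly. It is an illustration on hypothetical data, not a description of how any particular GDC works internally; all names are made up for the example.

```python
def fit_line(xs, ys):
    """Least squares constants m and c for the linear model y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    return m, mean_y - m * mean_x

def ss_res_line(xs, ys, m, c):
    """Sum of the square residuals for the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, c = fit_line(xs, ys)
best_ss = ss_res_line(xs, ys, m, c)

# Changing either constant away from its least squares value increases the sum
perturbed_ss = [
    ss_res_line(xs, ys, m + 0.1, c),
    ss_res_line(xs, ys, m - 0.1, c),
    ss_res_line(xs, ys, m, c + 0.1),
    ss_res_line(xs, ys, m, c - 0.1),
]
print(best_ss, perturbed_ss)
```

Every perturbed line gives a larger sum of square residuals than the least squares line.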
Why is the sum of the square residuals not always a good measure of fit?
- If two models are formed using the same number of pairs of data then the sum of the square residuals is a good measure of fit
- If two models use different numbers of pairs of data then the sum of the square residuals is not always a good measure of fit
- The sum will increase with more pairs of data and so can no longer be compared against a data set with a different number of pairs
- Compare the two scenarios
- 10 pairs of data and the absolute value of each residual is 15, then the sum of the square residuals is 10 × 15² = 2250
- 2250 pairs of data and the absolute value of each residual is 1, then the sum of the square residuals is 2250 × 1² = 2250
- They have the same sum of square residuals but the residuals in the second scenario are much smaller
- Your GDC may give you the mean squared error
- This is a better measure of fit
- You do not need to know this for your exam but it might help with your understanding
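The two scenarios above can be checked numerically. In this sketch the mean squared error is assumed to be the sum of the square residuals divided by the number of pairs of data; some calculators divide by a slightly smaller number, so treat the definition as an assumption. The function names are hypothetical.

```python
def sum_square_residuals(residuals):
    """Sum of the square residuals."""
    return sum(r ** 2 for r in residuals)

def mean_squared_error(residuals):
    """Assumed definition: sum of the square residuals divided by n."""
    return sum_square_residuals(residuals) / len(residuals)

# Scenario 1: 10 pairs of data, each residual has absolute value 15
scenario_1 = [15, -15] * 5
# Scenario 2: 2250 pairs of data, each residual has absolute value 1
scenario_2 = [1, -1] * 1125

ss_1 = sum_square_residuals(scenario_1)
ss_2 = sum_square_residuals(scenario_2)
mse_1 = mean_squared_error(scenario_1)
mse_2 = mean_squared_error(scenario_2)
print(ss_1, ss_2)    # 2250 2250
print(mse_1, mse_2)  # 225.0 1.0
```

The sums are identical, but dividing by the number of pairs separates the two scenarios clearly.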
Worked Example
Jet is the owner of a gym and he is testing different price options. The table below shows the number of new members per month () and the price of a monthly membership ().
| | 10 | 20 | 30 |
| | 97 | 68 | 55 |
Jet believes that he can fit the data with either the model or the model .
Jet wants to choose the model with the smallest value for the sum of square residuals.
Determine which model Jet should choose.
The Coefficient of Determination
What is the coefficient of determination?
- The coefficient of determination is a measure of fit for a model
- If the coefficient of determination is 0.57 this means 57% of the variation of the y-variable can be explained by the variation in the x-variable
- The other 43% can be explained by other factors
- The higher this proportion the better the model fits the data
- The coefficient of determination is denoted by R²
- R² ≤ 1
- R² = 1 means the model is a perfect fit for the data
- The closer to 1 the better the fit
- R² is usually greater than or equal to zero
- R² can be negative but this is outside the scope of this course
- If the regression model is linear then the coefficient of determination is equal to the square of the PMCC
- R² = r² for linear models
- Some GDCs will simply denote R² as r² due to its connection to the PMCC for linear models
How do I calculate the coefficient of determination?
- When finding the constants for regression models your GDC might give you the value of R²
- You will only be asked to calculate the coefficient of determination for models for which GDCs give the value of R²
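- The coefficient of determination can be calculated by
- R² = 1 − SS_res / SS_tot
- Where SS_res = Σ(yᵢ − f(xᵢ))² is the sum of the square residuals and SS_tot = Σ(yᵢ − ȳ)² is the sum of the squared differences between each y-value and the mean of the y-values
- You do not need to know this formula but it might help with your understanding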
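For readers who want to see the calculation, here is a minimal sketch that computes the coefficient of determination as 1 minus the sum of the square residuals divided by the total sum of squares about the mean of y. It fits a linear model and compares the result with the square of the PMCC. The data values are borrowed from the cheetah worked example later in these notes; fitting a straight line to them is purely for illustration, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least squares constants m and c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    return m, mean_y - m * mean_x

def r_squared(xs, ys, f):
    """Coefficient of determination for the model y = f(x)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

lengths = [1.21, 1.33, 1.12, 1.45, 1.42, 1.39, 1.24, 1.19, 1.32]
speeds = [24.3, 25.1, 22.2, 35.1, 35.1, 33.4, 27.1, 23.1, 24.8]

m, c = fit_line(lengths, speeds)
r2_linear = r_squared(lengths, speeds, lambda x: m * x + c)
r = pmcc(lengths, speeds)
print(r2_linear, r ** 2)  # the two values agree for a linear model
```

The same `r_squared` function accepts any model y = f(x), so it could also be applied to a quadratic or exponential model once its constants are known.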
Does the coefficient of determination determine the validity of a model?
- If R² is close to 1 then the model fits the data well
- However this alone does not guarantee that it is a good model for the relationship between the two variables
- Consider the scenario where there are big gaps between data points and a model which fits the data well
- The model only fits the data at the data points
- As there are gaps between the data points the model might not be a good fit for these areas
- Different types of models have different numbers of parameters
- Therefore using different types of models to fit the same data will have different levels of accuracy
- Linear models need at least two pairs of data
- Quadratic models need at least three pairs of data
- Cubic models need at least four pairs of data
- Using four pairs of data will mean the cubic model will have R² = 1
- This is because the cubic graph will go through all four data points
- The value of R² is likely to decrease as extra pairs of data are included
- However this does not mean it is a better fit than the quadratic model
- The quadratic model could be more accurate as it has one more pair of data than is needed
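The four-point case can be demonstrated directly. With four pairs of data that have different x-values there is exactly one cubic (or lower-degree polynomial) passing through all of them, so the least squares cubic is that curve and every residual is zero. The sketch below builds it by Lagrange interpolation; the four data points and the function names are hypothetical.

```python
def cubic_through(points):
    """Return the unique polynomial of degree at most 3 through four points
    with distinct x-values, built by Lagrange interpolation."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f

def r_squared(points, f):
    """Coefficient of determination for the model y = f(x)."""
    ys = [y for _, y in points]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f(x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data with no obvious pattern
points = [(1, 4), (2, 1), (3, 7), (4, 2)]
cubic = cubic_through(points)

r2_cubic = r_squared(points, cubic)
prediction_at_5 = cubic(5)
print(r2_cubic)         # 1.0, a "perfect" fit at the four data points
print(prediction_at_5)  # about -34, far outside the range of the data
```

The perfect R² says nothing about how the curve behaves away from the four points: here every y-value lies between 1 and 7, yet the prediction at x = 5 is about −34.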
Worked Example
Data is collected on the lengths of cheetahs ( metres) and their average running speeds ( m s⁻¹).
| Length (m) | 1.21 | 1.33 | 1.12 | 1.45 | 1.42 | 1.39 | 1.24 | 1.19 | 1.32 |
| Speed (m s⁻¹) | 24.3 | 25.1 | 22.2 | 35.1 | 35.1 | 33.4 | 27.1 | 23.1 | 24.8 |
a) Find the equation of the least squares regression curve using:
(i) a quadratic model .
(ii) an exponential model .
b) Based solely on the coefficients of determination, suggest which model is the better fit for the data.