IB Questionbank

Date	May 2021	Marks available	2	Reference code	21M.3.AHL.TZ2.1
Level	Additional Higher Level	Paper	Paper 3	Time zone	Time zone 2
Command term	Find	Question number	1	Adapted from	N/A

Question

Juliet is a sociologist who wants to investigate if income affects happiness amongst doctors. This question asks you to review Juliet’s methods and conclusions.

Juliet obtained a list of email addresses of doctors who work in her city. She contacted them and asked them to fill in an anonymous questionnaire. Participants were asked to state their annual income and to respond to a set of questions. The responses were used to determine a happiness score out of $100$. Of the $415$ doctors on the list, $11$ replied.

Juliet’s results are summarized in the following table.

For the remaining ten responses in the table, Juliet calculates the mean happiness score to be $52.5$.

Juliet decides to carry out a hypothesis test on the correlation coefficient to investigate whether increased annual income is associated with greater happiness.

Juliet wants to create a model to predict how changing annual income might affect happiness scores. To do this, she assumes that annual income in dollars, $X$, is the independent variable and the happiness score, $Y$, is the dependent variable.

She first considers a linear model of the form

$Y = aX + b$.

Juliet then considers a quadratic model of the form

$Y = cX^{2} + dX + e$.

After presenting the results of her investigation, a colleague questions whether Juliet’s sample is representative of all doctors in the city.

A report states that the mean annual income of doctors in the city is $\$80\,000$. Juliet decides to carry out a test to determine whether her sample could realistically be taken from a population with a mean of $\$80\,000$.

Describe one way in which Juliet could improve the reliability of her investigation.

[1]

a.i.

Describe one criticism that can be made about the validity of Juliet’s investigation.

[1]

a.ii.

Juliet classifies response $K$ as an outlier and removes it from the data. Suggest one possible justification for her decision to remove it.

[1]

b.

Calculate the mean annual income for these remaining responses.

[2]

c.i.

Determine the value of $r$, Pearson’s product-moment correlation coefficient, for these remaining responses.

[2]

c.ii.

State why the hypothesis test should be one-tailed.

[1]

d.i.

State the null and alternative hypotheses for this test.

[2]

d.ii.

The critical value for this test, at the $5\%$ significance level, is $0.549$. Juliet assumes that the population is bivariate normal.

Determine whether there is significant evidence of a positive correlation between annual income and happiness. Justify your answer.

[2]

d.iii.

Use Juliet’s data to find the value of $a$ and of $b$.

[1]

e.i.

Interpret, referring to income and happiness, what the value of $a$ represents.

[1]

e.ii.

Find the value of $c$, of $d$ and of $e$.

[1]

e.iii.

Find the coefficient of determination for each of the two models she considers.

[2]

e.iv.

Hence compare the two models.

[1]

e.v.

Juliet decides to use the coefficient of determination to choose between these two models.

Comment on the validity of her decision.

[1]

e.vi.

State the name of the test which Juliet should use.

[1]

f.i.

State the null and alternative hypotheses for this test.

[1]

f.ii.

Perform the test, using a $5\%$ significance level, and state your conclusion in context.

[3]

f.iii.

Markscheme

Any one from: R1

increase sample size / increase response rate / repeat process
check whether sample is representative
test-retest participants or do a parallel test
use a stratified sample
use a random sample

Note: Do not condone:
Ask different types of doctor
Ask for proof of income
Ask for proof of being a doctor
Remove anonymity
Remove response $K$.

[1 mark]

a.i.

Any one from: R1

non-random sampling means a subset of the population might be responding
self-reported happiness is not the same as happiness
happiness is not a constant / cannot be quantified / is difficult to measure
income might include external sources
Juliet is only sampling doctors in her city
correlation does not imply causation
sample might be biased

Note: Do not condone the following common but vague responses unless they make a clear link to validity:
Sample size is too small
Result is not generalizable
There may be other variables Juliet is ignoring
Sample might not be representative

[1 mark]

a.ii.

because the income is very different / implausible / clearly contrived R1

Note: Answers must explicitly reference "income" to get credit.

[1 mark]

b.

$(\$)\,90\,200$ (M1)A1

[2 marks]

c.i.

$r = 0.558 (0.557723 \dots)$ A2

[2 marks]

c.ii.
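Juliet's table is not reproduced on this page, so the sketch below uses made-up numbers purely to illustrate the calculations behind parts (c)(i) and (c)(ii): the mean of the remaining incomes, and Pearson's $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ built from sums of squared deviations. The function names and data are hypothetical; in the exam these values come from a GDC.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # S_xy
    sxx = sum((x - mx) ** 2 for x in xs)                    # S_xx
    syy = sum((y - my) ** 2 for y in ys)                    # S_yy
    return sxy / sqrt(sxx * syy)


# Hypothetical data (NOT Juliet's table).
incomes = [40_000, 60_000, 80_000, 100_000]
scores = [45, 50, 58, 60]
print(mean(incomes))  # 70000.0
print(f"r = {pearson_r(incomes, scores):.3f}")
```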

EITHER
only looking for change in one direction R1

OR
only looking for greater happiness with greater income R1

OR
only looking for evidence of positive correlation R1

[1 mark]

d.i.

$H_{0}: \rho = 0;\ H_{1}: \rho > 0$ A1A1

Note: Award A1 for $\rho$ seen (do not accept $r$), A1 for both correct hypotheses, using their $\rho$ or $r$. Accept an equivalent statement in words; however, reference to “correlation for the population” or “association for the population” must be explicit for the first A1 to be awarded.

Watch out for a null hypothesis in words similar to “Annual income is not associated with greater happiness”. This is effectively saying $\rho \leq 0$ and should not be condoned.

[2 marks]

d.ii.

METHOD 1 – using critical value of $r$

$0.558 > 0.549 (0.557723 \dots > 0.549)$ R1

(therefore significant evidence of) a positive correlation A1

Note: Do not award R0A1.

METHOD 2 – using $p$-value

$0.0469 < 0.05 (0.0469463 \dots < 0.05)$ A1

Note: Follow through from their $r$-value from part (c)(ii).

(therefore significant evidence of) a positive correlation A1

Note: Do not award A0A1.

[2 marks]

d.iii.
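The critical value $0.549$ and the $p$-value $0.0469$ both come from the same $t$ distribution. A minimal sketch, assuming the standard statistic $t = r\sqrt{n-2}/\sqrt{1-r^{2}}$ on $n - 2 = 8$ degrees of freedom and the tabulated one-tailed 5% point $t_{8} \approx 1.8595$, shows how the critical value of $r$ follows (rearranging gives $r = t/\sqrt{t^{2} + n - 2}$) and why $r = 0.558$ is only just significant. Only the standard library is used, so the $p$-value itself is not computed.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for H0: rho = 0, on n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def critical_r(t_crit, n):
    """Critical value of r matching a critical t value on n - 2 df."""
    return t_crit / sqrt(t_crit ** 2 + n - 2)


n = 10
r = 0.557723     # r from part (c)(ii)
t_crit = 1.8595  # upper 5% point of t with 8 df, taken from tables (assumed)
print(f"critical r = {critical_r(t_crit, n):.3f}")  # critical r = 0.549
print(f"t = {t_from_r(r, n):.2f}")                 # t = 1.90
print(r > critical_r(t_crit, n))                   # True
```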

$a = 0.000126 (0.000125842 \dots), b = 41.1 (41.1490 \dots)$ A1

[1 mark]

e.i.
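The values of $a$ and $b$ are the least-squares estimates $a = S_{xy}/S_{xx}$ and $b = \bar{y} - a\bar{x}$. The sketch below implements those formulas in a hypothetical helper `linear_fit`; the last line rescales the markscheme's value of $a$ to the "per $\$10\,000$" reading accepted in part (e)(ii).

```python
def linear_fit(xs, ys):
    """Least-squares line y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx      # gradient: change in y per unit change in x
    b = my - a * mx    # the line passes through (x_bar, y_bar)
    return a, b


# Interpreting a = 0.000125842: happiness gain per $10 000 of income.
print(f"{0.000125842 * 10_000:.2f}")  # 1.26
```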

EITHER
the amount the happiness score increases for every $\$1$ increase in (annual) income A1

OR
rate of change of happiness with respect to (annual) income A1

Note: Accept equivalent responses, e.g. an increase of $1.26$ in happiness for every $\$10\,000$ increase in salary.

[1 mark]

e.ii.

$c = -2.06 \times 10^{-9}\ (-2.06191 \dots \times 10^{-9})$,

$d = 7.05 \times 10^{-4}\ (7.05272 \dots \times 10^{-4})$,

$e = 12.6\ (12.5878 \dots)$ A1

[1 mark]

e.iii.
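A quadratic least-squares fit solves the 3-by-3 normal equations built from $\sum x^{k}$ and $\sum x^{k}y$. The sketch below uses no external libraries and, as an assumption of this illustration, divides $X$ by a chosen `scale` first: with incomes near $10^{5}$ the $\sum x^{4}$ terms reach about $10^{20}$, and the unscaled system is badly conditioned. `solve` and `quadratic_fit` are hypothetical names, not a GDC or library API.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    sol = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = sum(aug[row][k] * sol[k] for k in range(row + 1, n))
        sol[row] = (aug[row][n] - acc) / aug[row][row]
    return sol


def quadratic_fit(xs, ys, scale=1.0):
    """Least-squares curve y = c*x^2 + d*x + e; returns (c, d, e).

    Fits in u = x / scale, then converts the coefficients back to x units.
    """
    us = [x / scale for x in xs]
    p = [sum(u ** k for u in us) for k in range(5)]                 # sum of u^k
    q = [sum(u ** k * y for u, y in zip(us, ys)) for k in range(3)]  # sum of u^k * y
    matrix = [[p[4], p[3], p[2]],
              [p[3], p[2], p[1]],
              [p[2], p[1], p[0]]]
    c_u, d_u, e = solve(matrix, [q[2], q[1], q[0]])
    return c_u / scale ** 2, d_u / scale, e


# Hypothetical points lying on y = -2e-9 x^2 + 7e-4 x + 12.6 (NOT Juliet's data).
xs = [10_000 * k for k in range(1, 8)]
ys = [-2e-9 * x * x + 7e-4 * x + 12.6 for x in xs]
print(quadratic_fit(xs, ys, scale=10_000))
```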

for quadratic model: $R^{2} = 0.659 (0.659145 \dots)$ A1

for linear model: $R^{2} = 0.311 (0.311056 \dots)$ A1

Note: Follow through from their $r$ value from part (c)(ii).

[2 marks]

e.iv.
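Both coefficients of determination use $R^{2} = 1 - SS_{\text{res}}/SS_{\text{tot}}$. For a least-squares line, $R^{2}$ equals $r^{2}$, which is why the linear value may follow through from part (c)(ii): $0.557723^{2} \approx 0.311$. A minimal sketch, with a hypothetical helper name:

```python
def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in ys)              # total sum of squares
    return 1 - ss_res / ss_tot


# For a least-squares line, R^2 = r^2 (r from part (c)(ii)).
print(f"{0.557723 ** 2:.3f}")  # 0.311
```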

EITHER
quadratic model is a better fit to the data / more accurate A1

OR
quadratic model explains a higher proportion of the variance A1

[1 mark]

e.v.

EITHER
not valid, $R^{2}$ not a useful measure to compare models with different numbers of parameters A1

OR
not valid, quadratic model will always have a better fit than a linear model A1

Note: Accept any other sensible critique of the validity of the method. Do not accept any answers which focus on the conclusion rather than the method of model selection.

[1 mark]

e.vi.
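One common way to penalise extra parameters, not required by the markscheme and shown only as an illustration, is adjusted $R^{2} = 1 - (1 - R^{2})(n - 1)/(n - k - 1)$, where $k$ is the number of explanatory terms ($k = 1$ for the line, $k = 2$ for the quadratic). The sketch assumes $n = 10$ and the $R^{2}$ values from part (e)(iv).

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for a model with k explanatory terms fitted to n points."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 10
print(f"linear:    {adjusted_r_squared(0.311056, n, 1):.3f}")  # linear:    0.225
print(f"quadratic: {adjusted_r_squared(0.659145, n, 2):.3f}")  # quadratic: 0.562
```

Here the quadratic model still scores higher, so the adjustment removes the automatic advantage of the extra term without changing the ranking on this data.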

(single sample) $t$-test A1

[1 mark]

f.i.

EITHER

$H_{0}: \mu = 80\,000;\ H_{1}: \mu \neq 80\,000$ A1

OR

$H_{0}:$ (sample is drawn from a population where) the population mean is $\$80\,000$
$H_{1}:$ the population mean is not $\$80\,000$ A1

Note: Do not allow FT from an incorrect test in part (f)(i) other than a $z$-test.

[1 mark]

f.ii.

$p = 0.610 (0.610322 \dots)$ A1

Note: For a $z$-test (follow through from part (f)(i)), accept either $p = 0.578$ (from biased estimate of variance) or $p = 0.598$ (from unbiased estimate of variance).

$0.610 > 0.05$ R1

EITHER

no (significant) evidence that mean differs from $\$80\,000$ A1

OR

the sample could plausibly have been drawn from the quoted population A1

Note: Allow R1FTA1FT from an incorrect $p$ -value, but the final A1 must still be in the context of the original research question.

[3 marks]

f.iii.
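A one-sample $t$-test compares $t = (\bar{x} - \mu_{0})/(s/\sqrt{n})$ with a $t$ distribution on $n - 1$ degrees of freedom. The sketch below uses hypothetical incomes (Juliet's data are not reproduced here) and, instead of a $p$-value (which needs the $t$ CDF), the two-tailed 5% critical value $2.262$ for 9 degrees of freedom. The `unbiased` flag shows the two standard-deviation choices mentioned in the $z$-test note.

```python
from math import sqrt
from statistics import mean, pstdev, stdev


def one_sample_t(sample, mu0, unbiased=True):
    """Statistic (x_bar - mu0) / (s / sqrt(n)) for H0: mu = mu0.

    unbiased=True estimates the SD dividing by n - 1 (stdev);
    unbiased=False divides by n (pstdev).
    """
    s = stdev(sample) if unbiased else pstdev(sample)
    return (mean(sample) - mu0) / (s / sqrt(len(sample)))


# Hypothetical incomes in dollars (NOT Juliet's data), n = 10.
incomes = [72_000, 95_000, 88_000, 101_000, 79_000,
           84_000, 110_000, 69_000, 93_000, 90_000]
t = one_sample_t(incomes, 80_000)
T_CRIT_9DF = 2.262  # two-tailed 5% critical value, 9 degrees of freedom
print(f"t = {t:.3f}; reject H0 at 5%: {abs(t) > T_CRIT_9DF}")
```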

Examiners report

[N/A] for all parts (a.i to f.iii).

Syllabus sections

Topic 4—Statistics and probability » AHL 4.13—Non-linear regression
