
Date: May 2021
Marks available: 1
Reference code: 21M.3.AHL.TZ2.1
Level: Additional Higher Level
Paper: Paper 3
Time zone: Time zone 2
Command term: Find
Question number: 1
Adapted from: N/A

Question

Juliet is a sociologist who wants to investigate if income affects happiness amongst doctors. This question asks you to review Juliet’s methods and conclusions.

Juliet obtained a list of email addresses of doctors who work in her city. She contacted them and asked them to fill in an anonymous questionnaire. Participants were asked to state their annual income and to respond to a set of questions. The responses were used to determine a happiness score out of 100. Of the 415 doctors on the list, 11 replied.

Juliet’s results are summarized in the following table.

For the remaining ten responses in the table, Juliet calculates the mean happiness score to be 52.5.

Juliet decides to carry out a hypothesis test on the correlation coefficient to investigate whether increased annual income is associated with greater happiness.

Juliet wants to create a model to predict how changing annual income might affect happiness scores. To do this, she assumes that annual income in dollars, X, is the independent variable and the happiness score, Y, is the dependent variable.

She first considers a linear model of the form

Y=aX+b.

Juliet then considers a quadratic model of the form

Y=cX^2+dX+e.

After presenting the results of her investigation, a colleague questions whether Juliet’s sample is representative of all doctors in the city.

A report states that the mean annual income of doctors in the city is $80000. Juliet decides to carry out a test to determine whether her sample could realistically be taken from a population with a mean of $80000.

Describe one way in which Juliet could improve the reliability of her investigation.

[1]
a.i.

Describe one criticism that can be made about the validity of Juliet’s investigation.

[1]
a.ii.

Juliet classifies response K as an outlier and removes it from the data. Suggest one possible justification for her decision to remove it.

[1]
b.

Calculate the mean annual income for these remaining responses.

[2]
c.i.

Determine the value of r, Pearson’s product-moment correlation coefficient, for these remaining responses.

[2]
c.ii.

State why the hypothesis test should be one-tailed.

[1]
d.i.

State the null and alternative hypotheses for this test.

[2]
d.ii.

The critical value for this test, at the 5% significance level, is 0.549. Juliet assumes that the population is bivariate normal.

Determine whether there is significant evidence of a positive correlation between annual income and happiness. Justify your answer.

[2]
d.iii.

Use Juliet’s data to find the value of a and of b.

[1]
e.i.

Interpret, referring to income and happiness, what the value of a represents.

[1]
e.ii.

Find the value of c, of d and of e.

[1]
e.iii.

Find the coefficient of determination for each of the two models she considers.

[2]
e.iv.

Hence compare the two models.

[1]
e.v.

Juliet decides to use the coefficient of determination to choose between these two models.

Comment on the validity of her decision.

[1]
e.vi.

State the name of the test which Juliet should use.

[1]
f.i.

State the null and alternative hypotheses for this test.

[1]
f.ii.

Perform the test, using a 5% significance level, and state your conclusion in context.

[3]
f.iii.

Markscheme

Any one from:                R1

increase sample size / increase response rate / repeat process
check whether sample is representative
test-retest participants or do a parallel test
use a stratified sample
use a random sample


Note: Do not condone:
Ask different types of doctor
Ask for proof of income
Ask for proof of being a doctor
Remove anonymity
Remove response K.

 

[1 mark]

a.i.

Any one from:                R1

non-random sampling means a subset of population might be responding
self-reported happiness is not the same as happiness
happiness is not a constant / cannot be quantified / is difficult to measure
income might include external sources
Juliet is only sampling doctors in her city
correlation does not imply causation
sample might be biased


Note: Do not condone the following common but vague responses unless they make a clear link to validity:
Sample size is too small
Result is not generalizable
There may be other variables Juliet is ignoring
Sample might not be representative

 

[1 mark]

a.ii.

because the income is very different / implausible / clearly contrived              R1


Note: Answers must explicitly reference "income" to get credit.

 

[1 mark]

b.

($)90200           (M1)A1


[2 marks]

c.i.

r=0.558 (0.557723...)                  A2


[2 marks]

c.ii.

EITHER
only looking for change in one direction                R1

OR
only looking for greater happiness with greater income                R1

OR
only looking for evidence of positive correlation                R1


[1 mark]

d.i.

H0:ρ=0; H1:ρ>0               A1A1


Note: Award A1 for ρ seen (do not accept r), A1 for both correct hypotheses, using their ρ or r. Accept an equivalent statement in words, however reference to “correlation for the population” or “association for the population” must be explicit for the first A1 to be awarded.

Watch out for a null hypothesis in words similar to “Annual income is not associated with greater happiness”. This is effectively saying ρ≤0 and should not be condoned.


[2 marks]

d.ii.

METHOD 1 – using critical value of r

0.558>0.549 (0.557723...>0.549)      R1

(therefore significant evidence of) a positive correlation          A1


Note: Do not award R0A1.

 

METHOD 2 – using p-value

0.0469<0.05 (0.0469463...<0.05)         A1


Note: Follow through from their r-value from part (c)(ii).


(therefore significant evidence of) a positive correlation          A1


Note: Do not award A0A1.


[2 marks]

d.iii.
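
The two methods above can be cross-checked numerically. The following is a minimal sketch, not part of the markscheme: it assumes the standard result that, under H0, t = r·sqrt((n−2)/(1−r^2)) follows a t-distribution with n−2 degrees of freedom, and it uses r = 0.557723 and n = 10 from part (c)(ii). The helper names (`t_pdf`, `t_upper_tail`, `correlation_test`) are illustrative, and the tail probability is approximated with Simpson's rule rather than taken from a statistics library.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def correlation_test(r, n):
    """One-tailed test of H0: rho = 0 against H1: rho > 0."""
    df = n - 2
    t_stat = r * math.sqrt(df / (1 - r * r))
    return t_stat, t_upper_tail(t_stat, df)

# r and n are taken from the ten remaining responses in part (c)(ii).
t_stat, p_value = correlation_test(0.557723, 10)
print(f"t = {t_stat:.4f}, one-tailed p = {p_value:.4f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

The p-value should land close to the 0.0469 quoted in METHOD 2, so both methods lead to the same conclusion.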

a=0.000126 (0.000125842...),  b=41.1 (41.1490...)         A1


[1 mark]

e.i.

EITHER
the amount the happiness score increases for every $1 increase in (annual) income       A1

OR
rate of change of happiness with respect to (annual) income       A1


Note: Accept equivalent responses e.g. an increase of 1.26 in happiness for every $10000 increase in salary.


[1 mark]

e.ii.

c=-2.06×10^-9 (-2.06191...×10^-9),

d=7.05×10^-4 (7.05272...×10^-4),

e=12.6 (12.5878...)       A1


[1 mark]

e.iii.

for quadratic model: R^2=0.659 (0.659145...)       A1

for linear model: R^2=0.311 (0.311056...)       A1


Note: Follow through from their r value from part (c)(ii).


[2 marks]

e.iv.

EITHER
quadratic model is a better fit to the data / more accurate      A1

OR
quadratic model explains a higher proportion of the variance      A1


[1 mark]

e.v.

EITHER
not valid, R^2 is not a useful measure to compare models with different numbers of parameters     A1

OR
not valid, quadratic model will always have a better fit than a linear model     A1


Note: Accept any other sensible critique of the validity of the method. Do not accept any answers which focus on the conclusion rather than the method of model selection.

[1 mark]

e.vi.
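
To illustrate why the number of parameters matters, here is a minimal sketch using adjusted R^2, which penalises each extra predictor. This statistic is not mentioned in the markscheme; it is brought in here as an assumption, purely as one common way to make the comparison fairer. The inputs are the two coefficients of determination from part (e)(iv) and n = 10, and the function name `adjusted_r_squared` is illustrative.

```python
def adjusted_r_squared(r_squared, n, num_predictors):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1), with k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - num_predictors - 1)

n = 10  # ten remaining responses after removing K

# R^2 values quoted in the markscheme for part (e)(iv).
linear_adj = adjusted_r_squared(0.311056, n, num_predictors=1)     # X only
quadratic_adj = adjusted_r_squared(0.659145, n, num_predictors=2)  # X and X^2

print(f"linear:    adjusted R^2 = {linear_adj:.3f}")
print(f"quadratic: adjusted R^2 = {quadratic_adj:.3f}")
```

Both values fall below the unadjusted ones, and the drop is larger in absolute terms for the quadratic model, which reflects the penalty for its extra parameter.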

(single sample) t-test    A1


[1 mark]

f.i.

EITHER

H0:μ=80000; H1:μ≠80000             A1

OR

H0: (sample is drawn from a population where) the population mean is $80000
H1: the population mean is not $80000             A1


Note: Do not allow FT from an incorrect test in part (f)(i) other than a z-test.


[1 mark]

f.ii.

p=0.610 (0.610322...)             A1


Note: For a z-test follow through from part (f)(i), either 0.578 (from biased estimate of variance) or 0.598 (from unbiased estimate of variance).


0.610>0.05            R1


EITHER

no (significant) evidence that mean differs from $80000            A1


OR

the sample could plausibly have been drawn from the quoted population         A1


Note: Allow R1FTA1FT from an incorrect p-value, but the final A1 must still be in the context of the original research question.


[3 marks]

f.iii.
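
The mechanics of the test can be sketched from summary statistics. This is a rough illustration only: the sample mean (90200), sample size (10) and hypothesised mean (80000) come from the question, but the data table is not reproduced on this page, so the sample standard deviation below is a hypothetical placeholder and the resulting p-value will not match the markscheme's 0.610 exactly. The helper names are illustrative, and the t-distribution tail is approximated with Simpson's rule.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def one_sample_t_test(sample_mean, sample_sd, n, mu0):
    """Two-tailed one-sample t-test of H0: mu = mu0 from summary statistics."""
    t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
    p_value = 2 * t_upper_tail(abs(t_stat), n - 1)
    return t_stat, p_value

# HYPOTHETICAL value: the real sample standard deviation must be computed
# from Juliet's table, which is not available here.
HYPOTHETICAL_SD = 60000.0

t_stat, p_value = one_sample_t_test(90200, HYPOTHETICAL_SD, 10, 80000)
print(f"t = {t_stat:.4f}, two-tailed p = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

With the actual data, the same steps apply: compute the unbiased sample standard deviation, form the t statistic with n−1 = 9 degrees of freedom, and compare the two-tailed p-value with 0.05.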


Syllabus sections

Topic 4—Statistics and probability » AHL 4.13—Non-linear regression