Date May 2008 Marks available 2 Reference code 08M.2.sl.TZ2.2
Level SL only Paper 2 Time zone TZ2
Command term Show that Question number 2 Adapted from N/A

Question

A survey of \(400\) people is carried out by a market research organization in two different cities, Buenos Aires and Montevideo. The people are asked which brand of cereal they prefer out of Chocos, Zucos or Fruti. The table below summarizes their responses.

The following table shows the cost in \({\text{AUD}}\) of seven paperback books chosen at random, together with the number of pages in each book.

One person is chosen at random from those surveyed. Find the probability that this person

(i) does not prefer Zucos;

(ii) prefers Chocos, given that they live in Montevideo.

[4]
i.a.

Two people are chosen at random from those surveyed. Find the probability that they both prefer Fruti.

[3]
i.b.

The market research organization tests the survey data to determine whether the brand of cereal preferred is associated with a city. A chi-squared test at the \(5\% \) level of significance is performed.

State the null hypothesis.

[1]
i.c.

State the number of degrees of freedom.

[1]
i.d.

Show that the expected frequency for the number of people who live in Montevideo and prefer Zucos is \(63\).

[2]
i.e.

Write down the chi-squared statistic for this data.

[2]
i.f.

State whether the market research organization would accept the null hypothesis. Clearly justify your answer.

[2]
i.g.

Plot these pairs of values on a scatter diagram. Use a scale of \(1{\text{ cm}}\) to represent \(50\) pages on the horizontal axis and \(1{\text{ cm}}\) to represent \(1{\text{ AUD}}\) on the vertical axis.

[3]
ii.a.

Write down the linear correlation coefficient, \(r\), for the data.

[2]
ii.b.

Stephen wishes to buy a paperback book which has \(350\) pages in it. He plans to draw a line of best fit to determine the price. State whether or not this is an appropriate method in this case and justify your answer.

[2]
ii.c.

Markscheme

(i) \(\frac{{280}}{{400}}{\text{ }}(0.7{\text{, }}70\% {\text{ or equivalent}})\)    (A1)(A1)(G2)

Note: (A1) for correct numerator, (A1) for correct denominator.

(ii) \(\frac{{57}}{{210}}{\text{ }}\left( {\frac{{19}}{{70}}{\text{, }}0.271{\text{, }}27.1\% } \right)\)     (A1)(A1)(G2)

Note: (A1) for correct numerator, (A1) for correct denominator.

[4 marks]

i.a.
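
The two probabilities in part (i) can be checked with exact fractions. The following is a minimal sketch, not part of the markscheme; the variable names are illustrative, and the counts are simply the numerators and denominators quoted in the markscheme above, since the survey table itself is not reproduced here.

```python
from fractions import Fraction

total_surveyed = 400     # everyone surveyed
not_zucos = 280          # numerator quoted in the markscheme for (i)
montevideo_total = 210   # denominator quoted in the markscheme for (ii)
montevideo_chocos = 57   # numerator quoted in the markscheme for (ii)

# (i) P(does not prefer Zucos)
p_not_zucos = Fraction(not_zucos, total_surveyed)

# (ii) P(prefers Chocos | lives in Montevideo): the denominator is the
# Montevideo total, not the whole sample.
p_chocos_given_montevideo = Fraction(montevideo_chocos, montevideo_total)

print(p_not_zucos, float(p_not_zucos))
print(p_chocos_given_montevideo, round(float(p_chocos_given_montevideo), 3))
```

Using the Montevideo total as the denominator in (ii) is the step the examiners report identifies as a common source of error.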

\(\frac{{180}}{{400}} \times \frac{{179}}{{399}}\)     (A1)(M1)


Note: (A1) for correct values seen, (M1) for multiplying their two values, (A1) for correct answer.


\( = \frac{{537}}{{2660}}{\text{ }}( = 0.202)\)     (A1)(G3)

[3 marks]

i.b.
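
The 'without replacement' calculation above can be verified with exact arithmetic. This is a sketch only; the variable names are illustrative, and the count of 180 Fruti respondents is taken from the markscheme line above.

```python
from fractions import Fraction

fruti = 180   # people who prefer Fruti (from the markscheme)
total = 400   # people surveyed

# Second pick is made from the 399 people left, 179 of whom prefer Fruti.
p_both = Fraction(fruti, total) * Fraction(fruti - 1, total - 1)

# For comparison only: treating the picks as 'with replacement',
# which is the error described in the examiners report.
p_with_replacement = Fraction(fruti, total) ** 2

print(p_both, float(p_both))
print(p_with_replacement, float(p_with_replacement))
```

The two results are close but not equal, so the 'with replacement' value does not reduce to the required fraction.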

\({{\text{H}}_0}\) : ‘the preference of brand of cereal is independent of the city’.     (A1)

OR

\({{\text{H}}_0}\) : ‘there is no association between the brand of cereal and city’.

[1 mark]

i.c.

\(df = 2\)     (A1)

[1 mark]

i.d.
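
The value \(df = 2\) follows from the usual rule (rows − 1) × (columns − 1) for a contingency table. The sketch below assumes the table has 2 rows (the cities) and 3 columns (the brands); the function name is hypothetical.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-squared test of independence."""
    return (n_rows - 1) * (n_cols - 1)

# Assumed layout: 2 cities by 3 cereal brands.
df = degrees_of_freedom(2, 3)
print(df)
```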

\(\frac{{210 \times 120}}{{400}}\)     (M1)(A1)

Note: (M1) for substituting in correct formula, (A1) for correct values.

\( = 63\)     (AG)

Note: Final line must be seen or previous (A1) mark is lost.

[2 marks]

i.e.
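
The expected-frequency formula used above is (row total × column total) / grand total. A minimal sketch follows, with a hypothetical function name and the totals taken from the markscheme: 210 people in Montevideo, 120 preferring Zucos, 400 surveyed.

```python
def expected_frequency(row_total: int, column_total: int, grand_total: int) -> float:
    """Expected cell count under the null hypothesis of independence."""
    return row_total * column_total / grand_total

# Montevideo total = 210, Zucos total = 120, grand total = 400.
expected = expected_frequency(210, 120, 400)
print(expected)
```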

\(39.3\)     (G2)

Note: Award (G1)(A0)(AP) if answers not to 3 significant figures.

[2 marks]

i.f.

\(p{\text{-value}} < 0.05\)     (R1)(ft)

Do not accept \({{{\text{H}}_0}}\) .     (A1)(ft)

Notes: Allow ‘Reject \({{{\text{H}}_0}}\)’ or equivalent. (ft) from their \({\chi ^2}\) statistic.
Award (R1)(ft) for comparing the appropriate values. (A1)(ft) can be awarded only if the conclusion is valid according to the comparison given. If no reason given or if reason is wrong both marks are lost. Note that (R1)(A0)(ft) can be awarded but (R0)(A1)(ft) cannot.

[2 marks]

i.g.
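
The comparison behind this decision can be sketched without a GDC. For exactly 2 degrees of freedom, the upper-tail probability of the chi-squared distribution has the closed form \(e^{ - x/2}\), so both the \(p\)-value and the 5% critical value follow from the standard library. The variable names are illustrative, and the statistic 39.3 is the value quoted in the markscheme.

```python
import math

chi_squared = 39.3   # statistic quoted in the markscheme
alpha = 0.05         # 5% significance level

# Valid for 2 degrees of freedom only: P(X > x) = exp(-x / 2).
p_value = math.exp(-chi_squared / 2)

# Critical value solves exp(-c / 2) = alpha.
critical_value = -2 * math.log(alpha)

reject_h0 = p_value < alpha
print(p_value, critical_value, reject_h0)
```

Either comparison (the \(p\)-value against 0.05, or the statistic against the critical value) leads to the same conclusion.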

     (A1)(A1)(A1)

Notes: (A1) for label and scales, (A2) for all points correct, (A1) for 5 or 6 correct. Award a maximum of (A2) if points are joined.

[3 marks]

ii.a.

\(r = - 0.141\)     (G2)

Note: If negative sign is missing award (G1)(G0).

[2 marks]

ii.b.
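
For readers without a GDC, the linear correlation coefficient can be computed directly from raw data. The sketch below uses a hypothetical helper `pearson_r`; the sample values are made up for illustration and are NOT the exam data, because the table of pages and costs is not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up values for illustration only (not the exam table).
pages = [100, 200, 300, 400]
cost = [8.0, 6.0, 7.0, 6.5]

r = pearson_r(pages, cost)
print(round(r, 3))
```

The sign of the result matters: as the note above states, omitting the negative sign loses a mark.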

‘The coefficient of correlation is too low, (very) weak (linear) relationship’.     (R1)

Not a sensible thing to do, accept ‘no’.     (A1)

Note: Do not award (R0)(A1). The correlation coefficient has to be mentioned in their reasoning.

[2 marks]

ii.c.

Examiners report

Candidates answered part (a) correctly. Some lost one of the 4 marks for an error in the denominator of the conditional probability. In (b) many students failed to see that the selection was 'without replacement'. Parts (c), (d) and (e) seemed to be done very well by some centres and uniformly badly by others. In (e) many gave the table from the GDC and highlighted the value 63, for which no mark was gained. The expected value formula should have been used instead.

i.a. to i.g.

The graph was well done with almost all candidates labelling and scaling the axes correctly. A minority of students joined the points or drew the graph on lined paper which prevented them from gaining full marks in this part of the question.

In (b) some candidates were not able to calculate the linear correlation coefficient. A few G2 comments pointed out that the command term used may have been ambiguous to some candidates, who did not think that they could use their GDC to find \(r\). Some attempted to use the formula even though the value of \({S_{xy}}\) was not given. The guide says that 'A GDC can be used to calculate r when raw data is given'. This potential unfairness was taken into consideration during the setting of boundaries so that no candidate was disadvantaged by the possibly ambiguous wording of the question. In future the command term 'Using your GDC' or 'Write down' will be used in similar questions.

Some students who did use the GDC gave \({r^2}\) instead of \(r\). This really caught the attention of many examiners as \({r^2}\) is not in the syllabus.

ii.a. to ii.c.

Syllabus sections

Topic 4 - Statistical applications » 4.4 » The \({\chi ^2}\) test for independence: formulation of null and alternative hypotheses; significance levels; contingency tables; expected frequencies; degrees of freedom; \(p\)-values.