
Date: May Example question | Marks available: 4 | Reference code: EXM.2.AHL.TZ0.27
Level: Additional Higher Level | Paper: Paper 2 | Time zone: Time zone 0
Command term: Show and Find | Question number: 27 | Adapted from: N/A

Question

In a reforested area of pine trees, heights of trees planted in a specific year seem to follow a normal distribution. A sample of 100 such trees is selected to test the validity of this hypothesis. The results of measuring tree heights, to the nearest centimetre, are recorded in the first two columns of the table below.

Describe what is meant by

a goodness of fit test (a complete explanation required);

[2]
a.i.

the level of significance of a hypothesis test.

[1]
a.ii.

Find the mean and standard deviation of the sample data in the table above. Show how you arrived at your answers.

[4]
b.

Most of the expected frequencies have been calculated in the third column. (Frequencies have been rounded to the nearest integer, and the first and last classes have been extended to include the rest of the data beyond 15 and 225.) Find the values of a, b and c, and show how you arrived at your answers.

[4]
c.

In order to test for the goodness of fit, the test statistic was calculated to be 1.0847. Show how this was done.

[3]
d.

State your hypotheses, critical number, decision rule and conclusion (using a 5% level of significance).

[5]
e.

Markscheme

A goodness of fit test is a statistical test of the hypothesis that a set of observed counts of k cells of a certain large population is consistent with a set of theoretical counts.                (R1)

The test statistic has a $\chi^2$ distribution with $k - n$ degrees of freedom. One degree of freedom is lost for every parameter that has to be estimated from the sample.            (R1)

[2 marks]

a.i.

The level of significance of a hypothesis test is the maximal probability that we reject a true null hypothesis.      (R1)

[1 mark]

a.ii.

We use the class midpoints in the calculation of the mean and standard deviation.

$\bar{x} = \dfrac{\sum x_i f(x_i)}{\sum f(x_i)} = \dfrac{30 \times 6 + 60 \times 11 + 90 \times 15 + \cdots}{100} = \dfrac{13350}{100}$                 (M1)

= 133.5                (A1)

$s = \sqrt{\dfrac{\sum x_i^2 f(x_i)}{\sum f(x_i)} - (\bar{x})^2} = \sqrt{\dfrac{900 \times 6 + 3600 \times 11 + \cdots}{100} - 133.5^2}$                 (M1)

= 56.345  (= 56.3 to 3 sf)                (A1)
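The midpoint calculation above can be sketched in Python. This is a minimal illustration, not part of the markscheme: the helper name `grouped_mean_sd` is hypothetical, and the demo uses only the first three classes that are visible in the working (midpoints 30, 60, 90 with frequencies 6, 11, 15), since the full table is not reproduced here. Its output therefore differs from the full-sample values 133.5 and 56.345.

```python
import math


def grouped_mean_sd(midpoints, freqs):
    """Mean and standard deviation of grouped data, using class midpoints.

    Uses the same formula as the working above:
    s = sqrt(sum(x^2 f) / sum(f) - mean^2).
    """
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    mean_of_squares = sum(x * x * f for x, f in zip(midpoints, freqs)) / n
    return mean, math.sqrt(mean_of_squares - mean ** 2)


# Only the three classes visible in the working; NOT the full table.
partial_midpoints = [30, 60, 90]
partial_freqs = [6, 11, 15]
partial_mean, partial_sd = grouped_mean_sd(partial_midpoints, partial_freqs)
```

With all eight classes supplied, the same function should reproduce the mean and standard deviation quoted in the markscheme.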

[4 marks]

b.

Each expected frequency is the number of observations multiplied by the probability that an observation falls in that class. Since by hypothesis we have a normal distribution, the probabilities can be read from a normal table with mean 133.5 and standard deviation 56.345                 (M1)

E1 = 100 × P(45 ≤ x ≤ 75) ≈ 9, so a = 9              (A1)

E2 = 100 × P(135 ≤ x ≤ 165) ≈ 20, so b = 20              (A1)

E3 = 100 × P(195 ≤ x ≤ 225) ≈ 9, so c = 9              (A1)
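As a check on the three values, the normal-table lookup can be replaced by a direct computation. The sketch below assumes the class boundaries, mean and standard deviation quoted above. The function names are illustrative, and the normal CDF is built from the standard library's `math.erf` rather than read from a table.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_frequency(n, lower, upper, mu, sigma):
    """Expected count in [lower, upper] for n observations from N(mu, sigma^2)."""
    return n * (normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma))


# Values taken from the markscheme.
N, MU, SIGMA = 100, 133.5, 56.345

a = expected_frequency(N, 45, 75, MU, SIGMA)
b = expected_frequency(N, 135, 165, MU, SIGMA)
c = expected_frequency(N, 195, 225, MU, SIGMA)
```

Rounding `a`, `b` and `c` to the nearest integer gives the frequencies entered in the table.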

[4 marks]

c.

The test statistic is a $\chi^2$ variable. Hence                 (M1)

$\chi^2 = \sum \dfrac{(f_e - f_o)^2}{f_e} = \dfrac{(6 - 6)^2}{6} + \dfrac{(9 - 11)^2}{9} + \cdots + \dfrac{(5 - 6)^2}{5}$                  (M1)

= 1.0847              (A1)
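The sum can be sketched as a short function. The name `chi_square_statistic` is illustrative. Because the full table is not reproduced here, the demo uses only the three terms visible in the working, so it gives a partial sum rather than 1.0847.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_e - f_o)^2 / f_e over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fe - fo) ** 2 / fe for fo, fe in zip(observed, expected))


# Only the three (expected, observed) pairs shown in the working above;
# the remaining classes are omitted, so this is a partial sum.
visible_expected = [6, 9, 5]
visible_observed = [6, 11, 6]
partial_sum = chi_square_statistic(visible_observed, visible_expected)
```

Supplying the observed and expected frequencies for all eight classes should give the full test statistic.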

[3 marks]

d.

H0: Tree heights are normally distributed

H1: Tree heights are not normally distributed            (M1)

Since the mean and standard deviation were estimated from the sample, the number of degrees of freedom is 8 – 1 – 2 = 5            (A1)

The critical number is $\chi^2_{5,\,0.05} = 11.0705$

If $\chi^2 > 11.0705$ we reject H0            (A1)

Since $\chi^2 = 1.0847 < 11.0705$, we fail to reject H0            (R1)

Conclusion: we do not have enough evidence to claim that the distribution of tree heights is not normal            (R1) 
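The decision rule can be checked numerically. The sketch below is one possible check, not the markscheme's method. It uses the closed-form upper-tail probability of a chi-squared distribution with an odd number of degrees of freedom, so only the standard library is needed. The function name `chi2_sf_odd_df` is hypothetical. The class count 8, the two estimated parameters, the statistic and the critical number are taken from the working above.

```python
import math


def chi2_sf_odd_df(x, df):
    """P(X > x) for a chi-squared variable with odd df and x >= 0.

    Closed form: 2 * (1 - Phi(sqrt(x))) + 2 * phi(sqrt(x)) * series,
    where series = chi + chi^3 / 3 + chi^5 / (3 * 5) + ...
    with (df - 1) / 2 terms.
    """
    if df < 1 or df % 2 == 0 or x < 0:
        raise ValueError("requires odd df >= 1 and x >= 0")
    chi = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # phi(chi)
    tail = math.erfc(chi / math.sqrt(2.0))  # equals 2 * (1 - Phi(chi))
    series, denominator = 0.0, 1.0
    for r in range(1, (df - 1) // 2 + 1):
        denominator *= 2 * r - 1  # 1, 1*3, 1*3*5, ...
        series += chi ** (2 * r - 1) / denominator
    return tail + 2.0 * density * series


classes = 8               # number of classes in the table
estimated_parameters = 2  # mean and standard deviation
df = classes - 1 - estimated_parameters

statistic = 1.0847
critical = 11.0705
alpha = 0.05

reject_h0 = statistic > critical
p_value = chi2_sf_odd_df(statistic, df)
tail_at_critical = chi2_sf_odd_df(critical, df)  # should be close to alpha
```

Here `tail_at_critical` confirms that 11.0705 leaves about 5% in the upper tail, and `reject_h0` is false, which matches the conclusion above.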

[5 marks]

e.

Examiners report

[N/A]
a.i.
[N/A]
a.ii.
[N/A]
b.
[N/A]
c.
[N/A]
d.
[N/A]
e.

Syllabus sections

Topic 4—Statistics and probability » SL 4.9—Normal distribution and calculations
Topic 4—Statistics and probability » SL 4.11—Expected, observed, hypotheses, chi squared, gof, t-test
Topic 4—Statistics and probability » AHL 4.12—Data collection, reliability and validity tests
Topic 4—Statistics and probability
