4.1.1 Sampling & Data Collection

Types of Data

What are the different types of data?

Qualitative data is usually given in words, not numbers, and describes something
- For example: the colour of a teacher's car
Quantitative data is given using numbers which count or measure something
- For example: the number of pets that a student has
Discrete data is quantitative data that needs to be counted
- Discrete data can only take specific values from a set of (usually finite) values
- For example: the number of times a coin is flipped until a ‘tails’ is obtained
Continuous data is quantitative data that needs to be measured
- Continuous data can take any value within a range, so there are infinitely many possible values
- For example: the height of a student
Age can be discrete or continuous depending on the context or how it is defined
- If you mean how many years old a person is then this is discrete
- If you mean how long a person has been alive then this is continuous

What is the difference between a population and a sample?

The population refers to the whole set of things which you are interested in
- For example: if a vet wanted to know how long a typical French bulldog slept for in a day then the population would be all the French bulldogs in the world
A sample is a subset of the population from which data is collected
- For example: the vet might take a sample of French bulldogs from different cities and record how long they sleep in a day
A sampling frame is a list of all members of the population
- For example: a list of employees’ names within a company
Using a sample instead of a population:
- Is quicker and cheaper
- Leads to less data needing to be analysed
- Might not fully represent the population
- Might introduce bias

Sampling Techniques

What is a random sample and a biased sample?

A random sample is where every member of the population has an equal chance of being included in the sample
A biased sample is where the sample is not random

What sampling techniques do I need to know?

Simple random sampling

Simple random sampling is where every possible group of n members from the population has an equal probability of being selected for the sample
To carry this out you would...
- uniquely number every member of a population
- randomly select n different numbers using a random number generator or a form of lottery (where numbers are selected randomly)
Effectiveness:
- Useful when you have a small population or want a small sample (such as children in a class)
- It can be time-consuming if the sample or population is large
- This cannot be used if it is not possible to number or list all the members of the population (such as fish in a lake)
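
The two steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: the class of 30 pupils, the function name `simple_random_sample` and the `seed` argument are all assumptions made for the example (the seed only makes the illustration repeatable).

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n different members so that every group of n is equally likely."""
    rng = random.Random(seed)
    # Step 1: uniquely number every member of the population (1 to N)
    numbered = dict(enumerate(population, start=1))
    # Step 2: randomly select n different numbers (no repeats)
    chosen_numbers = rng.sample(range(1, len(numbered) + 1), n)
    return [numbered[i] for i in chosen_numbers]

# Hypothetical population: a class of 30 pupils
pupils = [f"pupil_{i}" for i in range(1, 31)]
sample = simple_random_sample(pupils, 5, seed=1)
print(sample)
```

`random.sample` selects without replacement, which matches the requirement that the n numbers are different.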

Systematic sampling

Systematic sampling is where a sample is formed by choosing members of a population at regular intervals using a list
To carry this out you would...
- calculate the size of the interval $k = \frac{\text{size of population } (N)}{\text{size of sample } (n)}$
- choose a random starting point between 1 and k
- select every kth member after the first one
Effectiveness:
- Useful when there is a natural order (such as a list of names or a conveyor belt of items)
- Quick and easy to use
- This cannot be used if it is not possible to number or list all the members of the population (such as penguins in Antarctica)
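
A short Python sketch of the three steps above. It assumes the members are already in a list and that N is a multiple of n (otherwise the interval is rounded down here, which is a choice made for this example). The function name and the conveyor-belt data are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Choose n members at regular intervals from an ordered list."""
    rng = random.Random(seed)
    N = len(population)
    k = N // n                  # interval k = N / n, rounded down if needed
    start = rng.randint(1, k)   # random starting point between 1 and k inclusive
    # Take positions start, start + k, start + 2k, ... (counting from 1)
    return [population[start - 1 + i * k] for i in range(n)]

# Hypothetical conveyor belt of 120 items, labelled by their position
items = list(range(1, 121))
sample = systematic_sample(items, 10, seed=3)
print(sample)
```

Because the items are labelled by position, consecutive values in the output differ by k = 12.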

Stratified sampling

Stratified sampling is where the population is divided into disjoint groups (called strata) and then a random sample is taken from each group (stratum)
The proportion of the sample that is taken from a stratum is equal to the proportion of the population that belongs to that stratum
To carry this out you would...
- Calculate the number of members sampled from each stratum
  - $\frac{\text{size of sample } (n)}{\text{size of population } (N)} \times \text{number of members in the stratum}$
- Take a random sample from each stratum
Effectiveness:
- Useful when there are very different groups of members within a population
- The sample will be representative of the population structure
- The members selected from each stratum are chosen randomly
- This cannot be used if the population cannot be split into groups or if the groups overlap
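
The calculation and the random selection within each stratum can be sketched as follows. The year groups and their sizes are made-up data, and `stratified_sample` is a hypothetical helper. Rounding each stratum's share to a whole number is an assumption of this sketch, and in general the rounded sizes may not add up to exactly n.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps each stratum name to a list of its members (disjoint groups)."""
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # (n / N) x number of members in the stratum, rounded to a whole number
        size = round(n * len(members) / N)
        # Take a random sample of that size from the stratum
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical school with two year groups
strata = {
    "Year 12": [f"Y12_{i}" for i in range(120)],
    "Year 13": [f"Y13_{i}" for i in range(80)],
}
sample = stratified_sample(strata, 20, seed=0)
print({name: len(chosen) for name, chosen in sample.items()})
# {'Year 12': 12, 'Year 13': 8}
```

Here Year 12 makes up 120 out of 200 students (60%), so it supplies 12 of the 20 sampled students (also 60%).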

Quota sampling

Quota sampling is where the population is split into groups (like stratified sampling) and members of the population are selected until each quota is filled
To carry this out you would...
- Calculate how many people you need from each group
- Select members from each group until that quota is filled
  - The members do not have to be selected randomly
Effectiveness:
- Useful when collecting data by asking people who walk past you in a public place or when a sampling frame is not available
- This can introduce bias as some members of the population might choose not to be included in the sample
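
As a rough illustration, quota sampling can be modelled as accepting people in the order they are met until every quota is full. The `arrivals` list, the group labels and the function name `quota_sample` are hypothetical. Note that nothing in the selection is random, which is where bias can enter.

```python
def quota_sample(arrivals, quotas):
    """arrivals: (member, group) pairs in the order they are met.
    quotas: the number of members required from each group."""
    sample = {group: [] for group in quotas}
    for member, group in arrivals:
        # Accept the member only if their group's quota is not yet filled
        if group in sample and len(sample[group]) < quotas[group]:
            sample[group].append(member)
        # Stop as soon as every quota has been filled
        if all(len(sample[g]) == quotas[g] for g in quotas):
            break
    return sample

# Hypothetical passers-by in a public place
arrivals = [("A", "adult"), ("B", "child"), ("C", "adult"),
            ("D", "adult"), ("E", "child"), ("F", "child")]
sample = quota_sample(arrivals, {"adult": 2, "child": 2})
print(sample)  # {'adult': ['A', 'C'], 'child': ['B', 'E']}
```

Person D is skipped because the adult quota is already full, and F is never asked because both quotas are filled once E is added.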

Convenience sampling

Convenience sampling is where a sample is formed using available members of the population who fit the criteria
To carry this out you would...
- Select members that are easiest to reach
Effectiveness:
- Useful when it is not possible to list the members of the population
- This is unlikely to be representative of the population structure
- This is likely to produce biased results

What are the main criticisms of sampling techniques?

Most sampling techniques can be improved by taking a larger sample
Sampling can introduce bias, so you want to minimise the bias within a sample
- To minimise bias the sample should be as close to random as possible
A sample only gives information about the members it contains
- Different samples may lead to different conclusions about the population
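
A small simulation can illustrate both points: different samples give different results, and larger samples tend to vary less. The population below is made-up data (1000 simulated heights in cm), and the helper name, the sample sizes and the number of repeats are arbitrary choices for this sketch.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: 1000 simulated heights in cm
population = [rng.gauss(165, 10) for _ in range(1000)]

def spread_of_sample_means(sample_size, repeats=200):
    """Take many random samples and measure how much their means vary."""
    means = [statistics.mean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.stdev(means)

small = spread_of_sample_means(5)    # samples of 5 members
large = spread_of_sample_means(100)  # samples of 100 members
print(f"spread with n = 5:   {small:.2f}")
print(f"spread with n = 100: {large:.2f}")
```

The means of the samples of 100 cluster much more tightly than the means of the samples of 5, so conclusions drawn from a larger sample are less likely to change if the sampling is repeated.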

Worked Example

Mike is a biologist studying mice in an open enclosure. He has access to approximately 540 field mice and 260 harvest mice. Mike wants to sample 10 mice and he wants the proportions of the two types of mice in his sample to reflect their respective proportions of the population.

Calculate the number of field mice and harvest mice that Mike should include in his sample.

[Image: worked solution for part (a)]
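
One possible working for this part can be checked in Python. It assumes the proportional formula from the stratified sampling section is applied to each type of mouse and that the results are rounded to whole mice. The variable names are illustrative only.

```python
field, harvest = 540, 260   # approximate numbers of each type of mouse
n = 10                      # size of the sample
N = field + harvest         # size of the population: 800

field_share = n * field / N      # 10/800 x 540 = 6.75
harvest_share = n * harvest / N  # 10/800 x 260 = 3.25

field_in_sample = round(field_share)
harvest_in_sample = round(harvest_share)
print(field_in_sample, harvest_in_sample)  # 7 3
```

The rounded values add up to the required sample of 10 mice.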

Given that Mike does not have a list of all mice in the enclosure, state the name of this sampling method.

[Image: worked solution for part (b)]

Suggest one way in which Mike could improve his sampling method.

[Image: worked solution for part (c)]

Reliability of Data

How can I decide if data is reliable?

Data from a sample is reliable if similar results would be obtained from a different sample from the same population
The sample should be representative of the population
The sample should be big enough
- Sampling a small proportion of a population is unlikely to be reliable

What can cause data to be unreliable?

If the sample is biased
- It is not random
If errors are made when collecting data
- Numbers could be recorded incorrectly, duplicated or missed out
If the person collecting the data favours some members over others
- They might seek out members who will lead to a desired outcome
- They might exclude members if they would cause the sample to oppose the desired outcome
If a significant proportion of data is missing
- Some data may be unavailable
- Some members might decide not to be part of the sample
  - This will mean the results are not necessarily representative of the population
