Outliers
What are outliers?
- Outliers are extreme data values that do not fit with the rest of the data
- They are either a lot bigger or a lot smaller than the rest of the data
- Outliers are usually defined as values that are more than 1.5 × IQR below the lower quartile (Q1) or above the upper quartile (Q3)
- The interquartile range is IQR = Q3 - Q1
- x is an outlier if x < Q1 - 1.5 × IQR or x > Q3 + 1.5 × IQR
- Outliers can have a big effect on some statistical measures, such as the mean and the range, but much less effect on the median and the IQR
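The 1.5 × IQR rule above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names `outlier_fences` and `find_outliers` and the `sample` data are made up for this example. It assumes quartiles are found with `statistics.quantiles` and its default "exclusive" method; other quartile conventions can give slightly different boundaries.

```python
from statistics import quantiles


def outlier_fences(values):
    """Return the boundaries (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    # Assumption: the default 'exclusive' method is an acceptable quartile rule.
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return the values lying outside the two boundaries."""
    lower, upper = outlier_fences(values)
    return [x for x in values if x < lower or x > upper]


# Hypothetical data, chosen only to illustrate the rule.
sample = [3, 4, 5, 6, 7, 8, 40]
print(outlier_fences(sample))  # Q1 = 4, Q3 = 8, IQR = 4 -> (-2.0, 14.0)
print(find_outliers(sample))   # [40]
```

Here 40 is flagged because it is more than 1.5 × 4 = 6 above the upper quartile of 8.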
Should I remove outliers?
- The decision to remove outliers will depend on the context
- Outliers should be removed if they are found to be errors
- The data may have been recorded incorrectly
- For example: The number 17 may have been recorded as 71 by mistake
- Outliers should not be removed if they are a valid part of the sample
- The value may need to be checked to verify that it is not an error
- For example: The annual salaries of employees of a business might appear to have an outlier, but this could be the director’s salary, which is a genuine value
Worked Example
The ages, in years, of a number of children attending a birthday party are given below.
2, 7, 5, 4, 8, 4, 6, 5, 5, 29, 2, 5, 13
a) Identify any outliers within the data set.
b) Suggest which value(s) should be removed. Justify your answer.
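Part a) can be checked with a short Python sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the ordered data (leaving out the middle value when the number of values is odd). A different quartile convention may give slightly different boundaries, so treat this as one possible working rather than the only one.

```python
from statistics import median

ages = [2, 7, 5, 4, 8, 4, 6, 5, 5, 29, 2, 5, 13]

ordered = sorted(ages)          # 2, 2, 4, 4, 5, 5, 5, 5, 6, 7, 8, 13, 29
half = len(ordered) // 2        # 13 values, so each half has 6 values

# Assumption: quartiles are the medians of each half, excluding the middle value.
q1 = median(ordered[:half])     # median of 2, 2, 4, 4, 5, 5   -> 4.0
q3 = median(ordered[-half:])    # median of 5, 6, 7, 8, 13, 29 -> 7.5
iqr = q3 - q1                   # 3.5

lower_boundary = q1 - 1.5 * iqr  # 4 - 5.25   = -1.25
upper_boundary = q3 + 1.5 * iqr  # 7.5 + 5.25 = 12.75

outliers = [x for x in ordered if x < lower_boundary or x > upper_boundary]
print(outliers)                  # [13, 29]
```

Under this working, 13 and 29 are both outliers. For part b), one reasonable answer is that 29 should be removed, because it is unlikely to be the age of a child at the party and may be an adult's age or a recording error, whereas 13 is a plausible age for a child and should be kept.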