What to do with outliers
One of the common questions from students is whether they can "just get rid of outliers." The answer is: it's complicated.
Students should never simply discard data because they feel that it somehow compromises the significance of their study. Any time that outliers are eliminated from a data set, that choice needs to be justified. Below is an explanation of the two ways that eliminating outliers may be justified: a practical justification and a statistical justification.
Practical justifications for eliminating outliers
Data can be discarded when it is clear that there is a problem with the data that is not related to its value.
For example, if there is a person in the sample who clearly did not understand the directions and did the task incorrectly, this data would be removed from the sample. Another example is a response that is not clear. If participants are asked to estimate the speed two cars were traveling when they "smashed" into each other and a participant writes "between 100 and 120 km/hr", then this data would be discarded. It would not be appropriate to take the midpoint of the range (110) and include this in the calculations.
During the debriefing it may be clear that the participant had already done the experiment in a different setting, didn't understand the directions, or guessed the goal of the study, leading to demand characteristics. If that is the case, this data may also be excluded from the data set.
Any time data is removed from the data set, the reason that it has been excluded must be explained in the analysis of the report.
Statistical justifications for eliminating outliers
A second way to eliminate outliers is to determine if they mathematically "qualify" as outliers. To do this we use two simple formulas:
Low outliers: Q1 - (1.5 * IQR)
High outliers: Q3 + (1.5 * IQR)
IQR stands for the Interquartile range. The formula for IQR is Q3 - Q1.
This is actually easier than you may think.
Let's say that you asked a set of participants to memorize a list of 20 words while listening to music and while sitting in silence. In each group, there were 10 participants. In group 1, the participants recalled the following number of words:
3 | 3 | 4 | 6 | 7 | 8 | 8 | 8 | 12 | 13 |
The first step is to find the median. You always start with the lowest value and then count to the halfway point. In this data-set, there are 10 data points, so we will find the mid-point between data points #5 and #6. In this case, that is 7.5.
Once you find the median, you will see that this leaves us five scores below the median and five above. Next, find the median of each half. For the lower half (3, 3, 4, 6, 7), the middle score is 4; this is Quartile 1. For the upper half (8, 8, 8, 12, 13), the middle score is 8; this is Quartile 3.
The Interquartile range is the value of Q3 - Q1, which here is 8 - 4. Our IQR is 4.
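The steps so far can be sketched in Python. This is only an illustration of the median-of-halves method described above; the function name quartiles is made up for this example, and calculators or libraries that interpolate quartiles differently may give slightly different values.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]    # scores below the median position
    upper = ordered[-half:]   # scores above the median position
    return median(lower), median(ordered), median(upper)

# Group 1 scores from the example above
group_1 = [3, 3, 4, 6, 7, 8, 8, 8, 12, 13]
q1, med, q3 = quartiles(group_1)
iqr = q3 - q1
print(q1, med, q3, iqr)  # 4 7.5 8 4
```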
Once we have these values, we can decide if there are any outliers.
Remember that low outliers are defined by Q1 - (1.5 * IQR), which in this case is 4 - (1.5 * 4), or 4 - 6. In other words, in order for our low values to be considered outliers, they would have to be -2 or less. So, in this data set, there are no outliers on the low end. What about on the high end?
High outliers are defined by Q3 + (1.5 * IQR) or in this case 8 + (1.5 * 4) or 8 + 6. In other words, in order for our high values to be considered outliers, they would have to be 14 or greater. So, in this data set, there are also no outliers on the higher end.
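As a quick check of this arithmetic, here is a small Python sketch. The function name outlier_cutoffs is hypothetical, and the comparison uses "or less" and "or greater" to match the wording in this guide.

```python
def outlier_cutoffs(q1, q3, k=1.5):
    """Return the (low, high) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Q1 = 4 and Q3 = 8 come from the group 1 example above
low, high = outlier_cutoffs(4, 8)
print(low, high)  # -2.0 14.0

group_1 = [3, 3, 4, 6, 7, 8, 8, 8, 12, 13]
# Flag any score at or beyond a cutoff (this guide's convention)
flagged = [score for score in group_1 if score <= low or score >= high]
print(flagged)  # []
```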
So, now take a look at the second group's data. Is there anything you would exclude from this data set?
4 | 4 | 5 | 5 | 5 | 6 | 7 | 7 | 8 | 15 |
The median score is 5.5; the value for Q1 is 5 and the value for Q3 is 7. The IQR is 7 - 5 = 2.
This means that for the lower outliers, the value would have to be less than or equal to 5 - (1.5 * 2) = 2.
For the higher outliers, the value would have to be greater than or equal to 7 + (1.5 * 2) = 10.
So, in this data set, the researcher would be justified in eliminating the score of 15.
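The whole procedure for group 2 could be sketched as a single Python function. This is an illustration, not a required tool: the name find_outliers is invented here, quartiles are found as the medians of the lower and upper halves, and scores at or beyond a cutoff are flagged, as in the text.

```python
from statistics import median

def find_outliers(scores, k=1.5):
    """Return the scores at or beyond Q1 - k*IQR or Q3 + k*IQR."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [score for score in ordered if score <= low or score >= high]

# Group 2 scores from the example above
group_2 = [4, 4, 5, 5, 5, 6, 7, 7, 8, 15]
print(find_outliers(group_2))  # [15]
```

A flagged score still has to be reported: the write-up should state which score was removed and why.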
Warning: The importance of sample size
When planning the IA, if you think that the experiment may lead to outliers, then try to modify the experiment in such a way as to avoid this. For example, you could give participants a set of choices rather than having them estimate the speed of a car.
If, however, this is unavoidable or undesirable, then be sure to start with a large enough sample that your final sample still has at least 10 per condition in repeated measures or 20 total in an independent samples design, even if you have to discard some participants' data.
Why is this important? When running a Wilcoxon Signed Ranks test, if a participant receives the same score in both conditions, then the data is excluded from the sample when the statistic is calculated. If you are using an online calculator, you won't even know that this has happened. If the size of the sample falls below 10, it will not be possible to calculate a p-value and it will be necessary to gather more data.
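To see how many participants would actually count toward a Wilcoxon Signed Ranks calculation, you can count the non-zero differences yourself. The sketch below assumes a repeated measures design; the function name effective_sample_size and the scores are hypothetical and only illustrate the idea.

```python
def effective_sample_size(condition_a, condition_b):
    """Count participants whose two scores differ (ties are dropped)."""
    if len(condition_a) != len(condition_b):
        raise ValueError("each participant needs a score in both conditions")
    differences = [b - a for a, b in zip(condition_a, condition_b)]
    return sum(1 for d in differences if d != 0)

# Hypothetical scores for 10 participants (not real data)
music = [5, 7, 6, 8, 4, 9, 6, 7, 5, 8]
silence = [7, 7, 8, 9, 6, 9, 8, 10, 7, 9]

n = effective_sample_size(music, silence)
print(n)  # 8
```

In this made-up example, two participants scored the same in both conditions, so only 8 of the 10 pairs would be used and more data would be needed.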