4.5 Measures of dispersion
4.5.1 Calculating the range and interquartile range

To calculate the range, find the largest observed value of a variable (the maximum) and subtract the smallest observed value (the minimum). The range takes only these two values into account and ignores the data points between the two extremes of the distribution. It is useful as a supplement to other measures, but it is rarely used as the sole measure of dispersion because it is sensitive to extreme values.
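As a minimal sketch of this calculation, the following Python snippet computes the range of a list of values. The function name `value_range` and the sample values are hypothetical, chosen only to illustrate how a single extreme value affects the result.

```python
def value_range(values):
    """Return the range: largest observed value minus smallest observed value."""
    return max(values) - min(values)


# Hypothetical sample values (not taken from the text).
sample = [12, 15, 11, 14, 13]
print(value_range(sample))         # 15 - 11 = 4

# Adding one extreme value changes the range sharply.
print(value_range(sample + [60]))  # 60 - 11 = 49
```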

The interquartile range and semi-interquartile range give a better idea of the dispersion of data. To calculate these two measures, you need to know the values of the lower and upper quartiles. The lower quartile, or first quartile (Q1), is the value below which 25% of data points are found when they are arranged in increasing order. The upper quartile, or third quartile (Q3), is the value below which 75% of data points are found when they are arranged in increasing order. The median is considered the second quartile (Q2). The interquartile range is the difference between the upper and lower quartiles (Q3 - Q1). The semi-interquartile range is half the interquartile range.

When the data set is small, it is simple to identify the values of the quartiles. Let’s look at an example.

Example 1 – Range and interquartile range of a data set

Find the quartiles of this data set: 6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36.

You first need to arrange the data points in increasing order. As you do so, you can give them a rank to indicate their position in the data set. Rank 1 is the data point with the smallest value, rank 2 is the data point with the second-lowest value, etc.

Table 4.5.1.1
Rank of data points
Rank Value
1   6
2   7
3   15
4   36
5   39
6   41
7   41
8   43
9   43
10   47
11   49

Then you need to find the rank of the median to split the data set in two. As we have seen in the section on the median, if the number of data points is odd, the rank of the median is

(n + 1) ÷ 2 = (11 + 1) ÷ 2 = 6

The rank of the median is 6, which means there are five points on each side.

Next, split the lower half of the data (ranks 1 to 5) in two again to find the lower quartile. The lower quartile is the point of rank (5 + 1) ÷ 2 = 3, so Q1 = 15. The upper half (ranks 7 to 11) must also be split in two to find the upper quartile. Its rank is 6 + 3 = 9, so Q3 = 43.

Once you have the quartiles, you can easily measure the spread. The interquartile range is Q3 - Q1 = 43 - 15 = 28. The semi-interquartile range is 28 ÷ 2 = 14, and the range is 49 - 6 = 43.
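The steps of Example 1 can be sketched in Python as follows. This is an illustrative implementation of the median-split approach used in this section, not the only way to define quartiles. The function names `median_of_sorted` and `quartiles` are hypothetical.

```python
def median_of_sorted(sorted_values):
    """Median of a list that is already in increasing order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the median is the single middle value.
        return sorted_values[mid]
    # Even count: the median is the mean of the two middle values.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) by splitting the ordered data at the median."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # points below the median
    upper = data[len(data) - half:]   # points above the median
    return median_of_sorted(lower), median_of_sorted(data), median_of_sorted(upper)


data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3)                           # 15 41 43
print(iqr, iqr / 2, max(data) - min(data))  # 28 14.0 43
```

When the number of data points is odd, the median itself is left out of both halves, which matches the "five points on each side" split described above.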

For larger data sets, you can use the cumulative relative frequency distribution to help identify the quartiles or, even better, the basic statistics functions available in a spreadsheet or statistical software that give results more easily.
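As one example of such a built-in function, Python's standard library offers `statistics.quantiles`. The sketch below applies it to the data of Example 1. Note the assumption involved: this function uses an interpolation rule rather than the median-split method of this section. With its default `"exclusive"` method the two approaches agree for this data set, but they can give different quartiles for other data sets, so check which convention your software uses.

```python
from statistics import quantiles

data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]

# n=4 asks for the three cut points that divide the data into quarters.
# "exclusive" is the default method; it is named here only to make the
# convention explicit.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

print(q1, q2, q3)  # 15.0 41.0 43.0
print(q3 - q1)     # 28.0
```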

What happens when the data set includes a data point whose value is considered extreme compared to the rest of the distribution?

Example 2 – Range and interquartile range in presence of an extreme value

Find the range and interquartile range of the data set of example 1, to which a data point of value 75 was added.

The range would now be 69 (75 - 6). With 12 data points, the median is the mean of the values of the data point of rank 12 ÷ 2 = 6 and the data point of rank (12 ÷ 2) + 1 = 7, which is (41 + 41) ÷ 2 = 41. Because it falls between ranks 6 and 7, there are six data points on each side of the median. The lower quartile is the mean of the values of the data point of rank 6 ÷ 2 = 3 and the data point of rank (6 ÷ 2) + 1 = 4, which is (15 + 36) ÷ 2 = 25.5. The upper quartile is the mean of the values of the data point of rank 6 + 3 = 9 and the data point of rank 6 + 4 = 10, which is (43 + 47) ÷ 2 = 45. The interquartile range is 45 - 25.5 = 19.5.

In summary, the range went from 43 to 69, an increase of 26 compared to example 1, just because of a single extreme value. The more robust interquartile range went from 28 to 19.5, a decrease of only 8.5.
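The comparison between the two examples can be checked with a short Python sketch. The helper name `spread` is hypothetical. It uses the same median-split approach as this section, relying on `statistics.median` from the standard library for each half.

```python
from statistics import median


def spread(values):
    """Return (range, interquartile range) using the median-split method."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])    # median of the lower half
    q3 = median(data[-half:])   # median of the upper half
    return data[-1] - data[0], q3 - q1


example_1 = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
example_2 = example_1 + [75]   # same data with the extreme value added

range_1, iqr_1 = spread(example_1)   # 43 and 28
range_2, iqr_2 = spread(example_2)   # 69 and 19.5

print(range_2 - range_1)  # 26
print(iqr_2 - iqr_1)      # -8.5
```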

The second example demonstrated that the interquartile range is more robust than the range when the data set includes a value considered extreme. It’s not a perfect measure, though. In this example, we might have expected that when adding an extreme value, the measure of dispersion would increase, but the opposite happened because there was a great difference between the values of data points of ranks 3 and 4.

The five-value series formed by the minimum, the three quartiles and the maximum is often referred to as “the five-number summary.” It is a well-known way of summarizing data sets. In the following section on box and whisker plots, we will see a useful method to visualize this five-number summary.
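A five-number summary for the data of Example 1 could be computed as in the sketch below. The function name `five_number_summary` is hypothetical, and the quartiles follow the median-split method used in this section. Other software may apply a different quartile convention.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of values."""
    data = sorted(values)
    half = len(data) // 2
    return (
        data[0],               # minimum
        median(data[:half]),   # lower quartile (Q1)
        median(data),          # median (Q2)
        median(data[-half:]),  # upper quartile (Q3)
        data[-1],              # maximum
    )


data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]
print(five_number_summary(data))  # (6, 15, 41, 43, 49)
```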
