The range is easy to calculate: it is the difference between the largest and the smallest observed values in a data set. The range therefore describes the full spread of the data, outliers included.
A great deal of information is ignored when computing the range, since only the largest and smallest data values are considered.
The range value of a data set is greatly influenced by the presence of just one unusually large or small value (outlier).
The range can be expressed as an interval such as 4–10, where 4 is the lowest value and 10 is the highest. Often, it is expressed as the width of that interval. For example, the range of 4–10 can also be expressed as a range of 6. The latter convention will be used throughout this chapter.
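Both ways of expressing the range, and its sensitivity to a single outlier, can be illustrated with a minimal sketch. The function name data_range and the sample values are hypothetical; they were chosen only so that the result matches the 4–10 example above.

```python
def data_range(values):
    """Return (lowest, highest, width) for a non-empty collection of numbers."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest

# Hypothetical data chosen so the range matches the 4-10 example in the text.
scores = [4, 6, 7, 7, 8, 10]
low, high, width = data_range(scores)
print(f"interval: {low}-{high}, width: {width}")  # interval: 4-10, width: 6

# Adding one unusually large (hypothetical) value changes the range a great deal,
# even though the other six values are unchanged.
with_outlier = scores + [40]
print(data_range(with_outlier))  # (4, 40, 36)
```

Only the smallest and largest values enter the calculation, which is why the five values in between have no effect on the result.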
The disadvantage of using the range is that it does not measure the spread of the majority of values in a data set; it only measures the spread between the highest and lowest values. As a result, other measures are required to give a better picture of the data spread. The range is an informative supplement to other measures such as the standard deviation or the semi-quartile range, but it should rarely be used as the only measure of spread.
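The median divides the data into two equal sets. For more information on the median, refer to the chapter on Measures of central tendency. The lower quartile (Q1) is the middle value of the lower set, and the upper quartile (Q3) is the middle value of the upper set.
The median itself takes the notation Q2, the second quartile.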
The interquartile range is another measure of spread. It is the difference between the upper and lower quartiles (Q3–Q1) and indicates the dispersion of a data set. The interquartile range spans the middle 50% of a data set and largely eliminates the influence of outliers because, in effect, the highest and lowest quarters are removed.
Interquartile range = difference between upper quartile (Q3) and lower quartile (Q1)
A year ago, Angela began working at a computer store. Her supervisor asked her to keep a record of the number of sales she made each month.
The following data set is a list of her sales for the last 12 months:
34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37.
Use Angela's sales records to find:
These results can be summarized as follows:
Note: This example has an even number of observations. Once the data are sorted, the median, Q2, lies halfway between the two central observations (24 and 28). The calculation of Q1 therefore includes the observation 24, as it is below the value of Q2. Similarly, 28 is included in the calculation of Q3, as it is above the value of Q2.
Consider an odd number of observations such as 1, 2, 3, 4, 5, 6, 7. Here the value of Q2 is 4. As the median falls exactly on the fourth observation, this value is not included in calculating Q1 and Q3, as we are interested only in the data below and above Q2. In this example, Q1 = 2 and Q3 = 6.
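The two rules above (include both central observations in the halves when the count is even, leave the median out when the count is odd) can be sketched in a few lines. This is only an illustration under those rules: the function name quartiles is hypothetical, at least two observations are assumed, and other software may use a different quartile convention and return slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule described above.

    Assumes at least two observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Odd count: skip the middle observation. Even count: split cleanly in two.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

# Angela's monthly sales (even number of observations).
sales = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
q1, q2, q3 = quartiles(sales)
print("range:", max(sales) - min(sales))  # range: 56
print("Q1, Q2, Q3:", q1, q2, q3)          # Q1, Q2, Q3: 17.0 26.0 42.0
print("interquartile range:", q3 - q1)    # interquartile range: 25.0

# Odd number of observations: the median (4) is left out of both halves.
print(quartiles([1, 2, 3, 4, 5, 6, 7]))   # (2, 4, 6)
```

For Angela's data, the sorted lower half is 1, 11, 15, 19, 20, 24 and the upper half is 28, 34, 37, 47, 50, 57, so Q1 and Q3 are the medians of those two sets.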
The semi-quartile range is another measure of spread. It is calculated as one half the difference between the 75th percentile (often called Q3) and the 25th percentile (Q1). The formula for semi-quartile range is:
(Q3–Q1) ÷ 2.
Since half the values in a distribution lie between Q1 and Q3, the semi-quartile range is one-half the distance needed to cover half the values. In a symmetric distribution, an interval stretching from one semi-quartile range below the median to one semi-quartile range above the median will contain one-half of the values. However, this will not be true for a skewed distribution.
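The point about symmetry can be checked with a short sketch. The function names are hypothetical. The first set of quartiles comes from the data 1 to 7 in the note above; the second set (17, 26, 42) is what the median-of-halves rule gives for Angela's sales and is an assumption insofar as another quartile convention would give other values.

```python
def semi_quartile_range(q1, q3):
    """One half of the distance between the upper and lower quartiles."""
    return (q3 - q1) / 2

def interval_matches_quartiles(q1, q2, q3):
    """True if median +/- one semi-quartile range lands exactly on Q1 and Q3."""
    sqr = semi_quartile_range(q1, q3)
    return (q2 - sqr, q2 + sqr) == (q1, q3)

symmetric = (2, 4, 6)   # quartiles of 1, 2, 3, 4, 5, 6, 7
angela = (17, 26, 42)   # quartiles of Angela's sales (median-of-halves rule)

print(semi_quartile_range(2, 6))               # 2.0
print(interval_matches_quartiles(*symmetric))  # True
print(semi_quartile_range(17, 42))             # 12.5
print(interval_matches_quartiles(*angela))     # False
```

For the symmetric data, the interval from 4 - 2 to 4 + 2 coincides with Q1 and Q3, so it covers the middle half of the values. For Angela's sales, the interval from 13.5 to 38.5 does not line up with Q1 = 17 and Q3 = 42, because the quartiles are not equally far from the median.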
The semi-quartile range is hardly affected by extreme values, so it is a good measure of spread to use for skewed distributions. It is rarely used for data sets that have normal distributions; for those, the standard deviation is used instead.