Cumulative frequency is used to determine the number of observations that lie above (or below) a particular value in a data set. The cumulative frequency is calculated using a frequency distribution table, which can be constructed from stem and leaf plots or directly from the data.
The cumulative frequency is calculated by adding each frequency from a frequency distribution table to the sum of its predecessors. The last value will always be equal to the total for all observations, since all frequencies will already have been added to the previous total.
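The running-total rule can be sketched in a few lines of Python. This is only an illustration: the frequencies are sample values and the variable names are assumptions of this example, not part of the original text.

```python
from itertools import accumulate

# Illustrative frequencies for seven class intervals (sample values).
frequencies = [1, 2, 3, 5, 6, 9, 4]

# Each cumulative frequency is the current frequency plus the sum
# of all frequencies that come before it.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [1, 3, 6, 11, 17, 26, 30]

# The last cumulative frequency equals the total number of observations.
print(cumulative[-1] == sum(frequencies))  # True
```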
Variables in any calculation can be characterized by the values assigned to them. A discrete variable consists of separate, indivisible categories. No values can exist between one value and its neighbour. For example, if you were to observe class attendance recorded from day to day, you might discover that the class has 29 students on one day and 30 students on another. However, it is impossible for student attendance to be between 29 and 30. (There is simply no room to observe any values between these two values, as there is no way of having 29 and a half students.)
Not all variables are characterized as discrete. Some variables (such as time, height and weight) are not limited to a fixed set of indivisible categories. These variables are called continuous variables, and they are divisible into an infinite number of possible values. For example, time can be measured in fractional parts of hours, minutes, seconds and milliseconds. So, instead of finishing a race in 11 or 12 minutes, a jockey and his horse can cross the finish line at 11 minutes and 43 seconds.
It is essential to know the difference between the two types of variables in order to properly calculate their cumulative frequency.
The total rock climber count of Lake Louise, Alberta was recorded over a 30-day period. The results are as follows:
31, 49, 19, 62, 24, 45, 23, 51, 55, 60, 40, 35, 54, 26, 57, 37, 43, 65, 18, 41, 50, 56, 4, 54, 39, 52, 35, 51, 63, 42.
Each interval can be located in the Stem column. The numbers within this column represent the leading (tens) digit of the values in the class interval. (For example, Stem 0 represents the interval 0–9, Stem 1 represents the interval 10–19, and so forth.)
The Leaf column lists the final digit of each observation that lies within the class interval. For example, in Stem 2 (interval 20–29), the three observations, 23, 24, and 26, are represented as 3, 4 and 6.
The Frequency column lists the number of observations found within a class interval. For example, in Stem 5, nine leaves (or observations) were found; in Stem 1, there are only two.
Use the Frequency column to calculate cumulative frequency.
The Upper value column lists the observation with the highest value in each of the class intervals. For example, in Stem 1, the two leaves 8 and 9 represent the observations 18 and 19. The upper value of these two observations is 19.
Stem | Leaf | Frequency (f) | Upper value | Cumulative frequency |
---|---|---|---|---|
0 | 4 | 1 | 4 | 1 |
1 | 8 9 | 2 | 19 | 1 + 2 = 3 |
2 | 3 4 6 | 3 | 26 | 3 + 3 = 6 |
3 | 1 5 5 7 9 | 5 | 39 | 6 + 5 = 11 |
4 | 0 1 2 3 5 9 | 6 | 49 | 11 + 6 = 17 |
5 | 0 1 1 2 4 4 5 6 7 | 9 | 57 | 17 + 9 = 26 |
6 | 0 2 3 5 | 4 | 65 | 26 + 4 = 30 |
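As a rough sketch, the table above can be rebuilt from the raw rock climber counts in Python. The names used here (`counts`, `by_stem`, `rows`) are hypothetical, and grouping by the tens digit is one assumed way of forming the stems.

```python
from collections import defaultdict

# Daily rock climber counts over the 30-day period.
counts = [31, 49, 19, 62, 24, 45, 23, 51, 55, 60, 40, 35, 54, 26, 57,
          37, 43, 65, 18, 41, 50, 56, 4, 54, 39, 52, 35, 51, 63, 42]

# Group the observations by stem (the tens digit).
by_stem = defaultdict(list)
for value in counts:
    by_stem[value // 10].append(value)

# Build one row per stem: leaves, frequency, upper value, cumulative frequency.
rows = []
running_total = 0
for stem in sorted(by_stem):
    values = sorted(by_stem[stem])
    leaves = [v % 10 for v in values]   # final digit of each observation
    frequency = len(values)
    running_total += frequency          # add this frequency to its predecessors
    rows.append((stem, leaves, frequency, max(values), running_total))

for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of the table, in the order stem, leaves, frequency, upper value and cumulative frequency.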
Always label the graph with the cumulative frequency—corresponding to the number of observations made—on the vertical axis. Label the horizontal axis with the other variable (in this case, the total rock climber counts) as shown below:
The following information can be gained from either graph or table:
When a continuous variable is used, both calculating the cumulative frequency and plotting the graph require a slightly different approach from that used for a discrete variable.
For 25 days, the snow depth at Whistler Mountain, B.C. was measured (to the nearest centimetre) and recorded as follows:
242, 228, 217, 209, 253, 239, 266, 242, 251, 240, 223, 219, 246, 260, 258, 225, 234, 230, 249, 245, 254, 243, 235, 231, 257.
In the Snow depth column, each 10-cm class interval from 200 cm to 270 cm is listed.
The Frequency column records the number of observations that fall within a particular interval. This column represents the observations in the Tally column, only in numerical form.
The Endpoint column functions much like the Upper value column of Exercise 1, with the exception that the endpoint is the upper boundary of the interval, regardless of the actual value of each observation. For example, in the class interval of 210 to < 220, the actual values of the two observations are 217 and 219. But, instead of using 219, the endpoint of 220 is used.
The Cumulative frequency column lists the running total: each frequency added to the cumulative frequency of the interval before it.
Snow depth (x) | Tally | Frequency (f) | Endpoint | Cumulative frequency |
---|---|---|---|---|
200 | | 0 | | |
200 to < 210 | | 1 | 210 | 1 |
210 to < 220 | | 2 | 220 | 3 |
220 to < 230 | | 3 | 230 | 6 |
230 to < 240 | | 5 | 240 | 11 |
240 to < 250 | | 7 | 250 | 18 |
250 to < 260 | | 5 | 260 | 23 |
260 to < 270 | | 2 | 270 | 25 |
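The endpoint and cumulative frequency columns can be sketched in Python as follows. The constants `LOW`, `HIGH` and `WIDTH` and the list name `points` are assumptions of this example, and starting the list at the point (200, 0) is a choice made here so that the pairs can be used directly as plotting coordinates.

```python
# Snow depth readings (cm) over the 25-day period.
depths = [242, 228, 217, 209, 253, 239, 266, 242, 251, 240, 223, 219, 246,
          260, 258, 225, 234, 230, 249, 245, 254, 243, 235, 231, 257]

LOW, HIGH, WIDTH = 200, 270, 10  # 10-cm class intervals from 200 cm to 270 cm

# (endpoint, cumulative frequency) pairs, starting with zero observations at 200 cm.
points = [(LOW, 0)]
running_total = 0
for lower in range(LOW, HIGH, WIDTH):
    upper = lower + WIDTH
    # Count the observations in the interval "lower to < upper".
    frequency = sum(1 for d in depths if lower <= d < upper)
    running_total += frequency
    # Use the interval endpoint, not the largest observed value.
    points.append((upper, running_total))

for endpoint, cumulative in points:
    print(endpoint, cumulative)
```

Plotting each endpoint against its cumulative frequency gives the points of the cumulative frequency graph for this continuous variable.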
Remember, the cumulative frequency (number of observations made) is labelled on the vertical y-axis and any other variable (snow depth) is labelled on the horizontal x-axis as shown in Figure 2.
The following information can be gained from either graph or table:
Another calculation that can be obtained using a frequency distribution table is the relative frequency distribution, defined as the proportion (usually expressed as a percentage) of observations falling in each class interval. Relative frequency can be found by dividing the frequency of each interval by the total number of observations. (For more information, see Frequency distribution in the chapter entitled Organizing data.)
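As an illustration, the relative frequencies for the snow depth table can be computed with a short Python sketch. The dictionary layout and variable names are assumptions of this example.

```python
# Frequencies taken from the snow depth table (interval label -> frequency).
frequencies = {
    "200 to < 210": 1,
    "210 to < 220": 2,
    "220 to < 230": 3,
    "230 to < 240": 5,
    "240 to < 250": 7,
    "250 to < 260": 5,
    "260 to < 270": 2,
}

total = sum(frequencies.values())  # 25 observations in all

# Relative frequency as a percentage: frequency divided by the total, times 100.
relative = {interval: 100 * f / total for interval, f in frequencies.items()}

for interval, percent in relative.items():
    print(f"{interval}: {percent:.0f}%")
```

For example, the interval 240 to < 250 holds 7 of the 25 observations, or 28%.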
A frequency distribution table can also be used to calculate cumulative percentage. This measure expresses the cumulative frequency of each interval, rather than just its frequency, as a percentage of the total number of observations: divide the cumulative frequency by the total number of observations and multiply by 100.
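A minimal Python sketch of the cumulative percentage, using the frequencies from the snow depth table, might look like this. The variable names are assumptions of this example.

```python
from itertools import accumulate

# Frequencies and endpoints from the snow depth table.
endpoints = [210, 220, 230, 240, 250, 260, 270]
frequencies = [1, 2, 3, 5, 7, 5, 2]

total = sum(frequencies)                    # 25 observations
cumulative = list(accumulate(frequencies))  # running totals

# Cumulative percentage: cumulative frequency as a share of all observations.
cumulative_percent = [100 * c / total for c in cumulative]

for endpoint, percent in zip(endpoints, cumulative_percent):
    print(f"below {endpoint} cm: {percent:.0f}%")
```

The final cumulative percentage is always 100, because the last cumulative frequency equals the total number of observations.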