Analytical Studies: Methods and References
The 2001 Canadian Census–Tax–Mortality Cohort: A 10-Year Follow-up

by Lauren Pinault, Philippe Finès, Michael Tjepkema
Health Analysis Division, Statistics Canada
Félix Labrecque-Synnott, Abdelnasser Saidi
Household Survey Methods Division, Statistics Canada

Release date: October 26, 2016

The authors wish to acknowledge the contributions of Saeeda Khan and Radivoje Bradic, who assisted with manual validation of the cohort, and Jacques Dubois and Barry Zaid, who assisted in accessing census microfiche files for the manual validation.


Large national mortality cohorts are used to estimate mortality rates for different socioeconomic and population groups, and to conduct research on environmental health. In 2008, Statistics Canada created a cohort linking the 1991 Census to mortality. The present study describes a linkage of the 2001 Census long-form questionnaire respondents aged 19 years and older to the T1 Personal Master File and the Amalgamated Mortality Database. The linkage tracks all deaths over a 10.6-year period (until the end of 2011, to date). Mortality statistics were calculated for this cohort, and results were generally as expected, based on Canada life tables from the mid-point of the follow-up period (2005 to 2007) and the 1991 Census cohort. This paper is meant to inform users about how representative the cohort is, given the steps involved in the linkage (i.e., the linkage to T1 tax files).

Keywords: Census, cohort, mortality, Amalgamated Mortality Database, data linkage, age-standardized mortality rate, survival

1. Introduction

Large, nationally representative health cohorts have been used in various countries to estimate mortality outcomes based on differences in socioeconomic status (e.g., Blakely et al. 2002; Stringhini et al. 2012; Lazzarino et al. 2013; Mishra et al. 2013; Cho et al. 2016). They have generally identified higher mortality rates among persons of lower socioeconomic status (e.g., with lower educational attainment or in lower-status occupations).

In Canada, a broadly nationally representative mortality cohort was created that linked 1991 Census long-form questionnaire respondents to the Canadian Mortality Database (CMDB) and T1 tax files. This cohort was then used to estimate mortality rates in Canada for the period from 1991 to 2001 (Wilkins et al. 2008), although follow-up for mortality has since been extended to the end of 2011. The cohort has also been spatially linked to environmental datasets and used in several environmental health studies (e.g., Crouse et al. 2012; Crouse et al. 2015; Weichenthal et al. 2016). It is also referred to in the literature as the Canadian Census Health and Environment Cohort.

Although the original cohort provides a large, broadly representative sample for epidemiological studies, it includes only persons who were aged 25 years or older at baseline (in 1991). As a result, the cohort population has been aging, with the youngest cohort respondents aged 45 by the end of the follow-up period in 2011. Newer environmental datasets are being created that require a more recent baseline population for exposure assessment purposes. A new cohort based on a more recent census year would align better with the vintage of these environmental datasets. The 2001 Census was selected because it is more recent than the 1991 Census, but also allows for some years of mortality follow-up.

The purpose of this paper is to describe the census–tax–mortality cohort that linked 3.5 million respondents to the 2001 Census long-form questionnaire with the T1 Personal Master File, which includes social insurance numbers (SINs). The SINs were used to link the 2001 Census cohort members to the Amalgamated Mortality Database (AMDB), and cohort members were followed for mortality until December 31, 2011 (347,000 deaths). The linkage methodology was similar to that used for the 1991 Census cohort (Statistics Canada 2015b), with some differences noted below. The following results provide an overview of any known differences between the cohort and the general population, and the general patterns of mortality in the dataset across socioeconomic and demographic groups.

2. Data

The 2001 Canadian Census of Population was carried out on May 15, 2001, and enumerated the whole Canadian population, including citizens, landed immigrants and non-permanent residents. In comparison with the basic census short-form questionnaire (Form 2A) that is distributed to most households, the census long-form questionnaire (Forms 2B and 2D) collects a wider range of socioeconomic and demographic data and is distributed to approximately 20% of Canadian households. In remote areas and enumerated Indian reserves, the long-form questionnaire is distributed to all households (approximately 2% of the Canadian population). The gross undercoverage rate (i.e., all persons in the household who were missed) estimated for the 2001 Census was 3.95% (or approximately 1,222,300 individuals). Undercoverage was higher in younger adults (aged 20 to 34), divorced or never-married persons, and persons who had a mother tongue other than English or French (Statistics Canada 2003, 2014).

The T1 tax files (from the T1 Personal Master File) for 2000 and 2001 were used as an intermediate step to link census respondents to mortality records through their SINs. T1 tax files were provided by the Canada Revenue Agency and preprocessed for linkage by Statistics Canada. These files include the SIN, a temporary taxation number and, where applicable, a dependant identification number. The T1 tax files were also used to provide a history of taxfiler postal codes from 1981 to 2011, which allows researchers to consider changes of residence over time.

Mortality data were derived from the AMDB, version 2014b (Mayer and Charbonneau 2015). The AMDB is the result of a prior linkage between the CMDB and tax files (including the T1 file described above, T4 income tax files, and files for the labour-sponsored funds tax credit and the child tax benefit). The CMDB is an amalgamation of provincial and territorial death registries and includes hospital data on deaths that occurred between 1950 and 2011 (as of the present). It also includes data on the underlying cause of death and autopsy confirmation of deaths, which are required for cause-specific mortality analyses. With some exceptions, the CMDB does not include deaths that occurred outside of Canada, nor deaths that were not confirmed by a medical professional (as is sometimes the case in remote areas). The AMDB is a linked dataset of all deaths from both the CMDB and tax files, and includes the SIN (from T1 tax files) and a consolidated date of death that prioritizes the date recorded in the CMDB over that in the tax files.

3. Methods

3.1 Linkage

The Executive Management Board at Statistics Canada approved the record linkage in 2015 (Record Linkage no. 045-2015). The first step of the linkage project was to link census respondents to T1 tax files to attach a SIN. In the second step, SINs were used to link records to deaths registered in the AMDB.

The first step of the linkage project used deterministic linkage groups, followed by probabilistic linkage groups. An initial linkage was made between all respondents to the 2001 Census (both long-form and short-form questionnaire respondents) and T1 tax records for 2000 and 2001, through three deterministic linkage groups. In the first deterministic linkage group, the linkage keys between the two datasets were sex, date of birth, 2001 postal code and marital status. The second linkage group was the same as the first, but did not consider marital status. The third group used sex, date of birth and 2000 postal code. In each deterministic linkage group, only records with unique keys on both the tax and census files were retained in the cohort. Duplicates and non-matches were considered in subsequent linkage groups. Census short-form questionnaire respondents were included at this stage to reduce the number of records to be considered for probabilistic linkage and to reduce the number of false-positive matches.
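
The unique-key rule and the sequential handling of the three deterministic groups can be sketched as follows. This is a minimal illustration under stated assumptions, not the production linkage code: the record layout, the field names (`postal`, `postal_2001`, `postal_2000`, `marital`) and the sample records are all hypothetical.

```python
from collections import Counter

# Each linkage group pairs census key fields with the tax key fields they are
# compared with. Field names are hypothetical.
GROUPS = [
    (("sex", "birth_date", "postal", "marital"), ("sex", "birth_date", "postal_2001", "marital")),
    (("sex", "birth_date", "postal"), ("sex", "birth_date", "postal_2001")),
    (("sex", "birth_date", "postal"), ("sex", "birth_date", "postal_2000")),
]


def make_key(record, fields):
    return tuple(record[f] for f in fields)


def deterministic_pass(census, tax, census_fields, tax_fields):
    """Link records whose key is unique on both files; return (links, unlinked census)."""
    census_counts = Counter(make_key(r, census_fields) for r in census)
    tax_counts = Counter(make_key(r, tax_fields) for r in tax)
    unique_tax = {
        make_key(r, tax_fields): r["id"]
        for r in tax
        if tax_counts[make_key(r, tax_fields)] == 1
    }
    links, unlinked = [], []
    for rec in census:
        key = make_key(rec, census_fields)
        if census_counts[key] == 1 and key in unique_tax:
            links.append((rec["id"], unique_tax[key]))
        else:
            unlinked.append(rec)  # duplicates and non-matches move to the next group
    return links, unlinked


def link_sequentially(census, tax, groups=GROUPS):
    all_links = []
    for census_fields, tax_fields in groups:
        links, census = deterministic_pass(census, tax, census_fields, tax_fields)
        linked_tax_ids = {tax_id for _, tax_id in links}
        tax = [r for r in tax if r["id"] not in linked_tax_ids]
        all_links.extend(links)
    return all_links, census


def person(rec_id, sex, birth_date, marital, **postal):
    return {"id": rec_id, "sex": sex, "birth_date": birth_date, "marital": marital, **postal}


# Hypothetical records, for illustration only.
CENSUS = [
    person("c1", "F", "1960-01-02", "married", postal="K1A0B1"),
    person("c2", "M", "1975-05-06", "single", postal="H2X1Y4"),
    person("c3", "M", "1980-07-08", "single", postal="V6B1A1"),
    person("c4", "M", "1980-07-08", "single", postal="V6B1A1"),  # same key as c3
    person("c5", "F", "1950-03-04", "widowed", postal="M5V2T6"),
]
TAX = [
    person("t1", "F", "1960-01-02", "married", postal_2001="K1A0B1", postal_2000="K1A0B1"),
    person("t2", "M", "1975-05-06", "married", postal_2001="H2X1Y4", postal_2000="H2X1Y4"),
    person("t3", "M", "1980-07-08", "single", postal_2001="V6B1A1", postal_2000="V6B1A1"),
    person("t5", "F", "1950-03-04", "widowed", postal_2001="M4C1B5", postal_2000="M5V2T6"),
]

links, leftover = link_sequentially(CENSUS, TAX)
print(links)
print([r["id"] for r in leftover])
```

In this toy example, c1 links in the first group, c2 links in the second group (marital status differs on the tax record), and c5 links in the third group through the 2000 postal code. Records c3 and c4 share a key, so neither is linked deterministically; in the paper's workflow, such records would be passed on to probabilistic linkage.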

For all census long-form questionnaire respondents who were not linked in the deterministic linkage groups, records were then considered in several groups, ordered from highest to lowest linkage weight, using a probabilistic linkage methodology based on the Fellegi-Sunter theory of record linkage (Fellegi and Sunter 1969). The following variables were used for probabilistic linkage: birth date, spouse’s birth date (if applicable), sex, marital status, postal code, and rural or urban status based on postal code. A total of eight probabilistic linkage groups were considered sequentially, based on the ranges of linkage weights. To determine the linkage threshold and the groups that would be included in the cohort, linkage accuracy was estimated for all potential linkage groups by verifying a random sample of linked records against the original scanned census questionnaires. Records for which the name and birth date matched (allowing for minor spelling differences) were considered successful matches. Groups in which fewer than 90% of the verified records were successfully matched were not included in the cohort. As a result, six groups were included in the final probabilistic linkage.
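
To illustrate how Fellegi-Sunter linkage weights are built up from field-level agreement, the sketch below sums a log-likelihood ratio for each comparison field. The m- and u-probabilities are hypothetical values chosen for illustration; the paper does not report the values used for the cohort, and the function names are assumptions.

```python
import math

# Hypothetical probabilities, for illustration only:
# m = P(field agrees | same person), u = P(field agrees | different persons).
M_U = {
    "birth_date": (0.98, 0.001),
    "spouse_birth_date": (0.95, 0.001),
    "sex": (0.99, 0.50),
    "marital_status": (0.90, 0.30),
    "postal_code": (0.85, 0.001),
    "rural_urban": (0.97, 0.60),
}


def field_weight(m, u, agrees):
    """Log2 likelihood ratio contributed by one comparison field."""
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))


def pair_weight(agreement):
    """Total linkage weight for a candidate pair; `agreement` maps field -> bool."""
    return sum(field_weight(*M_U[f], agrees) for f, agrees in agreement.items())


all_agree = {f: True for f in M_U}
moved = dict(all_agree, postal_code=False, rural_urban=False)
print(round(pair_weight(all_agree), 1), round(pair_weight(moved), 1))
```

Candidate pairs would then be sorted into groups by ranges of total weight, with each group accepted or rejected according to the manually verified match rate, as described above.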

The initial cohort was formed by combining the deterministic links of respondents who answered the long-form questionnaire and the probabilistic links. At this point, persons residing in institutional collective dwellings were excluded, because the series of questions on socioeconomic and demographic characteristics was not asked. The majority of cohort members (92.6%) were linked by deterministic linkage, for which 99.9% of records were considered true links (n=169 manual record reviews). Fewer cohort members (7.4%) were linked in the first six probabilistic linkage groups, for which 98.9% of links were estimated to be true links (n=478 manual record reviews). The final analytical size of the cohort was 3,537,490, and 99.8% of the records were estimated to be true links (i.e., the false-positive error rate was less than 0.2%).
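
As a consistency check, the overall true-link estimate can be reproduced as a weighted average of the two linkage types, using only the percentages reported above. Whether the authors computed it in exactly this way is an assumption.

```python
# Shares of the cohort and estimated true-link rates, as reported in the text.
det_share, det_true = 0.926, 0.999    # deterministic linkage
prob_share, prob_true = 0.074, 0.989  # probabilistic linkage

overall_true = det_share * det_true + prob_share * prob_true
print(f"Estimated true links: {overall_true:.1%}")
print(f"Implied false-positive rate: {1 - overall_true:.2%}")
```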

From the original 6,448,980 respondents to the 2001 Census long-form questionnaire (Form 2B or 2D), a total of 4,500,245 persons were considered to be “in scope.” The in-scope population excluded persons who lived in institutions, persons who resided overseas, and persons who were younger than 19 years old on Census Day. Respondents of younger ages were excluded because of a much lower linkage rate to T1 tax records (data not shown). Sample sizes of all exclusions are provided in a flowchart in Figure 1.

Figure 1 Steps in the derivation of the cohort, from the census respondents to the in-scope population and the linked (cohort) population

Description for Figure 1

This flowchart shows the five steps used to establish the number of persons in the linked (cohort) population.
Step 1, respondents to the 2001 Census long-form questionnaire: 6,448,980 persons responded to the questionnaire.
Step 2, non-institutional residents: institutional residents (368,060) are excluded, leaving 6,080,920 persons.
Step 3, living in Canada: persons living overseas (5,275) are excluded, leaving 6,075,645 persons.
Step 4, aged 19 years or older: persons younger than 19 years (1,575,400) are excluded, leaving 4,500,245 persons.
Step 5, linked to cohort: persons not linked to the T1 tax file (962,725) are excluded, leaving 3,537,520 of the original 6,448,980 respondents linked to the T1 tax file.

The authors calculated these figures based on the 2001 Census of Population and the 2014 Amalgamated Mortality Database.
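
The arithmetic of the flowchart can be checked with a few lines of code. The sketch uses only the counts given in the description of Figure 1; the variable names are illustrative.

```python
respondents = 6_448_980
exclusions = [
    ("institutional residents", 368_060),
    ("living overseas", 5_275),
    ("younger than 19 years", 1_575_400),
    ("not linked to a T1 tax file", 962_725),
]

remaining = respondents
after_each_step = []
for label, excluded in exclusions:
    remaining -= excluded
    after_each_step.append(remaining)
    print(f"after excluding {label}: {remaining:,}")

in_scope = after_each_step[2]  # aged 19 or older, living in Canada, non-institutional
cohort = after_each_step[3]    # linked to a T1 tax file
linkage_rate = cohort / in_scope
print(f"linkage rate among in-scope respondents: {linkage_rate:.1%}")
```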

Of the in-scope 2001 Census respondents, 78.6% (n=3,537,520, or 15.1% of the population aged 19 or older) were linked to the cohort (i.e., linked to a T1 tax file). This percentage was similar to that of the 1991 Census cohort, where 80.0% of in-scope census respondents were linked to T1 tax files (before a random sample was removed to reach a 15% sample of the Canadian population) (Statistics Canada 2015b). To examine any differences between the linked cohort and the in-scope population, the number of respondents with selected socioeconomic and demographic characteristics was calculated for each group and compared.

The second step of the linkage was to link the cohort to the AMDB using SINs, which had first been added to the cohort through the linkage to T1 records. The linkage of T1 tax files to the cohort, using SINs, also provided historical postal codes compiled by the Canada Revenue Agency. These additional data allow for improved environmental exposure assessment by taking into account changes in residence over time (from 1981 to 2011).

3.2 Analytical file

Analytical files included all of the variables collected in the long-form census questionnaire, including content on employment, education, income, occupation, immigrant status, community size, Aboriginal identity and visible minority status. There are a few key differences in variable definitions between the 1991 and 2001 cohorts. Aboriginal status in 1991 was determined based on answers to the ethnic origin question. However, the 2001 Census included a question on respondents’ perception of their own Aboriginal identity (Statistics Canada 2003). Similarly, visible minority status was determined using a direct question in 2001, whereas it was inferred in 1991 based on answers to other questions, including on ethnic origin and ancestry (Statistics Canada 2003).

Several derived variables based on census variables were created in the analytical file for the purpose of analysis, using methods described by Wilkins et al. (2008). Employment status was grouped into three categories: employed, unemployed, and not in the labour force. Educational attainment was grouped into four categories: not a high school graduate, high school graduate with or without a trades certificate, postsecondary (non-university) certificate or diploma, and university degree. Occupation was grouped into five occupational categories (professional; managerial; skilled, technical or supervisory; semi-skilled; and unskilled) and one non-occupation category, based on the National Occupational Classification variable and on the same criteria used in 1991.

Quintiles (and deciles) of income adequacy were created with the same methodology used by Wilkins et al. (2008). First, the ratio of pre-tax income of economic families (or unattached individuals) to the Statistics Canada low-income cut-off for family and community size was calculated. Then, this ratio was used to rank the non-institutional population to construct quintiles and deciles, both nationally and within each census metropolitan area or census agglomeration and rural and small-town area. Area-based income adequacy ratios adjust for regional differences in family economic status, such as housing costs.
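
A minimal sketch of the ranking step is given below. It assumes an unweighted population and breaks ties by input order; the paper does not specify how ties or weights were handled, and the function names and example values are hypothetical.

```python
def income_adequacy_ratio(family_income, low_income_cutoff):
    """Ratio of pre-tax family income to the applicable low-income cut-off."""
    return family_income / low_income_cutoff


def assign_quantiles(ratios, n_groups=5):
    """Return a group number for each ratio (1 = lowest income adequacy).

    Persons are ranked by ratio and split into n_groups groups of (nearly)
    equal size; use n_groups=10 for deciles.
    """
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    groups = [0] * len(ratios)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(ratios) + 1
    return groups


# Hypothetical ratios for ten persons.
ratios = [0.5, 2.0, 1.0, 3.0, 1.5, 0.8, 2.5, 1.2, 4.0, 0.9]
print(assign_quantiles(ratios))
```

For the area-based quintiles, the same ranking would be applied separately within each census metropolitan area, census agglomeration, or rural and small-town area.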

For all deaths during the follow-up period, the AMDB dataset includes the date of death recorded by the AMDB (n=347,000 deaths). For all deaths that were recorded in the CMDB (96.6% of all deaths), the dataset also includes the underlying cause of death, coded using the World Health Organization’s International Classification of Diseases, version 10 (ICD-10) (World Health Organization 1992). It also includes a variable indicating autopsy-confirmed deaths. The approximately 11,800 deaths that were recorded only in the T1 tax files do not include data on cause of death. A small number of deaths (approximately 200) had incomplete date-of-death records in the AMDB or occurred at the very start of the follow-up period; these deaths were not included in the mortality analyses. As a result, and also because of differences in the age limits that were used, the sums of deaths are slightly different from those in the cohort overview.

3.3 Mortality analyses

For each cohort member, person-years at risk were calculated as the number of years from Census Day (May 15, 2001) to death (based on the AMDB date of death) or, if the person did not die, to the end of the follow-up period (December 31, 2011). Cohort members who did not die each contributed 10.6 person-years at risk.
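
The person-years calculation can be sketched as follows. The use of a 365.25-day year and the exclusion of the final day are assumptions; the paper does not describe these details.

```python
from datetime import date

BASELINE = date(2001, 5, 15)            # Census Day
END_OF_FOLLOW_UP = date(2011, 12, 31)
DAYS_PER_YEAR = 365.25                  # assumed convention


def person_years(death_date=None):
    """Years at risk from Census Day to death or to the end of follow-up."""
    if death_date is None or death_date > END_OF_FOLLOW_UP:
        end = END_OF_FOLLOW_UP
    else:
        end = death_date
    return (end - BASELINE).days / DAYS_PER_YEAR


print(round(person_years(), 1))                   # survivor
print(round(person_years(date(2006, 5, 15)), 1))  # death five years after baseline
```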

As one method of external validation, the percentage of cohort members who survived until the end of the follow-up period was calculated for every single year of age (at baseline), and for each sex. This survival curve was compared with the expected percentage of survival for a 10.6-year period based on the 2005-to-2007 life tables for Canada (Statistics Canada 2013).
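
One way to derive an expected 10.6-year survival percentage from a single-year life table is to interpolate the survivor column linearly between exact ages. The sketch below assumes this approach and treats age at baseline as an exact age; the paper does not state how the life-table values were interpolated, and the survivor counts shown are hypothetical.

```python
def expected_survival(lx, age, years=10.6):
    """Proportion surviving from exact age `age` to `age + years`.

    lx maps exact integer ages to life-table survivors; the survivor count at
    the non-integer end age is linearly interpolated.
    """
    whole = int(years)
    fraction = years - whole
    lower = lx[age + whole]
    upper = lx[age + whole + 1]
    survivors_at_end = lower + fraction * (upper - lower)
    return survivors_at_end / lx[age]


# Hypothetical survivor counts, not taken from the Canada life tables.
lx = {50: 100_000, 60: 90_000, 61: 89_000}
print(f"{expected_survival(lx, 50):.2%}")
```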

Abridged life tables were constructed to estimate remaining life expectancy at age 25, by sex, and also by educational attainment, income adequacy quintile, Aboriginal identity and visible minority status. Life tables were constructed using five-year age intervals and included cohort members aged 20 to 100, using the methods outlined by Chiang (1984).
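
The core of an abridged life table in the style of Chiang (1984) can be sketched as below. The sketch assumes that deaths occur, on average, halfway through each five-year interval (a = 0.5) and that the last interval is open-ended; the values actually used for the cohort are not given in the paper, and the input data are hypothetical.

```python
def abridged_life_table(deaths, person_years, width=5, a=0.5):
    """Return remaining life expectancy at the start of each age interval.

    deaths, person_years: one entry per age interval, youngest first.
    The last interval is treated as open-ended (everyone dies in it).
    """
    rates = [d / py for d, py in zip(deaths, person_years)]
    survivors = [1.0]  # l_x, proportion alive at the start of each interval
    lived = []         # L_x, years lived within each interval
    for i, rate in enumerate(rates):
        alive = survivors[-1]
        if i == len(rates) - 1:
            lived.append(alive / rate)  # open-ended interval
        else:
            q = width * rate / (1 + (1 - a) * width * rate)  # probability of dying
            dying = alive * q
            lived.append(width * (alive - dying) + a * width * dying)
            survivors.append(alive - dying)

    expectancy = []
    years_remaining = 0.0
    for alive, years in zip(reversed(survivors), reversed(lived)):
        years_remaining += years
        expectancy.append(years_remaining / alive)
    return expectancy[::-1]


# Hypothetical data: two intervals with death rates of 0.02 and 0.10 per person-year.
print(abridged_life_table(deaths=[20, 100], person_years=[1000, 1000]))
```

Applied to the cohort, the first interval would start at age 20 and the value for the interval starting at age 25 would give the remaining life expectancy at age 25.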

Age-standardized mortality rates (ASMRs) were calculated based on the population aged 20 to 100, by sex and age (five-year age groups at baseline), for different socioeconomic and demographic population groups. The 1991 standard population of Canada was used to calculate standard weights for most of the population groups. For Aboriginal identity, internal weights were derived based on the Aboriginal population within the same dataset, since Aboriginal populations in Canada have a different age structure than the general population. The 95% confidence intervals were calculated as described by Carrière and Roos (1997). ASMR rate ratios (RRs) and rate differences (RDs) were calculated in a similar manner.
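
Direct age standardization, and the rate ratio and rate difference derived from it, can be sketched as follows. The data are hypothetical and use only two age groups for brevity. Confidence intervals are omitted: the method of Carrière and Roos (1997) is not reproduced here.

```python
def asmr(deaths, person_years, standard_population, per=100_000):
    """Directly age-standardized mortality rate per `per` person-years."""
    total = sum(standard_population)
    return per * sum(
        (weight / total) * (d / py)
        for d, py, weight in zip(deaths, person_years, standard_population)
    )


# Hypothetical data: two age groups, the same standard population for both groups.
standard = [3, 1]
group_a = asmr(deaths=[10, 50], person_years=[10_000, 10_000], standard_population=standard)
group_b = asmr(deaths=[5, 25], person_years=[10_000, 10_000], standard_population=standard)

rate_ratio = group_a / group_b
rate_difference = group_a - group_b
print(group_a, group_b, rate_ratio, rate_difference)
```

For the Aboriginal identity comparisons, `standard_population` would be replaced by the internal weights described above.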

Cox proportional hazard ratios for all-cause mortality were calculated for both sexes, and for men and women separately, for different socioeconomic and demographic population groups (Cox 1972). Hazard ratios were adjusted by age (five-year classes) and, in the case of the whole cohort, by sex.

Finally, the number of deaths in the cohort during the follow-up period was determined by selected causes of death, as per ICD-10 codes. The proportion of each cause of death to all CMDB deaths (for which an ICD-10 code was available) was calculated.

All counts (including counts in Figure 1 and all tables) have been randomly rounded to base 5 to ensure confidentiality in analyses.
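
Random rounding to base 5 can be illustrated with a common unbiased scheme, in which a count is rounded up with probability proportional to its remainder. This is a sketch of the general technique; the exact algorithm applied to the cohort is not described in the paper.

```python
import random


def random_round(count, base=5, rng=random):
    """Randomly round a count up or down to a multiple of `base`.

    The count is rounded up with probability remainder/base, so the expected
    value of the rounded count equals the original count.
    """
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower


rng = random.Random(2001)  # fixed seed so the illustration is repeatable
print([random_round(n, rng=rng) for n in (12, 13, 17, 20)])
```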

4. Results

4.1 Cohort characteristics

A total of 3,537,520 persons (or 79% of in-scope 2001 Census Forms 2B and 2D respondents) were linked to T1 tax files (for 2000 and 2001) and were followed for mortality until the end of the follow-up period (10.6 years). The percentage of respondents who were linked varied among demographic and socioeconomic groups. Table 1 provides an overview of the in-scope respondents by selected characteristics, the in-cohort population, and the percentage of the cohort in the in-scope population (by the same characteristics). The success rate for linkage to tax files (and the cohort) was lower for younger persons (66% linked for ages 19 to 24) and persons of any Aboriginal identity (64% linked). As expected, linkage percentages were lower for persons who had moved in the past year (63% linked), since they were less likely to have been matched to a postal code, which was one of the linkage keys (Table 1).

4.2 Mortality analyses

Chart 1 provides one measure for validating survival in the cohort. For every age at baseline and for each sex, the percentage of cohort respondents who survived until the end of the follow-up period (10.6 years) was compared with the same measure calculated from the 2005-to-2007 Canada life tables for the same period. The survival curves were almost identical for the cohort and life-table estimates for men and women, although there were some small differences for older ages (80 years or older). These differences are likely attributable to the much smaller cohort size at these ages and the exclusion of institutionalized adults, such as seniors in nursing homes.

Chart 1 Percentage of respondents surviving 10.6 years, by age and sex, followed for mortality from 2001 to 2011, compared with Canada life tables (CANSIM) for 2005 to 2007

Data table for Chart 1
Columns, in order (all values are percentages surviving): age at baseline; life table, men; cohort, men; life table, women; cohort, women
19 99.13 99.04 99.67 99.56
20 99.12 99.10 99.66 99.58
21 99.12 99.04 99.66 99.60
22 99.12 99.05 99.65 99.53
23 99.12 99.02 99.63 99.55
24 99.11 99.16 99.61 99.61
25 99.09 99.10 99.59 99.47
26 99.06 99.00 99.56 99.50
27 99.02 99.07 99.52 99.39
28 98.98 98.97 99.48 99.41
29 98.92 98.99 99.43 99.49
30 98.86 98.92 99.38 99.39
31 98.79 98.88 99.32 99.22
32 98.71 98.91 99.26 99.24
33 98.62 98.74 99.20 99.16
34 98.53 98.75 99.12 99.07
35 98.42 98.59 99.04 99.10
36 98.30 98.57 98.95 98.92
37 98.17 98.44 98.86 98.89
38 98.02 98.26 98.75 98.69
39 97.86 98.10 98.64 98.72
40 97.68 97.94 98.52 98.65
41 97.48 97.64 98.38 98.46
42 97.25 97.59 98.24 98.03
43 97.00 97.18 98.08 98.00
44 96.72 96.93 97.90 97.89
45 96.41 96.75 97.71 97.68
46 96.07 96.32 97.51 97.62
47 95.69 95.97 97.28 97.20
48 95.28 95.74 97.03 97.00
49 94.82 94.94 96.75 96.78
50 94.32 94.67 96.45 96.54
51 93.77 94.21 96.11 95.98
52 93.17 93.61 95.75 95.83
53 92.51 93.07 95.34 95.53
54 91.78 92.29 94.90 94.92
55 91.00 91.47 94.41 94.26
56 90.13 90.42 93.87 93.44
57 89.19 89.73 93.27 93.14
58 88.16 88.98 92.61 92.46
59 87.04 87.32 91.89 91.60
60 85.82 86.38 91.09 90.92
61 84.50 84.78 90.21 89.59
62 83.05 83.57 89.24 88.98
63 81.49 81.91 88.18 88.00
64 79.79 80.49 87.00 86.39
65 77.96 77.99 85.71 85.37
66 75.98 76.76 84.28 83.66
67 73.85 74.46 82.72 82.95
68 71.57 71.78 81.01 80.76
69 69.12 69.29 79.13 78.94
70 66.50 66.88 77.08 76.74
71 63.72 63.53 74.84 75.00
72 60.76 60.19 72.41 73.08
73 57.65 56.63 69.76 70.49
74 54.38 53.80 66.90 68.18
75 50.96 50.03 63.81 64.10
76 47.40 45.67 60.50 61.60
77 43.74 42.71 56.96 57.82
78 39.99 38.60 53.21 55.16
79 36.19 35.30 49.26 50.35
80 32.37 31.12 45.12 47.62
81 28.58 26.61 40.85 43.13
82 24.88 22.14 36.51 38.69
83 21.35 20.71 32.17 34.62
84 18.05 16.06 27.94 31.10
85 15.05 13.12 23.87 25.65
86 12.38 11.64 20.06 22.82
87 10.03 10.11 16.57 20.35
88 8.01 8.81 13.45 16.00
89 6.30 5.29 10.73 13.97
90 4.89 4.90 8.41 11.64

The remaining life expectancy at age 25 was 56.8 years for both sexes, 54.6 years for men and 59.0 years for women (Table 2). As expected, the remaining life expectancy was longer for persons with greater educational attainment, persons in a higher income adequacy quintile, and persons who were members of a visible minority. Among men, notable differences in life expectancy were 6.7 years between those with the lowest and highest educational attainment, and 6.8 years between those with the lowest and highest income.

Large differences in the remaining life expectancy at age 25 were observed between Aboriginal persons and non-Aboriginal persons: 5.9 years for men and 7.0 years for women. The difference between Aboriginal persons and non-Aboriginal persons was smallest for the Métis group (4.2 years for men and 4.1 years for women) and largest for the Inuit group (9.7 years for men and 11.5 years for women). The remaining life expectancy at age 25 was lower for the Inuit group than for any other group examined: 45.2 years for men and 47.8 years for women (Table 2).

ASMRs, RRs and RDs were calculated for different socioeconomic and demographic groups. Mortality rates were greater among persons who were not employed, persons of lower educational attainment, persons with lower-status occupations (or without an occupation) and persons with lower income. This trend was observed among both men and women (Tables 3 and 4). The socioeconomic gradients in mortality were steeper for men than for women for all of these characteristics (Tables 3 and 4).

ASMRs, RRs and RDs indicated greater mortality rates for cohort members who reported an Aboriginal identity, particularly Inuit respondents (Tables 3 and 4). Unlike with socioeconomic indicators, the differences in mortality (rate ratios) between non-Aboriginal persons and Aboriginal persons (and those who identify as only North American Indian, Métis or Inuit) were greater among women than men (Tables 3 and 4).

Persons who were members of a visible minority had lower ASMRs, RRs and RDs than those who were not, and these differences in rates were similar for men and women. The lowest mortality rates were observed for men and women of the Chinese and Southeast Asian populations (Tables 3 and 4). This grouping of visible minorities does not differentiate between foreign-born and Canadian-born populations.

Immigrants had lower ASMRs, RRs and RDs than the Canadian-born population, and this effect was particularly pronounced for men and women who immigrated in the most recent immigration years (Tables 3 and 4).

Urban cohort members had the lowest mortality (by ASMR, RR and RD), and mortality increased as community size decreased. This disparity was slightly greater among men than women (Tables 3 and 4).

Cox proportional hazard ratios were determined for the same socioeconomic and demographic groups as in Tables 3 and 4, and the patterns of mortality were the same as those indicated by the previous analyses. Mortality risk was greater for persons who were not employed, had lower educational attainment, were part of lower income quintiles, or had lower-status occupations. The risk of mortality was greater for Aboriginal persons than non-Aboriginal persons, and greater for non-visible-minority populations than visible-minority populations (Table 5).

Table 6 provides counts of deaths by different causes of death. Approximately 34.4% of the CMDB deaths in the cohort during the follow-up period were caused by cancers (neoplasms), 30.4% were caused by circulatory diseases, 8.4% were caused by respiratory diseases, and 5.5% were caused by external causes, including both unintentional injury and intentional injury.

5. Discussion

Mortality analyses of the 2001 Canadian Census–Tax–Mortality Cohort (2001 Census cohort) indicated that mortality rates were generally higher among persons who were not employed, had lower-status occupations, had lower educational attainment and were in lower income adequacy quintiles (both national and area-based quintiles). Mortality rates were higher among Aboriginal respondents (particularly among Inuit cohort members) and were lower among persons of visible minority status and immigrants. These results are consistent with analyses of the 1991 Census cohort (Wilkins et al. 2008) and broadly consistent with those of cohorts in other countries (e.g., Blakely et al. 2002; Stringhini et al. 2012).

One particular strength of the cohort is the use of the census long-form questionnaire, which was distributed to approximately 20% of the Canadian population, providing a large analytical cohort (3.5 million respondents, including 347,000 deaths). The linkage rate for the in-scope population was similar to that of the 1991 Census cohort. Based on a manual review, the false-positive error rate was estimated to be very low (less than 0.2%). Detailed cause-of-death information is also available for 96.6% of all deaths recorded in the cohort.

Although the 2001 Census cohort was broadly representative of national trends, comparing the characteristics of the cohort with those of the in-scope population indicated some notable differences that were primarily caused by the linkage to T1 tax records, in which links were likely missed. Since cohort members needed to be linked to a T1 tax file so that their SIN could be used as a linkage key for mortality, the cohort population was different in that it included a greater proportion of persons who submitted T1 tax returns. The lower-than-expected proportions of cohort members who were younger (aged 19 to 24) or older (aged 85 and older), were not married or in a common-law union, were in the lowest income adequacy quintile, or reported an Aboriginal identity (Table 1) were likely attributable to lower rates of tax filing among these groups. Persons who had moved in the previous year were also less likely to have been linked to the cohort, probably because the postal code of residence was used as one of the linkage keys.

In comparing the cohort with the estimated characteristics of the general population, these same differences were still observed. For example, the cohort had a higher-than-expected proportion of respondents who were married or in a common-law union (69%), compared with estimates calculated from the 2001 Census data (61.8%) (Statistics Canada 2015a). Similarly, the cohort had a lower-than-expected percentage of persons who had not graduated from high school (28%), given the census estimates (33.8%, based on persons aged 15 years or older) (Statistics Canada 2009).

Survival curves based on the cohort and 2005-to-2007 life tables were nearly identical for both men and women, indicating a strong concordance with nationally representative data. Small differences for older age groups are likely attributable to the smaller numbers of cohort members in these age groups and the exclusion of institutionalized adults, such as seniors in nursing homes.

Although the estimation method differed slightly, remaining life expectancy at age 25 was generally similar to that reported for the 1991 Census cohort, but it was higher for men (54.6 years compared with 52.6 years in the 1991 Census cohort) (Wilkins et al. 2008; Tjepkema and Wilkins 2011). Remaining life expectancy at age 25 was about one year longer in the 2001 Census cohort than the remaining life expectancy calculated from the 2005-to-2007 life tables: 53.7 years for men and 58.0 years for women (Statistics Canada 2013). This slightly longer life expectancy might be explained by the slightly higher socioeconomic status of the cohort members than the general population (i.e., greater educational attainment), and the fact that the census was more likely to miss persons of lower socioeconomic status (Statistics Canada 2014). However, disparities in the remaining life expectancy between the lowest and highest income quintiles, between the lowest and highest levels of educational attainment, and between Aboriginal and non-Aboriginal respondents were similar in the 1991 and 2001 cohorts (Wilkins et al. 2008; Tjepkema and Wilkins 2011).

In general, the mortality statistics were similar for men and women in the 1991 and 2001 cohorts, though ASMR values were slightly higher in 2001. This difference was likely because the current analysis included CMDB and tax-only deaths (from the AMDB), whereas the 1991 paper included only the deaths recorded in the CMDB (Wilkins et al. 2008). The RRs in both cohorts were very similar among socioeconomic and demographic groups. However, there were some notable exceptions. Men in the poorest income adequacy quintile (area-based) had a lower RR in 2001 (RR=1.57; 95% confidence interval: 1.54 to 1.60) than in 1991 (RR=1.68; 95% confidence interval: 1.65 to 1.71). In 1991, women in lower-status occupations and lower income adequacy quintiles had greater rate ratios (i.e., a steeper mortality gradient) than those reported in 2001. Hazard ratios were generally similar among educational attainment groups, occupational groups, and income adequacy quintiles between the 1991 and 2001 cohorts (Wilkins et al. 2008).

The 2001 Census cohort is an analytical dataset that can be used to examine differences in mortality among socioeconomic and population groups, and that can also be used for environmental health research. The results of the mortality analyses are consistent with those of the 1991 Census cohort and the 2005-to-2007 life tables. Further analysis may be warranted to examine patterns of mortality across other socioeconomic dimensions. It may also be possible to combine the 1991 and 2001 cohorts to examine patterns of cause-specific mortality for rare causes of death or among smaller population groups.

6. Conclusion

This paper describes the linkage of the 2001 Census long-form questionnaire respondents aged 19 years or older, living in Canada, and not living in institutions, to the Amalgamated Mortality Database, which follows cohort respondents for mortality over a 10.6-year period (to the end of 2011). In general, the survival of cohort respondents was similar to that calculated from Canada life tables, and mortality statistics were similar to those expected based on a similar cohort from the previous decade. Because the linkage methodology relied on respondents being taxfilers, the cohort differed slightly from the general population: cohort members were more likely to be married or in a common-law union, to have higher income, to have higher educational attainment, and to be employed.

References

Blakely, T., A. Woodward, N. Pearce, C. Salmond, C. Kiro, and P. Davis. 2002. “Socioeconomic factors and mortality among 25-64 year olds followed from 1991 to 1994: The New Zealand census-mortality study.” The New Zealand Medical Journal 115 (1149): 93−97.

Carrière, K.C., and L. Roos. 1997. “A method of comparison for standard rates of low-incidence events.” Medical Care 35 (1): 57−69.

Chiang, C.L. 1984. The Life Table and Its Applications. Malabar, Florida: Robert E. Krieger Publishing Company.

Cho, K.H., C.M. Nam, E.J. Lee, Y. Choi, K.-B. Yoo, S.-H. Lee, and E.-C. Park. 2016. “Effects of individual and neighbourhood socioeconomic status on the risk of all-cause mortality in chronic obstructive pulmonary disease: A nationwide population-based cohort study, 2002-2013.” Respiratory Medicine 114: 9−17.

Cox, D.R. 1972. “Regression models and life tables.” Journal of the Royal Statistical Society: Series B 34 (2): 187−220.

Crouse, D.L., P.A. Peters, A. van Donkelaar, M.S. Goldberg, P.J. Villeneuve, O. Brion, S. Khan, D. Odwa Atari, M. Jerrett, C.A. Pope III, M. Brauer, J.R. Brook, R.V. Martin, D. Stieb, and R.T. Burnett. 2012. “Risk of nonaccidental and cardiovascular mortality in relation to long-term exposure to low concentrations of fine particulate matter: A Canadian national-level cohort study.” Environmental Health Perspectives 120 (5): 708−714.

Crouse, D.L., P.A. Peters, P.J. Villeneuve, M.O. Proux, H.H. Shin, M.S. Goldberg, M. Johnson, A.J. Wheeler, R.W. Allen, D. Odwa Atari, M. Jerrett, M. Brauer, J.R. Brook, S. Cakmak, and R.T. Burnett. 2015. “Within- and between-city contrasts in nitrogen dioxide and mortality in 10 Canadian cities; a subset of the Canadian Census Health and Environment Cohort (CanCHEC).” Journal of Exposure Science and Environmental Epidemiology 25 (5): 482−489. doi:10.1038/jes.2014.89.

Fellegi, I.P., and A.B. Sunter. 1969. “A theory for record linkage.” Journal of the American Statistical Association 64 (328): 1183−1210.

Lazzarino, A.I., M. Hamer, E. Stamatakis, and A. Steptoe. 2013. “The combined association of psychological distress and socioeconomic status with all-cause mortality: A national cohort study.” JAMA Internal Medicine 173 (1): 22−27.

Mayer, É., and C. Charbonneau. 2015. Amalgamated Mortality Database (AMDB) 2014b. Ottawa: Household Survey Methods Division, Statistics Canada. Internal report.

Mishra, G.D., F. Chiesa, A. Goodman, B. De Stavola, and I. Koupil. 2013. “Socio-economic position over the life course and all-cause and circulatory diseases mortality at age 50-87 years: Results from a Swedish birth cohort.” European Journal of Epidemiology 28 (2): 139−147.

Statistics Canada. 2003. 2001 Census Dictionary. Statistics Canada Catalogue no. 92-378-X. Ottawa: Statistics Canada.

Statistics Canada. 2009. Population 15 years and over by highest degree, certificate or diploma (1986 to 2006 Census) (table). Census of Population (database). Last updated October 6, 2009. (accessed March 22, 2016).

Statistics Canada. 2013. Life Tables, Canada, Provinces and Territories 2005 to 2007. Statistics Canada Catalogue no. 84-537-X. Ottawa: Statistics Canada.

Statistics Canada. 2014. Coverage, 2001 Census Technical Report. Statistics Canada Catalogue no. 92-394-X. Ottawa: Statistics Canada.

Statistics Canada. 2015a. Table 051-0042 Estimates of population, by marital status or legal marital status, age and sex for July 1, Canada, provinces and territories (table). CANSIM (database). Last updated October 30, 2015. (accessed March 22, 2016).

Statistics Canada. 2015b. User Guide: 1991 Canadian Census Cohort: Mortality and Cancer Follow-up. Ottawa: Statistics Canada. Research Data Centre user guide.

Stringhini, S., L. Berkman, A. Dugravot, J.E. Ferrie, M. Marmot, M. Kivimaki, and A. Singh-Manoux. 2012. “Socioeconomic status, structural and functional measures of social support, and mortality: The British Whitehall II cohort study, 1985-2009.” American Journal of Epidemiology 175 (12): 1275−1283. doi:10.1093/aje/kwr461.

Tjepkema, M., and R. Wilkins. 2011. “Remaining life expectancy at age 25 and probability of survival to age 75, by socio-economic status and Aboriginal ancestry.” Health Reports 22 (4): 31−36. Statistics Canada Catalogue no. 82-003-X.

Weichenthal, S., D.L. Crouse, L. Pinault, K. Godri-Pollitt, E. Lavigne, G. Evans, A. van Donkelaar, R.V. Martin, and R.T. Burnett. 2016. “Oxidative burden of fine particulate air pollution and risk of cause-specific mortality in the Canadian Census Health and Environment Cohort (CanCHEC).” Environmental Research 146: 92−99.

Wilkins, R., M. Tjepkema, C. Mustard, and R. Choinière. 2008. “The Canadian census mortality follow-up study, 1991 through 2001.” Health Reports 19 (3): 25−43. Statistics Canada Catalogue no. 82-003-X.

World Health Organization. 1992. International Statistical Classification of Diseases and Related Health Problems. Tenth Revision. Geneva: World Health Organization.