Health Reports
Cohort profile: The Canadian Census Health and Environment Cohorts (CanCHECs)

by Michael Tjepkema, Tanya Christidis, Tracey Bushnik and Lauren Pinault

Release date: December 18, 2019

DOI: https://www.doi.org/10.25318/82-003-x201901200003-eng

The reduction and elimination of health inequalities is an ongoing goal of health policy in Canada.Note 1Note 2Note 3 To support and inform progress toward this goal, a sample of 1991 Census respondents and 10 years of mortality data were linked in 2008.Note 4 This dataset was then used to determine the distribution of mortality outcomes across groups defined by income, education, occupation, marital status, language, ethnicity, immigration status, Indigenous identity and disability status. In 2009, approval was granted to add more years of mortality follow-up, and to include cancer incidence data and annual place of residence data.Note 5 The primary purpose of this expanded dataset was to assess the impact of long-term exposure to air pollution on human health, with the objective to inform the development of Canada-wide standards for key criteria pollutants.Note 6 In subsequent years, a 2001 Census linked datasetNote 7 and a 1996 Census linked datasetNote 8 were created and branded as the Canadian Census Health and Environment Cohorts (CanCHECs).

These linked datasets were instrumental in examining health inequalities and the impacts of environmental exposures on mortality. However, they had important differences in population eligibility, linkage methodology, health outcomes and availability of sampling weights. This limited their full analytical potential, particularly their capacity for inter-cohort comparisons. The routine, ongoing measurement of population health status indicators is essential to assess progress in the reduction of inequalities by population and socioeconomic group.Note 9Note 10 Thus, a more standardized approach to creating the CanCHEC datasets was warranted.

In 2017, a Statistics Canada initiative called the Social Data Linkage Environment (SDLE) was approved. The goals of the SDLE were to promote the innovative use of existing administrative and survey data, to address important questions, and to inform socioeconomic policy through record linkage.Note 11 The SDLE provided an opportunity to create a series of CanCHEC datasets using a common methodology in an efficient and cost-effective manner.

The new CanCHEC datasets (census years 1991 to 2011) provide a rich national data resource that can be used to measure and examine health inequalities (e.g., mortality, cancer incidence, hospitalizations) across socioeconomic and ethnocultural dimensions for different periods and locations. These datasets can also be used to examine the effects of exposure to environmental factors on human health. The main CanCHEC objectives are

Because of their large size, the CanCHECs are an excellent resource for examining rare health outcomes and small population groups. They are ideally suited for environmental health research because of their geographic coverage across all regions of Canada, their long follow-up periods and their linkage to annual postal code history.

Data resource description

Creation of the cohorts

The CanCHECs are a series of population-based, probabilistically linked datasets that combine data from respondents to the long-form census or the 2011 National Household Survey (NHS) with administrative health data (e.g., mortality, cancer incidence, hospitalizations, emergency ambulatory care) and annual postal code history.

Individuals were eligible to be included in the CanCHECs if they were usual residents of Canada on Census Day (including permanent and non-permanent residents) and if they were in the long-form censusNote 12 or 2011 NHSNote 13 records. The institutional population (e.g., those living in nursing homes, penitentiaries, group homes) at the time of census collection was not eligible for CanCHEC inclusion, and the 2011 CanCHEC excludes all collective households (both institutional and non-institutional). Census data quality reports show that a small proportion of the Canadian population (≤4.3%) is missed in any given census. In general, these individuals are more likely to be young, mobile, low income, homeless or Indigenous.Note 14 Unlike the long-form census in previous census years, participation in the 2011 NHS was voluntary. As a result, the response rate for the NHS was lower (69%) than the response rates for the censuses (about 95%).

The CanCHEC datasets were created using the SDLE, which facilitates the creation of linked population data files using the Derived Record Depository (DRD). The DRD is a dynamic relational database that contains only basic personal identifiers. It was created from birth, death, immigration and tax files. Eligible census and NHS respondents were probabilistically and deterministically linked to the DRD using standard record linkage methodology. Based on previously linked census–tax datasets,Note 4Note 7Note 8 the 1991, 1996 and 2001 CanCHECs were deterministically linked to the DRD using social insurance numbers. The 2006 and 2011 CanCHECs were created within the SDLE from a probabilistic linkage between eligible census and NHS records and the DRD. The linkage rate of in-scope census and NHS records to the DRD differed by age group, marital status, place of residence, socioeconomic status and Indigenous identity.Note 4Note 7Note 8 To account for these differences in linkage rates, and to ensure representativeness for each CanCHEC cohort, weights were created from existing census and NHS weights to adjust for non-linkage to the DRD. Bootstrap weights were created to account for variance. Table 1 shows the age eligibility, number of in-scope census and NHS records, cohort size, cohort size compared with the number of in-scope records (percentage), cohort size with annual postal codes, and estimated population for each CanCHEC year. Figure 1 shows the flow of how the CanCHECs were created.
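The profile describes the 2006 and 2011 linkages only as probabilistic and based on "standard record linkage methodology." To illustrate what probabilistic scoring usually involves, here is a minimal Python sketch of Fellegi–Sunter match weights, the classic framework for probabilistic linkage. It is not the SDLE's implementation: the field names, the m- and u-probabilities, and the two classification thresholds are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical m- and u-probabilities per comparison field (illustration only):
#   m = P(field agrees | the two records belong to the same person)
#   u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.02),
    "birth_date": (0.97, 0.003),
    "sex": (0.99, 0.50),
    "postal_code": (0.80, 0.001),
}


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum of Fellegi-Sunter log2 agreement/disagreement weights."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement: positive evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement: negative evidence
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Hypothetical thresholds: at or above upper = link, at or below lower = non-link."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


if __name__ == "__main__":
    census_rec = {"surname": "Tremblay", "given_name": "Marie",
                  "birth_date": "1960-04-12", "sex": "F", "postal_code": "K1A 0B1"}
    drd_rec = dict(census_rec, postal_code="K2P 1L4")  # same person, new address
    w = match_weight(census_rec, drd_rec)
    print(f"weight={w:.2f} -> {classify(w)}")
```

A disagreeing postal code lowers the score but does not veto the link when the other identifiers agree. Production linkage systems typically add blocking and approximate string comparators, which this sketch omits by comparing fields exactly.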

Administrative health data

After the CanCHECs were linked to the DRD, administrative data (previously linked to the DRD) were linked to the CanCHEC datasets. Currently, data from the Canadian Vital Statistics Death Database (CVSD),Note 15 Canadian Cancer Registry (CCR),Note 16 Discharge Abstract Database (DAD)Note 17 and National Ambulatory Care Reporting System (NACRS)Note 18 are available to be linked to the CanCHEC cohorts (the administrative data linked to each CanCHEC vary by province; see Table 2). For applicable years, the linkage rates of these administrative health datasets to the DRD ranged from 92.7% to 95.8% for the DAD and NACRS, from 98.1% to 99.1% for the CCR, and from 99.6% to 99.9% for the CVSD.

Annual postal codes

The T1 Personal Master File (T1PMF) was the principal data source for addresses—including postal codes—in the DRD. Every resident of Canada who earns taxable income is required to complete an income tax return, or T1 form, after the end of the year in which the income was received. Therefore, the T1PMF includes almost all individuals who filed an individual T1 tax return for the reference year. However, some late filers may not be included. A filer’s address may be updated later in the year if they communicated with the Canada Revenue Agency. The DRD also obtains addresses from other files, such as the Canadian Vital Statistics Birth Database, the CVSD, the Canada Child Tax Benefit Identifier file and the Immigrant Landing File.

Annual postal codes (available as early as 1981) were linked to cohort members aged 15 or older within the DRD. While each record in the earlier CanCHECs had at least one annual postal code linked to it, this was not the case for the more recent CanCHECs, for two reasons. First, the 2006 and 2011 CanCHECs include long-form census and NHS respondents younger than age 15. Second, on both the 2006 Census and the 2011 NHS, only respondents who agreed to have their information linked to their tax records could have annual postal codes linked to their records.

Between 1981 and 2016, the overall percentage of cohort members aged 20 or older with at least one annual postal code was between 93% and 95%, depending on the CanCHEC year.

CanCHEC characteristics

Each CanCHEC consists of data from long-form census records or NHS records, mortality records, and cancer incidence records—with hospitalization records and emergency ambulatory care records available for some provinces, starting with the 2006 CanCHEC. The long-form censuses and the NHS collected a wide range of information on topics such as education, labour, income, language, immigration, ethnocultural diversity, Indigenous identity and housing (Table 3). Administrative health outcomes linked to the CanCHECs include underlying and contributing causes of death, date of death, primary cancer diagnoses, date of cancer diagnoses, inpatient hospitalizations, admission and discharge dates, length of stay, procedures and diagnoses, emergency room diagnoses, and date of emergency room arrival.

Because exclusion criteria were applied during the creation of the CanCHECs (i.e., CanCHEC records include only the non-institutional population at baseline that was enumerated by the census or NHS), the cohort population is healthier than the Canadian population. Linkage error (estimated to be small) is also present (e.g., false negatives, or missed links to a health outcome file) and could result in underestimated mortality, hospitalization and cancer incidence rates. When compared with official mortality tables, which are based on the total Canadian population, CanCHEC mortality rates are similar for younger age groups and lower for the oldest age groups, particularly among respondents aged 85 and older (Table 4). This pattern is consistent with the age distribution of the institutional population. The proportion of the population living in institutions (and thus excluded from the CanCHECs) is lowest in the younger age groups (less than 1%) and highest in the oldest age groups (greater than 5% for those aged 80 and older) (data not shown).
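The comparison with official mortality tables rests on simple rate arithmetic: deaths divided by person-years at risk within each age group, then divided by the corresponding rate for the total population. The minimal Python sketch below shows that calculation; the counts and reference rates are hypothetical and are not the Table 4 values. A cohort-to-reference ratio well below 1 in the oldest age group is the pattern one would expect when the institutional population is excluded.

```python
from dataclasses import dataclass


@dataclass
class AgeGroup:
    label: str
    deaths: int            # deaths observed in the cohort during follow-up
    person_years: float    # person-years at risk contributed by cohort members
    reference_rate: float  # official total-population rate, per 100,000


def rate_per_100k(deaths: int, person_years: float) -> float:
    return deaths / person_years * 100_000


def compare_with_reference(groups: list[AgeGroup]) -> list[tuple[str, float, float]]:
    """Return (label, cohort rate per 100,000, cohort-to-reference rate ratio)."""
    rows = []
    for g in groups:
        rate = rate_per_100k(g.deaths, g.person_years)
        rows.append((g.label, rate, rate / g.reference_rate))
    return rows


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    groups = [
        AgeGroup("25 to 44", deaths=120, person_years=100_000.0, reference_rate=125.0),
        AgeGroup("85 and older", deaths=9_000, person_years=80_000.0, reference_rate=13_000.0),
    ]
    for label, rate, ratio in compare_with_reference(groups):
        print(f"{label}: {rate:.1f} per 100,000 (ratio {ratio:.2f})")
```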

Annual postal code history is an important feature of the CanCHECs, particularly for environmental health research and attaching environmental exposures. More than two-thirds of CanCHEC cohort members aged 20 or older had complete postal code histories, with the 2011 CanCHEC having the highest completion rate, at 74% (Table 5). The percentage of CanCHEC members who moved (i.e., changed postal codes) in any given year was between 6.0% and 10.0%, depending on the cohort. The percentage who moved to a different region of Canada, based on the first letter of the postal code, was between 1.1% and 1.7% per year.
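Completeness and mobility figures of this kind can be derived from each member's year-by-year postal code sequence. The Python sketch below counts consecutive-year pairs, postal code changes, and changes in the first character of the postal code (which identifies a postal district, i.e., a province or part of one). The data layout (one code per year, None for a missing year), the rule of skipping pairs that involve a missing year, and the pooling of all person-years into one annual percentage are simplifying assumptions, not the published method.

```python
from typing import Optional, Sequence


def move_stats(history: Sequence[Optional[str]]) -> tuple[int, int, int]:
    """Return (consecutive-year pairs observed, moves, inter-region moves).

    history holds one postal code per year in chronological order, with None
    for a year without a postal code. Pairs involving a missing year are
    skipped, since no move can be observed across them.
    """
    pairs = moves = region_moves = 0
    for prev, curr in zip(history, history[1:]):
        if prev is None or curr is None:
            continue
        pairs += 1
        if prev != curr:
            moves += 1
            if prev[0] != curr[0]:  # first letter = postal district
                region_moves += 1
    return pairs, moves, region_moves


def is_complete(history: Sequence[Optional[str]]) -> bool:
    return all(code is not None for code in history)


def cohort_mobility(histories: Sequence[Sequence[Optional[str]]]) -> dict[str, float]:
    """Percentage with complete histories and pooled annual move percentages."""
    pairs = moves = region_moves = complete = 0
    for h in histories:
        p, m, r = move_stats(h)
        pairs, moves, region_moves = pairs + p, moves + m, region_moves + r
        complete += is_complete(h)
    return {
        "pct_complete": 100 * complete / len(histories) if histories else 0.0,
        "pct_moved": 100 * moves / pairs if pairs else 0.0,
        "pct_moved_region": 100 * region_moves / pairs if pairs else 0.0,
    }
```

For example, the hypothetical history `["K1A 0B1", "K1A 0B1", "K2P 1L4", "M5V 2T6", None]` contributes three observed pairs, two moves and one change of postal district.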

Data resource use

The CanCHECs have been widely used over the past decade across a range of topics, resulting in more than 65 peer-reviewed journal articles since 2008. Research conducted using CanCHEC data can be grouped into four broad research themes: health outcomes by socioeconomic position such as income, education and occupational skill level; health outcomes by specific occupation, occupational group and job characteristic; health outcomes by population group, such as Indigenous peoples and immigrants; and the measurement of the effect of environmental exposures on health.

Socioeconomic position

CanCHEC data have been used extensively to examine socioeconomic indicators and their relationship with life expectancy, cause-specific mortality, cancer incidence and hospitalizations. Research findings highlighting these inequalities have been cited in published reports by the Chief Public Health Officer of CanadaNote 2 and the Organisation for Economic Co-operation and Development.Note 19 In general, higher levels of educational attainment, household income and occupational skill are all linked to increased longevity,Note 20 lower mortality rates for most causes of death,Note 21Note 22Note 23Note 24 lower rates of cancer incidence for many cancersNote 25Note 26 and lower rates of opioid-related hospitalizations.Note 27 Further, mortality inequalities by socioeconomic status appear to have increased between 1991 and 2016 for both all-cause mortalityNote 28 and health-adjusted life expectancy,Note 29 suggesting that the reduction of health inequalities remains a relevant issue in Canada. In general, the CanCHEC datasets have been identified as an important resource to examine the effects of socioeconomic status on mortality.Note 30 Going forward, the CanCHECs are well suited to study the intersection of multiple dimensions of socioeconomic status and population groups across different periods.

Occupation

Research identifying elevated risks of mortality and cancer in specific occupations has also benefitted from the CanCHECs. For example, cancer incidence rates by cancer site have been estimated for police officers, firefighters, armed forces personnel,Note 31 welders,Note 32 agricultural workers,Note 33 workers employed in more sedentary occupationsNote 34 and workers who may be exposed to whole-body vibration.Note 35 For mortality outcomes, suicide by major and minor occupation group has been estimated.Note 36 All-cause and cardiovascular mortality rates have been estimated by job overqualification status,Note 37 and avoidable mortality rates have been estimated by occupation.Note 38 These findings can contribute to occupational surveillance programs and can help identify at-risk occupations.

Population groups

The Truth and Reconciliation Commission of Canada identified enhanced data as a priority in its Calls to Action.Note 39 CanCHEC data can respond to this priority by overcoming previous methodological challenges in estimating health outcomes such as life expectancy and cancer burden among First Nations people, Métis and Inuit living in Canada. CanCHEC research has shown that life expectancy is substantially and consistently shorter for First Nations people, Métis and Inuit compared with the non-Indigenous population across all CanCHEC years (1991 to 2011).Note 40 The magnitude of mortality inequalities between non-Indigenous and First Nations people and Métis varies considerably by cause of death.Note 41 Using the 1991 CanCHEC and the linked cancer data, research has shown that First Nations people and Métis have lower cancer survival rates and higher incidence rates for some cancer sites.Note 42Note 43Note 44 Going forward, the CanCHEC data can be an excellent resource for ongoing surveillance of mortality, cancer and hospitalization outcomes for Indigenous people in Canada. These results will be able to identify inequalities not previously estimated and measure any progress made to reduce these gaps.

Researchers have used CanCHEC data to examine mortality and cancer outcomes for immigrants. Several studies have demonstrated a healthy immigrant effect—lower mortality and cancer rates among immigrants compared with the Canadian-born population—with the greatest advantage observed among recent immigrants.Note 45Note 46Note 47Note 48 These same studies have shown significant variation across immigrant subgroups by region of birth, time in country, cause of death and cancer site. In the future, it will be possible to examine health outcomes by immigrant category over time as new data sources, such as the Immigrant Landing File, are added to the DRD.

Environmental health

During the past two decades, the CanCHEC datasets have been used as a basis for influential work in Canadian environmental health, particularly in the field of air pollution. The availability of annual postal code history has enabled researchers to follow cohort members through time with a reasonably accurate residential history, and attach them to various environmental exposures.Note 49 Researchers have examined the effects of exposure to long-term ambient air pollution on cause-specific mortality.Note 50Note 51Note 52Note 53 This is of particular importance internationally, since deleterious effects have been observed even at the lower levels of exposure that are typical in Canada. In addition to providing valuable input for reaching international scientific consensus on air pollution and health,Note 54Note 55 these studies have helped national (i.e., the Public Health Agency of Canada)Note 56 and internationalNote 57 bodies identify future challenges for reducing air pollution.

CanCHEC data have been used to link ambient ultraviolet radiation to increasing rates of melanoma,Note 58 a finding that has been informative for public health outreach campaigns in Canada.Note 59

In addition to environmental hazards, CanCHEC data have been used in environmental benefit studies. Living near greenspace or water has been linked to reductions in mortality.Note 60Note 61 Findings from these studies have been included in a knowledge translation tool to share environmental health assessments across Canada.Note 62

Strengths and weaknesses

The CanCHEC datasets have many strengths. By their nature, the datasets fill a gap by linking individual-level national administrative health data that lack socioeconomic and ethnocultural identifiers with individual-level census long-form data that contain these variables. As a result, administrative health outcomes can be examined across characteristics such as income; education; occupation; language; ethnicity; First Nations, Métis and Inuit status; and immigration status. The large sample size and extended follow-up periods allow for the examination of health outcomes for small populations (e.g., First Nations people, Métis, Inuit) and for rare or uncommon disease outcomes, especially those related to occupational exposures.

The CanCHECs offer a breadth of health data—from emergency ambulatory care visits to death records—that can be used to answer a variety of research questions. The DRD is regularly updated to include new years of administrative data and new administrative data sources. Likewise, the CanCHECs can be updated continuously to prolong follow-up periods and to include new data sources.

Trends over time both within and between CanCHECs can now be examined because of the consistent SDLE-based linkage methodology used to create the cohorts. The cohorts can be examined in concert, or stacked for a combined analysis, which is useful for examining rare health outcomes (e.g., rare causes of death).

Annual historical postal codes from 1981 onward provide an indication of where cohort members live year after year, and more importantly, can be used to attach environmental exposures to a person’s place of residence. Postal codes provide a reasonably accurate approximation of residence location in most cities, particularly for urban street addresses or apartment buildings, where they can be geocoded within about 200 metres.Note 63 However, postal codes in rural areas and those that represent businesses or post office boxes can be quite far from the residence’s location (on average, five to six kilometres), creating a limitation to this geocoding approach.

Researchers should consider the following aspects of the CanCHECs when assessing whether the data are appropriate for their needs. Census characteristics are collected only at the time of census collection, and may not reflect changes in a person’s socioeconomic characteristics over their lifespan. No information regarding baseline health (with the exception of activity limitation) or health behaviour risk factors—such as smoking and physical activity—is collected by the census. However, population-level and individual-level indirect adjustment techniques have been applied using Canadian Community Health Survey data.Note 64Note 65 Moreover, administrative health data are available for some years prior to Census Day for certain CanCHECs. Specifically, the CCR has data starting in 1992, the DAD in 2000 and the NACRS in 2002.

The CanCHEC datasets grossly underestimate infant mortality (deaths that occur within the first year after birth). About three-quarters of all infant deaths occur within the first 28 days after birth, so whether these infants were enumerated by the census is uncertain.Note 66 As a result, life expectancy at birth cannot be reliably estimated using the CanCHECs.

The institutional population on Census Day and individuals not enumerated by the census are excluded from the CanCHECs. Because of these exclusions, the population represented by CanCHEC data should be considered a healthier population than the general population. With sampling weights, the CanCHEC data can be considered representative of the non-institutional population at the time of census collection, but bias might exist if those missing from the cohort differed systematically from those who were included.
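Weighted CanCHEC estimates are paired with the bootstrap weights created to account for variance. A common way to use such replicate weights is to recompute the estimate under each set of bootstrap weights and treat the spread of the replicate estimates around the full-sample estimate as the variance. The minimal Python sketch below does this for a weighted proportion (for example, the share of members who died during follow-up). The variance convention used (mean squared deviation from the full-sample estimate) is one standard choice and an assumption here; the documentation supplied with the data should take precedence.

```python
import math
from typing import Sequence


def weighted_proportion(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of 0/1 indicators (e.g., died during follow-up)."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def bootstrap_estimate(
    values: Sequence[float],
    full_weights: Sequence[float],
    replicate_weights: Sequence[Sequence[float]],
) -> tuple[float, float]:
    """Return (estimate, bootstrap standard error).

    Variance = mean squared deviation of the replicate estimates from the
    full-sample estimate (assumed convention).
    """
    theta = weighted_proportion(values, full_weights)
    deviations = [
        (weighted_proportion(values, rw) - theta) ** 2 for rw in replicate_weights
    ]
    return theta, math.sqrt(sum(deviations) / len(deviations))
```

With toy data, `bootstrap_estimate([1, 0, 1, 0], [1.0] * 4, [[2, 0, 1, 1], [0, 2, 1, 1]])` gives an estimate of 0.5 with a standard error of 0.25, because the two replicate estimates are 0.75 and 0.25.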

The annual postal code file is primarily derived from mailing addresses provided on T1 tax records, and may not be associated with a person’s place of residence.Note 49 For example, the mailing address may represent a post office box, an accountant or lawyer’s office, a parent’s address for young adults, or a child’s address for elderly parents. Moreover, missing postal codes are imputed whenever possible. For example, if a person does not file taxes for two years, but uses the same mailing address in their tax filings in the years before and after this gap, the SDLE will automatically impute this address as the mailing address for the missing years. This assumes that the person did not move to a different address in the intervening years before returning to the initial address.
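The imputation rule described above can be expressed as a small gap-filling pass over a postal code history. The Python sketch below fills a run of missing years only when the code immediately before and immediately after the run is identical; leading, trailing or differently bounded gaps stay missing. The function name and data layout are hypothetical, and the sketch is not the SDLE's implementation.

```python
from typing import Optional


def impute_gaps(history: list[Optional[str]]) -> list[Optional[str]]:
    """Fill internal runs of missing years when the postal code immediately
    before and immediately after the run is identical.

    Leading or trailing gaps, and gaps bounded by two different codes, are
    left as None. Hypothetical helper; not the SDLE implementation.
    """
    filled = list(history)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:  # find the end of the missing run
            i += 1
        # filled[start:i] is the gap; it is internal only if both neighbours exist
        if start > 0 and i < n and filled[start - 1] == filled[i]:
            for j in range(start, i):
                filled[j] = filled[i]  # assumes no move occurred during the gap
    return filled
```

For example, `impute_gaps(["K1A 0B1", None, None, "K1A 0B1", None, "M5V 2T6"])` fills the first gap but leaves the second one missing. The comment on the fill step marks the same assumption the text flags: a person who moved away and later returned to the same address would be assigned the wrong location for the intervening years.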

The introduction of the voluntary 2011 NHS was an important methodological change from previous long-form censuses. It is not fully known how this change affects the comparability with CanCHEC cycles that are based on mandatory long-form censuses. However, the development of weights may mitigate some of the comparability issues among census cycles.

Finally, for analysis across CanCHEC years, it is important to consider changes over time to the census questionnaire, and to be aware that some census variables may not be available in a given year, or may have been modified over time.

Data resource access

Approved researchers can access the CanCHECs in research data centresNote 67 (RDCs) and in the Federal Research Data Centre. Information on the RDC Program, including the application process and guidelines, is available at www.statcan.gc.ca/eng/rdc/index.

Acknowledgements

The authors are grateful to all the people, too numerous to name individually, who have worked on these census linkages over the past two decades. The authors would like to acknowledge Russell Wilkins (Statistics Canada, retired), who was instrumental in developing the original 1991-to-2001 Census Mortality Follow-Up Study, and Richard Burnett (Health Canada), who envisioned the enhancement and use of these data for ambient air pollution research. The authors would also like to thank the Canadian Institute for Health Information and Health Canada for their financial assistance in creating some of the earlier linkages.

References