The 2006 Canadian Birth-Census Cohort
by Tracey Bushnik, Seungmi Yang, Michael S. Kramer, Jay S. Kaufman, Amanda J. Sheppard and Russell Wilkins
Reducing health disparities is an ongoing population health goal in Canada and other countries.Note 1 A step toward achieving this goal is to exploit existing data on the nature and extent of variations in health across socioeconomic and ethnocultural groups. Evidence on disparities in perinatal health in Canada has generally been limited to analyses by neighbourhood characteristics,Note 2 or for selected provinces,Note 3Note 4Note 5 owing to a lack of socioeconomic and ethnocultural information in most routinely collected perinatal data.
Perinatal health is monitored and documented by the Canadian Perinatal Surveillance System, led by the Public Health Agency of Canada in collaboration with Statistics Canada. As in previous years, the System’s 2013 reportNote 6 presented data from existing databases, including vital statistics, for live births, stillbirths, and infant deaths. However, these national databases contain little socioeconomic or ethnocultural information.
In 2010, the Canadian Institutes of Health Research funded a project on socioeconomic position, ethnocultural background, and perinatal outcomes. The aim was to link the databases used for perinatal surveillance to long-form census data for 1996 and 2006, thereby creating birth-census cohorts.
Approval for this linkage was granted by Statistics Canada’s Policy Committee (now known as the Executive Management Board) in 2012. The project received ethics approval from the Pediatric Research Ethics Board of the McGill University Health Centre Research Institute, and was supported by the Public Health Agency of Canada, the Canadian Perinatal Surveillance System, and Health Canada.
This article is an overview of the creation, content, and quality of the 2006 Canadian Birth-Census Cohort Database. Similar information about the 1996 Canadian Birth-Census Cohort Database is available in its user guide.Note 7
Births that met the inclusion criteria (within the study period; to women resident in Canada) were selected from a database of previously linked live birth, infant death and stillbirth records. Those records were matched to 2006 Census records from both short- and long-form questionnaires and then restricted to those from long-form questionnaires to create the 2006 Canadian Birth-Census Cohort Database. The match to all census records was performed to reduce the possibility of false positive matches to a long-form record, as birth records best matched to a short-form census record were flagged as ineligible for inclusion in the cohort. It also allowed evaluation of the linkage rate of births to census data across various birth characteristics. To ensure respondent privacy, Statistics Canada employees involved in the process accessed only the identifying information required for linkage, not health-related information. When linkage was completed, identifying information was removed from the final analytical file.
Provincial and territorial vital statistics registrars collect information on live births, stillbirths and deaths in their respective jurisdictions. In collaboration with the registrars, Statistics Canada compiles this information into three national databases, which have been combined to create the Canadian Live Birth, Infant Death and Stillbirth Database. That database is a census of all live births and stillbirths, in which live births have been matched to the Canadian Mortality Data Base to identify infants who died during their first year (0 to 364 days). The database identifies four types of events: live births for which a death record within the first year was not found (births surviving to age 1); stillbirths; deaths during the first year for which a birth registration was found (infant deaths matched to a birth registration); and deaths during the first year for which a birth registration was not found (infant deaths not matched to a birth registration). The database contains information such as birth weight and gestational age; maternal and paternal age at child’s birth; and information about the death such as cause.
From the 1985 to 2008 Canadian Live Birth, Infant Death and Stillbirth Database, 687,340 records of children born in Canada from May 16, 2004 through May 15, 2006 (one day before census day) to mothers whose usual place of residence was Canada were selected. These were the in-scope births. The two-year period before census day was chosen to generate adequate sample size, while limiting the time elapsed between the birth date and census day. The latter is important for the analysis of associations between perinatal outcomes at the time of birth and time-varying characteristics captured on the census such as maternal education.
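The selection of in-scope births can be sketched as a date-window filter. This is a minimal illustration only. The `BirthRecord` layout and its field names are hypothetical, and the real database applies the residency criterion through its own coded variables. The sketch also computes the number of days between each birth and census day (May 16, 2006), which is the interval the two-year window was chosen to limit.

```python
from dataclasses import dataclass
from datetime import date

# Study window from the text: May 16, 2004 through May 15, 2006 inclusive;
# census day was May 16, 2006.
WINDOW_START = date(2004, 5, 16)
WINDOW_END = date(2006, 5, 15)
CENSUS_DAY = date(2006, 5, 16)


@dataclass
class BirthRecord:
    # Hypothetical, simplified layout; the real database has many more fields.
    record_id: int
    child_dob: date
    mother_resides_in_canada: bool


def is_in_scope(rec: BirthRecord) -> bool:
    """Both inclusion criteria: born inside the window, to a Canadian resident."""
    return WINDOW_START <= rec.child_dob <= WINDOW_END and rec.mother_resides_in_canada


def days_to_census(rec: BirthRecord) -> int:
    """Days elapsed between the birth and census day (at most about two years)."""
    return (CENSUS_DAY - rec.child_dob).days


records = [
    BirthRecord(1, date(2004, 5, 15), True),   # one day before the window opens
    BirthRecord(2, date(2004, 5, 16), True),   # first day of the window
    BirthRecord(3, date(2006, 5, 15), True),   # last day of the window
    BirthRecord(4, date(2006, 5, 16), True),   # census day itself: out of scope
    BirthRecord(5, date(2005, 1, 1), False),   # mother not resident in Canada
]
in_scope = [r for r in records if is_in_scope(r)]
print([r.record_id for r in in_scope])        # [2, 3]
print([days_to_census(r) for r in in_scope])  # [730, 1]
```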
2006 Census of Population
Using short- or long-form (20% sample) questionnaires, the 2006 Census of Population collected information on individuals living in all households that were enumerated. The short-form questionnaire collected each person’s name, address and postal code, date of birth, sex, marital status, mother tongue, and relationship to “Person 1” (head of household). The long-form questionnaire collected the short-form information, plus data on characteristics such as ethnicity, Aboriginal identity, education, and income.Note 8
Only people enumerated by the census could be linked to corresponding birth records. Reasons for not being enumerated included emigration, death, and census undercoverage. Net undercoverage of the 2006 Census was 2.7% for the population younger than age 5 (an estimated 47,213 children), 2% to 6% for women aged 20 to 44 (undercoverage was greater for women without a partner), and 10.6% for people living on Indian reserves and settlements that participated in the census.Note 9
A total of 30,537,738 individuals were in the 2006 Census data to which the births were initially linked.
Five key linkage variables were available in the Canadian Live Birth, Infant Death and Stillbirth Database: child’s date of birth (DOB), child’s sex, mother’s DOB, father’s DOB, and postal code of mother’s residence at the time of the child’s birth. Names were used only in conjunction with one or more of the above linkage variables. The child’s DOB was complete for all records; sex of the child, postal code and mother’s DOB were complete for almost all records (99%). The father’s information was less complete: year of birth was complete for 95% of births surviving to age 1; 73% for stillbirths; and 87% for infant deaths matched to a birth registration. The child’s DOB and sex could not be used as linkage variables for stillbirths or for infant deaths that occurred before census day, as those children would not have been enumerated in the census household. Nor was parental DOB or postal code at birth available as linkage variables for infant deaths not matched to a birth registration, because of the lack of birth registration data for those events.
Based on a series of linkage rules (referred to as waves) that were ordered hierarchically from most to least discriminatory, each person on the birth record—child, mother, father—was linked deterministically (by exact matches) to the census data. The strongest waves included perfect matches between records for at least two DOBs, the full postal code and child sex, and accounted for 71% of the total number of matches. Waves of lesser strength used names and allowed for links among fewer linkage variables. This was important because up to two years had passed between the day of birth and census day, and the family composition and place of residence listed on the birth record could have changed. The linkage strategy did not allow for individuals on the same birth record to be linked to multiple census households. When this occurred (rarely), the link based on the greater number and/or better quality of linkage variables was retained, and the other links were discarded.
Most waves involved a two-step match process. The initial match was as described above. The second match dealt with instances where not everyone from the birth record was found within a given census household during the initial match. The second step aimed to find, within such census households, the person or persons from the birth record who initially had not been matched. Not finding these other persons could reflect a situation such as a lone-parent family, but it could also have resulted from data error. Thus, the second match included permutations of the linkage variables such as month/day inversions.
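The wave logic and the two-step match can be illustrated with a deliberately small, hypothetical sketch. It assumes only two waves: a strong wave that requires the child's and mother's DOBs, the child's sex and the full postal code, and a weaker wave that drops the postal code. It also assumes dictionary-based records with invented field names, and a rule that a wave is accepted only when exactly one household matches. The real waves also used names and many more variable combinations. The second step searches the already-linked household for the unmatched father and allows a month/day inversion of his DOB.

```python
from datetime import date
from typing import Optional


def swap_month_day(d: date) -> Optional[date]:
    """Month/day inversion tried in the second step; None if not a valid date."""
    try:
        return date(d.year, d.day, d.month)
    except ValueError:
        return None


def find_member(hh, dob, sex, allow_inversion=False):
    """Return the id of a household member with this exact DOB and sex, if any."""
    if dob is None:
        return None
    wanted = {dob}
    inverted = swap_month_day(dob) if allow_inversion else None
    if inverted is not None:
        wanted.add(inverted)
    for m in hh["members"]:
        if m["dob"] in wanted and m["sex"] == sex:
            return m["pid"]
    return None


def match_child_and_mother(b, hh):
    """Both the child (DOB and sex) and the mother (DOB) must be found."""
    child = find_member(hh, b["child_dob"], b["child_sex"])
    mother = find_member(hh, b["mother_dob"], "F")
    return {"child": child, "mother": mother} if child and mother else None


def wave_strong(b, hh):
    """Two DOBs, full postal code and child sex agree exactly."""
    if hh["postal_code"] != b["postal_code"]:
        return None
    return match_child_and_mother(b, hh)


def wave_weak(b, hh):
    """Postal code ignored, since the family may have moved since the birth."""
    return match_child_and_mother(b, hh)


# Ordered from most to least discriminatory.
WAVES = [("strong", wave_strong), ("weak", wave_weak)]


def link(b, households):
    for name, wave in WAVES:
        hits = [(hh, wave(b, hh)) for hh in households]
        hits = [(hh, links) for hh, links in hits if links]
        if len(hits) == 1:  # assumption: only a unique household is accepted
            hh, links = hits[0]
            # Second step: look in the same household for the unmatched father,
            # allowing for a month/day inversion in his recorded DOB.
            links["father"] = find_member(hh, b["father_dob"], "M", allow_inversion=True)
            return name, hh["hh_id"], links
    return None


households = [
    {"hh_id": "H1", "postal_code": "K1A0B1", "members": [
        {"pid": "P1", "dob": date(2005, 3, 4), "sex": "F"},
        {"pid": "P2", "dob": date(1978, 11, 2), "sex": "F"},
        {"pid": "P3", "dob": date(1976, 9, 6), "sex": "M"}]},  # month/day inverted
    {"hh_id": "H2", "postal_code": "V5K0A1", "members": [
        {"pid": "P4", "dob": date(2004, 8, 20), "sex": "M"},
        {"pid": "P5", "dob": date(1985, 1, 30), "sex": "F"}]},
]
b1 = {"child_dob": date(2005, 3, 4), "child_sex": "F", "mother_dob": date(1978, 11, 2),
      "father_dob": date(1976, 6, 9), "postal_code": "K1A0B1"}
b2 = {"child_dob": date(2004, 8, 20), "child_sex": "M", "mother_dob": date(1985, 1, 30),
      "father_dob": None, "postal_code": "M5V2T6"}  # family has since moved
print(link(b1, households))  # ('strong', 'H1', {'child': 'P1', 'mother': 'P2', 'father': 'P3'})
print(link(b2, households))  # ('weak', 'H2', {'child': 'P4', 'mother': 'P5', 'father': None})
```

The second birth links only in the weaker wave because its postal code no longer matches. This is the kind of change in residence that the lesser waves were meant to accommodate.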
Lower linkage rates were achieved for stillbirths and infant deaths, and for births surviving to age 1 to mothers younger than 25. To improve these rates, some linkage constraints were relaxed, for example, by accepting a match on first name rather than on full name. Approximately 1,400 additional potential links to census households that had completed a long-form questionnaire were evaluated manually. The manually approved links raised the combined linkage rate for stillbirths and infant deaths by 10 percentage points, and the linkage rate for births to young mothers by one percentage point.
When linkage work was complete, a 100% manual review and verification of all in-scope records matched to a census household that had completed a long-form questionnaire showed that the overall false positive match rate was less than 1%.
Creation of analytical cohort
In-scope birth records linked to a census household that completed a long-form questionnaire were considered part of the analytical cohort. Person-, census-family- and economic-family-level census variables were assigned to each person on the birth record (child, mother, father) who had linked to an individual in a census household. Household- and dwelling-level census variables were added to each linked birth record as a whole because all persons on the birth record were presumed to reside in the same household.
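The way census variables were attached to each linked birth record can be sketched as follows. The sketch assumes hypothetical dictionaries keyed by census person and household identifiers. Person-level variables, which here stand in for the person-, census-family- and economic-family-level variables, are copied once per linked role with a role prefix. Household and dwelling variables are copied once for the whole record. A role that did not link contributes no variables, and a birth linked to a short-form household returns `None` because it is not part of the cohort.

```python
def build_cohort_record(birth_id, hh_id, links, persons, households):
    """Return one analytic record, or None if the linked household did not
    complete a long-form questionnaire."""
    hh = households[hh_id]
    if not hh["long_form"]:
        return None
    rec = {"birth_id": birth_id}
    # Person-level variables: only for the roles actually linked to a census person.
    for role in ("child", "mother", "father"):
        pid = links.get(role)
        for var, value in (persons[pid].items() if pid else []):
            rec[f"{role}_{var}"] = value
    # Household/dwelling variables: attached once to the whole birth record,
    # on the presumption that everyone on the record lives in that household.
    for var, value in hh["vars"].items():
        rec[f"hh_{var}"] = value
    return rec


# Hypothetical census extracts.
persons = {
    "P1": {"sex": "F"},
    "P2": {"education": "University degree", "aboriginal_identity": False},
}
households = {
    "H1": {"long_form": True, "vars": {"tenure": "Owned"}},
    "H9": {"long_form": False, "vars": {}},
}
links = {"child": "P1", "mother": "P2", "father": None}
print(build_cohort_record("B1", "H1", links, persons, households))
print(build_cohort_record("B2", "H9", links, persons, households))  # None
```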
Creation of a cohort weight
Because the analytical cohort was roughly a 20% sample of in-scope births, a cohort weight was generated to produce estimates about the characteristics of all in-scope births. The cohort weight was developed from the census household weight, calibrated to marginal totals for the in-scope births, to adjust for missed linkages.Note 10 Those marginal totals were based on known characteristics for which the linkage rates varied, including type of birth event, year of birth, province of birth, and maternal age group. A set of bootstrap weights that captured both sampling and stochastic variability was also developed to allow users to calculate the corresponding variance of an estimate.Note 10
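The calibration actually used is documented in Note 10. As an assumption, the sketch below uses raking (iterative proportional fitting), which is one common way to calibrate starting weights to several sets of marginal totals at once. The variables, categories and totals are hypothetical.

```python
def rake(records, weights, margins, max_iter=1000, tol=1e-10):
    """Raking (iterative proportional fitting) of starting weights to known totals.

    records: list of dicts holding categorical variables
    weights: starting weights, e.g. census household weights
    margins: {variable: {category: known in-scope total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for rec, wi in zip(records, w):
                totals[rec[var]] = totals.get(rec[var], 0.0) + wi
            # Scale every record so that this margin matches its target.
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / totals[rec[var]]
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
                w[i] *= factor
        if largest_adjustment < tol:
            break
    return w


# Hypothetical linked records and in-scope totals by event type and maternal age.
records = [
    {"event": "live birth", "age": "<25"},
    {"event": "live birth", "age": "25+"},
    {"event": "live birth", "age": "25+"},
    {"event": "stillbirth", "age": "<25"},
]
margins = {
    "event": {"live birth": 16, "stillbirth": 4},
    "age": {"<25": 7, "25+": 13},
}
weights = rake(records, [5.0] * 4, margins)
print([round(x, 6) for x in weights])  # [3.0, 6.5, 6.5, 4.0]
```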
All estimates were produced using SUDAAN 11.0.1. The SMCOUNT and SMCONF options were used to produce the small-proportion confidence intervals (CIs) proposed by Korn and GraubardNote 11 (also known as “exact” CIs or CIs based on the binomial distribution) for in-scope population estimates and unweighted cohort estimates. Weighted cohort estimates and their corresponding logit CIs were produced using the cohort weight and the bootstrap weights, respectively.Note 10
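Two kinds of interval are described in the preceding paragraph. The Korn and Graubard intervals build on the exact binomial (Clopper-Pearson) interval. For simple unweighted counts, such an interval can be computed with the Python standard library, as sketched below. This sketch does not reproduce the SUDAAN implementation. For weighted estimates, the sketch forms a logit-scale interval from the spread of the replicate estimates. It assumes the bootstrap variance is the mean squared deviation of the replicate estimates around their mean; the variance formula actually used for this cohort is given in Note 10. All data in the example are hypothetical.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed term by term in log space."""
    if x < 0:
        return 0.0
    if x >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q, log_nf = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    return sum(
        math.exp(log_nf - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_p + (n - k) * log_q)
        for k in range(x + 1)
    )


def clopper_pearson(x, n, alpha=0.05, iters=100):
    """Exact (Clopper-Pearson) interval for x events out of n, by bisection."""
    def solve(k, target):
        # binom_cdf(k, n, p) decreases in p; find the p where it equals target.
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper


def weighted_proportion(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def bootstrap_logit_ci(y, full_weights, replicate_weights, z=1.959964):
    """Weighted proportion with a logit-scale CI; the variance is estimated
    from replicate-weight estimates (assumed formula, see lead-in)."""
    p = weighted_proportion(y, full_weights)
    reps = [weighted_proportion(y, rw) for rw in replicate_weights]
    centre = sum(reps) / len(reps)
    var_p = sum((r - centre) ** 2 for r in reps) / len(reps)
    se_logit = math.sqrt(var_p) / (p * (1 - p))  # delta method on the logit scale
    logit_p = math.log(p / (1 - p))
    return p, expit(logit_p - z * se_logit), expit(logit_p + z * se_logit)


print(clopper_pearson(0, 10))  # lower 0.0, upper approximately 0.3085
y = [1, 0, 0, 0]
replicates = [[2, 0, 1, 1], [0, 2, 1, 1], [1, 1, 2, 0], [1, 1, 0, 2]]
print(bootstrap_logit_ci(y, [1.0] * 4, replicates))
```

Working on the logit scale keeps both limits inside (0, 1), which matters for the small proportions typical of perinatal outcomes.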
The initial match from in-scope birth records to all census records resulted in matches for 90% of the in-scope births. The linkage rates were 91% for births surviving to age 1, 76% for stillbirths, 80% for infant deaths matched to a birth registration, and 0% for infant deaths not matched to a birth registration (Table 1).
Stillbirths and infant deaths were less likely to be linked (fewer linkage variables available), as were births in British Columbia, the Northwest Territories and Nunavut, and births to mothers younger than age 25 or to mothers not born in Canada. Birth records missing a postal code, sex of child, age of mother, or maternal place of birth were also less likely to be linked to a census household (data not shown).
Representativeness of cohort
The 2006 Birth-Census Cohort consists of 135,426 linked records. Cohort membership was dependent on overall linkage rates and on the sampling strategy for the 2006 Census. In 2006, one in five occupied private dwellings in self-enumeration areas (householders completed questionnaires) received a long-form questionnaire. All dwellings in areas enumerated by canvassers (generally, remote and northern areas and most Indian reserves, Indian settlements, Indian government districts and “terres réservées”), and most persons in non-institutional collective dwellings (excluding children in orphanages and children's homes) received a long-form questionnaire.Note 12 This resulted in overrepresentation of certain groups in the long-form questionnaire sample.
Table 1 shows the number of in-scope births and the number of births (unweighted and weighted) in the cohort. Two sets of ratios are presented: the ratio of the percentage of the cohort to the percentage of in-scope records across selected birth characteristics, and the ratio of the percentage of the weighted cohort to the percentage of in-scope records. Categories with a ratio greater than 1 were more likely to be in the cohort. The ratios comparing the unweighted cohort to all in-scope births reflect the sampling strategy of the census: cohort members were more likely to be from Manitoba, Saskatchewan, Yukon, the Northwest Territories and Nunavut, from rural areas, or to be born to mothers younger than age 20; they were less likely to be from Prince Edward Island, to have mothers who were born outside of Canada, or to be from triplets or higher-order births. When the cohort weight was applied, the ratios comparing the weighted cohort to all in-scope births almost always rounded to 1.0.
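The representation ratios in Table 1 can be reproduced in principle with a short calculation. Each ratio is the percentage of the cohort in a category divided by the percentage of in-scope births in that category. The example counts and weights below are hypothetical.

```python
from collections import Counter


def percent_distribution(values, weights=None):
    """Percentage of records (or of summed weights) in each category."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = Counter()
    for value, w in zip(values, weights):
        totals[value] += w
    grand_total = sum(totals.values())
    return {k: 100.0 * t / grand_total for k, t in totals.items()}


def representation_ratios(cohort_values, in_scope_values, cohort_weights=None):
    """Share of the (optionally weighted) cohort in each category divided by
    that category's share of in-scope births; above 1 means over-represented."""
    cohort_pct = percent_distribution(cohort_values, cohort_weights)
    in_scope_pct = percent_distribution(in_scope_values)
    return {k: cohort_pct.get(k, 0.0) / pct for k, pct in in_scope_pct.items()}


# Hypothetical example: 80% urban among in-scope births, 70% in the cohort.
in_scope = ["urban"] * 80 + ["rural"] * 20
cohort = ["urban"] * 14 + ["rural"] * 6
print(representation_ratios(cohort, in_scope))  # approximately {'urban': 0.875, 'rural': 1.5}

# Weights that restore the in-scope shares bring both ratios back to about 1.0.
weights = [80 / 14] * 14 + [20 / 6] * 6
print(representation_ratios(cohort, in_scope, weights))
```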
Rates of five birth outcomes for the cohort and for all in-scope births were compared: preterm birth (less than 37 weeks), small-for-gestational age (SGA, sex-specific birth weight below the 10th percentile for gestational age), large-for-gestational age (LGA, sex-specific birth weight above the 90th percentile for gestational age), fetal mortality (gestational age of 20 or more weeks or birth weight of at least 500 grams), and infant mortality (death 0 to 364 days after birth). All outcomes were derived as described in the Perinatal Health Report 2008,Note 13 with SGA and LGA based on Canadian reference values.Note 14 Tables 2 and 3 show the rates of those outcomes across province of birth, maternal age at child’s birth, and maternal place of birth, with 95% CIs for the cohort estimates generated using the bootstrap weights. (Because a large percentage of stillbirth and infant death records lacked maternal place of birth, this variable was excluded from Table 3.) Rates for the in-scope population that fell outside the 95% CIs for the cohort are noted in the tables. All estimates met the minimum sample size requirement of 5 in both the numerator and the denominator.
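The outcome definitions and the minimum sample size rule can be written as simple functions. The percentile cut-offs below are placeholders for illustration only, not the Canadian reference values cited in the text. The function names and the `per` scaling parameter are assumptions of this sketch.

```python
# Hypothetical 10th/90th percentile birth weights (grams) by sex and completed
# weeks of gestation. Placeholder numbers, NOT the Canadian reference values.
HYPOTHETICAL_REF = {
    ("F", 36): (2200, 3300),
    ("F", 39): (2900, 3950),
    ("M", 39): (3000, 4100),
}


def classify_live_birth(ga_weeks, weight_g, sex, ref=HYPOTHETICAL_REF):
    """Preterm (<37 weeks), SGA (<10th percentile), LGA (>90th percentile)."""
    p10, p90 = ref[(sex, ga_weeks)]
    return {"preterm": ga_weeks < 37, "sga": weight_g < p10, "lga": weight_g > p90}


def counts_as_fetal_death(ga_weeks, weight_g):
    """A stillbirth counts toward fetal mortality at >= 20 weeks or >= 500 g."""
    return ga_weeks >= 20 or weight_g >= 500


def rate(events, denominator, per=1000, minimum=5):
    """Rate per `per` units; None (suppressed) unless both counts reach `minimum`."""
    if events < minimum or denominator < minimum:
        return None
    return per * events / denominator


print(classify_live_birth(39, 2800, "F"))  # {'preterm': False, 'sga': True, 'lga': False}
print(counts_as_fetal_death(19, 520))      # True, by the birth weight criterion
print(rate(3, 1000))                       # None: numerator below 5
print(rate(12, 1000))                      # 12.0
```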
The rates of preterm birth in the cohort were generally consistent with those for all in-scope births, in that the 95% CIs contained the rates for all in-scope births (Table 2). However, rates of SGA and LGA for the cohort and in-scope births differed across the three characteristics. Applying the cohort weight eliminated the differences by maternal age group and place of birth, but not for certain provinces.
Fetal mortality rates in the cohort differed slightly from those of all in-scope births for Prince Edward Island (no fetal deaths in the cohort) and Manitoba (Table 3). Applying the cohort weight did not adjust for the lack of cohort fetal deaths in Prince Edward Island.
The overall infant mortality rate was lower for the cohort, primarily because no census record could be found for infant deaths not matched to a birth registration. Applying the weight adjusted the overall estimated infant mortality rate to match that of the in-scope population, because the cohort weight was calibrated to infant death totals including the 685 not matched to a birth registration. However, the weight did not adjust for the lack of cohort infant deaths in Prince Edward Island.
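A small hypothetical example shows why calibration matters here. Infant deaths that cannot link are absent from the cohort numerator, which lowers the unweighted rate. Weights calibrated to in-scope totals that include those deaths restore the population rate. All counts below are invented for illustration.

```python
def weighted_infant_mortality(records):
    """Infant deaths per 1,000 live births; records are (died_in_first_year, weight)."""
    deaths = sum(w for died, w in records if died)
    live_births = sum(w for _, w in records)
    return 1000.0 * deaths / live_births


# Hypothetical cohort: 199 surviving births and 1 linked infant death.
cohort = [(False, 1.0)] * 199 + [(True, 1.0)]
print(weighted_infant_mortality(cohort))  # 5.0 (unweighted)

# Suppose the in-scope population had 994 survivors and 6 infant deaths,
# some of which could never link to the census. Calibrating each group's
# weights to its in-scope total restores the population rate of 6 per 1,000.
calibrated = [(False, 994 / 199)] * 199 + [(True, 6.0)]
print(weighted_infant_mortality(calibrated))  # approximately 6.0
```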
Almost all (97%) infant deaths not matched to a birth registration occurred in Ontario. Because of concerns about the quality of Ontario’s birth registration data,Note 13 the province tends to be excluded from most national estimates published by the Canadian Perinatal Surveillance System. Ontario births were also excluded entirely from the 1996 Canadian Birth-Census Cohort Database, both because critical linkage variables were missing from the birth records and because of the documented data quality concerns for that period. Excluding Ontario raises the cohort’s overall infant mortality rate (Figure 1). However, the exclusion makes little difference to the weighted cohort estimates, because the cohort weight adjusts for the missing 685 infant deaths.
Perinatal outcomes by maternal census characteristics
Rates for all five birth outcomes were calculated by the mother’s ethnocultural background and her highest level of education. Ethnocultural background was grouped into three categories: Aboriginal identity; visible minority (Chinese, South Asian, Black, Filipino, Latin American, Southeast Asian, Arab, West Asian, Korean, Japanese, other visible minority, multiple visible minority); and neither Aboriginal nor visible minority. Highest level of maternal education, based on most advanced certificate, diploma or degree, was grouped into four categories: less than secondary graduation, secondary graduation, postsecondary certificate or diploma (short of a university bachelor’s degree, including trades certificates), and university degree (bachelor’s degree or higher).
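The category groupings can be expressed as lookup tables. The visible minority groups are taken from the text. The education source codes are hypothetical stand-ins for the census "highest certificate, diploma or degree" categories. Aboriginal identity is checked first; this is consistent with the census definition, under which Aboriginal persons are not counted as a visible minority.

```python
from typing import Optional

VISIBLE_MINORITY_GROUPS = {
    "Chinese", "South Asian", "Black", "Filipino", "Latin American",
    "Southeast Asian", "Arab", "West Asian", "Korean", "Japanese",
    "Other visible minority", "Multiple visible minority",
}

# Hypothetical source codes for the highest certificate, diploma or degree.
EDUCATION_GROUPS = {
    "no_certificate": "Less than secondary graduation",
    "secondary": "Secondary graduation",
    "trades": "Postsecondary certificate or diploma",
    "college": "Postsecondary certificate or diploma",
    "university_below_bachelor": "Postsecondary certificate or diploma",
    "bachelor": "University degree",
    "above_bachelor": "University degree",
}


def ethnocultural_group(aboriginal_identity: bool, visible_minority: Optional[str]) -> str:
    """Collapse census responses into the three ethnocultural categories."""
    if aboriginal_identity:
        return "Aboriginal identity"
    if visible_minority in VISIBLE_MINORITY_GROUPS:
        return "Visible minority"
    return "Neither Aboriginal nor visible minority"


def education_group(code: str) -> str:
    """Collapse a (hypothetical) education code into the four categories."""
    return EDUCATION_GROUPS[code]


print(ethnocultural_group(False, "Filipino"))        # Visible minority
print(ethnocultural_group(False, None))              # Neither Aboriginal nor visible minority
print(education_group("university_below_bachelor"))  # Postsecondary certificate or diploma
```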
Cohort rates of preterm and LGA birth and fetal and infant mortality were higher, and the rate of SGA was lower, among mothers who reported Aboriginal identity than among non-Aboriginal mothers (Tables 4 and 5). Applying the cohort weight produced similar results, although for LGA and infant mortality, the differences between Aboriginal and non-Aboriginal mothers were somewhat attenuated. Births to mothers from a visible minority had higher rates of SGA and lower rates of LGA than births to other mothers. Rates of fetal and infant mortality were similar among mothers from a visible minority and mothers who were neither Aboriginal nor visible minority. Excluding Ontario had little effect on the patterns in the unweighted and weighted rates of infant mortality across maternal ethnocultural categories (Figure 2 presents weighted rates).
Cohort rates of preterm, SGA and LGA birth, fetal mortality and infant mortality were lower at higher levels of maternal education. Applying the cohort weight yielded a similar pattern for SGA, preterm birth and infant mortality, but flattened the gradient for LGA and lessened the gradient for fetal mortality. Excluding Ontario had little effect on the patterns in rates across maternal levels of education (data not shown).
The purpose of the 2006 Canadian Birth-Census Cohort Database is to provide information on the nature and extent of variations in perinatal health across socioeconomic and ethnocultural groups. The results of this analysis suggest that the cohort can help accomplish this goal.
Cohort eligibility was dependent on linkage rates to the census, and on the census sampling of households for the long-form questionnaire. Despite an overall linkage rate of 90%, variations emerged across certain characteristics. A substantial number of those differences were reduced or eliminated by applying the cohort weight; the resulting weighted cohort estimates were consistent with those of all in-scope births.
Differences in birth outcomes across the selected socioeconomic and ethnocultural characteristics were similar to those based on other data sources. The higher rates of preterm birth, LGA, and fetal and infant mortality for births to Aboriginal mothers are consistent with other studies,Note 15Note 16Note 17 as are the higher rates of preterm birth and infant mortality among mothers with lower educational attainment.Note 18Note 19Note 20 Although these general patterns held whether or not the cohort weight was applied, the importance of applying the weight was apparent for estimates directly affected by census long-form oversampling of remote northern areas and most Indian reserves.
Strengths and limitations
The 2006 Canadian Birth-Census Cohort has several important strengths. It is population-based with a large sample and a cohort weight that permits inference about the population of births that the cohort represents. This allows for detailed analyses of perinatal outcomes by characteristics including education, income, ethnicity and Aboriginal identity. Contextual effects (such as neighbourhood) and the health effects of environmental exposuresNote 21Note 22 can also be examined. Analyzing results for the 2006 cohort together with the 1996 cohort will reveal the extent to which differences in perinatal outcomes across socioeconomic and ethnocultural groups changed over that 10-year period.
Analysis of the cohort involves a number of limitations. Among the linked records, not all individuals on the birth record were found in the same census household, resulting in missing information for some children, mothers or fathers. Furthermore, the validity of the cohort estimates across the socioeconomic and ethnocultural characteristics could not be evaluated as easily as the cohort estimates of perinatal outcomes across birth characteristics. The latter could be compared directly with the rates for all in-scope births, whereas the former relies on comparisons with the findings of other studies.
To further assess the cohort’s face validity, a subsample analysis was undertaken. Quebec is the only province to report information about maternal education to the national birth database. Consequently, it was possible to examine birth outcomes by maternal education for cohort members born in Quebec, and to compare them with rates for all in-scope births in Quebec. Across levels of maternal education, rates of preterm birth, SGA and LGA for the cohort were comparable to those calculated for all in-scope births in Quebec; applying the cohort weight further reduced differences (data not shown). However, small sample sizes resulted in wide confidence intervals for estimates of fetal and infant mortality. Thus, potential bias for fatal outcomes across socioeconomic measures remains a concern because of the relatively low linkage rates for stillbirths and infant deaths.
With two years of birth data and a broad range of socioeconomic and ethnocultural characteristics now linked at the individual level, the 2006 Canadian Birth-Census Cohort offers information that can help inform perinatal surveillance and research in Canada, particularly with respect to non-fatal outcomes.
The authors are grateful to Statistics Canada employees Martin Lessard, James Brennan and Patrick Gallifa for performing the data linkage; Wei Qian for developing a cohort weight; Lauren Pinault, Jessica Pembroke, Raymond Reaume, and Zimei Zhang for manual review and verification; and Michael Tjepkema and Julie Bernier for managerial oversight and support. Funding for this study was provided by the Canadian Institutes of Health Research (MOP-111122), with support from Statistics Canada and Health Canada (Air Health Effects Research, Population Studies Division).