Statistics Canada

Data quality, concepts and methodology: Quality of demographic data



Notes related to the quality of demographic estimates

In this case, the adjustment for the census net undercoverage also includes the incompletely enumerated Indian reserves.

Unless otherwise noted, the term "preliminary" includes both preliminary and updated estimates.

The estimates contain certain inaccuracies stemming from two types of errors:

  1. errors in the Census data;
  2. imperfections in other data sources and the method used to estimate the components.

Census Data

A. Coverage, response and imputation errors

The errors attributable to census data can be divided into two groups: response and processing errors, and coverage errors. The first group includes non-response errors, misinterpretation by respondents, incorrect coding and non-response imputation. Errors in the second group primarily result from census net undercoverage (CNU), which is the difference between undercoverage and overcoverage. It should be noted that both types of errors are intrinsic to any survey data.

Coverage errors occur when dwellings and/or individuals are missed, incorrectly included (except for the 2006 Census, where persons incorrectly included were not considered in the Census Overcoverage Study) or counted more than once. Following each census, Statistics Canada undertakes coverage studies to measure these errors. The main studies are the Reverse Record Check Survey (RRC) and the Census Overcoverage Study (COS). Based on these studies, estimates of undercoverage and overcoverage are produced for each province and territory. Demography Division adjusts the population enumerated in the census by province and territory using these estimates. At the subprovincial level, these rates were applied to all geographic regions in the province or territory by age and sex.
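As a rough illustration of the adjustment described above, the sketch below adds an estimate of net undercoverage (undercoverage minus overcoverage) to an enumerated count. The function name and all figures are hypothetical and are not taken from any coverage study; the actual adjustment is carried out by province and territory, and by age and sex.

```python
def adjust_for_cnu(enumerated, undercoverage, overcoverage):
    """Return (net undercoverage, adjusted population).

    Assumption: all three inputs are person counts for the same
    geographic area and the same age-sex group.
    """
    cnu = undercoverage - overcoverage  # census net undercoverage
    return cnu, enumerated + cnu


# Hypothetical figures, for illustration only.
cnu, adjusted = adjust_for_cnu(1_000_000, 40_000, 15_000)
print(cnu, adjusted)
```

A negative CNU would indicate that overcoverage exceeded undercoverage, in which case the adjusted population would be lower than the enumerated count.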

During the process of developing base populations, an attempt was made to correct only coverage errors in the base population. However, the correction, based on the results of the coverage studies and on modeling of overcoverage for provinces and territories by age and sex prior to 1991, was also subject to sampling, collection, response and processing errors, and to uncertainty in the assumptions underlying the models. With respect to the coverage studies, statistical analysis concluded that the adjustment, although not without errors itself, improved the quality of census data (Royce, 1993). They were deemed to be consistent over time and across geographical areas, and to provide logical results. Users should also be aware that when census net undercoverage (CNU) rates are calculated for small areas, the underlying assumptions may well be violated. If so, the resulting CNU rate would be misleading. Errors associated with these assumptions are, however, very difficult to quantify.

Nevertheless, the corrections to the census data due to CNU improved, in general, the quality of the estimates by compensating for the differential undercoverage by age, sex and province/territory across censuses.

The adjustment also incorporates the results of a study on the estimates of the number of people living on incompletely enumerated Indian reserves to complete the corrections for coverage errors in the census. The results of the coverage studies contain mainly sampling errors.

These adjustments have a direct impact on:

  1. the error of closure and its distribution by age and sex within a province or a territory, as well as by province/territory, since the CNU and its distribution vary from one census to another;
  2. within-cohort consistency of population estimates. If, for example, the male cohort aged 0-4 in 1981 were tracked up to the 2001 Census (unadjusted for CNU), the 20-24 age group in 2001 would be noticeably smaller than the 15-19 age group in 1996. Since Canada receives many immigrants in these age groups, the opposite would be expected. Only after adjustment for CNU does the cohort size increase from 1996 to 2001.

For further information regarding the main coverage studies, please see the following document on Statistics Canada's web site: 1996, 2001 and 2006 Census Technical Report on Coverage.

B. Components

Errors due to estimation methodologies and data sources other than the census can also be significant.

a. Births and deaths

Since the law requires the recording of vital statistics, the final estimates for births and deaths data meet very high quality standards. Nevertheless, since preliminary estimates are derived, they can be slightly different from final estimates.

b. Immigration and non-permanent residents

With respect to immigrants and non-permanent residents (NPRs), Citizenship and Immigration Canada (CIC) administers special data files on both of these components. Since immigration is controlled by law, data on immigrants and NPRs are compiled upon arrival in Canada. These data represent only "legal" immigration and exclude illegal immigrants. Thus, for the "legal" part of international movement into Canada, the data are considered to be of high quality. However, some biases, such as the difference between the intended destination at the time of arrival and the actual destination, may exist. Finally, since information provided by the Visitor Data System (VDS) from CIC is not complete (age and sex of dependents, province of residence for certain groups of permit holders), estimates of NPRs are more prone to error than data on immigrants.

c. Emigration, returning emigration and net temporary emigration

Of all the demographic components that are used in the population estimates program, these components are the most difficult to estimate with precision. Canada does not have a complete border registration system. While immigration and non-permanent residents (NPRs) are well documented by the federal government, Statistics Canada has always used indirect techniques to estimate the number of persons leaving the country. For this reason, available statistics regarding these three components have historically been of lower quality than those for other components.

Estimates of the number of emigrants and returning emigrants are both derived using Canada Child Tax Benefit (CCTB) data provided by Canada Revenue Agency (CRA). Data are adjusted to take into account the incomplete coverage of the program and to derive the emigration and returning emigration of adults.

These adjustments and the delay in obtaining the data are the two main sources of errors.

As current information on the number of persons living temporarily abroad does not exist, estimates are based on the Reverse Record Check (RRC) and the census. Estimates for the intercensal period, distributed equally among the five years, are maintained constant for the postcensal period. Moreover, assumptions were made to allow for the distribution of national data by subprovincial regions. Any geographical or quarterly variation may introduce error in the estimation of these components.
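The treatment of net temporary emigration described above can be sketched as follows: an intercensal total is spread equally over the five years of the period, and the resulting annual level is held constant for each postcensal year. The function name and the figures are hypothetical, and the sketch ignores the subprovincial distribution mentioned in the text.

```python
def annual_net_temporary_emigration(intercensal_total, postcensal_years,
                                    years_in_period=5):
    """Spread an intercensal total equally over the period, then
    carry the same annual level forward for the postcensal years."""
    annual = intercensal_total / years_in_period
    intercensal = [annual] * years_in_period
    postcensal = [annual] * postcensal_years
    return intercensal, postcensal


# Hypothetical total of 100,000 persons over the intercensal period.
intercensal, postcensal = annual_net_temporary_emigration(100_000, 3)
print(intercensal)
print(postcensal)
```

Because every year receives the same value, any real year-to-year variation shows up as error in the estimates, which is the limitation the text points out.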

d. Interprovincial migration and intraprovincial migration

Since July 1993, preliminary interprovincial migration estimates have been based on Canada Child Tax Benefit (CCTB) files. Under this program, only 76% of children aged 0-17 at the Canada level were entitled to benefits on July 1, 2001. Consequently, preliminary CCTB-based estimates are subject to larger error than final estimates derived from Canada Revenue Agency (CRA) tax files. Since the two estimates of interprovincial migration are produced from different sources, they are more subject to precocity errors.

Moreover, as no preliminary data is available for subprovincial migration, we assume the same level of migration as the previous year. The last two years are therefore identical for this component.

C. Geographical changes

Subprovincial geographical boundaries may change from one census to another. In order to facilitate chronological studies, population estimates for CDs, CMAs and ERs were produced for the 1996 to 2009 period according to boundaries delineated in the 2006 Census.

In order to clarify the demographic significance of geographical boundary changes, the 2001 population counts are converted to 2006 geographical boundaries. The converted counts are then compared with the population counts of the 2001 Census in 2001 geographical boundaries. The data presented here apply to the population enumerated in the 2001 Census, without adjustment for census net undercoverage.

Census metropolitan areas (CMAs)

Among the 27 CMAs as defined in the 2001 Census, 7 underwent geographical boundary changes in the 2006 Census. Had the latter been applied in 2001, the population of all 27 CMAs would have reached 19,360,000 instead of 19,297,000, a slight increase of 63,000 persons or 0.3%.
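The CMA figures above can be checked with a short calculation. The two population totals are taken from the text; the function name is illustrative only.

```python
def boundary_change_effect(pop_old_boundaries, pop_new_boundaries):
    """Return the absolute and relative (%) change in population
    attributable to a change of geographical boundaries."""
    change = pop_new_boundaries - pop_old_boundaries
    pct = 100 * change / pop_old_boundaries
    return change, pct


# 2001 Census population of the 27 CMAs, in 2001 and 2006 boundaries.
change, pct = boundary_change_effect(19_297_000, 19_360_000)
print(change, round(pct, 1))  # 63000 0.3
```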

In one CMA, the demographic repercussion of boundary changes was more pronounced: in Sherbrooke, the relative gain attributable to the boundary change reached 14.4%. In the remaining cases (Québec, Montréal, Ottawa-Gatineau, London, Winnipeg and Calgary), boundary changes had a negligible effect on population, of less than 1%.

Census divisions (CDs)

Boundary changes affected 33 of the 288 CDs in Canada, and the population of 14 CDs was only slightly affected, with relative gains or losses not exceeding 0.1%.

Boundary changes greatly impacted population numbers in nine CDs located in Quebec. The largest loss was recorded in Lajemmerais, at 36.1%, followed in decreasing order by La-Vallée-du-Richelieu (-19.9%) and Shawinigan (-19.7%). Finally, the following CDs registered the highest gains: Lévis with 54.8% (a new CD created from two CDs of the 2001 Census, Desjardins and Chute-de-la-Chaudière), Maskinongé (49.5%), Longueuil (19.3%), Nouvelle-Beauce (18.8%), Bellechasse (12.9%) and Coaticook (11.0%).

Quality assessment

In order to assess the quality of our estimates, two evaluation measures are used: precocity errors and errors of closure.

A. Precocity errors

The quality of preliminary estimates of components is analyzed using precocity errors. Precocity error is defined as the difference between the preliminary and final estimate of a particular component in terms of its relative proportion of the total population of the relevant geographical area. It can be calculated for both population and component estimates.
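The definition above can be written as a small calculation. This is a minimal sketch: the sign convention (preliminary minus final), the expression of the result as a percentage, and all figures are assumptions made for illustration, not details given in the text.

```python
def precocity_error(preliminary, final, total_population):
    """Difference between the preliminary and final estimates of a
    component, expressed as a percentage of the total population of
    the relevant geographical area.

    Assumed sign convention: positive when the preliminary estimate
    is higher than the final estimate.
    """
    return 100 * (preliminary - final) / total_population


# Hypothetical component estimates for an area of 1,000,000 persons.
print(precocity_error(12_500, 12_000, 1_000_000))
```

Dividing by the total population is what makes the measure comparable across components and across areas of different population size.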

Precocity error allows for useful comparisons between components, as well as between different geographical levels of different population size. Note that when compared to the total population for an area, the differences between preliminary and final estimates of the components are quite small. However, this type of error has a different impact on each component and geographical area.

Generally speaking, net interprovincial and subprovincial migration yields the greatest precocity errors. This is likely the result of the use of different data sources for preliminary and final estimates. In most years and for most provinces/territories, births, deaths and immigration estimates yielded the smallest precocity errors. For immigration estimates, this reflects the completeness of the data source and the availability of data for the more timely preliminary estimates. In the case of births and deaths, small precocity errors can be explained by the use of short-term projections for preliminary estimates.

According to the analysis of the most recent precocity errors and assuming that the quality of the basic data remains constant, the present postcensal estimates should have an acceptable degree of reliability.

B. Errors of closure

The error of closure measures the accuracy of the final postcensal estimates. It can be defined as the difference between the most current postcensal population estimates as of Census Day and the enumerated population of the most recent census (after adjustments for census net undercoverage (CNU)).

The error of closure comes from two sources: the relative differences in the amount of CNU between censuses, and errors in the components of demographic growth over the intercensal period. The error of closure can be calculated for total population estimates and by age and sex. For each five-year intercensal period, it can only be calculated once census data and estimates of CNU have been released.

When the error of closure is divided by the census population adjusted for CNU, the differences are relatively small at the national level (0.16% for 2001 and 0.32% for 2006). At the provincial and territorial level, as at the subprovincial level, differences are understandably larger, since the estimates are also affected by errors in estimating interprovincial and subprovincial migration. Nevertheless, the provincial/territorial final postcensal estimates generally fall within 1% of the adjusted census population, with the exception of the territories and a few other cases.
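The error of closure and its relative form can be sketched as follows. The sign convention (postcensal estimate minus adjusted census population) is an assumption, and the figures are hypothetical rather than actual estimates.

```python
def error_of_closure(postcensal_estimate, census_adjusted):
    """Return the error of closure and its size relative (%) to the
    census population adjusted for CNU.

    Both inputs refer to Census Day. Assumed sign convention:
    positive when the postcensal estimate is too high.
    """
    error = postcensal_estimate - census_adjusted
    relative = 100 * error / census_adjusted
    return error, relative


# Hypothetical province: the estimate overshoots by 3,000 persons.
error, relative = error_of_closure(1_003_000, 1_000_000)
print(error, round(relative, 2))
```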
