Appendix A Glossary
Appendix B Postal codes by federal electoral district
Appendix C Hierarchy of standard geographic units for dissemination, 2006 Census
Appendix D Geographic units by province and territory, 2006 Census
Appendix E Data quality, sampling and weighting, confidentiality and random rounding
Adjusted counts
'Adjusted counts' refer to previous census population and dwelling counts that
were adjusted (i.e., recompiled) to reflect current census boundaries, when a
boundary change occurs between the two censuses.
Block-face
A block-face is one side of a street between two consecutive features
intersecting that street. The features can be other streets or boundaries of
standard geographic areas.
Block-faces are used for generating block-face representative points, which in turn are used for geocoding and census data extraction when the street and address information are available.
Cartographic boundary files
Cartographic boundary files (CBFs) contain the boundaries of standard
geographic areas together with the shoreline around Canada. Selected inland
lakes and rivers are available as a supplementary layer.
Census agricultural region
Census agricultural regions (CARs) are composed of groups of adjacent census
divisions. In Saskatchewan, census agricultural regions are made up of groups
of adjacent census consolidated subdivisions, but these groups do not
necessarily respect census division boundaries.
Census consolidated subdivision
A census consolidated subdivision (CCS) is a group of adjacent census
subdivisions. Generally, the smaller, more urban census subdivisions (towns,
villages, etc.) are combined with the surrounding, larger, more rural census
subdivision, in order to create a geographic level between the census
subdivision and the census division.
Census division
Census division (CD) is the general term for provincially legislated areas
(such as county, municipalité régionale de comté and regional district) or their equivalents. Census divisions are intermediate
geographic areas between the province/territory level and the municipality
(census subdivision).
Census metropolitan area and census agglomeration
A census metropolitan area (CMA) or a census agglomeration (CA) is formed by
one or more adjacent municipalities centred on a large urban area (known as the
urban core). A CMA must
have a total population of at least 100,000 of which 50,000 or more must live
in the urban core. A CA must have an urban core population of at least 10,000. To be included in the CMA or CA, other adjacent
municipalities must have a high degree of integration with the central urban
area, as measured by commuting flows derived from census place of work data.
If the population of the urban core of a CA declines below 10,000, the CA is retired. However, once an area becomes a CMA, it is retained as a CMA even if its total population declines below 100,000 or the population of its urban core falls below 50,000. The urban areas in the CMA or CA that are not contiguous to the urban core are called the urban fringe. Rural areas in the CMA or CA are called the rural fringe.
When a CA has an urban core of at least 50,000, it is subdivided into census tracts. Census tracts are maintained for the CA even if the population of the urban core subsequently falls below 50,000. All CMAs are subdivided into census tracts.
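The population thresholds in this entry can be expressed as a small decision function. The sketch below is illustrative only. The `Candidate` type and `classify` function are hypothetical names, and the inputs are assumed to be known already: total population, urban core population from the previous census, and whether the area was already a CMA or already tracted. The commuting-flow test that decides which adjacent municipalities join the CMA or CA is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical summary of a candidate CMA/CA (not a Statistics Canada type)."""
    total_population: int
    core_population: int             # urban core population, previous census
    was_cma: bool = False            # already a CMA in an earlier census
    had_census_tracts: bool = False  # already subdivided into census tracts


def classify(c: Candidate) -> Tuple[Optional[str], bool]:
    """Return (area type, tracted?) using only the population thresholds.

    Area type is "CMA", "CA" or None (does not qualify, or is retired).
    """
    # Once an area is a CMA, it is retained as a CMA even if its populations fall.
    if c.was_cma or (c.total_population >= 100_000 and c.core_population >= 50_000):
        return "CMA", True  # all CMAs are subdivided into census tracts
    if c.core_population >= 10_000:
        # A CA is tracted once its core reaches 50,000, and then stays tracted.
        tracted = c.core_population >= 50_000 or c.had_census_tracts
        return "CA", tracted
    return None, False  # a CA whose urban core falls below 10,000 is retired


print(classify(Candidate(total_population=80_000, core_population=55_000)))
# ('CA', True)
```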
Census metropolitan area and census agglomeration influenced zone
The census metropolitan area and census agglomeration influenced zone (MIZ) is a concept that
geographically differentiates the area of Canada outside census metropolitan
areas (CMAs) and census agglomerations (CAs). Census subdivisions outside CMAs and CAs are assigned to one of
four categories according to the degree of influence (strong, moderate, weak or
no influence) that the CMAs and/or CAs have on
them.
Census subdivisions (CSDs) are assigned to a MIZ category based on the percentage of their resident employed labour force that has a place of work in the urban core(s) of CMAs or CAs. CSDs with the same degree of influence tend to be clustered. They form zones around CMAs and CAs that progress through the categories from 'strong' to 'no' influence as distance from the CMAs and CAs increases.
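A minimal sketch of the category assignment follows. The text does not give the percentage cut-offs, so the defaults of 30% (strong) and 5% (moderate) are assumptions that should be checked against the official MIZ definition before use. `miz_category` is a hypothetical helper; it compares in integer arithmetic so that values falling exactly on a cut-off are classified consistently.

```python
def miz_category(employed_labour_force: int, commuters_to_cores: int,
                 strong_pct: int = 30, moderate_pct: int = 5) -> str:
    """Assign a MIZ category from commuting to CMA/CA urban cores.

    The default cut-offs (30% and 5%) are assumptions, not taken from this text.
    """
    if employed_labour_force <= 0 or commuters_to_cores <= 0:
        return "no"  # nobody commutes to an urban core
    # Integer comparison: commuters / labour force >= pct / 100.
    share_x100 = commuters_to_cores * 100
    if share_x100 >= strong_pct * employed_labour_force:
        return "strong"
    if share_x100 >= moderate_pct * employed_labour_force:
        return "moderate"
    return "weak"
```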
Census subdivision
Census subdivision (CSD) is the general term for municipalities (as determined
by provincial/territorial legislation) or areas treated as municipal equivalents for statistical purposes (e.g., Indian reserves, Indian settlements and unorganized territories).
Census tract
Census tracts (CTs) are small, relatively stable geographic areas that usually
have a population of 2,500 to 8,000. They are located in census metropolitan
areas and in census agglomerations with an urban core population of 50,000 or
more in the previous census.
A committee of local specialists (for example, planners, health and social workers, and educators) initially delineates census tracts in conjunction with Statistics Canada. Once a census metropolitan area (CMA) or census agglomeration (CA) has been subdivided into census tracts, the census tracts are maintained even if the urban core population subsequently declines below 50,000.
Coordinate system
A coordinate system is a reference system based on mathematical rules for
specifying positions (locations) on the surface of the earth. The coordinate
values can be spherical (latitude and longitude) or planar (such as Universal
Transverse Mercator).
Cartographic boundary files, digital boundary files, representative points and road network files are disseminated in latitude/longitude coordinates.
Datum
A datum is a geodetic reference system that specifies the size and shape of the
earth, and the base point from which the latitude and longitude of all other
points on the earth's surface are referenced.
Designated place
A designated place (DPL) is normally a small community or settlement that does
not meet the criteria established by Statistics Canada to be a census
subdivision (an area with municipal status) or an urban area.
Designated places are created by provinces and territories, in cooperation with Statistics Canada, to provide data for submunicipal areas.
Digital boundary files
Digital boundary files (DBFs) portray the boundaries used for 2006 Census
collection and, therefore, often extend as straight lines into bodies of water.
Dissemination area
A dissemination area (DA) is a small, relatively stable geographic unit
composed of one or more adjacent dissemination blocks. It is the smallest
standard geographic area for which all census data are disseminated. DAs cover all the territory
of Canada.
Dissemination block
A dissemination block (DB) is an area bounded on all sides by roads and/or
boundaries of standard geographic areas. The dissemination block is the
smallest geographic area for which population and dwelling counts are
disseminated. Dissemination blocks cover all the territory of Canada.
Economic region
An economic region (ER) is a grouping of complete census divisions (CDs) (with
one exception in Ontario) created as a standard geographic unit for analysis of
regional economic activity.
Ecumene
Ecumene is a term used by geographers to mean inhabited land. It generally
refers to land where people have made their permanent home, and to all work
areas that are considered occupied and used for agricultural or any other
economic purpose. Thus, there can be various types of ecumenes, each with
its own characteristics (population ecumene, agricultural ecumene,
industrial ecumene, etc.).
Federal electoral district
A federal electoral district (FED) is an area represented by a member of the
House of Commons. The federal electoral district boundaries used for the 2006
Census are based on the 2003 Representation Order.
Geocoding
Geocoding is the process of assigning geographic identifiers (codes) to map features and data records. The resulting geocodes permit data to be linked geographically.
Households, postal codes and place of work data are linked to block-face representative points when the street and address information is available; otherwise, they are linked to dissemination block (DB) representative points. In some cases, postal codes and place of work data are linked to dissemination area (DA) representative points when they cannot be linked to DBs. As well, place of work data are linked to census subdivision representative points when the data cannot be linked to DAs.
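The fallback order described above can be sketched as a lookup that tries each permitted level in turn. The level names, the `FALLBACK_LEVELS` table and the `link_level` function are hypothetical. The sketch only encodes the order stated in the text (households stop at dissemination blocks, postal codes at dissemination areas, place of work data at census subdivisions) and assumes the candidate representative points have already been found.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

# Permitted levels per record type, from most to least precise.
FALLBACK_LEVELS = {
    "household": ["block_face", "dissemination_block"],
    "postal_code": ["block_face", "dissemination_block", "dissemination_area"],
    "place_of_work": ["block_face", "dissemination_block",
                      "dissemination_area", "census_subdivision"],
}


def link_level(record_type: str,
               candidates: Dict[str, Optional[Point]]) -> Tuple[str, Point]:
    """Return the most precise level (and its representative point) for a record.

    `candidates` maps a level name to a representative point, or to None (or no
    entry) when the record cannot be linked at that level, e.g. no street and
    address information for a block-face link.
    """
    for level in FALLBACK_LEVELS[record_type]:
        point = candidates.get(level)
        if point is not None:
            return level, point
    raise LookupError(f"{record_type} record cannot be linked at any permitted level")
```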
Geographic code
A geographic code is a numerical identifier assigned to a geographic area. The
code is used to identify and access standard geographic areas for the purposes
of data storage, retrieval and display.
Geographic reference date
The geographic reference date is a date determined by Statistics Canada for the
purpose of finalizing the geographic framework for which census data will be
collected, tabulated and reported. For the 2006 Census, the geographic
reference date is January 1, 2006.
Land area
Land area is the area in square kilometres of the land-based portions of standard geographic areas.
Land area data are unofficial, and are provided for the sole purpose of calculating population density.
Locality
'Locality' (LOC) refers to the historical place names of former census
subdivisions (municipalities), former designated places and former urban areas,
as well as to the names of other entities, such as neighbourhoods, post
offices, communities and unincorporated places.
Map projection
A map projection is the process of transforming and representing positions from
the earth's three-dimensional curved surface to a two-dimensional (flat)
surface. The process is accomplished by a direct geometric projection or by a
mathematically derived transformation.
The Lambert conformal conic map projection is widely used for general maps of Canada at small scales and is the most common map projection used at Statistics Canada.
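As an illustration of a mathematically derived transformation, the sketch below implements the standard spherical form of the Lambert conformal conic projection with two standard parallels. The default parameters (standard parallels 49°N and 77°N, central meridian 91°52'W, latitude of origin 63.390675°N) are assumed values for the Statistics Canada Lambert projection and should be verified. The sketch also treats the earth as a sphere and omits false easting and northing, so its output will not match coordinates computed on the ellipsoid.

```python
import math
from typing import Tuple


def lcc_forward(lat_deg: float, lon_deg: float,
                lat0_deg: float = 63.390675, lon0_deg: float = -91.866667,
                lat1_deg: float = 49.0, lat2_deg: float = 77.0,
                radius_m: float = 6_371_000.0) -> Tuple[float, float]:
    """Project (lat, lon) in degrees to planar (x, y) in metres.

    Spherical Lambert conformal conic, two standard parallels. Defaults are
    assumed parameters; no false easting/northing, sphere instead of ellipsoid.
    """
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    phi0, lam0 = math.radians(lat0_deg), math.radians(lon0_deg)
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)

    def t(p: float) -> float:
        return math.tan(math.pi / 4 + p / 2)

    # Cone constant n: the map angle equals n times the longitude difference.
    n = math.log(math.cos(phi1) / math.cos(phi2)) / math.log(t(phi2) / t(phi1))
    f = math.cos(phi1) * t(phi1) ** n / n
    rho = radius_m * f / t(phi) ** n     # radius of the projected parallel at phi
    rho0 = radius_m * f / t(phi0) ** n   # radius of the parallel through the origin
    theta = n * (lam - lam0)             # angle from the central meridian
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)
```

Because the projection is conformal and the scale is exact along the standard parallels, a short north-south step at 49°N should measure almost exactly its arc length on the sphere.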
National Geographic Database
The National Geographic Database (NGD) is a shared database between Statistics
Canada and Elections Canada. The database contains roads, road names and
address ranges. It also includes separate reference layers containing physical
and cultural features, such as hydrography and hydrographic names, railroads
and power transmission lines.
The NGD was created in 1997 as a joint Statistics Canada/Elections Canada initiative to develop and maintain a national road network file serving the needs of both organizations. The active building of the NGD (that is, integrating the files from Statistics Canada, Elections Canada and Natural Resources Canada) took place from 1998 to 2000. Thereafter, Statistics Canada and Elections Canada reconciled their digital boundary holdings with the new database's road network geometry so that operational products could be derived.
Since 2001, the focus of the NGD has been on intensive data quality improvements, especially in the quality and currency of its road network coverage. Road names and civic address ranges have been considerably expanded, and hydrographic names have been added. Priorities were determined by Statistics Canada and Elections Canada, enabling the NGD to meet the joint operational needs of both agencies in support of census and electoral activities.
Place name
'Place name' refers to the set of names that includes current census
subdivisions (municipalities), current designated places and current urban
areas, as well as the names of localities.
Population density
Population density is the number of persons per square kilometre.
Postal code
The postal code is a six-character code defined and maintained by Canada Post
Corporation for the purpose of sorting and delivering mail.
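The six-character structure can be checked with a regular expression. The pattern below assumes Canada Post's published format (letter, digit, letter, space, digit, letter, digit), with the letters D, F, I, O, Q and U never used and W and Z not used as the first character; these rules are not stated in this glossary. `normalize_postal_code` is a hypothetical helper that only checks the format, not whether the code is actually in use.

```python
import re
from typing import Optional

# Assumed Canada Post format rules: no D, F, I, O, Q or U anywhere,
# and no W or Z in the first position.
_POSTAL_CODE = re.compile(r"[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z]\d[ABCEGHJ-NPRSTV-Z]\d")


def normalize_postal_code(raw: str) -> Optional[str]:
    """Return the code as 'ANA NAN' if it is well formed, otherwise None."""
    compact = raw.replace(" ", "").upper()
    if not _POSTAL_CODE.fullmatch(compact):
        return None
    return f"{compact[:3]} {compact[3:]}"


print(normalize_postal_code("k1a0b1"))  # K1A 0B1
```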
Province or territory
Province and territory refer to the major political units of Canada. From a statistical point of view, province and territory are basic areas for which data are tabulated. Canada is divided into 10 provinces and three territories.
Reference map
A reference map shows the location of the geographic areas for which census
data are tabulated and disseminated. The maps display the boundaries, names and
codes of standard geographic areas, as well as major cultural and physical
features, such as roads, railroads, coastlines, rivers and lakes.
Representative point
A representative point is a point that represents a line or a polygon. The
point is centrally located along the line, and centrally located or population
weighted in the polygon.
Representative points are generated for block-faces, dissemination blocks, dissemination areas, census subdivisions, urban areas and designated places.
Households, postal codes and place of work data are linked to block-face representative points when the street and address information is available; otherwise, they are linked to dissemination block (DB) representative points. In some cases, postal codes and place of work data are linked to dissemination area (DA) representative points when they cannot be linked to DBs. As well, place of work data are linked to census subdivision representative points when the data cannot be linked to DAs.
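The entry does not say how population weighting is carried out. One plausible reading, sketched below as an assumption, is a population-weighted mean of the points of smaller units inside the polygon (for example, block-face points weighted by their population), falling back to a simple mean when no population is recorded. `population_weighted_point` is a hypothetical helper, and the coordinates are assumed to be planar (projected), since averaging latitude and longitude directly is only approximate.

```python
from typing import Iterable, Tuple


def population_weighted_point(parts: Iterable[Tuple[float, float, int]]) -> Tuple[float, float]:
    """Weighted mean of (x, y, population) points in projected coordinates."""
    parts = list(parts)
    if not parts:
        raise ValueError("need at least one point")
    total = sum(pop for _, _, pop in parts)
    if total == 0:
        # No population recorded: fall back to an unweighted (central) point.
        n = len(parts)
        return sum(x for x, _, _ in parts) / n, sum(y for _, y, _ in parts) / n
    return (sum(x * pop for x, _, pop in parts) / total,
            sum(y * pop for _, y, pop in parts) / total)


print(population_weighted_point([(0.0, 0.0, 100), (10.0, 0.0, 300)]))  # (7.5, 0.0)
```

A weighted mean can fall outside a concave polygon (for example, a crescent-shaped area), so a production method would also need to confirm that the point lies inside the polygon; that check is omitted here.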
Road network file
The road network file (RNF) contains roads, road names, address ranges and road
ranks for the entire country. Most commonly, address ranges are dwelling-based
and are mainly available in the large urban centres of Canada.
Rural area
Rural areas include all territory lying outside urban areas. Taken together,
urban and rural areas cover all of Canada.
Rural population includes all population living in the rural fringes of census metropolitan areas (CMAs) and census agglomerations (CAs), as well as population living in rural areas outside CMAs and CAs.
Spatial Data Infrastructure
The Spatial Data Infrastructure (SDI), formerly known as the National Geographic Base (NGB), is an internal maintenance database that is not disseminated outside Statistics Canada. It contains roads, road names and address ranges from the National Geographic Database (NGD), as well as boundary arcs of standard geographic areas that do not follow roads, all in one integrated line layer. The database also includes a related polygon layer of basic blocks (BBs), boundary layers of standard geographic areas, derived attribute tables, and reference layers from the NGD containing physical and cultural features (such as hydrography, railroads and power transmission lines). Basic blocks are the smallest polygon units in the database; they are formed by the intersection of all roads and the arcs of geographic areas that do not follow roads.
The SDI supports a wide range of census operations, such as the maintenance and delineation of the boundaries of standard geographic areas (including the automated delineation of dissemination blocks, dissemination areas and urban areas), and geocoding. The SDI is also the source for generating many geography products for the 2006 Census, such as cartographic boundary files and road network files.
Spatial data quality elements
Spatial data quality elements provide information on the fitness for use of a spatial database by describing why, when and how the data are created, and how accurate the data are. The elements include an overview describing the purpose and usage, as well as specific quality elements reporting on the lineage, positional accuracy, attribute accuracy, logical consistency and completeness. This information is provided to users for all spatial data products disseminated for the census.
Standard Geographical Classification
The Standard Geographical Classification (SGC) is Statistics Canada's official
classification for three types of geographic areas: provinces and territories,
census divisions (CDs) and census subdivisions (CSDs). The SGC provides unique numeric identification (codes) for these hierarchically related geographic areas.
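SGC codes are hierarchical, so a census subdivision code can be split into its province or territory, census division and census subdivision parts. The sketch assumes the conventional seven-digit layout (two digits for the province or territory, two for the census division, three for the census subdivision), which is not spelled out in this entry; `parse_csd_code` and `SGCCode` are hypothetical names, and the code in the usage line is an arbitrary example.

```python
from typing import NamedTuple


class SGCCode(NamedTuple):
    province: str     # 2-digit province/territory code
    division: str     # 2-digit census division code within the province
    subdivision: str  # 3-digit census subdivision code within the division

    @property
    def cd_uid(self) -> str:
        """Census division code, unique across Canada (province + division)."""
        return self.province + self.division


def parse_csd_code(code: str) -> SGCCode:
    """Split an (assumed) seven-digit SGC census subdivision code into its parts."""
    if len(code) != 7 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected 7 ASCII digits, got {code!r}")
    return SGCCode(code[:2], code[2:4], code[4:])


print(parse_csd_code("1234567"))
```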
Statistical Area Classification
The Statistical Area Classification (SAC) groups census subdivisions according
to whether they are a component of a census metropolitan area, a census
agglomeration, a census metropolitan area and census
agglomeration influenced zone (strong MIZ, moderate MIZ, weak MIZ or no MIZ), or the territories (Yukon, Northwest
Territories and Nunavut). The SAC is used for
data dissemination purposes.
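The SAC assignment can be sketched as a precedence rule. The function below is hypothetical and assumes that CMA or CA membership takes precedence, so that census subdivisions in the territories that belong to a CA are classified as CA rather than as 'territories'; the text does not state this ordering explicitly.

```python
from typing import Optional

MIZ_CATEGORIES = ("strong", "moderate", "weak", "no")


def sac_type(in_cma: bool, in_ca: bool, in_territories: bool,
             miz: Optional[str] = None) -> str:
    """Assign a CSD to a SAC category (assumed precedence: CMA, CA, territories, MIZ)."""
    if in_cma:
        return "CMA"
    if in_ca:
        return "CA"
    if in_territories:
        return "territories"
    if miz in MIZ_CATEGORIES:
        return f"{miz} MIZ"
    raise ValueError("a CSD outside CMAs, CAs and the territories needs a MIZ category")
```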
Thematic map
A thematic map shows the spatial distribution of one or more specific data
themes for standard geographic areas. The map may be qualitative in nature
(e.g., predominant farm types) or quantitative (e.g., percentage population change).
Urban area
An urban area has a minimum population concentration of 1,000 persons and a
population density of at least 400 persons per square kilometre, based on the
current census population count. All territory outside urban areas is
classified as rural. Taken together, urban and rural areas cover all of Canada.
Urban population includes all population living in the urban cores, secondary urban cores and urban fringes of census metropolitan areas (CMAs) and census agglomerations (CAs), as well as the population living in urban areas outside CMAs and CAs.
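The two urban-area thresholds can be sketched as a test on an area's population and land area. This is a simplification: actual delineation groups adjacent areas into population concentrations before applying the thresholds, which the hypothetical `is_urban` helper below does not attempt.

```python
def population_density(population: int, land_area_km2: float) -> float:
    """Persons per square kilometre."""
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_km2


def is_urban(population: int, land_area_km2: float,
             min_population: int = 1_000, min_density: float = 400.0) -> bool:
    """Apply the urban-area thresholds to one candidate population concentration."""
    return (population >= min_population
            and population_density(population, land_area_km2) >= min_density)


print(is_urban(1_200, 2.0), is_urban(1_200, 4.0))  # True False
```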
Urban core, urban fringe and rural fringe
'Urban core, urban fringe and rural fringe' distinguish between central and peripheral urban and rural areas within a census metropolitan area (CMA) or census agglomeration (CA).
'Urban core' is a large urban area around which a CMA or a CA is delineated. The urban core must have a population (based on the previous census) of at least 50,000 persons in the case of a CMA, or at least 10,000 persons in the case of a CA.
The urban core of a CA that has been merged with an adjacent CMA or larger CA is called the 'secondary urban core'.
'Urban fringe' includes all small urban areas within a CMA or CA that are not contiguous with the urban core of the CMA or CA.
'Rural fringe' is all territory within a CMA or CA not classified as an urban core or an urban fringe.
Urban population size group
The term 'urban population size group' refers to the classification used in
standard tabulations where urban areas are distributed according to the
following predetermined size groups, based on the current census population.
Tabulations are not limited to these predetermined population size groups; the census database has the capability of tabulating data according to any user-defined population size group.
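Because the census database can tabulate by any user-defined size group, assigning an urban area to a group amounts to a binary search over the groups' lower bounds. The bounds in the usage line are hypothetical examples, not the predetermined groups used in standard tabulations, and `size_group` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple


def size_group(population: int,
               lower_bounds: Sequence[int]) -> Optional[Tuple[int, Optional[int]]]:
    """Return (lower, upper_exclusive) of the group containing `population`.

    `lower_bounds` must be sorted ascending; the last group is open-ended
    (upper is None). Returns None if population is below the first bound.
    """
    i = bisect_right(lower_bounds, population) - 1
    if i < 0:
        return None
    upper = lower_bounds[i + 1] if i + 1 < len(lower_bounds) else None
    return lower_bounds[i], upper


bounds = [1_000, 2_500, 5_000, 10_000]  # hypothetical user-defined groups
print(size_group(2_500, bounds))  # (2500, 5000)
```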
Table B.1 Postal codes by federal electoral district
Figure C.1 Hierarchy of standard geographic units for dissemination, 2006 Census
Table D.1 Geographic units by province and territory, 2006 Census
The 2006 Census was a large and complex undertaking and, while considerable effort was made to ensure high standards throughout all collection and processing operations, the resulting estimates are inevitably subject to a certain degree of error. Users of census data should be aware that such error exists, and should have some appreciation of its main components, so that they can assess the usefulness of census data for their purposes and the risks involved in basing conclusions or decisions on these data.
Errors can arise at virtually every stage of the census process, from the preparation of materials through data processing, including the listing of dwellings and the collection of data. Some errors occur at random, and when the individual responses are aggregated for a sufficiently large group, such errors tend to cancel out. For errors of this nature, the larger the group, the more accurate the corresponding estimate. It is for this reason that users are advised to be cautious when using small area estimates. There are some errors, however, which might occur more systematically, and which result in 'biased' estimates. Because the bias from such errors is persistent no matter how large the group for which responses are aggregated, and because bias is particularly difficult to measure, systematic errors are a more serious problem for most data users than the random errors referred to previously.
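A small simulation makes the difference between random and systematic error concrete. In the hypothetical model below, each reported value equals the true value plus random noise and a fixed bias; as the group grows, the error of the group mean shrinks toward the bias, not toward zero. `aggregate_error` and its parameters are illustrative assumptions, not a model of any census process.

```python
import random


def aggregate_error(n: int, random_sd: float, bias: float, seed: int = 0) -> float:
    """Error of the group mean when each response has random noise plus a fixed bias."""
    rng = random.Random(seed)
    true_value = 1.0  # hypothetical: every unit's true value is 1
    reported = [true_value + rng.gauss(0.0, random_sd) + bias for _ in range(n)]
    return sum(reported) / n - true_value


for n in (10, 1_000, 100_000):
    print(n,
          round(aggregate_error(n, random_sd=1.0, bias=0.0), 4),   # random error only
          round(aggregate_error(n, random_sd=1.0, bias=0.05), 4))  # plus systematic error
```

Both calls for a given `n` use the same seed and hence the same noise, so the third column is about 0.05 larger than the second: the random part shrinks as `n` grows, while the bias persists.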
For census data in general, the principal types of error are as follows:
coverage errors, which occur when dwellings or individuals are missed, incorrectly enumerated or counted more than once
non-response errors, which result when responses cannot be obtained from a certain number of households and/or individuals, because of extended absence or some other reason, or when responses to certain questions are missing from an otherwise completed questionnaire
response errors, which occur when the respondent, or sometimes the census representative, misunderstands a census question, and records an incorrect response or simply uses the wrong response box
processing errors, which can occur at various steps, including coding, when 'write-in' responses are transformed into numerical codes; data capture, when responses are transferred from the census questionnaire into an electronic format, by optical character recognition methods or by key entry operators; and imputation, when a 'valid', but not necessarily correct, response is inserted into a record by the computer to replace missing or 'invalid' data ('valid' and 'invalid' referring to whether or not the response is consistent with other information on the record)
sampling errors, which apply only to the supplementary questions on the 'long form' asked of a one-fifth sample of households, and which arise from the fact that the responses to these questions, when weighted up to represent the whole population, inevitably differ somewhat from the responses which would have been obtained if these questions had been asked of all households.
The above types of error each have both random and systematic components. Usually, however, the systematic component of sampling error is very small in relation to its random component. For the other non-sampling errors, both random and systematic components may be significant.
Coverage errors affect the accuracy of the census counts, that is, the sizes of the various census universes: population, families, households and dwellings. While steps have been taken to correct certain identifiable errors, the final counts are still subject to some degree of error because persons or dwellings have been missed, incorrectly enumerated in the census or counted more than once.
Missed dwellings or persons result in undercoverage. Dwellings can be missed because of the misunderstanding of collection unit (CU) boundaries, or because either they do not look like dwellings or they appear uninhabitable. Persons can be missed when their dwelling is missed or is classified as vacant, or because the respondent misinterprets the instructions on whom to include on the questionnaire. Some individuals may be missed because they have no usual residence and did not spend census night in a dwelling.
Dwellings or persons incorrectly enumerated or double-counted result in overcoverage. Overcoverage of dwellings can occur when structures unfit for habitation are listed as dwellings (incorrectly enumerated), when there is some ambiguity regarding collection unit (CU) boundaries, or when units (for example, rooms) are listed separately instead of being treated as part of one dwelling (double-counted). Persons can be counted more than once because their dwelling is double-counted or because the guidelines on whom to include on the questionnaire have been misunderstood. Occasionally, someone who is not in the census population universe, such as a foreign resident or a fictitious person, may be incorrectly enumerated in the census. On average, overcoverage is less likely to occur than undercoverage; as a result, counts of dwellings and persons are likely to be slightly underestimated.
For the 2006 Census, three studies are used to measure coverage error. In the Dwelling Classification Study, dwellings listed as vacant were revisited to verify that they were vacant on Census Day, and dwellings whose households were listed as non-respondent were revisited to determine the number of usual residents and their characteristics. Adjustments have been made to the final census counts to account for households and persons missed because their dwelling was incorrectly classified as vacant. The census counts may also have been adjusted for dwellings whose households were classified as non-respondent. Despite these adjustments, the final counts still may be subject to some undercoverage. Undercoverage tends to be higher for certain segments of the population, such as young adults (especially young adult males) and recent immigrants. The Reverse Record Check Study is used to measure the residual undercoverage for Canada, and each province and territory. The Overcoverage Study is designed to investigate overcoverage errors. The results of the Reverse Record Check and the Overcoverage Study, when taken together, furnish an estimate of net undercoverage.
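The arithmetic that combines the two studies can be sketched as follows, assuming the usual definition of net undercoverage as estimated undercoverage minus estimated overcoverage, with the rate expressed relative to the estimated true population (census count plus net undercoverage). `net_undercoverage` is a hypothetical helper and the figures in the usage line are hypothetical.

```python
from typing import Tuple


def net_undercoverage(census_count: float, undercoverage: float,
                      overcoverage: float) -> Tuple[float, float]:
    """Return (net undercoverage, net undercoverage rate)."""
    net = undercoverage - overcoverage
    estimated_population = census_count + net  # census count adjusted for net coverage error
    return net, net / estimated_population


net, rate = net_undercoverage(census_count=1_000, undercoverage=50, overcoverage=10)
```

With these figures, net undercoverage is 40 persons and the rate is 40/1,040, or about 3.8%.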
While coverage errors affect the number of units in the various census universes, other errors affect the characteristics of those units.
Sometimes it is not possible to obtain a complete response from a household, even though the dwelling was identified as occupied and a questionnaire was mailed out or dropped off. The household members may have been away throughout the census period or, in rare instances, the householder may have refused to complete the form. More frequently, the questionnaire is returned but no response is provided to certain questions. Effort is devoted to obtaining questionnaires that are as complete as possible. Once the questionnaires are captured, edit analyses are performed to detect significant cases of partial non-response, and follow-up interviews are attempted to obtain the missing information. Despite this, a small number of responses are still missing at the end of the collection stage; these give rise to non-response errors. Although missing responses are eliminated during processing by replacing each one with the corresponding response from a 'similar' record, some potential imputation errors remain. This is particularly serious if the non-respondents differ in some respects from the respondents, since the procedure will then introduce a non-response bias.
Even when a response is obtained, it may not be entirely accurate. The respondent may have misinterpreted the question or may have guessed the answer, especially when answering on behalf of another, possibly absent, household member. The respondent may also have entered the answer in the wrong place on the questionnaire. Such errors are referred to as response errors. While response errors usually arise from inaccurate information provided by respondents, they can also result from mistakes by the census representative who completed certain parts of the questionnaire, such as the structural type of dwelling, or who followed up to obtain a missing response.
Some of the census questions require a written response. During processing, these 'write-in' entries are given a numeric code. Coding errors can occur when the written response is ambiguous, incomplete or difficult to read, or when the code list is extensive (e.g., major field of study, place of work). A formal quality control (QC) operation is used to detect, rectify and reduce coding errors. Within each work unit, a sample of responses is independently coded a second time. The resolution of discrepancies between the first and second codings determines whether recoding of the work unit is necessary. Census coding is now entirely automated, resulting in a reduction of coding errors.
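The quality control decision can be sketched as a comparison of an error rate against a tolerance. The sketch assumes that each sampled response is recorded as a pair (first code, code settled on after resolving any discrepancy between the first and second codings), and that a work unit is recoded when the first coding's error rate exceeds a tolerance. The data layout, the function names and the tolerance value are all hypothetical; the text does not give them.

```python
from typing import Sequence, Tuple


def first_coding_error_rate(sample: Sequence[Tuple[str, str]]) -> float:
    """Share of sampled responses where the first code differs from the resolved code.

    Each item is (first code, code settled on after resolving the first and
    second codings).
    """
    if not sample:
        raise ValueError("empty QC sample")
    errors = sum(1 for first, resolved in sample if first != resolved)
    return errors / len(sample)


def needs_recoding(sample: Sequence[Tuple[str, str]], tolerance: float) -> bool:
    """Recode the whole work unit if the sampled error rate exceeds the tolerance."""
    return first_coding_error_rate(sample) > tolerance
```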
The information on the questionnaires is scanned and captured into a computer file. To monitor data capture errors and ensure that their number stays within tolerable limits, a sample of fields is reprocessed and the two captures are compared. Unsatisfactory work is identified and corrected, and appropriate feedback is given to the system to minimize the occurrence of such errors.
Once captured, the data are edited: they undergo a series of computer checks to identify missing or inconsistent responses. These responses are replaced during the imputation stage of processing, where either a response consistent with the other data for that respondent is inferred or a response from a similar donor is substituted. Imputation produces a complete database in which the data correspond to the census counts, which facilitates multivariate analyses. Although errors may have been introduced during imputation, the methods used have been rigorously tested to minimize systematic errors.
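Donor substitution can be illustrated with a generic nearest-neighbour hot-deck: the missing value is copied from the donor that disagrees with the recipient on the fewest matching variables. This is a minimal sketch under stated assumptions (the record layout, the matching variables, the `impute_from_donor` name and the tie-breaking rule are all hypothetical) and not the census's actual imputation system.

```python
from typing import Dict, Optional, Sequence

Record = Dict[str, Optional[str]]


def impute_from_donor(recipient: Record, donors: Sequence[Record],
                      target: str, match_vars: Sequence[str]) -> Record:
    """Return a copy of `recipient` with `target` filled from the most similar donor."""
    eligible = [d for d in donors if d.get(target) is not None]
    if not eligible:
        raise ValueError(f"no donor has a value for {target!r}")

    def distance(donor: Record) -> int:
        # Number of matching variables on which donor and recipient disagree.
        return sum(donor.get(v) != recipient.get(v) for v in match_vars)

    best = min(eligible, key=distance)  # ties go to the first eligible donor
    filled = dict(recipient)            # leave the original record untouched
    filled[target] = best[target]
    return filled
```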
Various studies are being carried out to evaluate the quality of the responses obtained in the 2006 Census. For each question, non-response rates and edit failure rates have been calculated. These can be useful in identifying the potential for non-response errors and other types of errors. Also, tabulations from the 2006 Census have been or will be compared with corresponding estimates from previous censuses, from sample surveys (such as the Labour Force Survey) and from various administrative records (such as birth registrations and municipal assessment records). Such comparisons can indicate potential quality problems or at least discrepancies between the sources.
In addition to these aggregate-level comparisons, there are some micro-match studies in progress, in which census responses are compared with another source of information at the individual record level. For certain 'stable' characteristics (such as age, sex, mother tongue and place of birth), the responses obtained in the 2006 Census, for a sample of individuals, are being compared with those for the same individuals in the 2001 Census.
Estimates obtained by weighting up responses collected on a sample basis are subject to error due to the fact that the distribution of characteristics within the sample will not usually be identical to the distribution of characteristics within the population from which the sample has been selected.
The potential error introduced by sampling will vary according to the relative scarcity of the characteristics in the population. For large cell values, the potential error due to sampling, as a proportion of the cell value, will be relatively small. For small cell values, this potential error, as a proportion of the cell value, will be relatively large.
The potential error due to sampling is usually expressed in terms of the so-called 'standard error'. This is the square root of the average, taken over all possible samples of the same size and design, of the squared deviation of the sample estimate from the value for the total population.
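The definition can be computed literally for a toy population by enumerating every possible sample. The sketch assumes simple random sampling without replacement and the usual expansion estimator (N times the sample mean), neither of which is the actual 2006 Census design. It checks the brute-force result against the textbook formula N²(1 − n/N)S²/n, where S² is the population variance with divisor N − 1. The function names and the toy population are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt
from typing import Sequence


def exact_standard_error(population: Sequence[float], n: int) -> float:
    """Root of the average squared deviation of the estimated total from the
    true total, taken over all possible samples of size n."""
    big_n = len(population)
    true_total = sum(population)
    squared_devs = [
        (big_n * sum(s) / n - true_total) ** 2  # estimate = N * sample mean
        for s in combinations(population, n)
    ]
    return sqrt(sum(squared_devs) / comb(big_n, n))


def srs_formula_standard_error(population: Sequence[float], n: int) -> float:
    """Textbook variance of the estimated total under SRS without replacement."""
    big_n = len(population)
    mean = sum(population) / big_n
    s2 = sum((y - mean) ** 2 for y in population) / (big_n - 1)
    return sqrt(big_n ** 2 * (1 - n / big_n) * s2 / n)


toy = [0, 1, 1, 0, 1]  # hypothetical: 1 = household has the characteristic
print(exact_standard_error(toy, 2), srs_formula_standard_error(toy, 2))
```

For this toy population with samples of size 2, both functions give a standard error of 1.5 (up to floating-point rounding).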
The following table provides approximate measures of the standard error due to sampling for census long form (2B) data. These measures are intended as a general guide only.
Table E.1 Approximate standard error due to sampling for 2006 Census sample data
Users wishing to determine the approximate error due to sampling for any given cell of data, based upon the 20% sample, should choose the standard error value corresponding to the cell value that is closest to the value of the given cell in the census tabulation. When using the obtained standard error value, the user, in general, can be reasonably certain that, for the enumerated population, the true value (discounting all forms of error other than sampling) lies within plus or minus three times the standard error (e.g., for a cell value of 1,000, the range would be 1,000 ± [3 x 65] or 1,000 ± 195).
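The look-up and the plus-or-minus three standard error range can be sketched as below. `approximate_range` is a hypothetical helper. Only the pair quoted in the text (a cell value of 1,000 with a standard error of 65) is used here; the remaining rows would have to be copied from Table E.1.

```python
from typing import Mapping, Tuple


def approximate_range(cell_value: float, se_table: Mapping[float, float],
                      k: float = 3.0) -> Tuple[float, float]:
    """Return cell_value +/- k * SE, using the SE of the closest tabulated cell value."""
    nearest = min(se_table, key=lambda v: abs(v - cell_value))
    half_width = k * se_table[nearest]
    return cell_value - half_width, cell_value + half_width


table_e1 = {1_000: 65}  # only the pair quoted in the text; add rows from Table E.1
print(approximate_range(1_000, table_e1))  # (805.0, 1195.0)
```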
The standard errors given in the table above will not apply to population, household, dwelling or family counts for the geographic area under consideration (see Sampling and weighting below). The effect of sampling for these cells can be determined by a comparison with a corresponding 100% data product.
The effect of the particular sample design and weighting procedure used in the 2006 Census will vary, however, from one characteristic to another and from one geographic area to another. The standard error values in the table may, therefore, understate or overstate the error due to sampling.
The 2006 Census data were collected either from 100% of the households or on a sample basis, with the data weighted to provide estimates for the entire population. The long form questionnaire (2B) information was collected from a 20% random sample of households and weighted to compensate for sampling. All table headings are noted accordingly. Note that on Indian reserves and in remote areas, all data were collected on a 100% basis.
For any given geographic area, the weighted population, household, dwelling or family total or subtotal may differ from that shown in reports containing data collected on a 100% basis. Such variations are due to sampling and to the fact that, unlike sample data, 100% data do not exclude institutional residents.
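The sketch below shows what "weighted to compensate for sampling" means in the simplest case. Each household in a 1-in-5 sample gets a weight of 5, and a count is estimated by summing the weights. This is an assumption for illustration only: the population is synthetic and the helper name is hypothetical. The weighting procedure actually used for the 2006 Census was more elaborate than a single constant weight.

```python
import random

BASE_WEIGHT = 5  # inverse of the 20% (1-in-5) sampling fraction


def weighted_estimate(sampled_households, has_characteristic):
    """Estimate a population count by summing the weights of sampled
    households that have the characteristic."""
    return sum(BASE_WEIGHT for h in sampled_households if has_characteristic(h))


rng = random.Random(2006)
# Synthetic population of 1,000 households; every third one is flagged as a renter.
population = [{"id": i, "renter": i % 3 == 0} for i in range(1_000)]
sample = rng.sample(population, k=len(population) // 5)

estimate = weighted_estimate(sample, lambda h: h["renter"])
true_count = sum(h["renter"] for h in population)
print(estimate, true_count)  # the estimate differs from the true count by sampling error
```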
The figures shown in the tables have been subjected to a confidentiality procedure known as random rounding, to prevent the possibility of associating statistical data with any identifiable individual. Under this method, all figures, including totals and margins, are randomly rounded either up or down to a multiple of '5', and in some cases '10'. While providing strong protection against disclosure, this technique does not add significant error to the census data.
The user should be aware that totals and margins are rounded independently of the cell data. As a result, they may differ somewhat from the sum of the rounded cell data. Minor differences can also be expected in corresponding totals and cell values among various census tabulations. Similarly, percentages, which are calculated on rounded figures, do not necessarily add up to 100%.
Order statistics (median, quartiles, percentiles, etc.) and measures of dispersion such as the standard error are computed in the usual manner. When a statistic is defined as the quotient of two numbers (which is the case for averages, percentages and proportions), the two numbers are rounded before the division is performed. For income, owner's payments, value of dwelling, hours worked, weeks worked and age, the sum is defined as the product of the average and the rounded weighted frequency; otherwise, it is the weighted sum that is rounded.
Small cell counts may suffer significant distortion as a result of random rounding, and individual data cells containing small numbers may lose their precision. In addition, a statistic is suppressed if the number of actual records used in the calculation is less than 4 or if the sum of the weights of these records is less than 10. For values expressed in dollar units, the statistic is also suppressed if the range of the values is too narrow or if all values are, in absolute value, below a specified threshold. Finally, again for values expressed in dollar units, the statistic is suppressed if one dollar value is too large compared with all the others.
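The rounding rules can be sketched as follows. The document does not specify the rounding probabilities, so this sketch assumes the common unbiased variant. In that variant, a count is rounded up with probability equal to its remainder divided by the base. The function names and cell values are hypothetical. The sketch also computes percentages from a rounded numerator and a rounded denominator. It then applies the record-count and weight-sum test for suppressing a statistic.

```python
import random


def random_round(value, base=5, rng=random):
    """Randomly round a non-negative integer count to a multiple of `base`.

    Assumed unbiased variant: a value with remainder r is rounded up with
    probability r / base and down otherwise, so its expected rounded value
    equals the original. Values already a multiple of `base` are unchanged.
    """
    remainder = value % base
    if remainder == 0:
        return value
    lower = value - remainder
    return lower + base if rng.random() < remainder / base else lower


def statistic_suppressed(record_weights, min_records=4, min_weight_sum=10):
    """True if a statistic rests on fewer than 4 records or a total weight under 10."""
    return len(record_weights) < min_records or sum(record_weights) < min_weight_sum


rng = random.Random(2006)
cells = [12, 7, 3]  # hypothetical unrounded cell counts
rounded_cells = [random_round(c, rng=rng) for c in cells]
rounded_total = random_round(sum(cells), rng=rng)  # rounded independently of the cells

# Percentages use the rounded numerator and rounded denominator, so they
# need not add up to 100 and the cells need not add up to the total.
shares = [100 * c / rounded_total for c in rounded_cells]
print(rounded_cells, rounded_total, sum(shares))

print(statistic_suppressed([5, 5, 5]))     # True: only 3 records
print(statistic_suppressed([2, 2, 2, 2]))  # True: weights sum to 8
print(statistic_suppressed([5, 5, 5, 5]))  # False
```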
Users should be aware of possible data distortions when aggregating these rounded data. Imprecisions due to rounding tend to cancel each other out when data cells are re-aggregated. Nevertheless, users can minimize distortions by using, whenever possible, the appropriate subtotals when aggregating.
For those requiring maximum precision, the option exists to use custom tabulations. With custom products, aggregation is done using individual census database records. Random rounding occurs only after the data cells have been aggregated, thus minimizing any distortion.
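A small simulation can show the difference between adding up published rounded cells and rounding a custom aggregate once. It reuses the same assumed unbiased rounding rule, redefined here so the block stands alone, on hypothetical cell counts. With base 5, each rounded integer cell is off by at most 4. A sum of k independently rounded cells can therefore be off by up to 4k, whereas a total rounded once is off by at most 4.

```python
import random


def random_round(value, base=5, rng=random):
    """Assumed unbiased random rounding of a non-negative integer to a multiple of base."""
    remainder = value % base
    if remainder == 0:
        return value
    lower = value - remainder
    return lower + base if rng.random() < remainder / base else lower


rng = random.Random(42)
cells = [3, 7, 12, 18, 4, 9, 11, 2]  # hypothetical unrounded cell counts
true_total = sum(cells)

# Published-table route: add up cells that were each rounded independently.
sum_of_rounded = sum(random_round(c, rng=rng) for c in cells)

# Custom-tabulation route: aggregate the records first, then round once.
rounded_total = random_round(true_total, rng=rng)

print(true_total, sum_of_rounded, rounded_total)
```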
In addition to random rounding, area suppression has been adopted to further protect the confidentiality of individual responses.
Area suppression is the deletion of all characteristic data for geographic areas with populations below a specified size. The extent to which data are suppressed depends upon the following factors:
if the data are tabulated from the 100% database, they are suppressed if the total population in the area is less than 40
if the data are tabulated from the 20% sample database, they are suppressed if the total non-institutional population in the area from either the 100% or 20% database is less than 40.
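The two basic rules can be expressed as a small predicate. `area_suppressed` is a hypothetical helper, and its argument names are assumptions. It reads "from either the 100% or 20% database" as "suppress if either count is below 40".

```python
def area_suppressed(database, total_pop_100=None, noninst_pop_100=None,
                    noninst_pop_20=None, threshold=40):
    """Basic area-suppression rule for a standard geographic area.

    database is "100%" or "20%", the database the tabulation comes from.
    """
    if database == "100%":
        return total_pop_100 < threshold
    if database == "20%":
        # Suppressed if the non-institutional population is below the
        # threshold according to either database.
        return min(noninst_pop_100, noninst_pop_20) < threshold
    raise ValueError(f"unknown database: {database!r}")


print(area_suppressed("100%", total_pop_100=35))                        # True
print(area_suppressed("20%", noninst_pop_100=45, noninst_pop_20=38))    # True
print(area_suppressed("20%", noninst_pop_100=120, noninst_pop_20=110))  # False
```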
There are some exceptions to these rules:
income distributions and related statistics are suppressed if the population in the area, excluding institutional residents, is less than 250 from either the 100% or the 20% database, or if the number of private households is less than 40 from the 20% database
place-of-work distributions and related statistics are suppressed if the total number of employed persons in the area is less than 40, according to the sample database. If the data also include an income distribution, the threshold is raised to 250, again according to the sample database
tabulations covering both place of work and place of residence along with related statistics are suppressed if the total number of employed persons in the area is less than 40 according to the sample database, or if the area's total population, excluding institutional residents, according to either the 100% or the sample database, is less than 40. If the tabulations also include an income distribution, the threshold is raised to 250 in all cases and the tabulations are suppressed if the number of private dwellings in the place of residence area is less than 40
if the data are tabulated from the 100% database and refer to six-character postal codes or to groups of either dissemination blocks or block-faces, they are suppressed if the total population in the area is less than 100
if the data are tabulated from the 20% sample database and refer to six-character postal codes or to groups of either dissemination blocks or block-faces, they are suppressed if the total non-institutional population in the area from either the 100% or 20% database is less than 100
if the data refer to groups of either dissemination blocks or block-faces, and cover place of work, they are suppressed if the total number of employed persons in the area is less than 100, according to the sample database
if the data refer to groups of either dissemination blocks or block-faces, and cover both place of work and place of residence, they are suppressed if the total number of employed persons in the area is less than 100, according to the sample database, or if the area's total population, excluding institutional residents, according to either the 100% or the sample database, is less than 100.
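Several of the exceptions can be sketched as hypothetical predicates. The sketch covers the raised threshold of 100 for postal codes and block groups, the income rule, and the place-of-work rule. The area-kind labels and argument names are assumptions. The combined place-of-work and place-of-residence rules are omitted. One combination is not described in the text: block groups with an income distribution. For that case, the sketch raises an error instead of guessing.

```python
BLOCK_GROUPS = {"dissemination_block_group", "block_face_group"}  # hypothetical labels
SMALL_AREAS = BLOCK_GROUPS | {"postal_code"}  # six-character postal codes


def general_suppressed(area_kind, database, total_pop_100=None,
                       noninst_pop_100=None, noninst_pop_20=None):
    """Population-based rule: threshold 100 for postal codes and block groups, else 40."""
    threshold = 100 if area_kind in SMALL_AREAS else 40
    if database == "100%":
        return total_pop_100 < threshold
    if database == "20%":
        return min(noninst_pop_100, noninst_pop_20) < threshold
    raise ValueError(f"unknown database: {database!r}")


def income_suppressed(noninst_pop_100, noninst_pop_20, private_households_20):
    """Income distributions: non-institutional population under 250 in either
    database, or fewer than 40 private households in the 20% database."""
    return min(noninst_pop_100, noninst_pop_20) < 250 or private_households_20 < 40


def place_of_work_suppressed(employed_20, block_groups=False, includes_income=False):
    """Place-of-work distributions, based on employed persons in the sample database."""
    if block_groups and includes_income:
        raise ValueError("combination not covered by the rules above")
    if block_groups:
        threshold = 100
    elif includes_income:
        threshold = 250
    else:
        threshold = 40
    return employed_20 < threshold


print(general_suppressed("postal_code", "100%", total_pop_100=80))   # True
print(general_suppressed("census_tract", "100%", total_pop_100=80))  # False
print(income_suppressed(300, 240, 45))                               # True
print(place_of_work_suppressed(60, block_groups=True))               # True
```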
In all cases, suppressed data are included in the appropriate higher aggregate subtotals and totals.
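As a hypothetical sketch of this rule, the helper below blanks out the values of suppressed areas but still counts them in the higher-level total. Random rounding is ignored here for clarity, and the area names and counts are illustrative.

```python
def publish(area_counts, suppressed_areas):
    """Blank out suppressed areas but keep them in the aggregate total.

    area_counts maps area names to counts; suppressed_areas is the set of
    areas that fail the suppression tests.
    """
    published = {
        area: (None if area in suppressed_areas else count)
        for area, count in area_counts.items()
    }
    total = sum(area_counts.values())  # suppressed counts still contribute
    return published, total


published, total = publish({"A": 30, "B": 500, "C": 700}, {"A"})
print(published)  # {'A': None, 'B': 500, 'C': 700}
print(total)      # 1230
```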
The suppression technique applies to all products involving subprovincial data (i.e., Profile series, basic cross-tabulations, semi-custom and custom data products) collected on a 100% or 20% sample basis.
For further information on the quality of census data, write to the Social Survey Methods Division, Statistics Canada, Ottawa, Ontario, Canada K1A 0T6, or call 613-951-4783.