Section 5 Data quality

Linkage data quality elements provide information on the fitness-for-use of a linkage database by describing why, when, and how the data are created, and how accurate the data are. The elements include an overview describing the purpose and usage, as well as specific quality elements reporting on the lineage, positional accuracy, attribute accuracy, logical consistency and completeness. This information is provided to users for all linkage data products disseminated for the census.

Lineage

Lineage describes the history of the linkage data, including descriptions of the source material from which the data were derived and the methods of derivation. It also contains the dates of the source material, and all transformations involved in producing the final digital files.

The Postal Code Conversion File (PCCF) is the result of two updating activities. The first is done every five years, after each census, to align the database to the latest census geographic areas. The other is the ongoing maintenance activity that links the latest postal codes from Canada Post Corporation (CPC) to census geographic areas. These links are recorded on the Geography Division's postal code database.

Linking to 2006 Census geographic areas

Sources

The sources used to align the census geography linkage from 2001 to 2006 were:

  • Monthly updates of the Address Lookup File, Postal Code Delivery Mode File, and Householder File from CPC
  • Geography Division's Spatial Data Infrastructure (SDI)
  • 2006 Census of Population and Dwellings
  • September 2006 PCCF
  • 2006 Census block-face, dissemination block, and representative points data files
  • Dissemination area correspondence file

Process

The following steps were used to assign 2006 Census geographic areas to the PCCF:

  1. Process information from the CPC files
  2. Automated geocoding of postal codes to 2006 Census block-face, dissemination block or dissemination area
  3. Assign 2006 Census dissemination areas for postal codes using the correspondence between 2001 Census and 2006 Census geographic areas
  4. Manually geocode postal codes
  5. Sample verification of postal code records
  6. Assign the single link indicator (SLI)
  7. Assign higher levels of geography.

Step 1: Process information from the CPC files

The monthly files received from CPC between October 2006 and May 2011 are processed to assign the Birth date, Retired date, Historic Delivery Mode Type (H_DMT) and Delivery Mode Type (DMT). Records are extracted from the CPC Address Lookup File with the postal code, postal code type (PCtype) and related address information. The Birth date is the date the postal code became effective. The Retired date is the date from which the postal code is no longer found in the CPC monthly files. The DMT is assigned using the Delivery Mode Type File. When the DMT is updated for a postal code, the previous DMT becomes the H_DMT. Users should note that some postal codes are retired and reintroduced at a later date, possibly in another location.
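The derivation of these fields can be sketched in Python. This is a minimal illustration, not the actual production procedure: the function name, the snapshot structure and the sample DMT values are hypothetical, and the choice to clear the Retired date when a postal code reappears is a simplifying assumption (the PCCF itself keeps separate records for retired and reintroduced postal codes).

```python
from datetime import date


def summarize_postal_code_history(monthly_snapshots):
    """Derive birth date, retired date, DMT and H_DMT per postal code.

    monthly_snapshots: list of (file_date, {postal_code: dmt}) tuples,
    sorted by file_date. The structure is hypothetical.
    """
    history = {}
    for file_date, codes in monthly_snapshots:
        for postal_code, dmt in codes.items():
            rec = history.get(postal_code)
            if rec is None:
                # First appearance: the file date stands in for the birth date.
                history[postal_code] = {
                    "birth": file_date, "retired": None,
                    "dmt": dmt, "h_dmt": None,
                }
                continue
            # Assumption: a reintroduced code simply becomes active again.
            rec["retired"] = None
            if dmt != rec["dmt"]:
                # The previous DMT becomes the historic DMT.
                rec["h_dmt"] = rec["dmt"]
                rec["dmt"] = dmt
        # Codes seen earlier but absent from this file are retired.
        for postal_code, rec in history.items():
            if postal_code not in codes and rec["retired"] is None:
                rec["retired"] = file_date
    return history


snapshots = [
    (date(2006, 10, 1), {"K1A0B1": "A"}),
    (date(2006, 11, 1), {"K1A0B1": "B", "K1A0B2": "A"}),
    (date(2006, 12, 1), {"K1A0B2": "A"}),
]
history = summarize_postal_code_history(snapshots)
```

In this example, K1A0B1 changes DMT in the second file and is absent from the third, so it receives both an H_DMT and a Retired date.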

Step 2: Automated geocoding of postal codes to 2006 Census block-face, dissemination block or dissemination area

All postal codes active in May 2011 are geocoded using a new automated geocoding system. A detailed discussion of the new approach to geocoding is found in the working paper entitled How Postal Codes Map to Geographic Areas (Catalogue no. 92F0138MIE2007001), which is available on the Statistics Canada website.

The new system uses the forward sortation area (FSA) search area file and a match between CPC municipality and census subdivision (CSD) to determine the general area where the postal code would be found. Census responses are used to create FSA search areas. These FSA areas are composed of dissemination areas where a particular FSA was reported in the 2006 Census. Canada Post municipalities are matched to 2006 Census subdivisions using the province of the municipality and the similarity in name. When the match is not clear, historical CSD files on the Spatial Data Infrastructure (SDI) are used to determine the match.
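The municipality-to-CSD match described above can be illustrated with a short Python sketch. The text does not say how name similarity is measured, so the use of `difflib.SequenceMatcher`, the 0.85 threshold, the function name and the input layout are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def match_municipality_to_csd(cpc_name, cpc_province, csds, threshold=0.85):
    """Return the code of the most similar CSD in the same province.

    csds: list of (csd_code, csd_name, province) tuples (hypothetical layout).
    Returns None when no candidate reaches the threshold, which would
    correspond to a case needing further review.
    """
    best_code, best_score = None, 0.0
    for csd_code, csd_name, province in csds:
        if province != cpc_province:
            continue  # only compare names within the same province
        score = SequenceMatcher(None, cpc_name.lower(), csd_name.lower()).ratio()
        if score > best_score:
            best_code, best_score = csd_code, score
    return best_code if best_score >= threshold else None


# Illustrative candidates only.
csds = [("3506008", "Ottawa", "ON"), ("2481017", "Gatineau", "QC")]
ottawa_match = match_municipality_to_csd("OTTAWA", "ON", csds)
no_match = match_municipality_to_csd("Ottawa", "QC", csds)
```

Returning `None` rather than the best weak candidate keeps unclear matches separate, in the spirit of the fallback to historical CSD files.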

Postal codes with civic address ranges associated with them (PCtype 1 and 2) are coded to the appropriate dissemination area, dissemination block or block-face in the SDI. About 94% of the PCtype 1 and 2 postal code records in the May 2011 PCCF were coded in this way.

The postal code response in the 2006 Census is used to code rural routes, postal installation/post office boxes and postal codes that service general areas. These postal codes are geocoded to the dissemination area (DA) level. A post-process then reduces the number of DAs assigned to each postal code by removing duplicate DA assignments. However, not all active postal codes are geocoded in this way, either because the address information is not found or because the census response is not significant enough to determine the appropriate area for geocoding (a significant response requires at least four responses of that postal code per dissemination block).
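The significance rule for census responses can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and the input layout are hypothetical, and collapsing blocks to a set of DAs is only a stand-in for the duplicate-removal post-process, whose details are not described here.

```python
from collections import Counter


def significant_das(responses, min_responses=4):
    """Map each postal code to the DAs supported by enough census responses.

    responses: iterable of (postal_code, da_id, block_id) tuples, one per
    census response (hypothetical layout). A dissemination block counts
    only if it has at least `min_responses` responses for the postal code.
    """
    counts = Counter(responses)
    result = {}
    for (postal_code, da_id, _block_id), n in counts.items():
        if n >= min_responses:
            # A set keeps each DA once, even if several blocks qualify.
            result.setdefault(postal_code, set()).add(da_id)
    return result


responses = (
    [("A0A1A0", "DA1", "01")] * 4    # meets the threshold
    + [("A0A1A0", "DA2", "01")] * 3  # below the threshold
    + [("A0A1A0", "DA1", "02")] * 5  # second qualifying block, same DA
)
coded = significant_das(responses)
```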

A quality indicator (QI) is assigned in the automated geocoding process. The indicator is based on the confidence of the link of the postal code to the geographic area. Please see the Technical specifications section for more details.

Step 3: Assign 2006 Census dissemination areas for postal codes using the correspondence between 2001 Census and 2006 Census geographic areas

When a match could not be found through the automated address matching system, postal codes that had been previously coded to a 2001 Census geographic area are linked to a 2006 Census geographic area using the correspondence between 2001 Census and 2006 Census geographic areas (based on the 2001 Census dissemination areas in the September 2006 PCCF). These links are created at the 2006 Census DA level only.
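A minimal sketch of this conversion is given below. The dictionary layouts, the function name and the identifiers in the example are hypothetical; the actual correspondence file may carry additional fields that are ignored here.

```python
def convert_links(links_2001, correspondence):
    """Convert postal code to 2001 DA links into 2006 DA links.

    links_2001: {postal_code: [da_2001, ...]}
    correspondence: {da_2001: [da_2006, ...]}
    Returns (converted, unresolved); unresolved postal codes would be
    candidates for manual geocoding.
    """
    converted, unresolved = {}, []
    for postal_code, das_2001 in links_2001.items():
        das_2006 = set()
        for da in das_2001:
            das_2006.update(correspondence.get(da, ()))
        if das_2006:
            converted[postal_code] = sorted(das_2006)
        else:
            unresolved.append(postal_code)
    return converted, unresolved


links_2001 = {"K1A0B1": ["35060001"], "X0X0X0": ["99990001"]}
correspondence = {"35060001": ["35060101", "35060102"]}
converted, unresolved = convert_links(links_2001, correspondence)
```

A 2001 DA that was split yields several 2006 DA links for the same postal code, which is one reason a postal code can appear on more than one record.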

Step 4: Manually geocode postal codes

Postal codes are manually geocoded when they could not be coded at an acceptable degree of precision using the automated process or when they could not be converted using the correspondence between the 2001 Census and 2006 Census geographic areas.

In addressable areas covered by the Spatial Data Infrastructure (SDI), an attempt is made to link postal codes to one or more block-faces. The list of new postal codes and address range records from CPC is matched to the SDI street listings according to elements common to both files (e.g., province, municipality, street name, type, direction, and address range). Once matched, the postal code and related geographic area codes are transferred to the postal code database.

For those postal codes that could not be coded by the above method, municipal and other maps are used to find the street(s). If a street could not be found on a municipal map or other authoritative source, local authorities (such as Planning and Engineering Departments and local post offices) are contacted to assist in the location of the street. In areas experiencing high growth, new maps are requested from the proper authority. After the street is located, the position of the boundary relative to that street on the SDI is used to determine the associated dissemination area.

Step 5: Sample verification of postal code records

The relationship between the postal code, dissemination blocks and dissemination areas is verified by sampling records from the geocoding completed in each of the processes above. These records are independently manually geocoded. The two sets of geocodes are compared as part of the verification.
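The comparison can be illustrated with a short Python sketch. The sampling design and the agreement measure used in production are not described here, so simple random sampling and a plain match rate are assumptions, as are the function names and the example values.

```python
import random


def draw_sample(postal_codes, sample_size, seed=2011):
    """Draw a reproducible simple random sample of postal codes."""
    rng = random.Random(seed)
    return rng.sample(sorted(postal_codes), sample_size)


def agreement_rate(original, manual):
    """Share of independently re-geocoded postal codes whose DA agrees.

    original, manual: {postal_code: da_id} (hypothetical layout).
    """
    if not manual:
        raise ValueError("manual geocodes are required for verification")
    matches = sum(1 for pc, da in manual.items() if original.get(pc) == da)
    return matches / len(manual)


original = {"A1A1A1": "DA1", "B2B2B2": "DA2", "C3C3C3": "DA3", "D4D4D4": "DA4"}
sample = draw_sample(original, 2)
manual = {"A1A1A1": "DA1", "B2B2B2": "DA9"}  # illustrative manual results
rate = agreement_rate(original, manual)
```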

Step 6: Assign the single link indicator (SLI)

Many postal codes are represented by multiple records on the PCCF. The single link indicator (SLI) is created to assist users dealing with postal codes having multiple records. The SLI provides a geographic record for mapping a postal code representative point. The SLI has a value of '1' to flag the best (or only) link for a given postal code. The value '0' indicates an additional record.

Please note that the SLI is identified on both active and retired postal codes. Users will find when working with both active and retired postal codes that multiple SLIs will appear for a postal code that was retired and reintroduced. However, there will only be one SLI for a set of active records for a postal code.

When assigning the SLI, priority is given to postal codes associated with civic addresses or dwellings (based on the PCtype). The confidence of coding to the geographic area (the quality indicator) and the precision of the geocoding (the block-face, dissemination block or dissemination area), as well as the population, are also considered. When the postal code is linked to a DA associated with multiple federal electoral districts (FEDs), population centres (POPCTRs), or designated places (DPLs), the SLI is assigned to the record representing the greatest proportion of the FED, POPCTR or DPL population.
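One way to picture this selection is as a sort over the candidate records for a postal code. The sketch below is illustrative only: apart from the priority given to PCtype, the order in which the criteria are applied is an assumption, as are the field names, the precision labels and the reading of 'A' as the highest QI confidence.

```python
# Lower rank means more precise geocoding (labels are hypothetical).
PRECISION_RANK = {
    "block_face": 0,
    "dissemination_block": 1,
    "dissemination_area": 2,
}


def assign_sli(records):
    """Return copies of the records with sli=1 on the preferred link."""
    def sort_key(rec):
        return (
            0 if rec["pctype"] in (1, 2) else 1,  # civic address types first
            rec["qi"],                            # 'AAA' sorts before 'NNN'
            PRECISION_RANK[rec["precision"]],
            -rec["population"],                   # larger population first
        )

    best = min(records, key=sort_key)
    return [dict(rec, sli=1 if rec is best else 0) for rec in records]


records = [
    {"pctype": 3, "qi": "AAA", "precision": "dissemination_area", "population": 900},
    {"pctype": 1, "qi": "ABA", "precision": "block_face", "population": 120},
    {"pctype": 1, "qi": "AAA", "precision": "dissemination_block", "population": 80},
]
flagged = assign_sli(records)
```

Under these assumed priorities, the third record is selected: it has a civic address PCtype and the better QI, even though the second record is geocoded more precisely.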

Users are cautioned that the SLI provides only a partial correspondence between the postal code and other geographic areas.

Step 7: Assign higher levels of geography

Higher levels of geography are assigned based on the block-face, dissemination block, or dissemination area. Please see the hierarchy chart in Appendix B for how geographic areas are related. When a dissemination area is related to more than one FED, POPCTR or DPL, more than one record appears in the PCCF for that postal code to dissemination area linkage.

Positional accuracy

Positional accuracy refers to the absolute and relative accuracy of the positions of geographic features. Absolute accuracy is the closeness of the coordinate values in a dataset to values accepted as being true. Relative accuracy is the closeness of the relative positions of features to their respective relative positions accepted as being true. Descriptions of positional accuracy include the quality of the final file or product after all transformations.

The geographic coordinates assigned to postal codes are either block-face, dissemination block or dissemination area representative points calculated for census purposes. Therefore, the positional accuracy of the postal code is dependent on:

  • the accuracy of the links established between the postal code and the block-face, dissemination block, or dissemination area
  • the positional accuracy of the block-face, dissemination block, or dissemination area representative point with respect to the block-face, dissemination block, or dissemination area.

Using different methods to create links in the PCCF results in varying degrees of accuracy for those links. Postal codes linked to block-faces are considered to be the most precise, as they are linked as closely as possible to address ranges representing the location of the postal code according to CPC. When the block-face link cannot be produced, postal codes are linked to a dissemination block or dissemination area.

Table 5.1 illustrates the lowest level to which geocoding was completed for postal codes associated with address ranges (PCtype 1 and 2).

Table 5.1
Geocoded postal code of PCtype 1 and 2 records – active in May 2011

Geocoded records                  Records                 Postal codes associated with records
                                  number      percent     number      percent
Geocoded to block-face            1,280,186   80.41       716,651     85.03
Geocoded to dissemination block   173,588     10.90       70,406      8.35
Geocoded to dissemination area    138,272     8.69        55,751      6.61
Total                             1,592,046   100.00      842,808     100.00

Note: Some postal codes may have more than one representative point. The postal code counts in this table differ from those given in the section About this product/Comparison to other products/versions, which include all postal code types as well as both active and retired records.
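The percentages in the Records columns of Table 5.1 can be reproduced from the counts with a few lines of Python. The variable names are illustrative; the counts are those shown in the table.

```python
# Record counts from Table 5.1 (PCtype 1 and 2, active in May 2011).
record_counts = {
    "block-face": 1_280_186,
    "dissemination block": 173_588,
    "dissemination area": 138_272,
}

total_records = sum(record_counts.values())

# Share of records geocoded at each level, as a percentage to two decimals.
record_shares = {
    level: round(100 * count / total_records, 2)
    for level, count in record_counts.items()
}
```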

The quality indicator (QI) indicates the confidence of the link established between the postal code and the most precise geographic area for each record geocoded using the automated system. For more information on the QI, refer to the Technical specifications section.

The geographic coordinates included on the PCCF are derived from Statistics Canada's Spatial Data Infrastructure (SDI). Users should be aware that absolute positional accuracy is not an intended feature of the SDI. Consequently, these files and any by-product are not recommended for engineering or legal applications or for emergency dispatching services.

For more information on the method used to calculate representative points for block-faces, dissemination blocks and dissemination areas, refer to the Technical specifications section.

Attribute accuracy

Attribute accuracy refers to the accuracy of the quantitative and qualitative information attached to each feature (such as population for a population centre, street name, census subdivision name and code).

The PCCF is a flat file providing attributes for postal codes and for those dissemination area(s), dissemination block(s), etc. linked to the postal code. Most of these attributes are taken from two independent sources. Some attributes are also created for the PCCF.

The geographic code, type, and name of all higher level standard geographic areas in which a block-face, dissemination block or dissemination area is located are extracted from the Spatial Data Infrastructure.

The information relevant to each postal code – birth date, retirement date, delivery mode type, type of postal code and CPC community name – is carried forward from the CPC address look-up file and auxiliary files. In some cases, the postal code type was imputed by Statistics Canada (see the Technical specifications section).

The single link indicator (SLI; see Process in the Data quality section) and the type of representative point are assigned by Statistics Canada.

Tests are run to ensure that certain basic data relationships are consistent within the set of records in the PCCF.

Logical consistency

Logical consistency describes the fidelity of relationships encoded in the data structure of the digital linkage data.

In some cases, especially in rural areas, the postal code service areas do not respect dissemination area boundaries. When this occurs, the same postal code is repeated with different geographical information (i.e., different coordinates or dissemination area codes). These multiple records for a postal code reflect the relationship between the postal code and census geographic areas. Also, a postal code can be linked to more than one block-face or dissemination block within the same dissemination area.

Conversely, different postal codes could have the same coordinates. This happens when more than one postal code has been linked to the same dissemination area. Also, more than one postal code can be linked to a single block-face or dissemination block.

Every set of active records for a postal code has one SLI equal to '1'. Every set of retired records for a postal code, for a given retirement date, has one SLI equal to '1'.
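Users who wish to confirm this rule on their own extract could apply a check along the following lines. The field names are hypothetical, a retired date of `None` stands in for an active record, and the SLI is treated as an integer rather than the character value stored in the file.

```python
from collections import defaultdict


def sli_violations(records):
    """Return the (postal_code, retired_date) groups without exactly one SLI.

    records: iterable of dicts with 'postal_code', 'retired_date'
    (None for active records) and 'sli' (0 or 1).
    """
    sli_counts = defaultdict(int)
    for rec in records:
        group = (rec["postal_code"], rec["retired_date"])
        sli_counts[group] += rec["sli"]
    return {group for group, n in sli_counts.items() if n != 1}


records = [
    {"postal_code": "K1A0B1", "retired_date": None, "sli": 1},
    {"postal_code": "K1A0B1", "retired_date": None, "sli": 0},
    {"postal_code": "K1A0B1", "retired_date": "20080115", "sli": 1},
    {"postal_code": "B2B2B2", "retired_date": None, "sli": 0},  # no SLI set
]
violations = sli_violations(records)
```

Grouping on the retired date as well as the postal code allows a postal code that was retired and later reintroduced to carry one SLI per set of records.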

Consistency with other products

Geographic areas contained in the PCCF are consistent with all 2006 Census related geographic products, except for the 2006 Census Forward Sortation Area Boundary File (Catalogue no. 92-170-XWE, XCE). The 2006 Census Forward Sortation Area Boundary File represents only the FSAs reported in the 2006 Census, whereas the PCCF is updated twice a year to include recent postal codes and also includes retired postal codes.

Completeness

Completeness refers to the degree to which geographic features, their attributes and their relationships are included or omitted in a dataset. It also includes information on selection criteria, definitions used, and other relevant mapping rules.

Completeness in the context of the PCCF is the degree to which all valid postal codes are accounted for on the PCCF and all geographic codes from the 2006 Census are linked to a postal code. All postal codes reported by CPC as of May 2011 have been linked to census geography. There are 493 populated dissemination areas that are not linked to any postal code on the PCCF. Among the DAs that are linked to a postal code, five populated dissemination areas are linked only to retired postal codes, that is, they are not linked to any active postal code on the PCCF.
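Counts of this kind amount to a set difference between populated DAs and the DAs that appear on the file. A minimal sketch, with hypothetical names and input layouts, is:

```python
def unlinked_populated_das(da_population, links, active_only=False):
    """Return populated DAs that have no postal code link.

    da_population: {da_id: population}
    links: iterable of (postal_code, da_id, is_active) tuples
    active_only: when True, links from retired postal codes are ignored.
    """
    linked = {
        da_id for _postal_code, da_id, is_active in links
        if is_active or not active_only
    }
    return {
        da_id for da_id, population in da_population.items()
        if population > 0 and da_id not in linked
    }


da_population = {"DA1": 500, "DA2": 0, "DA3": 40, "DA4": 10}
links = [("A1A1A1", "DA1", True), ("B2B2B2", "DA3", False)]
never_linked = unlinked_populated_das(da_population, links)
no_active_link = unlinked_populated_das(da_population, links, active_only=True)
```

The difference between the two results (here, DA3) corresponds to DAs linked only to retired postal codes.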

There are also 7,305 retired postal codes included in the PCCF. Postal codes retired before January 1, 2006 are included in the Retired 2005 text file, R2005.txt. There are 59,247 retired postal codes in the Retired 2005 file.

The quality indicator (QI) is currently available only for records geocoded using the automated process. When postal codes were geocoded using address information, each of the three characters of the QI contains an 'A', 'B' or 'C' indicating the confidence of geocoding. When the QI could not be determined, an 'N' is used to represent 'unknown'. Records that were manually geocoded or directly converted from the 2001 Census geocodes have a QI of 'NNN'.
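A user reading the QI field might validate and summarize it as sketched below. The function name and the returned structure are hypothetical, and the meaning of each of the three character positions is left to the Technical specifications section.

```python
VALID_QI_CHARS = set("ABCN")


def describe_qi(qi):
    """Validate a three-character QI and summarize its content."""
    if len(qi) != 3 or not set(qi) <= VALID_QI_CHARS:
        raise ValueError(f"unexpected QI value: {qi!r}")
    return {
        "components": tuple(qi),
        "unknown_components": qi.count("N"),
        # 'NNN' carries no confidence information at all.
        "has_confidence": qi != "NNN",
    }


partial = describe_qi("ABN")
unknown = describe_qi("NNN")
```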

Every attempt was made to ensure that the delivery installation (PO) value indicated whether a postal code of PCtype 3 or 5 was coded to a postal installation or to the area serviced by the postal code. Occasionally a PCtype 3 or 5 record may be coded to a postal installation (indicated in a record with PO='1') and to a service area (indicated by a record with PO='0'). In some cases, including where the geographic area linkages were directly based on conversion from the 2001 Census geocodes, the PO is unknown (this is indicated by a PO='2').