5. Data quality
Linkage data quality elements provide information on the fitness-for-use of a linkage database by describing why, when, and how the data are created, and how accurate the data are. The quality elements include an overview reporting on lineage, positional accuracy, attribute accuracy, logical consistency and completeness. This information is provided to users for all linkage data products.
Lineage describes the history of the linkage data, including descriptions of the source material from which the data were derived and the methods of derivation. It also contains the dates of the source material, and all transformations involved in producing the final digital files.
The Postal CodeOM Conversion File (PCCF) is the result of two updating activities. The first is done every five years, after each census, to align the database to the latest census geographic areas. The other is the ongoing maintenance activity that links the latest postal codesOM from Canada Post Corporation (CPC) to census geographic areas. These links are recorded on the Geography Division's postal codeOM database.
Linking to 2011 Census geographic areas
The sources used to align the census geography linkage from 2006 to 2011 were:
- Monthly updates of the Address Lookup File, Postal CodeOM Delivery Mode File, and Householder File from CPC
- Geography Division's Spatial Data Infrastructure (SDI)
- 2011 Census of Population and Dwellings
- 2011 Census block-face, dissemination block, and representative points data files
- Dissemination area correspondence file
The following steps were used to assign 2011 Census geographic areas to the PCCF:
- Process information from the CPC files
- Automated geocoding of postal codesOM to 2011 Census block-face, dissemination block or dissemination area
- Assign 2011 Census dissemination areas for postal codesOM using the correspondence between 2006 Census and 2011 Census geographic areas
- Manually geocode postal codesOM
- Sample verification of postal codeOM records
- Assign the single link indicator (SLI)
- Assign higher levels of geography.
Step 1: Process information from the CPC files
The monthly files received from CPC are processed to assign birth date, retired date, historic delivery mode type (H_DMT) and delivery mode type (DMT). Records are extracted from the CPC Address Lookup File with the postal codeOM, postal codeOM type (PCtype) and related address information. Birth date is the date the postal codeOM became effective. Retired date is the date the postal codeOM is no longer found in the CPC monthly files. The delivery mode type is assigned using the Delivery Mode Type File. When a DMT is updated for a postal codeOM, the previous DMT becomes the H_DMT. Users should note that some postal codesOM are retired and reintroduced at a later date, possibly in another location.
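The bookkeeping described in Step 1 can be sketched as follows. This is only an illustration under stated assumptions, not Statistics Canada's actual processing: the `PostalCodeRecord` class, the `apply_monthly_file` function, the ISO date strings and the sample DMT values are all hypothetical. It also does not handle a retired postal codeOM being reintroduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostalCodeRecord:
    # Hypothetical record layout; field names are not taken from the PCCF.
    postal_code: str
    birth_date: str                      # date the postal code became effective
    dmt: str                             # current delivery mode type
    h_dmt: Optional[str] = None          # historic (previous) delivery mode type
    retired_date: Optional[str] = None   # set once the code leaves the monthly files


def apply_monthly_file(record: PostalCodeRecord, file_date: str,
                       dmt_in_file: Optional[str]) -> PostalCodeRecord:
    """Update one record from one monthly file.

    dmt_in_file is None when the postal code is absent from that month's file.
    """
    if dmt_in_file is None:
        # First month the code is missing: record the retired date once.
        if record.retired_date is None:
            record.retired_date = file_date
        return record
    if dmt_in_file != record.dmt:
        # The previous DMT becomes the H_DMT.
        record.h_dmt = record.dmt
        record.dmt = dmt_in_file
    return record
```

For example, a record with DMT 'A' that appears with DMT 'B' in a later file ends up with `dmt == "B"` and `h_dmt == "A"`.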
Step 2: Automated geocoding of postal codesOM to 2011 Census block-face, dissemination block or dissemination area
All postal codesOM active in June 2013 are geocoded using an automated geocoding system. A detailed discussion of the approach to geocoding is found in the working paper entitled How Postal Codes Map to Geographic Areas (Catalogue no. 92F0138MIE2007001), which is available on the Statistics Canada website.
The system uses the forward sortation area© (FSA©) search area file and a match between CPC municipality and census subdivision (CSD) to determine the general area where the postal codeOM would be found. Census responses are used to create FSA© search areas. These FSA© areas are composed of dissemination areas where a particular FSA© was reported in the 2011 Census. Canada Post municipalities are matched to 2011 Census subdivisions using the province of the municipality and the similarity in name. When the match is not clear, historical CSD files on the Spatial Data Infrastructure (SDI) are used to determine the match.
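The municipality-to-CSD match on province and name similarity might look like the sketch below. The text does not say how similarity is measured, so the use of `difflib.SequenceMatcher`, the 0.85 threshold, the function name and the record fields are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def match_municipality_to_csd(municipality, province, csds, threshold=0.85):
    """Return the best-matching CSD in the same province, or None if unclear.

    csds: list of dicts with hypothetical keys 'csd_id', 'name', 'province'.
    threshold: assumed cut-off; the source does not give one.
    """
    best, best_score = None, 0.0
    for csd in csds:
        if csd["province"] != province:
            continue  # only compare names within the same province
        score = SequenceMatcher(None, municipality.lower(),
                                csd["name"].lower()).ratio()
        if score > best_score:
            best, best_score = csd, score
    # A None result stands for "match not clear", which the text says is
    # resolved using historical CSD files.
    return best if best_score >= threshold else None
```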
Postal codesOM with civic address ranges associated with them (PCtype 1 and 2) are coded to the appropriate dissemination area, dissemination block or block-face in the SDI. About 97% of the PCtype 1 and 2 postal codeOM records in the June 2013 PCCF were coded in this way.
The postal codeOM response in the 2011 Census is used to code rural routes, postal installation/post office boxes and postal codesOM that service general areas. These postal codesOM are geocoded to the dissemination area (DA) level. A post-process then reduces the number of DAs to which a postal codeOM is coded, removing duplication in DA assignment. However, not all active postal codesOM are geocoded in this way, either because the address information is not found or because the census response is not significant enough to determine the appropriate area for geocoding (a significant response is at least four responses of that postal codeOM per dissemination block).
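The four-response significance rule can be illustrated with a small sketch. It assumes census responses arrive as hypothetical `(postal_code, dissemination_block, dissemination_area)` tuples. Collapsing the results into a set of DAs is only one possible reading of the duplicate-removing post-process, which the text does not detail.

```python
from collections import Counter

MIN_RESPONSES = 4  # "at least four responses ... per dissemination block"


def das_from_census_responses(responses):
    """Map each postal code to the DAs supported by significant responses.

    responses: iterable of (postal_code, db_id, da_id) tuples (assumed layout).
    """
    counts = Counter(responses)
    result = {}
    for (postal_code, _db_id, da_id), n in counts.items():
        if n >= MIN_RESPONSES:
            # A set keeps each DA once even if several blocks qualify.
            result.setdefault(postal_code, set()).add(da_id)
    return result
```

Postal codes with no block reaching the threshold are simply absent from the result and would need another geocoding method.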
A quality indicator (QI) is assigned in the automated geocoding process. The indicator is based on the confidence of the link of the postal codeOM to the geographic area. Please see the Technical specifications section for more details.
Step 3: Assign 2011 Census dissemination areas for postal codesOM using the correspondence between 2006 Census and 2011 Census geographic areas
When a match could not be found through the automated address matching system, postal codesOM that had been previously coded to a 2006 Census geographic area are linked to a 2011 Census geographic area using the correspondence between 2006 Census and 2011 Census geographic areas. These links are created at the 2011 Census DA level only.
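A minimal sketch of this fallback, assuming the correspondence is available as a lookup from a 2006 DA to one or more 2011 DAs. The function name and dictionary layouts are hypothetical.

```python
def convert_via_correspondence(unmatched, links_2006, correspondence):
    """Carry 2006 links forward to 2011 DAs for codes the automated step missed.

    unmatched:      postal codes with no automated match
    links_2006:     postal code -> 2006 DA (assumed layout)
    correspondence: 2006 DA -> list of 2011 DAs (assumed layout)
    Returns (converted, still_unmatched).
    """
    converted, still_unmatched = {}, []
    for postal_code in unmatched:
        da_2006 = links_2006.get(postal_code)
        das_2011 = correspondence.get(da_2006, []) if da_2006 is not None else []
        if das_2011:
            converted[postal_code] = list(das_2011)  # links exist at DA level only
        else:
            still_unmatched.append(postal_code)      # left for manual geocoding
    return converted, still_unmatched
```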
Step 4: Manually geocode postal codesOM
Postal codesOM are manually geocoded when they could not be coded at an acceptable degree of precision using the automated process or when they could not be converted using the correspondence between the 2006 Census and 2011 Census geographic areas.
In addressable areas covered by the Spatial Data Infrastructure (SDI), an attempt is made to link postal codesOM to one or more block-faces. The list of new postal codesOM and address range records from CPC is matched to the SDI street listings according to elements common to both files (e.g., province, municipality, street name, type, direction, and address range). Once matched, the postal codeOM and related geographic area codes are transferred to the postal codeOM database.
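Matching on the common elements can be sketched as a normalized key comparison plus an address range overlap test. All field names here are hypothetical, and the sketch ignores details such as odd/even address parity that a real matcher would likely need.

```python
KEY_FIELDS = ("province", "municipality", "street_name", "street_type", "direction")


def street_key(rec):
    """Normalize the elements common to both files into a comparable key."""
    return tuple(str(rec[f]).strip().upper() for f in KEY_FIELDS)


def ranges_overlap(a_from, a_to, b_from, b_to):
    """True when two inclusive civic number ranges share at least one number."""
    return a_from <= b_to and b_from <= a_to


def match_block_faces(cpc_rec, sdi_listings):
    """Return IDs of SDI block-faces whose street and range match a CPC record."""
    key = street_key(cpc_rec)
    return [
        s["block_face_id"]
        for s in sdi_listings
        if street_key(s) == key
        and ranges_overlap(cpc_rec["from"], cpc_rec["to"], s["from"], s["to"])
    ]
```

Because one CPC address range can span several block-faces, the function returns a list rather than a single ID.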
Step 5: Sample verification of postal codeOM records
The relationship between the postal codeOM, dissemination blocks and dissemination areas is verified by sampling records from the geocoding completed in each of the processes above. These records are independently manually geocoded. The two sets of geocodes are compared as part of the verification.
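The comparison of the two sets of geocodes can be sketched as below. The sampling scheme, the function name and the agreement-rate summary are assumptions; the text only states that sampled records are independently geocoded by hand and then compared.

```python
import random


def verify_sample(production, independent, sample_size, seed=0):
    """Compare production geocodes with independent manual geocodes.

    production, independent: postal code -> geocode (assumed layout).
    Returns (agreement_rate, mismatched_postal_codes).
    """
    rng = random.Random(seed)                  # fixed seed keeps the draw repeatable
    candidates = sorted(set(production) & set(independent))
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    mismatches = sorted(pc for pc in sample if production[pc] != independent[pc])
    if not sample:
        return None, mismatches
    return 1 - len(mismatches) / len(sample), mismatches
```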
Step 6: Assign the single link indicator (SLI)
Many postal codesOM are represented by multiple records on the PCCF. The single link indicator (SLI) is created to assist users dealing with postal codesOM having multiple records. The SLI provides a geographic record for mapping a postal codeOM representative point. The SLI has a value of '1' to flag the best (or only) link for a given postal codeOM. The value '0' indicates an additional record.
Please note that the SLI is identified on both active and retired postal codesOM. Users will find when working with both active and retired postal codesOM that multiple SLIs will appear for a postal codeOM that was retired and reintroduced. However, there will only be one SLI for a set of active records for a postal codeOM.
When assigning the SLI, priority is given to postal codesOM associated with civic addresses or dwellings (based on the PCtype). The confidence of coding to the geographic area (the quality indicator) and the precision of the geocoding (the block-face, dissemination block or dissemination area), as well as the population, are considered. When the postal codeOM is linked to a DA associated with multiple federal electoral districts (FEDs), population centres (POPCTRs), or designated places (DPLs), the SLI is assigned to the record with the greatest proportion of the FED, POPCTR, or DPL population.
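One way to picture this selection is as a sort over the candidate records of a single postal codeOM. The text lists the factors considered but not their exact order or weighting, so the ordering below, the rank tables, the summing of QI characters and the field names are all assumptions made for illustration.

```python
# Assumed rankings: lower is better.
PRECISION_RANK = {"block-face": 0, "dissemination block": 1, "dissemination area": 2}
QI_RANK = {"A": 0, "B": 1, "C": 2, "N": 3}


def assign_sli(records):
    """Flag the best record among one postal code's active records with SLI '1'.

    Each record is a dict with hypothetical keys: 'pctype', 'qi' (three
    characters), 'rep_point' and 'population'.
    """
    def priority(r):
        return (
            0 if r["pctype"] in (1, 2) else 1,    # civic address types first
            sum(QI_RANK[ch] for ch in r["qi"]),   # then confidence of the link
            PRECISION_RANK[r["rep_point"]],       # then precision of the geocode
            -r["population"],                     # then larger population
        )

    best = min(records, key=priority)
    for r in records:
        r["sli"] = "1" if r is best else "0"
    return records
```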
Users are cautioned that the SLI provides only a partial correspondence between the postal codeOM and other geographic areas.
Step 7: Assign higher levels of geography
Higher levels of geography are assigned based on the block-face, dissemination block, or dissemination area. Please see the hierarchy chart in Appendix B for how geographic areas are related. When a dissemination area is related to more than one FED, POPCTR or DPL, more than one record appears in the PCCF for that postal codeOM to dissemination area linkage.
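The effect on record counts can be shown with a short sketch: one postal codeOM to DA link becomes one record per combination of higher-level areas related to that DA. The function name and dictionary layouts are hypothetical.

```python
def expand_to_higher_geography(link, higher_areas_by_da):
    """Return one output record per set of higher-level areas for the link's DA.

    link:               dict with at least 'postal_code' and 'da'
    higher_areas_by_da: DA -> list of dicts of higher-level codes (assumed layout)
    """
    return [{**link, **areas} for areas in higher_areas_by_da[link["da"]]]
```

A DA that falls in two FEDs therefore yields two records for the same postal codeOM, which is why the single link indicator is needed to pick one of them.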
Positional accuracy refers to the absolute and relative accuracy of the positions of geographic features. Absolute accuracy is the closeness of the coordinate values in a dataset to values accepted as being true. Relative accuracy is the closeness of the relative positions of features to their respective relative positions accepted as being true. Descriptions of positional accuracy include the quality of the final file or product after all transformations.
The geographic coordinates assigned to postal codesOM are either block-face, dissemination block or dissemination area representative points calculated for census purposes. Therefore, the positional accuracy of the postal codeOM is dependent on:
- the accuracy of the links established between the postal codeOM and the block-face, dissemination block, or dissemination area
- the positional accuracy of the block-face, dissemination block, or dissemination area representative point with respect to the block-face, dissemination block, or dissemination area.
Using different methods to create links in the PCCF results in varying degrees of accuracy for those links. Postal codesOM linked to block-faces are considered to be the most precise, as they are linked as closely as possible to address ranges representing the location of the postal codeOM according to CPC. When the block-face link cannot be produced, postal codesOM are linked to a dissemination block or dissemination area.
Table 5.1 illustrates the lowest level to which geocoding was completed for postal codesOM associated with address ranges (PCtype 1 and 2).
|Geocoded records|Records (number)|Records (%)|Postal codesOM associated with records (number)|Postal codesOM associated with records (%)|
|Geocoded to block-face|1,461,590|85.50|754,398|88.13|
|Geocoded to dissemination block|151,660|8.87|60,886|7.11|
|Geocoded to dissemination area|90,367|5.29|36,109|4.22|
|Geocoded to census subdivision|5,813|0.34|4,631|0.54|
Note: Some postal codesOM may have more than one representative point. The postal codeOM counts in this table differ from those given in the section About this product/Comparison to other products/versions, which include all postal codeOM types as well as both active and retired records.
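The second and fourth numeric columns of Table 5.1 appear to be each row's share of its column total, expressed as a percentage. The quick check below uses the counts copied from the table; the `shares` helper is only an illustration.

```python
# Counts copied from Table 5.1 (PCtype 1 and 2 only).
records = {
    "block-face": 1_461_590,
    "dissemination block": 151_660,
    "dissemination area": 90_367,
    "census subdivision": 5_813,
}
postal_codes = {
    "block-face": 754_398,
    "dissemination block": 60_886,
    "dissemination area": 36_109,
    "census subdivision": 4_631,
}


def shares(counts):
    """Return each level's share of the total, as a percentage to 2 decimals."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 2) for level, n in counts.items()}


record_shares = shares(records)
postal_code_shares = shares(postal_codes)
```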
The quality indicator (QI) indicates the confidence of the link established between the postal codeOM and the most precise geographic area for each record geocoded using the automated system. For more information on the QI, refer to the Technical specifications section.
The geographic coordinates included on the PCCF are derived from Statistics Canada's Spatial Data Infrastructure (SDI). Users should be aware that absolute positional accuracy is not an intended feature of the SDI. Consequently, these files and any by-product are not recommended for engineering or legal applications or for emergency dispatching services.
For more information on the method used to calculate representative points for block-faces, dissemination blocks and dissemination areas, refer to the Technical specifications section.
Attribute accuracy refers to the accuracy of the quantitative and qualitative information attached to each feature (such as population for a population centre, street name, census subdivision name and code).
The PCCF is a flat file providing attributes for postal codesOM and for those dissemination area(s), dissemination block(s), etc. linked to the postal codeOM. Most of these attributes are taken from two independent sources: the Spatial Data Infrastructure and the CPC files. Some attributes are also created for the PCCF.
The geographic code, type, and name of all higher level standard geographic areas in which a block-face, dissemination block or dissemination area is located are extracted from the Spatial Data Infrastructure.
The information relevant to each postal codeOM – birth date, retirement date, delivery mode type, type of postal codeOM and CPC community name – is carried forward from the CPC address look-up file and auxiliary files. In some cases, the postal codeOM type was imputed by Statistics Canada (see the Technical specifications section).
The single link indicator (SLI; see Process) and the type of representative point are assigned by Statistics Canada.
Logical consistency describes the fidelity of relationships encoded in the data structure of the digital linkage data.
Tests are run to ensure that certain basic data relationships are consistent within the set of records in the PCCF.
In some cases, especially in rural areas, the postal codeOM service areas do not respect dissemination area boundaries. When this occurs, the same postal codeOM is repeated with different geographical information (i.e., different coordinates or dissemination area codes). These multiple records for a postal codeOM reflect the relationship between the postal codeOM and census geographic areas. Also, a postal codeOM can be linked to more than one block-face or dissemination block within the same dissemination area.
Conversely, different postal codesOM could have the same coordinates. This happens when more than one postal codeOM has been linked to the same dissemination area. Also, more than one postal codeOM can be linked to a single block-face or dissemination block.
Every set of active records for a postal codeOM has one SLI equal to '1.' Every set of retired records for a postal codeOM, for a given retirement date, has one SLI equal to '1.'
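A user can check this rule on a PCCF extract with a short sketch such as the one below. The field names are hypothetical, and active records are assumed to carry no retired date.

```python
from collections import defaultdict


def sli_violations(records):
    """Return (postal_code, set_label) groups without exactly one SLI of '1'.

    Active records form one set per postal code; retired records form one
    set per postal code and retirement date.
    """
    ones = defaultdict(int)
    for r in records:
        label = r.get("retired_date") or "active"
        ones[(r["postal_code"], label)] += 1 if r["sli"] == "1" else 0
    return sorted(group for group, n in ones.items() if n != 1)
```

An empty list means every set satisfies the rule. A postal codeOM that was retired and later reintroduced legitimately appears in more than one set.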
Geographic areas contained in the PCCF are consistent with all 2011 Census related geographic products, except for the 2011 Census Forward Sortation Area Boundary File (Catalogue no. 92-179-X). The 2011 Census Forward Sortation Area Boundary File represents only the forward sortation areas© reported in the 2011 Census, whereas the PCCF is updated annually to include recent postal codesOM and also includes retired postal codesOM.
Completeness refers to the degree to which geographic features, their attributes and their relationships are included or omitted in a dataset. It also includes information on selection criteria, definitions used, and other relevant mapping rules.
Completeness in the context of the PCCF is the degree to which all valid postal codesOM are accounted for on the PCCF and all geographic codes from the 2011 Census are linked to a postal codeOM. Almost all postal codesOM as of June 2013 according to CPC have been linked to census geography.
There are also 3,080 retired postal codesOM included in the PCCF. Postal codesOM retired before January 1, 2011 are included in the Retired 2010 text file, R2010.txt. There are 66,102 retired postal codesOM in the Retired 2010 file.
The quality indicator (QI) is currently available only for records geocoded using the automated process. When postal codesOM are geocoded using address information, each of the three characters of the QI contains an 'A', 'B' or 'C' indicating the confidence of geocoding. When the QI cannot be determined, an 'N' is used to represent 'unknown.' Records that are manually geocoded or directly converted from the 2006 Census geocodes have a QI of 'NNN'.
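A minimal sketch of how a user might validate and read a QI value, based only on the description above. The function name and return layout are hypothetical, and what each of the three positions means is left to the Technical specifications section.

```python
VALID_QI_CHARS = set("ABCN")


def parse_qi(qi):
    """Validate a three-character QI and report which positions are unknown."""
    if len(qi) != 3 or not set(qi) <= VALID_QI_CHARS:
        raise ValueError(f"unexpected QI value: {qi!r}")
    return {
        # 'NNN' marks records with no usable QI, such as manually geocoded
        # records or records converted from 2006 Census geocodes.
        "available": qi != "NNN",
        "unknown_positions": [i for i, ch in enumerate(qi) if ch == "N"],
    }
```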
Every attempt was made to ensure that the delivery installation (PO) value indicated whether a postal codeOM of PCtype 3 or 5 was coded to a postal installation or to the area serviced by the postal codeOM. Occasionally a PCtype 3 or 5 record may be coded to a postal installation (indicated in a record with PO='1') and to a service area (indicated by a record with PO='0'). In some cases, including where the geographic area linkages were directly based on conversion from the 2006 Census geocodes, the PO is unknown (this is indicated by a PO='2').
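The PO values can be read with a small lookup, sketched below. The field names and the `po_summary` helper are hypothetical; the three codes and their meanings are taken from the paragraph above.

```python
PO_MEANING = {
    "1": "postal installation",
    "0": "area serviced by the postal code",
    "2": "unknown",
}


def po_summary(records):
    """Collect the PO meanings seen for each PCtype 3 or 5 postal code."""
    summary = {}
    for r in records:
        if r["pctype"] in (3, 5):
            summary.setdefault(r["postal_code"], set()).add(PO_MEANING[r["po"]])
    return summary
```

A postal codeOM whose set contains both "postal installation" and "area serviced by the postal code" is one of the occasional cases coded both ways.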