Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please "contact us" to request a format other than those available.
Abstract

Statistics Canada creates files that provide the link between postal codes and the geographic areas by which it disseminates statistical data. By linking postal codes to the Statistics Canada geographic areas, Statistics Canada facilitates the extraction and subsequent aggregation of data for selected geographic areas from files available to users. Users can then take data from Statistics Canada for their areas and tabulate this with other data for these same areas to create a combined statistical profile for these areas.

An issue has been the methodology used by Statistics Canada to establish the linkage of postal codes to geographic areas. In order to address this issue, Statistics Canada decided to create a conceptual framework on which to base the rules for linking postal codes and Statistics Canada's geographic areas. This working paper presents the conceptual framework and the geocoding rules. The methodology described in this paper will be the basis for linking postal codes to the 2006 Census geographic areas. This paper is presented for feedback from users of Statistics Canada's postal code related products.

Introduction

This paper has two objectives: (1) to establish a conceptual framework articulating the relationship between postal codes and geographic areas; and (2) subsequently, to establish rules for linking postal codes to standard geographic areas.1

Postal codes are managed by Canada Post Corporation (CPC) for the efficient sorting and delivery of mail. They are not created as units for the analysis or mapping of population, business or dwelling characteristics. The postal code products created by Statistics Canada provide the link between postal codes and Statistics Canada's standard geographic areas to allow for these various uses of the data. Generally, the uses of the postal code products can be characterized as follows:
Statistical unit

The association between the postal code and standard geographic areas is currently provided in the following products: (1) Postal Code Conversion File (PCCF) and (2) Postal Code Federal Riding File (PCFRF). The PCCF provides the linkage between the postal code and most standard geographic areas including the detailed entities: block-face, dissemination block and dissemination area. The PCCF is the primary product that meets the demands of the above-mentioned applications.

Over the years there have been a number of issues related to PCCF quality which have led to the questioning of the fundamental concepts underlying how the postal codes are linked to standard geographic areas. These include issues such as the appropriate use of data in linking postal codes, the precision and accuracy of the linkages and the assignment of single link indicators. Investigations of these PCCF quality issues have increased the already high cost, to Statistics Canada, of geocoding postal codes. These issues have highlighted a need to review and establish a conceptual framework articulating the relationship between postal codes and standard geographic areas in order to re-establish the business rules for linking postal codes to standard geographic areas and subsequent product development.

The following situations lead to difficulties in linking postal codes to geographic areas:
The premise of this working paper is that better use of the postal code information collected in the census as well as better use of the information in CPC files can address the issues described above as well as lead to more precise links between postal codes and standard geographic areas. In order to improve the linkage, rules need to be developed to geocode postal codes to Statistics Canada's standard geographic areas.

Geocoding can be defined as the process of assigning geographic identifiers (codes) to map features and data records. The resulting geocodes permit data to be linked geographically. For example, one way to geocode a civic address in Canada would be to start by locating the general area and then "zero-in" to the specific building with that civic address. Since the same street name or street address number may be used in many parts of Canada, being able to limit the searching for a street name and street address number combination to a particular place within Canada is essential. The term "search area" will be used in this paper to denote the geographic area within which the geocoding can be done using road names and civic addresses.

The geocoding process for civic addresses can be described as follows:

Step 1: Establish search area.
Step 2: Find search area in Road Network File.
Step 3: Find street within search area on Road Network File.
Step 4: Find address within address range on street.
Step 5: Establish geocodes. (The geocodes used are block-faces, dissemination blocks and dissemination areas.)

The geocoding process described above is the general model that will be used to establish business rules for geocoding postal codes. The geocoding process for postal codes ultimately creates the link between the postal codes and the geocodes of block-face, dissemination block and dissemination area. The link between postal codes and all other standard geographic areas is based on this initial geocoding.

The intents of this working paper are the following:
The expectation is that the geocoding rules and recommendations for quality indicators can be used to create the products.

1.0 Postal code typology and location

Examining the structure and typology of the postal codes is the first step in understanding how postal codes can be associated with standard geographic areas. Some of the pertinent definitions as well as the types and occurrences are examined in this section. This provides the basis for the business rules for geocoding postal codes (see section 3.0).

1.1 The postal code and location

The postal code is maintained by the CPC. It was designed to help sort mail rapidly and to make the delivery of letters, parcels and other mail by CPC more efficient. A postal code may be linked to different types of points of delivery including residential mail boxes, super boxes as well as post office mail boxes. (See www.canadapost.ca.) The characters that form the postal code are generally representative of the intended service (delivery) of the postal code. Canada Post defines a postal code as follows:

"A six-character alphanumeric combination (ANA NAN) assigned to one or more postal addresses. The postal code is an integral part of every postal address in Canada and is required for the mechanised processing of mail. Postal Codes are also used to identify the various CPC processing facilities and delivery installations" (Canada Post Corporation, 2005a, p. 24).

The first three characters of the postal code (alphanumeric combination of "ANA") form the forward sortation area (FSA). This is defined as follows:

"The forward sortation area (FSA) represents a specific area within a major geographic region or province. The forward sortation area provides the basis for the primary sorting or forward mail" (Canada Post Corporation, 2006, Section B, Chapter 3, p. 9).

The last three characters of the code are referred to as the Local Delivery Unit (LDU).
These allow for the creation of individual postal codes serviced by postal installations within the FSA.

Each first letter used in postal codes is reserved for a particular province or territory of the country (see Appendix A for more details). Specifically, based on examining the CPC files, the first character is indicative of the province/territory of the delivery installation from which service for the postal code is provided.

The second character of the postal code is indicative of the coverage of the postal code. A postal code with a "0" in the second character is classified as "rural" and all other postal codes are considered as urban by Canada Post.2 However, this does not signify that the postal code applies to "rural areas" or "urban areas" based on the current census population.3,4 We can term these as postal codes servicing rural delivery areas and urban delivery areas to avoid confusion with the Statistics Canada usage of the terms rural areas and urban areas. For example, New Brunswick does not contain any rural delivery areas although Statistics Canada does classify much of New Brunswick as rural.

In the case of postal codes associated with rural delivery areas, the Canada Postal Guide states that: "…the last three characters together with the forward sortation area identify a specific rural community" (Canada Post Corporation, 2006, Section B, Chapter 3, p. 10). The communities need to be located on a map so that they can be geocoded to standard geographic areas.

Maps are available from CPC that show the organizational area represented by an FSA. However, people living in one province may access mail using a postal code associated with another province, and similarly people living in one FSA may use the postal code of another FSA. This means that the area serviced by the postal code or the FSA may be different from the administrative boundary created by Canada Post.
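The character-level structure described in this section (the ANA NAN pattern, the FSA, the LDU, the first character and the "0" that marks a rural delivery area) can be sketched in a few lines of Python. This is only an illustration: the names `PostalCodeParts` and `parse_postal_code` are hypothetical and do not come from any Statistics Canada or CPC product, and the pattern check tests the form of the code only, not whether CPC has actually assigned it.

```python
import re
from dataclasses import dataclass

# ANA NAN: letter-digit-letter, an optional space, then digit-letter-digit.
# Assumption: only the general pattern is checked, not CPC's list of valid codes.
_ANA_NAN = re.compile(r"^([A-Z][0-9][A-Z]) ?([0-9][A-Z][0-9])$")


@dataclass(frozen=True)
class PostalCodeParts:
    fsa: str              # first three characters (forward sortation area)
    ldu: str              # last three characters (local delivery unit)
    first_character: str  # indicative of the province/territory of the installation
    rural_delivery: bool  # True when the second character is "0"


def parse_postal_code(raw: str) -> PostalCodeParts:
    """Split a postal code into the parts discussed in the text."""
    match = _ANA_NAN.match(raw.strip().upper())
    if match is None:
        raise ValueError(f"not in ANA NAN form: {raw!r}")
    fsa, ldu = match.groups()
    return PostalCodeParts(
        fsa=fsa,
        ldu=ldu,
        first_character=fsa[0],
        rural_delivery=(fsa[1] == "0"),
    )


if __name__ == "__main__":
    # "A0A 1A0" is used here only as an example of a code with "0" in second position.
    for code in ("K1A 9Z6", "k7v3z8", "A0A 1A0"):
        print(code, parse_postal_code(code))
```

Note that `rural_delivery` reflects the Canada Post delivery classification only; as the text explains, it says nothing about whether the area is rural or urban in the Statistics Canada sense.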
The scale of the CPC FSA maps (1:50,000 to 1:12,000,000) also makes the FSA boundaries difficult to relate to the more detailed road network in Statistics Canada's Geographic Frame. While these maps are useful in orienting the FSA within the province, they provide neither the extent of the FSA with reference to a publicly available road network nor the more detailed information needed for geocoding the postal code. However, Letter Carrier Walk (LCW) maps recently available on the internet (generally for urban delivery areas) do provide FSA boundaries in the context of the road network. These can be more helpful than the FSA maps in determining the boundaries of many FSAs. However, they do not delineate the area serviced by a postal code.

The postal code by itself is not sufficient to obtain the exact location and coverage of the service area of the postal code. The FSA can be used to locate the general area serviced by most (but not all) of the associated postal codes. More information from other sources is needed for associating the postal code with specific standard geographic areas.

1.2 Address and delivery information from CPC files

CPC provides information on the postal codes in terms of the different aspects of mail delivery. This is relevant to how these postal codes map geographically and how they are accessed by the people and institutions that use the mail system. The typologies and location information described here are derived from data provided on a monthly basis by CPC to Statistics Canada in the following files:
These are the three most relevant files associated with the geocoding of postal codes. Of these, the Address Lookup File, containing the address information associated with the vast majority of the postal codes, is the most useful for geocoding.

1.3 The Address Lookup File

The Address Lookup File shows the complete picture in terms of active postal codes (including updates and about 500 to 2500 new postal codes introduced or "birthed" each month). Of the 1,144,108 records in the June 2006 Address Lookup File, 1,133,814 were "active" or valid for mailing. The other postal codes in the file are those that are "retired" that month. Statistics Canada uses the term "retired postal code" to describe all postal codes that were active at some time but are no longer active. Although Canada Post only considers the active postal codes as valid for mailing, retired postal codes may be used for some time after they are retired.

The CPC community name in the Address Lookup File

Identification of the municipality serviced by a postal code is an important initial step in "zeroing in" on where the postal code should be geocoded. The 1st character of the postal code (indicating the province or territory) in combination with the CPC community name does in some (but not all) cases point to the current municipality or census subdivision (CSD) to which the postal code can be geocoded. When the CSD can be located, civic address information could be used to geocode to a street address range.

The term "municipality" as in the Address Lookup File does not necessarily constitute the legally defined municipality at the time when Canada Post publishes its data. These names serve the purpose of preserving locally used names to prevent confusion in terms of the area serviced. The area serviced does not necessarily coincide with legally constituted municipalities of the same or similar name.
For example, when municipalities are amalgamated, Canada Post preserves the older (non-amalgamated) names until it can establish that different streets are uniquely named in the newly amalgamated municipality. The FSA is also defined in relation to a municipality as follows:

"The FSA comprises the first three characters of a postal code. The FSA qualifies the area within a municipality that is permitted to use the valid alternate municipality name" (Canada Post Corporation, 2005a, p. 23).

These municipality names are often names of places no longer in existence or commonly used names associated with localities. To avoid confusion this name will be referred to hereafter as the "CPC community".

Comparison of municipal names in use in January 2001 illustrates this point. Statistics Canada's CSDs contain all the legally defined municipalities existing on January 1st of the census year (2001). A comparison of the CPC community names in existence shows that fewer than 40% matched the CSD. The following table illustrates the matches of CPC community name with the 2001 CSD names6:

The match rate in the above table is about 40% despite some processing (standardization) to increase the match rate. Examination of the June 2001 Address Lookup File shows changes occurring as a result of amalgamation. For example, both Nepean and Ottawa are maintained as CPC community names although Nepean no longer exists as a legal municipality as a result of amalgamation. The street names and the CPC community names appear to be updated in the Address Lookup File to reflect the changes at the municipal level once CPC resolves the uniqueness of the address. This may explain the slight increase in match rates over four years.
However, for geocoding purposes, a correspondence file between CSDs and the CPC community names is needed so that the geocoding can be narrowed down to the CSD first and then to the street address within it.7 Unfortunately, even where the CSD names match, more information is needed to define the area within which the street address information can be uniquely matched. More than one street can occasionally have the same name within a CSD. The same postal code may sometimes be assigned to multiple CSDs. The possibility of geocoding within multiple CPC communities should be considered when geocoding. The CPC community

Postal code types in the Address Lookup File

Once the municipality or general area serviced by the postal code is established, the specific service area of the postal code needs to be understood before geocoding. The record types in the Address Lookup File are essential in establishing the type of service area for the postal code. Canada Post Corporation designates five types of records in the Address Lookup File.8 These are associated with five different types of delivery. These records are typed and named as "PCType" in the PCCF file. The typology is as follows:9
A postal code can be associated with more than one PCType. Each record in the June 2006 Address Lookup File has a postal code and a specific PCType. Figure 1 shows that the vast majority of the records are of PCType 1 or 2:

The service areas of records with PCType of 1 and 2 are defined in terms of civic addresses. Basically, civic (street) addresses are those that are found on dwellings or buildings. As can be seen above, the PCTypes with civic addresses constitute about 98% of the records in the Address Lookup File in June 2006. Most urban block-faces and many residential streets in urban areas typically have only one civic addressed postal code associated with them. While there may be difficulties in finding the CPC community and locating the standard geographic area(s) for a postal code, the service areas of these postal codes can be considered as completely defined by the address information provided by Canada Post. Here, geocoding is a simple matter if the road name and address information can be matched to the same vintage data in Statistics Canada's Geographic Frame.

No civic address information for the service area is provided for PCTypes 3, 4 and 5. The service areas of these postal codes are not directly defined in the Address Lookup File. For PCType 3, the Address Lookup File provides lock box or post office box address ranges within a postal delivery installation. All postal codes can be linked back to the installation from which they are serviced.10 At a minimum, the postal codes of PCTypes 3, 4 and 5 can be coded to the CSD where the postal delivery installation is located (if the postal delivery installation address can be geocoded). For PCType 4, it is sometimes possible to obtain the names of the routes serviced. These include delivery along well-defined roads in settled rural areas, deliveries to industrial parks and to group mail boxes. For PCType 5 the only information is the CPC community name of the area serviced.
In all cases, the service area is not necessarily where the postal delivery installation is located and the service area is not defined in terms of civic address ranges. In short, the postal codes with PCTypes 3, 4 or 5 (unlike PCTypes of 1 and 2) do not have a precise geographic footprint that permits direct and complete association with Statistics Canada's geographic files. Alternate data sources depicting usage patterns need to be examined to decide on how to allocate service areas for these postal codes and how to geocode these postal codes based on that information.

With increased urbanisation and civic addressing, the proportion of PCType 2 postal codes has increased slightly. Table 2 illustrates the changes over time:

While the percentage of each type of urban postal codes has remained consistent, the percentage of postal codes that include PCType 2 has had a small increase in rural delivery areas. This slight gain in the proportion of postal codes with civic address range records could lead to a slight increase in the proportion of postal codes that can be geocoded to a civic address range.

As can be seen, a postal code could have records associated with more than one PCType. This generally occurs in the areas that Canada Post considers to be "rural". PCType 2 does not occur in isolation; it occurs with PCTypes 3, 4, and 5. The proportion of postal codes that have PCType 2 records continues to increase. This may be an indication of an increase in civic addressing, particularly where there has been route service (PCType 4). For example, postal code K7V3Z8 is associated with Route Service (PCType 4). However, it also has records of PCType 2 with civic addresses associated with it. Very rural areas tend to have only PCType 5 and/or 3 postal codes. PCType 1 occurs in urban areas and tends to occur in isolation except in a few cases with PCType 3.
An example of this type of situation is the postal code K1A 9Z6.11 This is associated with a post office box and a civic address. The civic address record (PCType 1) is used to supplement the information in the record with the delivery installation address (PCType 3) for the same postal code.12 Effectively, these postal codes have two PCTypes but refer to one physical building (or business) that receives mail.

The service area of a postal code can be considered to be the area where persons live and have access to mail for that postal code. For postal codes of PCType 1 and 2 that would include the dwellings and buildings with mail delivery based on their civic addresses. For postal codes with PCType 4, all of the areas that receive mail by route service would be included. For postal codes with PCType 3 and 5 that would encompass the locations of dwellings and buildings where persons typically access mail delivery through the associated delivery installation. The service area of an individual postal code would include that of each of its PCTypes.

Examination of the PCType combinations associated with each postal code can help in understanding if and how the coverage of these service areas overlaps. This in turn can also help in deciding whether certain records are more representative of the geographic footprint of the postal code. Some PCTypes may be considered as more representative of the definitive service area for that postal code. For example, PCType 1 is more representative of a specific building/dwelling serviced than PCType 5. PCType 4 is representative of specific routes serviced whereas PCType 5 is not (based on the address information available) associated with one particular location more than any other within the CPC community. As discussed earlier in this section, where PCType 1 and 3 occur for the same postal code, the PCType 1 record shows the address of the users of the mail service.
All of this suggests that the most representative records of the postal code's coverage would be as follows:
Both PCTypes 2 and 4 may cover the same area. The area serviced by the route service (PCType 4) may be quite specific even if it cannot be located based on the address information from the Address Lookup File. This information is not sufficient to decide which of these records (PCType 2 or 4) is most representative. However, PCType 2 may be given precedence over PCType 4 in the geocoding process as being more representative, since it is likely to be better geocoded (given specific address information) and also likely represents a part of the coverage of PCType 4 as a result of the progress in civic addressing.

Essentially, when more than one PCType is associated with a postal code, the PCType may be used as one of the criteria to choose the most representative indicator for the spatial footprint of that postal code. Once the CPC community has been matched to a CSD, PCTypes 1 and 2 could be geocoded based on street address information, but the lack of civic addressing information with PCTypes 3, 4 and 5 makes them almost impossible to geocode below the CSD level in the same way. Given that the target is to map the entire footprint of the postal code, more information is needed where postal codes have PCTypes 3, 4 and 5 records. The PCTypes can also be used as one factor in determining the most representative linkage of postal code to standard geographic area. To conclude, PCType is critical to determining the nature of the service area footprint of the postal code.

Postal code records and geographic area coverage

Even a postal code that maps specifically to civic address ranges may cover more than one block-face or block. Therefore, when a postal code on the Address Lookup File has more than one record associated with it, each with its own address range, this can legitimately lead to the coverage of more than one block-face (or dissemination block).
(This can also happen if a single address range from the Address Lookup File spans more than one block-face or dissemination block.) The table below shows the number of records in the Address Lookup File associated with each PCType and postal code combination:

Examination of the table above shows that while records of PCType 1 are related to one address range about 70% of the time, even these may sometimes cover more than 50 address ranges (potentially 50 block-faces or dissemination blocks). In our experience in geocoding of PCType 1 and 2, the coverage is of consecutive or adjacent block-faces when a postal code has more than one civic address range record. PCType 2 appears to have the largest number of defined address ranges associated with it. PCType 4 has routes associated with it and these routes may cover large areas even if there are not that many records in the Address Lookup File. PCType 3 is generally associated with one or more post office boxes in a postal delivery installation and has only one record associated with the installation. Similarly, PCType 5 refers to delivery that is available from an installation and may have only one record.

Of significance here is that for PCType 1 and especially PCType 2 there may not be one single geocoded record (block-face or block) that is more representative of the postal code in terms of service provided than any other.

1.4 Files for locating postal delivery installations

PCType 3 and PCType 5 postal codes are associated with the pick-up of mail from a postal delivery installation (rather than delivery to a civic address). The civic addresses associated with the postal delivery installation are available on the Householder File. The point locations for these postal delivery installations were also recently acquired from CPC by Statistics Canada. The point locations are not always consistent with the coordinates of Statistics Canada's Road Network File.
This means that where address information is available and can be matched to that in the Road Network File, the geocoding is best done with it so that the position of the installation relative to the road network information is shown as accurately as possible. When address information is not available, the post office location address can be confirmed with Canada Post. Statistics Canada could maintain a list of postal delivery installation addresses and associated dissemination blocks to permit the geocoding of postal codes. Section 2.0 examines the question of whether some postal codes should be associated with the postal delivery installation only.

1.5 PCType and delivery mode type from the Delivery Mode File

The delivery mode type (DMT) provides detail on the mail delivery service for urban delivery areas.13 It is used to calculate the rates for mailing services (Canada Post Corporation, 2005c, p. 2). The CPC description and typology is as follows:

A = Delivery to block face address

The delivery mode type can be useful in illustrating whether the geocoding applies to a specific building and thus may potentially be "sub" block-face. As shown in Table 4, in a few cases postal code records can be associated with a specific building:

Table 4 PCType 1 records from the June 2006 Address Lookup File with DMT from June 2006 Postal Code Delivery Mode File

As can be seen, the most frequent combinations of DMT and PCType are associated with well-defined address ranges (DMT = "A"). The rest of the PCType 1 records are associated with apartment buildings (DMT = "B") and different types of businesses (DMT = "E" and "G"). Most of the geocoding of PCType 1 can be done simply based on the address information. However, it is especially important that buildings with large populations are geocoded properly. For example, a building may represent a health facility that defines an entire block, dissemination area or census subdivision (CSD).
In the rare case where an entire CSD is represented by the facility, its population could represent all of the CSD's population.

For PCType 1 postal codes, the DMT helps to distinguish the type of building (dwelling, apartment, large volume receiver, etc.) that is represented by the postal code. The DMT does not add more information than the PCType for locating postal codes based on address information, although it does add information about the nature of the building(s) receiving the mail.

1.6 Considerations for geocoding based on the CPC data

These considerations for geocoding follow from the examination of the CPC data as elaborated in this section. These considerations, in conjunction with the findings in the next section on how postal codes are used and reported, will be used to develop the rules for geocoding. The considerations for developing the geocoding rules are:
In conclusion, supplementary information on the areas where postal code service is received is needed to both augment and validate the partial geocoding that can be done based on the CPC files alone.

2.0 Postal code users and geography

After examining the postal code data in the previous section, the key challenge remains: how to develop a search area where the postal code can be geocoded. The CPC community names are place names that do not usually match names or boundaries of the CSDs maintained in Statistics Canada's Geographic Frame. In the previous section, the discussion on the use of the CPC files for geocoding showed that supplementary information was needed for geocoding. The substantive unanswered questions were:
The supplementary information being considered here is information based on the reported use of the postal code. The area where a postal code is used is referred to as the service area of the postal code. The service areas, based on the postal code reported in the Census of Population, provide the supplementary information to both enable and confirm the geocoding. This is in addition to trying to find the CSD based on the CPC community name. Improvements to this process are discussed in Appendix C.

This section focuses on how persons or households use postal codes and how this provides supplementary and confirmatory information to that of the address information from CPC. The users of the postal code are persons and businesses. However, the relationship between these statistical units and the postal codes is indirect. As discussed in the previous section, postal codes are not assigned to persons or businesses but to buildings and dwellings.14 The exception is the less than 2% of the postal code address records, with DMT of "G" or "M", which are associated with large volume receivers.

A data source is also needed for finding the service area in this manner. In the case of persons the relationships needed from the data source are as follows:

Essentially, the person's relationship with the postal code and the standard geographic area can be used to make the connection between the latter two. Even if such a data source is available for the postal code, the question remains as to which standard geographic area is related to that postal code. For example, there is a very slight risk that the standard geographic area assigned in the data source is not that of the person's home but that of the person's place of employment. These risks need to be balanced in the context of how persons use postal codes and how businesses use postal codes.
2.1 Persons and the postal code reported in the census

The one source containing a reported postal code and the standard geographic area for persons in the entirety of Canada is the Census of Population. Households in Canada are required to report (or confirm) their postal code on the census form. This respondent postal code is typically the postal code of the residence of the household. This can be used to estimate the service area of a postal code.

A household is generally defined as being composed of a person or group of persons who co-reside in, or occupy, a dwelling. The census is the most complete survey of household postal code response data for Canada. The postal code provided by the respondent in the census and the postal code assigned to the person's dwelling by CPC are usually the same. This information can be used to find the standard geographic area related to the postal code reported in the census. The diagram below shows how these are related:

The census asks respondents to provide the postal code of their residence. Given that this information exists at the household and dissemination block levels, why not simply assign postal codes to dissemination blocks (and other standard geographic areas) based on the census alone? The standard geographic areas are typically containers by which analysis is done and any extraction of data using the postal code as a key can be done with these standard geographic areas. However, if standard geographic areas are to be used as the means to reliably link postal codes to other data, then the following are the reasons why the census information needs to be supplemented with geocoding:
The census reported postal code can be used to map service areas for both the postal code and the FSA. Thus, the best quality data can be obtained by combining the information from the census with that available from CPC.
2.2 Identifying the search area for geocoding
Since fewer than 40% of the unique CPC community names match those of like CSDs, how can we geocode addresses containing the remaining 60% of the CPC community names? Locating the street names and addresses for geocoding can be done using the census data. A test was conducted in autumn 2005 to measure the rate of geocoding based on FSA service areas. Since FSAs are more stable over time than postal codes, search areas were created using the FSA of the valid reported postal codes from the 2001 Census. For this test of geocoding of the June 2005 postal codes, FSA service areas were created using the postal codes that were valid in May 2001. Dissemination blocks were assigned to an FSA only if a valid postal code with that FSA was reported at least four times in that block.15 The geocoding of postal codes was done within the FSA service area for that postal code's FSA. Essentially the FSA service areas were used as the search areas for the geocoding. Despite these conservative criteria, over 88% of the civic address records (PCType of 1 and 2) could be geocoded, most of them at the most detailed block-face level. Using these search areas appears to be a productive way to do the geocoding where the postal code service area is designated in terms of civic addresses (PCTypes 1 and 2) and the related civic address and street network information is available in the road network file.
2.3 Geocoding postal codes directly using the census data
A method is needed for geocoding postal codes that either do not link to civic addresses (PCType 3, 4, 5) or have civic addresses that cannot be found on the road network file. Would the census provide sufficient information to geocode these?
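The FSA service area test described in section 2.2 can be illustrated with a minimal sketch. It assumes the census responses are available as (dissemination block, postal code) pairs and reads the rule as "at least four responses carrying that FSA in the block"; the function name, the input layout and the block identifiers are hypothetical and are not taken from any Statistics Canada system.

```python
from collections import Counter, defaultdict

MIN_REPORTS = 4  # threshold reported for the autumn 2005 test


def fsa_service_areas(responses, min_reports=MIN_REPORTS):
    """Build FSA service areas from census responses.

    responses: iterable of (block_id, postal_code) pairs holding valid
    reported postal codes (hypothetical layout).
    Returns a dict mapping each FSA to the set of dissemination blocks
    in which that FSA was reported at least `min_reports` times.
    """
    counts = Counter()
    for block_id, postal_code in responses:
        # The FSA is the first three characters of the postal code.
        fsa = postal_code.replace(" ", "").upper()[:3]
        counts[(block_id, fsa)] += 1

    areas = defaultdict(set)
    for (block_id, fsa), n in counts.items():
        if n >= min_reports:
            areas[fsa].add(block_id)
    return dict(areas)
```

The resulting block sets would serve only as search areas for street address matching, not as geocodes in their own right.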
The 2001 Census reported postal code data can be examined to see the PCTypes of postal codes that were in use. These responses should be evaluated with some caution because some postal codes have multiple PCTypes associated with them and respondents do not state the PCType of their postal code. The postal codes are therefore examined below in terms of the combinations of PCType. The June 2001 Address Lookup file had 758,658 postal codes which were not retired. Of the 673,242 valid postal codes reported in the census, 673,179 postal codes were found to be on the June 2001 Address Lookup File. A detailed examination of the postal codes that were not reported revealed the following: It appears that almost every rural delivery postal code is reported. Only 19 rural delivery postal codes were not reported at all. This is less than 1% of all the rural postal codes. Since PCTypes 4 and 5 are particularly difficult to geocode, the use of the census responses appears to be an effective approach to geocode these in the rural areas. Comparison of the 2001 Census reported results with the 2006 Census reported results should allow us to evaluate whether these postal codes are stable over time and therefore whether the use of the census data may be sufficient for geocoding the postal codes active during the census. Fewer urban delivery (in comparison to rural delivery) postal codes are reported. Business postal codes are understandably not generally reported in the census. This appears to be the case with about 50% of the PCType 3 postal codes. The PCType 5 postal codes may not be used as much in urban delivery areas, particularly since most people likely have delivery directly to their place of residence. Although only 10% of the PCType 1 postal codes were not reported, these still constitute a large number (76,581). These postal codes are not designated as business postal codes by PCType.
However, upon closer examination, these do appear to be postal codes associated with non-residential buildings. For example, K2E7L6, K2E7J5 and K2E1B6 are associated with civic addresses along Colonnade Drive in Ottawa. This area is zoned as a "general industrial zone".16 The subset of PCType 1 postal codes that were not reported were classified further in terms of their DMT and then checked to see if they constituted delivery to a single address. The breakdown of the PCType 1 postal codes that were not reported is shown below: Altogether, slightly more than half of these postal codes are associated with delivery to a single address.17 The fact that more than half of these postal codes are associated with a single address, some of them clearly designated as business addresses by CPC, suggests that the urban under-reported PCType 1 postal codes could indeed be mostly business or institutional postal codes. Use of the census reported postal code is likely an appropriate tool to find the service areas of the FSAs as well as those of postal codes associated with households (rather than businesses or institutions). Generally, household postal codes are reported in the census and service areas created from the reported FSA should be sufficient for geocoding the majority of postal codes.18
2.4 Recommendations for geocoding based on the census response
Census data is a good source for geocoding where postal codes are in use at census time. It can also be used to create search areas for geocoding postal codes with civic addresses. This baseline information can help in assigning postal codes in terms of usage to dissemination blocks, dissemination areas and CSDs. Postal code usage can change over time both because Canada Post changes the postal codes and because users may change their reported postal codes. Further to this, it is not possible to obtain a postal code for every household in the census because the response may be illegible, invalid or blank.
However, the response postal code, especially if it is frequently reported within a dissemination block, can be effectively used to construct the FSA of postal code usage by block. While postal codes tend to change a little every month, the FSA service area is likely to be more stable over the five-year period between censuses.19 The ability to geocode 2005 Canada Post data based on service areas created from the 2001 Census data suggests that FSAs are indeed usually stable. The stability of the FSA service areas, as well as that of the relationship of the reported postal codes with standard geographic areas, should also be more rigorously tested by comparing the 2001 Census results with those of the 2006 Census. The 2006 Census reported postal codes can be used as follows for geocoding:
Geocoding directly to standard geographic areas from the census data rather than converting previously geocoded data should result in geocoding that is consistent with the concept of postal code service areas as well as more complete coverage and more precise geocoding.
3.0 Geocoding rules and measures of quality
Rules-based geocoding can be developed by combining the information from the census and the CPC files. Quality indicators can be developed for each record geocoded with this approach since both the rules-based method of geocoding and the information used for geocoding can be used to create the quality indicators.
3.1 Service areas and search areas
Service areas need to be defined to start the geocoding process. The service areas and relationships suggested for geocoding are:
The FSA service area, as discussed earlier, can be used to find and geocode to civic address ranges. For the 2006 Census, each FSA service area would include every DA which had at least one dissemination block where a minimum number of respondents reported that FSA in the census.20 Having a minimum number of respondents would ensure that the FSA is truly used and reported accurately within the DA. Because more than one FSA may be used within a DA, FSA boundaries constructed of whole DAs will overlap. However, a little overlap and overcoverage should not be an issue since these FSA service areas will not be used to geocode directly. FSA service areas will only provide the search areas for finding and geocoding postal codes based on street address information in the road network files. The postal code service area, unlike the FSA service area, will be more tightly constructed to include only dissemination blocks where this postal code was reported by a minimum number of respondents. Furthermore, in order to restrict overlaps, only the most frequently reported postal code in that dissemination block would be chosen. The tighter delineation criteria are used because the postal code service area may be used to directly geocode the postal code to a dissemination block where the postal code usage was reported. However, not all postal codes may be assigned a search area based on the 2006 Census response data.21 Imputed postal codes could be used to create more comprehensive search areas, but this is generally a little less reliable. Another possibility is to create the relationship between the CSD used for the census with that of the previous names of the CSD as well as the place name file maintained in Geography Division. The CPC community's coverage indicates the area where the street name is unique. (As discussed earlier, CPC retains the old municipality names in cases of amalgamations and boundary changes until the duplicate street address problems are resolved.)
The match of this CPC community to one or more CSDs may therefore be inexact. It is difficult to say definitively where the boundaries of the CPC community are because these boundaries are not available from CPC. However, CSDs can be matched to the CPC community. This match between the CSDs and the CPC community can be used as confirmatory information for the automated geocoding by FSA service area.22 However, because the correspondence is not exact in terms of area covered, this correspondence will not be used as the primary method to create search areas for automated geocoding. The CPC community to CSD correspondence will only be used as an additional check on the automated geocoding and as a secondary means of geocoding.
3.2 Geocoding postal codes by PCType
Given the intention to geocode the postal codes as definitively as possible, priority will be given to geocoding postal code records whose service areas are defined by civic address ranges. These are the PCTypes 1 and 2. The next most definitive coverage is likely PCType 4 since the routes serviced are defined even if they do not have civic addresses or road names that can be matched to a road network file. Lastly, PCTypes 3 and 5, which are associated with the postal delivery installation but can be accessed by households in various parts of the CPC community or even by travellers or others outside the municipality, will be given the last priority in terms of the allocation of the standard geographic areas to the postal code. This allocation of standard geographic areas in terms of the type of service available there puts the emphasis on the most likely used postal code in that standard geographic area. PCTypes 1 and 2 can be definitively geocoded by matching the CPC address information to the road network file. If the match is attempted on the street network and address information alone, there can potentially be multiple matches (across Canada). For example, there are many "Main" streets in Canada.
Searching on a street name of "Main" would result in multiple matches. In the automated matching program, the matches would be done within a search area. The testing done to date suggests that the FSA search area should be used as the primary search area. The CSD of the match can also be confirmed by the CPC community to CSD Correspondence File. (Please see Appendix C on creating a correspondence between CPC community and CSD). PCType 1 and 2 records should be geocoded to the block-face or dissemination block level since these clearly represent postal delivery to address ranges. The precision of the geocoding can also be monitored – the number of dissemination blocks or block-faces geocoded to should be proportional to the address range records from the CPC Address Lookup File. In this way, both the census and the CPC Address Lookup File information can be combined to do the geocoding and assess its quality. Various combinations of PCTypes 2, 3, 4 and 5 occur together, particularly in rural delivery areas. (See table 5.) The census reported postal codes do not contain information on which PCType record was reported. Therefore, the type of service from that postal code can not be deduced. The Address Lookup File from CPC does provide the civic address information for PCType 2 but the coverage of each of the other PCTypes for a given postal code can not be deduced. Given this, the geocoding of the share of a particular type of postal code of PCTypes 3, 4 or 5 may not be accurate in terms of usage. In the past, postal codes associated with installations, particularly of PCType 3, were expected to be coded (on the PCCF) to the dissemination block of the installation. The approach here is to continue with that coding while distinguishing the record on the PCCF as one referring to the installation. 
But, where the general delivery (PCType 5) and post office box (PCType 3) are the only service available, these could also be geocoded to the standard geographic area as the postal code most likely in use there. The following approach is suggested for postal codes with any permutation of PCTypes 2, 3, 4 and 5 records:
In sparsely populated DAs where the postal service used is not evident from the response data or based on address information, the service available is likely from the nearest postal installation. These DAs could be coded to the postal code of the nearest postal delivery installation where postal service could be accessed (PCType 5 or 3). Geocoding done in this manner could be flagged using a field indicating the source of geocoding. The intent is to provide complete geocoding of postal codes in terms of the service available in every populated DA. The method of geocoding described here uses the FSA service area based on the census as the primary search area. PCType 1 and 2 are then geocoded within this area by matching on street address information. Where this is not possible, the relationship between the CPC community and FSA to the CSD could be used. The geocoding of PCType 3, 4 and 5 records is more difficult since these do not map to civic addresses. However, the census responses can be used to code to areas where postal codes of this type are in use. PCType 3 and 5 would also continue to be coded to the postal installation as well as the service area. In this way, all postal codes in use can be mapped to geographic areas based on the available census response data and the road network file.
3.3 Measures of quality
The quality of the postal code geocoding can be both improved and better reported given the concept of geocoding to FSA and postal code service areas. (Background and definitions of the measures of quality are provided in Appendix B.)
A. Timeliness
Timeliness can be measured in terms of the difference between the following:
The conceptual review above allows for rules based geocoding of postal codes to block-faces, dissemination blocks and DAs in Statistics Canada's Geographic Frame. Automated geocoding has been tested and can improve the efficiency of the geocoding process. While it may not be possible to use automated geocoding when there is insufficient information on the Geographic Frame or inconsistencies between the Geographic Frame and CPC data, it can be used for the majority of records. This efficiency can have the following improvements in vintage:
There are vintage differences between the census response data (available once in five years), the files from Canada Post (available once a month) and the road network files (which are periodically updated.) These can lead to difficulties in creating the association or linkage in the geocoding process. The lack of up-to-date information in the road network files is solved usually by contacting the appropriate municipality and requesting information. This can be a time-consuming process and can also result in response burden to the municipality. One way to deal with this would be to stage the geocoding. That is, code to only the CSD or the DA if the road network information is not available and then code to more precise levels as the information becomes available. Currently, once a postal code is geocoded we do not go back and try to improve the precision. In fact we may never re-geocode the postal code unless Canada Post signals a change in the service for that postal code. With automated geocoding we would have the ability to look for improvements in precision with each iteration of the road network file. B. Accuracy and precision The accuracy of the geocoding process can be indicated in terms of the confidence of geocoding to the correct standard geographic area(s). Precision can be measured in terms of whether the number of block-faces and dissemination blocks coded to are consistent with the address ranges serviced by the postal code. A record level data quality rating can be created based on the geocoding process to indicate the confidence rank of the geocoding. Access to more than one source of information for the geocoding also allows for the construction of quality indicators for the geocoding. One way of assigning the quality indicator would be to construct a three character indicator based on the geocoding process itself. 
The characters would be easily understood, with "AAA" suggesting that the quality of the geocoding was confirmed by more than one source and was judged as the best and with "CCC" indicating that the geocoding was an approximation. The following table illustrates the proposed concept:
Quality indicator for PCType 1 and 2
The final quality indicator (QI) is constructed as a concatenation such that: QI = QI_1 | QI_2 | QI_3
A QI = AAA indicates that the geocoding was of the best quality.
The quality indicator QI_1
This indicator is for the quality of the general area to which the geocoding was done. It should be an indicator of the certainty in terms of geocoding to the correct census subdivision (and correct part of that CSD).
A - Search area is based on FSA service area and CPC community to CSD correspondence
The quality indicator QI_2
This is an indicator of the street matching for PCTypes 1 and 2.
A - Match is found on standardized street name, type and direction to the Road Network File
The quality indicator QI_3
QI_3 was assigned as follows in matching address ranges to the Road Network File:
This quality indicator applies to the street address matching process for PCType 1 and 2 (about 97% of the records on the Address Lookup File). A data quality indicator could also be provided in a similar manner for postal codes coded directly to an installation. Where the geocoding is done based on the census information alone, a measure of stability could be created by comparing the geographic area reported in the previous census. This validation is not a direct measure of accuracy but does provide an indicator of the confidence of the geocoding.
Quality indicator for PCType 3, 4, and 5
The final quality indicator (QI) is constructed as a concatenation such that: QI = QI_1 | QI_2
A QI = AA indicates that the geocoding was of the best quality.
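The concatenation of the component indicators can be sketched as follows. This is only an illustration of the proposed concept: the function name is hypothetical, and the intermediate grade "B" is an assumption, since the text names only the "A" and "C" ends of the scale.

```python
# "A" (best) and "C" (approximation) come from the text; "B" is assumed.
VALID_GRADES = ("A", "B", "C")


def build_qi(pctype, qi_1, qi_2, qi_3=None):
    """Concatenate component indicators into the record-level QI.

    PCType 1 and 2 use three components (QI_1 | QI_2 | QI_3);
    PCType 3, 4 and 5 use two (QI_1 | QI_2).
    """
    if pctype in (1, 2):
        if qi_3 is None:
            raise ValueError("PCType 1 and 2 require QI_3")
        parts = [qi_1, qi_2, qi_3]
    elif pctype in (3, 4, 5):
        if qi_3 is not None:
            raise ValueError("QI_3 is not defined for PCType 3, 4 and 5")
        parts = [qi_1, qi_2]
    else:
        raise ValueError(f"unknown PCType: {pctype!r}")

    for grade in parts:
        if grade not in VALID_GRADES:
            raise ValueError(f"invalid grade: {grade!r}")
    return "".join(parts)
```

For example, `build_qi(1, "A", "A", "A")` gives the three-character indicator for a fully confirmed civic address match, while `build_qi(4, "A", "C")` gives a two-character indicator for a rural route record.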
There is no QI_3 available for PCType 3, 4 and 5 since there is no street address range information describing the service area and the geocoding is not done to the same precision.
The quality indicator QI_1
This indicator is for the quality of the general area to which the geocoding was done. It should be an indicator of the certainty in terms of geocoding to the correct census subdivision (and correct part of that CSD). QI_1 is constructed in exactly the same way as for PCTypes 1 and 2.
A - Search area is based on FSA service area and CPC community to CSD correspondence
The quality indicator QI_2
This is an indicator of the stability of the definition of the boundaries of these service areas. QI_2 is constructed differently for PCType 3, 4 and 5 (from that for PCType 1 and 2) since no street address range information is available for the geocoding.
PCTypes 3, 4, 5 (Postal code service area location)
A - Stable area (i.e., geocoded to the same DA) based on current census data and previous census data
PCType 3 (Postal installation location)
A - Address from Householder file is geocoded to Road network file and matches the DA of the point file from CPC23
Precision of the area coded to can be measured in terms of the level of geography coded to as well as the number of standard geographic areas coded. This is particularly relevant for the civic address based geocoding. For example, PCTypes 1 and 2 have service areas defined by address ranges. This would imply that at the most precise level the geocoding would be done to a block-face, followed by the dissemination block and then the DA, the DA being the least precise level of geocoding. The number of dissemination blocks geocoded to as a proportion of the number of address ranges for a postal code would also indicate whether there has been allocation of unnecessary dissemination blocks to a postal code.
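The proportion described above can be sketched as a simple ratio check. The cut-off of one dissemination block per address range used here is an illustrative assumption, not a value given in this paper, and the function names and input layout are hypothetical.

```python
def block_to_range_ratio(n_blocks, n_address_ranges):
    """Dissemination blocks coded per address range for one postal code."""
    if n_address_ranges <= 0:
        raise ValueError("postal code has no address range records")
    return n_blocks / n_address_ranges


def flag_possible_overallocation(records, max_ratio=1.0):
    """Return the postal codes whose ratio exceeds `max_ratio`.

    records: dict mapping postal code -> (blocks coded, address ranges).
    max_ratio: assumed cut-off; a real threshold would need to be
    chosen from an analysis of the geocoded file.
    """
    return sorted(
        postal_code
        for postal_code, (n_blocks, n_ranges) in records.items()
        if block_to_range_ratio(n_blocks, n_ranges) > max_ratio
    )
```

Records flagged this way would be candidates for review rather than confirmed errors, since a single address range may legitimately touch more than one block.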
Measures of precision can be used as part of the verification of the geocoding process as well as to provide general indicators to the data user. C. Completeness Given that the postal code products (PCCF and PCFRF) provide links between postal codes and standard geographic area, the completeness of the coverage of these products can be measured in the following way:
The change in methodology proposed here allows for postal codes of PCType 3 and 5 to be coded to populated DAs when no postal walk or route service is available. This would result in a more complete coverage of the postal service available across Canada. D. Relevance The sales as well as the use within Statistics Canada of the PCCF and the PCFRF show that this product is useful. However, an indicator in the file, the Single Link Indicator (SLI) is used for population or dwelling allocation although it is not truly designed for that purpose. The SLI is designed to provide one representative point for a postal code for the purpose of mapping label points of the postal code. But, this indicator is also being used for data allocation. One standard geographic area (dissemination block or DA) may not be representative of all of the population accessing this postal code. For example a postal code service area may span three dissemination blocks and one dissemination block may not be more representative than the others. The recommendation is that data allocation be done by either using statistical software available from Statistics Canada (PCCF+) or by using another allocation process based on a DA population weighted file.24 E. Interpretability and metadata Metadata can be provided at the record level to enable clients to better understand the record linkage. This metadata will also help in the internal evaluation of the improvements that may be needed in geocoding. The quality indicator (QI) has been discussed earlier (in section 3.3). The other record level metadata suggested are described below: Method of coding The method of coding can be relevant to the quality of the record linkage. The following are proposed for this indicator:
Currently the QI is proposed to be available only for the automatically geocoded records. This additional indicator documents the source of the other types of record linkage. With the monitoring of the quality of these other types of coding, the QI could in the future be made to include all sources of coding. Service areas coded The postal code may be coded to the postal code service area or to a postal delivery installation and these two types of coding need to be distinguished. The following is proposed for this indicator:
DMT with modifications
Currently where the DMT is missing, Statistics Canada assigns a "W" for the DMT. This "W" includes both rural and small urban areas. The user may be able to better use the DMT if the distinction can be made between these. Another letter such as "S" could be used to distinguish rural delivery areas where the PCType = 2 and delivery to a civic address is available. Geocoding rules and well defined sources of the geocoding enable the creation of quality indicators and other metadata that can inform the user as well as enable the monitoring of the quality of the files.
Conclusion
The intentions of this paper were: (1) to study the postal code concepts and descriptors used by Canada Post Corporation; (2) to establish a conceptual framework articulating the relationship between postal codes and standard geographic areas; and (3) subsequently, to establish business rules for linking postal codes to standard geographic areas. Postal codes can be related to standard geographic areas by the postal code service areas. While in some cases the address information available from Canada Post is sufficient to geocode the postal code to a road network file, in many cases supplementary information is needed to do the geocoding. The census provides an ideal means to discern the service area of the FSA and also that of the postal code. While this method of geocoding emphasizes the household use of the postal code, business postal codes can also be geocoded this way.25 For the geocoding of the postal code, the first step would be determining the FSA service area based on the census and supplementing this with a concordance file of the CPC community and the CSD. This would establish the search area within which the postal code can be geocoded. Differentiating the geographic footprint of the postal code by PCType enables better coding to postal code service areas. An automated geocoding process based on this methodology would allow for efficiency in geocoding.
An efficient geocoding system can be run as needed to take advantage of updates to postal codes as well as improvements in the road network. Ultimately, the conceptual framework discussed here provides rules for an efficient geocoding process with metadata on the quality of the geocoded record.
Notes
1. The terms in italics are defined in the glossary. 2. The CPC classification is considered to be a component of mail preparation. The classification is typed based on the 2nd character of the postal code as follows:
3. In this working paper, "census" refers to the Census of Population. 4. Statistics Canada defines urban and rural areas, based on the Census of Population reported for dissemination blocks, as follows: An urban area has a minimum population concentration of 1,000 persons and a population density of at least 400 persons per square kilometre, based on the current census population count. All territory outside urban areas is classified as rural. Taken together, urban and rural areas cover all of Canada. 5. The other 3% of the postal codes are not associated with civic addresses but, with post office boxes and postal service routes. 6. This match rate is based on matching unique CSD names within a province/territory with the CPC community names of postal codes of a province/territory (based on the first letter of the postal code). The matching was done after standardising the names to all capitals with no accents or special characters. The figures in the table are in fact a slight overestimate of the match rate since the CSD type could not be factored in the matching and is not in the denominator of the percentage calculation. (CSDs with the same name and a different type [town, reserve, municipal district etc.] such as Yarmouth Town and Yarmouth Municipal District both in Nova Scotia constitute entirely different CSDs. Please see the Glossary for a definition of CSD Type.) 7. Finding the particular service area of a postal code is essential to establishing an automated geocoding process. However, in a manual geocoding process, the options exist to look-up various maps and even consult municipalities. 8. This information is based on an analysis of Postal Code data and information gleaned from the Canada Post Corporation documents on the CPC website, www.canadapost.ca, as well as a telephone conversation with a Canada Post representative on November 17, 2004. 9. 
This typology list is an amalgamation of information provided for records described in "Postal Code Address Data Technical Specifications" (Canada Post Corporation, 2005a, p.5-10). 10. Information from the Address Lookup File can be used to find the delivery installation in the Householder File. The civic address of the delivery installation is often but not always available in the Householder File. Point locations of the delivery installations are also available from Canada Post Corporation. 11. This postal code with PCType=1 and PCType=3 was found on the December 2004 version of the Address Lookup File from Canada Post Corporation. This was also confirmed on the Internet lookup tool at www.canadapost.ca (accessed on May 30, 2005). 12. This is based on a telephone conversation with a Canada Post Corporation representative on November 17, 2004. 13. The DMT should not be confused with the PCType in the PCCF. The DMT has more to do with how the mail is delivered. The PCType of the postal code takes precedence in terms of where the mail is delivered. A typical case would be a large volume receiver who has a postal code with PCTypes of 3 and 1. A piece of mail with an address formatted as PCType 1 (street address) would still be sent to the post office box of the postal code as long as the postal code was correct and machine readable. (Otherwise, the machine will attempt to decipher the postal code based on the address information.) This system prevents large amounts of mail being delivered by a letter carrier. The DMT is used by large volume mailers to obtain discounts based on pre-sorting mail. (This is based on a conversation with a representative of CPC, November 17, 2004.) 14. The vast majority of records in the Address Lookup File are of PCTypes 1 and 2, and these refer to delivery to a specific dwelling or building. Other PCTypes refer to services available from the delivery installation. 15.
Where postal codes for another province were reported within the block, this was not considered in the geocoding. The geocoding method is explained more fully in the next section. 16. Based on maps and information on the City of Ottawa website accessed on Ottawa.ca, July 5, 2006. 17. This includes all of the postal codes in the chart above with DMTs of E, B and G as well as the postal codes with DMT of A that have delivery to a single address only. 18. The method being used for geocoding postal codes to the 2001 Census geographic areas is based on matching CPC community name to CSD name. This method has problems as discussed in section 1.3. However, improvements are being made to the matching to CSD name and this is discussed further in Appendix C. 19. Where large-scale changes do occasionally affect FSA areas, these changes are reported by CPC. 20. For the test of FSA service areas (mentioned in section 2.2) a 2001 Census block was assigned to an FSA service area only if a minimum of four respondents reported a postal code with that FSA in that block. Choosing a minimum number of responses of an FSA is based on the premise that the likelihood of four people making the same mistake in reporting the FSA is lower than that of one person making that mistake in the same block. Despite using this conservative criterion, the geocoding rate was over 88% of the PCType 1 and 2 civic address records. 21. Based on the test of FSA service areas mentioned above, service areas could not be constructed for all of the postal codes because not all postal codes were reported. 22. A list of correspondence between the CPC community names and the CSDs has been created for searching of street addresses in the manual geocoding process. Please see Appendix C for more details on how this correspondence file is being improved. 23. A file of postal installation locations was obtained from Canada Post in January 2006 for testing. 24.
The Postal Code Conversion File Plus (PCCF+) is a complementary product to the Postal Code Conversion File (PCCF). It is an automated system that uses postal codes to assign census geography. PCCF+ is based on the latest Postal Code Conversion File and the Postal Code Population Weight File produced by the Geography Division of Statistics Canada. PCCF+ uses weights to allocate postal codes linked to multiple dissemination areas according to the distribution of population using a given postal code. 25. The authors suggest that the business use of the postal code as well as comparisons of the postal code service area footprint between the 2001 Census and the 2006 Census can be explored in a subsequent working paper.