Reports on Special Business Projects
Toward a Classification of Communities by Remoteness: A Proposal

Release date: June 30, 2023

Acknowledgments

The authors of this report would like to acknowledge the following at Statistics Canada for their support and inputs: Alessandro Alasia, Pulkit Aggarwal, Eric Baxter, Mahamat Hamit-Haggar, Peter Murphy, and Jason Wong. The following colleagues at Indigenous Services Canada are also acknowledged for their valuable feedback: Eric Guimond, Bruno Powo Fosso, and Jennie Thompson.

Summary

This study proposes a categorical classification of geographic communities (i.e., census subdivisions) into remoteness classes using a continuous index of remoteness. The methodology and the results of its application are discussed herein.

Alasia et al. (2017) used the travel cost from a community to population centres (POPCTRs) and the size of those POPCTRs to develop a continuous Index of Remoteness (IR) at the census subdivision (CSD) level in Canada. This index provides a measure of geographical proximity to what are typical points of service availability, taken in that work to be POPCTRs. The index has values ranging from 0 for the least remote community to 1 for the most remote community. This IR provides policymakers and researchers with a practical measure on a continuous scale for gauging the relative remoteness of communities.

However, in some applications, it is desirable to have discrete levels of remoteness defined by thresholds on the continuous IR. These levels would group communities into distinct categories of remoteness. This categorical classification would identify communities that are relatively less remote (i.e., starting with a “non-remote” class), and other classes with relatively more remote communities (i.e., ending with a “remote” class). When a two-level or binary classification is called for, a single threshold or cut-off is needed on the continuous IR. This would divide communities into a “remote” class and a “non-remote” class.

In this study, the IR was subdivided into classes based on natural clusters found on the IR continuum. To narrow the search interval to identify a single IR cut-off point on the [0, 1] range, the index values were initially grouped into three classes, with cut-off values of 0.2721 and 0.5010. In a following step, the class in the middle parts of the range, i.e., [0.2721, 0.5010], was split further, resulting in two classes over this narrowed range. Finally, the binary cut-off was identified at the IR value of 0.40.

In order to carry out the above steps to narrow the search range for a binary cut-off, in addition to looking at the natural breaks in the IR, factors were selected and used that relate to a more generalized concept of remoteness than what the IR represents. Such a remoteness concept encompasses the availability of services in the environs of a community, as proxied by the surrounding population. Therefore, the aggregate population of surrounding POPCTRs and the aggregate population of surrounding CSDs (i.e., communities) were selected to bring in this extended element of remoteness. These factors were evaluated in concert with the IR to achieve progressive narrowing of the search range.  As a result of this procedure, the [0.4000, 0.4500] interval was identified as being of most interest. Applying a criterion whereby priority was given to putting borderline CSDs (i.e., IRs within [0.4000, 0.4500]) in the remote class, the lower end of this range, 0.40, was selected as the binary IR cut-off.

Using the 0.40 IR cut-off, about one-third (31.7%) of populated CSDs overall were flagged as remote. Among Indigenous CSDs (see section 2.2 for definition used), almost three-fifths (60.2%) of these communities were remote, while this proportion dropped to about one-fourth (25.3%) for non-Indigenous communities. In terms of population, about one in twenty-five (4.1%) Canadians lived in a remote area (based on the 2016 Census of Population), while almost three-fifths (60.3%) of the population of Indigenous CSDs also lived in remote communities.

This study outlines a methodology that is generic and is not tied to addressing any specific problem or application. It is acknowledged that specific applications might have their own unique needs for groupings by remoteness and might also follow some other valid approach than the one used here to achieve them. Users might also adapt the methodology used in this study to address their needs.

1 Introduction

Two of the main factors in defining the remoteness of a community are population density and proximity to relatively more densely populated and larger centres. By combining these factors, a continuous remoteness index can be computed with values that would range from a minimum value that would be assigned for the least remote communities, to a maximum value, for the most remote communities. An example would be values on a continuum from 0 to 1.

In some applications, it is desired or required to have two or more discrete levels of remoteness as opposed to a continuous index. Starting from the continuous Index of Remoteness (IR) for Canadian communities created by Alasia et al. (2017), the objective of this study is to develop a binary IR cut-off point to group CSDs into non-remote and remote classes (alternatively, less-remote, and more-remote, etc.).

The proposed binary IR cut-off point is identified by progressive narrowing of the range of IR values over which to search for such a point. The methodology finds natural breaks in the IR scores and then explores candidate cut-offs by examining the IR together with the aggregate population within a certain degree of proximity to a given community.

The result of this analysis is a proposed classification that can be further tested in actual applications. A qualified proposal is put forward for the choice of a single cut-off point from a progressively narrowed search range on the full IR continuum. A rationale is provided for each successive narrowing of the search range and for the choice of a single value as the proposed binary IR cut-off. While the methodology used here is generic and is not tied to addressing any specific problem or application, it is acknowledged that different applications might require their own specialized groupings by remoteness and that the cut-off or threshold for other binary groupings could vary from what is proposed in this work.

The paper is organized as follows: section 2 presents an overview of the remoteness and geographic concepts and the data sources. Section 3 outlines the methodology used for finding the binary IR cut-off point. Section 4 presents the results of applying the identified binary cut-off. The conclusion in section 5 provides an overview of key findings and the position of the approach used here relative to other possible approaches and proposals for a binary IR cut-off.

2 Concepts and Data Sources

This section provides brief information on remoteness and the geographic concepts and data sources used in this study.

2.1 Remoteness concepts

Remoteness generally refers to the isolation of a community in physical terms. Population density and proximity to relatively more densely populated and larger centres are the two main attributes used to capture the notion of the remoteness of a community. Both attributes enter the computation of the continuous Index of Remoteness (IR) of Canadian communities, which is used in this study. A detailed methodological description of the IR is presented in Alasia et al. (2017). The index is based on a gravity model and is computed for each census subdivision (CSD) in Canada that reports some population or a road connection. The model accounts for the distance between a CSD and any population centres within a 2.5-hour travel time, as well as the population size of these population centres. The continuous index that results from these computations ranges from 0 to 1, with values near 0 representing the least remoteness and closest proximity to population centres. This continuous index provides policymakers, researchers, and other stakeholders with an important tool for identifying communities within a specific range of remoteness.

Figure 1 displays the frequency distribution of the 2016 IR. The distribution is skewed to the right, implying that communities with high remoteness are relatively few compared with those that have low or intermediate remoteness. The largest numbers of CSDs fall in the two bins with ranges from 0.30 to 0.35 and from 0.35 to 0.40.

Examining Figure 1 could arguably lead to three groupings being visualized: a first grouping including CSDs at the lower end of the range with IR values less than 0.15; a second grouping containing CSDs with IR values falling in the [0.1500, 0.4500) interval; and a third grouping with IR scores equal to or greater than 0.45. Over three-fifths (61.6%; 3,158 out of 5,125) of CSDs have IR scores in [0.1500, 0.4500). Subsequent sections of this paper will describe the methodology and the results obtained, but this initial observation leads to a preliminary expectation that the binary IR cut-off point falls in this middle interval.

Figure 1 Frequency distribution of the IR of CSDs in Canada

Data table for Figure 1
IR value (range)    Number of CSDs
0-0.05 21
0.05-0.1 196
0.1-0.15 557
0.15-0.2 303
0.2-0.25 406
0.25-0.3 543
0.3-0.35 722
0.35-0.4 669
0.4-0.45 515
0.45-0.5 321
0.5-0.55 248
0.55-0.6 187
0.6-0.65 118
0.65-0.7 99
0.7-0.75 113
0.75-0.8 28
0.8-0.85 45
0.85-0.9 22
0.9-0.95 10
0.95-1 2
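
As a check on the figures quoted for the middle grouping, the share of CSDs with IR scores in [0.1500, 0.4500) can be recomputed from the bin counts in the data table for Figure 1. The following is a minimal Python sketch; the variable names are illustrative and are not part of the original study.

```python
# Bin counts transcribed from the data table for Figure 1
# (lower bin edge -> number of CSDs).
bin_counts = {
    0.00: 21, 0.05: 196, 0.10: 557, 0.15: 303, 0.20: 406,
    0.25: 543, 0.30: 722, 0.35: 669, 0.40: 515, 0.45: 321,
    0.50: 248, 0.55: 187, 0.60: 118, 0.65: 99, 0.70: 113,
    0.75: 28, 0.80: 45, 0.85: 22, 0.90: 10, 0.95: 2,
}

total_csds = sum(bin_counts.values())

# Bins whose lower edge lies in [0.15, 0.45) make up the middle grouping.
middle_csds = sum(n for edge, n in bin_counts.items() if 0.15 <= edge < 0.45)
middle_share = 100 * middle_csds / total_csds

print(total_csds, middle_csds, round(middle_share, 1))
```

The totals agree with the 5,125 CSDs covered by the 2016 IR and with the 3,158 CSDs (61.6%) reported for the middle interval.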

One way to define discrete levels of remoteness is to identify natural clusters within the continuous IR. However, identifying the natural clusters does not suffice to flag non-remote and remote CSDs. The IR was calculated based on the proximity of a community to POPCTRs and the population of these POPCTRs. In a more generalized sense, there are other factors on which more comprehensive conceptualizations of remoteness may be based. For example, the IR does not adequately capture the condition of a community that is far from a POPCTR but is surrounded by small communities that together contain a number of services. Therefore, related factors that bring in a broader conception of remoteness can usefully be considered for the task at hand. In this analysis, factors such as the population in the surrounding areas of a community were also explored to identify a cut-off that can be used to classify CSDs as remote or non-remote.

2.2 Geographical concepts

This study uses Statistics Canada’s 2016 Standard Geographical Classification (SGC). The 2016 SGC has 5,162 census subdivisions (CSDs, i.e., communities) and 1,005 population centres (POPCTRs). A CSD is an administrative region defined along municipal and reserve boundaries by the province or territory to which each municipality or reserve belongs. In this study, CSDs are also referred to as communities, since they are almost analogous to municipalities, which can be thought of as making up a type of community. The term community may be more easily understood by some stakeholders than CSD. POPCTRs, by contrast, are defined as areas with a population of at least 1,000 and a population density of 400 persons or more per square kilometre.

In this study, CSDs are classified as Indigenous or non-Indigenous. Here, Indigenous CSDs are defined in terms of the 2016 SGC as any of the six CSD types legally affiliated with First Nations or Indian bands: Indian reserve (IRI), Indian settlement (S-É), Indian government district (IGD), Terres réservées aux Cris (TC), Terres réservées aux Naskapis (TK), and Nisga'a land (NL). In addition, Indigenous Services Canada (ISC) provided a list of Inuit CSDs and a list of CSDs in Northwest Territories and Yukon which were affiliated with First Nations bands. CSDs included in these two lists were also classified as Indigenous in the current study.

Based on consultations with subject matter experts at ISC, there were only eight self-governing Métis settlements in Alberta with land-base agreements. These settlements were defined as Designated Places (DPLs), which are created by provinces and territories in cooperation with Statistics Canada. These DPLs were smaller than a CSD, and therefore the CSDs in which they are located could not be classified as Indigenous CSDs.

All other CSDs are classified as non-Indigenous. Note that the classification of a CSD as non-Indigenous does not imply that the entire population of that CSD was non-Indigenous. The population of a CSD classified here as non-Indigenous could comprise both non-Indigenous and Indigenous people.

Overall, 1,070 Indigenous CSDs (as defined above) with IR values were included in the analysis.

2.3 Data sources

This study relied mainly on the 2016 version of the continuous IR for Canadian communities, whose original version was published in 2017 (Alasia et al., 2017). Whereas the original version of the IR is based on the 2011 Census of Population and the 2011 SGC, the 2016 version is based on the 2016 Census of Population and the 2016 SGC.

The 2016 IR contains values for 5,125 CSDs, including for the 4,882 populated CSDs, as well as for 243 non-populated CSDs that were connected to the main road/ferry network.

In the current study, the population of CSDs and POPCTRs was obtained in nearly all cases from the 2016 Census of Population.

The travel-time matrices used in this study were a copy of those used for the 2016 IR version. The travel-time matrices contain the travel time from each CSD to POPCTRs within 300 kilometres straight-line distance and from each CSD to other CSDs within this distance.

3 Methodology  

In order to classify CSDs into two groups based on their IR scores, several steps were taken. An overview of these steps is presented in the following sub-section, while the details of their application are discussed in sub-section 3.2.

3.1 Overview

The initial step narrowed the range of the IR values over which a discrete cut-off for classifying communities into non-remote and remote classes could be found. This was done by finding natural breaks that occur in the distribution of the IR scores of all the CSDs using the k-means clustering method. This allowed for the identification of discrete groups within the continuous IR.

The k-means method generated clusters (or classes) in the IR score distribution by minimizing the “within class” variation and maximizing the “between classes” variation. Given the distribution observed in Figure 1, CSDs were grouped into three clusters based on their IR scores using a k-means algorithm with k = 3. The results for three clusters were combined with those for two clusters (i.e., k = 2) to narrow the search interval for the desired binary IR cut-off.
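
The clustering step can be illustrated with a small one-dimensional k-means routine. The implementation and settings used in the study are not described here, so the following is only a sketch of the standard Lloyd's algorithm under stated assumptions: the function names, the quantile-based initialisation, and the synthetic scores are all hypothetical, and the data are not the 2016 IR.

```python
import random
import statistics

def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on one-dimensional data; returns sorted cluster centres."""
    data = sorted(values)
    # Deterministic start (an assumption): centres at evenly spaced quantiles.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            # Assign each score to its nearest centre.
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        new_centres = [statistics.fmean(c) if c else centres[j]
                       for j, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres)

def natural_breaks(centres):
    """Midpoints between adjacent centres, where nearest-centre membership switches."""
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]

# Synthetic IR-like scores in [0, 1]; NOT the 2016 IR data.
rng = random.Random(0)
synthetic_ir = [min(max(rng.gauss(m, 0.05), 0.0), 1.0)
                for m in (0.15, 0.35, 0.65) for _ in range(200)]

breaks_k2 = natural_breaks(kmeans_1d(synthetic_ir, 2))
breaks_k3 = natural_breaks(kmeans_1d(synthetic_ir, 3))
print(breaks_k2, breaks_k3)
```

With k = 3 the two breaks bound an intermediate class, and the single break obtained with k = 2 can then be located relative to that class, which is the sense in which the two sets of results are combined.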

In a second step, in addition to looking at natural breaks in IR scores, other factors were explored in order to distinguish groups of communities with relatively similar IR scores. It can be expected that residents of a community that have access to a set of other communities within a reasonable proximity, i.e., in their “surroundings”, that together have a relatively large aggregate population will have more services available to them. By contrast, residents of a community with a smaller aggregate population in their surrounding areas can be expected to have access to fewer services. This presumption is both intuitive and corroborated by evidence in the literature supporting the assumption that population size can be used as a proxy for service availability (see Alasia et al., 2017 and Department of Health and Aged Care in Australia, 2001).

Therefore, the aggregate population in CSDs and the aggregate population in POPCTRs around a community within a 2.5-hour travel time were selected for use in this study along with the natural breaks in IR scores. The 2.5-hour duration was used in order to limit the travel time to a plausible range of commuting over a single day. These factors describe the situation in the surrounding areas of a given community with a specific IR score and provide a lens for making relevant differentiations between it and other communities with relatively similar IR scores. For example, a community that is far from a POPCTR but is surrounded by several small communities, with a particular aggregate population, would likely be considered less isolated. Therefore, in a more generalized sense, it would be less remote compared to one with fewer near neighbors or a smaller aggregate population, even though both might have fairly close IR values.

The first factor, the “aggregate population in surrounding CSDs” or AggPopSurrCSD, was calculated by finding the sum of the populations of all CSDs within a travel time of 2.5 hours and within a 300 km straight-line distance from a reference CSD. The population of the reference CSD itself was not included in the calculation of this indicator. The second factor, the “aggregate population in surrounding POPCTRs” or AggPopSurrPOPCTR, was calculated by finding the sum of the populations of all POPCTRs within 2.5 hours travel time and within 300 km geodesic distance from a reference CSD, including the population of the POPCTR located in the reference CSD itself.
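
The two factors defined above can be sketched as a filtered sum over the places surrounding a reference CSD. The record layout, the function name, the treatment of the 2.5-hour and 300 km limits as inclusive, and all values below are assumptions made for illustration; they are not drawn from the study's travel-time matrices.

```python
from typing import NamedTuple

class Neighbour(NamedTuple):
    """One surrounding place as seen from a reference CSD (hypothetical layout)."""
    name: str
    population: int
    travel_hours: float
    distance_km: float

def aggregate_population(neighbours, max_hours=2.5, max_km=300.0):
    """Sum the populations of places within both the travel-time and distance limits."""
    return sum(n.population for n in neighbours
               if n.travel_hours <= max_hours and n.distance_km <= max_km)

# Toy surrounding CSDs; the reference CSD itself is left out, as for AggPopSurrCSD.
surrounding_csds = [
    Neighbour("CSD A", 1_200, 0.8, 55.0),
    Neighbour("CSD B", 5_400, 2.4, 180.0),
    Neighbour("CSD C", 900, 2.7, 210.0),      # excluded: more than 2.5 hours away
    Neighbour("CSD D", 30_000, 2.4, 305.0),   # excluded: beyond 300 km
]

# Toy surrounding POPCTRs; a POPCTR inside the reference CSD is kept,
# as for AggPopSurrPOPCTR.
surrounding_popctrs = [
    Neighbour("POPCTR inside reference CSD", 2_500, 0.0, 0.0),
    Neighbour("POPCTR X", 45_000, 1.9, 140.0),
    Neighbour("POPCTR Y", 120_000, 3.1, 260.0),  # excluded: more than 2.5 hours away
]

agg_pop_surr_csd = aggregate_population(surrounding_csds)
agg_pop_surr_popctr = aggregate_population(surrounding_popctrs)
print(agg_pop_surr_csd, agg_pop_surr_popctr)
```

The only difference between the two factors in this sketch is which list of places is passed in and whether the reference CSD's own entry is part of that list.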

The AggPopSurrCSD and AggPopSurrPOPCTR were used to find a cut-off value by looking for patterns in their relationship with the IR within the intermediate interval, which was obtained based on natural breaks in the IR. This was done by exploring the measures of central tendency (mean and median) of these two factors and relating these with the distribution of IR scores. Communities with IR scores falling within the intermediate interval identified earlier were included in this analysis. The objective was to identify the IR score below which most or nearly all CSDs are surrounded by a greater aggregate population (AggPopSurrCSD) than the mean of AggPopSurrCSD for all CSDs. The same process was repeated using the aggregate population of surrounding POPCTRs, i.e., using AggPopSurrPOPCTR.

In the final step, after identifying some candidates for IR cut-off, a single IR cut-off was selected. This was done based primarily on a qualitative argument that prioritizes classifying borderline CSDs as remote. The decision was also supported by examining IR characteristics by mode of transportation.

The next section explains the application of these steps; a summary of the steps is given in Figure A-1 in the Appendix.

3.2 Application

Natural breaks in the IR scores were identified using the k-means method as the first step in categorizing CSDs into natural clusters. This allowed for the identification of discrete groups within the continuous IR. Applying this method, CSDs were grouped into two and three clusters (i.e., k = 2 and k = 3). The classification of CSDs into three clusters yielded an intermediate (or middle) range within which the search for the binary IR cut-off would proceed. Note that intervals are written in this study such that a square bracket (i.e., [ or ]) includes the end value and a round bracket (i.e., ( or )) excludes it.

Table 1 shows the number of CSDs by class obtained by running the k-means algorithm for k = 2 and k = 3.


Table 1
Number of CSDs by k=2 and k=3 cluster (CSD counts)

Two-class (rows)                    Three-class (columns)
                                    Non-remote       Intermediate-remote   Remote
                                    [0, 0.2717]      [0.2721, 0.5010]      [0.5014, 1]
Non-remote; IR range [0, 0.3791]    1,717            1,415                 ...
Remote; IR range [0.3793, 1]        ...              1,127                 866

Note: ... = not applicable

As seen in Table 1, CSDs with IR scores at either end of the IR distribution (1,717 CSDs with IRs in [0, 0.2717] and 866 CSDs with IRs in [0.5014, 1]) were classified respectively as being non-remote (close to 0) or remote (close to 1) regardless of whether they were sorted into two or three clusters (k = 2 or k = 3). However, the classification of CSDs with scores in the range of [0.2721, 0.5010], called “Intermediate-Remote”, would change depending on whether two or three clusters were created. This interval is defined as a transitional working category between non-remote and remote when two natural breaks are being assessed.
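
The cell counts in Table 1 can be checked for internal consistency with a few sums. This is a minimal sketch; the dictionary layout and variable names are illustrative only.

```python
# Cell counts transcribed from Table 1:
# (two-class label, three-class label) -> number of CSDs.
table1 = {
    ("non-remote", "non-remote"): 1717,
    ("non-remote", "intermediate-remote"): 1415,
    ("remote", "intermediate-remote"): 1127,
    ("remote", "remote"): 866,
}

total = sum(table1.values())

# Size of the intermediate-remote class across both two-class labels.
intermediate = sum(n for (_, three), n in table1.items()
                   if three == "intermediate-remote")

# Two-class margins.
two_class_non_remote = sum(n for (two, _), n in table1.items() if two == "non-remote")
two_class_remote = sum(n for (two, _), n in table1.items() if two == "remote")

print(total, intermediate, two_class_non_remote, two_class_remote)
```

The four cells sum to the 5,125 CSDs with IR values, and the intermediate-remote class contains 2,542 CSDs, of which 1,127 lie above the k = 2 break.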

Given the overlap of classes based on two and three clusters, the IR score of 0.3791 (i.e., the natural break/cut-off in IRs obtained from the k-means algorithm for k = 2) divides the intermediate-remote class into two groups: (i) CSDs with IR scores within the [0.2721, 0.3791] interval and (ii) CSDs with IR scores within the (0.3791, 0.5010] interval. To explore this intermediate-remote class of CSDs further, Table 2 displays the descriptive statistics of AggPopSurrCSD and AggPopSurrPOPCTR for populated CSDs with IR scores between 0.2721 and 0.5010, split into the two IR ranges defined by the single natural break from the k-means algorithm with k = 2.

Table 2 shows that CSDs with IR in the range of [0.2721, 0.3791] were in areas that had a minimum aggregate population of surrounding population centres (AggPopSurrPOPCTR) of 30,707. This is close to the lower population threshold of a medium POPCTR, which is 30,000. Looking at the median value of AggPopSurrCSD for the CSDs with IRs in [0.2721, 0.3791] in Table 2, it is seen that more than half of these communities also have access to an aggregate surrounding population in the size range of a large POPCTR (i.e., 100,000 or more). Because these communities have a level of access in their surrounding areas equivalent to that of POPCTRs with relatively large populations, the binary cut-off point should be considered to fall outside this interval (i.e., outside [0.2721, 0.3791]). Furthermore, it should be located within an interval that represents greater remoteness, namely the [0.3793, 0.5010] interval. This latter range has 1,127 CSDs, with the majority of them (94%, or 1,058 CSDs) being populated CSDs connected to the main road/ferry networks (Table A-1 in Appendix).
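
The comparison with POPCTR size thresholds made in this paragraph can be written as a small lookup. The sketch below uses the thresholds quoted in the text (1,000 as the POPCTR minimum, 30,000 for a medium POPCTR and 100,000 for a large one); the function name and the "small" label for the 1,000 to 29,999 range are assumptions, and the inputs are values transcribed from Table 2.

```python
def popctr_size_class(population):
    """Label an aggregate population by the POPCTR size class it would fall in."""
    if population >= 100_000:
        return "large"
    if population >= 30_000:
        return "medium"
    if population >= 1_000:
        return "small"
    return "below POPCTR minimum"

# Selected values transcribed from Table 2.
table2_values = {
    "min AggPopSurrPOPCTR, IR in [0.2721, 0.3791]": 30_707,
    "median AggPopSurrCSD, IR in [0.2721, 0.3791]": 408_938,
    "min AggPopSurrPOPCTR, IR in [0.3793, 0.5010]": 0,
}

size_classes = {label: popctr_size_class(value)
                for label, value in table2_values.items()}
for label, size_class in size_classes.items():
    print(f"{label}: {size_class}")
```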


Table 2
Descriptive statistics of AggPopSurrCSD and AggPopSurrPOPCTR for populated CSDs within intermediate-remote class by IR range

Aggregate population
Factor             IR range            Min      Max         Mean      Standard deviation   Median
AggPopSurrCSD      [0.2721, 0.3791]    1,477    1,830,417   460,103   229,372              408,938
AggPopSurrCSD      [0.3793, 0.5010]    0        1,100,780   138,352   81,584               129,413
AggPopSurrPOPCTR   [0.2721, 0.3791]    30,707   1,165,823   325,483   170,437              295,917
AggPopSurrPOPCTR   [0.3793, 0.5010]    0        218,666     72,524    39,124               68,820

The next step narrows the search interval for the IR cut-off further by examining the characteristics of CSDs with IR scores falling inside the [0.3793, 0.5010] interval. For this purpose, the following factors, which represent the aggregate population of surrounding areas, are used: (i) AggPopSurrCSD (the aggregate population of surrounding CSDs); and (ii) AggPopSurrPOPCTR (the aggregate population of surrounding POPCTRs).

Table 3 displays the descriptive statistics for AggPopSurrCSD and AggPopSurrPOPCTR for populated and connected CSDs with IR scores between 0.3793 and 0.5010, the narrowed IR range identified earlier.


Table 3
Descriptive statistics of AggPopSurrCSD and AggPopSurrPOPCTR for populated and connected CSDs; IR in [0.3793, 0.5010]

Factor             Min   Max         Mean      Standard deviation   Median    95th percentile
AggPopSurrCSD      325   1,100,780   139,044   81,173               130,036   270,281
AggPopSurrPOPCTR   0     218,666     72,606    39,078               68,820    136,154

With the mean values of AggPopSurrCSD and AggPopSurrPOPCTR noted from Table 3, the distribution of IR scores versus these factors is examined to identify any informative patterns in IR scores relative to the mean values of these factors.

Figure 2 displays the scatter plot of the IR scores (y-axis) versus AggPopSurrCSD. As expected, there is a negative correlation between the IR and AggPopSurrCSD. The vertical line in this figure corresponds to the mean of AggPopSurrCSD (see Table 3). This line divides the CSDs into two groups based on whether or not their AggPopSurrCSD is above the average for all populated and connected CSDs.

Next, moving up the IR axis in Figure 2, it is examined whether there is an IR score below which most or nearly all CSDs have AggPopSurrCSD that is greater than the mean for this factor for all CSDs. This IR value can be a potential cut-off point as it separates CSDs with relatively greater aggregate surrounding populations from those with smaller aggregate populations in surrounding areas.

As seen in Figure 2, nearly all CSDs that have an above-average AggPopSurrCSD (i.e., those to the right of the vertical line) have IR scores of less than 0.45; the exceptions are 6 CSDs at the top right of the figure, above the IR = 0.45 horizontal line. Also, dividing the interval [0.3793, 0.5010] into smaller intervals with a step size of 0.0050, the proportion of CSDs with AggPopSurrCSD greater than 139,000 was calculated for CSDs with IR scores within each small interval (e.g., [0.4400, 0.4450)); see Figure A-4 in the Appendix. As may be seen in Figure A-4, the proportion of CSDs with AggPopSurrCSD greater than 139,000 is less than 10.0% (i.e., from no CSDs to two CSDs) in every small interval for IR values greater than or equal to 0.45.
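
The interval-by-interval proportions described in this paragraph can be sketched as follows. The function name and the toy records are hypothetical, and aligning the small intervals to multiples of 0.0050 is an assumption inferred from the example interval [0.4400, 0.4450) given in the text.

```python
from collections import defaultdict

def share_above_by_interval(records, threshold, step_ten_thousandths=50):
    """Share of records above `threshold`, by left-closed IR interval of width 0.0050.

    records: iterable of (ir, aggregate_population) pairs.
    Returns {interval lower edge: proportion above threshold}.
    """
    counts = defaultdict(lambda: [0, 0])  # lower edge -> [n above threshold, n total]
    for ir, agg_pop in records:
        # IR scores carry four decimals, so bin on integer ten-thousandths to
        # avoid floating-point edge effects at interval boundaries.
        bin_index = round(ir * 10_000) // step_ten_thousandths
        lower_edge = bin_index * step_ten_thousandths / 10_000
        counts[lower_edge][1] += 1
        if agg_pop > threshold:
            counts[lower_edge][0] += 1
    return {edge: above / total for edge, (above, total) in sorted(counts.items())}

# Hypothetical (IR, AggPopSurrCSD) pairs; not taken from the study data.
toy_records = [
    (0.4400, 150_000), (0.4420, 120_000), (0.4449, 90_000),
    (0.4450, 80_000), (0.4460, 70_000), (0.4475, 145_000), (0.4499, 60_000),
]

shares = share_above_by_interval(toy_records, threshold=139_000)
for edge, share in shares.items():
    print(f"[{edge:.4f}, {edge + 0.005:.4f}): {share:.1%}")
```

In the toy data, one of three records in [0.4400, 0.4450) and one of four in [0.4450, 0.4500) exceed the 139,000 threshold.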

Note that the mean and median IR scores of CSDs falling below the mean of AggPopSurrCSD (139,000) (i.e., to the left of the vertical line) are equal to 0.45 (rounded to two decimal places). This result suggests that 0.45 can be a potential upper limit for the IR cut-off, since CSDs with IR scores above this value have access to areas with smaller aggregate populations.

Similarly, examining the distribution of IR scores versus AggPopSurrPOPCTR, 98.4% of CSDs with an above-average aggregate population in surrounding POPCTRs (see Table 3) have IR scores less than 0.45. This result also supports 0.45 as a potential limit value for a cut-off point, as it separates CSDs with a relatively large aggregate population in their surrounding population centres from CSDs with smaller values for this factor.

Figure 2 Scatter plot of IR vs. the aggregate population of surrounding CSDs (i.e., AggPopSurrCSD); IR in [0.3793, 0.5010]

Description for Figure 2

Figure 2 displays the scatter plot of the Index of Remoteness (IR) values (y-axis) versus the aggregate population of surrounding CSDs (i.e., AggPopSurrCSD; x-axis). The y-axis ranges from 0.380 to 0.500 while the x-axis ranges from 0 to 350,000. Each point on the plot represents a populated census subdivision (CSD) which is connected to the main road/ferry network with an IR value that falls within [0.3793, 0.5010]. Almost all of the points are scattered on the left side of the plot under the plot diagonal connecting the top-left corner to the bottom-right corner. A vertical line and a horizontal line divide the plot area into four areas. The vertical line equation is x = 139,000, which represents the mean of the aggregate population of surrounding CSDs within 2.5 hours of each reference CSD. This line divides the CSDs into two groups based on whether or not their AggPopSurrCSD is above the average for all populated and connected CSDs. The horizontal line equation is IR = 0.45. There is a rectangle at the top right of the figure, above the horizontal line and to the right of the vertical line, close to the intersection of the two lines. This rectangle identifies the 6 CSDs that have both an above-average AggPopSurrCSD and IR scores greater than 0.45.

This analysis further narrows the search interval of a possible IR cut-off for a binary classification, from [0.3793, 0.5010] to [0.3793, 0.4500). The value of 0.4500 is excluded from the search interval when proceeding to the next step, because the CSDs that have this IR score are more remote compared to those with smaller IR scores. There is only one CSD that happens to have an IR score of exactly 0.4500 while having an AggPopSurrPOPCTR of less than 139,000.

The next step consisted of examining IR characteristics of CSDs within this IR range by partitioning them around the mean values of the two population factors. CSDs whose factor values in each case are above the respective means of the factor values have a relatively large aggregate population in their surrounding areas compared to CSDs whose factor values fall below the mean. This provides a suitable way for distinguishing less remote from more remote communities. Communities with relatively small aggregate surrounding populations, on average, have greater IR scores and are likely to be more remote in the generalized sense of remoteness compared with those with greater aggregate populations in their environs, and vice versa.

Table 4 shows that these mean values of AggPopSurrCSD and AggPopSurrPOPCTR are 164,000 and 87,000, respectively, and it also depicts the descriptive statistics of the CSD groups for which the values of these two factors are greater than their respective mean values (within the IR range of interest, i.e., [0.3793, 0.4500)).


Table 4
Descriptive statistics for IR for groups of CSDs that have factor values for AggPopSurrCSD and AggPopSurrPOPCTR greater than the respective means of each factor; populated and connected CSDs; IR in [0.3793, 0.4500)

IR for group of CSDs having factor values > factor mean
Factor             Factor mean   CSD counts   Min      Max      Mean     Standard deviation   Median
AggPopSurrCSD      164,000       360          0.3793   0.4489   0.4015   0.0164               0.3974
AggPopSurrPOPCTR   87,000        357          0.3793   0.4448   0.4018   0.0160               0.3986

Table 4 shows that the mean IR for the group created using either factor is approximately 0.40 (rounding to two decimal places). The median IR for these groups is also 0.40, implying that the middle point or the pivot point of the IR is 0.4. This analysis introduces 0.40 as another potential cut-off point/lower limit of the search range.
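
The partitioning behind Table 4 can be sketched as a function that keeps the CSDs whose factor value exceeds the factor mean and summarises their IR scores. The function name and the toy records are hypothetical; only the four values in the final rounding check are transcribed from Table 4.

```python
import statistics

def ir_summary_above_factor_mean(records):
    """Summarise IR for records whose factor value exceeds the factor mean.

    records: list of (ir, factor_value) pairs with non-constant factor values,
    so that at least one record lies above the mean.
    """
    factor_mean = statistics.fmean(f for _, f in records)
    above = [ir for ir, f in records if f > factor_mean]
    return {
        "factor_mean": factor_mean,
        "count": len(above),
        "ir_mean": statistics.fmean(above),
        "ir_median": statistics.median(above),
    }

# Hypothetical (IR, aggregate surrounding population) pairs; not study data.
toy_records = [
    (0.38, 300_000), (0.39, 250_000), (0.40, 200_000), (0.41, 180_000),
    (0.42, 100_000), (0.43, 60_000), (0.44, 30_000),
]
summary = ir_summary_above_factor_mean(toy_records)
print(summary)

# Mean and median IR values transcribed from Table 4, shown to two decimals.
table4_ir = {"AggPopSurrCSD": (0.4015, 0.3974), "AggPopSurrPOPCTR": (0.4018, 0.3986)}
rounded = {factor: tuple(f"{x:.2f}" for x in pair) for factor, pair in table4_ir.items()}
print(rounded)
```

All four Table 4 values display as 0.40 at two decimal places, which is the rounding relied on in the paragraph above.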

Examining the reduced interval defined by 0.40 as its upper point, i.e., [0.3793, 0.4000), it is first noted that there are 277 CSDs with an IR score within the interval which have mean and median IR scores of 0.3900. These CSDs have a minimum value for AggPopSurrPOPCTR of 18,884, and nearly all of them (98%) have AggPopSurrCSD and AggPopSurrPOPCTR values over 30,000 (close to the lower population threshold of a medium POPCTR). Looking at a closer level of proximity, over 80% of these communities also have access to an aggregate population of 30,000 or more within 1.5 hours travel time.

Based on this analysis, remote CSDs, unlike non-remote CSDs, would be expected to have access to an aggregate population of less than 30,000 in their surroundings within 2.5 hours of travel time. This analysis provides support for a cut-off value of 0.40 as the lower limit of the IR search range.
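The check described above can be sketched in a few lines of Python. This is only an illustration: the record fields `ir`, `agg_pop_csd` and `agg_pop_popctr`, the helper `share_above_threshold`, and the sample values are hypothetical and are not the study's data or code.

```python
# Hypothetical records: field names and values are illustrative only.
csds = [
    {"ir": 0.3801, "agg_pop_csd": 210_000, "agg_pop_popctr": 95_000},
    {"ir": 0.3950, "agg_pop_csd": 48_000, "agg_pop_popctr": 31_500},
    {"ir": 0.3990, "agg_pop_csd": 26_000, "agg_pop_popctr": 18_884},
    {"ir": 0.4200, "agg_pop_csd": 12_000, "agg_pop_popctr": 9_000},
]

# The text describes 30,000 as close to the lower population
# threshold of a medium POPCTR.
POP_THRESHOLD = 30_000


def share_above_threshold(records, lower, upper, threshold=POP_THRESHOLD):
    """Share of records with lower <= IR < upper whose two aggregate
    surrounding populations both exceed the threshold."""
    in_range = [r for r in records if lower <= r["ir"] < upper]
    if not in_range:
        return 0.0
    hits = [
        r for r in in_range
        if r["agg_pop_csd"] > threshold and r["agg_pop_popctr"] > threshold
    ]
    return len(hits) / len(in_range)


share = share_above_threshold(csds, 0.3793, 0.4000)
print(f"{share:.1%}")
```

With the actual CSD-level data, the same calculation would be expected to reproduce the 98% figure reported above.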

The cut-off point was selected such that, to the greatest extent possible, it would classify borderline CSDs as remote. To achieve this, the lower limit of the [0.4000, 0.4500] range, i.e., 0.4000, was selected as the preferred cut-off point. In specific field applications, practitioners might consider tailoring the approach used here to their requirements. For example, they could examine the CSDs with IR scores close to the 0.40 cut-off more comprehensively, focusing on an aspect that is important for their application.

One of the strengths of the methodology used in this study lies in exploring the natural breaks on the IR continuum to yield clusters. These clusters represent conceptual groupings of communities that are similar in their degree of remoteness, as expressed by their IR scores. However, the methodology still depends on the use of aggregate populations in surrounding areas, which may not capture the full picture for individual communities. Another limitation lies in the uncertainty about the extent to which a distributed population can serve as a proxy for service availability. Although much research has been done on population size as a measure of what services are available, it is less certain that a collection of smaller communities would offer the same services as would be found in an urban hub of a similar total population size.

4 Results

The results of applying the proposed IR binary cut-off of 0.40 to all CSDs in Canada are discussed in this section. While discussing these results, it should be recalled that communities classified into the two groups (non-remote and remote) using the binary cut-off still have different levels of remoteness on the IR continuum compared with others in the same group.
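As a minimal sketch of the binary rule, the following Python function assigns a class from an IR score, following the definition used in this report (remote when the IR is greater than or equal to 0.40). The function name `remoteness_class` and the sample scores are illustrative assumptions.

```python
IR_CUTOFF = 0.40  # binary cut-off proposed in this study


def remoteness_class(ir, cutoff=IR_CUTOFF):
    """Return 'remote' when IR >= cutoff, otherwise 'non-remote'."""
    if not 0.0 <= ir <= 1.0:
        raise ValueError("IR is defined on [0, 1]")
    return "remote" if ir >= cutoff else "non-remote"


# Illustrative IR scores (not actual CSD values)
scores = [0.0, 0.2532, 0.3999, 0.40, 0.4671, 1.0]
classes = [remoteness_class(s) for s in scores]
print(classes)
```

Keeping the cut-off as a parameter makes it straightforward for practitioners to test an alternative threshold suited to their own application.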

4.1 Geographic Distribution of CSDs by Non-remote and Remote Class

Figure 3 shows the geographic distribution of non-remote and remote CSDs based on the 0.40 cut-off. In general, vast regions of the northern parts of several provinces are predominantly made up of remote CSDs. Most parts of the territories are also made up of remote CSDs. However, the distribution is more mixed in most of the Atlantic provinces except for Newfoundland and Labrador, as well as in Alberta and parts of British Columbia.

Figure 3 Geographic distribution of non-remote and remote CSDs, IR cut-off point 0.40

Description for Figure 3

This map shows the remoteness class of census subdivisions (CSDs) based on the 0.40 cut-off value. The CSD, provincial, and territorial boundaries are marked as black lines. This map has three components: a large map showing non-remote and remote CSDs across Canada and two inset maps zoomed into south British Columbia (part A) and areas in south-east Ontario, south Quebec, and parts of New Brunswick close to Ontario and Quebec (part B).

CSDs are categorized into one of two classes: non-remote (i.e., an IR value less than 0.40) and remote (i.e., an IR value greater than or equal to 0.40), colored in orange and blue respectively. CSDs with no reported population in 2016 and not connected to the main road/ferry network are also displayed in this map and colored in white.

The map shows that almost all territories are comprised of either remote CSDs (colored in blue) or CSDs with no population which are not connected to the main road/ferry network (colored in white). Looking at the provinces, large parts of almost all provinces are remote (colored in blue) except for some areas found mostly in the south of the provinces.

Reference period: 2016

4.2 Non-remote and Remote CSDs by Indigenous and Non-Indigenous Communities

Figure 4 displays the distribution of populated CSDs by remoteness class based on the 0.40 IR cut-off, including breakdowns by Indigenous and non-Indigenous communities.

Among all populated CSDs, about one-third (31.7%) of them were remote based on this IR cut-off. Focusing on Indigenous CSDs, almost three-fifths (60.2%) of these CSDs were classified as remote. This proportion dropped to one-fourth (25.3%) for non-Indigenous communities.

Figure 4 Distribution of Indigenous and non-Indigenous populated CSDs by remoteness class

Description for Figure 4 
Data table for Figure 4
Table summary
This table displays the results of Data table for Figure 4 Remoteness class - IR cut-off: 0.40, Non-remote and Remote, calculated using percent units of measure (appearing as column headers).
Remoteness class - IR cut-off: 0.40
Non-remote Remote
percent
All CSDs 68.3 31.7
Non-Indigenous CSDs 74.7 25.3
Indigenous CSDs 39.8 60.2

In terms of population distribution, Figure 5 shows that almost three-fifths (60.3%) of the residents of Indigenous CSDs lived in remote communities, compared to only about 3.3% of the population of non-Indigenous CSDs. Overall, about one in twenty-five (4.1%) of all Canadians lived in remote areas (based on the 2016 Census of Population).

Figure 5 Population distribution of Indigenous and non-Indigenous CSDs by remoteness class

Description for Figure 5 
Data table for Figure 5
Table summary
This table displays the results of Data table for Figure 5 Remoteness class - IR cut-off: 0.40, Non-remote and Remote, calculated using percent units of measure (appearing as column headers).
Remoteness class - IR cut-off: 0.40
Non-remote Remote
percent
All CSDs 95.9 4.1
Indigenous CSDs 39.7 60.3
Non-Indigenous CSDs 96.7 3.3

4.3 Non-remote and Remote CSDs by Mode of Transportation

Table 5 presents the IR descriptive statistics for all populated CSDs by mode of transportation. All 47 CSDs that were connected to other CSDs only by air have IR scores greater than 0.40 and were therefore considered remote based on the 0.40 IR cut-off. However, CSDs that were connected to other communities via the main road/ferry network, or by a combination of air, train, winter road, charter boat and/or seasonal ferry, could be classified as either non-remote or remote, since the minimum IR score of these CSDs is less than 0.40. The next two sub-sections look at these CSDs in more detail.


Table 5
IR descriptive statistics by mode of transportation; populated CSDs
Table summary
This table displays the results of IR descriptive statistics by mode of transportation; populated CSDs. The information is grouped by Mode of transportation (appearing as row headers), CSD and IR, calculated using Counts, Percent, Min, Max, Mean, Standard deviation and Median units of measure (appearing as column headers).
Columns: Mode of transportation; CSD count; CSD percent; IR minimum; IR maximum; IR mean; IR standard deviation; IR median
Air 47 1.0 0.4671 1 0.7862 0.1472 0.8523
Combination of air, train, winter road, charter boat, and/or seasonal ferry 91 1.9 0.2532 0.9173 0.7302 0.1392 0.7811
Main road/ferry network 4,744 97.2 0 0.8571 0.3301 0.1533 0.3297
Total 4,882 100.0 0 1 0.3419 0.1680 0.3344
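The totals in Table 5 can be cross-checked from the rows for each mode of transportation. The short Python sketch below recomputes the CSD shares and the count-weighted overall mean IR from the reported counts and group means. Because the group means are rounded to four decimal places, the recomputed overall mean is approximate. The label "Combination" is shortened here for readability.

```python
# (mode, CSD count, mean IR) as reported in Table 5
rows = [
    ("Air", 47, 0.7862),
    ("Combination", 91, 0.7302),
    ("Main road/ferry network", 4744, 0.3301),
]

total = sum(count for _, count, _ in rows)
shares = {mode: round(100 * count / total, 1) for mode, count, _ in rows}

# Count-weighted mean of the group means gives the overall mean IR
overall_mean = sum(count * mean for _, count, mean in rows) / total

print(total, shares, round(overall_mean, 4))
```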

4.4 CSD Characteristics by Remoteness Class for Populated CSDs Connected to the Main Road/Ferry Network

Table 6 displays the descriptive statistics of population and IR scores of populated CSDs which were connected to the main road/ferry network, by remoteness class based on the 0.40 IR cut-off. It shows that less than one-third (29.7%) of populated and connected CSDs were flagged as remote communities, and that about one in twenty-five (3.8%) of Canadians lived in these remote communities.


Table 6 Descriptive statistics of CSDs by remoteness class (IR cut-off: 0.40); populated and connected CSDs
Table summary
This table displays the results of Table 6 Descriptive statistics of CSDs by remoteness class (IR cut-off: 0.40); populated and connected CSDs. The information is grouped by Remoteness Class (appearing as row headers), CSD, Population and IR, calculated using Counts, Percent, Sum, Mean, Standard deviation and Median units of measure (appearing as column headers).
Columns: Remoteness class; CSD count; CSD percent; Population sum; Population percent; IR mean; IR standard deviation; IR median
Non-remote 3,334 70.3 33,745,749 96.2 0.2528 0.0978 0.2695
Remote 1,410 29.7 1,323,913 3.8 0.5127 0.0947 0.4847
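The percentages in Table 6 follow directly from the counts and population sums. As a brief worked check in Python (the dictionary layout is an illustrative choice):

```python
# CSD counts and population sums as reported in Table 6
table6 = {
    "non-remote": {"csds": 3334, "population": 33_745_749},
    "remote": {"csds": 1410, "population": 1_323_913},
}

total_csds = sum(row["csds"] for row in table6.values())
total_pop = sum(row["population"] for row in table6.values())

remote_csd_share = 100 * table6["remote"]["csds"] / total_csds
remote_pop_share = 100 * table6["remote"]["population"] / total_pop

print(f"{remote_csd_share:.1f}% of CSDs, {remote_pop_share:.1f}% of population")
```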

4.5 CSD Characteristics by Remoteness Class for CSDs Connected to Other CSDs via a Combination of Modes of Transportation

Table 7 shows descriptive statistics of population and IR scores of CSDs which are connected to other communities via some combination of air, train, winter road, charter boat and/or seasonal ferry (i.e., CSDs that were not connected to the main road/ferry network), by remoteness class based on the 0.40 IR cut-off. Among all such CSDs, only two were non-remote, with one located in Quebec and the other in Ontario. Overall, the vast majority (97.8%; 89 out of 91) of these non-connected CSDs were flagged as remote.


Table 7
IR descriptive statistics by remoteness class (IR cut-off: 0.40); populated and non-connected CSDs
Table summary
This table displays the results of IR descriptive statistics by remoteness class (IR cut-off: 0.40); populated and non-connected CSDs. The information is grouped by Remoteness Class (appearing as row headers), CSD, Population and IR, calculated using Count, Sum, Min, Max, Mean, Standard deviation and Median units of measure (appearing as column headers).
Columns: Remoteness class; CSD count; Population sum; IR minimum; IR maximum; IR mean; IR standard deviation; IR median
Non-remote 2 379 0.2532 0.3461 0.2997 0.0657 0.2997
Remote 89 59,790 0.4084 0.9173 0.7398 0.1243 0.7850

Given the number of remote CSDs connected to other communities via some combination of modes of transportation (i.e., 89; see Table 7) and the 47 remote CSDs connected via air only (see Table 5), nearly all (136 out of 138; 98.6%) communities which were not connected to other CSDs via the main road/ferry network had an IR score greater than 0.40. These results support the selection of 0.40 as a binary cut-off point: communities which are not connected to the main road/ferry network would naturally be expected to be more geographically isolated or remote, and this cut-off places nearly all of them in the remote class.
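The arithmetic behind the 136 out of 138 figure can be laid out explicitly. The counts below come from Table 5 and Table 7, and the variable names are illustrative.

```python
remote_combination = 89        # remote CSDs, combination of modes (Table 7)
remote_air = 47                # remote CSDs, air only (Table 5)
not_connected_total = 91 + 47  # populated CSDs off the main road/ferry network

remote_not_connected = remote_combination + remote_air
share = 100 * remote_not_connected / not_connected_total
print(remote_not_connected, not_connected_total, f"{share:.1f}%")
```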

5 Conclusion

This study describes the methodology and results of proposing a single cut-off point that can be applied to a continuous Index of Remoteness (IR), previously developed at Statistics Canada, to classify all CSDs in Canada into two classes of remoteness. Of these two classes, one would be non-remote (or less-remote, an IR value of less than 0.40) and the other, remote (or more-remote, an IR value of greater than or equal to 0.40). This was done by first identifying natural clusters on the IR continuum and using the resulting classes to select a single cut-off value using supplementary factors that relate to a broader concept of remoteness.

The classification of CSDs into three natural clusters yielded an intermediate range of IR values which resulted in narrowing the search range into a range of IR values of [0.3793, 0.5010].

The aggregate surrounding population of a CSD (i.e., the population within a certain proximity) was used to describe the remoteness of an area in a more generalized sense than would be captured by the IR. This surrounding population would include both the surrounding CSDs and the surrounding POPCTRs. Therefore, in addition to the natural clusters in the IR distribution, the aggregate population of CSDs and the aggregate population of POPCTRs within 2.5 hours of travel time of a reference CSD were used to identify the binary cut-off point.

The narrowed range of [0.3793, 0.5010] that was obtained based on the natural breaks in IR scores was explored using the two factors relating to the aggregate surrounding population to narrow the range further in a progressive manner. This process yielded the range [0.4000, 0.4500]. Applying a criterion whereby priority is given to putting borderline CSDs (i.e., IR within [0.4000, 0.4500]) into the remote class, the lower end of this range, 0.40, was selected as the binary IR cut-off.

Overall, about one-third (31.7%) of populated CSDs were flagged as remote, and only one in twenty-five (4.1%) of Canadians lived in these remote communities (based on the 2016 Census of Population). Based on the 0.40 IR cut-off, almost three-fifths (60.2%) of all Indigenous CSDs were classified as being remote while this proportion drops to one-fourth (25.3%) for non-Indigenous communities.

Nearly all (136 out of 138, or 98.6%) communities which were not connected to other CSDs via the main road/ferry network had an IR score greater than 0.40. This observation supports the selection of 0.40 as the desired cut-off, since the chance of geographic isolation of communities (or being remote) which are not connected to the main road/ferry network would naturally be expected to be higher compared to connected ones.

While the methodology developed is generic and is not tied to addressing one problem or application, it is acknowledged that different applications might require their own specialized groupings by remoteness.

6 References

Alessandro Alasia, Frédéric Bédard, Julie Bélanger, Eric Guimond and Christopher Penney (2017). Measuring remoteness and accessibility: A set of indices for Canadian communities. Statistics Canada, Centre for Special Business Projects.

Department of Health and Aged Care (2001). Measuring Remoteness: Accessibility/Remoteness Index of Australia (ARIA). Occasional Papers: New Series No. 14. Australian Government.

Hastie, T., Tibshirani, R., & Friedman, J. (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer.

Rajendra Subedi, Shirin Roshanafshar and T. Lawson Greenberg (2020). Developing Meaningful Categories for Distinguishing Levels of Remoteness in Canada. Statistics Canada, Centre for Population Health Data (CPHD). Analytical Studies: Methods and References No. 026.

7 Appendix


Table A-1
Number of CSDs by mode of transportation, by two- and three-class natural clusters on the IR continuum
Table summary
This table displays the results of Number of CSDs by mode of transportation. The information is grouped by Two-class (appearing as row headers), Three-class, Non-remote; IR range [0, 0.2717], Intermediate-remote; IR range [0.2721, 0.5010] and Remote; IR range [0.5014, 1], calculated using CSD counts, Air, Combination of air, train, winter road, charter boat and/or seasonal ferry, Main road/ferry network, No population and All CSDs units of measure (appearing as column headers).
Columns (three-class clusters): Non-remote, IR range [0, 0.2717]; Intermediate-remote, IR range [0.2721, 0.5010]; Remote, IR range [0.5014, 1]
Rows (two-class clusters, within each mode of transportation): Non-remote, IR range [0, 0.3791]; Remote, IR range [0.3793, 1]
Cell values are CSD counts; "..." means not applicable.
Air
Non-remote [0, 0.3791] ... ... ...
Remote [0.3793, 1] ... 2 45
Combination of air, train, winter road, charter boat and/or seasonal ferry
Non-remote [0, 0.3791] 1 1 ...
Remote [0.3793, 1] ... 6 83
Main road/ferry network
Non-remote [0, 0.3791] 1,693 1,364 ...
Remote [0.3793, 1] ... 1,058 629
No population
Non-remote [0, 0.3791] 23 50 ...
Remote [0.3793, 1] ... 61 109
All CSDs
Non-remote [0, 0.3791] 1,717 1,415 ...
Remote [0.3793, 1] ... 1,127 866

The progressively narrowed search intervals within which the IR cut-off was sought are displayed in Figure A-1. The following provides an overview of the steps for determining these narrowed intervals:

Step 1-a: Applying the k-means algorithm for k = 3 to find natural breaks in IR scores and to identify the transitional interval.

Step 1-b: Applying the k-means algorithm for k = 2 to reduce the search interval based on the overlap between the intervals obtained in this step and in Step 1-a. The search interval is narrowed to [0.3793, 0.5010].

Step 2: Given the information obtained in Step 1-a and Step 1-b, the search interval in this step is [0.3793, 0.5010]. Focusing on the populated CSDs connected to the main road/ferry network with IR scores in this interval, this step explores IR scores versus AggPopSurrCSD and AggPopSurrPOPCTR to identify a pattern that narrows the search interval. It results in reducing the search interval from [0.3793, 0.5010] to [0.3793, 0.4500). The value 0.45 is also identified as a potential cut-off.

Step 3: Focusing on populated and connected CSDs with IR scores in [0.3793, 0.4500) and with AggPopSurrCSD and AggPopSurrPOPCTR greater than 164,000 and 87,000, respectively, the search interval is narrowed to [0.3793, 0.4000). The reduced search interval is obtained based on the mean and median of the IR scores of these CSDs. The value 0.40 is also identified as another potential cut-off.

Step 4: The smallest interval which is likely to contain the IR cut-off is determined to be [0.4000, 0.4500], and the IR cut-off is selected to be 0.40.

Note that there is no CSD with an IR score within the (0.2717, 0.2721), (0.3791, 0.3793) and (0.5010, 0.5014) intervals.
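Steps 1-a and 1-b rely on k-means clustering of one-dimensional IR scores. The following is a minimal pure-Python sketch of Lloyd's algorithm on one-dimensional data, shown only to illustrate how natural breaks arise. It is not the implementation used in the study, whose software and initialization are not described here. The function name `kmeans_1d`, the evenly spaced starting centres, and the sample scores are all assumptions.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on one-dimensional data.

    Centres start at evenly spaced positions in the sorted data
    (an assumption made here for determinism)."""
    data = sorted(values)
    n = len(data)
    centres = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each score to its nearest centre
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)
        # Update step: move each centre to the mean of its cluster
        updated = [
            sum(c) / len(c) if c else centres[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centres:
            break
        centres = updated
    return clusters


# Illustrative IR scores only (not the study's data)
ir_scores = [0.05, 0.10, 0.12, 0.20, 0.33, 0.36, 0.41, 0.47, 0.70, 0.85, 0.92]
clusters = kmeans_1d(ir_scores, k=3)
breaks = [(min(c), max(c)) for c in clusters if c]
print(breaks)
```

Each tuple gives the lowest and highest score in a cluster, which is how ranges such as [0, 0.2717] and [0.2721, 0.5010] can be read off once the clusters are formed.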

Figure A-1 Summary at each step of narrowing the search interval to identify the IR cut-off

Description for Figure A-1

Figure A-1 displays the progressively narrowed search intervals that were used within each step of the study to explore the ideal IR cut-off value. The y-axis represents the search interval at each step of the current study, ranging from 0 to 1. The x-axis represents the steps taken to search for the IR cut-off and is composed of six categorical values: Step0, Step1-a, Step1-b, Step2, Step3, and Step4. The search interval at each step is shown by a line with start and end points. The following provides an overview of what is visible for each of the steps:

  • At Step0, a vertical line from 0 to 1 displays the search interval at this step.
  • At Step1-a, three vertical lines display: a line from 0 to 0.2717, a line from 0.2721 to 0.5010, and a line from 0.5014 to 1.
  • At Step1-b, two lines display: a line from 0 to 0.3791 and a line from 0.3793 to 1.
  • At Step2, one line displays from 0.3793 to 0.5010.
  • At Step3, one line displays from 0.3793 to 0.4500.
  • At Step4, one line displays from 0.4000 to 0.4500.

Figure A-2 Optimal number of clusters, k-means clustering, elbow method

Description for Figure A-2

Data table for Figure A-2
Table summary
This table displays the results of Data table for Figure A-2. The information is grouped by Number of clusters (appearing as row headers), Total within sum of squares, calculated using number units of measure (appearing as column headers).
Number of clusters Total within sum of squares
number
1 149
2 57
3 25
4 14
5 9
6 6
7 5
8 4
9 3
10 2

Figure A-2 is a line plot used to identify the optimal number of clusters for the k-means clustering method based on the elbow method. The y-axis shows the total within sum of squares ranging from 0 to 160, while the x-axis shows the number of clusters (i.e., k) ranging from 1 to 10. For each value of k, its associated total within sum of squares is displayed with a point. The points are connected to each other and form a line. As the number of clusters increases, the total within sum of squares decreases, with the maximum total within sum of squares occurring at k = 1. The figure shows a rapid change, creating an elbow shape, at k = 3. From this point on, the line is almost parallel to the x-axis.
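The elbow can also be read from the data table for Figure A-2 by looking at how much the total within sum of squares falls with each added cluster. The Python sketch below computes these marginal reductions. It is a reading aid under the stated data, not a formal selection rule, and the variable names are illustrative.

```python
# Total within sum of squares by number of clusters (Figure A-2 data table)
wss = {1: 149, 2: 57, 3: 25, 4: 14, 5: 9, 6: 6, 7: 5, 8: 4, 9: 3, 10: 2}

# Reduction in WSS gained by moving from k - 1 to k clusters
gain = {k: wss[k - 1] - wss[k] for k in range(2, 11)}

# Share of the k = 1 total removed once k clusters are used
explained = {k: 1 - wss[k] / wss[1] for k in wss}

for k in range(2, 6):
    print(k, gain[k], f"{explained[k]:.0%}")
```

The reductions of 92 and 32 for k = 2 and k = 3 are followed by much smaller ones (11 and 5), which matches the flattening of the curve after k = 3.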

Figure A-3 Optimal number of clusters, k-means clustering, average silhouette method

Description for Figure A-3

Data table for Figure A-3
Table summary
This table displays the results of Data table for Figure A-3. The information is grouped by Number of clusters (appearing as row headers), Average silhouettes, calculated using number units of measure (appearing as column headers).
Number of clusters Average silhouettes
number
2 0.54
3 0.57
4 0.56
5 0.55
6 0.56
7 0.56
8 0.55
9 0.55
10 0.53

Figure A-3 depicts a line plot of the average silhouette values versus the number of clusters (i.e., k) obtained based on the k-means clustering method. The y-axis displays the average silhouette values for each k and ranges from 0.54 to 0.57, while the x-axis displays the number of clusters and ranges from 2 to 10. For each value of k, its associated average silhouette is displayed with a point. The points are connected to each other and form a line. The line starts at k = 2 with an average silhouette value of less than 0.54 and reaches its maximum point at k = 3 with an average silhouette value of greater than 0.57. The line then drops, with some minor upward and downward movement, until it reaches a point at k = 10 where the average silhouette value is less than 0.54.
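Under the average silhouette method, the preferred number of clusters is the k with the highest average silhouette. As a brief sketch using the rounded values from the data table for Figure A-3 (the variable names are illustrative):

```python
# Average silhouette by number of clusters (Figure A-3 data table)
avg_silhouette = {2: 0.54, 3: 0.57, 4: 0.56, 5: 0.55, 6: 0.56,
                  7: 0.56, 8: 0.55, 9: 0.55, 10: 0.53}

best_k = max(avg_silhouette, key=avg_silhouette.get)
print(best_k, avg_silhouette[best_k])
```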

Figure A-4 Proportion of CSDs with aggregate population of surrounding CSDs (i.e., AggPopSurrCSD) greater than 139,000, by IR score ranges; IR in [0.3793, 0.5010]

Description for Figure A-4 
Data table for Figure A-4
Table summary
This table displays the results of Data table for Figure A-4. The information is grouped by IR value (appearing as row headers), Proportion of CSDs (appearing as column headers).
IR value Proportion of CSDs
number percent
0.379 85.7
0.380 80.0
0.385 84.8
0.390 85.0
0.395 71.2
0.400 71.1
0.405 69.2
0.410 71.7
0.415 62.5
0.420 45.0
0.425 45.8
0.430 41.3
0.435 46.0
0.440 24.0
0.445 23.7
0.450 5.7
0.455 8.7
0.460 3.4
0.465 0.0
0.470 2.7
0.475 0.0
0.480 0.0
0.485 0.0
0.490 0.0
0.495 0.0
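The data table for Figure A-4 is consistent with 0.45 being identified as a potential cut-off in Step 2. The Python sketch below finds the first IR value from which the proportion stays below a chosen level. The 10% level is an illustrative assumption and is not taken from the study.

```python
# (IR value, percent of CSDs with AggPopSurrCSD > 139,000), Figure A-4 data table
proportions = [
    (0.379, 85.7), (0.380, 80.0), (0.385, 84.8), (0.390, 85.0), (0.395, 71.2),
    (0.400, 71.1), (0.405, 69.2), (0.410, 71.7), (0.415, 62.5), (0.420, 45.0),
    (0.425, 45.8), (0.430, 41.3), (0.435, 46.0), (0.440, 24.0), (0.445, 23.7),
    (0.450, 5.7), (0.455, 8.7), (0.460, 3.4), (0.465, 0.0), (0.470, 2.7),
    (0.475, 0.0), (0.480, 0.0), (0.485, 0.0), (0.490, 0.0), (0.495, 0.0),
]

THRESHOLD = 10.0  # percent; an illustrative choice, not taken from the study

# First IR value from which the proportion stays below the threshold
first_below = next(
    ir for i, (ir, _) in enumerate(proportions)
    if all(p < THRESHOLD for _, p in proportions[i:])
)
print(first_below)
```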

Figure A-5 Geographic distribution of remote, intermediate-remote, and non-remote CSDs obtained using k-means method (k = 3)

Description for Figure A-5

This map shows all census subdivisions (CSDs) in Canada divided into remoteness classes based on the k-means method for k = 3. The CSD, provincial, and territorial boundaries are marked on the map as black lines.

This map has three components: a large map showing non-remote, intermediate-remote, and remote CSDs across Canada and two inset maps zoomed into areas in south British Columbia (part A) and some areas in south-east Ontario, south Quebec, and parts of New Brunswick close to Ontario and Quebec (part B).

CSDs are categorized into one of three classes: non-remote (colored in orange), intermediate-remote (colored in light blue), and remote (colored in dark blue). CSDs with no reported population in 2016 and not connected to the main road/ferry network are also displayed in this map and colored in white.

The map shows that almost all of the territories are comprised of remote and intermediate-remote CSDs, as well as some CSDs with no population which are not connected to the main road/ferry network. Looking at the provinces, large parts of almost all of the provinces are remote and intermediate-remote except for some areas in the south of provinces. In Newfoundland and Labrador, CSDs are all either remote or intermediate-remote and are displayed in light or dark blue.

Reference period: 2016
