2014 Health Region Peer Groups – User Guide
- 1. Introduction
- 2. Data
- 3. Methodology
- 4. Results
- 5. Discussion
- 6. Peer Groups in Action
- 7. Summary
- 8. Bibliography
The purpose of this document is to define the concept of peer groups, to give an overview of how they are created and to demonstrate their usefulness. This paper presents the 2014 classification of the peer groups. More detailed technical information on the formation of the peer groups can be found in the working papers ‘Health Region Peer Groups’ and ‘Health Region Peer Groups 2003’, written by the Health Statistics Division of Statistics Canada. To obtain a copy of this documentation, contact Client Services (613-951-1746; fax: 613-951-0792).
The launch of the Canadian Community Health Survey (CCHS) in 2000, along with the expansion of existing data products at the health region level, led to a desire for a method of comparing regions with similar socio-economic determinants of health. The reasoning behind such a method is that, once the effects of the various social and economic characteristics known to influence health have been removed, regions can be compared by measures of health status. The relative effectiveness of health promotion and prevention activities can also be compared across regions. The health regions have therefore been placed into groups with similar socio-economic characteristics using a clustering technique, and these groups are referred to as ‘peer groups’.
Developing the criteria used to define peer groups required careful consideration of their intended use. Because peer groups are meant to be used for comparing health-related issues, all variables directly describing health were eliminated as candidates for creating the groups. Further, all variables used had to be reliable and available for all health regions. The need for objectivity also required that peer groups be developed using empirical techniques. Finally, the need for simple and relevant comparisons required that peer groups have approximately five to ten health regions per group and that each group have representation across the country. Applying these parameters revealed several limiting factors that required some modifications. All criteria were followed to the extent possible, and any deviations are explained in detail throughout this document.
The original 2000 Peer Group Classification was released in 2002 and was based on 1996 Census information and on the health region boundaries defined by the provinces and territories in 2000. To remain current with data availability and health region boundary changes, the peer group classification must be updated over time. Earlier updates produced the 2003 Peer Group Classification and the 2007 Peer Group Classification. The latest update is based on data from the 2011 Census of Population and the 2011 National Household Survey (NHS), together with the 2014 health region boundaries. This classification produced nine peer groups, representing all health regions across Canada.
This document gives an overview of how the peer groups are created. The 2014 Peer Group Classification is presented and compared with the peer groups created in the past. Finally, the use of the peer groups in the analysis of health-related issues is demonstrated through an example.
Typically, 24 variables describing the socio-economic and socio-demographic determinants of health within the health regions across Canada are used in the clustering algorithm to produce the peer groups. The variables chosen for this task cover a wide range of areas including demographic structure, social and economic status, ethnicity, Aboriginal status, housing, urbanization, income inequality and labour market conditions. Note that health-related variables were deliberately not used in the creation of the peer groups.
There have been some modifications over time; however, the majority of the variables have remained consistent since the peer groups were first created in 2002. This peer group classification used the same 24 variables as the 2007 peer groups, based here on the 2011 Census and 2011 NHS. The variables used for the analysis, along with their respective sources, are outlined in Appendix A.
A non-hierarchical cluster analysis was chosen to create the peer groups. Generally speaking, cluster analysis organizes variables or observations into groups (clusters) based on a measure of their distance from each other, so that observations within each group are similar to one another with respect to the variables or attributes of interest. In other words, the goal is to group the observations into homogeneous and distinct clusters. Non-hierarchical algorithms partition a set of observations into a pre-defined number of disjoint groups using a specified optimization criterion. This approach appeared best suited to the original objectives of the peer group project, namely to use an empirical technique to create a pre-defined number of peer groups with approximately five to ten health regions within each group.
The peer groups were created in SAS using the FASTCLUS procedure. This procedure uses a k-means algorithm to assign observations to a pre-defined set of k clusters. A description of k-means clustering and several variants of the method can be found in Johnson and Wichern (2002). The basic steps for placing observations into k clusters are as follows:
- Select k observations as cluster seeds (the initial centers of the clusters).
- Assign each observation to the nearest cluster seed. After all observations are assigned, replace each cluster seed by the mean of its cluster. Repeat this step until the change in the cluster seeds is zero or close to zero.
- Form final clusters by assigning each observation to its nearest cluster seed.
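The three steps above can be sketched in Python as a minimal batch k-means. This is an illustration, not the FASTCLUS implementation: the farthest-point seed rule and the function names (`farthest_point_seeds`, `k_means`) are assumptions made for the example, and FASTCLUS applies its own seed-selection and convergence rules.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point: Point, centres: List[List[float]]) -> int:
    """Index of the centre closest to `point` (ties go to the lower index)."""
    return min(range(len(centres)), key=lambda i: euclidean(point, centres[i]))


def farthest_point_seeds(data: List[Point], k: int) -> List[List[float]]:
    """Hypothetical seed rule: start with the first observation, then repeatedly
    add the observation farthest from all seeds chosen so far."""
    seeds = [list(data[0])]
    while len(seeds) < k:
        nxt = max(data, key=lambda p: min(euclidean(p, s) for s in seeds))
        seeds.append(list(nxt))
    return seeds


def k_means(data: List[Point], k: int, max_iter: int = 100,
            tol: float = 1e-9) -> Tuple[List[int], List[List[float]]]:
    # Step 1: select k observations as cluster seeds.
    centres = farthest_point_seeds(data, k)
    for _ in range(max_iter):
        # Step 2a: assign every observation to its nearest seed.
        labels = [nearest(p, centres) for p in data]
        # Step 2b: replace each seed by the mean of the observations assigned to it.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[c])  # an empty cluster keeps its seed
        shift = max(euclidean(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift <= tol:  # seeds have (essentially) stopped moving
            break
    # Step 3: form the final clusters from the converged seeds.
    labels = [nearest(p, centres) for p in data]
    return labels, centres


data = [(0, 0), (0, 1), (10, 10), (10, 11), (0.5, 0.5), (9.5, 10)]
labels, centres = k_means(data, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

The seeds are updated only after every observation has been assigned, matching the batch description in the list above.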
Complete details of the FASTCLUS procedure can be found in the SAS OnlineDoc®, Version 9.
3.1 Number of Clusters
One of the major challenges with cluster analysis is selecting the appropriate number of initial clusters. Several criteria have been suggested (Everitt et al., 2001), generally involving the optimization of one or more test statistics. In practice, it is generally left to the analyst to determine the number that best suits a given need. For the 2014 Peer Group Classification, a maximum of 16 clusters was chosen. This would give an average of about seven health regions per peer group (Note 1), which is in line with the study objectives. The maximum number of clusters used in 2014 was lower than in previous analyses because the total number of health regions has decreased since then.
4.1 Standardization of Variables
Variables measured on different scales, or on a common scale with differing variances, are often standardized to mitigate the effect of these differences. For this exercise, all 24 socio-economic variables were standardized (mean 0, variance 1) before the cluster analysis was performed. Two variables could not be calculated for some of the more remote health regions: the proportion of low-income persons in private households (LowPop) and the proportion of low-income children (LowKids). This is because the Census and the NHS do not derive low-income data for the three territories or for Indian reserves. Other remote areas can also be excluded from low-income statistics if the data for that region are considered unreliable. For these two low-income variables, a value of zero appears in the file for the three territories and for regions 2417 and 2418; this zero indicates that the variable could not be calculated.
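A minimal sketch of the standardization step, assuming each variable is held as a list of values across health regions. The use of the sample standard deviation (n - 1 divisor) is an assumption about the exact convention; the variable names in the usage line are hypothetical.

```python
import statistics
from typing import Dict, List


def standardize(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rescale each variable to mean 0 and variance 1 (z-scores).

    Assumes the sample standard deviation (n - 1 divisor). A placeholder zero,
    such as the one used for LowPop and LowKids in the territories, is treated
    here as an ordinary value.
    """
    result = {}
    for name, values in columns.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            raise ValueError(f"variable {name!r} is constant and cannot be standardized")
        result[name] = [(v - mean) / sd for v in values]
    return result


z = standardize({"density": [1.0, 2.0, 3.0]})
print(z["density"])  # [-1.0, 0.0, 1.0]
```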
4.2 Creation of Peer Groups
To establish a starting point, the clustering algorithm was instructed to group the health regions into 16 clusters. Five of the resulting clusters contained only one health region. This indicated that 16 clusters were too many, given that the objective of assigning peer groups is to be able to compare similar health regions. The cluster analysis was therefore rerun with fewer cluster seeds.
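The rerun procedure can be sketched as a loop that lowers the number of seeds until no cluster is a singleton (the summary later in this guide notes that each cluster had to contain at least two health regions). This is a hypothetical helper, `choose_k`, built on a compact k-means with a farthest-point seed rule that is an assumption and not FASTCLUS's own; the six points are made-up data.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _nearest(p: Point, centres: List[List[float]]) -> int:
    return min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))


def k_means(data: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Compact batch k-means with farthest-point seeds (an assumed seed rule)."""
    centres = [list(data[0])]
    while len(centres) < k:
        centres.append(list(max(data, key=lambda p: min(math.dist(p, c) for c in centres))))
    for _ in range(max_iter):
        labels = [_nearest(p, centres) for p in data]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centres[c])
        if new == centres:  # seeds no longer move
            break
        centres = new
    return [_nearest(p, centres) for p in data]


def choose_k(data: List[Point], k_start: int, min_size: int = 2) -> Tuple[int, List[int]]:
    """Rerun the clustering with one fewer seed until every cluster has at
    least `min_size` members."""
    for k in range(k_start, 0, -1):
        labels = k_means(data, k)
        sizes = Counter(labels)
        if len(sizes) == k and min(sizes.values()) >= min_size:
            return k, labels
    raise ValueError("no number of clusters satisfies the minimum size")


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (12, 10)]
k, labels = choose_k(data, k_start=3)
print(k, labels)  # 2 [0, 0, 0, 1, 1, 1]
```

With three seeds the point (12, 10) ends up alone, so the loop drops to two clusters, mirroring on a small scale the move away from 16 clusters.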
The results of the final cluster analysis using PROC FASTCLUS are shown in Table 4.2.1. The table shows the number of health regions in each peer group, as well as several statistics related to the clusters. The root mean square (RMS) standard deviation measures the variability of the data points around the cluster centre. The radius is the largest Euclidean distance from the cluster centre to any observation within the cluster. The nearest cluster is the closest peer group in terms of Euclidean distance, and the last column of the table gives the distance between the current cluster centre and that of its nearest neighbour. For each of these statistics, the cluster centre is the point whose coordinates are the means of all the observations in the cluster. Euclidean distance is the straight-line distance between two points: the square root of the sum of the squared differences across all variables.
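The statistics described above can be computed from a clustering as in the sketch below. It follows the definitions given in this section; the n - 1 divisor in the RMS standard deviation and the function name `cluster_summary` are assumptions, and FASTCLUS output may differ in detail (for example, it measures distances from the final seeds, which coincide with the cluster means at convergence).

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_summary(data: List[Sequence[float]], labels: List[int]) -> Dict[int, dict]:
    """Size, RMS standard deviation, radius, nearest cluster and distance to it."""
    members: Dict[int, List[Sequence[float]]] = {}
    for point, lab in zip(data, labels):
        members.setdefault(lab, []).append(point)
    # Cluster centre = coordinate-wise mean of the cluster's observations.
    centres = {lab: [sum(col) / len(pts) for col in zip(*pts)] for lab, pts in members.items()}

    summary = {}
    for lab, pts in members.items():
        n, centre = len(pts), centres[lab]
        if n > 1:
            # Within-cluster variance of each variable (n - 1 divisor, an assumption).
            variances = [sum((p[j] - centre[j]) ** 2 for p in pts) / (n - 1)
                         for j in range(len(centre))]
            rms_std = math.sqrt(sum(variances) / len(variances))
        else:
            rms_std = 0.0  # undefined for a single-member cluster; reported as 0 here
        radius = max(euclidean(p, centre) for p in pts)
        others = [o for o in centres if o != lab]
        nearest = min(others, key=lambda o: euclidean(centre, centres[o])) if others else None
        summary[lab] = {
            "size": n,
            "rms_std": rms_std,
            "radius": radius,
            "nearest": nearest,
            "distance_to_nearest": euclidean(centre, centres[nearest]) if others else None,
        }
    return summary


data = [(0, 0), (0, 2), (10, 0), (10, 2), (30, 0)]
stats = cluster_summary(data, [0, 0, 1, 1, 2])
print(stats[0])  # {'size': 2, 'rms_std': 1.0, 'radius': 1.0, 'nearest': 1, 'distance_to_nearest': 10.0}
```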
Two clusters (C and D) contained the majority of health regions. Both consisted of very similar regions: each was large in terms of the number of health regions, yet had a low standard deviation. These clusters were also nearest neighbours, and the distance between their centres was small, showing that the health regions in the two clusters were similar to each other as well. Therefore, although these clusters did not meet the objective of having approximately five to ten regions per peer group, there did not appear to be a valid reason to split them into smaller groups.
Note that there are two levels of geography in Ontario: 14 Local Health Integration Networks (LHINs) and 36 Public Health Units (PHUs). The PHU-level information was used to create the peer groups. At the final stage of the cluster analysis, once the peer groups were established, the LHINs were added to the existing clusters to find the appropriate peer group for each LHIN. The LHINs therefore had no impact on the placement of the other health regions into the final peer groups. As in the formation of the peer group clusters, only one level of Ontario geography should be used in an analysis involving the peer groups, to avoid giving Ontario undue influence.
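Adding the LHINs after the fact amounts to assigning each one to the nearest existing cluster centre without updating the centres. A minimal sketch, assuming the LHIN values have been standardized with the same means and standard deviations as the clustered regions; the peer group labels, coordinates and the helper name `assign_to_existing` are hypothetical.

```python
import math
from typing import Dict, Sequence


def assign_to_existing(centres: Dict[str, Sequence[float]],
                       new_regions: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Place each new region in the peer group with the nearest centre.

    The centres are read but never updated, so the new regions cannot
    influence the placement of regions that were already clustered.
    """
    return {
        name: min(centres, key=lambda g: math.dist(point, centres[g]))
        for name, point in new_regions.items()
    }


centres = {"A": (0.0, 0.0), "B": (3.0, 3.0)}
lhins = {"LHIN_1": (0.5, -0.2), "LHIN_2": (2.6, 3.1)}
print(assign_to_existing(centres, lhins))  # {'LHIN_1': 'A', 'LHIN_2': 'B'}
```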
4.3 Collapsing Small Clusters
The results from section 4.2 (specifically Table 4.2.1) represent clusters that are approximately evenly spaced and have minimal within-cluster variance, given the parameters used by the clustering algorithm. The table shows that 11 clusters were formed, ranging in size from two to 31 health regions (excluding the LHINs). However, a cluster with fewer than five regions is not practical, as it does not provide many options for comparison. To provide more peers for comparison, clusters with fewer than five members were combined with their nearest neighbour. The exceptions were cluster G (Montréal, Toronto and Vancouver) and cluster I (health regions 6101, 6001, 4835 and 5953). Cluster G was not combined with another cluster because these health regions tend to be very different from other regions across the country. Cluster I was not combined with its nearest neighbour, cluster B, because its characteristics differ from those of many health regions in cluster B.
Two clusters were joined with their nearest neighbour. Cluster F (health regions 2417, 2418 and 6201) was combined with its nearest neighbour, cluster K (health regions 4604 and 4714). Collapsing clusters F and K produced a cluster of five health regions, so no additional collapsing was required; this combined cluster was labelled cluster F. As well, cluster J (health regions 2416, 3551, 4832 and 4834) was combined with cluster B (health regions 2407, 2414, 2415, 3530, 3536, 3560, 3565, 3566, 4605, 4704, 4706, 4710, 4831, 4833 and 5921), and this combined cluster was labelled cluster B. As a result of collapsing the smaller clusters, the 11 peer groups produced by the final FASTCLUS cluster analysis and presented in Table 4.2.1 were reduced to nine. A list of health regions by final peer group can be found in Appendix D (Ontario Health Units) and Appendix E (LHINs).
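The collapsing rule can be sketched as a loop that merges the smallest under-sized cluster into its nearest neighbour until none remain, skipping exempt clusters such as G and I. The processing order (smallest first) and the label kept after a merge (the larger cluster's label) are assumptions; with the toy coordinates below, which are made up, they happen to reproduce the two merges reported above.

```python
import math
from typing import Dict, List, Sequence, Set

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def collapse_small(clusters: Dict[str, List[Point]], exempt: Set[str],
                   min_size: int = 5) -> Dict[str, List[Point]]:
    """Merge clusters with fewer than `min_size` members into their nearest
    neighbour (by centroid distance). Exempt clusters are never merged, either
    as the small cluster or as the target."""
    clusters = {lab: list(pts) for lab, pts in clusters.items()}
    while True:
        small = [lab for lab, pts in clusters.items()
                 if len(pts) < min_size and lab not in exempt]
        if not small:
            return clusters
        lab = min(small, key=lambda s: len(clusters[s]))  # handle the smallest first
        targets = [o for o in clusters if o != lab and o not in exempt]
        if not targets:
            return clusters  # nothing left to merge with
        centre = centroid(clusters[lab])
        target = min(targets, key=lambda o: math.dist(centre, centroid(clusters[o])))
        # Keep the label of the larger cluster (ties keep the target's label).
        keep, drop = (target, lab) if len(clusters[target]) >= len(clusters[lab]) else (lab, target)
        clusters[keep] = clusters[keep] + clusters.pop(drop)


toy = {
    "F": [(0.0, 0.0)] * 3,
    "K": [(1.0, 1.0)] * 2,
    "B": [(10.0, 0.0)] * 15,
    "J": [(11.0, 0.0)] * 4,
    "G": [(50.0, 50.0)] * 3,
}
result = collapse_small(toy, exempt={"G"})
print({lab: len(pts) for lab, pts in result.items()})  # {'F': 5, 'B': 19, 'G': 3}
```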
5.1 Strongest Predictors
To determine which variables played a key role in defining the health region peer groups, the final clusters were run against all 24 variables in a stepwise discriminant analysis. The partial R² thresholds for entry and removal were both set at 0.15. Any variable with an R² of 0.5 or higher when regressed against a variable already in the model was removed from the analysis. Overall, five variables appeared to be the most important predictors. Table 5.2.1 displays a summary of the results.
The strongest predictors of the final peer groups were population density and the proportion of the population belonging to a visible minority group. No additional variables were removed from the analysis when regressed against population density, whereas the proportion of the population under the age of 20 and the proportion of lone-parent families were removed from the analysis when regressed against the employment rate.
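The sketch below illustrates only the first entry step of such a procedure, under stated assumptions. At that step the partial R² of a candidate is the R² of a one-way analysis of variance of the variable on the peer group labels, and the collinearity screen reduces to the squared correlation with the entered variable. Later steps, which condition on all variables already in the model, and the removal step are not shown; the function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def eta_squared(values: Sequence[float], groups: Sequence[Hashable]) -> float:
    """R-squared of a one-way ANOVA of `values` on the group labels
    (between-group sum of squares / total sum of squares)."""
    grand = sum(values) / len(values)
    total = sum((v - grand) ** 2 for v in values)
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in by_group.values())
    return between / total


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, i.e. the R-squared from regressing y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def first_entry_step(variables: Dict[str, List[float]], groups: Sequence[Hashable],
                     entry: float = 0.15, collinear: float = 0.5
                     ) -> Tuple[Optional[str], Dict[str, float], List[str]]:
    """Enter the variable with the largest R-squared (if it passes `entry`) and
    flag the variables that are too collinear with it."""
    scores = {name: eta_squared(vals, groups) for name, vals in variables.items()}
    best = max(scores, key=scores.get)
    if scores[best] < entry:
        return None, scores, []
    dropped = [name for name, vals in variables.items()
               if name != best and r_squared(variables[best], vals) >= collinear]
    return best, scores, dropped


groups = ["a", "a", "b", "b"]
variables = {
    "density": [1.0, 2.0, 10.0, 11.0],
    "density2": [2.0, 4.0, 20.0, 22.0],  # perfectly collinear with density
    "noise": [1.0, 5.0, 2.0, 4.0],
}
best, scores, dropped = first_entry_step(variables, groups)
print(best, dropped)  # density ['density2']
```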
5.2 Principal Component Analysis
Principal component analysis is a multivariate technique that aims to reduce the number of variables in the data to a few factors called principal components. Principal components are uncorrelated linear combinations of the original variables. They are derived in decreasing order of importance, so that as much of the total variance in the data as possible is explained by as few factors as possible. The first principal component is therefore the most important factor, since it explains the largest proportion of the total variance in the data.
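For standardized variables, the principal components are the eigenvectors of the correlation matrix, and each eigenvalue divided by the number of variables gives that component's share of the total variance. The sketch below finds the first component by power iteration in plain Python; it is an illustration under these assumptions (function names and data are hypothetical), not the software used for the peer groups.

```python
import math
from typing import List, Tuple


def correlation_matrix(columns: List[List[float]]) -> List[List[float]]:
    """Correlation matrix of the variables (one list of values per variable),
    i.e. the covariance matrix of the standardized variables."""
    n = len(columns[0])

    def z_scores(col: List[float]) -> List[float]:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        return [(v - mean) / sd for v in col]

    z = [z_scores(col) for col in columns]
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(corr: List[List[float]], iterations: int = 500) -> Tuple[List[float], float, float]:
    """Power iteration for the leading eigenvector of the correlation matrix.
    Returns (loadings, eigenvalue, share of total variance)."""
    p = len(corr)
    v = [1.0] * p  # starting vector (fails only if orthogonal to the leading eigenvector)
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' R v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    # The trace of a correlation matrix equals the number of variables.
    return v, eigenvalue, eigenvalue / p


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]    # perfectly correlated with a
c = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with a and b
loadings, eigenvalue, share = first_component(correlation_matrix([a, b, c]))
print(round(share, 4))  # 0.6667
```

Because a and b carry the same information, the first component loads equally on them and explains two of the three units of total variance.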
A principal component analysis was performed on the 24 socio-economic variables used in the cluster analysis. The first two principal components accounted for just over 55% of the total variability. The first principal component comprises measures including the population living in census metropolitan areas, housing affordability, the proportion of immigrants, average dwelling value, the proportion of post-secondary graduates and the proportion of visible minorities. The second principal component contains measures of the proportion of Aboriginal population, the proportion of the population aged 0 to 19 years, the proportion of the population aged 65 and over, the proportion of lone-parent families and the proportion of dwellings in which the owner also lives. The third principal component could be interpreted as income inequality (median household income, proportion of low-income children, proportion of low-income persons in private households, long-term unemployment rate, employment rate, etc.). The first six principal components accounted for over 86% of the total variability in the data, showing that the 24 variables can be reduced to six factors without losing much information. These results are similar to those of the previous peer group classification, which indicates that the variables driving the analysis have remained fairly consistent over time.
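Statements such as "the first six principal components accounted for over 86% of the total variability" come from the cumulative sum of the sorted eigenvalues divided by their total (for 24 standardized variables the eigenvalues sum to 24). A small hypothetical helper, shown with made-up eigenvalues:

```python
from itertools import accumulate
from typing import Sequence


def components_needed(eigenvalues: Sequence[float], threshold: float) -> int:
    """Smallest number of leading principal components whose eigenvalues
    account for at least `threshold` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)


print(components_needed([2.0, 1.0, 0.5, 0.5], 0.7))  # 2
```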
5.3 Peer Group Descriptions
The five key variables determined by the stepwise discriminant analysis were used to represent each of the clusters. The mean values of these five variables for each peer group can be found in Appendix B. For each of the five variables, several percentiles were calculated and used to classify the peer groups. Values were classified based on the following ranges.
Very High: X > 85th percentile
High: 65th percentile < X ≤ 85th percentile
Medium: 35th percentile < X ≤ 65th percentile
Low: 15th percentile < X ≤ 35th percentile
Very Low: X ≤ 15th percentile
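The five ranges above can be applied as in the sketch below. Two points are assumptions: the percentiles are taken from Python's `statistics.quantiles` with `method="inclusive"`, which may differ slightly from the percentile definition used in the original analysis, and the reference distribution (for example, the values of a variable across all health regions) is left to the caller.

```python
import statistics
from typing import Sequence


def classify(value: float, reference: Sequence[float]) -> str:
    """Map a value to the five categories using percentiles of `reference`."""
    cuts = statistics.quantiles(reference, n=100, method="inclusive")  # 99 cut points
    p15, p35, p65, p85 = cuts[14], cuts[34], cuts[64], cuts[84]
    if value > p85:
        return "Very High"
    if value > p65:
        return "High"
    if value > p35:
        return "Medium"
    if value > p15:
        return "Low"
    return "Very Low"


reference = list(range(1, 102))  # 15th, 35th, 65th, 85th percentiles are 16, 36, 66, 86
print(classify(90, reference), classify(20, reference))  # Very High Low
```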
The results of this classification can be found in Table 5.3.1. While this method is crude as a descriptive tool, it does help to distinguish one peer group’s characteristics from another’s. As the table shows, no two peer groups fall into the same category for all five variables. For example, peer group G (Toronto, Vancouver and Montréal) is the only group with a very high population density, a very low percent of Aboriginals and a very high percent of immigrants.
The results of this classification were used to derive a written summary of the nine peer groups based on the five key variables from the discriminant analysis. This summary is presented in Appendix C.
5.4 Geographic Limitations
Each province and territory defines the geographic boundaries of its health regions based on administrative preference, and these boundary definitions change over time. Health regions can consist entirely of population centres, be entirely rural or be some combination of the two. Because health regions are not geographically homogeneous, there may be considerable variability in health measures within a region, and this should be taken into consideration when making inferences about a particular region. For instance, even though the health indicators for Vancouver compare favourably with the national averages, this should not be interpreted as meaning that residents of Vancouver’s downtown core have better than average health (Note 2). This lack of homogeneity within health region boundaries makes assigning health regions to peer groups much more difficult: it can have a large impact on how well a given variable represents the entire region, and in some cases important defining factors may be missed.
It should also be noted that there may be considerable variability among the health regions within a peer group with regard to the socio-economic factors used in the cluster analysis. This should be considered when comparing regions within a peer group. This variability can be seen for the 2014 peer groups in Appendix B for the five key variables identified by the stepwise discriminant analysis.
5.5 Collapsing Health Regions
There is one instance where the CCHS combines smaller health regions to increase the sample size for reporting purposes. This occurs in northern Saskatchewan, where health regions 4711 (Mamawetan Churchill River Regional Health Authority), 4712 (Keewatin Yatthé Regional Health Authority) and 4713 (Athabasca Health Authority) are combined to form 4714 (Mamawetan/Keewatin/Athabasca). This combined health region (4714) was used in the creation of the peer groups, since the CCHS is one of the principal data sources for analyses of health-related data by peer group.
In the fall of 2014, health region boundaries in Nova Scotia were updated and six zones (1210, 1223, 1230, 1240, 1258, and 1269) were replaced by four new zones. The new regions were used for the creation of the peer groups.
5.6 Geographic Representation of Final Peer Groups
The map below is a good visual representation of the geographic clustering of the health regions into the final nine peer groups. Montréal, Toronto and Vancouver form the smallest cluster because in terms of the size and the diversity of their populations, they are too different from the other health regions to be combined with any other peer group.
Some definite clusters of health regions formed on the basis of characteristics shared because of their location within Canada. The northern regions, for example, clustered on the basis of the Aboriginal composition of their communities. All peer groups have representation across provincial and/or territorial borders.
Two valuable, yet different, analyses are possible with the peer groups: health-related indicators can be compared between peer groups and within peer groups. Since peer groups are formed from regions with similar socio-economic characteristics, differences between peer groups are expected to arise; peer groups with better socio-economic status indicators are likely to have better health status measures. Estimates for a single peer group can also be compared with national averages. In addition, health regions can be compared within a peer group: once the effects of the various social and economic characteristics known to influence health status have been removed, a more useful comparison of regions by measures of health status is possible.
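A within-peer-group comparison can be as simple as each region's difference from the average of its peer group. The sketch below uses an unweighted mean and ignores sampling variability, both simplifications; an actual analysis of survey estimates such as the CCHS would use survey weights and confidence intervals. The region names, rates and the helper `deviation_from_peer_mean` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict


def deviation_from_peer_mean(rates: Dict[str, float],
                             peer_group: Dict[str, str]) -> Dict[str, float]:
    """Difference between each region's indicator and the unweighted mean of
    its peer group."""
    by_group = defaultdict(list)
    for region, rate in rates.items():
        by_group[peer_group[region]].append(rate)
    peer_mean = {group: fmean(values) for group, values in by_group.items()}
    return {region: rate - peer_mean[peer_group[region]] for region, rate in rates.items()}


rates = {"R1": 20.0, "R2": 30.0, "R3": 50.0}
peers = {"R1": "A", "R2": "A", "R3": "B"}
print(deviation_from_peer_mean(rates, peers))  # {'R1': -5.0, 'R2': 5.0, 'R3': 0.0}
```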
A detailed analysis involving the peer groups can be found in the paper ‘The Health of Canada’s Communities’, written by Margot Shields and Stéphane Tremblay of Statistics Canada (2002).
As a result of health region boundary changes as of December 2014, and the availability of 2011 Census and NHS data, it was necessary to update the 2007 Peer Group Classification. In keeping with the original working paper, the goal was to produce a classification which would cluster health regions with similar social and economic health determinants into peer groups. Twenty-four variables covering a wide range of social, economic and demographic areas were used to cluster the health regions.
Starting with an initial set of 16 clusters and requiring each cluster to contain at least two health regions, the analysis indicated that the regions naturally grouped themselves into 11 distinct peer groups. Peer groups with fewer than five health regions were combined with their nearest neighbour, to provide enough health regions within each peer group for comparison purposes. Cluster G (Montréal, Toronto and Vancouver) and cluster I (health regions 6101, 6001, 4835 and 5953) were not forced to join another cluster, as the health regions in each tend to have more in common with one another than with other health regions. The final result was nine peer groups ranging in size from three to 31 health regions (not including the LHINs in Ontario).
Stepwise discriminant analysis was used to determine which variables had the most influence on the final peer groups. The five most important variables were population density, the proportion of the population belonging to a visible minority group, the proportion of Aboriginal people, the employment rate and the ratio of males to females in the population. Each peer group has at least one distinguishing factor in terms of these five variables.
Peer groups are useful in the analysis of health-related indicators because, once the effects of the various social and economic characteristics known to influence health status have been removed, a more useful comparison of regions is possible. Health indicators can be compared between and within peer groups. The peer groups also offer an alternative to the provinces when the results of an analysis cannot be presented at the health region level because of insufficient sample size or high sampling variability.
Anderberg, M. R. 1973. Cluster Analysis for Applications. New York: Academic Press.
Everitt, B. S., Landau, S., Leese, M. and Stahl, D. 2001. Cluster Analysis, 5th Edition. Chichester, UK: John Wiley and Sons, Ltd.
Johnson, R. and Wichern, D. 2002. Applied Multivariate Statistical Analysis. Prentice Hall, Canada.
SAS Institute Inc. 2003. SAS OnlineDoc®, Version 9. Cary, NC: SAS Institute Inc.
Shields, M. and Tremblay, S. 2002. “The Health of Canada’s Communities.” Health Reports (Statistics Canada), Catalogue no. 82-003-XIE.
Appendix A: Variable Definitions
Appendix B: Descriptive Statistics for Final Peer Groups (Excluding LHINs)
Appendix C: Descriptive summary of Final Peer Groups
Appendix D: Health Region Peer Groups (Ontario by Health Unit)
Appendix E: Health Region Peer Groups (Ontario by Local Health Integration Network)