Methodology

Data sources
Definition of neighbourhood and its impact
Population at risk
Modeling techniques

This section explains some of the methodological considerations of the spatial analysis of crime data. It is not intended to provide a detailed description of the methodology used in each of the series' studies, but rather to provide an overview of the analysis and the data sources used.

Data sources

Neighbourhood characteristics data

Data on all demographic and socio-economic characteristics and on some functional characteristics were drawn from the 2001 Census. The detailed socio-economic data used in this analytical series were derived from the long form of the Census, which is completed by a 20% sample of households. These data exclude the institutional population, that is, individuals living in hospitals, nursing homes, prisons and other institutions. To ensure data confidentiality and reliability, Statistics Canada requires that when using income data, the population of any Canadian geographic area being considered must total at least 250 people living in at least 40 private households. A detailed description of all characteristics (variables) and the coverage is provided in each of the studies.
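The confidentiality rule for income data can be illustrated with a short sketch. The two thresholds come from the paragraph above; the `Area` record, its field names and the sample figures are hypothetical and are not taken from the studies.

```python
from dataclasses import dataclass

# Thresholds stated in the text for the use of income data.
MIN_POPULATION = 250
MIN_HOUSEHOLDS = 40


@dataclass
class Area:
    # Hypothetical record layout, for illustration only.
    name: str
    population: int
    private_households: int


def income_data_usable(area: Area) -> bool:
    """Return True when the area meets both minimum thresholds."""
    return (area.population >= MIN_POPULATION
            and area.private_households >= MIN_HOUSEHOLDS)


# Made-up areas: B fails on population, C fails on households.
areas = [Area("A", 1200, 480), Area("B", 240, 95), Area("C", 300, 35)]
usable = [a.name for a in areas if income_data_usable(a)]
print(usable)
```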

The cities of Winnipeg, Regina, Montréal, Halifax and Thunder Bay provided data on zoning or land use, based on availability. Some of these data were combined in larger categories and then presented at the neighbourhood level to obtain proportions of broad land use categories that were relatively comparable from city to city.

Crime data

The highly confidential nature of geocoded crime data could limit sharing and dissemination. For many years, Canadian police forces have sent the Canadian Centre for Justice Statistics (CCJS) information on criminal incidents, accused and victims as part of the Uniform Crime Reporting (UCR) Survey for statistical purposes. It was in this well-established relationship of trust that various police forces provided to the CCJS the addresses (or geographic coordinates as the case may be) of selected incidents. The CCJS would again like to thank those persons within the participating police forces who submitted the data files needed for the spatial analyses.

This analytical series focused on a selected set of criminal offences, including most Criminal Code offences and all offences under the Controlled Drugs and Substances Act, but excluded offences under other federal statutes, provincial statutes and municipal by-laws. Also excluded were Criminal Code offences for which there was either no expected pattern of spatial distribution or a lack of information about the actual location of the offence. For example, administrative offences such as bail violations, failure to appear and breaches of probation are typically reported at court locations; threatening or harassing phone calls are often reported at the receiving end of the call; and impaired driving offences are more likely to be related to the location of apprehension (for example, apprehension resulting from roadside stop programs).

For this spatial analysis series, police forces sent the CCJS the addresses or geographic coordinates (X and Y) of selected incidents that had been reported and entered in the Incident-Based UCR database. The CCJS resolved each address into a set of geographic coordinates (X and Y). These coordinates were rolled up to the mid-point of a block-face in the case of specific addresses, and to intersection points in the case of streets, parks and subway stations. For all of the cities examined, the geocoding exercise was successful for more than 92% of the data selected. Incidents that failed geocoding contained information that was too vague, such as a bus number or the Trans-Canada registration. The low percentage of incidents that failed geocoding did not create a bias in offence trends. In fact, each type of offence accounted for the same proportion of overall crime after geocoding as before.
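The check for geocoding bias described above can be sketched as a comparison of offence shares before and after geocoding. The offence categories and counts below are invented for illustration; only the idea of comparing proportions comes from the text.

```python
def shares(counts):
    """Convert counts by offence type into proportions of the total."""
    total = sum(counts.values())
    return {offence: n / total for offence, n in counts.items()}


# Hypothetical counts: incidents selected, and the subset that geocoded.
selected = {"violent": 2000, "property": 7000, "other": 1000}
geocoded = {"violent": 1880, "property": 6580, "other": 940}

match_rate = sum(geocoded.values()) / sum(selected.values())

before, after = shares(selected), shares(geocoded)
# Largest change in any offence type's share of overall crime.
max_share_shift = max(abs(before[k] - after[k]) for k in selected)

print(f"geocoding success rate: {match_rate:.1%}")
print(f"largest shift in offence share: {max_share_shift:.4f}")
```

A shift close to zero for every offence type is what the absence of bias would look like under this simple check.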

Definition of neighbourhood and its impact

Ecological studies such as those conducted in crime-mapping projects require a sufficiently large number of geographic units, or neighbourhoods, for the data modeling to be effective and reliable. The boundaries selected for neighbourhoods can also impact our understanding of the distribution of their characteristics. Few Canadian studies have looked at the changes associated with the various levels of aggregation used in multivariate analyses.

However, there are three elements to be used in guiding the scientific approach to choosing the aggregation level of ecological studies (Messner and Anselin 2004). The first is the relevance of the scale used with respect to the identified objectives. The objectives of this analytical series were to identify patterns of spatial distribution of crime in cities and to target the needs of neighbourhoods most at risk in order to develop crime prevention strategies. Such strategies would have implications for social development at the federal level, but would be implemented at the local level. Using predefined administrative geographic areas made it possible to add layers of additional information (health, education, economic factors, etc.) for an integrated approach to prevention in neighbourhoods with many risk factors. In cases where the research focuses on structural conditions, using predefined administrative geographic areas is a valid approach.

It must be recognized that the relationship between variables measured at the level of administrative geographic areas in an ecological study does not necessarily represent the relationship that exists at the individual level. Considerable caution must be used when making inferences or generalizations about individuals based on ecological studies. Such inferences are acceptable only in rare situations of homogeneity. In other words, the crime rate measured at the neighbourhood level, and its correlates, do not necessarily correspond to the delinquency rate of the residents in these neighbourhoods.

A second element to consider relates to the instability of the variance in crime rates of small geographic units. When geographic units do not all have the same population, the variance in crime rates is not constant and is inversely related to the size of the population: larger populations produce more stable estimates (Messner and Anselin 2004). This instability may affect the interpretation of results. However, this problem is more likely to occur when trying to analyse rare events, such as homicides.
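The inverse relationship between population size and the variance of a rate can be made concrete under a simplifying assumption that is not stated in the series: if the number of incidents in an area follows a Poisson distribution with mean equal to the underlying rate times the population, the variance of the observed rate equals the underlying rate divided by the population. The figures below are illustrative only.

```python
import math


def rate_sd_per_1000(underlying_rate, population):
    """Standard deviation of an observed rate per 1,000 population.

    Assumes incident counts are Poisson with mean
    underlying_rate * population, so Var(count / population)
    = underlying_rate / population.
    """
    return 1000.0 * math.sqrt(underlying_rate / population)


# Same underlying rate (50 per 1,000), two hypothetical populations.
small = rate_sd_per_1000(0.05, 500)
large = rate_sd_per_1000(0.05, 5000)
print(f"population   500: sd = {small:.2f} per 1,000")
print(f"population 5,000: sd = {large:.2f} per 1,000")
```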

A third consideration is the possible interpretation of spatial autocorrelation. Are we really seeing spatial autocorrelation of crime or is it simply a bias caused by the level of aggregation (Messner and Anselin 2004)? In this analytical series, the kernel density distributions, which do not depend on neighbourhood boundaries, showed trends similar to the distribution of crime rates at the neighbourhood level in all areas studied. Thus, the spatial autocorrelation found in the cities examined is unlikely to be solely an artifact of aggregation; it might be the result of a spillover or diffusion process, or of an external variable that was not included in the analysis.
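A kernel density surface is computed directly from incident coordinates, without reference to neighbourhood boundaries. The sketch below uses a Gaussian kernel, which is an assumption: the kernel and bandwidth used in the studies are not specified here, and the coordinates are invented.

```python
import math


def kernel_density(x, y, points, bandwidth):
    """Gaussian kernel density estimate at (x, y) from incident points."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        dist_sq = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-dist_sq / (2.0 * bandwidth ** 2))
    return norm * total


# Hypothetical incidents: a cluster near (1, 1) and one isolated point.
incidents = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (5.0, 5.0)]

in_cluster = kernel_density(1.0, 1.0, incidents, bandwidth=0.5)
at_isolated = kernel_density(5.0, 5.0, incidents, bandwidth=0.5)
in_between = kernel_density(3.0, 3.0, incidents, bandwidth=0.5)
print(in_cluster, at_isolated, in_between)
```

Evaluating the function over a regular grid of (x, y) locations would give a surface that can be compared visually with a map of neighbourhood-level rates.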

Population at risk

Normally, crime rates are calculated by examining the distribution of incidents in relation to the residential population of a given area. This method offers good results at the municipal, provincial and national levels but it presents some problems when the spatial units of interest, such as neighbourhoods in the city centre, contain small residential populations and large transitory populations. Rates based on residential population alone will artificially inflate the crime rates in these urban core neighbourhoods, since the total population at risk in these areas has not been taken into account.

Rates based on the combined resident and employee populations (population at risk) more closely approximate the total number of people at risk of experiencing crime, that is, the potential target population. These rates are not only better suited to measuring the distribution of violent crimes, which produce victims, but they can also be a better measure of property crimes since the number of residents and workers offers a more accurate estimate of the number of dwellings and businesses that may be the target of property crimes.
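The effect of the choice of denominator can be seen in a small worked example. The neighbourhood figures are hypothetical; the definition of the population at risk as residents plus employees follows the text.

```python
def rate_per_1000(incidents, population):
    """Crime rate per 1,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * incidents / population


# Hypothetical city-centre neighbourhood: few residents, many workers.
residents = 800
workers = 19200
incidents = 400

residential_rate = rate_per_1000(incidents, residents)
at_risk_rate = rate_per_1000(incidents, residents + workers)

print(f"rate per 1,000 residents:          {residential_rate:.0f}")
print(f"rate per 1,000 population at risk: {at_risk_rate:.0f}")
```

With these made-up figures, the rate based on residents alone is 25 times the rate based on the population at risk, which illustrates the inflation described above.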

Modeling techniques

In this analytical series, ordinary least squares regression was used to examine the distribution of violent and property crime rates as a function of the set of explanatory factors. The use of this method requires continuous or quantitative outcome variables that have a normal distribution. Since several of the variables studied in this analysis do not have normal distributions, it was necessary to standardize the crime variables. Most of the variables, or neighbourhood characteristics, were also adjusted to present a normal distribution. All variables and the related standardization techniques are described in the methodology section of each of the studies.
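As an illustration of why such adjustments are made, the sketch below compares the skewness of a set of rates before and after a natural logarithm transformation. The log transformation is only one common choice and is an assumption here; the techniques actually applied are those described in each study. The rates are invented.

```python
import math


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical neighbourhood crime rates with a long right tail.
raw_rates = [5, 8, 10, 12, 15, 20, 30, 60, 150]
log_rates = [math.log(r) for r in raw_rates]

skew_raw = skewness(raw_rates)
skew_log = skewness(log_rates)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```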

The regression models were developed using the stepwise regression method. This method consists of a series of multiple regressions in which the variable that accounts for the maximum remaining variance is added at each stage. Any superfluous variables are eliminated at each stage.

The standardized regression coefficients provide a means of assessing the relative importance of the different independent variables in the multiple regression models. The coefficients indicate the expected change, in standard deviation units, of the dependent variable per one standard deviation unit increase in the independent variable, after controlling for the other variables. Coefficient values typically fall between -1 and +1, although they can exceed these bounds when independent variables are highly correlated; values closer to 0 indicate a weaker contribution to the explanation of the dependent variable.
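The conversion from an unstandardized to a standardized coefficient can be sketched for the simplest case of one independent variable, where the standardized coefficient equals the correlation between the two variables. The data are invented, and the multiple regression models in the series involve several variables at once.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def ols_slope(x, y):
    """Unstandardized slope from a one-variable least squares fit."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data for one neighbourhood characteristic and one rate.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

slope = ols_slope(x, y)
# Standardized coefficient: slope rescaled by the two standard deviations.
beta_std = slope * sample_sd(x) / sample_sd(y)
print(f"slope = {slope:.3f}, standardized coefficient = {beta_std:.3f}")
```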

Many of the neighbourhood characteristics are highly correlated with each other, which means that they convey essentially the same information (correlation matrices are available in the appended documents). This multicollinearity stems from the strong association among many structural factors that are individually linked to crime (Land et al. 1990). To account for this multicollinearity, which may distort the results of the models, this series used variance inflation factors (VIF) to measure the multicollinearity between all of the independent variables in the regression models. A VIF over 10 is indicative of potential multicollinearity problems in a regression model (Montgomery et al. 2001); in this series, any variable with a VIF of 5 or above was removed from the final models.
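The VIF of an independent variable is 1 / (1 - R²), where R² comes from regressing that variable on all of the other independent variables. The sketch below computes it with a small least squares routine written for the purpose; the variable names and values are hypothetical, and the cut-off of 5 is the one stated above.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 from regressing y on the given columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(r[p] * y[i] for i, r in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(predictors):
    """Variance inflation factor for each named predictor."""
    result = {}
    for name, col in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(col, others))
    return result


# Hypothetical characteristics; the first two move almost in lockstep.
predictors = {
    "low_income": [10.0, 12.0, 15.0, 18.0, 22.0, 25.0, 30.0, 33.0],
    "unemployment": [4.1, 4.9, 5.8, 7.0, 9.1, 10.2, 11.8, 13.1],
    "pct_renters": [40.0, 22.0, 55.0, 31.0, 60.0, 28.0, 45.0, 35.0],
}
vifs = vif(predictors)
flagged = sorted(name for name, value in vifs.items() if value >= 5)
print(flagged)
```

In practice, correlated variables are usually removed one at a time, with the VIFs recalculated after each removal, since dropping one of a pair lowers the VIF of the other.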

Another aspect that must be taken into account in modeling georeferenced data is spatial autocorrelation (see note 1). Data measured over a two-dimensional study area, such as geocoded criminal incidents, are often affected by the properties of their location. If adjacent observations are subject to the same location properties, the observations will not be independent of one another. This lack of independence must be accounted for in the data analysis to produce accurate and unbiased results.

When autocorrelation is present in the residuals of a regression model, modeling the relationships between neighbourhoods must take their relative geographic position into account. Use of a spatial autoregressive model is therefore required in these situations.
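One widely used measure of spatial autocorrelation is Moran's I, shown below as a minimal sketch. The text does not name the statistic used in the series, so its use here is an assumption, as are the residual values and the four-region layout. Only the statistic is computed; a formal test on regression residuals requires an appropriate reference distribution.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights (1 for neighbours, 0 otherwise).

    `neighbours` maps each region index to the indices of its neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    total_weight = sum(len(js) for js in neighbours.values())
    cross = sum(z[i] * z[j] for i, js in neighbours.items() for j in js)
    return (n / total_weight) * cross / sum(d * d for d in z)


# Four hypothetical regions in a row: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Similar residuals side by side give a positive value.
clustered = morans_i([2.0, 1.0, -1.0, -2.0], chain)
# Residuals that alternate in sign give a negative value.
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)
print(clustered, alternating)
```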

A definition of what constitutes neighbouring locations is also required. In this analytical series, a contiguity structure that includes all common borders or touching vertices between neighbourhood boundaries was used to define regions as neighbours of each other. The neighbourhood structure defines which locations have a potential influence on each other, the neighbours, and rules out any potential influence of regions that are not considered to be neighbours. The neighbourhood structure is used to test for spatial autocorrelation and to specify the spatial component in the autoregressive spatial model.
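A contiguity structure of this kind, often called queen contiguity, can be sketched as follows. The sketch assumes that adjacent boundaries share exactly the same vertex coordinates, so that two regions are neighbours whenever they have at least one vertex in common; real boundary files may need geometric tolerance. The square regions are hypothetical.

```python
from itertools import combinations


def queen_neighbours(polygons):
    """Map each region to the regions sharing at least one vertex with it."""
    neighbours = {name: set() for name in polygons}
    for a, b in combinations(polygons, 2):
        if set(polygons[a]) & set(polygons[b]):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def square(x, y):
    """Unit square with its lower-left corner at (x, y)."""
    return [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]


# A and B share a border, A and C touch at one corner, D is isolated.
polygons = {
    "A": square(0, 0),
    "B": square(1, 0),
    "C": square(1, 1),
    "D": square(3, 3),
}
neighbours = queen_neighbours(polygons)
print(neighbours)
```

Regions that touch only at a corner, such as A and C, count as neighbours under this definition, while a region with no shared vertex has no neighbours and is ruled out of any spatial influence.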

Note

  1. A more detailed examination of spatial autocorrelation is provided by Krista Collins in Charron, Mathieu. 2008. "Neighbourhood characteristics and the distribution of crime in Saskatoon." Crime and Justice Research Paper Series. Statistics Canada Catalogue no. 85-561-XIE, no. 12.