The statistical method chosen for this study was time series analysis. This method has been employed by a number of other researchers to investigate the link between specific crime types and economic and demographic change (LaFree, 1999; LaFree et al., 1992; Cohen & Land, 1987). Time series analysis is used in this report because the variables in the analysis are time-ordered (for example, trends in crime rates and unemployment rates). This method can take account of what happened in the preceding year or two as well as make current-year comparisons among variables. This provides the flexibility to address important questions, such as whether unemployment rates are correlated with crime rates in the current year or after a lag of a year or two.
A primary limitation of time series analysis is that it can include only those factors that have been measured and recorded annually over many time points. This necessarily eliminates the primary source providing statistical data on characteristics of the population, the Census, which is conducted every five years. There are some important exceptions, such as age and sex of the population, which are available annually through intercensal estimates. Table 5 lists the wide range of socio-demographic and economic indicators considered for this analysis, as well as the source and the time period for which they are available.
It is also important for time series analysis that all variables be available for an identical and lengthy time period. Unfortunately, potentially important measures such as lone-parent families and a number of economic indicators, including low income and the percentage of families receiving employment insurance and social assistance, are available only from 1980 onward. As a longer time series was needed to yield reliable estimates, analysis was restricted to those variables available for the years relatively consistent with UCR crime rates: 1962 to 2003. These variables are marked in Table 5 with an asterisk. This longer time series provides the maximum number of degrees of freedom and hence more robust time series models.
The primary source of information on crime trends in Canada is Statistics Canada's Uniform Crime Reporting (UCR) Survey. Since 1962, all police departments across the country have supplied the following summary statistics to the UCR Survey:
The predictor variables in this analysis were selected on the basis of their relevance to the criminological explanations for crime summarized earlier in this report, as well as their availability in a time series to 1962. These include the age structure of the population, unemployment, inflation and per capita alcohol consumption.
Age structure of the population
The age composition of the population is one of the most prominent explanations for changes in crime rates. To test the relationship between crime patterns and age, the percentages of the population 15-24 and 25-34 years of age are included in this analysis. Data for age are derived from the average of quarterly population estimates and the estimates of population in certain age groups used by the Labour Force Survey (the LFS uses the estimates obtained from the Census). These estimates are adjusted for any undercoverage and population growth.
Unemployment rates are derived from the Labour Force Survey. This survey covers approximately 98% of the population and excludes residents of the Yukon, Northwest Territories and Nunavut, persons living on Indian Reserves, full-time members of the Canadian Armed Forces and inmates of institutions. Unemployed persons are defined as those persons who were available for work and were either on temporary layoff, had looked for work in the past 4 weeks or had a job to start within the next 4 weeks. The unemployment rate excludes discouraged workers who are available to work but are no longer actively seeking employment.
Inflation is derived from the Consumer Price Index (CPI) and is simply the year-over-year difference in the CPI expressed as a percentage of the previous year. Inflation occurs when there is an upward movement in the average level of prices.
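The inflation calculation described above can be sketched as follows. This is a minimal illustration only: the function name and the CPI values are hypothetical and are not taken from the report's data.

```python
def inflation_rates(cpi):
    """Year-over-year change in an annual CPI series,
    expressed as a percentage of the previous year's value."""
    return [
        100.0 * (current - previous) / previous
        for previous, current in zip(cpi, cpi[1:])
    ]


# Hypothetical CPI values for four consecutive years (not real data).
cpi = [100.0, 102.0, 105.06, 105.06]
rates = inflation_rates(cpi)

# One rate per year after the first; a flat CPI gives zero inflation.
print([round(rate, 2) for rate in rates])
```

Because each rate needs a previous year, the resulting series is one observation shorter than the CPI series it is computed from.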
Per capita levels of alcohol consumption
Per capita alcohol consumption is based on the disappearance of alcohol in Canada (expressed in litres) divided by the total population. Alcohol disappearance is derived from the Control and Sale of Alcoholic Beverages in Canada (Public Institutions Division). In the absence of long-term data identifying drinking patterns among Canadian adults, alcohol disappearance is used as a proxy for alcohol consumption, with consumption defined as disappearance minus wastage. In the case of alcohol, the average annual wastage is quite low (3.5%), especially when compared to other food categories (e.g. the average annual wastage for fruits and vegetables is approximately 40%).
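As a rough illustration of the arithmetic behind this proxy, the sketch below divides total disappearance by population and then removes the 3.5% average wastage cited above. The function names and the input figures are hypothetical, chosen only to make the calculation easy to follow.

```python
def per_capita_disappearance(total_litres, population):
    """Litres of alcohol 'disappearing' per person in one year."""
    return total_litres / population


def estimated_consumption(disappearance, wastage_share=0.035):
    """Consumption = disappearance minus wastage.
    The default wastage share is the 3.5% average cited in the text."""
    return disappearance * (1.0 - wastage_share)


# Hypothetical inputs (not real data): 200 million litres, 25 million people.
per_capita = per_capita_disappearance(200_000_000, 25_000_000)
consumption = estimated_consumption(per_capita)
print(per_capita, round(consumption, 2))
```

With wastage this small, disappearance and consumption move almost identically over time, which is why the former can stand in for the latter in a trend analysis.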
In this analysis we are exploring the extent to which changes over time in the dependent variable, the crime rate, can be explained by changes in independent variables, a selection of socio-economic indicators. These socio-economic indicators may, however, move in a similar way to the crime rate over time, but have no causal relationship with the crime rate. As a result, modeling with the variables as they are, using either simple correlation analysis or multiple regression techniques, could lead to a false conclusion that a causal relationship exists, when, in reality, there is none.
In technical terms, this problem exists because the crime rate and the other socio-economic indicators to be included in the model are not stationary in their mean (average) or variance over time. This means that if, over the whole time series from 1962 to 2002, one took repeated samples of shorter time series for each variable, the variable's mean and variance would differ across the samples.
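The following sketch illustrates the problem with simulated data. It assumes two hypothetical series that each rise steadily over 40 years but are otherwise unrelated. It shows that the mean of a trending series differs between sub-periods, that the two series are highly correlated in levels, and that the correlation between their year-to-year changes is much weaker. All names and values are illustrative and none come from the report.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def diff(series):
    """Year-to-year changes in a series."""
    return [b - a for a, b in zip(series, series[1:])]


random.seed(1)  # fixed seed so the illustration is repeatable
n_years = 40
# Two simulated series: each is a trend plus its own independent noise.
series_a = [t + random.gauss(0, 1) for t in range(n_years)]
series_b = [2 * t + random.gauss(0, 1) for t in range(n_years)]

# Non-stationary mean: the two halves of the series have different averages.
first_half_mean = mean(series_a[:20])
second_half_mean = mean(series_a[20:])

# Correlation in levels is driven by the shared upward trend.
level_corr = pearson(series_a, series_b)
# Correlation in changes reflects only the (independent) noise.
change_corr = pearson(diff(series_a), diff(series_b))

print(first_half_mean, second_half_mean, level_corr, change_corr)
```

In this simulation only the mean shifts between sub-periods; a series whose variance also changes over time would be non-stationary in the second sense described above.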
What is a logarithm?
Taking the logarithm of a variable is a common technique used on variables with a large range and a high variability among values. The log re-scales the values of the variable to help improve the statistical properties of the variable and therefore the properties of the time series results.
What is a correlation?
A correlation measures the linear relationship between two variables measured over a series of paired observations (in this case, years). Values of a correlation range from -1 to +1. A value of +1 indicates a perfect positive relationship (e.g. the variables move in the same direction) whereas a value of -1 indicates a perfect negative relationship (e.g. the variables move in opposite directions). A value of 0 indicates no linear relationship.
What is time series analysis?
A key statistic in time series analysis is the autocorrelation coefficient, which is the correlation of the time series with itself, lagged by 1 or more periods. The autocorrelation coefficient indicates how values of the variable in question relate to each other at zero lag, lag 1, lag 2, etc. Autocorrelation within the data means that some of the variance in the current value is explained by the history of the variable. For example, unemployment in 2004 is partially explained by unemployment in 2003, all things being equal. For this analysis, lag 1 refers to the past year, lag 6 refers to 6 years in the past, etc.
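A minimal sketch of the sample autocorrelation coefficient at a given lag is shown below. It uses the conventional estimator, in which the lagged cross-products are divided by the total sum of squared deviations. The report does not state which estimator its software used, so this is an assumption, and the two example series are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself, `lag` periods back."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    numer = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return numer / denom


# A steadily rising series: each value resembles the previous one.
rising = [1, 2, 3, 4, 5, 6, 7, 8]
# An alternating series: each value is the opposite of the previous one.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

print(autocorrelation(rising, 1))       # positive
print(autocorrelation(alternating, 1))  # negative
```

At lag 0 the coefficient is always 1, because a series is perfectly correlated with itself.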
What is an ARIMA model?
ARIMA models are Autoregressive Integrated Moving Average models, a general model widely used in time series analysis. The technique is premised on investigation of the prior behaviour of a series and is also used to adjust for seasonality. ARIMA models are particularly beneficial if one is interested in forecasting future values to calculate new values of the series and confidence intervals for those predicted values. The estimation and forecasting process is performed on transformed (differenced) data and then the series needs to be integrated (integration is the inverse of differencing) so that the forecasts are expressed in values compatible with the input data. The integration feature gives the order of differencing needed to achieve stationarity.
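The relationship between differencing and integration can be sketched with first-order differencing. The helper names and the values below are hypothetical. The sketch shows only the transformation and its inverse, not the estimation of an ARIMA model.

```python
def difference(series):
    """First-order differencing: the change from one period to the next."""
    return [b - a for a, b in zip(series, series[1:])]


def integrate(first_value, differences):
    """Inverse of differencing: rebuild levels by accumulating changes."""
    levels = [first_value]
    for change in differences:
        levels.append(levels[-1] + change)
    return levels


levels = [10.0, 12.0, 15.0, 14.0]      # hypothetical input series
changes = difference(levels)            # the series a model would work on
restored = integrate(levels[0], changes)

# A forecast made on the differenced scale is converted back to a level
# by adding it to the last observed value.
forecast_change = 0.5                   # hypothetical model output
forecast_level = levels[-1] + forecast_change
print(changes, restored, forecast_level)
```

Differencing once removes a linear trend. If the differenced series were still non-stationary, the same step could be applied again, and the number of passes needed is the order of differencing.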
The first step in time-series modeling is to transform the variables to be included in the model in a way that reduces the risk of spurious or false correlations by creating a stationary mean and variance. Taking the logarithm of a variable is a common technique to transform variables to achieve this goal, particularly when the variables have a large range and a high variability among the values.
For these time-series models, the log of each variable to be included in the models was calculated and then the growth rate in the log values was calculated, resulting in a transformed data series for each variable. The only exception to this was for the rate of inflation, because the variable itself is a growth rate. For this variable, the log value was sufficient.
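The transformation can be sketched as follows, assuming that "growth rate in the log values" means the first difference of the logged series. That is one common reading, but the report does not give the exact formula. The function name and the example values are hypothetical.

```python
import math


def log_growth(series):
    """Log-transform a positive series, then take year-to-year
    differences of the logs (an approximate proportional growth rate)."""
    logs = [math.log(value) for value in series]
    return [b - a for a, b in zip(logs, logs[1:])]


# Hypothetical rate that grows by 10% each year (not real data).
rate = [100.0, 110.0, 121.0]
growth = log_growth(rate)
print([round(g, 4) for g in growth])
```

Under this reading, a series growing at a constant percentage becomes a constant after transformation, which is the kind of stable mean the models require.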
Using the transformed variables, bivariate or one-on-one models were constructed to determine which independent variables had a statistically significant relationship with the crime rate. Multivariate models were then constructed by testing different combinations of independent variables that had been significant in the bivariate models.
In any modeling exercise it is usually not possible to include within the model all of the variables that would be important to explaining why changes in the dependent variable, in this case the crime rate, have occurred. Error in the models that results from missing important variables is referred to as the "residual". While it is rare for models to eliminate error, to accurately interpret how significant the variables included in the model are to explaining changes in the crime rate over time, that is, to avoid false or spurious results, it is important that this error or "residual" be random or white noise (see note 1).
In time series models, autocorrelation coefficients are key statistics that measure whether the dependent variable, the crime rate, is correlated with itself: with last year's crime rate (lag 1), the crime rate six years ago (lag 6), twelve years ago (lag 12), and so on. Autocorrelation within the data means that some of the variance in the dependent variable, the crime rate, is explained by the history of the crime rate itself. The presence of autocorrelation in the models results in residuals that are not random but that have a pattern to them. Lag variables, that is, the crime rate 6, 12, 18 and 24 years ago, are included in the time series models in order to test for the presence of autocorrelation. Models where these lag variables are statistically insignificant pass the "white noise test", that is, the residuals in the models are random or white noise.
Further, as the fit of the models was to be tested by examining the models' ability to predict observed crime rates in 2002 and 2003, the residuals themselves could contain information that tells us something about the movement in other important variables that are missing from the models. This information can help us to develop better models to predict future crime. Moving average (MA) terms were added to the models to capture any information in the residuals over time that could improve the models' predictive ability. As a result, the time series models developed for this analysis are ARIMA models (Autoregressive Integrated Moving Average models).
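The idea of residuals that "have a pattern to them" can be illustrated with a simple diagnostic. Note that this is not the procedure described above, which tests the significance of lagged crime-rate terms inside the model. Instead, the sketch computes the autocorrelation of a residual series at lags 6, 12, 18 and 24 and flags any that fall outside the approximate 95% band of plus or minus 1.96 divided by the square root of the number of observations, a common rule of thumb for white noise. The residual series and function names are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself, `lag` periods back."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    numer = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return numer / denom


def lags_outside_band(residuals, lags):
    """Return the lags whose autocorrelation exceeds the approximate
    95% white-noise band of +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(residuals))
    return [k for k in lags if abs(autocorrelation(residuals, k)) > bound]


# Hypothetical residuals with a built-in cycle: six years above zero,
# six years below, repeated over 42 years.
patterned = [1.0 if (t // 6) % 2 == 0 else -1.0 for t in range(42)]
flagged = lags_outside_band(patterned, [6, 12, 18, 24])
print(flagged)
```

Residuals that behave like white noise would typically produce an empty list here, whereas the cyclical residuals in this example are flagged at every lag checked.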
The results of all models were then compared with three criteria used to determine the models that "best fit" each crime type studied.
Table 6 presents the "best fit" models for each crime type.
1. In the case of time series, errors will themselves constitute a time series. One usually aims for the errors to be devoid of any structure, although they may be correlated. However, if one can extract the correlation in the errors, then one ought to be left with a residual series with no correlation (or structure). Such a series is referred to as white noise.