Statistics Canada - Government of Canada







Conceptually, the unit of analysis in the research is a police decision concerning the disposition of one apprehended young person in one incident[4]. Operationally, this police decision constitutes one record in the UCR2 Accused file. The UCR2 Accused file records data for persons who are classified as “chargeable” in an incident; i.e. “any person who has been identified by police as being involved in a criminal incident and against whom an information could be laid as a result of sufficient evidence/ information” (Canadian Centre for Justice Statistics, 2002: 74)[5]. Thus, the population of the study is a population of police decisions, each operationalized by variables in one record in the UCR2 Accused file. For consistency with the interview data, which were collected in 2002, UCR2 data for incidents occurring in 2001 were used. If the same person was involved in more than one incident in 2001, his or her last incident in the year was selected, so that each person contributed only one case to the analysis.
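The selection rule described above (one case per person, taken from the last incident in 2001) can be sketched as follows. This is a minimal illustration only: the record layout, the person keys and the sample values are hypothetical and are not drawn from the UCR2 Accused file.

```python
from datetime import date

# Hypothetical accused records: (person_key, incident_date, charged).
# The layout and the values are assumptions made for illustration.
records = [
    ("A", date(2001, 3, 5), False),
    ("A", date(2001, 11, 20), True),
    ("B", date(2001, 7, 1), False),
]


def last_incident_per_person(records):
    """Keep, for each person key, only the record with the latest incident date."""
    latest = {}
    for rec in records:
        key, incident_date, _charged = rec
        if key not in latest or incident_date > latest[key][1]:
            latest[key] = rec
    return list(latest.values())


selected = last_incident_per_person(records)
```

With the sample data, person "A" contributes only the November incident, so the selected population contains one decision per person.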

Not all police services in Canada could be included in the study. Since the prior contacts variable was constructed by examining UCR2 data for 1995-2001 (see below), only those police services which consistently reported to the UCR2 between 1995 and 2001 could be included. This subset of police services is generally selected when UCR2 data are used to study trends over time; thus, it is known as the UCR2 Trend Database. The largest municipal police service in the UCR2 – the Toronto Police Service (TPS) – was also omitted from the study, because TPS reported very few Accused records to the UCR2 during 1995-2001 for apprehended youth who were not charged: in other words, the data from TPS showed that nearly 100% of youth who were chargeable were charged[6]. Thus, UCR2 data on the TPS could not be used for the analysis of differences between incidents resulting in charges and those resulting in informal action.

The resulting population of 38,727 decisions came from 186 police services (independent municipal services or detachments of provincial police) in six provinces: New Brunswick, Quebec, Ontario, Saskatchewan, Alberta, and British Columbia.

Description of variables

The dependent variable was whether the apprehended youth was charged or processed otherwise (i.e. by informal action or pre-charge diversion). This variable is in the Accused (“Charged suspect – Chargeable”) file of the UCR2 Survey[7].

The main independent variables were determined by a review of the literature and by the availability of reliable data in the UCR2 Survey. They include: the number of prior contacts with police; the seriousness of the current alleged offence, indicated by the Criminal Code classification of the most serious alleged offence, the degree of harm done to a victim, and the presence of a weapon; the age, sex[8], and aboriginal status of the youth[9]; whether the alleged offence was committed alone or with accomplices; any relationship between the accused youth and a victim; whether the youth and a victim were living together; and whether there was evidence that the youth had recently consumed alcohol or drugs.

Several possible influential factors were not included because they are not available within the UCR2. The major omitted variables which have been found by previous research to play a role in police decision-making are: the youth's "demeanour", victim preference as to the disposition, parental involvement in interactions between police and the youth, the living situation of the youth, the youth's school and/or employment situation, and whether the youth is affiliated with a gang. The youth's "demeanour", or attitude and behaviour in his or her interactions with police, may be particularly influential in the decision whether to process the youth informally, because, under the YOA, a youth is not eligible for Alternative Measures if s/he does not accept responsibility for the alleged offence, or if s/he does not "fully and freely consent" to participate. In the larger study of which this research was a component, the impact of these factors was assessed by interviewing police officers (Carrington and Schulenberg, 2003).

When possibly influential factors are omitted from a statistical analysis, there is a risk of drawing spurious causal conclusions. That is, a factor which is included in the analysis may appear to have more impact than it does, because the impact of a related, omitted factor has not been controlled. Any correlational statistical analysis suffers from the limitation that it is never possible to collect data on, and control for, all possible influential factors - or even to know what they may be. Therefore, the conclusions from such analyses are always subject to modification on the basis of future research.

Construction of the prior contacts variable

Special programming work was required in order to create the prior contacts variable, since it is not routinely captured by the UCR2. The procedure involved examining all UCR2 records for 1995-2001 for the selected subset of police services, and matching records of previous incidents pertaining to youths who were apprehended in 2001. Each record of a previous incident (including earlier contacts in 2001) constituted one prior contact. Prior contacts which occurred before 1995 could not be captured, since relatively few police services reported to the UCR2 before 1995. However, this was not judged to be a major omission, since the impact of prior contacts is generally believed to be related to their recency. This would be particularly true of young persons, who are the subject of the present research. Their ages at the time of the incidents in 2001 ranged from 12 to 17 years, so their histories of prior contacts which were captured by searching back to 1995 would go as far back as the ages of 6 to 11 years[10].
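The counting rule described above can be sketched as follows, assuming that the records belonging to one person have already been matched. The function name, the argument names and the treatment of incidents recorded on the same date as the current incident are assumptions made for illustration; the source does not describe the actual programming.

```python
from datetime import date


def count_prior_contacts(incident_dates, window_start=date(1995, 1, 1)):
    """Return (current_incident_date, prior_contact_count) for one matched person.

    The current incident is the person's last incident in 2001. Every earlier
    record on or after window_start counts as one prior contact. Records dated
    the same day as the current incident are not counted (an assumption).
    Returns None if the person has no incident in 2001.
    """
    in_2001 = [d for d in incident_dates if d.year == 2001]
    if not in_2001:
        return None
    current = max(in_2001)
    priors = [d for d in incident_dates if window_start <= d < current]
    return current, len(priors)


# Hypothetical history: one record before 1995 (not captured), two in
# 1996-2000, and two in 2001 (the later one is the current incident).
history = [
    date(1994, 6, 1),
    date(1996, 5, 1),
    date(2000, 9, 12),
    date(2001, 2, 3),
    date(2001, 10, 30),
]
result = count_prior_contacts(history)
```

In this hypothetical history the earlier 2001 incident counts as a prior contact, while the 1994 record falls outside the 1995-2001 window.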

Matching of records for the same person was not straightforward, since there is no unique person identifier in the UCR2. Matching must be done using the person’s name, date of birth, and sex. This raises the issue of false positives, since different people can share the same name, date of birth and sex. Furthermore, the accused person’s name is not recorded as such in the UCR2 – it is encoded in a 4-character SOUNDEX code, which is not unique; i.e. many names are encoded with the same SOUNDEX. Thus, matching on the SOUNDEX code, date of birth and sex could result in many false positive matches; i.e. many records for different people would be erroneously treated as prior contacts of a single person. The result would be an underestimate of the number of unique persons and an overestimate of the numbers of their prior contacts.
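The way in which different names collapse into one SOUNDEX code, and hence into one matching key, can be illustrated with the following sketch. It implements the standard American Soundex rules, which is an assumption: the source does not state which variant of the encoding the UCR2 uses. The function match_key and the sample names and birth date are hypothetical.

```python
from datetime import date

# Letter-to-digit table of standard American Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character standard American Soundex code of a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return "".join(result)[:4].ljust(4, "0")


def match_key(name, birth_date, sex):
    """Hypothetical matching key: SOUNDEX code, date of birth and sex."""
    return (soundex(name), birth_date, sex)


# Two different surnames that yield the same key (a false positive match).
key_a = match_key("Smith", date(1986, 4, 2), "M")
key_b = match_key("Smyth", date(1986, 4, 2), "M")
```

Both hypothetical persons receive the key ("S530", 1986-04-02, "M"), so their records would be treated as contacts of a single person.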

This is not necessarily as great a problem in the present research as it might be in other types of research. The present study is not concerned with distributions of prior contacts in themselves, but with their correlation with the probability of being charged, and other variables. In general, errors in measurement of variables (such as overestimates of prior contacts) result in attenuation of correlations, so the result of such error would be a small underestimate of the impact of prior contacts on police dispositions, and a small overestimate of the impact of other related variables, such as the youth’s age.
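The attenuation argument above can be illustrated with the classical correction-for-attenuation relation. This is a textbook result that assumes random measurement error, uncorrelated with the true values; it is offered only as an illustration, since errors arising from false positive matches are one-directional (overestimates) and the source does not present this formula. The symbols are assumptions introduced here: x is the number of prior contacts, y is the police disposition, and the two reliabilities are the proportions of observed variance that are not due to measurement error.

```latex
% Classical attenuation of a correlation by random measurement error.
% r_{xy}^{obs}  : correlation between the observed (error-prone) measures
% r_{xy}^{true} : correlation between the error-free measures
% \rho_{xx}, \rho_{yy} : reliabilities of x and y, each between 0 and 1
\[
  r_{xy}^{\mathrm{obs}} \;=\; r_{xy}^{\mathrm{true}} \,\sqrt{\rho_{xx}\,\rho_{yy}}
\]
% Hypothetical example: if y is measured without error (\rho_{yy} = 1) and
% \rho_{xx} = 0.95, then r^{obs} = r^{true}\sqrt{0.95} \approx 0.975\, r^{true}.
```

Because the reliabilities cannot exceed 1, the observed correlation is never larger in magnitude than the true one under these assumptions, which is the sense in which the impact of prior contacts would be slightly underestimated.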

Methodologists at Statistics Canada conducted an analysis of the probability of false positive matches by determining the rate of occurrence of each SOUNDEX code in the populations of the provinces of Canada, using electronic telephone directories. This enabled them to establish, for each SOUNDEX code, the expected rate of false positives, when it was used for matching in combination with birth date and sex. SOUNDEX codes vary greatly in their vulnerability to false positive matches, since the names which are encoded by some SOUNDEX codes are very common, and others are not.

The probability of false positives is directly related to the number of records which one is matching, which is approximately proportional to the population of the geographical area, and the number of years, within which matching is being done. There would be many false positives if records for many years for all of Canada were being matched, and few or none if records were matched for only a few years within one town. Thus, in a study such as the present one, where the number of years of matching is fixed (1995 to 2001), the “match quality” or “match efficiency” (i.e. non-vulnerability to false positives) of SOUNDEX codes is related both to the commonness of the names which they encode, and to the population of the area within which matching is being done. Methodologists provided assessments of match quality within:

  • entire provinces (actually, the parts of the province policed by respondents to the UCR2 Trend Database);
  • the groups of police services working in a Census Metropolitan Area (CMA);
  • the jurisdictions of individual police services outside CMAs (since there was no obvious principle with which to group non-CMA police services); and
  • all police services (in the Trend Database) in a province but outside CMAs.
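The dependence of false positives on the size of the matching pool, described before the list above, can be illustrated with a simple probability sketch. The model is an assumption made for illustration and is not the procedure used by the methodologists: it supposes that a given person shares a SOUNDEX code and sex with a known number of other persons in the pool, and that birth dates are spread uniformly over a fixed number of possible dates. The figure of 2,190 possible birth dates (six years of 365 days) is hypothetical.

```python
def false_positive_probability(others_with_same_code, possible_birth_dates):
    """Probability that at least one other person shares the birth date.

    Assumes independent, uniformly distributed birth dates among the
    others_with_same_code persons who share the SOUNDEX code and sex.
    """
    if others_with_same_code < 0 or possible_birth_dates <= 0:
        raise ValueError("counts must be non-negative and dates positive")
    return 1.0 - (1.0 - 1.0 / possible_birth_dates) ** others_with_same_code


# Hypothetical pools: a small town versus a large multi-year, province-wide file.
small_pool = false_positive_probability(10, 2190)
large_pool = false_positive_probability(1000, 2190)
```

Under these assumptions the probability stays below 1% for the small pool and exceeds 30% for the large one, which mirrors the contrast drawn above between matching within one town and matching across a large area over many years.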

On the basis of this quality analysis, four categories of SOUNDEX codes were defined:

  • 0 – SOUNDEX is rare enough that it can be used in province-wide matching, except in Ontario and Quebec (99% or better match efficiency rate).
  • 1 – SOUNDEX is rare enough that it can be used in analysis within a given CMA or individual police service (95% – 99% match efficiency rate).
  • 2 – SOUNDEX is common enough that it should be used with caution in analysis within a given CMA or individual police service (90% – 95% match efficiency rate).
  • 3 – SOUNDEX is too common to be used for analysis – this will result in too many false matches (less than 90% match efficiency rate).

“Match efficiency” refers to the absence of false positives; e.g. 99% match efficiency means that 1% of matches are expected to be false positives, and “99% or better” means that 1% or fewer false positives are expected.
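The four categories above can be expressed as a small classification function. This is a sketch under stated assumptions: the function names are hypothetical, match efficiency is given as a proportion between 0 and 1, and the handling of values that fall exactly on a boundary (0.99, 0.95, 0.90) is a choice made here, since the source gives overlapping range endpoints.

```python
def soundex_quality_code(match_efficiency):
    """Map a match efficiency (proportion from 0 to 1) to a quality code 0-3.

    Boundary values are assigned to the better category (an assumption).
    """
    if not 0.0 <= match_efficiency <= 1.0:
        raise ValueError("match efficiency must be between 0 and 1")
    if match_efficiency >= 0.99:
        return 0
    if match_efficiency >= 0.95:
        return 1
    if match_efficiency >= 0.90:
        return 2
    return 3


def expected_false_positive_rate(match_efficiency):
    """Expected share of matches that are false positives."""
    return 1.0 - match_efficiency


codes = [soundex_quality_code(e) for e in (0.995, 0.97, 0.92, 0.80)]
```

For the four hypothetical efficiencies shown, the resulting codes are 0, 1, 2 and 3 respectively.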

Using 95% match efficiency as a criterion of acceptability, all records with SOUNDEX codes with a quality code of 2 or 3 were omitted (with the exception of Montreal, discussed below). As most jurisdictions have small enough populations that there are very few or no SOUNDEX codes with quality codes of 2 or 3, the impact of this exclusion was minimal. The only jurisdictions included in the study with more than 1% of records with SOUNDEX codes of 2 or 3 are Montreal (28.4%), Quebec City (2.2%), Calgary (1.3%) and Edmonton (3.5%). In the case of Montreal, records with a SOUNDEX quality code of 2 were not omitted, because of the large number of such records. In order to assess the impact of including SOUNDEX codes with quality code 2, the mean number of contacts with police was calculated for the selected population of young persons in Montreal, grouped according to their SOUNDEX quality code. The underlying hypothesis is that false positive matches will result in inflated numbers of contacts in a “person’s” career. Mean numbers of police contacts for persons with SOUNDEX quality codes of 0, 1, and 2 were 2.24, 2.20, and 2.20 respectively. Persons with a SOUNDEX quality code of 2 had slightly fewer contacts than those with SOUNDEX quality code of 0, contrary to the hypothesis. Therefore, it was concluded that it was appropriate to include them in the analysis.

The population of areas of New Brunswick reporting to the UCR2 is small enough that matching could be done with all police services treated as one unit, for SOUNDEX quality codes of 0 and 1. For Saskatchewan, Alberta, and British Columbia, matching was done with all police services in one province treated as a unit for SOUNDEX codes with a quality code of 0. For SOUNDEX codes with a quality code of 1, matching was done within CMAs. For Ontario and Quebec, matching was done within CMA or individual non-CMA police service for SOUNDEX codes with quality codes of 0 and 1. This resulted in a population of 38,727 young persons chargeable in 2001, and the same number of police dispositions involving them. These young persons had an average of 2.9 contacts, including the current one; or 1.9 prior contacts. The results of three other plausible but less conservative sets of matching criteria were also examined, which produced very similar results, ranging from 38,369 to 38,411 unique youths, and an average number of contacts (in all three cases) of 3.0. Thus, for this study, the results of matching were robust even when less stringent matching criteria were used.
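The matching rules above can be summarized as a lookup from province and SOUNDEX quality code to the unit within which records are matched. The sketch below rests on several assumptions that are not stated in the source: the two-letter province abbreviations and the function name are introduced here, records in Saskatchewan, Alberta and British Columbia with quality code 1 that lie outside a CMA are assumed to be matched within the individual police service, and the Montreal exception for quality code 2 is not modelled.

```python
PROVINCE_WIDE_FOR_CODE_0 = {"SK", "AB", "BC"}
LOCAL_MATCHING_ONLY = {"ON", "QC"}


def matching_unit(province, quality_code, cma=None, police_service=None):
    """Return (level, name) for the matching pool, or None if the record is excluded.

    Quality codes 2 and 3 are treated as excluded; the Montreal exception
    for code 2 is deliberately left out of this sketch.
    """
    if quality_code not in (0, 1):
        return None
    # Local pool: the CMA if the service is in one, else the service itself.
    local = ("cma", cma) if cma else ("police_service", police_service)
    if province == "NB":
        return ("province", "NB")
    if province in PROVINCE_WIDE_FOR_CODE_0:
        return ("province", province) if quality_code == 0 else local
    if province in LOCAL_MATCHING_ONLY:
        return local
    raise ValueError("province not in the study: " + province)


# Hypothetical lookups.
unit_ab_rare = matching_unit("AB", 0, cma="Calgary")
unit_ab_common = matching_unit("AB", 1, cma="Calgary")
```

A rare code in Alberta is matched province-wide, whereas a quality code of 1 in the same place is matched only within the Calgary CMA.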

Although the number of prior contacts of youths in the population ranged from 0 to 261, the great majority (96%) had 10 or fewer, and most (90%) had 5 or fewer. In assessing the relationship between the number of prior contacts and the police disposition, no significant information was lost by recoding the number of prior contacts as 0, 1, 2, 3-4, and 5 or more.
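The recoding described above can be written as a short function. The function name and the category labels are assumptions made for illustration; only the category boundaries come from the text.

```python
def recode_prior_contacts(n):
    """Recode a count of prior contacts as 0, 1, 2, 3-4, or 5 or more."""
    if n < 0:
        raise ValueError("number of prior contacts cannot be negative")
    if n <= 2:
        return str(n)
    if n <= 4:
        return "3-4"
    return "5 or more"


recoded = [recode_prior_contacts(n) for n in (0, 1, 2, 3, 4, 5, 261)]
```

The highest observed value, 261, falls in the same open-ended category as 5.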


The police disposition (charged vs. processed otherwise) was cross-tabulated separately with each of the following independent variables in order to assess the strength of the association of each variable with the indicator of police discretion:

  • the type of offence, indicated by the Criminal Code classification;
  • the level of injury suffered by a victim;
  • the presence of a weapon;
  • the number of prior contacts of the youth with police;
  • the age of the youth;
  • the sex of the youth;
  • whether the youth was an aboriginal;
  • whether the youth was apprehended alone or with other persons;
  • the type of relationship, if any, between the youth and a victim;
  • whether the youth and a victim were living together; and
  • whether there was evidence that the youth had recently consumed alcohol or drugs.

The last two variables were omitted from further analysis, since they were not significantly related to the police disposition.

In order to assess the impact of the independent variables while controlling for related factors, all independent variables were entered simultaneously into a multiple regression analysis with the police disposition (charged vs. processed otherwise) as the dependent variable[11]. Incidents involving certain offences were omitted from this analysis, because there were too few youths in the “not charged” group for reliable statistical analysis (see Table 1). Also, a few youths in each category were excluded because, according to the “clearance status” variable in the UCR2 Survey, the reason why they were not charged was not police discretion but some other factor beyond the control of police, such as the disappearance or death of the apprehended youth.

Two statistics were calculated in the multiple regression:

  • The adjusted percentage of youth who were charged, for each category of the independent variable: this is the percentage of youth who “would have been charged if everything about all the alleged offences and offenders were identical, except for variations in this variable”. This statistic indicates the impact of the independent variable in individual incidents, while controlling for all other variables.
  • Partial eta squared: this is an estimate of the amount of variation in all the police dispositions which is accounted for by the independent variable when all other variables are controlled, i.e. its overall impact on the population of police decisions.
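The two statistics above can be written out as follows. The definition of partial eta squared is the standard one. The expression for the adjusted percentage is one common construction under dummy coding, with the other predictors held at their population means; the source does not state the exact procedure used, so it should be read as an assumption. All symbols are introduced here for illustration.

```latex
% Partial eta squared for independent variable A:
% SS_A     : sum of squares attributable to A, net of all other variables
% SS_error : residual (unexplained) sum of squares
\[
  \eta^{2}_{p}(A) \;=\; \frac{SS_{A}}{SS_{A} + SS_{\mathrm{error}}}
\]

% Adjusted percentage charged for category k of variable A, assuming an
% OLS model with a 0/1 dependent variable (1 = charged), dummy-coded
% predictors, and every predictor j outside A set to its mean \bar{x}_{j}:
\[
  \widehat{P}_{\mathrm{adj}}(A = k) \;=\; 100 \times
  \Bigl( \hat{\beta}_{0} + \hat{\beta}_{A,k}
         + \sum_{j \notin A} \hat{\beta}_{j}\,\bar{x}_{j} \Bigr)
\]
% \hat{\beta}_{A,k} = 0 for the reference category of A.
```

Differences between the adjusted percentages of two categories of the same variable then equal 100 times the difference between their coefficients, since the remaining terms are common to both.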


  4. Under the Young Offenders Act, a young person was defined as a person who had reached his or her 12th birthday but had not yet reached the 18th birthday, on the date of the alleged offence.
  5. In this report, the terms “apprehended” and “chargeable” are used interchangeably.
  6. This under-reporting of youth not charged was apparently due to technical problems in the recording and reporting process, which were addressed in 1999. Data for subsequent years appear to report more accurately numbers of youth not charged.
  7. In New Brunswick, Quebec, and British Columbia, it is the Crown which makes the decision concerning charging, following submission of a recommendation by police. For New Brunswick and Quebec (Sûreté du Québec only), persons are coded as “charged” in the UCR Survey if the Crown approves the recommendation to charge. In the rest of Quebec and British Columbia, persons are coded in the UCR Survey as charged if police have recommended charging, regardless of the Crown decision (Canadian Centre for Justice Statistics, 2002: 73).
  8. Criminological theories of the response of police to “male” and “female” suspects (e.g. the “chivalry” hypothesis) refer implicitly or explicitly to police stereotypes of (socially defined) gender roles, not to (biological) sex. However, police records and the UCR Survey record the biological sex, not the gender role, of the apprehended person. Therefore, the present research is restricted to analysis of the impact of the sex of the accused.
  9. This field is not reported to the UCR2 by many police services; therefore a large proportion of apprehended youth are coded as “unknown” for this variable. See footnote 2.
  10. However, the reporting of alleged offences by children aged 11 or younger (who legally cannot be charged with criminal offences) is not consistent across respondents to the UCR2 nor is it consistent over the time period covered.
  11. The conventional approach to multivariate analysis with a dichotomous dependent variable and a set of discrete independent variables is the discrete logit or probit model. In this case, the ordinary least squares regression model was preferred because it can estimate adjusted means (see below), the differences among which provide a simple and intuitive estimate of the impact of each independent variable. These differences can also be compared with the differences among the unadjusted (simple) means, to assess the impact of introducing control variables. Although the parameter estimates produced by OLS regression with a dichotomous dependent variable are unbiased (Long, 1997: 38-39), they are not the most efficient, i.e. they have inflated standard errors, and therefore the associated confidence intervals and significance tests are inaccurate. This was not an issue in the present research, for three reasons: (i) the data are not a random sample, but a subset of a population, so the issue of generalizing to a population does not arise, and “significance tests” are consequently not reported; (ii) the number of observations (30,812) is so large that all differences of any magnitude would be “significant” if significance levels were calculated, and (iii) despite the theoretical inferiority in this situation of OLS regression to logit or probit models, simulation research has found that all three types of models produce equivalent results when the split on the dependent variable is not extreme (i.e. skewed) (Judge et al. 1985: 768; Hanushek & Jackson 1977: 209-210) – as in the present case, where 52% of the cases resulted in charges, and 48% did not.

Date modified: 2004-09-14
Crime and Justice Research Paper Series: Prior police contacts and police discretion with apprehended youth