Methodology1


Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please "contact us" to request a format other than those available.

Objective and content of the Demosim microsimulation projection model
General functioning of the model
Probabilities associated with simulated events

Objective and content of the Demosim microsimulation projection model

The population projections contained in this report were produced with Demosim, a microsimulation model developed at Statistics Canada with the specific objective of making projections on the ethnocultural diversity of the entire population of Canada according to a detailed geographic structure that includes Canada's thirty-three census metropolitan areas (CMAs) and the rest of the provinces and territories. This objective largely shaped the choices that were made regarding the database that serves as the starting population, the variables contained in the model and the methods, models and data sources that underlie the projections.

The starting point for the projections is the microdata file for the 20% sample of the 2006 Census of Population of Canada.2 This database, which contains close to seven million persons and their characteristics, was adjusted for net undercoverage in the census by age, sex and place of residence. These adjustments were made by recomputing the sampling weight associated with each individual in the database. In addition, some variables needed for the projection but absent or incomplete in the census were imputed into the database: individuals' graduation dates, the generation status of the population under 15 years of age and, for a small portion of 2006 Census respondents, the province or territory of birth.
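The undercoverage adjustment can be pictured as a simple ratio (post-stratification) adjustment of the sampling weights. The sketch below is not Statistics Canada's procedure, which is not detailed here: it assumes that coverage-adjusted target totals are available for each age-sex-place-of-residence cell, and the field names (`age_group`, `sex`, `region`, `weight`) and the function `adjust_weights` are hypothetical.

```python
from collections import defaultdict


def adjust_weights(records, target_totals):
    """Rescale sampling weights so weighted counts match target totals.

    records: list of dicts with 'age_group', 'sex', 'region' and 'weight'.
    target_totals: (age_group, sex, region) -> coverage-adjusted population.
    """
    # Sum the current census weights within each age-sex-region cell.
    current = defaultdict(float)
    for r in records:
        current[(r["age_group"], r["sex"], r["region"])] += r["weight"]

    adjusted = []
    for r in records:
        cell = (r["age_group"], r["sex"], r["region"])
        # A factor above 1 inflates weights in cells the census undercounted.
        factor = target_totals[cell] / current[cell]
        adjusted.append({**r, "weight": r["weight"] * factor})
    return adjusted


records = [
    {"age_group": "20-24", "sex": "F", "region": "Toronto", "weight": 5.0},
    {"age_group": "20-24", "sex": "F", "region": "Toronto", "weight": 5.0},
    {"age_group": "20-24", "sex": "M", "region": "Toronto", "weight": 4.0},
]
targets = {("20-24", "F", "Toronto"): 11.0, ("20-24", "M", "Toronto"): 4.4}
out = adjust_weights(records, targets)
print([round(r["weight"], 2) for r in out])
```

Every record in a cell receives the same factor, so relative weights within a cell are preserved while the cell total matches its target.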

The variables contained in the initial file can be divided into two major groups. The first consists of variables that were projected with a view to eventual release:

  • Age
  • Sex
  • Place of residence
  • Religion (see Box 1)
  • Visible minority group
  • Immigrant status
  • Generation status
  • Continent/region of birth
  • Mother tongue
  • Highest level of schooling
  • Labour market participation3

The second group consists of so-called support variables, that is, variables that are included in the model only because they serve to increase the quality of the projection for the variables in the first group. Most of the time, these are variables used to predict events simulated by the model. They are the following:

  • Marital status4
  • Province or territory of birth of non-immigrants
  • Year of immigration
  • Age at immigration
  • Aboriginal identity
  • Registered Indian status
  • Number or presence of children in the home
  • Age of youngest child in the home
  • Sex of youngest child in the home
  • Dates on which diplomas were obtained

Box 1. Projections of religious denomination

The question on religion was not asked in the 2006 Census. Therefore, unlike all the other variables, religion was projected from 2001 Census data and then aligned to the results of the main series, which starts from 2006. The alignment was done by age, place of residence, visible minority group and generation status. The model used to project religion from 2001 is similar to the main model, although it was adapted to take account of the composition of immigration by religion and of differences between religious groups in their propensity to enter into unions, to form common-law unions, to have children and, to some extent, to migrate. It also includes a module that simulates changes of religion over a person's lifetime.

Sections 1 and 2 of this document describe the methods, assumptions and scenarios of the main projection series, which uses 2006 as its starting point and does not include religion. These sections sometimes note that religion is taken into account in modelling a given event; readers should keep in mind that this applies only to the models used to project religion from 2001 data, which were specially adapted for that purpose.

General functioning of the model

Like any population projection model, Demosim evolves the initial population over time by adding births and immigrants and subtracting deaths and emigrants. Also, as in "traditional" models, the future numbers of births, deaths, immigrants and emigrants are based on assumptions that can be changed or combined into various scenarios.
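In conventional demographic notation (the symbols are ours, not the report's), this accounting is the balancing equation for Canada as a whole, where P_t is the population at time t and B, D, I and E are the births, deaths, immigrants and emigrants between t and t+1:

```latex
P_{t+1} = P_t + B_{t,t+1} - D_{t,t+1} + I_{t,t+1} - E_{t,t+1}
```

For one of the model's regions, internal in-migrants would also be added and internal out-migrants subtracted; these two terms cancel at the national level.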

However, because it works with microdata, Demosim functions very differently from models based on aggregate data.5 As in "traditional" projection models, the aim is to estimate the population of Canada at a future reference date; but Demosim does so by simulating, one at a time, the future of each individual in the original file. In the course of the projection, these individuals can therefore "experience" a number of events, the main ones being: birthday, birth of a child, death, migration from one part of Canada to another, emigration, change of education level, change of marital status, change in labour market participation and change of religion (see Box 1). Using a Monte Carlo procedure and the probabilities associated with each event, the model calculates for each person, based on his or her particular characteristics, the probability of experiencing each of these events and the time that will elapse before it occurs (the waiting time). The event with the shortest waiting time is the one that occurs first. After each event, the probabilities and waiting times are recalculated to reflect the individual's new situation. The model thus advances each individual to the end of the projection period, unless the person dies or emigrates in the meantime. New individuals are also added over time through birth or immigration; from then on they are subject, like the rest of the population, to the probabilities of experiencing the events simulated by Demosim.
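The waiting-time logic can be sketched as a competing-risks draw. This is a minimal illustration, not Demosim's Modgen code: it assumes that each event has a constant hazard rate over the interval considered, so that each waiting time is exponential and can be drawn by inverse transform as `-ln(U) / h`. The function `next_event`, the event names and the rates in the example are hypothetical.

```python
import math
import random


def next_event(hazards, rng):
    """Return (event, waiting_time) for the event that would occur first.

    hazards: dict mapping an event name to a constant annual hazard rate (> 0).
    Each waiting time is exponential, drawn as -ln(U) / h with U in (0, 1].
    """
    waits = {}
    for event, h in hazards.items():
        u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
        waits[event] = -math.log(u) / h
    # The event with the shortest waiting time is the one that happens.
    first = min(waits, key=waits.get)
    return first, waits[first]


rng = random.Random(2006)
hazards = {"birth": 0.08, "death": 0.01, "emigration": 0.005}
event, wait = next_event(hazards, rng)
print(event, round(wait, 2))
```

In a full simulation, the person's clock would advance by the waiting time, the chosen event would update his or her characteristics, and the hazards would be recalculated before the next draw. With constant hazards, each event occurs first with probability proportional to its rate (here 0.08/0.095 for a birth).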

Demosim is programmed in Modgen, a programming language designed by Statistics Canada's Modelling Division to facilitate the development of microsimulation models. Modgen has been used to develop various microsimulation models, including LifePaths and Pohem.6

Probabilities associated with simulated events

Demosim cannot function unless the probabilities associated with each event it simulates have been established in advance. The methods used and the variables selected for estimating the model's parameters were determined by data availability and by the objectives of the model's different modules. The rest of this section briefly describes how the main modules7 of Demosim work; they are summarized in Table 1.

Table 1 Key methods, data sources and variables used for parameter estimates in Demosim

The fertility module was designed in part to take account of the differences in fertility, documented in the literature, between visible minority groups, religious groups, immigrant groups and other categories of the population.8 Based on 2006 Census data to which the own-children method9 was applied, this module was built in two main stages. In the first stage, a base risk of giving birth was derived from fertility rates by age, number of children and Aboriginal identity. These base rates were aligned by age to vital statistics data for 2006 and 2007 and, for subsequent years, projected so as to reach targets for the level and age structure of fertility (see the section on assumptions and scenarios). In the second stage, relative risks, estimated with log-log type logistic regressions run on the same database and stratified by age, number of children and Aboriginal identity, were applied to the base risks to increase or decrease the probability of giving birth according to a number of relevant variables. For non-Aboriginals, the variables used in the models are age, marital status, place of residence, place of birth, period of immigration and generation status, visible minority group, highest level of schooling, an interaction between highest level of schooling and visible minority status, and religious denomination (see Box 1). For Aboriginals, these variables are age, Aboriginal identity, registered Indian status, place of residence, marital status and highest level of schooling.
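Following the definition in note 9, the own-children principle can be sketched as a weighted count: a woman living with an own child under one year of age counts as one birth in the previous year, and the rate for a stratum is that count divided by the weighted number of women in the stratum. This simplified sketch ignores any corrections a full own-children application may require (for example, for children not living with their mother, or for twins). The record fields and the function `own_children_rates` are hypothetical; the `key` argument could be extended to number of children and Aboriginal identity, the other base-risk dimensions named above.

```python
from collections import defaultdict


def own_children_rates(women, key=lambda w: w["age_group"]):
    """Fertility rates per stratum by the own-children principle.

    women: iterable of dicts with 'weight', 'youngest_child_age' (None when
    no own child lives in the home) and whatever fields `key` reads.
    A woman living with at least one own child under age 1 is counted as
    having given birth during the previous year.
    """
    births = defaultdict(float)
    women_at_risk = defaultdict(float)
    for w in women:
        stratum = key(w)
        women_at_risk[stratum] += w["weight"]
        child_age = w["youngest_child_age"]
        if child_age is not None and child_age < 1:
            births[stratum] += w["weight"]
    return {s: births[s] / women_at_risk[s] for s in women_at_risk}


sample = [
    {"age_group": "25-29", "weight": 10.0, "youngest_child_age": 0},
    {"age_group": "25-29", "weight": 30.0, "youngest_child_age": None},
    {"age_group": "30-34", "weight": 20.0, "youngest_child_age": 3},
]
print(own_children_rates(sample))  # {'25-29': 0.25, '30-34': 0.0}
```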

In general, this approach, which distinguishes between base risks and relative risks, has the following two advantages: 1) it lends itself to creating parameters that combine the robustness of a data source such as Vital Statistics with the wealth of variables offered by other sources such as surveys; and 2) it makes it easier to prepare alternative assumptions, which can be obtained by changing base risks only, relative risks only or both.
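The base-risk/relative-risk logic can be sketched as follows. The report does not give the exact combination rule, so this sketch assumes that the base risk is a hazard rate that the relative risks multiply, which is how coefficients from proportional hazards or complementary log-log models are usually interpreted; a one-year probability then follows as `1 - exp(-h)`. All function names and values are hypothetical.

```python
import math


def adjusted_hazard(base_hazard, relative_risks):
    """Base hazard for a person's stratum times her relative risks
    (1.0 = same risk as the reference category)."""
    return base_hazard * math.prod(relative_risks)


def one_year_probability(hazard):
    """Probability of at least one event in a year under a constant hazard."""
    return 1.0 - math.exp(-hazard)


def change_base_only(base_hazards, factor):
    """Alternative assumption: rescale base hazards, keep relative risks."""
    return {stratum: h * factor for stratum, h in base_hazards.items()}


base = {"25-29": 0.10, "30-34": 0.09}
low_fertility = change_base_only(base, 0.9)
h = adjusted_hazard(low_fertility["25-29"], [1.2, 0.8])
print(round(h, 4), round(one_year_probability(h), 4))
```

Because `change_base_only` touches only the base hazards, every group keeps its position relative to the reference category while the overall level changes, which is the second advantage listed above.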

When a birth occurs during the simulation, a new record is added to the database. It must be assigned a value at birth for each projected characteristic, so that the new record has the minimum attributes needed to be subject to the probabilities of "experiencing" the events the model simulates. Most characteristics of newborns are assigned deterministically: children are 0 years of age, not in a union, have no high school diploma, are born in the mother's region of residence, and so forth. Mother tongue, visible minority group and Aboriginal identity are instead assigned probabilistically, using mother-to-child transition matrices calculated from 2006 Census data to which the own-children method10 had been applied. The matrices are defined as follows: the child's mother tongue depends on the mother's mother tongue, immigrant status and region of residence; the child's visible minority group depends on the mother's visible minority group and immigrant status; and the child's Aboriginal identity depends on the mother's Aboriginal identity and registered Indian status.
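Probabilistic assignment from a mother-to-child transition matrix amounts to drawing the child's category from the matrix row that matches the mother's characteristics. In this sketch the matrix rows, region names and probabilities are made-up placeholders, not Demosim parameters, and `assign_mother_tongue` is a hypothetical helper.

```python
import random

# Placeholder rows keyed by the mother's (mother tongue, immigrant status,
# region of residence); each row is the child's mother-tongue distribution.
MOTHER_TONGUE_MATRIX = {
    ("French", "non-immigrant", "Montréal"): {"French": 0.95, "English": 0.04, "Other": 0.01},
    ("Other", "immigrant", "Toronto"): {"Other": 0.70, "English": 0.29, "French": 0.01},
}


def assign_mother_tongue(mother, rng):
    """Draw a newborn's mother tongue from the row matching the mother."""
    row = MOTHER_TONGUE_MATRIX[
        (mother["mother_tongue"], mother["immigrant_status"], mother["region"])
    ]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]


rng = random.Random(42)
mother = {"mother_tongue": "French", "immigrant_status": "non-immigrant", "region": "Montréal"}
print(assign_mother_tongue(mother, rng))
```

The same kind of draw would serve for visible minority group (rows keyed by the mother's group and immigrant status) and Aboriginal identity (rows keyed by the mother's Aboriginal identity and registered Indian status).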

Assigning generation status to newborns is a special case, in that it is necessary to know the father's immigrant status when the mother herself is not an immigrant; in that event, the child is second generation if the father is an immigrant and third generation or more if the father is not an immigrant. Because births are linked only to the mothers in Demosim, the information regarding the father's immigrant status was "registered" along with the mother's marital status (which indicates whether or not her spouse has the same immigrant status or, in other words, whether or not the union is mixed). This makes it possible to assign newborns' generation status correctly and directly, based solely on their mother's characteristics.11
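The rule described above can be written as a small function of the mother's record. How Demosim treats mothers who are not in a union is not described here; this hypothetical sketch simply passes `union_is_mixed=False` for them, which amounts to assuming a father with the same immigrant status.

```python
def newborn_generation(mother_is_immigrant, union_is_mixed):
    """Generation status of a child born in Canada during the simulation.

    union_is_mixed: True when the mother's spouse has a different immigrant
    status, information stored with her marital status in the model.
    """
    if mother_is_immigrant:
        return "second generation"
    # The mother is not an immigrant, so a mixed union implies an immigrant father.
    if union_is_mixed:
        return "second generation"
    return "third generation or more"


print(newborn_generation(mother_is_immigrant=False, union_is_mixed=True))
```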

The mortality module was designed to reflect the secular decline of mortality in Canada as well as the mortality differences between the various population groups for which the projection is made.12 As with fertility, the method involves two stages. In the first stage, a base risk of dying was calculated by age and sex from mortality rates projected with a variant of the Lee-Carter model applied to Canadian vital statistics data from 1981 to 2006.13 In the second stage, relative risks of dying by place of residence, immigrant status, period of immigration, visible minority group, Aboriginal identity, highest level of schooling, age and sex were obtained from a proportional hazards regression model, stratified by age group, applied to a longitudinal mortality follow-up database.14 These relative risks serve to increase or reduce, as the case may be, the base risks obtained from the projected rates by age and sex.
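For reference, the standard Lee-Carter model, and the way a proportional hazards relative risk scales a base rate, can be written as below. Demosim uses a variant (note 13) whose exact specification is not reproduced here. The symbols are ours: m_{x,t} is the death rate at age x in year t for one sex, a_x the average age profile of log mortality, k_t a time index of the mortality level, b_x the age-specific sensitivity to k_t, d the drift of k_t, and z_i the covariates of individual i with coefficients β.

```latex
\begin{aligned}
\ln m_{x,t} &= a_x + b_x k_t + \varepsilon_{x,t}, \qquad \textstyle\sum_x b_x = 1,\quad \sum_t k_t = 0\\
k_t &= k_{t-1} + d + e_t\\
\mu_i(x,t) &= m_{x,t}\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{z}_i\right)
\end{aligned}
```

The first two lines project the base rates by age and sex; the third multiplies the base rate by the relative risk exp(βᵀz_i), which exceeds 1 for groups with higher mortality than the reference group.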

The immigration module, central to the future ethnocultural composition of the population, first requires that a number of newcomers be set for each year of the projection period. This number, which is set outside the model, can be changed to create alternative assumptions about the volume of immigration. Next, each new immigrant must be assigned a value for each projected characteristic, which is done with a donor imputation method. Donors are selected in the 2006 Census microdata file from among persons who report having recently immigrated to Canada. The model is then "forced" to reproduce a distribution of immigrants by country of birth derived from Citizenship and Immigration Canada data (see the section on assumptions and scenarios).15 Thus, alternative assumptions can also be created about the composition of immigration.
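Donor imputation with a forced distribution by country of birth can be sketched in two steps: allocate the year's immigrants among birthplaces in proportion to the target shares (here with the largest-remainder method, our choice), then copy each new immigrant's characteristics from a randomly chosen recent-immigrant donor born in the same place. The report does not say how the forcing is done; the functions, fields and shares below are hypothetical.

```python
import random


def allocate(n, shares):
    """Split n immigrants among birthplaces by the largest-remainder method."""
    total = sum(shares.values())
    raw = {place: n * s / total for place, s in shares.items()}
    counts = {place: int(v) for place, v in raw.items()}
    leftover = n - sum(counts.values())
    # Give the remaining units to the largest fractional parts.
    by_remainder = sorted(raw, key=lambda p: raw[p] - counts[p], reverse=True)
    for place in by_remainder[:leftover]:
        counts[place] += 1
    return counts


def impute_immigrants(n, shares, donors, rng):
    """Create n new immigrant records by copying donors' characteristics."""
    pools = {}
    for d in donors:
        pools.setdefault(d["birthplace"], []).append(d)
    new_records = []
    for place, count in allocate(n, shares).items():
        for _ in range(count):
            new_records.append(dict(rng.choice(pools[place])))
    return new_records


donors = [
    {"birthplace": "Asia", "mother_tongue": "Other", "age": 31},
    {"birthplace": "Asia", "mother_tongue": "English", "age": 27},
    {"birthplace": "Europe", "mother_tongue": "French", "age": 40},
    {"birthplace": "Africa", "mother_tongue": "French", "age": 29},
]
shares = {"Asia": 0.5, "Europe": 0.3, "Africa": 0.2}
new = impute_immigrants(10, shares, donors, random.Random(1))
print(len(new), new[0])
```

Drawing the birthplace counts first makes the composition exact each year; a plain random draw of birthplaces would match the target shares only on average.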

The emigration module was developed according to the same principle as the fertility and mortality modules, namely by distinguishing between base risks and relative risks, notably taking account of the immigrants' greater propensity to emigrate, especially in the first years after their arrival in Canada.16 The base risks were derived from net emigration ratios17 by age and sex, calculated using Statistics Canada annual population estimates. These were then augmented or reduced using the results of a proportional hazards regression which, carried out on the Longitudinal Administrative Database,18 estimates the probability of emigrating according to place of residence, age, being a recent immigrant (settled for 15 years or less) and, for persons in the latter category, place of birth and time elapsed since immigration to Canada.
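Note 17 defines net emigration, so a base ratio for an age-sex group can be computed directly; the sketch divides by the group's population, which is our assumption about the denominator. The adjustment then multiplies the base ratio by a relative risk that depends on time since immigration for recent immigrants (settled 15 years or less). The relative risks shown are placeholders, the place-of-residence and place-of-birth effects are omitted for brevity, and the function names are hypothetical.

```python
def net_emigration_ratio(emigrants, returning_emigrants, net_temporary_emigration, population):
    """Base ratio for one age-sex group, using the definition in note 17."""
    net_emigration = emigrants - returning_emigrants + net_temporary_emigration
    return net_emigration / population


def emigration_risk(base_ratio, years_since_immigration, rr_by_duration):
    """Apply a duration-specific relative risk to recent immigrants only.

    years_since_immigration: None for non-immigrants.
    rr_by_duration: placeholder relative risks keyed by whole years (0-15).
    """
    if years_since_immigration is not None and years_since_immigration <= 15:
        return base_ratio * rr_by_duration[years_since_immigration]
    return base_ratio


base = net_emigration_ratio(1200, 300, 100, 100_000)
rr = {year: 3.0 if year < 5 else 1.5 for year in range(16)}
print(base, emigration_risk(base, 2, rr), emigration_risk(base, None, rr))
```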

The internal migration module serves to project changes of residence between the 47 regions in the model, taking account of the various characteristics of inter-regional migrants, namely age, marital status, presence of children, age of youngest child, place of birth, time elapsed since immigration, visible minority group, Aboriginal identity, mother tongue, highest level of schooling, generation status and religion. It draws on Canadian population censuses, which include, apart from the variables of interest, information on individuals' geographic mobility. On this basis, the probabilities of leaving each of the 47 regions were first calculated using log-log logistic regression models including a number of variables suited to the specificities of the regions for which they were estimated. Origin-destination matrices, which take account of age, place of birth, time elapsed since immigration, visible minority group, mother tongue and Aboriginal identity, are then used to distribute the migrants among the other 46 regions. This method can also be used to create alternative assumptions, by estimating the models and matrices for different periods.
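The two-stage logic (leave the region or not, then choose a destination among the other regions) can be sketched for one person and one year. The exit probability would come from the region's regression model and the destination shares from the origin-destination matrix row matching the person's characteristics; here both are placeholder inputs, and `simulate_move` is a hypothetical helper.

```python
import random


def simulate_move(region, p_leave, od_row, rng):
    """Return the person's region after one two-stage migration draw.

    p_leave: probability of leaving `region`, from the exit model.
    od_row: destination -> share of out-migrants from `region`
            (the origin itself is excluded).
    """
    if rng.random() >= p_leave:
        return region  # stage 1: the person stays
    destinations = list(od_row)
    # Stage 2: distribute the migrant among the other regions.
    return rng.choices(destinations, weights=list(od_row.values()), k=1)[0]


rng = random.Random(3)
od_row = {"Toronto": 0.6, "Calgary": 0.3, "Rest of Quebec": 0.1}
print(simulate_move("Montréal", 0.05, od_row, rng))
```

The religious mobility module described next follows the same two-stage structure, with exit rates by age and sex and destination matrices by sex.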

To project religious denomination (see Box 1), it was necessary to add a religious mobility module so as not to underestimate the future number of persons who report having no religion, a group whose numbers have grown over time as individuals have left their religion without subsequently reporting another one.19 This module was constructed in the same way as the geographic mobility module. First, the probabilities of moving from one religion to another ("exit rates," so to speak) were established by age and sex for each of the main religious groups by combining information from the 2002 Ethnic Diversity Survey (EDS) with a cohort-based analysis of the 1981, 1991 and 2001 censuses.20 The "migrants" were then distributed among the other religions using origin-destination matrices by sex drawn from the Ethnic Diversity Survey.21

Demosim also includes two socioeconomic modules, one modelling changes in highest schooling level and the other modelling labour market participation. The results for these modules are not described here, since they lie outside the framework of this analysis. The education module is made up of probabilities of graduating, which are designed to reflect differences in this regard between the projected ethnocultural groups. They were established as follows. First, probabilities of graduating by age cohort, sex and place of birth were estimated using logistic regression models, applied to data from the 2001 General Social Survey. These probabilities were then projected to 2006 before being calibrated so as to allow exact reproduction of the population distributions by schooling level, age, sex and place of birth, visible minority groups and Aboriginal identity in the 2006 Census.22
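The report states that the projected graduation probabilities were calibrated to reproduce the 2006 Census distributions exactly, but it does not give the technique. One common approach, shown here as a swapped-in illustration, adds a constant to every probability on the logit scale within a calibration cell and finds that constant by bisection, since the expected number of graduates increases monotonically with it. The function names, probabilities and target are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_graduates(probs, weights, shift):
    """Weighted expected count after shifting each probability on the logit scale."""
    return sum(w * expit(logit(p) + shift) for p, w in zip(probs, weights))


def calibrate_shift(probs, weights, target, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection for the shift that makes the expected count equal `target`.

    `target` must lie strictly between 0 and the sum of the weights.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_graduates(probs, weights, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


probs = [0.2, 0.5, 0.7]    # model probabilities for three persons in one cell
weights = [1.0, 1.0, 1.0]  # their sampling weights
shift = calibrate_shift(probs, weights, target=1.8)
print(round(shift, 4), round(expected_graduates(probs, weights, shift), 6))
```

A logit shift keeps every probability between 0 and 1 and preserves the ordering of individuals within the cell, which a simple multiplicative scaling would not guarantee.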

Labour market participation is simulated by imputing a labour market activity status to each individual every year. The participation rates used for imputation were derived in two steps. First, participation rates by age, sex, highest level of schooling and province of residence were established from annual Labour Force Survey data. Second, ratios based on labour market activity in the 2006 Census were used to increase or decrease, for each combination of age, sex and schooling level, the participation of the population according to visible minority group, immigrant status and period of immigration.
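The two-step derivation and the annual imputation can be sketched as follows, assuming that each census ratio is the participation rate of a group divided by the rate of the whole population in the same age-sex-schooling cell, and that imputation is a yearly Bernoulli draw. The report does not spell out the ratio's exact form; the rates and function names below are hypothetical.

```python
import random


def census_ratio(group_rate, cell_rate):
    """Participation of a group (e.g., recent immigrants in one visible minority
    group) relative to everyone in the same age-sex-schooling cell, 2006 Census."""
    return group_rate / cell_rate


def impute_in_labour_force(lfs_rate, ratio, rng):
    """Yearly imputation: True if the person is imputed as a participant."""
    rate = min(lfs_rate * ratio, 1.0)  # keep the adjusted rate a valid probability
    return rng.random() < rate


ratio = census_ratio(group_rate=0.60, cell_rate=0.75)
rng = random.Random(2)
status = impute_in_labour_force(lfs_rate=0.80, ratio=ratio, rng=rng)
print(round(ratio, 2), status)
```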

Demosim also includes other modules designed mainly to update, during the projection, variables that influence other events in the model. Among them, the marital status module stands out because it greatly improves the projection of births in particular. This module annually assigns (that is, imputes) a marital status to each individual according to the results of logistic regression models estimated on 2006 Census data. Stratified by sex and Aboriginal identity, these models estimate the probability of being in a union and then, among persons in a union, the probability of being married (as opposed to living in a common-law union), taking account of age, place of residence, visible minority group, mother tongue, presence of children at home, age of youngest child, generation status, education, registered Indian status and religious denomination. Whether women's unions are mixed or not (that is, whether their spouse has a different immigrant status or registered Indian status) is then modelled using logistic regressions, so that generation status or registered Indian status can be assigned to children born during the simulation. Trend parameters were also added to the model, in part to take account of the increase in common-law unions in the Canadian population.
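The nested structure (in a union or not; if so, married or common-law; for a woman in a union, mixed or not) can be sketched with three logistic probabilities. Each argument is a linear predictor, that is, an intercept plus coefficients times covariates, from a fitted model. The coefficients are not given in the report, so the values in the example are placeholders, and `impute_union` is a hypothetical helper; the trend parameters mentioned above would enter as extra terms in these predictors.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def impute_union(lp_union, lp_married, lp_mixed, rng):
    """Impute marital status in nested steps for one woman in one year.

    Returns (status, mixed); mixed is None when she is not in a union.
    """
    if rng.random() >= logistic(lp_union):
        return "not in a union", None
    # Among persons in a union: married versus common-law.
    status = "married" if rng.random() < logistic(lp_married) else "common-law"
    # Whether her spouse has a different immigrant or registered Indian status.
    mixed = rng.random() < logistic(lp_mixed)
    return status, mixed


rng = random.Random(8)
print(impute_union(lp_union=0.4, lp_married=0.9, lp_mixed=-2.0, rng=rng))
```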

A module projecting the departure of children from the parental home was also developed to update the number of children in the home, an intermediate variable important for the internal migration module. It essentially consists of the results of two proportional hazards regression models (one for males and one for females) estimated with data from the 2006 General Social Survey. The child's age, sex, visible minority status and place of birth, as well as the father's or mother's place of birth, were covariates in these models.


Notes

  1. Of course, this section is based on the existing documentation on the model, of which it is both an update and an extension. Readers interested in a more detailed description of the Demosim methodology are invited to view the Demosim Methodology Report (to be made available on the Statistics Canada website).
  2. Except for religious denomination, which is projected separately based on the 2001 Census. See Box 1 for more information on this subject.
  3. Although they are part of the simulation model, the results on labour market participation, like those on highest level of schooling, are not presented here because they lie outside the framework established for this analysis.
  4. Including the mixed or non-mixed nature of the union. Two types of mixed unions are possible: with a partner whose immigrant status is different and/or with a partner whose registered Indian status is different. This information is used to assign generation status and registered Indian status to newborns.
  5. See Evert Van Imhoff (1997) for a discussion of the details of microsimulation projection models and Bélanger et al. (2008), op. cit., for a discussion of the previous version of the model.
  6. More information on Modgen is available on the Statistics Canada website: www.statcan.gc.ca/microsimulation/modgen/modgen-eng.htm. Statistics Canada's Modelling Division can also be contacted at microsimulation@statcan.gc.ca.
  7. Demosim has one module per simulated event.
  8. On this subject, see Bélanger and Gilbert (2003), McQuillan (2004), Ram (2004) and Caron Malenfant and Bélanger (2006).
  9. This is an indirect method of estimating fertility that treats women living with at least one of their own children under one year of age at the time of the census as having given birth during the previous year. See Cho et al. (1986), Desplanques (1993) and Bélanger and Gilbert (2003) for a description and discussion of this method.
  10. This is basically the same method as was used to develop fertility parameters.
  11. The module for mother-to-child transmission of registered Indian status is largely based on the same principle.
  12. In particular, see Chen, Wilkins and Ng (1996) and Wilkins et al. (2008).
  13. Li, N. and R. Lee. (2005)
  14. This database results from records linkages between the 1991 Census and Canadian vital statistics data from 1991 to 2001. On this subject, see Wilkins et al. (2008).
  15. It should be noted that the model allows the addition of non-permanent residents over time. The module that manages these additions functions similarly to the immigration module, that is, by setting an annual number of new non-permanent residents and then imputing characteristics to them by the use of donors, who in this case are non-permanent residents in the base population.
  16. On this subject, see Aydemir and Robinson (2006) and Michalowski and Tran (2008).
  17. Net emigration is the number of emigrants minus returning emigrants plus net temporary emigration.
  18. This database consists of a longitudinal sample created by matching tax data to the longitudinal database on immigrants.
  19. Readers interested in data on the increase in the number of persons reporting no religion, or more generally in the change over time in the numbers for the major religions in Canada, are invited to consult Canada (2003 (1)).
  20. The Ethnic Diversity Survey (EDS) makes it possible to compare respondents' religion with that of their mother when they were under 15 years of age. The EDS results must therefore be interpreted as measuring both intergenerational mobility (since respondents are compared with their mother) and intragenerational mobility (since a change of religion can take place later in life). The age at the time of a change was estimated by means of a cohort-based analysis of data from the 1981 to 2001 censuses, similar to the one used by Guimond (1999) to estimate the ethnic mobility of Aboriginals.
  21. It should be noted that in the model, this module is applied only to non-Aboriginal populations, since Aboriginals are not part of the target population of the Ethnic Diversity Survey. By way of compensation, the results of a mother-to-child religion transmission matrix calculated with 2001 Census data are used to assign stochastically a religion to Aboriginals who are born in the course of simulation.
  22. The modelling of education in Demosim is documented in Spielauer (2009).