Demosim: An Overview of Methods and Data Sources
by the Demosim team
Report prepared by Éric Caron-Malenfant, Simon Coulombe and Dominic Grenier
This report is the work of the Demosim team, led by Éric Caron-Malenfant. The following are or were members of the Demosim team during the development of the versions of the model based on the 2011 National Household Survey: Éric Caron-Malenfant, Jonathan Chagnon, Simon Coulombe, Patrice Dion, Harry François, Nora Galbraith, Mark Knarr, Stéphanie Langlois, Samuel MacIsaac, Laurent Martel and Jean-Dominique Morency of the Demography Division; Melanie Abeysundera, Dominic Grenier, Chantal Grondin and Soumaya Moussa of the Social Survey Methods Division; Karla Fox of the Statistical Research and Innovation Division; Martin Spielauer of the Social Analysis and Modelling Division; Jean-Pierre Corbeil and René Houle of the Social and Aboriginal Statistics Division.
For the development of this version of Demosim, funding was received from Indigenous and Northern Affairs Canada (INAC), Immigration, Refugees and Citizenship Canada (IRCC) and from Canadian Heritage (PCH). Representatives of these departments were also consulted at different stages, through an interdepartmental working group and an interdepartmental steering committee. It is also important to acknowledge the contribution of the Demosim scientific committee, whose mandate is to make recommendations on the methods, assumptions, data sources and products based on the Demosim model. The committee, chaired by Michael Wolfson (University of Ottawa), is composed of Stewart Clatworthy (Four Directions Project Consultants), David Coleman (Oxford University), Eric Guimond (INAC), Peter Hicks (consultant), Jack Jedwab (Association for Canadian Studies), Don Kerr (University of Western Ontario) and Réjean Lachapelle (consultant).
Thanks to Carol D’Aoust, Daniel Bannatyne and Simon Larose for their technical support in preparing this report. Thanks also to the reviewers of the preliminary versions of this report: Rosalinda Costa, André Cyr, Marc Lachance, Anne Milan and Pierre Turcotte.
Demosim is Statistics Canada’s microsimulation demographic projection model, designed to project the Canadian population according to various ethnocultural characteristics. Created in 2004, the model, then known as Popsim, was used to produce Population Projections of Visible Minority Groups, Canada, Provinces and Regions, 2001 to 2017 (Statistics Canada 2005).Note 1 Its subsequent versions, under the name Demosim, were used to prepare Projections of the Diversity of the Canadian Population, 2006 to 2031 (Statistics Canada 2010), Population Projections by Aboriginal Identity in Canada, 2006 to 2031 (Statistics Canada 2011), Projected trends to 2031 for the Canadian labour force (Martel et al. 2011), Projections of the Aboriginal Population and Households in Canada, 2011 to 2036 (Statistics Canada 2015-2), and a number of analytical and methodological articles,Note 2 among others.
In 2015, Demosim: An Overview of Methods and Data Sources (Statistics Canada 2015-3) was published for the first time to document the version of the model (Demosim 2015) used to prepare Projections of the Aboriginal Population and Households in Canada, 2011 to 2036. These projections were the result of a major method and content redesign of the projection model, which took into account the latest data sources. The release of the projections initiated a new release cycle for prospective products based on the 2011 National Household Survey (NHS), the previous cycles having been based on the 2001 and 2006 censuses.
Since then, a new version of the model (Demosim 2017), also based on the 2011 NHS, has been completed. The new version of Demosim was developed to help prepare two analytical reports, Immigration and Diversity: Projections of the Population of Canada and Its Regions, 2011 to 2036 and Language Projections for Canada, 2011 to 2036. Although consistent with the 2015 version of Demosim, it includes a few additions and modifications that require updating the methodological documentation (see summary of changes in Box 1).
This document provides an overview of how the 2017 version of the Demosim projection model works. It describes the base population, data sources and methods for each component. The assumptions and scenarios used in presenting the results from this version are described, for purposes of interpreting the results, in the analytical reports to which this report is a technical supplement.
Lastly, the description of Demosim in this report should be seen as an extension of the documentation related to previous versions of the model.Note 3 Even though the versions of the model used in this projection cycle have been completely redesigned and updated, the description of Demosim herein has been constructed and developed based on this documentation.
Start of text box 1
Summary of the changes since the 2015 version of Demosim
The key modifications made to Demosim since the 2015 version described in Statistics Canada 2015-3 are as follows:
- A Canadian citizenship variable has been added to the base population. To keep it up to date during projections, a module for the acquisition of Canadian citizenship of immigrants was added to Demosim, and then modifications were made accordingly to the module for assigning characteristics to newborns.
- The internal migration module was modified to take the linguistic variables and the francophone character (or absence thereof) of the destination regions into account in more detail.
- A number of changes have also been made to the modules for linguistic transitions over an individual’s lifetime and for intergenerational language transmission, in particular with respect to selecting the transitions that are modelled for simulation purposes and the explanatory variables included in the models.
End of text box 1
The base population for this version of Demosim is essentially derived from the 2011 National Household Survey (NHS) microdata file,Note 4 a database containing approximately 7.3 million records that represent the Canadian population living in private households on May 10, 2011. The main variables of the base population are as follows:Note 5
- Place of residence: census metropolitan area (CMA), province or territory, Indian reserve, and Inuit Nunangat;
- Aboriginal group;
- Registered Indian status;
- Registration category on the Indian Register (6(1) or 6(2));
- Marital status (including mixed unions);
- Place of birth (province/territory or country/world region);
- Immigrant status and time elapsed since immigration;
- Generation status;
- Immigrant admission category;
- Canadian citizenship;
- Visible minority group;
- Mother tongue;
- Language spoken most often at home;
- Knowledge of official languages;
- Highest level of schooling;
- Head-of-household status;
- Head-of-family status;
- Labour force participation.
Registration category on the Indian Register and immigrant admission category are two variables that were added through file linkages, as they were not included in the NHS database. Registration category, which defines the conditions for children to inherit registered Indian status from their parents, was obtained from a pre-existing linkage between the Indian Register and the 2011 NHS (through the 2011 Census), to identify, among NHS respondents who reported being Registered Indians, those who had status under subsection 6(1) of the Indian Act and those who had status under subsection 6(2).Note 6 In 66% of cases, registration category was determined through file linkage. For the remaining 34%, categories 6(1) and 6(2) were either deterministically imputed using information on the registration of other census family members, or else were imputed using a probabilistic model (logistic regression).Note 7 Immigrant admission category (economic, humanitarian (refugees), family reunification or other) was obtained using pre-existing linkages between the Immigration, Refugees and Citizenship Canada landing files from 1980 to 2011 and the 2011 NHS (through the 2011 Census). The linkage was successful for 82% of those who reported in the NHS being admitted to Canada as immigrants in 1980 or later. For the remaining 18%, the variable was probabilistically imputed using multinomial logistic regression models. For immigrants admitted before 1980, the admission category remains unknown.
Some adjustments were also made to the NHS microdata file so that the Demosim base population reflects the entire Canadian population as closely as possible.
The population living on the 31 Indian reserves that were incompletely enumerated in the 2011 Census and the five additional Indian reserves that did not respond to the NHS were added to the Demosim base population. For 13 of the 31 incompletely enumerated reserves in the census, the NHS questionnaire was administered during a special collection several months later than the planned date, as forest fires prevented the collection from being carried out.Note 8 The records from this special collection were added to the Demosim base population. For the population of the 18 other Indian reserves that were incompletely enumerated in the census, it was assumed that the population was consistent with estimates produced by the Social Survey Methods Division of Statistics Canada. Records were then imputed with characteristics that were representative of similarly sized reserves enumerated in the same province. A similar imputation was performed for the five reserves enumerated in the census but not in the NHS, calibrating their population to the 2011 Census counts by age group and sex.
Lastly, adjustments were made to obtain a population that was representative of the population estimates of May 10, 2011, which include people living in collective dwellings and account for net undercoverage in the census. The adjustment for collective dwellings involved multiplying the sampling weights of individuals by the ratio of the 2011 Census population (which includes collective dwellings) to the NHS population by age, sex and place of residence.Note 9 Net undercoverage rates were then applied to the sampling weights by age, sex and place of residence.
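As a rough illustration of the two weight adjustments described above, the following Python sketch scales each record's sampling weight by the census-to-NHS ratio for its cell and then corrects for net undercoverage. The cell labels, counts and rates are invented for the example. The form of the undercoverage correction (dividing by one minus the rate) is an assumption, since the report does not give the formula.

```python
def adjust_weights(records, census_pop, nhs_pop, undercoverage):
    """Return (cell, weight) pairs after both adjustment steps.

    records: list of (cell, sampling_weight); a cell stands for one
    combination of age, sex and place of residence.
    """
    adjusted = []
    for cell, weight in records:
        # Step 1: ratio of the census population (which includes collective
        # dwellings) to the NHS population for the same cell.
        weight *= census_pop[cell] / nhs_pop[cell]
        # Step 2 (assumed form): inflate the weight for net undercoverage.
        weight /= 1.0 - undercoverage[cell]
        adjusted.append((cell, weight))
    return adjusted

# Hypothetical figures for a single cell.
records = [("cell_1", 4.0), ("cell_1", 6.0)]
census_pop = {"cell_1": 1050.0}
nhs_pop = {"cell_1": 1000.0}
undercoverage = {"cell_1": 0.05}
new_weights = adjust_weights(records, census_pop, nhs_pop, undercoverage)
```

Because both steps are multiplicative within a cell, the relative weights of records in the same cell are unchanged; only the cell total moves.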
All of these adjustments increased the total population by about 1.4 million people. These adjustments had a greater impact on certain population subgroups—notably young adults, who had higher net undercoverage rates, and Registered Indians, because of the adjustment for the incompletely enumerated reserves.Note 10
Demosim’s general functions and features
Demosim is a microsimulation model, which means that it projects individuals in the population one by one, rather than projecting the population on the basis of aggregate data, as is done with cohort-component and multistate models.Note 11 Demosim simulates the life of each person in its base population, as well as the newborns and immigrants who are added to the population during the simulation.Note 12 The individuals advance through time and are subject to the likelihood of ‘experiencing’ various events simulated by the model (for example, the birth of a child, death, a change in education level or a change in knowledge of official languages) until they die, emigrate or reach the end of the simulation.
The probabilities (or risks) of ‘experiencing’ each event depend on the individual’s characteristics. The probabilities are used to derive waiting times, which correspond to the time that will elapse between the present and the occurrence of each event and are a function of the probabilities associated with the events, individual characteristics and a random process (see Box 2). The event with the shortest waiting time occurs first. After an event occurs, a new set of waiting times is calculated for the events that depend on the characteristic that has changed; the individual then advances through time to the next event (again, the one with the shortest waiting time), and so on. Since Demosim is a continuous-time model, the various simulated events may occur at any time of the year, although some of them occur on a fixed date (for example, birthdays). As well, some characteristics are imputed annually to the individuals. Events and waiting times are managed using the computer language Modgen,Note 13 in which Demosim is programmed.
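The event mechanism described above can be sketched in a few lines of Python. This is a simplified illustration, not Demosim's Modgen code: the event names, the rates and the single "graduated" characteristic are hypothetical, and all waiting times are redrawn after each event rather than only those that depend on the changed characteristic (for exponential waiting times the two approaches are statistically equivalent).

```python
import math
import random

def draw_wait(rate, rng):
    """Exponential waiting time for a constant transition rate."""
    if rate <= 0.0:
        return math.inf
    return -math.log(1.0 - rng.random()) / rate

def hypothetical_rates(state):
    """Illustrative rates that depend on the person's current characteristics."""
    rates = {"death": 0.01, "emigration": 0.005}
    if not state["graduated"]:
        rates["graduation"] = 0.20
    return rates

def simulate_person(state, horizon, rng):
    """Return the (time, event) pairs experienced before exit or end of simulation."""
    state = dict(state)
    clock, history = 0.0, []
    while True:
        waits = {e: draw_wait(r, rng) for e, r in hypothetical_rates(state).items()}
        event = min(waits, key=waits.get)   # shortest waiting time occurs first
        clock += waits[event]
        if clock >= horizon:                # end of the simulation reached
            return history
        history.append((clock, event))
        if event in ("death", "emigration"):
            return history                  # the person leaves the population
        state["graduated"] = True           # only non-terminal event in this sketch

history = simulate_person({"graduated": False}, 25.0, random.Random(1))
```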
This approach is used to yield projections at a level of detail that is not possible using standard models because of their matrix nature. The simultaneous and consistent projection of a large number of variables made possible through microsimulation permits the use of more characteristics, both as determinants of simulated events and for tabulating results. Demosim has also shown flexibility in formulating assumptions and projection scenarios, as well as an ability to reproduce the results of cohort-component models at an aggregate level.Note 14
The use of this method assumes that the probabilities (or risks) associated with simulated events have been calculated in advance. Calculations are done using existing data sources (censuses, surveys and administrative data) to which various methods are applied. The next section describes these methods in more detail.
Start of text box 2
On calculating waiting times, as well as the concepts of transition rate (risk) and probability
In a continuous-time model like Demosim, events can occur at any time. Their occurrence depends on waiting times, which are associated with each individual based on his or her present characteristics. The individual waiting times required to run a microsimulation model like Demosim cannot be obtained from observation data; they must be derived.
Waiting times are derived from the transition rate (which quantifies the risk), denoted by $\lambda$. The transition rate is defined as the number of events observed divided by the number of person-years lived. An example of a transition rate in demography is the mortality rate ($m_x$), found in mortality tables alongside the death probability ($q_x$), which represents the probability that a person will die during the year.
The waiting time before an event occurs follows an exponential distribution with parameter $\lambda$. Under this exponential law, it is assumed that the risk of experiencing an event (e.g., death) remains constant during a given period of time. The risks in Demosim are thus assumed to be constant, as long as the characteristics that determine the modelled event remain unchanged for the individual. Since most events in Demosim are age-dependent, this period for these events is a maximum of one year.
The probability of an event occurring before or exactly at time $t$ is given by the exponential distribution function:
$$F(t) = 1 - e^{-\lambda t}$$
The inverse exponential distribution function, $F^{-1}(p) = -\ln(1 - p)/\lambda$, indicates the time at which a proportion $p$ of the population will have experienced the event, knowing that the transition rate is $\lambda$.
Demosim uses a random process in conjunction with the inverse exponential distribution function to generate individual-level waiting times for each simulated event. First, a random value is obtained from the uniform distribution U[0,1]. This value is inserted into the inverse exponential distribution function in place of $p$. For example, if an event has a transition rate of $\lambda = 0.15$ and the random number generated is 0.5, then the waiting time generated for this event will be 4.62 years. Any lower random value will give a waiting time below 4.62 years, and any higher value will give a longer waiting time.
The projection parameters often consist of probabilities rather than transition rates. When this is the case, they need to be converted to transition rates by isolating $\lambda$ in the exponential distribution function to obtain $\lambda = -\ln(1 - p)/t$, and then replacing $p$ with the annual probability and $t$ with 1 year. Therefore, an annual probability of dying of 0.10 has a transition rate of about 0.1054 because $-\ln(1 - 0.10)/1 \approx 0.1054$.
Although the concept of probability is frequently used in this document, it should be noted that the Demosim model actually uses risks to derive waiting times. Furthermore, the term “risk” is used in reference to the more precise concept of transition rate only for the sake of simplicity.
To learn more about calculating waiting times, please consult Willekens (2011).
End of text box 2
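The calculations in Box 2 can be reproduced with a short Python sketch. The function names are chosen for this illustration and are not part of Demosim; the two numerical examples are those given in the box.

```python
import math
import random

def waiting_time(rate, u):
    """Inverse exponential distribution function: time by which a
    proportion u of the population has experienced the event."""
    return -math.log(1.0 - u) / rate

def probability_to_rate(p, t=1.0):
    """Transition rate implied by a probability p over a period of length t."""
    return -math.log(1.0 - p) / t

# Example from Box 2: rate 0.15 and random draw 0.5.
example_wait = waiting_time(0.15, 0.5)

# Example from Box 2: annual probability of dying of 0.10.
example_rate = probability_to_rate(0.10)

# In a simulation the draw comes from a uniform random number generator.
rng = random.Random(2011)
simulated_wait = waiting_time(example_rate, rng.random())
```

Rounded to two decimals, `example_wait` is 4.62 years; `example_rate` is approximately 0.10536.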
Main projected components in Demosim
This section, which aims to document the main components projected by Demosim, is subdivided into three main parts. The first is concerned with events that are modelled using waiting times, the second discusses characteristics that are imputed annually, and the third gives an overview of how individuals are created during the simulation.
It should be noted that from this point onward, the document will at times refer to the concept of “module” because Demosim is built in a modular way, with each of its components corresponding to a module. A module includes the computer code specifying the dimensions and functioning of the modelled event, including its relation to other parts of the model and its associated parameters. Table 1 presents the different components of the projection model and summarizes the methods and data sources used in modelling these components.Note 15
Events with waiting times
The first category of events contains those events that are modelled using waiting times (see Box 2). These events make it possible to create a dynamic and distinct life course for each simulated individual. Events in this category are fertility, mortality, internal migration, emigration, registration on the Indian Register and reclassification of registration categories over a lifetime, intragenerational ethnic mobility of Aboriginal people, intragenerational religious mobility, intragenerational linguistic mobility, acquisition of Canadian citizenship by newcomers and changes in education level.
The fertility module has been designed to obtain a projection of births that reflects the differences in fertility between the various groups projected, for example, Aboriginal people and immigrants. The module first contains the ‘base probabilities’ of an individual having given birth to one or more children during the year preceding the 2011 NHS. These base probabilities were calculated by age, number of children in the home, and having or not having an Aboriginal identity, using the own-children methodNote 16 applied to the NHS data.Note 17 They include adjustments for children not living with their mother, and for mortality. They are also adjusted to reflect what is observed in Vital Statistics. The base probabilities are then combined with the results of complementary log-log regressions (see Box 3) computed using the same NHS data to which the own-children method was applied. The regressions aim to estimate the probability of having given birth to one or more children during the same period, for various combinations of age group, number of children in the home and Aboriginal identity,Note 18 according to other variables. These variables are marital status, education, Aboriginal group, registered Indian status, immigrant status, time elapsed since immigration, generation status, immigrant admission category, place of birth, visible minority group, religion, mother tongue and detailed place of residence (on or off Indian reserves, Inuit Nunangat, CMA, and province and territory).Note 19 For the sake of consistency between the base probabilities and the regression results, both estimate the number of women having given birth to one or more children, and not the total number of births (since multiple births are possible). To obtain the total number of births, an adjustment consisting of ratios between the number of births during the period and the number of women who have given birth, by Aboriginal identity or visible minority group, is applied.
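For readers unfamiliar with the complementary log-log link mentioned above, the sketch below shows how a linear predictor maps to a probability, and how a births-per-mother ratio then turns an expected number of mothers into an expected number of births. The coefficient values, the ratio of 1.02 and the function names are illustrative assumptions, not Demosim parameters.

```python
import math

def cloglog_probability(eta):
    """Inverse of the complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

def expected_births(n_women, eta, births_per_mother):
    """Expected births = women x P(gave birth) x births per woman who gave birth."""
    mothers = n_women * cloglog_probability(eta)
    return mothers * births_per_mother

eta = -2.5 + 0.3                      # hypothetical intercept plus one covariate effect
p_birth = cloglog_probability(eta)    # probability of giving birth during the year
births = expected_births(10_000, eta, 1.02)
```

With this link, `exp(eta)` plays the role of a transition rate over a one-year interval, which fits naturally with the rate-based waiting times described in Box 2.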
Start of text box 3
Combining base rates or probabilities with regression results
In a number of Demosim modules, the likelihood of certain events is estimated using base rates combined with relative factors derived from results of regressions. Combining base rates and regression results increases the flexibility for developing projection assumptions and makes it possible to incorporate information from various data sources for event modelling. However, some difficulties are associated with it.
First, the reference category of regression models is normally not the entire population, but only one or more subgroups. A conversion or adjustment is therefore required if the results of such regressions are to be combined with one or more rates that refer to the entire population.
A second difficulty lies in differences between the composition of the population used to calculate the regression models and the composition of the population to which these regression results are applied, that is the base population for the projection. When a data source other than the Demosim base population is required to calculate parameters, the weighted sum of probabilities from the regression will not necessarily be equal to the sum that would be obtained by weighting these same probabilities using the population from another data source.
To address these difficulties, a calibration method is used to adjust the intercept (the y-coordinate at the origin) of the regression models without changing the models’ other coefficients. This adjustment is made in such a way as to reproduce target rates (base rates) within a population whose composition is the same as the composition of the Demosim base population.
To illustrate the method, let us suppose that we performed a logistic regression estimating the probability that an event Y will occur according to a set of characteristics X. This regression was carried out on a survey that we will call source A. This data source A has a population whose composition according to the X characteristics differs from the composition of the Demosim base population, to which, however, the probabilities from source A will be applied. We will call the Demosim base population data source B. Let us suppose that we also want to assume that the probability that the event in question will occur reaches a pre-established target, as is often the case when making projections.
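Keep in mind that with a logistic regression, the probability $P_A(Y \mid X)$ (with A referring here to data source A) that event Y will occur given the set of characteristics X is calculated using the following formula:
$$P_A(Y \mid X) = \frac{\exp(\beta_0 + \beta X)}{1 + \exp(\beta_0 + \beta X)}$$
where $\beta_0$ is the ordinate at the origin (the intercept) and $\beta$ is the vector of the other coefficients.
If we weight the probabilities using weights derived from data source A ($w_{A,i}$ for individual $i$), the overall probability (or the mean probability in the population) that event Y will occur is
$$\bar{P}_A = \frac{\sum_i w_{A,i}\, P_A(Y \mid X_i)}{\sum_i w_{A,i}}$$
If we apply the probabilities resulting from this regression to another data source (say, source B) and we weight them based on the composition of this source (with weights $w_{B,i}$), we will produce an overall probability of event Y occurring,
$$\bar{P}_{AB} = \frac{\sum_i w_{B,i}\, P_A(Y \mid X_i)}{\sum_i w_{B,i}}$$
that will not be equal to either the one that we would obtain using only source A ($\bar{P}_A$), or the one that we could have obtained using only source B ($\bar{P}_B$).
Let us now suppose that we want to find an adjustment to reproduce the overall probability for source A or source B. Since the result of the above equation gives the overall probability neither for source A nor for source B, it would not be sufficient to multiply it by the quotient of the two overall probabilities, $\bar{P}_A$ and $\bar{P}_B$.
The adjustment that we use for this purpose consists in using an iterative method to find an $\alpha$ that we add to the ordinate at the origin so that
$$\frac{\sum_i w_{B,i}\, \dfrac{\exp(\beta_0 + \alpha + \beta X_i)}{1 + \exp(\beta_0 + \alpha + \beta X_i)}}{\sum_i w_{B,i}} = \bar{P}^{*}$$
where $\bar{P}^{*}$ is the target overall probability (for example, $\bar{P}_A$ or $\bar{P}_B$).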
This method, while used in the above example to reach a target such as the overall probability for source A or source B, can also be generalized to any target. In Demosim, to modify the assumption by changing the base rates (the target to be reached), a new adjustment specific to the desired target can simply be calculated. This type of adjustment can also be adapted to other types of regressions.
This method of calibrating the model makes it possible to preserve the odds ratios of the regression model while accurately reaching the target, assuming a population whose composition does not change from the start of the projection. Therefore, this method makes it possible to maintain compositional effects that may arise during the projection. As well, since the target set by the base rates is reached by adjusting the regression’s intercept, it limits the probabilities to no more than 100%, regardless of the target. That also means that relative differences are maintained during the projection in terms of odds ratios (for logistic regression) or risk ratios (for proportional hazards regression).
End of text box 3
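The intercept calibration described in Box 3 can be sketched in Python as follows. The box only states that an iterative method is used; bisection is chosen here as one simple possibility and is an assumption, as are the function names, the toy population and the coefficient values.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def overall_probability(intercept, coefs, rows, weights):
    """Weighted mean of the individual probabilities in a population."""
    weighted = sum(
        w * logistic(intercept + sum(b * x for b, x in zip(coefs, row)))
        for row, w in zip(rows, weights)
    )
    return weighted / sum(weights)

def calibrate(intercept, coefs, rows, weights, target, iterations=100):
    """Bisection search for the adjustment added to the intercept so that the
    overall probability in the weighted population equals the target."""
    low, high = -30.0, 30.0
    for _ in range(iterations):
        adjustment = (low + high) / 2.0
        if overall_probability(intercept + adjustment, coefs, rows, weights) < target:
            low = adjustment    # overall probability too low: raise the intercept
        else:
            high = adjustment
    return (low + high) / 2.0

# Hypothetical regression from "source A" applied to a toy "source B" population.
intercept, coefs = -2.0, (0.8,)
rows_b = [(0,), (1,), (1,)]
weights_b = [5.0, 2.0, 3.0]
adjustment = calibrate(intercept, coefs, rows_b, weights_b, target=0.15)
calibrated = overall_probability(intercept + adjustment, coefs, rows_b, weights_b)
```

Only the intercept moves, so the odds ratio between the two covariate values stays at `exp(0.8)` after calibration, and every individual probability remains between 0 and 1.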
The mortality module has a similar structure to the fertility module, as it also uses base rates combined with regression results (see Box 3). The mortality module aims to simulate the future number of deaths, taking into account the differences between the projected groups. Because of the fragmented nature of the available data, mortality is modelled separately for Inuit and non-Inuit populations, and among non-Inuit, separately for the population aged 25 and older and that aged 24 and younger.Note 20
- For non-Inuit aged 25 or older, the base rates consist of mortality rates projected by age and sex at the national level, consistent with the methods documented in the technical report of the most recent national projections (Dion et al. 2014).Note 21 The base rates are then combined with the results of proportional hazards regressions (Cox models), stratified by sex and broad age groups. These regressions estimate the risk of dying by Aboriginal ancestry group, visible minority group, time elapsed since immigration, education, living on an Indian reserve, and province or territory of residence. The models were estimated with the data from the 1991-to-2006 Canadian Census Mortality Follow-up Study,Note 22 which linked 1991 Census data for the population aged 25 and older to Vital Statistics data up to 2006.
- For the non-Inuit population aged 24 and younger, mortality is modelled differently for the non-Aboriginal population, the Aboriginal group of First Nations people and the Aboriginal group of Métis. For the non-Aboriginal population, Demosim uses the Li-Lee model to project mortality rates by age, sex and province or territory of residence. For the Aboriginal group of First Nations people, mortality rates come from mortality tables for Registered Indians,Note 23 and are then projected under the assumption that the difference between these rates and the rates for non-Aboriginal people remains constant.Note 24 For the Aboriginal group of Métis, as there are no data on this particular population, the rates are obtained by multiplying the rates for Registered Indians by a factor derived from the difference between the mortality of Métis and that of Registered Indians in the population aged 25 to 64, according to the 1991-to-2006 Canadian Census Mortality Follow-up Study.
- For the Inuit population, mortality tables were calculated using death data for Inuit living in Nunavut from special Vital Statistics data extractions for the years 2000 to 2002, 2005 to 2007, and 2010 and 2011. Among Inuit, the risks of dying are projected while keeping constant the relative difference observed during these periods between the Inuit population and the overall Canadian population.Note 25
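Two of the ideas in the list above lend themselves to a small Python sketch: scaling a base mortality rate by a relative risk from a proportional hazards model, and projecting a group's rate by holding its relative difference with a reference population constant. All numbers and names are hypothetical. The first function is a simplification, since in Demosim base rates and regression results are combined through the calibration described in Box 3.

```python
import math

def rate_with_covariates(base_rate, coefs, covariates):
    """Base rate multiplied by the relative risk exp(sum of beta * x)."""
    relative_risk = math.exp(sum(b * x for b, x in zip(coefs, covariates)))
    return base_rate * relative_risk

def project_with_constant_ratio(group_rate_obs, reference_rate_obs, reference_rate_proj):
    """Keep the observed ratio between a group and the reference population
    constant, and apply it to the reference population's projected rate."""
    ratio = group_rate_obs / reference_rate_obs
    return ratio * reference_rate_proj

# Hypothetical adult with two characteristics coded 1.
adult_rate = rate_with_covariates(0.008, (0.25, -0.10), (1, 1))

# Hypothetical group whose observed rate is twice the reference rate.
group_rate_future = project_with_constant_ratio(0.012, 0.006, 0.005)
```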
The purpose of the internal migration module is to simulate movements among the 84 geographic regions in Demosim, taking into account the main characteristics included in the projection. Two types of migration are modelled:
- Interregional migration refers to migration between the model’s 50 main regions (CMAs and non-CMAsNote 26).
- Intraregional migration pertains to migration on and off reserve, or within and outside Inuit Nunangat, within each of the main regions where there are Indian reserves or Inuit Nunangat regions.
Modelling internal migration includes several steps. The first steps aim to estimate interregional and intraregional migration based on the relationship between place of residence one year earlier and current place of residence (one-year mobility) contained in a database consisting of the 2001 and 2006 censuses and the 2011 NHS, to which a constant geography has been applied.Note 27
Interregional migration is modelled in two separate stages. First, complementary log-log regression models are used to estimate the probabilities of an individual leaving each region based on age, Aboriginal group, registered Indian status, immigrant status, time elapsed since immigration, marital status, place of birth, generation status, visible minority group, number of children at home, age of the youngest child at home, mother tongue, language spoken most often at home, knowledge of official languages and living on an Indian reserve or in an Inuit Nunangat region.Note 28 Migrants are then assigned a destination region. When assigning the destination region, logistic regression models first determine whether the migrants will settle in a francophone region and, if so, whether they will settle in a predominantly francophone region.Note 29 The models are stratified by immigrant status and by whether the place of residence is in or outside a francophone region and take into account the same variables used to estimate the probability of leaving a region. Matrices that take into account the region of origin, place of birth, mother tongue, Aboriginal group, registered Indian status, visible minority group and age are then used to determine the specific destination region from the model’s remaining 49 major regions. If the destination region includes Indian reserves or an Inuit Nunangat region, additional models which take into account registered Indian status or having an Inuit identity determine whether or not the individual will live on a reserve or in an Inuit Nunangat community in the destination region.Note 30 Intraregional migrations are simulated using origin- and destination-specific migration rates that take into account age and, as the case may be, Aboriginal identity and education.
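The two-stage logic of interregional migration (first decide whether the person leaves, then assign a destination from an origin-destination matrix) can be sketched as below. This is a deliberately reduced illustration: the regions and matrix values are hypothetical, the departure decision uses a single probability instead of a waiting time, and the francophone-region and on-reserve steps are omitted.

```python
import random

def choose_destination(origin, od_matrix, rng):
    """Draw a destination region from the origin's row of the matrix."""
    row = od_matrix[origin]
    regions = list(row)
    return rng.choices(regions, weights=[row[r] for r in regions], k=1)[0]

def simulate_move(origin, p_leave, od_matrix, rng):
    """Stage 1: does the person leave? Stage 2: where do they go?"""
    if rng.random() >= p_leave:
        return origin
    return choose_destination(origin, od_matrix, rng)

# Hypothetical three-region system; each row excludes the origin itself.
od_matrix = {
    "Region A": {"Region B": 0.7, "Region C": 0.3},
    "Region B": {"Region A": 0.6, "Region C": 0.4},
    "Region C": {"Region A": 0.5, "Region B": 0.5},
}
rng = random.Random(0)
stayer = simulate_move("Region A", 0.0, od_matrix, rng)
mover = simulate_move("Region A", 1.0, od_matrix, rng)
```

In a fuller version, the row of the matrix would be selected by the person's characteristics (place of birth, mother tongue, age and so on), not only by the region of origin.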
The following steps are a series of adjustments to the results of the regression models, the origin-destination matrices and the additional vectors. They are performed using the same database as the previous steps, but they correlate information on mobility over the past year with information on mobility over the last five years. These adjustments were intended to allow the Demosim migration parameters to reproduce (at the start of a projection) the contribution of net internal migration to the population growth of each region, as observed on average from 1996 to 2001, 2001 to 2006, and 2006 to 2011. One advantage of basing regional migration schemas on a longer period is reducing the weight of exceptional short-term phenomena that could substantially change net migration counts in a given year.
The purpose of the emigration module is to project net emigration, which is defined according to the components of the Statistics Canada Demographic Estimates Program (DEP) as the sum of emigration, plus net temporary emigration, minus return emigration. The emigration module has a similar structure to the fertility and mortality modules, as it combines base rates and regression results (see Box 3) for the population aged 18 and older, making it possible to take into account several characteristics, in particular immigrant status, which is known to be a predisposing factor for emigration.Note 31 For the population aged 18 and older, the base emigration rates were calculated by age and sex at the national level by dividing the net number of emigrants, as estimated by the Statistics Canada DEP from 2002/2003 to 2011/2012, by the population (excluding non-permanent residentsNote 32) from the same source for the same period.Note 33 They are combined with the results of a proportional hazards regression model (Cox model) that uses a linkage between the Longitudinal Administrative Database and immigration data from 1995 to 2010 to estimate the propensity of the adult population to emigrate, by country/region of birth, period of immigration, province or territory of residence, age and sex.
For the population aged 17 and younger, net emigration rates were calculated by age, sex, and province or territory using population estimates for 2002/2003 to 2011/2012.
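The structure described above for adults (a base rate by age and sex, modulated by regression results) can be illustrated with a minimal sketch. It assumes the two parts are combined multiplicatively, as a base rate times a proportional-hazards relative risk; the text does not give the exact formula, and the function names, the covariate name and all numbers are hypothetical.

```python
import math

def base_rate(net_emigrants, population):
    """Net emigration rate: net emigrants divided by the population at risk."""
    if population <= 0:
        raise ValueError("population must be positive")
    return net_emigrants / population

def adjusted_rate(base, coefficients, covariates):
    """Scale a base rate by a relative risk exp(sum(beta * x)).

    Assumption: multiplicative combination, as in a proportional hazards model.
    """
    log_relative_risk = sum(
        coefficients[name] * value for name, value in covariates.items()
    )
    return base * math.exp(log_relative_risk)

# Hypothetical figures, for illustration only.
rate = base_rate(net_emigrants=500, population=100_000)
coefs = {"recent_immigrant": math.log(2.0)}  # hypothetical relative risk of 2
doubled = adjusted_rate(rate, coefs, {"recent_immigrant": 1})
print(round(rate, 6), round(doubled, 6))
```

With every covariate at its reference value (0), the relative risk is 1 and the adjusted rate equals the base rate.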
Registration on the Indian Register and reclassification of registration category over an individual’s lifetime
The modules involving registration on the Indian Register have three separate purposes: 1) to model registrations that may occur during an individual’s lifetimeNote 34 as a result of legislative amendments or the agreement recognizing the Qalipu Mi’kmaq First Nation;Note 35 2) to model the reclassifications of registration category from 6(2) to 6(1) that may occur during an individual’s lifetime;Note 36 and 3) to model the late registration of individuals who were entitled to registration at birth.
- The legislative amendments that could cause individuals to register during their lifetime are the 1985 amendments to the Indian Act (Bill C-31) and the Gender Equity in Indian Registration Act (Bill C-3), which came into effect in January 2011. For these legislative amendments, target numbers of registrations were estimated from the number of registrations by year. For registrations under Bill C-31 from May 2011 to August 2014, target numbers were calculated from the actual registrations recorded on the Indian Register. For subsequent periods, target numbers were calculated by continuing the average downward trend in registrations observed in Indian Register data from 2007 to 2014. For Bill C-3, the initial targets were again the number of registrations from May 2011 to August 2014 according to the Indian Register. For subsequent years, target numbers of this type of registration were obtained from projections by Indigenous and Northern Affairs Canada. For registrations resulting from the agreement recognizing the Qalipu Mi’kmaq First Nation, the target numbers were calculated from the registrations that occurred between the order-in-council coming into force on September 22, 2011 (the date of the band’s creation) and the Supplemental Agreement of June 2013, whose impact on the number of Qalipu registrations is currently unknown. Once the targets had been determined, the individuals in the base population who were likely to register under these components were identified from among those who did not have registered Indian status (according to distributions specific to each component), and then their time of registration was determined in advance.Note 37 The vast majority of individuals selected in this way were initially Non-Status Indians.
- Reclassification from registration category 6(2) to category 6(1) may result from the application of Bill C-3, or for various other reasons.Note 38 The changes resulting from Bill C-3 were modelled using a method similar to the one described above for registration because of legislative amendments. Target numbers were first determined by age, sex and year of change. From May 2011 to August 2014, the numbers were determined using reclassifications that occurred during that period according to the Indian Register. For subsequent years, target numbers of reclassifications vary at the same rate as registrations under Bill C-3. For reclassifications not resulting from Bill C-3, annual reclassification rates were calculated by age and sex based on 2010 data from the Indian Register. They are assumed to be constant throughout the projection period, except from 2011 to 2014 where they were adjusted to reflect the reclassifications observed on the Indian Register.
- The late registration of individuals entitled to registration on the Indian Register at birth is modelled separately for children and adults. For children born during the simulation, the modelling is done in two steps. First, children entitled to Indian registration are identified from among the simulated births.Note 39 Entitlement depends on the mother’s registration category, whether or not she was in a mixed union at the time the child was born, and the inheritance rules for registered Indian status. Second, children who are entitled to registration but do not have registered Indian status at birth are assigned a probability of registering on the Indian Register that depends on their age and whether or not they live on an Indian reserve. The probabilities were derived so that they reproduce the progression by age observed in the NHS of the proportion of children with registered Indian status among children who are in principle entitled to register, namely the children with two parents who are Registered Indians, or with one parent who is in registration category 6(1).Note 40 Late registration rates derived from the same data are also applied to some children in the base population. Among adults aged 19 years and older, late registration rates by age and sex are applied to the populations at risk. These rates were calculated by dividing the average annual number of registrations of this type from 2008 to 2014, according to the Indian Register, by the 2011 estimated population of Non-Status Indians living off reserve who will not become registered as Qalipu or for legislative reasons.
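Two pieces of the late-registration logic above lend themselves to a short sketch: the in-principle entitlement rule (two registered parents, or one parent in category 6(1)) and the adult late registration rate (average annual registrations divided by the population at risk). The function names, the category labels as strings and the numbers are hypothetical, and the sketch leaves out the age and reserve dimensions used in Demosim.

```python
REGISTERED = {"6(1)", "6(2)"}

def entitled_at_birth(parent_a, parent_b):
    """In-principle entitlement: both parents registered, or one parent 6(1).

    Each argument is "6(1)", "6(2)" or None (no registered Indian status).
    """
    both_registered = parent_a in REGISTERED and parent_b in REGISTERED
    one_is_6_1 = "6(1)" in (parent_a, parent_b)
    return both_registered or one_is_6_1

def late_registration_rate(annual_registrations, population_at_risk):
    """Average annual number of registrations divided by the population at risk."""
    average = sum(annual_registrations) / len(annual_registrations)
    return average / population_at_risk

# Hypothetical counts, for illustration only.
example_rate = late_registration_rate([10, 20, 30], population_at_risk=1000)
print(entitled_at_birth("6(2)", None), example_rate)
```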
Intragenerational ethnic mobility of Aboriginal people
The purpose of the Aboriginal intragenerational ethnic mobility module is to simulate changes in reporting of Aboriginal group from one census to the next, a phenomenon that is behind a significant part of the increase in the number of Métis and First Nations people observed at least since 1986 in Canada.Note 41 The parameters of the intragenerational ethnic mobility of Aboriginal people were calculated using a residual method applied to the 1996, 2001 and 2006 censuses and the 2011 NHS, adjusted for net undercoverage. It involves calculating the share of the growth of a given Aboriginal group that remains unexplained after fertility, mortality and net migration have been taken into account, for each five-year period. This unexplained growth is interpreted as resulting from changes in the Aboriginal group reported in the censuses (or the NHS). The net gains in Métis and First Nations people obtained this way were divided by the population that was non-Aboriginal, non-immigrant and not belonging to a visible minority group at the start of the period to obtain probabilities of an individual joining the First Nations group and Métis group over five years, taking into account age and region of residence. The probabilities were averaged for the three periods considered (1996 to 2001, 2001 to 2006 and 2006 to 2011).Note 42
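The residual method described above can be written out as simple arithmetic. The sketch below is a simplified illustration for one group and one five-year period, with hypothetical function names and numbers; it ignores the age and region dimensions and the undercoverage adjustments mentioned in the text.

```python
def unexplained_growth(pop_start, pop_end, births, deaths, net_migration):
    """Growth left over once fertility, mortality and net migration are accounted for."""
    expected_end = pop_start + births - deaths + net_migration
    return pop_end - expected_end

def mobility_probability(net_gain, source_population):
    """Five-year probability of joining the group, per person in the source population."""
    return net_gain / source_population

def average_probability(period_probabilities):
    """Simple average across the periods considered."""
    return sum(period_probabilities) / len(period_probabilities)

# Hypothetical figures, for illustration only.
gain = unexplained_growth(pop_start=1000, pop_end=1200,
                          births=100, deaths=30, net_migration=10)
prob = mobility_probability(gain, source_population=60_000)
print(gain, prob, average_probability([prob, 0.004, 0.003]))
```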
Intragenerational religious mobility
Intragenerational religious mobility refers to changes in religion occurring over an individual’s lifetime. The probability of changing religions was estimated by religion, age, immigrant status and place of birth by applying a residual method, as for intragenerational ethnic mobility, to the 2001 Census and the 2011 NHS (and alternatively to the 1991 and 2001 censuses), adjusted for net undercoverage.Note 43 Net losses for a given religion were divided by the religion’s population at the start of the period to yield net ‘exit’ rates over 10 years. Individuals who leave a given religion are then distributed among the ‘gaining’ religions in proportion to the net gains recorded over the same period by these religions.
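The two steps above (a net exit rate for religions that lose members, then a proportional distribution among religions that gain members) can be sketched as follows. The religion labels, function names and counts are hypothetical, and the breakdown by age, immigrant status and place of birth is omitted.

```python
def exit_rate(net_loss, start_population):
    """Net 'exit' rate over the period: net loss divided by the starting population."""
    return net_loss / start_population

def destination_shares(net_gains):
    """Share of leavers sent to each 'gaining' religion, proportional to its net gain."""
    total = sum(net_gains.values())
    if total <= 0:
        raise ValueError("total net gain must be positive")
    return {religion: gain / total for religion, gain in net_gains.items()}

# Hypothetical figures, for illustration only.
rate = exit_rate(net_loss=50, start_population=1000)
shares = destination_shares({"religion_a": 30, "religion_b": 10})
print(rate, shares)
```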
Intragenerational linguistic mobility
The intragenerational linguistic mobility module models changes that may arise during one’s lifetime to the language spoken most often at home and to the knowledge of official languages.Note 44 It includes two separate sets of parameters:
- Changes in the language spoken most often at home have been estimated using logistic regression models that draw on data from linkages between the 2001 and 2006 censuses.Note 45 The models have been stratified by immigrant status, whether the place of residence is in or outside Quebec, and the linguistic profile at the start of the period, defined by the language spoken most often at home and the mother tongue.Note 46 The models estimated the likelihood of a change in the language spoken most often at home over the given five-year period, taking into account age, sex, region of residence, generation status, age at immigration, period of immigration, geolinguistic origin of immigrants and knowledge of official languages. Only the most frequent transitions have been modelled; therefore, the least common transitions based on the data from the linkage between the 2001 and 2006 censuses cannot occur during the simulation.Note 47
- Changes in the knowledge of official languages have been modelled using logistic regressions specific to the initial linguistic profile (defined on the basis of the knowledge of official languages and the language spoken most often at home), the immigrant status and whether the place of residence is in or outside Quebec. The models are estimated using data from the linkage between the 2006 and 2011 censuses, taking into account age, sex, time elapsed since immigration, age at immigration, generation status, place of birth, place of residence, education and mother tongue. Here too, only the most frequent transitions have been modelled, so the others cannot occur during the simulation.Note 48
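Both sets of parameters above share one feature that a sketch makes concrete: a transition has a non-zero probability only if a model was estimated for it. The sketch below assumes a plain binary logistic form and a lookup keyed by (origin, destination) profile; the profile labels, coefficient names and values are hypothetical and do not come from the Demosim models.

```python
import math

def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def transition_probability(origin, destination, covariates, models):
    """Probability of a transition over the period.

    Transitions without an estimated model cannot occur, so they return 0.0.
    """
    model = models.get((origin, destination))
    if model is None:
        return 0.0
    z = model["intercept"] + sum(
        model["coefs"].get(name, 0.0) * value for name, value in covariates.items()
    )
    return logistic(z)

# Hypothetical model table, for illustration only.
models = {("other", "english"): {"intercept": 0.0, "coefs": {"age": -0.01}}}
print(transition_probability("other", "english", {"age": 0}, models))
print(transition_probability("english", "other", {"age": 0}, models))
```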
Acquisition of Canadian citizenship
In Demosim, the acquisition of Canadian citizenship refers to the process through which immigrants become naturalized citizens after they arrive in Canada. This event was modelled by applying a logistic regression model to the 2011 NHS data, adjusted for net undercoverage, that estimates the likelihood of having obtained Canadian citizenship on the basis of the number of years elapsed since immigration, place of birth, immigrant admission category, visible minority group and age at immigration. Because of the rules for obtaining citizenship through naturalization, immigrants settling in Canada during the simulation cannot become citizens until they have lived in Canada for three years. As well, since the data indicate that the percentage of immigrants with Canadian citizenship stops increasing once 15 years have elapsed after immigration status is acquired, the rates of acquisition of citizenship are considered nil after 15 years of residence in Canada.
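The two time limits described above can be expressed as a simple gate around whatever probability the regression model yields. This is a minimal sketch: the function name is hypothetical, the model probability is passed in as a given number, and treating exactly 15 years as already nil is an assumption about the boundary that the text does not settle.

```python
def citizenship_probability(years_since_immigration, model_probability,
                            min_years=3, max_years=15):
    """Apply the residence window to a model-based acquisition probability.

    Returns 0.0 before `min_years` of residence and from `max_years` onward
    (boundary handling at exactly `max_years` is an assumption).
    """
    if years_since_immigration < min_years:
        return 0.0
    if years_since_immigration >= max_years:
        return 0.0
    return model_probability

# Hypothetical model probability of 0.3, for illustration only.
for years in (2, 5, 15):
    print(years, citizenship_probability(years, 0.3))
```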
Change in level of education
The last event projected using waiting times is change in education level. The probabilities associated with this event were derived by combining 2001 General Social Survey (GSS) and 2011 NHS data adjusted for net undercoverage which, together, include the information required for the projection. First, probabilities of change in education level by year of birth, age, sex and immigrant status were obtained by applying logistic regression models to historical data from the 2001 GSS. The population of the 2001 GSS was then projected to 2011 using the calculated probabilities, which were calibrated in three separate steps. An initial calibration was done to specifically reproduce the NHS distributions by education level, year of birth, age, sex and immigrant status. A second calibration added differential probabilities by visible minority group, Aboriginal identity and registered Indian status to reproduce the distributions of these groups in the NHS. The third calibration added differentials by province and territory of birth.
Characteristics imputed annually
Some components of Demosim are not meant to project events but rather to impute certain characteristics to individuals, including marital status, head-of-family and head-of-household status as well as labour force participation. These characteristics are assigned once a year on a fixed date.
Marital status
Marital status is a variable that is projected mainly for its use in determining other events during the simulation, particularly fertility, to which it is closely related. The marital status module is derived from logistic regression models that are estimated using data from the adjusted 2011 NHS for the population aged 15 and older. The initial models determine whether or not the individual is in a union. If the individual is in a union, other models determine the type of union (married or common law). The models are stratified by sex and by having or not having an Aboriginal identity. They take into account age, number of children at home, immigrant status, generation status, time elapsed since immigration, place of birth, mother tongue, visible minority group, religion, Aboriginal group, registered Indian status and place of residence. The probabilities derived from these models evolve during the projection based on trends observed in the 2001 and 2006 censuses and the 2011 NHS (adjusted), which showed an increasing propensity for couples to live in a common-law union. The marital status module finds its complement in the mixed union parameters that are used to assign characteristics such as generation status to newborns (see the “Creation of individuals during the simulation” section).Note 49
Head of household
A head-of-household statusNote 50 is assigned to individuals annually to obtain a projection of the number of private households by certain characteristics, including a household’s Aboriginal composition. The headship rates methodNote 51 is used to establish a relationship between the number of heads of household and the population, by certain characteristics of the projected population. This rate is then multiplied by the projected population to obtain a future number of private households. For the purposes of these projections, different types of households were identified in the data from the 2011 NHS according to a combination of household characteristics (Aboriginal composition, household size and the presence of individuals younger than 19 years of age, etc.Note 52). The number of heads of household for each type by age, Aboriginal group, registered Indian status, marital status and place of residence was determined and then divided by the total population with the same characteristics to obtain headship rates for use in the annual imputation of head-of-household status during the simulation.Note 53 Head-of-household status is used strictly to derive a number of households. It is not used as a determining factor for other events during simulation. The same holds true for labour force participation.
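The headship rates method described above reduces to two operations: dividing heads of household by population within each group, then multiplying those rates by a projected population. The sketch below uses a single hypothetical grouping variable (age group) and made-up counts; Demosim uses more characteristics and several household types.

```python
def headship_rates(heads, population):
    """Heads of household divided by population, for each group with population > 0."""
    return {
        group: heads.get(group, 0) / count
        for group, count in population.items()
        if count > 0
    }

def projected_households(rates, projected_population):
    """Number of households implied by applying the rates to a projected population."""
    return sum(
        rates.get(group, 0.0) * count
        for group, count in projected_population.items()
    )

# Hypothetical counts, for illustration only.
rates = headship_rates(heads={"25-34": 40, "35-44": 50},
                       population={"25-34": 100, "35-44": 100})
households = projected_households(rates, {"25-34": 200, "35-44": 100})
print(rates, households)
```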
Labour force participation
The purpose of the labour force participation module is to impute a status to individuals aged 15 and older regarding their labour force participation. The module has been designed to take into account differences in labour force participation among the various groups projected (for example, Aboriginal people, visible minority groups and immigrants). It includes two sets of parameters. The first is composed of labour force participation rates by sex and age group taken from Labour Force Survey (LFS) data. The parameters are then adjusted to take into account populations excluded from the survey, in particular Indian reserves. The second is composed of the results of logistic regressions that estimate (separately by sex and age group) the probability of being in the labour force by the following variables: Aboriginal group, registered Indian status, visible minority group, immigrant status, time elapsed since immigration, generation status, place of birth, marital status, presence of children and age of youngest child, education, knowledge of official languages, and place of residence. The logistic regressions use data from a file that combines data from the 2001 and 2006 censuses and the 2011 NHS (adjusted). These two sets of parameters are combined each January to determine the labour force participation for the upcoming year. An alternative version of this module, based only on the 2011 NHS, also includes immigrant admission category.
Creation of individuals during the simulation
Aside from individuals in the Demosim base population, individuals may be added to the population during the simulation as a result of births, immigration and the arrival of non-permanent residents. New individuals are added by creating complete records, that is, records having all the characteristics required for them to be projected by Demosim. The process for assigning characteristics to new individuals is described below.
Creation of newborns
The creation of newborns from births occurring after the start of the simulation requires the use of methods that differ according to the characteristic to be assigned to the new individuals. First, a number of characteristics to be assigned to newborns do not require any parameters to be calculated and can be assigned automatically, for example marital status (not in a union), education (less than high school), immigrant status (non-immigrant), Canadian citizenship (Canadian citizenship at birth), etc. Other characteristics are assigned probabilistically. The sex of the child is determined by applying a sex ratio of 105 boys born for every 100 girls, as has been observed in Canada for several decades. Religion, the three linguistic variables, visible minority group and Aboriginal group are assigned based on parameters derived by applying the own-children method to data from the adjusted 2011 NHS. Linking the youngest children in this data source to the woman most likely to be the mother makes it possible to calculate the probability that the child has a given set of characteristics, depending on the characteristics of the mother (see the characteristics considered in Table 2). Transition matrices and vectors were created for assigning religion, visible minority group and Aboriginal group. For language variables, the probabilities of the most frequent transitions come from multinomial logistic regression models, and the other probabilities from transition matrices. To maintain consistency among the language variables, the mother tongue and the language spoken most often at home are assigned at the same time using the same models.Note 54 For the same reason, the models for the transmission of knowledge of official languages take into account the results of the models above, in addition to the mother’s characteristics.
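The sex ratio quoted above translates directly into a probability: 105 boys for every 100 girls means a probability of 105/205 that a newborn is male. The sketch below shows that conversion and a random draw; the function names are hypothetical and the seeded generator is only there to make the illustration repeatable.

```python
import random

def probability_male(boys_per_100_girls=105):
    """Convert a sex ratio at birth into the probability that a newborn is male."""
    return boys_per_100_girls / (boys_per_100_girls + 100)

def assign_sex(rng):
    """Draw the sex of a newborn from the sex ratio at birth."""
    return "male" if rng.random() < probability_male() else "female"

rng = random.Random(2011)  # arbitrary seed, for repeatability only
newborns = [assign_sex(rng) for _ in range(10_000)]
print(probability_male(), newborns.count("male") / len(newborns))
```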
The methods for assigning registered Indian status, registration category and generation status differ from the methods above by indirectly taking into account information about the child’s father. Births in Demosim are generated by women, and women are not associated with a spouse. Therefore, it is not possible to directly know the father’s characteristics at the time of birth. However, spousal characteristics may be associated with mothers through mixed unions. This is done in Demosim when a child is born.
A first mixed-union module determines whether the mother is in a union with a category 6(1) Registered Indian, a category 6(2) Registered Indian or an individual not having registered Indian status. The probability that the mother is in one of these types of unions is estimated using a file derived from adjusted 2011 NHS microdata that—by using information on the relationship among members of the same census family—links women in a union who gave birth during the previous year to their spouse and their children. The same probabilities were also calculated using the 2001 Census to establish trends related to mixed unions. Registered Indian status (including the registration category) is then assigned to newborns probabilistically, using transition matrices that take into account the mother’s type of mixed union, as well as other characteristics of the mother and the child (Table 2).Note 55
A second mixed-union module uses the same data source to calculate the probability that the woman is in a union with a spouse whose immigrant status is identical or different at the time the child is born in order to determine the child’s generation status. The module consists of logistic regression models that take into account age, religion, visible minority group, Aboriginal groupNote 56, time elapsed since immigration, mother tongue, language spoken most often at home, presence of young children at home and place of residence.Note 57 Generation status is then assigned to the newborn as follows: the newborn is considered second generation if the mother is an immigrant not in a mixed union, 2.5 generation if the mother is in a mixed union, and third generation or higher if the mother is not an immigrant and not in a mixed union.
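The assignment rule stated above can be written as a small decision function. The function name and the string labels are hypothetical; the mixed-union flag is assumed to come from the logistic regression models just described.

```python
def generation_status(mother_is_immigrant, in_mixed_union):
    """Generation status of a newborn, following the rule described in the text.

    A mixed union here means the spouse's immigrant status differs from the mother's.
    """
    if in_mixed_union:
        return "2.5 generation"
    if mother_is_immigrant:
        return "second generation"
    return "third generation or higher"

for immigrant, mixed in [(True, False), (True, True), (False, True), (False, False)]:
    print(immigrant, mixed, generation_status(immigrant, mixed))
```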
Immigration also involves the creation of individuals possessing all the characteristics required for their simulation following their arrival in Canada. This module includes two main dimensions. First, the number of new immigrants is projected annually. Second, the characteristics of the new immigrants are determined using a donor imputation method, with donors being selected from among the immigrants in the Demosim base population. The result is a projected immigrant population whose composition is representative of the immigrant population of the donor pool (which itself may be a subset of the immigrant population, for example recently-admitted immigrants). Adjustments are also made to some of the characteristics that are likely to have changed between the time of immigration and the time of the 2011 NHS—the survey on which Demosim is based—so that they will be as close as possible to what they were at the time of arrival. For example, when a new immigrant is created, the age assigned to the new immigrant is the donor’s age at immigration (and not the donor’s current age); the marital status is imputed on arrival using Demosim’s annual marital status imputation parameters; and education on arrival is imputed using the Demosim education module.
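The donor imputation step above can be sketched as: pick a donor, copy the record, then reset the characteristics that must reflect the time of arrival. The field names are hypothetical, and the uniform random choice of donor is an assumption, since the text does not say how donors are drawn from the pool. Marital status and education are left empty here to stand for the re-imputation done by the corresponding Demosim modules.

```python
import random

def create_immigrant(donor_pool, rng):
    """Create a new immigrant record from a randomly chosen donor.

    Assumption: donors are drawn uniformly from the pool.
    """
    donor = rng.choice(donor_pool)
    record = dict(donor)  # copy, so the donor record is left untouched
    record["age"] = donor["age_at_immigration"]  # age on arrival, not current age
    record["marital_status"] = None  # to be imputed on arrival
    record["education"] = None       # to be imputed on arrival
    return record

# Hypothetical donor, for illustration only.
pool = [{"age": 40, "age_at_immigration": 28, "place_of_birth": "X",
         "marital_status": "married", "education": "university"}]
new_record = create_immigrant(pool, random.Random(0))
print(new_record)
```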
The final Demosim component that requires the creation of individuals is the arrival of new non-permanent residents. This component functions similarly to immigration. Like immigrants, non-permanent residents are projected in two steps: 1) determining an annual net gain in non-permanent residents; and 2) imputing the characteristics of the new non-permanent residents using donors who are selected from among the non-permanent residents in the Demosim base population.Note 58
To learn more about Demosim
To learn more about the 2017 version of Demosim, please refer to the analytical reports Immigration and Diversity: Population Projections for Canada and Its Regions, 2011 to 2036 (Statistics Canada 2017-1) and Language Projections for Canada, 2011 to 2036 (Statistics Canada 2017-2). The descriptions of certain concepts and of the selected assumptions and scenarios therein supplement the descriptions in this document.
You can also contact Demography Division, Statistics Canada, by email (email@example.com) or by telephone (1-866-767-5611).
References
AMOREVIETA-GENTIL, Marilyn, David DAIGNAULT, Norbert ROBITAILLE and Robert BOURBEAU. 2014. La mortalité des Indiens inscrits (1989-2008), report from the Groupe d’études sur la dynamique démographique des Indiens inscrits (GEDDII), University of Montréal, Demography Department.
AYDEMIR, Abdurrahman and Chris ROBINSON. 2006. Return and Onward Migration Among Working Age Men, Statistics Canada Catalogue no. 11F0019.
BÉLANGER, Alain, Éric CARON-MALENFANT, Laurent MARTEL and René VÉZINA. 2008. “Projecting Ethno-Cultural Diversity of the Canadian Population Using a Microsimulation Approach”, minutes of proceedings of the UNECE/Eurostat workshop on demographic projections in Bucharest (Romania).
BÉLANGER, Alain and Stéphane GILBERT. 2003. “The Fertility of Immigrant Women and their Canadian-Born Daughters”, Report on the Demographic Situation in Canada, 2002, Statistics Canada Catalogue no. 91-209.
BOHNERT, Nora, Patrice DION and Jonathan CHAGNON. 2014. “Projection of Emigration”, in BOHNERT, Nora, Jonathan CHAGNON, Simon COULOMBE, Patrice DION and Laurent MARTEL (editors), Population Projections for Canada (2013 to 2063), Provinces and Territories (2013 to 2038): Technical Report on Methodology and Assumptions, Statistics Canada Catalogue no. 91-620.
BOUCHER, Alexandre, Norbert ROBITAILLE and Éric GUIMOND. 2009. “La mobilité ethnique intergénérationnelle des enfants de moins de 5 ans chez les populations autochtones, Canada, 1996 et 2001”, Cahiers québécois de démographie, volume 38, no. 2.
BRITISH COLUMBIA PROVINCIAL HEALTH OFFICER. 2012. The Health and Well-Being of the Aboriginal Population: Interim Update, Provincial Health Officer’s Special Report.
CARON-MALENFANT, Éric, Anne GOUJON and Vegard SKIRBEKK. (forthcoming). “The religious mobility of immigrants in Canada”.
CARON-MALENFANT, Éric. 2015. “Demosim’s Population Projection Model: Updates and New Developments”, Proceedings of Statistics Canada Symposium 2014: Beyond traditional survey taking: adapting to a changing world.
CARON-MALENFANT, Éric and Alain BÉLANGER. 2006. “The Fertility of Visible Minority Women in Canada”, Report on the Demographic Situation in Canada, 2003 and 2004, Statistics Canada Catalogue no. 91-209.
CARON-MALENFANT, Éric, Patrice DION, André LEBEL and Dominic GRENIER. 2011. “Immigration et structure par âge de la population canadienne : quelles relations?”, Cahiers québécois de démographie, volume 40, no. 2.
CARON-MALENFANT, Éric, Simon COULOMBE, Eric GUIMOND, Chantal GRONDIN and André LEBEL. 2014. “Ethnic Mobility of Aboriginal Peoples in Canada Between the 2001 and 2006 Censuses”, Population, volume 69, no. 1.
CYR, André, Julien BÉRARD-CHAGNON, Patrice DION, Dominic GRENIER and Éric CARON-MALENFANT. 2010. From Traditional Demographic Calculations to Projections by Microsimulations, workshop presented at the 2010 International Methodology Symposium.
DESPLANQUES, Guy. 1993. “Mesurer les disparités de fécondité à l’aide du seul recensement”, Population, volume 48, no. 6.
DION, Patrice, Éric CARON-MALENFANT, Chantal GRONDIN and Dominic GRENIER. 2015. “Long-term Contribution of Immigration to Population Renewal in Canada: A Simulation”, Population and Development Review, volume 41, no. 1.
DION, Patrice, Nora BOHNERT, Simon COULOMBE and Laurent MARTEL. 2014. “Projection of Mortality”, in BOHNERT, Nora, Jonathan CHAGNON, Simon COULOMBE, Patrice DION and Laurent MARTEL (editors), Population Projections for Canada (2013 to 2063), Provinces and Territories (2013 to 2038): Technical Report on Methodology and Assumptions, Statistics Canada Catalogue no. 91-620.
GRABILL, Wilson R. and Lee Jay CHO. 1965. “Methodology for the Measurement of Current Fertility from Population Data on Young Children”, Demography, volume 2, no. 1.
GUIMOND, Eric. 1999. “Ethnic Mobility and the Demographic Growth of Canada’s Aboriginal Populations from 1986 to 1996”, Report on the Demographic Situation in Canada, 1998 and 1999, Statistics Canada Catalogue no. 91-209.
GUIMOND, Eric, Norbert ROBITAILLE and Sacha SENÉCAL. 2007. “Définitions floues et explosion démographique chez les populations autochtones du Canada de 1986 à 2001”, paper presented at the Statistiques sociales et diversité ethnique conference, organized jointly by the Quebec Inter-university Centre for Social Statistics (QICSS) and the Institut national d’études démographiques (Ined).
HOULE, René and Jean-Pierre CORBEIL. 2013. Methodological Document on the 2011 Census Language Data, Statistics Canada Catalogue no. 98-314-X2011051.
LEPAGE, Jean-François. 2011. “L’oubli des langues maternelles: les données du recensement sous-estiment-elles les transferts linguistiques?”, Cahiers québécois de démographie, volume 40, no. 2.
LI, Nan and Ronald LEE. 2005. “Coherent Mortality Forecast for a Group of Populations: An Extension of the Lee-Carter Method”, Demography, volume 42, no. 3.
MARTEL, Laurent, Éric CARON-MALENFANT, Jean-Dominique MORENCY, André LEBEL, Alain BÉLANGER and Nicolas BASTIEN. 2011. “Projected Trends to 2031 for the Canadian Labour Force”, Canadian Economic Observer, Statistics Canada Catalogue no. 11-010, August 2011.
MORENCY, Jean-Dominique. 2015. “Projections of Aboriginal Families and Households in Canada”, Proceedings of Statistics Canada Symposium 2014: Beyond traditional survey taking: adapting to a changing world.
MORENCY, Jean-Dominique and Éric CARON-MALENFANT. 2014. “Variations de la fécondité selon diverses caractéristiques au recensement”, presented at the seminar of the Association des démographes du Québec, Congrès de l’ACFAS 2014 (Montréal).
RAM, Bali. 2004. “New Estimates of Aboriginal Fertility 1966-1971 to 1996-2001”, Canadian Studies in Population, volume 31, no. 4.
SPIELAUER, Martin. 2010. “Persistence and Change of the Relative Difference in Educational Attainment by Ethnocultural Group and Gender in Canada”, Vienna Yearbook of Population Research, volume 8.
SPIELAUER, Martin. 2014. “The Relation Between Education and Labour Force Participation of Aboriginal Peoples: A Simulation Analysis Using the Demosim Population Projection Model”, Canadian Studies in Population, volume 41, no. 1-2.
STATISTICS CANADA. 2005. Population Projections of Visible Minority Groups, Canada, Provinces and Regions 2001 to 2017, Statistics Canada Catalogue no. 91-541.
STATISTICS CANADA. 2010. Projections of the Diversity of the Canadian Population, 2006 to 2031, Statistics Canada Catalogue no. 91-551.
STATISTICS CANADA. 2011. Population Projections by Aboriginal Identity in Canada, 2006 to 2031, Statistics Canada Catalogue no. 91-552.
STATISTICS CANADA. 2012. Census Dictionary: Census year, 2011, Statistics Canada Catalogue no. 98-301.
STATISTICS CANADA. 2013. National Household Survey Dictionary, 2011, Statistics Canada Catalogue no. 99-000.
STATISTICS CANADA. 2014. Population Projections for Canada (2013 to 2063), Provinces and Territories (2013 to 2038), Statistics Canada Catalogue no. 91-520.
STATISTICS CANADA. 2015. Census Technical Report: Coverage. 2011 Census, Statistics Canada Catalogue no. 98-303.
STATISTICS CANADA. 2015-2. Projections of the Aboriginal Population and Households in Canada, 2011 to 2036, Statistics Canada Catalogue no. 91-552.
STATISTICS CANADA. 2015-3. Demosim: An Overview of Methods and Data Sources: Demosim 2015, Statistics Canada Catalogue no. 91-621.
STATISTICS CANADA. 2016. Population and Family Estimation Methods at Statistics Canada, Statistics Canada Catalogue no. 91-528.
STATISTICS CANADA. 2017-1. Immigration and Diversity: Population Projections for Canada and Its Regions, 2011 to 2036, Statistics Canada Catalogue no. 91-551.
STATISTICS CANADA. 2017-2. Language Projections for Canada, 2011 to 2036, Statistics Canada Catalogue no. 89-657.
TJEPKEMA, Michael and Russell WILKINS. 2011. “Remaining Life Expectancy at Age 25 and Probability of Survival to Age 75, by Socio-Economic Status and Aboriginal Ancestry”, Health Reports, volume 22, no. 4.
UNITED NATIONS. 1973. Manual VII. Methods of Projecting Households and Families, Demographic Studies no. 54 ST/OA/SERA, New York.
VAN IMHOFF, Evert and Wendy POST. 1998. “Microsimulation methods for population projection”, Population, an English selection, 10th year, no. 1.
WILKINS, Russell, Michael TJEPKEMA, Cameron MUSTARD and Robert CHOINIÈRE. 2008. “The Canadian Census Mortality Follow-Up Study, 1991 Through 2001”, Health Reports, volume 19, no. 3.
WILLEKENS, Frans. 2011. “La microsimulation dans les projections de population”, Cahiers québécois de démographie, volume 40, no. 2.