Longitudinal Immigration Database (IMDB) Technical Report, 2019
8 Comparability

8.1 Historical coverage changes

Over the years, the coverage and content of the IMDB have evolved. The original IMDB_T1FF files included data only on immigrants who landed in Canada in 1980 or later. Since the 2013 IMDB release, the IMDB_T1FF files have also included non-permanent resident filers for the 1982 tax year onward. As a result, temporary resident permit information is now available for immigrants who had pre-admission experience in Canada.

In 2012, the IMDB underwent a redesign. The IMDB had initially included only individuals who obtained landed immigrant status in 1980 or later and had filed at least one tax return after becoming landed immigrants. With the redesign, coverage was broadened to include immigrants who obtained landed immigrant status in 1980 or later and have filed at least one tax return since 1982, whether or not they filed taxes after admission. Before this cycle, the IMDB retained at most the first 16 years of tax files for a given permanent resident (Dryburgh 2004); this cap no longer applies.
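The two coverage rules described above can be contrasted in a short Python sketch. The `Immigrant` record and the function names are hypothetical (they are not IMDB variables), and the sketch assumes that a return for the landing year itself counts as a return filed after admission.

```python
from dataclasses import dataclass, field


@dataclass
class Immigrant:
    """Hypothetical record: landing year plus the tax years with a T1 return."""
    landing_year: int
    tax_years: list = field(default_factory=list)


def in_scope_original(p: Immigrant) -> bool:
    # Original rule: landed in 1980 or later and filed at least one return
    # after becoming a landed immigrant (landing-year return assumed to count).
    return p.landing_year >= 1980 and any(y >= p.landing_year for y in p.tax_years)


def in_scope_redesign(p: Immigrant) -> bool:
    # 2012 redesign: landed in 1980 or later and filed at least one return
    # since 1982, whether before or after admission.
    return p.landing_year >= 1980 and any(y >= 1982 for y in p.tax_years)


def retained_tax_years(p: Immigrant, cap=None) -> list:
    # Before the redesign, at most the first 16 tax years were kept (cap=16);
    # the redesigned IMDB keeps them all (cap=None).
    years = sorted(p.tax_years)
    return years if cap is None else years[:cap]


# A temporary resident who filed in 1990 and 1993, then landed in 1995
# and filed no further returns.
pre_admission_filer = Immigrant(landing_year=1995, tax_years=[1990, 1993])
print(in_scope_original(pre_admission_filer))  # False
print(in_scope_redesign(pre_admission_filer))  # True
```

The example person only enters the database under the redesigned rule, which is the kind of pre-admission filer the change was meant to capture.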

The tax data in the IMDB initially came mainly from T1 forms, and only a select number of key person-level tax variables were retained. Starting with the 2006 IMDB, the T1FF files for 1982 and subsequent years were used, resulting in an initial linkage rate of 80%. From that point on, the IMDB excluded the 1980 and 1981 tax files, since information for these years is not available in the T1FF.

The Field Operations Support System (FOSS) was initially used to gather the immigration data included in the IMDB. For the 2013 immigration year onward, the Global Case Management System (GCMS) is used instead. As a result, IRCC no longer provides some variables. These legacy variables are available in the PNRF_extra file and are listed in the immigration component of the IMDB data dictionary.

With the 2018 IMDB release, further changes were made to the population integrated into the IMDB. First, the file structure changed: records for taxfilers and non-taxfilers were merged. Second, the coverage changed: the IMDB universe now includes all immigrants admitted since 1952, as well as all non-permanent residents since 1980. The tax records of non-permanent (temporary) residents are also now included in the IMDB_T1FF files.

8.2 Methodological changes

The methodology used to perform the record linkage has been modified over the years.

The initial IMDB linkage rate was 55% for the 1995 IMDB (Langlois and Dougherty 1997). Since then, the tools and methods used to perform record linkage have evolved, which explains the improvement in linkage rates over the years.

In the late 2000s, the linkage rate was approximately 81%. For the 2012 IMDB, information on dependants, available from the Canada Child Tax Benefit (CCTB) file, was used in the record linkage; this made it possible to link a greater proportion of immigrant children. Note that adding children does not improve the taxfiler rate, since most of these children are not taxfilers themselves. As a result of these methodological changes, the linkage rate of the 2014 IMDB was 89%.

For the 2015 IMDB, adding the Social Insurance Register (SIR), a database specifically for Social Insurance Number (SIN) data, to the linkage process increased the linkage rate to 97%. The SIR provides very high-quality data, and about 730,000 SINs are found exclusively on this register (Diaz-Papkovich 2017).

For the 2016 IMDB, a new record linkage process was used to make it easier to link the IMDB to other data sources: immigration data were linked to tax data via the SDLE (see section 4). From this point on, linkage is performed against Statistics Canada’s Derived Record Depository.

For the 2018 IMDB, to improve the record linkage results, a combination of Social Data Linkage Environment (SDLE) linkage results and results from the Linkage Control File (LCF) was used, as in the 2015 instalment. For the 2019 IMDB, SDLE linkage results were used exclusively.
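One way to picture combining two sets of linkage results is a record-by-record fallback: keep the SDLE link when there is one, and otherwise use the LCF link. This is a minimal sketch under that assumed precedence rule; the report does not describe how the two sources were actually reconciled, and all identifiers below are made up.

```python
def combine_links(record_ids, sdle_links, lcf_links):
    """Map each immigration record ID to a tax-side ID, or None if unlinked.

    Assumed precedence: the SDLE result is kept when present; the LCF result
    is used only to fill gaps.
    """
    combined = {}
    for rec_id in record_ids:
        link = sdle_links.get(rec_id)
        if link is None:
            link = lcf_links.get(rec_id)
        combined[rec_id] = link
    return combined


def linkage_rate(links):
    """Share of records that received a link."""
    if not links:
        return 0.0
    return sum(v is not None for v in links.values()) / len(links)


# Hypothetical identifiers
record_ids = ["R1", "R2", "R3", "R4"]
sdle = {"R1": "T100", "R2": "T200"}
lcf = {"R2": "T999", "R3": "T300"}

combined = combine_links(record_ids, sdle, lcf)
print(combined)                                            # R2 keeps its SDLE link
print(linkage_rate({r: sdle.get(r) for r in record_ids}))  # 0.5
print(linkage_rate(combined))                              # 0.75
```

The second source only adds links for records the first source missed, which is how a combination of results can raise the overall linkage rate.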

8.3 Historical database content changes

Please refer to the IMDB data dictionaries (immigration and tax components) for a complete description of file content.
Some key recent modifications to IMDB content are listed below.

In 2012, the 2009 IMDB underwent a redesign, and a flag was added to identify outliers in the T1FF files being created. The spouse identification number (SP_IDI) variable was introduced in the 2010 IMDB, making it possible to identify immigrants with immigrant spouses. The year and month of death were added to the 2013 IMDB, making it possible to identify deceased immigrants among those admitted to Canada in 1980 or later. Following the addition of non-permanent resident data to the IMDB, some temporary resident permit details (type, effective dates, etc.) have been available since the 2013 IMDB.
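With the year and month of death available, records of immigrants known to have died can be set aside before analysing a given reference period. The sketch below is hypothetical: the field names are not IMDB variable names, and treating a missing death month as December is an assumption. As noted in section 8.4.2, removing identified deaths still does not account for emigration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonRecord:
    """Hypothetical record with the death date split into year and month."""
    landing_year: int
    death_year: Optional[int] = None
    death_month: Optional[int] = None  # 1 to 12, or None if unknown


def alive_at_end_of(rec: PersonRecord, year: int, month: int = 12) -> bool:
    """True unless an identified death occurred in or before the given month."""
    if rec.death_year is None:
        return True  # no death identified (emigration is still not captured)
    # Assumption: an unknown death month is treated as December.
    death_month = rec.death_month if rec.death_month is not None else 12
    return (rec.death_year, death_month) > (year, month)


records = [
    PersonRecord(landing_year=1990, death_year=2011, death_month=6),
    PersonRecord(landing_year=2000),
]
still_alive_2012 = [r for r in records if alive_at_end_of(r, 2012)]
print(len(still_alive_2012))  # 1
```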

For the 2016 IMDB, flags were added to identify the Express Entry immigration category and Syrian refugee resettlement waves, and the year and month of citizenship were added.

For the 2018 IMDB, coverage changed: the IMDB was expanded to include permanent residents admitted from 1952 to 1979, along with the tax files of all non-permanent residents since 1980. In addition, several data modules were added to the IMDB: children, settlement, wages and details on Express Entry.

8.4 Comparability with other immigration data sources

The IMDB is one of many statistical programs that can be used to produce estimates for the immigrant population. In some instances, these estimates differ because of a number of factors, such as coverage and limitations related to the type of data (administrative data versus survey data versus census data). Some of these statistical programs and their differences from the IMDB are described in this section. The comparisons use the 2013 IMDB.

8.4.1 Longitudinal Administrative Databank (LAD)

The Longitudinal Administrative Databank (LAD) consists of a 20% longitudinal sample of Canadian taxfilers. It is linked to the IMDB, so that it includes a 20% sample of IMDB records and adds immigrant-specific variables, such as landing year, immigration category and marital status at admission. It contains information about individuals and census families. It is useful for longitudinal analyses that compare the income and mobility of immigrants with those of all Canadian taxfilers. Any analysis comparing immigrant taxfilers with the Canadian taxfiler population should use this dataset.

Note that the LAD contains fewer immigration variables than the IMDB. For example, pre-admission information, such as the number of work permits and study permits, is not available in the LAD, and neither is admission information such as intended occupation and destination province.

Table 10 shows the mean and median total income (XTIRC) for the 2012 tax year of immigrants who landed from 1982 to 2013, by gender, to illustrate how comparable the estimates produced from these two databases are. As expected, the mean and median total income by gender are similar for both data sources. The differences can be explained by the fact that the LAD is a 20% sample of Canadian taxfilers, whereas the IMDB is a census of linked immigrant taxfilers admitted to Canada since 1980. The population counts differ, but neither source should be used for population counts: the LAD is a sample of taxfilers, and the IMDB is limited to immigrant taxfilers. LAD population counts are estimated by applying a weight of 5 to each record.
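The LAD weighting and the closeness of the Table 10 estimates can be checked with a few lines of Python. The function names are illustrative; the figures are the individual-level values from Table 10, and expressing the difference relative to the LAD estimate is an arbitrary choice of reference.

```python
LAD_WEIGHT = 5  # each LAD record represents five taxfilers


def weighted_count(lad_records: int, weight: int = LAD_WEIGHT) -> int:
    """Estimated population represented by a number of LAD sample records."""
    return lad_records * weight


def pct_difference(imdb: float, lad: float) -> float:
    """IMDB estimate minus LAD estimate, as a percentage of the LAD estimate."""
    return 100.0 * (imdb - lad) / lad


# Individual-level 2012 total income, (IMDB, LAD), from Table 10
table10 = {
    ("Male", "mean"): (41_900, 41_700),
    ("Male", "median"): (29_400, 29_200),
    ("Female", "mean"): (28_700, 28_700),
    ("Female", "median"): (20_600, 20_500),
    ("Total", "mean"): (35_000, 34_900),
    ("Total", "median"): (24_200, 24_100),
}

for (group, stat), (imdb, lad) in table10.items():
    print(f"{group:<6} {stat:<6} {pct_difference(imdb, lad):+.1f}%")
```

Every individual-level difference in Table 10 is under 1%. Applying the same function to the Table 11 male mean (30,100 versus 29,500) gives about +2.0%, which reflects the larger gaps seen for a single landing cohort.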


Table 10
Comparability of the 2012 total income between the LAD and the IMDB for immigrants who landed in any year from 1982 to 2013

Population in number; mean and median total income in dollars.

               Male                          Female                        Total
               Population  Mean    Median    Population  Mean    Median    Population  Mean    Median
Individual
IMDB           2,776,700   41,900  29,400    2,906,000   28,700  20,600    5,682,690   35,000  24,200
LAD            2,686,300   41,700  29,200    2,803,100   28,700  20,500    5,489,390   34,900  24,100
Family
IMDB           ...         73,700  56,300    ...         69,900  51,400    ...         71,700  53,700
LAD            ...         73,800  56,500    ...         69,700  51,500    ...         71,700  53,900

... not applicable

In Table 11, the comparison is restricted to the 2012 total income of immigrants who landed in 2011. The differences between the IMDB and LAD estimates for this group are larger than those observed for immigrants who landed in any year from 1982 to 2013. This could be explained by the fact that the population of interest is smaller and more specific. The LAD estimates are derived from the records in the 20% sample that belong to immigrants who landed in 2011. These records do not necessarily correspond to 20% of this specific population in the IMDB; they are likely to make up a smaller proportion of it, since the sample was not drawn to be representative of this specific population. The IMDB estimates are derived from the linked immigrant population who landed in 2011 and filed taxes in 2012. Thus, LAD estimates may differ slightly from IMDB estimates when subsets of the population are examined.
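A rough way to see why estimates for a single landing cohort differ more is the standard error of a mean under simple random sampling without replacement, SE = sqrt((1 - f) × S² / n), where f is the sampling fraction, S the standard deviation and n the number of sampled records. The sketch below treats the LAD as such a sample with f = 0.20, which is a simplification, and uses a hypothetical income standard deviation of 30,000 dollars and a hypothetical cohort of 250,000 linked taxfilers; only the 5,682,690 figure comes from Table 10.

```python
import math


def se_mean_srswor(population_size: float, sampling_fraction: float, sd: float) -> float:
    """Standard error of a sample mean under simple random sampling without replacement."""
    n = sampling_fraction * population_size   # expected number of sampled records
    fpc = 1.0 - sampling_fraction             # finite population correction
    return math.sqrt(fpc * sd ** 2 / n)


F = 0.20        # LAD sampling fraction (treated here as a simple random sample)
SD = 30_000     # hypothetical standard deviation of total income, in dollars

se_all = se_mean_srswor(5_682_690, F, SD)    # all immigrant taxfilers (Table 10 IMDB count)
se_cohort = se_mean_srswor(250_000, F, SD)   # hypothetical size of one landing cohort

print(f"all landing years: about {se_all:.0f} dollars")     # about 25
print(f"single cohort:     about {se_cohort:.0f} dollars")  # about 120
```

With the same spread of incomes, the cohort standard error is about 4.8 times larger (it scales with the square root of the population ratio), so larger gaps between LAD and IMDB estimates are expected for narrow subpopulations.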


Table 11
Comparability of mean and median 2012 total income for immigrants who landed in 2011

Mean and median total income in dollars.

               Male              Female            Total
               Mean    Median    Mean    Median    Mean    Median
Individual
IMDB           30,100  22,400    18,900  14,100    24,300  17,800
LAD            29,500  22,100    18,700  13,900    23,900  17,500
Family
IMDB           49,900  39,300    48,200  37,100    49,000  38,200
LAD            49,300  39,000    48,000  36,600    48,600  37,800

8.4.2 Census

The census long-form questionnaire and the 2011 National Household Survey (NHS) collect data on immigrants from a proportion of the population (refer to the Census Program description for the exact proportion, which has varied over time). The data collected include place of birth, place of birth of parents, immigration status, year of immigration, age at immigration and citizenship. Since the 2016 Census, immigration category has also been available. The Census collects data on first-, second- and third-or-more-generation Canadians, whereas the IMDB contains data only on newcomers and their families. The Census also contains data on visible minorities, education, housing and language for the census year, but unless the landing year is a census year, it has no record of this information at admission. The Census does not allow for longitudinal study of the economic outcomes or long-term mobility of immigrants. More details on the Census Program are available on the Statistics Canada website, and a detailed technical report is also available.

The 2011 National Household Survey (NHS) estimated that over 4.6 million immigrants living in Canada in 2011 had landed from 1981 to 2011. Table 12 compares NHS and PNRF estimates of the immigrant population by landing decade. The 2013 PNRF should not be used to estimate population counts, even after identified death records are removed: doing so would overestimate the number of immigrants living in Canada who were admitted from 1981 to 2011, because the PNRF does not take emigration into account. Also, the PNRF covers only a subset of the immigrant population, as only taxfilers are included in this file. This may account for the lower population count in the PNRF than in the NHS for the most recent cohort of immigrants (2001 to 2011). For Table 12, deaths are identified using the Death_indicator (described in Section 7.2.2).


Table 12
Comparability of population estimates between the Longitudinal Immigration Database and the National Household Survey

Estimates in number of persons.

Landing decade    NHS estimates    2013 PNRF estimates
1981 to 1990      949,890          1,052,650
1991 to 2000      1,539,055        1,896,235
2001 to 2011      2,154,985        2,120,290
Total             4,643,930        5,069,175
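The direction of the differences in Table 12 can be read from the ratio of PNRF to NHS counts for each landing decade. This Python sketch uses the Table 12 figures only; a ratio above 1 is consistent with emigration not being removed from the PNRF, and a ratio below 1 with the exclusion of non-filers, as discussed above.

```python
nhs = {"1981 to 1990": 949_890, "1991 to 2000": 1_539_055, "2001 to 2011": 2_154_985}
pnrf = {"1981 to 1990": 1_052_650, "1991 to 2000": 1_896_235, "2001 to 2011": 2_120_290}


def ratios(numerator: dict, denominator: dict) -> dict:
    """Ratio numerator/denominator for each key of the denominator."""
    return {k: numerator[k] / denominator[k] for k in denominator}


pnrf_to_nhs = ratios(pnrf, nhs)
for decade, r in pnrf_to_nhs.items():
    print(f"{decade}: PNRF/NHS = {r:.3f}")
# 1981 to 1990: PNRF/NHS = 1.108
# 1991 to 2000: PNRF/NHS = 1.232
# 2001 to 2011: PNRF/NHS = 0.984
```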

8.4.3 Longitudinal Survey of Immigrants to Canada (LSIC)

The Longitudinal Survey of Immigrants to Canada (LSIC) was designed to provide information on how new immigrants adjust to life in Canada during their first four years of settlement, and to understand the factors that can help or hinder this adjustment. Data were collected in three waves from immigrants aged 15 years and older who landed in Canada from abroad between October 1, 2000, and September 30, 2001. The LSIC supports studies on language proficiency, housing, education, foreign credential recognition, employment, health, values and attitudes, the development and use of social networks, income, and perceptions of settlement in Canada. The IMDB contains characteristics such as education and language only at admission, whereas the LSIC allows changes in these characteristics to be evaluated over time. Additional information on the LSIC is available on the Statistics Canada website.

The LSIC estimated that 164,200 immigrants aged 15 years and older landed in Canada from abroad between October 1, 2000, and September 30, 2001. The 2013 IMDB estimate for the same population, calculated from the PNRF, is 156,470 (Table 13). Some of this difference is due to the combined effect of excluding non-filers from the PNRF estimate and of emigration not being captured in the IMDB. Part of the difference is also explained by the fact that the LSIC is a survey, so its estimates are subject to sampling variability. As shown in Table 14, the coverage of the PNRF relative to the LSIC varies across age groups, even though the entire LSIC population is of tax-filing age. Note that the LSIC age is the age approximately six months after admission, while the IMDB age is the age at admission. Also, the LSIC estimates were calculated using wave 1 weights, which were designed to estimate the number of immigrants in this cohort still living in Canada six months after admission. The lower proportion of immigrants aged 65 years and older in the IMDB could be due to a lower proportion of taxfilers in this age group. The higher number of immigrants aged 15 to 24 in the IMDB than in the LSIC likely results from emigration not being accounted for in the IMDB.


Table 13
Gender distribution: Longitudinal Immigration Database compared to Longitudinal Survey of Immigrants to Canada

          LSIC                 2013 PNRF
          Number    Percent    Number    Percent
Male      81,550    49.7       77,640    49.6
Female    82,650    50.3       78,830    50.4
Total     164,200   100.0      156,470   100.0

Table 14
Age group distribution: Longitudinal Immigration Database compared to Longitudinal Survey of Immigrants to Canada

               LSIC                 2013 PNRF
Age group      Number    Percent    Number    Percent
15 to 24       26,730    16.3       27,990    17.9
25 to 34       65,500    39.9       63,050    40.3
35 to 49       53,970    32.9       49,030    31.3
50 to 64       12,890    7.8        12,280    7.8
65 and over    5,100     3.1        4,120     2.6
Total          164,200   100.0      156,470   100.0
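Table 14 can be summarized the same way, as the ratio of PNRF to LSIC counts in each age group. The counts are copied from the table, and the ratio is only a descriptive summary; it does not separate the effects of non-filing, emigration and survey variability discussed above.

```python
lsic = {"15 to 24": 26_730, "25 to 34": 65_500, "35 to 49": 53_970,
        "50 to 64": 12_890, "65 and over": 5_100}
pnrf = {"15 to 24": 27_990, "25 to 34": 63_050, "35 to 49": 49_030,
        "50 to 64": 12_280, "65 and over": 4_120}

# PNRF count relative to the LSIC estimate for each age group
coverage = {age: pnrf[age] / lsic[age] for age in lsic}
for age, r in coverage.items():
    print(f"{age:<12} PNRF/LSIC = {r:.3f}")
# Ratios: 1.047, 0.963, 0.908, 0.953, 0.808 (youngest to oldest group)
```

Only the 15 to 24 group has more records in the PNRF than in the LSIC, and the 65 and over group has the lowest relative coverage, matching the pattern described in the text.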

8.5 Discussion of the IMDB with different linkages

To enhance the analytical capacity of the IMDB, several data sources have been integrated, including the Census, the Canadian Community Health Survey (CCHS), the Discharge Abstract Database (DAD), the General Social Survey (GSS), the Longitudinal Administrative Databank (LAD) and the Longitudinal Survey of Immigrants to Canada (LSIC). A brief overview of each follows.

8.5.1 Census

Conducted every five years, the Census of Population is the primary source of sociodemographic data for specific population groups such as lone-parent families, Aboriginal peoples, immigrants, seniors and language groups. Adjusted population counts from the Census are used as the base for the Population Estimates Program.

The Census uses two questionnaires: the short form and the long form. In the 2016 Census, the short form was used to enumerate all usual residents of private dwellings, as well as residents who were overseas, which in 2016 included Canadian government employees (federal and provincial), members of the Canadian Forces, and their families. It contains questions on basic demographic information, such as age, sex, knowledge of official languages and household composition.

In 2016, a 25% sample of Canadian households received the long-form questionnaire, which covers topics such as level of education, activity limitations and ethnic origin. Income data were obtained from personal income tax and benefits files, and additional immigration data on admission category were obtained from administrative files from Immigration, Refugees and Citizenship Canada.

The Census undergoes a complex process of frame and sample design, collection, coding, edit and imputation, and certification before dissemination. For more information on any aspect of the 2016 Census, please refer to the Guide to the Census of Population, 2016, or to other available reference material.

8.5.2 Canadian Community Health Survey (CCHS)

The Canadian Community Health Survey (CCHS) is a joint project of Statistics Canada and Health Canada. The annual component of the CCHS collects cross-sectional information about the health, health behaviours and health care use of the non-institutionalised household population aged 12 and older.

The survey excludes full-time members of the Canadian Forces and residents of reserves and some remote areas, together representing about 4% of the target population. The CCHS was first conducted in 2001 (cycle 1.1) and was repeated every two years until 2005 (cycle 3.1), each time with a sample size of approximately 130,000. Starting in 2007, the survey was conducted annually, with a sample size of 65,000. Response rates ranged from 69.8% to 78.9%. Details about the sampling strategy and content are available in the CCHS user guide and data documentation, which can be obtained from your Research Data Centre (RDC) analyst.

The CCHS focus content surveys are designed to provide cross-sectional, provincial-level results on specific health topics. Two focus content cycles were used in this linkage project: the CCHS Mental Health and Well-being cycles of 2002 and 2012, which collected information about mental disorders, mental health system use, and disability associated with mental health problems among the household population aged 15 and older. A detailed technical report is also available.

For more information regarding the CCHS, please refer to the Canadian Community Health Survey on the Statistics Canada website.

8.5.3 Discharge Abstract Database (DAD)

The Discharge Abstract Database (DAD) is a national database collecting administrative, clinical and demographic information on all separations from acute care institutions, including discharges, deaths, sign-outs and transfers, within a fiscal year (April 1 to March 31). Over time, the DAD has been extended to capture data on day-surgery procedures, rehabilitation, long-term care and other types of care. Note that the DAD is event-based: a person hospitalised more than once in a fiscal year has more than one record. Collection requirements vary by data year and by jurisdiction.
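Because the DAD is event-based and organized by fiscal year, counting people requires collapsing records per person within an April-to-March year. The sketch below assumes a hypothetical person key attached to each abstract; the records and keys are invented for illustration and do not reflect the DAD record layout.

```python
from collections import Counter
from datetime import date


def fiscal_year(d: date) -> str:
    """Fiscal year label running April 1 to March 31, e.g. '2015-16'."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}-{(start + 1) % 100:02d}"


# Hypothetical abstracts: (person key, separation date), one row per event.
abstracts = [
    ("P1", date(2015, 5, 2)),
    ("P1", date(2016, 1, 17)),
    ("P2", date(2015, 9, 30)),
    ("P3", date(2015, 12, 11)),
    ("P3", date(2016, 2, 3)),
    ("P3", date(2016, 4, 20)),  # falls in the next fiscal year
]


def events_and_persons(rows, fy):
    """Count separations (events) and distinct persons within one fiscal year."""
    per_person = Counter(pid for pid, d in rows if fiscal_year(d) == fy)
    return sum(per_person.values()), len(per_person)


print(events_and_persons(abstracts, "2015-16"))  # (5, 3)
```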

More than 3.2 million abstracts are submitted annually to the DAD, representing approximately 75% of all acute inpatient separations in Canada. Quebec does not submit data to the DAD; Quebec’s acute inpatient separations are reported to the Hospital Morbidity Database (HMDB) and usually account for 25% of total inpatient separations in Canada. About 2.4 million day surgery abstracts are submitted to the Canadian Institute for Health Information (CIHI) annually; approximately 35% are sent to the DAD and 65% are sent to the National Ambulatory Care Reporting System (NACRS).

The population of reference usually includes all separations from acute inpatient care and day surgery institutions in Canada (excluding stillbirths and cadaveric donor cases) from April 1 to March 31. All acute care data except that from Quebec is submitted to the DAD; Quebec acute care data is submitted via Quebec’s ministère de la Santé et des Services sociaux once per year and is included in the HMDB. Day surgery data from Ontario, Alberta and Nova Scotia is submitted to NACRS.

For more information regarding the DAD, please refer to the Discharge Abstract Database on the CIHI website. A detailed technical report is also available.

8.5.4 General Social Survey (GSS)

Canada’s General Social Survey (GSS) is an independent, annual, cross-sectional survey in which each cycle examines one topic in depth in order to monitor changes in living conditions. The GSS measures the well-being of Canadians and can yield information on specific social policy issues. Each survey collects in-depth sociodemographic data such as age, sex, education, religion, ethnicity and income.

The GSS covers several core topics, including families, caregiving, time use, victimisation and volunteering. Each of the six survey themes is repeated approximately every five years.

Until 1998, its sample size was set at 10,000. In 1999, the target increased to 25,000 persons. The larger sample size allows basic estimates to be produced at the national and provincial levels and for certain census metropolitan areas (CMAs).

For more information regarding the GSS, please refer to the General Social Survey on the Statistics Canada website.

8.5.5 Longitudinal Administrative Databank (LAD)

The LAD is a random 20% sample of the T1 Family File (T1FF) tax database. Selection for the LAD is based on an individual’s SIN. There is no age restriction, but people without a SIN can only be included in the family component. Once selected for the LAD, an individual remains in the sample and is picked up from the T1FF in each year that he or she appears on the T1. Individuals selected for the LAD are linked across years by a unique, non-confidential LAD identification number (LIN__I) generated from the SIN, creating a longitudinal profile of each individual.
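Because selection depends only on the SIN, a selected person stays selected in every year they file, and new filers are drawn at the same rate. The sketch below illustrates that property with a keyed hash. It is an assumption made for illustration only: the report does not describe how the LAD actually selects SINs or how LIN__I is generated, and the key and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; a real system would keep any such key confidential.
SECRET_KEY = b"illustrative-key-only"


def _digest(sin: str) -> bytes:
    return hmac.new(SECRET_KEY, sin.encode("utf-8"), hashlib.sha256).digest()


def in_sample(sin: str, rate_percent: int = 20) -> bool:
    """Deterministic selection: a given SIN always gets the same decision."""
    bucket = int.from_bytes(_digest(sin)[:8], "big") % 100
    return bucket < rate_percent


def panel_id(sin: str) -> str:
    """Hypothetical pseudonymous ID derived from the SIN (not the real LIN__I rule)."""
    return _digest(sin).hex()[:16]


def build_panel(filers_by_year):
    """Map a tax year -> set of SINs filing that year to {panel ID: years present}."""
    panel = {}
    for year in sorted(filers_by_year):
        for sin in filers_by_year[year]:
            if in_sample(sin):
                panel.setdefault(panel_id(sin), []).append(year)
    return panel


# Fake nine-digit identifiers, not real SINs.
filers = {2014: {"000000001", "000000002"}, 2015: {"000000001"}}
print(build_panel(filers))
```

A person absent from the T1 in some year simply has a gap in their list of years; they are picked up again whenever they file.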

The LAD is augmented each year with a sample of new tax filers so that it consists of approximately 20% of tax filers in every year. The 20% sample grew from 3,227,485 people in 1982 to 5,579,280 in 2016, an increase of 73%. This growth reflects both the growth of the Canadian population and the higher incidence of tax filing following the introduction of the federal sales tax credit in 1986 and the Goods and Services Tax credit in 1989.
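The 73% figure follows directly from the two sample counts, and multiplying the 2016 count by a weight of 5 (the LAD weight mentioned in section 8.4.1) gives the approximate number of tax filers it implies. This is simple arithmetic on the figures above, not an official population estimate.

```python
def pct_increase(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return 100.0 * (end - start) / start


LAD_1982 = 3_227_485
LAD_2016 = 5_579_280
LAD_WEIGHT = 5  # each record stands for about five tax filers (20% sample)

print(f"{pct_increase(LAD_1982, LAD_2016):.1f}%")  # 72.9%
print(f"{LAD_2016 * LAD_WEIGHT:,}")                # 27,896,400 implied tax filers in 2016
```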

For more information regarding the LAD, please refer to the Longitudinal Administrative Databank or to the LAD Data Dictionary.

8.5.6 Longitudinal Survey of Immigrants to Canada (LSIC)

The Longitudinal Survey of Immigrants to Canada (LSIC) captures information to better understand the lives of recent immigrants to Canada. It is designed to cover the first four years of their settlement in Canada, providing indicators of how immigrants are meeting challenges such as learning or becoming more fluent in one or both of Canada’s official languages, participating in the labour market, and accessing education or training. The first four years are a time when immigrants build their economic, social and cultural ties to Canada.

The objective of the survey is to study the lives of new immigrants in Canada and their adjustment over time, and to see what facilitates and hinders their integration into Canadian society.

The target population for the survey consists of immigrants aged 15 years and older who landed in Canada from abroad between October 1, 2000, and September 30, 2001. The population of interest to the LSIC consists of those immigrants still living in Canada at the time of the interview.

The survey was conducted in three waves, each with a separate questionnaire that was put through a rigorous testing process.

For more information regarding the LSIC, please refer to Longitudinal Survey of Immigrants to Canada.

For more details on the LAD, please refer to the description available on the Statistics Canada website or enquire about a detailed technical report.
