Longitudinal Immigration Database (IMDB) - Technical Report, 2015
Appendices
A) Links to key IMDB documents and web pages
Dictionaries (tax and immigration component):
Available to data users or upon request by contacting Statistics Canada by email at STATCAN.infostats-infostats.STATCAN@canada.ca
http://www5.statcan.gc.ca/COR-COR/COR-COR/objList?lang=eng&srcObjType=SDDS&srcObjId=5057&tgtObjType=ARRAY
http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getInstanceList&Id=7196
http://www5.statcan.gc.ca/COR-COR/COR-COR/objList?lang=eng&srcObjType=SDDS&srcObjId=5057&tgtObjType=DAILYART
http://www5.statcan.gc.ca/COR-COR/COR-COR/objList?lang=eng&srcObjType=SDDS&srcObjId=5057&tgtObjType=STUDIES
The Consumer Price Index (62-001-X):
http://www5.statcan.gc.ca/olc-cel/olc.action?lang=en&ObjId=62-001-X&ObjType=2
Description of the annual Income Estimates for Census Families and Individuals (T1 Family File):
http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=4105&lang=fr&db=imdb&adm=8&dis=2.
B) Coverage
The 2015 IMDB was used to produce these counts. Filers are linked immigrants who have filed a tax return at least once since 1982.
Landing year | Taxfilers: immigrants | Taxfilers: PR | Taxfilers: PR with NPR permit | Taxfilers: deaths | Non-taxfilers: immigrants | Non-taxfilers: PR | Non-taxfilers: PR with NPR permit | Non-taxfilers: deaths | Total: immigrants | Total: PR | Total: PR with NPR permit | Total: taxfilers (percent) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1980 | 121,400 | 116,500 | 4,900 | 17,900 | 21,700 | 21,200 | 600 | 1,500 | 143,100 | 137,700 | 5,500 | 84.8 |
1981 | 108,100 | 94,700 | 13,400 | 16,000 | 20,500 | 19,300 | 1,100 | 1,300 | 128,600 | 114,100 | 14,500 | 84.1 |
1982 | 103,800 | 87,700 | 16,100 | 14,200 | 17,300 | 16,200 | 1,100 | 1,200 | 121,100 | 103,900 | 17,200 | 85.7 |
1983 | 77,500 | 62,900 | 14,600 | 11,300 | 11,600 | 10,800 | 800 | 1,000 | 89,000 | 73,600 | 15,400 | 87.1 |
1984 | 77,800 | 61,200 | 16,600 | 10,600 | 10,200 | 9,400 | 800 | 900 | 88,000 | 70,600 | 17,400 | 88.4 |
1985 | 75,300 | 59,100 | 16,300 | 9,200 | 8,600 | 8,100 | 500 | 600 | 83,900 | 67,200 | 16,800 | 89.7 |
1986 | 89,200 | 65,400 | 23,800 | 9,500 | 9,600 | 9,000 | 600 | 500 | 98,800 | 74,400 | 24,400 | 90.3 |
1987 | 137,500 | 102,900 | 34,600 | 11,800 | 13,700 | 12,900 | 700 | 600 | 151,200 | 115,800 | 35,300 | 90.9 |
1988 | 145,300 | 127,700 | 17,600 | 11,300 | 15,400 | 14,500 | 900 | 500 | 160,800 | 142,300 | 18,500 | 90.4 |
1989 | 172,500 | 147,900 | 24,600 | 11,600 | 18,200 | 16,800 | 1,400 | 500 | 190,700 | 164,700 | 26,000 | 90.5 |
1990 | 194,400 | 160,400 | 34,000 | 12,200 | 21,100 | 19,400 | 1,700 | 500 | 215,400 | 179,700 | 35,700 | 90.3 |
1991 | 210,800 | 139,500 | 71,400 | 13,200 | 21,000 | 18,100 | 2,900 | 500 | 231,800 | 157,600 | 74,200 | 90.9 |
1992 | 231,200 | 146,700 | 84,500 | 13,300 | 22,700 | 19,200 | 3,500 | 400 | 253,900 | 165,800 | 88,100 | 91.1 |
1993 | 233,800 | 168,100 | 65,700 | 12,500 | 21,900 | 19,100 | 2,800 | 500 | 255,700 | 187,200 | 68,500 | 91.4 |
1994 | 202,000 | 163,100 | 38,800 | 10,100 | 21,600 | 19,900 | 1,700 | 400 | 223,600 | 183,100 | 40,500 | 90.3 |
1995 | 191,500 | 150,900 | 40,600 | 8,200 | 20,600 | 19,100 | 1,600 | 300 | 212,200 | 170,000 | 42,200 | 90.2 |
1996 | 201,700 | 159,200 | 42,500 | 7,000 | 23,700 | 22,000 | 1,700 | 300 | 225,400 | 181,200 | 44,200 | 89.5 |
1997 | 192,500 | 156,300 | 36,200 | 5,800 | 22,900 | 21,600 | 1,300 | 200 | 215,500 | 177,900 | 37,500 | 89.3 |
1998 | 158,700 | 126,400 | 32,300 | 4,200 | 15,000 | 13,800 | 1,200 | 190 | 173,700 | 140,200 | 33,500 | 91.4 |
1999 | 172,600 | 137,800 | 34,800 | 4,000 | 16,800 | 15,600 | 1,200 | 150 | 189,400 | 153,400 | 36,000 | 91.1 |
2000 | 205,200 | 165,300 | 39,900 | 4,100 | 21,600 | 20,200 | 1,400 | 180 | 226,700 | 185,400 | 41,300 | 90.5 |
2001 | 222,900 | 183,800 | 39,100 | 4,200 | 26,900 | 25,300 | 1,500 | 170 | 249,800 | 209,100 | 40,700 | 89.2 |
2002 | 199,700 | 166,500 | 33,200 | 3,700 | 28,600 | 27,200 | 1,300 | 170 | 228,200 | 193,700 | 34,500 | 87.5 |
2003 | 190,000 | 156,700 | 33,300 | 3,100 | 30,600 | 29,200 | 1,300 | 140 | 220,500 | 185,900 | 34,600 | 86.2 |
2004 | 199,300 | 156,200 | 43,100 | 2,500 | 36,000 | 34,100 | 1,900 | 130 | 235,300 | 190,300 | 45,000 | 84.7 |
2005 | 217,600 | 167,800 | 49,800 | 2,100 | 44,200 | 41,700 | 2,500 | 80 | 261,800 | 209,400 | 52,300 | 83.1 |
2006 | 208,300 | 154,200 | 54,100 | 2,100 | 42,800 | 40,100 | 2,700 | 110 | 251,100 | 194,200 | 56,900 | 83.0 |
2007 | 193,300 | 140,900 | 52,400 | 1,700 | 42,900 | 40,200 | 2,700 | 90 | 236,200 | 181,100 | 55,100 | 81.8 |
2008 | 198,200 | 143,700 | 54,500 | 1,500 | 48,400 | 45,800 | 2,600 | 90 | 246,600 | 189,500 | 57,100 | 80.4 |
2009 | 201,500 | 143,700 | 57,800 | 1,300 | 50,100 | 47,200 | 3,000 | 90 | 251,600 | 190,800 | 60,800 | 80.1 |
2010 | 216,500 | 156,000 | 60,500 | 1,000 | 63,600 | 59,800 | 3,800 | 70 | 280,100 | 215,800 | 64,300 | 77.3 |
2011 | 189,600 | 135,300 | 54,300 | 700 | 58,500 | 54,400 | 4,100 | 40 | 248,100 | 189,700 | 58,500 | 76.4 |
2012 | 196,400 | 135,500 | 60,900 | 600 | 60,800 | 56,400 | 4,500 | 10 | 257,200 | 191,900 | 65,300 | 76.4 |
2013 | 195,100 | 130,500 | 64,600 | 500 | 63,400 | 58,500 | 4,900 | 10 | 258,500 | 189,000 | 69,500 | 75.5 |
2014 | 192,000 | 109,400 | 82,600 | 180 | 67,500 | 61,500 | 6,000 | 10 | 259,500 | 171,000 | 88,600 | 74.0 |
2015 | 185,300 | 102,500 | 82,800 | 30 | 85,700 | 78,500 | 7,200 | 0 | 271,000 | 181,000 | 90,000 | 68.4 |
Total | 6,308,500 | 4,782,400 | 1,526,200 | 243,210 | 1,125,700 | 1,046,100 | 79,500 | 13,430 | 7,434,000 | 5,828,200 | 1,605,900 | 84.9 |
Source: Statistics Canada, 2015 Longitudinal Immigration Database. |
Sex and cohorts | 0 to 14 | 15 to 24 | 25 to 34 | 35 to 44 | 45 to 64 | 65 and older | Total |
---|---|---|---|---|---|---|---|
Age at landing, percent | |||||||
1980 to 1989 cohorts | |||||||
Male | 82.9 | 94.1 | 95.0 | 93.6 | 86.6 | 61.9 | 89.4 |
Female | 82.2 | 92.0 | 93.2 | 92.8 | 83.1 | 59.9 | 87.2 |
Total | 82.6 | 93.0 | 94.1 | 93.2 | 84.6 | 60.7 | 88.7 |
1990 to 1999 cohorts | |||||||
Male | 83.3 | 94.4 | 93.7 | 93.2 | 91.3 | 77.6 | 90.6 |
Female | 82.0 | 94.6 | 94.3 | 93.7 | 90.3 | 77.0 | 90.5 |
Total | 82.6 | 94.5 | 94.0 | 93.4 | 90.8 | 77.3 | 90.7 |
2000 to 2009 cohorts | |||||||
Male | 50.2 | 95.7 | 92.7 | 93.2 | 93.3 | 89.7 | 83.6 |
Female | 49.2 | 96.0 | 94.5 | 94.6 | 94.1 | 88.6 | 85.4 |
Total | 49.8 | 95.9 | 93.7 | 93.9 | 93.7 | 89.1 | 84.2 |
2010 to 2015 cohorts | |||||||
Male | 6.7 | 85.4 | 94.2 | 92.3 | 90.0 | 84.4 | 73.1 |
Female | 6.8 | 87.8 | 94.8 | 93.7 | 90.3 | 83.4 | 76.1 |
Total | 6.7 | 86.7 | 94.5 | 93.0 | 90.2 | 83.8 | 73.4 |
Source: Statistics Canada, 2015 Longitudinal Immigration Database. |
C) Previous analysis
Since its creation, the IMDB has been used to produce several analyses. The following is a summary of some Statistics Canada studies that have made use of the IMDB.
In recent years, several releases in The Daily have featured the IMDB. The subjects discussed include changes in the regional distribution of new immigrants to Canada, income and mobility of immigrants, immigrants in the hinterlands, and immigrants who leave Canada. These articles are accessible via the Statistics Canada website.
Papers using the IMDB have been published in the Perspectives on Labour and Income publication series (75-001-X) and the Analytical Studies Branch Paper Series. Among the topics covered were the income of immigrants who pursue postsecondary education in Canada, and the earnings advantage of landed immigrants who were previously temporary residents in Canada.
D) Best practices and tips for analysts
D.1 Programming tips
This section provides programming information for individuals who want a better understanding of the programming structure used to access data from IMDB files. Please note that individuals may conduct their own programming. There are two types of IMDB files: the yearly IMDB data files and the immigration data (for more details on IMDB files, refer to Section 3). IMDB tax variables are identified with a variable name that consists of three parts: (1) the acronym as described in the IMDB tax data dictionary, (2) the aggregate level (I for individual or F for family), and (3) the year (the four-digit year extension exists in most, but not all, cases).
Example: The interest and investment income at the individual level for 2014 would be named INVI_I2014.
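The three-part naming rule can be sketched as a small helper. The sketch below is in Python, for illustration only (data access itself takes place in SAS). The function name `imdb_tax_variable` is hypothetical, and it assumes that the acronym is passed exactly as it appears in the tax data dictionary, including any underscores (for example "INVI_").

```python
from typing import Optional


def imdb_tax_variable(acronym: str, level: str, year: Optional[int] = None) -> str:
    """Join the three parts of an IMDB tax variable name.

    acronym: as written in the tax data dictionary (assumed to include any
             underscores, e.g. "INVI_").
    level:   "I" (individual) or "F" (family).
    year:    four-digit tax year, or None for variables with no year extension.
    """
    level = level.upper()
    if level not in ("I", "F"):
        raise ValueError("level must be 'I' or 'F'")
    suffix = "" if year is None else f"{year:04d}"
    return f"{acronym.upper()}{level}{suffix}"


# Interest and investment income, individual level, 2014.
print(imdb_tax_variable("INVI_", "I", 2014))
```

Keeping the year as a separate argument makes it easy to generate the same variable for a range of tax years in a loop.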
Observations in the IMDB files are sorted by the variable IMDB_id (note that there is no year extension for this variable), which enables users to maintain a link across years. Data are accessed using the SAS programming language. A sample SAS program designed to access IMDB data is provided below. The sample performs the following task:
"retrieving the number of Social Assistance (SA) recipients for immigrants who landed between 2000 and 2005, living in Ontario between 2010 and 2012, and did not have any earnings appearing on their T4 slips by sex and year (2010 to 2012)"
Researchers who are new to the IMDB are encouraged to go through this sample SAS program. There are generally three components in the sample.
- Library set-up: The library assignments on the first two lines are the locations for the input files (first line) and the output files (the second line).
- Steps to generate a working dataset:
  - The input files are stored in SAS format and can therefore be accessed with a SET or MERGE statement.
  - This program is aimed at retrieving the number of Social Assistance (SA) recipients for immigrants who:
    - landed at any time from 2000 to 2005
    - lived in Ontario from 2010 to 2012
    - did not have any earnings on their T4 slips
  - It then generates the number of SA recipients by sex and year (in this case, 2010 to 2012).
- The dataset used to produce the number of SA recipients: The part that starts with "proc freq" produces the counts of interest, as specified in the rest of that step. At the end of the program, four tables are created from the output data file.
It is generally recommended that programs use the variables available in the PNRF rather than the yearly tax files for consistency. For example, the sample program uses the variable GENDER, a variable found in the PNRF, rather than SXCO_I&year, the variable found in the yearly IMDB_T1FF. In this program, only individuals who have filed every year from 2010 to 2012 are selected.
When programming in SAS, one should keep in mind the distinction between missing values and zeros in numeric fields. With SAS, most mathematical operations performed with missing values will return missing values. In IMDB, in years that an individual is present, numeric variables not relevant to that individual have a value of "0" (zero). For example, if a person without a spouse filed in 2010, the value for RRSPSI2010 (contributions to a spouse's RRSP) should be "0" (zero). If that individual did not file in 2010, the value will be missing.
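The distinction between zero and missing can be illustrated with a minimal sketch. It is written in Python rather than SAS, uses fictitious values, and lets `math.nan` stand in for a SAS missing value; the variable names are hypothetical.

```python
import math

# Fictitious RRSPSI values for one person. 0.0 means "filed, nothing to report";
# math.nan stands in for a missing value (no tax file that year).
rrsps = {2009: 0.0, 2010: math.nan, 2011: 1500.0}

# Arithmetic that includes a missing value returns a missing value.
naive_total = sum(rrsps.values())

# Separate "did not file" from "filed with a zero amount" before computing.
filed_years = [year for year, value in rrsps.items() if not math.isnan(value)]
total_when_filed = sum(rrsps[year] for year in filed_years)
zero_years = [year for year in filed_years if rrsps[year] == 0]

print(naive_total, filed_years, total_when_filed, zero_years)
```

The point of the sketch is that a zero counts as an observation (the person filed), whereas a missing value removes the person from that year's population.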
Sample IMDB program
/* Sample SAS program using the IMDB */
libname source1 '\\f8prod05\cic\1.Database';  /* location of IMDB files */
libname out '\\f8prod05\cic\3.workarea';      /* user's directory */

/* This sample program's objective is to use the IMDB to retrieve the number of
   Social Assistance (SA) recipients in Ontario who did not have any earnings
   appearing on their T4 slips, according to sex and year (in this case, 2010
   to 2012). Data for provinces and earnings are from the yearly IMDB files,
   whereas the sex variable is from the PNRF_2015.
   Block comments are used throughout because an unmatched apostrophe inside
   a comment statement that starts with an asterisk can be read as the start
   of a quoted string. */

/* The first step is to create a data file containing all the information
   needed to produce the tables. This data file is called SAOnt and is saved
   in the 'out' library. The longitudinal identifier (IMDB_ID) is used to
   merge the annual IMDB datasets. */
data out.SAOnt;
  merge
    source1.imdb_t1ff_2010(where=(prco_i2010 = 5 and outlier_ind2010 = 0) in=a
                           keep=imdb_id prco_i2010 outlier_ind2010 saspyf2010 t4e__i2010)
    source1.imdb_t1ff_2011(where=(prco_i2011 = 5 and outlier_ind2011 = 0) in=b
                           keep=imdb_id prco_i2011 outlier_ind2011 saspyf2011 t4e__i2011)
    source1.imdb_t1ff_2012(where=(prco_i2012 = 5 and outlier_ind2012 = 0) in=c
                           keep=imdb_id prco_i2012 outlier_ind2012 saspyf2012 t4e__i2012)
    source1.pnrf_2015(keep=imdb_id gender landing_year immigration_category);
  by imdb_id;

  /* Population of interest: the person must be a taxfiler in all three years,
     not be flagged as an outlier, and must have landed between 2000 and 2005. */
  if a and b and c and (landing_year >= 2000 and landing_year <= 2005);

  /* Create a flag variable that identifies the SA recipients for each year.
     The result is three variables, flag_sa2010, flag_sa2011 and flag_sa2012,
     each taking a value of either 1 or 0. */
  if (t4e__i2010 = 0 and saspyf2010 > 0) then flag_sa2010 = 1;
  else flag_sa2010 = 0;
  if (t4e__i2011 = 0 and saspyf2011 > 0) then flag_sa2011 = 1;
  else flag_sa2011 = 0;
  if (t4e__i2012 = 0 and saspyf2012 > 0) then flag_sa2012 = 1;
  else flag_sa2012 = 0;
run;

/* The SAS FREQ procedure is used to produce the tables. Confidentiality
   guidelines must also be respected before any output is released. */
proc freq data=out.SAOnt;
  tables immigration_category*flag_sa2010*flag_sa2011*flag_sa2012
         gender*flag_sa2010*flag_sa2011*flag_sa2012 / missing;
run;

/* End of the sample program */
D.2 Creating a cohort
Prior to starting an analysis, the cohort of interest needs to be defined. The cohort can be restricted by landing year, geography, or any other variable of interest (e.g., admission category or gender) according to the researcher's need. A clearly defined single cohort should be followed to allow comparability. For example, a researcher might be interested in women who landed in 2000 and who lived in a family that received social assistance in 2001 (Table 17). A study question regarding this cohort could be "What proportion of this cohort received social assistance in the following two years (2002 and 2003)?" It is worth noting that the Canada Revenue Agency (CRA) requires the spouse with the higher net income to report the social assistance payment. As a result, measurement on social assistance (SASPY_F), even for individuals, is best reported with the family-level information.
IMDB_ID | Landing year | Gender | SASPY_F2001 | SASPY_F2002 | SASPY_F2003 |
---|---|---|---|---|---|
dollars | |||||
IM583 | 2000 | Female | 20,500 | 19,000 | 14,000 |
IM145 | 2000 | Female | 3,000 | 0 | 0 |
IM548 | 2000 | Female | 11,500 | 13,800 | 0 |
IM798 | 2000 | Female | 16,000 | 18,000 | 8,000 |
IM961 | 2000 | Female | 10,000 | 0 | 0 |
IM967 | 2000 | Female | 9,500 | 0 | 0 |
IM110 | 2000 | Female | 5,000 | 2,000 | 1,000 |
IM125 | 2000 | Female | 1,000 | 0 | 200 |
Source: Statistics Canada, example from Longitudinal Immigration Database (IMDB). |
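The study question posed for this cohort can be answered directly from the eight example records in the table above. The following is a minimal Python sketch for illustration; the function name `share_receiving` is hypothetical, and receipt of social assistance is taken to mean a family-level amount greater than zero.

```python
# Family-level social assistance (SASPY_F, dollars) by tax year, from the example table.
cohort = {
    "IM583": {2001: 20500, 2002: 19000, 2003: 14000},
    "IM145": {2001: 3000, 2002: 0, 2003: 0},
    "IM548": {2001: 11500, 2002: 13800, 2003: 0},
    "IM798": {2001: 16000, 2002: 18000, 2003: 8000},
    "IM961": {2001: 10000, 2002: 0, 2003: 0},
    "IM967": {2001: 9500, 2002: 0, 2003: 0},
    "IM110": {2001: 5000, 2002: 2000, 2003: 1000},
    "IM125": {2001: 1000, 2002: 0, 2003: 200},
}


def share_receiving(records, years):
    """Proportion of the cohort with social assistance > 0 in every listed year."""
    receiving = [rid for rid, sa in records.items() if all(sa[y] > 0 for y in years)]
    return len(receiving) / len(records)


share_2002 = share_receiving(cohort, [2002])        # 4 of 8 records
share_2003 = share_receiving(cohort, [2003])        # 4 of 8 records
share_both = share_receiving(cohort, [2002, 2003])  # 3 of 8 records
print(share_2002, share_2003, share_both)
```

The denominator stays fixed at the eight members of the 2001 cohort, which is what makes the 2002 and 2003 proportions comparable.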
D.3 Calculating retention rates
A key strength of the IMDB is the presence of geographic variables that allow for the study of mobility and retention. No other dataset contains a comparable level of detail on taxfilers annually, especially when it comes to smaller geographies. Having annual provincial, census division (CD), census metropolitan area (CMA), census agglomeration (CA), census subdivision level (CSD), and census tract level updates allows for a broad range of analyses.
Individual mobility trajectories can be studied simply by flagging changes in postal codes, and mobility trends can be calculated by studying relocations at specific levels of geography. For example, CSD-level mobility (year-to-year changes in CSD) and provincial mobility (year-to-year changes in province) significantly vary by a number of immigrant characteristics, such as age and admission class. These geographies are derived from the postal code (IMDB variable PSCO at the individual and family levels). The postal code is a six-character alphanumeric code that locates the point of delivery of mail addressed to post office customers in Canada. See Section 3.4.1 for a description of the geography variables.
In the example below (Table 18), the researcher is interested in mobility until 2002. IM798, IM961, IM967 and IM110 could be excluded from the mobility study because data (or files) are missing.
IMDB_ID | Landing year | Destination province | PRCO 2000 | PRCO 2001 | PRCO 2002 |
---|---|---|---|---|---|
IM583 | 2000 | B.C. | B.C. | B.C. | B.C. |
IM145 | 2000 | Alta. | Alta. | Sask. | Sask. |
IM548 | 2000 | Alta. | Ont. | Ont. | Ont. |
IM798 | 2000 | Ont. | .. | Ont. | Ont. |
IM961 | 2000 | N.B. | N.B. | N.B. | .. |
IM967 | 2000 | Ont. | .. | Alta. | Ont. |
IM110 | 2000 | .. | Que. | .. | Que. |
.. not available for a specific reference period
Note: PRCO is province of residence.
Source: Statistics Canada, example from Longitudinal Immigration Database (IMDB).
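The exclusion of incomplete records and the flagging of year-to-year moves in the mobility example can be sketched as follows. This is an illustrative Python sketch using the example records above, with `None` marking a year for which the province of residence is not available.

```python
# Province of residence (PRCO) by tax year, from the mobility example.
prco = {
    "IM583": {2000: "B.C.", 2001: "B.C.", 2002: "B.C."},
    "IM145": {2000: "Alta.", 2001: "Sask.", 2002: "Sask."},
    "IM548": {2000: "Ont.", 2001: "Ont.", 2002: "Ont."},
    "IM798": {2000: None, 2001: "Ont.", 2002: "Ont."},
    "IM961": {2000: "N.B.", 2001: "N.B.", 2002: None},
    "IM967": {2000: None, 2001: "Alta.", 2002: "Ont."},
    "IM110": {2000: "Que.", 2001: None, 2002: "Que."},
}
years = [2000, 2001, 2002]

# Keep only records with a province of residence in every year of interest.
complete = {rid: rec for rid, rec in prco.items()
            if all(rec[y] is not None for y in years)}
excluded = sorted(set(prco) - set(complete))

# For each complete record, list the years in which the province changed.
moves = {rid: [y for prev, y in zip(years, years[1:]) if rec[prev] != rec[y]]
         for rid, rec in complete.items()}
movers = sorted(rid for rid, changed in moves.items() if changed)

print(excluded, moves, movers)
```

Note that IM548 is not flagged as a mover: the destination province (Alta.) differs from the province of first residence (Ont.), but there is no year-to-year change in residence.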
While mobility at the individual level is fairly straightforward, retention of immigrants in a jurisdiction can be calculated in several ways. How retention is calculated is an analytical decision based on the individual researcher's particular needs. The number of individuals retained is fairly straightforward to define: it is the number of individuals filing taxes in the jurisdiction of interest at a given time. A decision has to be made, however, about what constitutes the initial landing cohort for which retention is calculated (the denominator in the retention rate).
The provincial rates reported in The Daily are defined as the proportion of immigrant taxfilers who reside in the province where they landed (defined as the province of intended destination) at a given time. For a given cohort (e.g., landing year) and a given tax year (or years since landing), the denominator is the number of taxfilers with the selected province of landing. The numerator is the number of taxfilers with the selected province of landing who are also residing in the province.
For example, using CANSIM table 054-0003 to compute retention rates three years after landing for the 2011 cohort, a researcher would choose all provinces of landing (i.e., the province of intended destination), all provinces of residence, landing year = 2011, and reference year = 2014. The table would look as follows:
Province of landing | Total province of residence | Newfoundland and Labrador | Prince Edward Island | Nova Scotia | New Brunswick | Quebec | Ontario | Manitoba | Saskatchewan | Alberta | British Columbia | Other residence |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Province of residence, number of immigrants | ||||||||||||
Total province of landing | 174,740 | 405 | 330 | 1,365 | 880 | 31,505 | 70,590 | 9,695 | 6,120 | 26,965 | 26,390 | 500 |
Newfoundland and Labrador | 515 | 325 | 0 | 5 | 0 | 5 | 75 | 5 | 0 | 60 | 30 | 0 |
Prince Edward Island | 1,245 | 0 | 265 | 25 | 10 | 30 | 560 | 0 | 0 | 50 | 295 | 0 |
Nova Scotia | 1,460 | 10 | 5 | 1,080 | 10 | 25 | 185 | 0 | 5 | 90 | 30 | 10 |
New Brunswick | 1,340 | 0 | 10 | 35 | 750 | 55 | 275 | 0 | 10 | 80 | 120 | 0 |
Quebec | 36,275 | 10 | 10 | 35 | 15 | 30,200 | 3,255 | 40 | 75 | 1,190 | 1,400 | 45 |
Ontario | 69,135 | 35 | 25 | 115 | 70 | 875 | 63,145 | 275 | 335 | 2,815 | 1,325 | 115 |
Manitoba | 11,190 | 0 | 0 | 15 | 0 | 55 | 645 | 9,170 | 80 | 825 | 380 | 10 |
Saskatchewan | 6,360 | 0 | 0 | 0 | 0 | 20 | 295 | 45 | 5,370 | 445 | 165 | 10 |
Alberta | 21,940 | 10 | 0 | 20 | 0 | 95 | 810 | 65 | 140 | 20,170 | 590 | 35 |
British Columbia | 25,000 | 5 | 0 | 30 | 5 | 140 | 1,330 | 85 | 100 | 1,200 | 22,030 | 70 |
Other | 280 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 35 | 20 | 200 |
Source: Statistics Canada, CANSIM table 054-0003. |
The results for Nova Scotia illustrate the calculation. A total of 1,460 individuals landed in Nova Scotia in 2011 and filed taxes in 2014. Of those, 1,080 had Nova Scotia as their province of residence in 2014. Nova Scotia's three-year retention rate would therefore be 1,080/1,460, or about 74%. The CANSIM table also provides information on secondary migrants: 1,365 individuals who landed in 2011 resided in Nova Scotia in 2014, of whom 1,080 intended to land in Nova Scotia and 285 had a destination province other than Nova Scotia.
The above definition of retention assumes that the number of taxfilers with the specific province of intended destination is the total population that can be retained in a year (i.e., if all 1,460 individuals who had intended to land in Nova Scotia had filed taxes there in 2014, the province would have 100% retention). This method does not take into account late or sporadic tax filing behaviour. While the total population in the 2014 tax year for Nova Scotia is 1,460, in the 2013 tax year 1,425 immigrants who intended to land in Nova Scotia filed taxes.
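The Nova Scotia calculation can be repeated for every province from three figures in the CANSIM table: the row total, the diagonal cell, and the column total. The following Python sketch is for illustration only; the dictionary and function names are hypothetical, and the figures are copied from the table above.

```python
# For each province:
#   (2011-cohort taxfilers with this province of landing,
#    of whom resident in that same province in 2014,
#    all 2011-cohort taxfilers resident in the province in 2014)
COHORT_2011_IN_2014 = {
    "Newfoundland and Labrador": (515, 325, 405),
    "Prince Edward Island": (1245, 265, 330),
    "Nova Scotia": (1460, 1080, 1365),
    "New Brunswick": (1340, 750, 880),
    "Quebec": (36275, 30200, 31505),
    "Ontario": (69135, 63145, 70590),
    "Manitoba": (11190, 9170, 9695),
    "Saskatchewan": (6360, 5370, 6120),
    "Alberta": (21940, 20170, 26965),
    "British Columbia": (25000, 22030, 26390),
}


def retention_rate(province):
    """Percentage of taxfilers who landed in the province and still reside there."""
    landed, stayed, _ = COHORT_2011_IN_2014[province]
    return 100 * stayed / landed


def secondary_migrants(province):
    """Residents of the province whose province of landing was elsewhere."""
    _, stayed, residents = COHORT_2011_IN_2014[province]
    return residents - stayed


for province in COHORT_2011_IN_2014:
    print(f"{province}: {retention_rate(province):.1f}% retained, "
          f"{secondary_migrants(province)} secondary migrants")
```

Because the published counts are rounded, rates computed this way are approximate.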
One alternative is a purely longitudinal approach, where a single landing cohort is selected (according to the province of intended destination, the province of initial tax filing, or both), and the retention rate is calculated as the proportion of this cohort that is still filing taxes in the province. When the province of initial tax filing is used to define the landing cohort, it is recommended that the first tax file occur in the year the immigrants were admitted (landing year = tax year), to exclude individuals who may have first arrived elsewhere and subsequently migrated to the region before filing taxes for the first time. A further restriction can be made if a researcher is interested in the population whose destination geography matches the geography of the first tax file.
Given that a portion of each annual cohort do not file taxes for their year of landing, it may be necessary to increase the population size for a region by defining the landing cohort as anyone who first filed taxes in the region within two years of landing (i.e., first_tax_year = landing_year or landing_year+1). Allowing individuals whose first tax filing occurred several years after landing to be part of a "landing cohort" is not recommended, as it is possible that they first landed elsewhere but did not file taxes. It is also a good idea to exclude intermittent filers from these analyses, as their place of residence is unknown in the years for which there is no tax data. Retention calculated this way will show a gradual decline in numbers; this decline is due to immigrants who stop filing, out-migration, and death.
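The cohort rules described above (first tax file in the region in the landing year or the following year, and no gaps in filing) can be sketched as follows. The Python sketch is illustrative only: the five filing histories are fictitious, and the function name `in_landing_cohort` is hypothetical.

```python
# Fictitious filing histories: province of residence by tax year.
# A year that is absent from "files" means no tax file for that year.
people = {
    "A": {"landing_year": 2010, "files": {2010: "N.S.", 2011: "N.S.", 2012: "N.S."}},
    "B": {"landing_year": 2010, "files": {2011: "N.S.", 2012: "Ont."}},
    "C": {"landing_year": 2010, "files": {2013: "N.S."}},                # late first file
    "D": {"landing_year": 2010, "files": {2010: "N.S.", 2012: "N.S."}},  # intermittent filer
    "E": {"landing_year": 2010, "files": {2010: "Ont.", 2011: "N.S.", 2012: "N.S."}},
}


def in_landing_cohort(person, province, last_year):
    """Apply the landing-cohort rules for one region up to last_year."""
    files, landing_year = person["files"], person["landing_year"]
    first_tax_year = min(files)
    if first_tax_year not in (landing_year, landing_year + 1):
        return False  # first filed more than one year after landing
    if files[first_tax_year] != province:
        return False  # first filed elsewhere
    # Exclude intermittent filers: every year up to last_year must have a file.
    return all(year in files for year in range(first_tax_year, last_year + 1))


cohort = sorted(pid for pid, p in people.items() if in_landing_cohort(p, "N.S.", 2012))
retained = sorted(pid for pid in cohort if people[pid]["files"][2012] == "N.S.")
rate_2012 = 100 * len(retained) / len(cohort)
print(cohort, retained, rate_2012)
```

In this fictitious example, person E would count as a secondary migrant to the region rather than as a member of its landing cohort.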
If researchers are interested in secondary migrants to a region, this can be found by removing individuals in the defined landing cohort from the total number of immigrants filing taxes in the region at the time of interest. Again, however, these analyses should be restricted to individuals who first filed taxes within the same time period (year 0 or year 1) to avoid mistaking late-filers for in-migrants. If the landing cohort is restricted to immigrants whose destination geography matches the geography of first tax filing, a subsequent distinction should be made between secondary migrants who first filed elsewhere (and subsequently filed in the region of interest) and immigrants who first filed in the region of interest but were subsequently recruited by other jurisdictions (or information on their intended destination is missing altogether).
The following table presents an example of a longitudinal approach to provincial retention using fictitious data, with various definitions of the initial landing cohort.
Years since landing | Taxfilers who first filed taxes in B.C. in year 0 (number) | Retention rate (percent) | Taxfilers who first filed taxes in B.C. in year 0 or 1 (number) | Retention rate (percent) | Taxfilers who first filed taxes in B.C. in year 0 or 1 and province of intended destination was B.C. (number) | Retention rate (percent) |
---|---|---|---|---|---|---|
0 | 20,000 | 100 | 20,000 | ... | 17,500 | ... |
1 | 18,000 | 90 | 25,000 | 100 | 19,000 | 100 |
2 | 17,000 | 85 | 23,000 | 92 | 18,000 | 95 |
3 | 16,500 | 83 | 22,000 | 88 | 17,500 | 92 |
... not applicable
Source: Statistics Canada, example from the Longitudinal Immigration Database.
In the above example, retention in British Columbia can be calculated according to three definitions of the population, and the three-year retention rate varies with the definition used (83%, 88% or 92%). Importantly, all individuals in the sample filed taxes at each point in time.
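The retention rates in the fictitious example follow from dividing each year's count by the count in the year in which the landing cohort is complete (year 0 for the first definition, year 1 for the other two). A minimal Python sketch, for illustration only, with hypothetical names:

```python
# Fictitious counts from the example: taxfilers in B.C. by years since landing (0 to 3).
definitions = {
    "first filed in B.C. in year 0":
        {"base_year": 0, "counts": [20000, 18000, 17000, 16500]},
    "first filed in B.C. in year 0 or 1":
        {"base_year": 1, "counts": [20000, 25000, 23000, 22000]},
    "first filed in B.C. in year 0 or 1, destination B.C.":
        {"base_year": 1, "counts": [17500, 19000, 18000, 17500]},
}


def retention_rates(counts, base_year):
    """Percentage of the base-year cohort still filing in the province each year."""
    base = counts[base_year]
    return {year: 100 * n / base for year, n in enumerate(counts) if year >= base_year}


rates = {name: retention_rates(d["counts"], d["base_year"])
         for name, d in definitions.items()}
for name, by_year in rates.items():
    print(name, {year: round(rate, 1) for year, rate in by_year.items()})
```

The unrounded three-year rates are 82.5%, 88.0% and about 92.1%, which the table reports as 83, 88 and 92.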
Finally, analysts should use caution when studying low-level census geographies over a long period of time, as CA and CMA boundaries change and CSDs are dropped and added. If possible, analysts should run the Postal Code Conversion File (PCCF+) program to standardize postal codes to a constant census geography.
D.4 Calculating income trajectories over time
As is the case with retention, calculating year-to-year changes in wages, salaries and commissions earnings (or, for that matter, any economic variable) requires consecutive information. For example, if a researcher wants to compare the median wages, salaries and commissions earnings of the 2000 cohort of women aged 24 to 54, one year and five years after landing (Table 21), records with missing T1FF files could be removed from the analysis. The decision to remove these records depends on whether the aim is to evaluate the cohort's median income or the median income of the cohort's filers.
IMDB_ID | Landing year | Age at landing | Gender | Wages income 2001 | Wages income 2005 |
---|---|---|---|---|---|
dollars | |||||
IM583 | 2000 | 34 | Female | 20,500 | 49,000 |
IM145 | 2000 | 53 | Female | .. | 56,000 |
IM548 | 2000 | 29 | Female | 11,500 | 33,800 |
IM798 | 2000 | 31 | Female | 36,000 | 0 |
IM961 | 2000 | 42 | Female | 10,000 | .. |
IM967 | 2000 | 40 | Female | .. | .. |
IM110 | 2000 | 35 | Female | 0 | 59,000 |
.. not available for a specific reference period
Source: Statistics Canada, example from Longitudinal Immigration Database.
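The effect of removing records with missing T1FF files can be seen by computing the medians both ways from the example records above. The following Python sketch is for illustration only; `None` marks a year that is not available, while a zero is kept as a reported value.

```python
from statistics import median

# Wages income (dollars) in 2001 and 2005 from the example table.
wages = {
    "IM583": (20500, 49000),
    "IM145": (None, 56000),
    "IM548": (11500, 33800),
    "IM798": (36000, 0),
    "IM961": (10000, None),
    "IM967": (None, None),
    "IM110": (0, 59000),
}

# Option 1: keep only records available in both years (same people in both medians).
complete = [pair for pair in wages.values() if None not in pair]
median_2001_complete = median(w2001 for w2001, _ in complete)
median_2005_complete = median(w2005 for _, w2005 in complete)

# Option 2: use every record available in each year (different people per year).
median_2001_available = median(w for w, _ in wages.values() if w is not None)
median_2005_available = median(w for _, w in wages.values() if w is not None)

print(median_2001_complete, median_2005_complete)
print(median_2001_available, median_2005_available)
```

With the first option the two medians describe the same four individuals; with the second, each year's median describes whoever filed in that year.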
Use caution when calculating the "first year in Canada" income, as it might not represent a full year of income. For example, someone who landed in November 2013 and filed taxes for 2013 would have at most two months of income in Canada for 2013. A best practice is to use the first full year of income (landing year + 1, see Table 21). One exception is pre-filers: those who filed taxes in Canada before landing and also filed for the landing year are most likely reporting income for the entire year.
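The choice of the first full year of income, including the pre-filer exception, can be sketched as follows. This Python sketch is for illustration only; the function name `first_full_income_year` is hypothetical, and a pre-filer is taken to be anyone with a tax file before the landing year who also filed for the landing year.

```python
def first_full_income_year(landing_year, tax_years_filed):
    """Return the first tax year likely to reflect a full year of income.

    Pre-filers are treated as reporting a full year of income for the
    landing year; everyone else starts at landing year + 1.
    """
    filed = set(tax_years_filed)
    pre_filer = landing_year in filed and any(year < landing_year for year in filed)
    return landing_year if pre_filer else landing_year + 1


print(first_full_income_year(2013, [2013, 2014]))              # landed in 2013, no earlier file
print(first_full_income_year(2013, [2011, 2012, 2013, 2014]))  # pre-filer
```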
Income over time should also be studied in constant dollars. Consequently, Consumer Price Index (CPI) adjustments should be made (Appendix D.7). This adjustment is made in the IMDB CANSIM tables.
D.5 Rounding data
Respecting the privacy of Canadians is important to Statistics Canada. Consequently, any tables produced from IMDB_T1FF files are subject to rounding. The purpose of rounding is to ensure that no small cells are released that may reveal information on specific individuals or small groups of individuals. In general, the rounding macros take an unrounded input dataset of various statistics (counts, means, medians, etc.) and output a rounded dataset.
The rounding rules are confidential, but the rounding macros are available to all researchers. Documentation describing how to use the macros is available. These macros are applied to the output tables of all researchers, to all external data requests, and to the released CANSIM tables.
D.6 Identifying outliers
The variable OUTLIER_IND was created to identify outliers within the T1FF (see Section 5.5). It should be used to remove outlier data from any calculation (e.g., mean, median, or regression) employing tax data. Outliers differ from one year to another, meaning that a person's data may be identified as an outlier for a given year but not for a subsequent year.
The following table gives the distribution of the outliers in the tax files for 1982 and subsequent years by type of resident for the 2015 IMDB. Fewer than 0.1% of records were identified as outliers in each tax year. The proportion of outliers increased from 1995 to 1996 as a result of updates to the outlier detection method applied to tax files for 1997 and subsequent taxation years.
Tax year | PR (number) | PR with NPR permit (number) | Total (number) | Total (percent) |
---|---|---|---|---|
1982 | 10 | 10 | 20 | 0.01 |
1983 | 50 | 10 | 60 | 0.02 |
1984 | 70 | 20 | 90 | 0.03 |
1985 | 30 | 0 | 40 | 0.01 |
1986 | 50 | 10 | 60 | 0.01 |
1987 | 60 | 10 | 70 | 0.01 |
1988 | 80 | 20 | 100 | 0.01 |
1989 | 70 | 20 | 90 | 0.01 |
1990 | 60 | 20 | 80 | 0.01 |
1991 | 70 | 30 | 100 | 0.01 |
1992 | 120 | 40 | 160 | 0.01 |
1993 | 150 | 70 | 220 | 0.01 |
1994 | 60 | 20 | 90 | 0.01 |
1995 | 150 | 80 | 240 | 0.01 |
1996 | 280 | 180 | 450 | 0.02 |
1997 | 340 | 230 | 570 | 0.03 |
1998 | 460 | 250 | 710 | 0.03 |
1999 | 380 | 230 | 610 | 0.03 |
2000 | 460 | 270 | 730 | 0.03 |
2001 | 450 | 260 | 700 | 0.03 |
2002 | 500 | 270 | 770 | 0.03 |
2003 | 440 | 240 | 670 | 0.02 |
2004 | 530 | 280 | 800 | 0.02 |
2005 | 550 | 310 | 850 | 0.02 |
2006 | 600 | 320 | 910 | 0.03 |
2007 | 580 | 340 | 920 | 0.02 |
2008 | 600 | 400 | 990 | 0.02 |
2009 | 720 | 380 | 1,100 | 0.03 |
2010 | 680 | 420 | 1,090 | 0.02 |
2011 | 860 | 490 | 1,350 | 0.03 |
2012 | 620 | 390 | 1,020 | 0.02 |
2013 | 780 | 470 | 1,250 | 0.03 |
2014 | 760 | 410 | 1,170 | 0.02 |
2015 | 750 | 390 | 1,140 | 0.02 |
Notes: PR: Permanent resident; NPR: Non-permanent resident. Source: Statistics Canada, 2015 Longitudinal Immigration Database. |
D.7 Adjusting income for the Consumer Price Index (CPI)
To account for changes in the cost of living, all incomes should be adjusted using the Consumer Price Index (CPI) for Canada. "The Consumer Price Index (CPI) is an indicator of changes in consumer prices experienced by Canadians. It is obtained by comparing, over time, the cost of a fixed basket of goods and services purchased by consumers. Since the basket contains goods and services of unchanging or equivalent quantity and quality, the index reflects only pure price change." The adjustment factors for 2015 are available in Table 23. To transform data to constant dollars of a specific year (the base year), data users need to multiply the dollar values in all but the base year by a year-specific adjustment factor. To obtain the adjustment factor, data users need to divide the CPI of the base year by the CPI of the specific year. In Table 23, the base year is 2015; for example, the adjustment factor for 2002 is 126.6 / 100.0 = 1.266.
Year | 2015 consumer price index adjustment equals 126.6 divided by |
---|---|
number | |
1982 | 54.9 |
1983 | 58.1 |
1984 | 60.6 |
1985 | 63.0 |
1986 | 65.6 |
1987 | 68.5 |
1988 | 71.2 |
1989 | 74.8 |
1990 | 78.4 |
1991 | 82.8 |
1992 | 84.0 |
1993 | 85.6 |
1994 | 85.7 |
1995 | 87.6 |
1996 | 88.9 |
1997 | 90.4 |
1998 | 91.3 |
1999 | 92.9 |
2000 | 95.4 |
2001 | 97.8 |
2002 | 100.0 |
2003 | 102.8 |
2004 | 104.7 |
2005 | 107.0 |
2006 | 109.1 |
2007 | 111.5 |
2008 | 114.1 |
2009 | 114.4 |
2010 | 116.5 |
2011 | 119.9 |
2012 | 121.7 |
2013 | 122.8 |
2014 | 125.2 |
2015 | 126.6 |
Source: Statistics Canada, CANSIM table 326-0021. |
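The adjustment to constant 2015 dollars can be sketched as follows. The Python sketch is for illustration only: it uses a subset of the CPI values in the table above, and the function names are hypothetical.

```python
# Subset of the CPI values listed above (CANSIM table 326-0021).
CPI = {2000: 95.4, 2002: 100.0, 2005: 107.0, 2010: 116.5, 2014: 125.2, 2015: 126.6}


def adjustment_factor(year, base_year=2015):
    """CPI of the base year divided by the CPI of the specific year."""
    return CPI[base_year] / CPI[year]


def to_constant_dollars(amount, year, base_year=2015):
    """Express a dollar amount from `year` in constant dollars of `base_year`."""
    return amount * adjustment_factor(year, base_year)


# $20,500 reported for 2002, expressed in constant 2015 dollars.
print(round(adjustment_factor(2002), 3))
print(round(to_constant_dollars(20500, 2002), 2))
```

Values for the base year itself are left unchanged, since the factor is 126.6 / 126.6 = 1.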
D.8 Calculating key income measures
The IMDB CANSIM tables contain several income measures. Table 24 describes which variables of the T1FF are included in their calculation.
Measure | Components | Formula |
---|---|---|
Wages, salaries and commissions income | Earnings from T4 slips; other employment income | T4E__i + OEI__i |
Self-employment income | ||
Since 1988 | Self-employment income from business, profession, commission, farm, and fishing; limited partnership income | SEI__i + LTPI_i |
Before 1988 | Self-employment income from business, profession, commission, farm, and fishing | SEI__i |
Investment income | ||
1982 to 1987 | Interest and investment income; dividends; capital gains/losses, net taxable | INVI_i + XDIV_i + (CLKGLi * 2) |
1988 and 1989 | Interest and investment income; dividends; capital gains/losses, net taxable | INVI_i + XDIV_i + (CLKGLi * 3/2) |
1990 to 1999 | Interest and investment income; dividends; capital gains/losses, net taxable | INVI_i + XDIV_i + (CLKGLi * 4/3) |
2000 | Interest and investment income; dividends; capital gains/losses, net taxable | INVI_i + XDIV_i + (CLKGLi * 100/64.58) |
Since 2001 | Interest and investment income; dividends; capital gains/losses, net taxable | INVI_i + XDIV_i + (CLKGLi * 2) |
Employment Insurance benefits | Employment Insurance benefits | EINS_i |
Social welfare benefits | Social welfare benefits (use family-level) | SASPYf |
Total income | Sum of all measures described above | |
Source: Statistics Canada, 2015 Longitudinal Immigration Database, CANSIM table processing. |
Note that all outliers (records with OUTLIER_IND = 1) are removed from these calculations, that the variable Province of Residence at the End of the Year (PRCO_) is used to identify the province, and that all incomes are adjusted according to the Consumer Price Index (CPI) for the year of the most recent T1FF available. "Mean with income" is the mean income of immigrant tax-filers with income of the given type. "Median with income" is the median income of immigrant tax-filers with income of the given type.
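The year-dependent investment income formula in the table above can be sketched as follows. The Python sketch is for illustration only; the function names are hypothetical, and the last row of the investment income block is assumed to apply from 2001 onward, since the year 2000 has its own multiplier.

```python
def capital_gains_multiplier(tax_year):
    """Multiplier applied to net taxable capital gains/losses (CLKGL) by tax year."""
    if 1982 <= tax_year <= 1987:
        return 2
    if tax_year in (1988, 1989):
        return 3 / 2
    if 1990 <= tax_year <= 1999:
        return 4 / 3
    if tax_year == 2000:
        return 100 / 64.58
    if tax_year >= 2001:  # assumption: the final row applies from 2001 onward
        return 2
    raise ValueError("no formula is listed for tax years before 1982")


def investment_income(invi, xdiv, clkgl, tax_year):
    """Interest and investment income + dividends + adjusted capital gains/losses."""
    return invi + xdiv + clkgl * capital_gains_multiplier(tax_year)


# Fictitious amounts: $100 interest, $50 dividends, $300 net taxable capital gains.
for year in (1985, 1989, 1995, 2000, 2010):
    print(year, round(investment_income(100, 50, 300, year), 2))
```

In practice, records flagged as outliers would be removed and the result adjusted for the CPI before any mean or median is calculated.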