Longitudinal Administrative Data Dictionary, 2017

Release date: September 24, 2019 Correction date: November 8, 2019

Correction Notice

Corrections have been made to this product.
Please take note of the following changes:

November 8, 2019

In Section 10, the median for “After Tax Income - StatCan definition” (AFTAX), and the median and aggregate figures for “Tax, net provincial calculated” (NPTXC) in 2017, have been updated.


1 Introduction

The Longitudinal Administrative Databank (LAD) is a subset of the T1 Family File (T1FF). The T1FF is a yearly cross-sectional file of all taxfilers and their families. Census families are created from information provided annually to the Canada Revenue Agency in personal income tax returns. Both legal and common-law spouses are attached by the spousal Social Insurance Number (SIN) listed on the tax form, or by matching based on name, address, age, sex, and marital status. Children are identified through a similar algorithm and supplementary files. Prior to 1993, non-filing children were identified from information on their parents’ tax form, and information from the Family Allowance Program was used to assist in the identification of children. Since 1993, information from the Child Tax Benefit Program has been used for this purpose.

The LAD is a random 20% sample of the T1FF. Selection for the LAD is based on an individual’s SIN. There is no age restriction, but people without a SIN can only be included in the family component. Once a person is selected for the LAD, the individual remains in the sample and is picked up each year from the T1FF if he or she appears on the T1 that year. Individuals selected for the LAD are linked across years by a unique LAD identification number (LIN__I) generated from the SIN, to create a longitudinal profile of each individual. The LAD is augmented each year with a sample of new taxfilers so that it consists of approximately 20% of taxfilers in every year. The 20% sample has grown over the years: 3.2 million people in 1982, 4.05 million in 1992, 4.7 million in 2002 and 5.3 million in 2012. This growth reflects increases in the Canadian population and in the incidence of tax filing following the introduction of the Federal sales tax credit in 1986 and the Goods and Services Tax credit in 1989.
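The dictionary does not describe how the SIN-based selection is carried out. The Python sketch below uses an invented hash rule (the function `in_sample` and its 20-out-of-100-bucket threshold are hypothetical, not the LAD method) only to illustrate why a rule that depends solely on a person’s identifier keeps that person in the sample every year while also picking up a similar share of new filers.

```python
import hashlib


def in_sample(person_id: str, rate_percent: int = 20) -> bool:
    """Hypothetical deterministic selection rule (NOT the actual LAD rule).

    The identifier is hashed into one of 100 buckets and kept when the bucket
    is below `rate_percent`. The decision depends only on the identifier, so
    a person selected once is selected again in every year they file.
    """
    digest = hashlib.sha256(person_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rate_percent


# Synthetic identifiers (not real SINs) to check the selected share.
ids = [f"ID{n:09d}" for n in range(100_000)]
share = sum(in_sample(i) for i in ids) / len(ids)
print(f"share selected: {share:.3f}")  # roughly 0.2 for many identifiers
```

Because new filers are run through the same rule, roughly the same fraction of them enters the sample each year, which is the behaviour the paragraph above describes.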

The LAD is organized into four levels of aggregation, namely the individual, spouse/parent, family, and child levels. The databank contains information on demographics, income, and other taxation data at the different levels of aggregation from 1982 to 2017, with new years of data being added as the information becomes available. Changes in tax legislation and in the design of the T1 form itself have resulted in some variables not being available for all years, as well as some minor definitional changes from one year to the next.

The LAD also obtains information through microdata linkages to other administrative data sources, including Tax Free Savings Account (TFSA) information, private corporation ownership information from Schedule 50 of the T2 tax form, and immigration information from the Landing file administrative data. In addition, a linking key resides on the Longitudinal Immigration Database (IMDB), a database containing immigration records from 1980 to the present, which allows research to be conducted using a linked IMDB-LAD database. All microdata linkages have been approved by the relevant Statistics Canada management and privacy bodies. Further information is available at http://www.statcan.gc.ca.

The LAD has been designed to serve as a research tool from which custom tabulations can be prepared. This dictionary, in turn, has been created to assist researchers in identifying the type of information that is available from the LAD. It identifies and defines the LAD variables including historical changes.

2 Confidentiality

Statistics Canada protects the confidentiality of individuals’ tax data. Only aggregated information that conforms to the confidentiality provision of the Statistics Act is released. The LAD resides within Statistics Canada and all retrievals are done on site. Only employees of Statistics Canada can access such data directly. More information on the confidentiality procedures can be obtained from Client Services.

3 Geography

Data from the LAD are available for various levels of geography, including Canada, provinces/territories, and regions such as Census Division (CD), Census Metropolitan Area/Census Agglomeration (CMA/CA), Census Subdivision (CSD) and Census Tract (CT). Many other levels of geography, for example Economic Region (ER) and Federal Electoral District (FED), are not included on the main LAD database; however, these may be available in the LAD using the Postal Code Conversion File. Note that geography classifications on the LAD are based on converting postal code areas to other geographic boundaries.

4 Dictionary format and contents

Outlined below is a brief description of the next eight sections of the LAD Dictionary.

The LAD register (Section 5) is a file that is used in conjunction with the yearly LAD files. The Register outlines the years that an individual is on the LAD and provides information on the taxfiler’s sex, year of birth, and year of death. This section provides a brief description of this file and describes how it can be used to enhance LAD data analysis.

The Programming tips section (Section 6) provides information on writing programs for LAD retrievals. It is intended to help users access data from LAD files efficiently, using an effective programming structure.

The Design of LAD variable acronyms (Section 7) is a description of the variable acronym structure. It provides insight into how to interpret the variable acronyms and information on the aggregation levels.

The What’s New section (Section 8) is a description of changes to the LAD database since the previous LAD release. It also provides a list of the new variables added to the LAD for the present income year. These new variables may also be available for previous years. Users are encouraged to check each new variable to determine the years available.

The LAD variable definitions section (Section 9) lists each variable by name, grouped by type. In addition, the following information is provided for each variable:

The Variable counts and amounts for individuals section (Section 10) outlines, for many variables at the individual aggregate level, the count of individuals and the dollar amounts reported for the two most recent years of LAD data. Persons included in these counts and amounts are those who have been selected into the LAD sample.

The Definition of total income variables (Section 11) identifies and defines total income variables and highlights historical changes. Also provided are tables that outline and compare the variables that comprise market income and the Canada Revenue Agency’s (CRA) and Income Statistics Division’s (ISD) definitions of total income.

The tables outlined in this section are the following:

Finally, the “How to obtain more information” section on the inside cover explains how to contact us by telephone, mail, fax, or e-mail from across Canada.

5 LAD register

The LAD register is a companion data file to the yearly LAD files. It contains a small set of variables for all individuals who are present at any time in the LAD. These variables describe characteristics that should remain constant over time and thus may not be identified in a particular yearly file. A new LAD register is created every year, when a new yearly LAD file is added, using information on living and deceased taxfilers and on imputed individuals. Thus, the current register contains the most up-to-date information on individuals present in the LAD. On rare occasions, new information on an individual may differ from that on the existing file; in these instances, the current information supersedes the information in the existing LAD register.

The LAD register is a quick reference tool that can provide basic data without accessing the yearly files. For example, information such as the number of individuals in the LAD by age and sex in a given year can be tabulated directly from the register. Further, the LAD register can be employed in conjunction with the yearly files. In particular, it is recommended that the age of an individual be calculated from the register’s information on the year of birth rather than relying on the age information in the yearly files in order to ensure that it is consistent across years.
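The recommendation to derive age from the register can be illustrated with a short Python sketch. The helper `age_in_year` and the record values are hypothetical; the sketch assumes age is counted as the calendar year minus the year of birth (age reached during the year), and it uses the register’s year of death, when present, to return nothing for later years.

```python
from typing import Optional


def age_in_year(yob: int, year: int, yod: Optional[int] = None) -> Optional[int]:
    """Age reached in `year`, computed from the register's year of birth.

    Assumption: age = calendar year minus year of birth. Returns None for
    years after the year of death, when one is recorded.
    """
    if yod is not None and year > yod:
        return None
    return year - yob


# One register record yields a consistent age in every yearly file.
record = {"lin__i": 1001, "yob__i": 1970, "yod__i": None}  # illustrative values
ages = {yr: age_in_year(record["yob__i"], yr, record["yod__i"]) for yr in (2015, 2016, 2017)}
print(ages)  # {2015: 45, 2016: 46, 2017: 47}
```

Because every year uses the same year of birth, the computed age always rises by exactly one per year, which is the consistency the register-based approach is meant to guarantee.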

Following is a list of the variables that can be found on the register:

6 Programming tips

This section provides programming information for users who want a better understanding of the programming structure used to access data from the LAD files. Please note that although individuals may write their own programs, only a small staff within Statistics Canada can carry out these retrievals. Access to the LAD files is restricted to protect the confidentiality of individuals’ tax data, and any data that are made available are screened through a set of rules designed to prevent disclosure.

There are two types of LAD files: the yearly LAD data files and the LAD register (for more details on the register, refer to Section 5, LAD register). LAD variables are identified by a name that consists of three parts: 1) the acronym name, 2) the aggregate level, and 3) the year (the four-digit year extension exists in most, but not all, cases). Observations in the LAD files are sorted by a variable named lin__i (note that there is no year extension for this variable), which allows users to link records across years.
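Because every file is keyed by lin__i, combining years amounts to a join on that key. The Python sketch below (the function `link_years` and the field values are hypothetical) builds one profile per person from yearly record lists. It approximates a SAS MERGE ... BY lin__i when each file holds at most one record per lin__i; a person absent from a year simply has no fields for that year, where SAS would set them to missing.

```python
from typing import Dict, List


def link_years(yearly: Dict[int, List[dict]]) -> Dict[int, dict]:
    """Combine one-record-per-person yearly files into longitudinal profiles.

    `yearly` maps a year to its records; each record holds 'lin__i' plus
    year-suffixed fields, so fields from different years never collide.
    """
    profiles: Dict[int, dict] = {}
    for year in sorted(yearly):
        for rec in yearly[year]:
            profile = profiles.setdefault(rec["lin__i"], {"lin__i": rec["lin__i"]})
            profile.update(rec)
    # Return profiles ordered by lin__i, like the sorted LAD files.
    return dict(sorted(profiles.items()))


lad2016 = [{"lin__i": 1, "prco_i2016": 5}, {"lin__i": 2, "prco_i2016": 3}]
lad2017 = [{"lin__i": 1, "prco_i2017": 5}]
print(link_years({2016: lad2016, 2017: lad2017}))
# {1: {'lin__i': 1, 'prco_i2016': 5, 'prco_i2017': 5}, 2: {'lin__i': 2, 'prco_i2016': 3}}
```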

Data access is undertaken with the SAS programming language. The sample program below is designed to access LAD data. The three LIBNAME statements at the top of the program give the locations of the input files (first two statements) and of the output files (third statement). The input files are in SAS format and can therefore be accessed with a SET or MERGE statement. This program, based on the 20% sample, retrieves the number of Social Assistance (SA) recipients in Ontario who did not have any earnings on their T4 slips, by sex and year (in this case, 2000 to 2002). It is generally recommended that programs use the variables available in the register rather than those in the yearly files, because the register contains the most recent information. For example, the program uses sxco_i, a register variable, rather than sxco_i&yr, the corresponding variable in the yearly LAD files. The flag_i&yr variables in the register are useful for identifying individuals who filed in a given year; in this program, only individuals who filed in every year from 2000 to 2002 are selected. At the end of the program, four tables are created from the output data file. Note that, for confidentiality purposes, the weight variable wgt__i (with the LAD 10% sample) or wgt2_i (with the LAD 20% sample) must be used whenever a SAS procedure such as FREQ or LOGISTIC is invoked.
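Using wgt2_i means that each record counts for its weight rather than for one. The Python sketch below shows a weighted cell count analogous to PROC FREQ with a WEIGHT statement; `weighted_freq` is a hypothetical helper, and the weight of 5.0 and the sxco_i codes are illustrative assumptions, not LAD values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weighted_freq(records: Iterable[dict], by: Tuple[str, ...],
                  weight: str = "wgt2_i") -> Dict[tuple, float]:
    """Sum the weight of each record into the cell defined by `by`.

    A missing classification value (absent key) becomes None and forms its
    own cell, similar in spirit to the /MISSING option of PROC FREQ.
    """
    cells: Dict[tuple, float] = defaultdict(float)
    for rec in records:
        key = tuple(rec.get(v) for v in by)
        cells[key] += rec[weight]
    return dict(cells)


sample = [
    {"sxco_i": 1, "flag_sa2002": 1, "wgt2_i": 5.0},
    {"sxco_i": 1, "flag_sa2002": 0, "wgt2_i": 5.0},
    {"sxco_i": 2, "flag_sa2002": 1, "wgt2_i": 5.0},
    {"sxco_i": 2, "flag_sa2002": 1, "wgt2_i": 5.0},
]
print(weighted_freq(sample, by=("sxco_i", "flag_sa2002")))
# {(1, 1): 5.0, (1, 0): 5.0, (2, 1): 10.0}
```

The weighted totals estimate population counts; any release would still be subject to the confidentiality screening described above.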

When programming in SAS, it is important to keep in mind the distinction between missing values and zeros in numeric fields. In SAS, most arithmetic operations involving missing values return missing values. In the LAD, in years when an individual is present, numeric variables not relevant to that individual have a value of zero. For example, if a non-family person filed in 2000, the value of RRSPSI2000 (contributions to a spouse’s RRSP) should be zero; if that individual did not file in 2000, the value will be missing. Thus, as a safety precaution, it is suggested that all numeric variables used in arithmetic expressions be set to zero if missing before they are used.
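Python has no SAS-style missing value, so the sketch below uses None for a missing amount. Adding None to a number raises a TypeError rather than returning missing, but the remedy is the same: convert missing to zero before doing arithmetic. The helper `zero_if_missing` and the amounts are illustrative. (In SAS itself, the SUM function ignores missing arguments, whereas the + operator propagates them.)

```python
from typing import Optional


def zero_if_missing(value: Optional[float]) -> float:
    """Return 0.0 for a missing amount (None), otherwise the amount itself."""
    return 0.0 if value is None else value


# Hypothetical person who filed in 2000 and 2002 but not in 2001.
person = {"t4e__i2000": 32000.0, "t4e__i2001": None, "t4e__i2002": 30500.0}

# Without the conversion, 32000.0 + None would fail; with it, the
# non-filing year contributes zero to the three-year total.
total = sum(zero_if_missing(person[f"t4e__i{y}"]) for y in (2000, 2001, 2002))
print(total)  # 62500.0
```

Converting to zero erases the filed/not-filed distinction, so the register’s flag_i variables remain the way to tell the two cases apart.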

Sample LAD program

* Sample SAS program using the LAD;

libname source1 '/LADdata/data1';          * first 10% sample ;
libname source2 '/LADdata/data2';          * second 10% sample ;
libname out '/LADuser/xxxx/data';          * user directory ;

* The objective of this sample program is to use the 20% LAD to retrieve the number of Social Assistance (SA) recipients in Ontario who did not have any earnings on their T4 slips, by sex and year (in this case, 2000 to 2002). Data for provinces and earnings are from the yearly LAD files, whereas the sex variable is from the 2002 LAD register ;

* The first step is to create a data set containing all the information needed to produce our tables. This data set, called SAOnt, is saved in the OUT library. The Longitudinal Identifier Number (LIN__I) is used to merge the annual LAD data sets ;

data out.SAOnt;
merge
source1.lad2000(where=(prco_i2000 = 5) keep=lin__i  prco_i2000 saspyi2000 t4e__i2000)
source2.lad2000(where=(prco_i2000 = 5) keep=lin__i  prco_i2000 saspyi2000 t4e__i2000)
source1.lad2001(where=(prco_i2001 = 5) keep=lin__i prco_i2001 saspyi2001 t4e__i2001)
source2.lad2001(where=(prco_i2001 = 5) keep=lin__i  prco_i2001 saspyi2001 t4e__i2001)
source1.lad2002(where=(prco_i2002 = 5) keep=lin__i  prco_i2002 saspyi2002 t4e__i2002)
source2.lad2002(where=(prco_i2002 = 5) keep=lin__i  prco_i2002 saspyi2002 t4e__i2002)
source1.reg2002(keep=lin__i sxco_i flag_i2000-flag_i2002 wgt2_i)
source2.reg2002(keep=lin__i sxco_i flag_i2000-flag_i2002 wgt2_i);

by lin__i ;

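if flag_i2000=1 and flag_i2001=1 and flag_i2002=1;   * person must be a taxfiler in all 3 years ;
if prco_i2000=5 or prco_i2001=5 or prco_i2002=5;     * person must live in Ontario in at least one of the years ;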

* We create a flag variable that identifies the SA recipients for each year. The result is three variables, flag_sa2000, flag_sa2001 and flag_sa2002, each taking a value of either 1 or 0 ;

If (t4e__i2000=0 and saspyi2000>0) then flag_sa2000 = 1 ;
          else flag_sa2000 = 0 ;
if (t4e__i2001=0 and saspyi2001>0)  then flag_sa2001 = 1 ;
          else flag_sa2001 = 0 ;
if (t4e__i2002=0 and saspyi2002>0) then flag_sa2002 = 1 ;
          else flag_sa2002 = 0 ;

run ;

* The SAS FREQ procedure is used to produce our tables. The output must also be checked against the confidentiality guidelines before release ;

proc freq data = out.SAOnt;

          tables sxco_i*flag_sa2000*flag_sa2001*flag_sa2002 /missing;
          weight wgt2_i ;

run ;

* End of the sample program ;
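One subtlety in the IF/ELSE statements of the sample program is how missing values behave. SAS treats a missing numeric value as smaller than any number, so t4e__i2000=0 and saspyi2000>0 are both false when the person has no Ontario record that year, and the flag is set to 0. The Python sketch below reproduces that rule explicitly; `sa_flag` is hypothetical, None stands in for SAS missing, and the explicit checks are needed because Python raises a TypeError when None is compared with > against a number.

```python
from typing import Optional


def sa_flag(t4e: Optional[float], saspy: Optional[float]) -> int:
    """Return 1 for no T4 earnings and positive social assistance, else 0.

    None stands in for a SAS missing value. SAS evaluates both comparisons
    as false for a missing value, so the ELSE branch sets the flag to 0;
    the explicit None checks reproduce that outcome.
    """
    if t4e is not None and saspy is not None and t4e == 0 and saspy > 0:
        return 1
    return 0


print(sa_flag(0.0, 8400.0))     # 1
print(sa_flag(1200.0, 8400.0))  # 0
print(sa_flag(None, None))      # 0: no Ontario record that year
```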

7 Design of LAD variable acronyms

Most LAD variables have a ten-character acronym. Each acronym consists of three parts, namely the variable name (five characters), the aggregate level (one character), and the calendar year (four characters), e.g. XTIRCI2000.

The variable name is the principal component of the acronym. The characters identify the type of information provided by the variable (see section 9 “LAD Variable Definitions”).

The one-character aggregate level provides information on individuals of the census family according to the designated level of aggregation. There are four possibilities, namely ‘I’, ‘P’, ‘F’, and ‘K’, representing individual, parents, family and children (kids), respectively. The family types outlined in these aggregate levels refer to the status of the family at the end of the tax year. Following are details about each of these aggregate levels:

The four-character calendar year identifies the year with which the variable is associated. The LAD data are stored in separate files for each calendar year; therefore, all variables in a particular yearly file have the same four-character calendar year reference. The only exception in the yearly files is the variable LIN__I, the LAD individual identification number, which is available for each observation in each yearly file but does not have a calendar year as part of the acronym (note that the spousal LIN variable, LIN__Pyyyy, does have the year extension as part of its acronym). In the register file, the exceptions to the four-character year are LIN__I, SXCO_I, YOB__I, YOD__I, LNDYRI, TTNFLI and IMMFLI, which are the individual’s LIN, sex, year of birth, year of death, landing year, temporary SIN flag, and immigrant flag, respectively.
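The naming rule can be sketched as a small parser. This Python version is illustrative: `parse_acronym` is hypothetical, and it assumes a year suffix, when present, is exactly four digits immediately after the level character. That matches the examples in this section but has not been checked against every LAD variable (for instance, the newer acronyms listed with a trailing underscore).

```python
import re
from typing import NamedTuple, Optional

# Aggregate-level codes described in this section.
LEVELS = {"I": "individual", "P": "parents", "F": "family", "K": "children (kids)"}

# Name part, then one level character, then an optional four-digit year.
ACRONYM = re.compile(r"(.+?)([IPFK])(\d{4})?")


class Acronym(NamedTuple):
    name: str            # variable name part, e.g. "XTIRC"
    level: str           # aggregate level code: I, P, F or K
    year: Optional[int]  # calendar year, or None when there is no year suffix


def parse_acronym(acronym: str) -> Acronym:
    """Split a LAD-style acronym such as XTIRCI2000 into its three parts."""
    match = ACRONYM.fullmatch(acronym.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised LAD-style acronym: {acronym!r}")
    name, level, year = match.groups()
    return Acronym(name, level, int(year) if year else None)


for text in ("XTIRCI2000", "LIN__I", "LIN__P2017", "flag_i2001"):
    parts = parse_acronym(text)
    print(text, "->", parts.name, LEVELS[parts.level], parts.year)
```

The lazy name group combined with a full match forces the level to be the character just before the year (or the last character when there is no year), so names that themselves contain I, P, F or K, such as MKINC, are still split correctly.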

8 What’s New – LAD 2017

There have been a number of changes and improvements to the LAD and to the LAD data dictionary since the release of the 2016 LAD.

Important changes to Immigration Variables linked to the LAD

Starting in 2017, users wishing to link the LAD with the Longitudinal Immigration Database (IMDB) will do so by using a linking key variable available on the IMDB. This linking key allows researchers to access the full range of IMDB variables that can be linked to the LAD. As a consequence, the LAD will no longer carry the 13 immigration variables that were previously linked to the LAD from the IMDB. Landing Year (LNDYRI) and Immigration Flag (IMMFLI) will still be available on the LAD; however, these variables are now constructed independently by the LAD using the same administrative files as are used by the IMDB. The updated LNDYRI and IMMFLI are entirely consistent across time and replace the previous versions of these two variables for all LAD years. Note that the new variables will no longer align with the corresponding IMDB variables or, consequently, with earlier versions of the LAD. However, both versions can be analyzed and compared. These changes will allow the growing number of LAD users to obtain timely, extensive, and accurate LAD data while also gaining full access to a wider range of IMDB variables.

Modified variables

We have made some corrections and modifications to the LAD database since the 2016 release.

The caregiver amount variables, Additional personal exemptions (APXMP) and Caregiver Amount (CAREG), were eliminated for the tax year 2017. These have been replaced by two new caregiver amount variables (see New Variables below).

The federal education and textbook amounts were eliminated for tax year 2017. This affects variables EDUPT, EDUDC, and EDUDN. Researchers interested in a taxfiler’s enrollment in post-secondary education will find that three new variables have been added (see New Variables below). Unused education and textbook amounts from previous years can be carried forward and deducted as part of the tuition deduction variable (TUTDN).

In July 2016, the Canada child benefit (CCB) replaced the Canada child tax benefit (CCTB), the national child benefit supplement (NCBS), and the universal child care benefit (UCCB). The CCB is a tax-free payment. Amounts paid under the new CCB program are recorded in the existing UCCB_ variable.

New variables

Several new variables have been added to the LAD database. More information on these variables can be found in the variable definitions section (Section 9).

As noted above, changes to the caregiver amounts mean that two new variables (CCCAMC_ and CCCODC_) are introduced in 2017. These variables replace the two older exemption variables (APXMP and CAREG). Users wishing to produce estimates of caregiver amounts over time are advised to examine the details of both the older and the newer programs.

As well, a group of variables has been added on post-secondary enrollment status as reported by the taxfiler. These self-reported variables give the number of months in the calendar year that a taxfiler was enrolled in post-secondary education full-time or part-time, and whether a part-time student is considered full-time because of disability status (see NMTFLTSE_, NMTPRTSE_, and PTSTUDIS_).

A new variable Available Contribution Room (TFSAACR_) is being added to the Tax Free Savings Account section. This variable provides a measure of the amount of available contribution room that a taxfiler has for their tax free savings account at the beginning of the relevant reporting year.

We have also added variables to provide further information relating to the age of taxfilers or their children. A new variable which identifies the month of birth of the taxfiler (MOB__) has been added to the Register files. This new numerical month of birth variable may be useful for those researchers wishing to observe particular governmental programs and subsidies that rely on age eligibility rules. As well, Children’s birthdate (BRDT_) is a new variable which has been added to the Kid files providing the full birth date of children from the taxfiler’s family, available for the years 1985 to 2017. Due to data limitations, information for earlier periods is not available.

The table below lists the variable names and descriptions for the new additions to the 2017 LAD, with a fuller explanation provided in the main variable definition section.

Table 1
New variables available on the LAD as of income year 2017
New variable | Years available
Canada Caregiver Credit Amount for Spouse or Common-Law Partner or Eligible Dependant Age 18 and Older (CCCAMC_) | 2017
Canada Caregiver Credit Amount for Other Dependant Age 18 and Older (CCCODC_) | 2017
Number of Months of Full-Time School Enrollment (NMTFLTSE_) | 2017
Number of Months of Part-Time School Enrollment (NMTPRTSE_) | 2017
Part-Time Student is Considered Full-Time Due to the Individual’s Disability Status (PTSTUDIS_) | 2017
Tax-Free Savings Account Available Contribution Room (TFSAACR_) | 2012-2017
Month of Birth (MOB__) | 1982-2017
Children’s date of birth (BRDT_) | 1985-2017

9 LAD variable definitions

10 Selected income variable counts and medians for individuals, 2016 to 2017

11 Definition of total income variables

This section specifies the exact definitions of the three measures of total income that are available on the LAD, which are:

The first measure of total income is TIRC, which is the Canada Revenue Agency Taxation definition of total income as per the T1 form. The second measure, XTIRC, has been derived by the Small Area and Administrative Data Division of Statistics Canada as a more appropriate measure for statistical analysis. The components of income that are included in XTIRC are generally described in Table 1, Components of XTIRC in 2017, while the details are given in Table 5, Definition of XTIRC, 1982 to 2017.

The largest difference between XTIRC and TIRC occurs from 1986 onward because non-taxable income is added to XTIRC. In 1986, the Government of Canada introduced the Federal Sales Tax (FST) Credit directed at the low-income population. To determine eligibility for the FST Credit, filers had to report their non-taxable income, defined as Social Assistance payments, the Guaranteed Income Supplement (GIS), the Spouse’s Allowance (SPA), and Workers’ compensation payments. Because non-taxable income was added to XTIRC in 1986, users are cautioned when comparing pre-1986 values of XTIRC with later values. For example, an increase in XTIRC from 1985 to 1986 may simply reflect the reporting of non-taxable income on the 1986 T1 form but not on the 1985 T1, rather than any actual increase in income.

Other, more recent, differences are the exclusion from XTIRC of RRSP income for people under 65 years old and the inclusion in XTIRC of Indian exempt employment income.

Another difference between TIRC and XTIRC is that capital gains are included in the former but not in the latter. The remaining differences are detailed in Table 4, Differences between TIRC and XTIRC.

The third measure of total income available from LAD is market income (MKINC). MKINC is derived from XTIRC by removing government transfer payments. The components of MKINC are generally described in Table 2, Components of MKINC, 1982 to 2017, while Table 6, Definition of MKINC, 1982 to 2017, gives the detailed derivation.
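As a formula, MKINC is XTIRC less the government transfer payments included in it. The Python sketch below shows only that subtraction; `market_income` is a hypothetical helper, the amounts are made up, and the set of transfer components passed in is the caller’s responsibility, since Table 6 defines which components count for each year. Treating NFSL, WKCPY and SASPY as transfers in the example is an assumption for illustration.

```python
from typing import Mapping, Optional


def market_income(xtirc: float, transfers: Mapping[str, Optional[float]]) -> float:
    """XTIRC minus the government transfer amounts supplied by the caller.

    Which components count as transfers in a given year is defined in
    Table 6; this helper only performs the subtraction. Missing (None)
    amounts are treated as zero.
    """
    return xtirc - sum(amount or 0.0 for amount in transfers.values())


# Made-up amounts; the keys are acronyms mentioned in this section.
print(market_income(41000.0, {"NFSL": 0.0, "WKCPY": 2500.0, "SASPY": None}))  # 38500.0
```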

Besides the change to XTIRC in 1986 due to the addition of sales tax credits, changes in tax legislation and in the content of the T1 form itself have resulted in differences in the availability of the components of total income. The trend has been towards greater availability. For example, beginning in 1992, the components of non-taxable income were reported separately on the T1 form, adding three variables to the LAD: NFSL, denoting net federal supplements (GIS and SPA); WKCPY, denoting Workers’ compensation payments; and SASPY, denoting social assistance payments. From 1986 to 1991, only the total of these three payments was reported. A history of the changes in XTIRC is given in Table 3, History of Components of XTIRC.
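Because of this reporting change, a series of combined non-taxable income from 1986 onward has to be assembled differently before and after 1992. A Python sketch under stated assumptions follows: `nontaxable_total` is hypothetical, the key `nontax_total` stands in for the 1986 to 1991 total (its LAD acronym is not given in this section), and the lowercase keys nfsl, wkcpy and saspy stand in for the year-suffixed LAD variables.

```python
from typing import Optional


def nontaxable_total(year: int, record: dict) -> Optional[float]:
    """Combined non-taxable income on a basis comparable from 1986 onward.

    1986-1991: only the total was reported (hypothetical key 'nontax_total').
    1992 onward: sum the separate components NFSL, WKCPY and SASPY.
    Before 1986 the amount was not reported, so None is returned.
    """
    if year < 1986:
        return None
    if year <= 1991:
        return record.get("nontax_total")
    parts = [record.get(k) for k in ("nfsl", "wkcpy", "saspy")]
    if all(p is None for p in parts):
        return None
    return sum(p for p in parts if p is not None)


print(nontaxable_total(1990, {"nontax_total": 6200.0}))                          # 6200.0
print(nontaxable_total(1995, {"nfsl": 0.0, "wkcpy": 1800.0, "saspy": 4100.0}))  # 5900.0
print(nontaxable_total(1984, {}))                                               # None
```

Returning None before 1986, rather than zero, keeps the pre-1986 break visible instead of silently producing a spurious jump in the series.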

In summary, this part of the LAD Dictionary specifies the components of TIRC, XTIRC, and MKINC for each year of LAD from 1982 to 2017 via:

 