Longitudinal Administrative Data Dictionary, 2019

Release date: November 12, 2021

1 Introduction

The Longitudinal Administrative Databank (LAD) is a subset of the T1 Family File (T1FF). The T1FF is a yearly cross-sectional file of all taxfilers and their families. Census families are created from information provided annually to the Canada Revenue Agency in personal income tax returns. Both legal and common law spouses are attached by the spousal Social Insurance Number (SIN) listed on the tax form, or by matching based on name, address, age, sex, and marital status. Children are identified through a similar algorithm and supplementary files. Prior to 1993, non-filing children were identified from information on their parents’ tax form. Information from the Family Allowance Program was used to assist in the identification of children. Since 1993, information from the Child Tax Benefit Program has been used for this purpose.

The LAD is a random 20% sample of the T1FF. Selection for LAD is based on an individual’s SIN. There is no age restriction, but people without a SIN can only be included in the family component. Once a person is selected for the LAD, the individual remains in the sample and is picked up each year from the T1FF if he or she appears on the T1 that year. Individuals selected for the LAD are linked across years by a unique LAD identification number (LIN__I) generated from the SIN, to create a longitudinal profile of each individual. The LAD is augmented each year with a sample of new taxfilers so that it consists of approximately 20% of taxfilers for every year. The 20% sample has grown over the years: 3.2 million people in 1982, 4.05 million in 1992, 4.7 million in 2002 and 5.3 million in 2012. This growth reflects increases in the Canadian population and increases in the incidence of tax filing as a result of the introduction of the Federal sales tax credit in 1986 and the Goods and Services Tax credit in 1989.

The LAD is organized into four levels of aggregation, namely the individual, spouse/parent, family, and child levels. The databank contains information on demographics, income, and other taxation data at the different levels of aggregation from 1982 to 2019, with new years of data being added as the information becomes available. Changes in tax legislation and in the design of the T1 form itself have resulted in some variables not being available for all years, as well as some minor definitional changes from one year to the next.

The LAD also obtains information through microdata linkages to other administrative data sources including Tax Free Savings Account (TFSA) information, private corporation ownership information from Schedule 50 of the T2 tax form, and immigration information from the Landing file administrative data. In addition, a linking key resides on the Longitudinal Immigration Database (IMDB) – a database containing immigration records from 1980 to present – which allows for research to be conducted using a linked IMDB-LAD database. All microdata linkages have been approved by the relevant Statistics Canada management and privacy bodies. Further information is available at http://www.statcan.gc.ca.

The LAD has been designed to serve as a research tool from which custom tabulations can be prepared. This dictionary, in turn, has been created to assist researchers in identifying the type of information that is available from the LAD. It identifies and defines the LAD variables including historical changes.

2 Confidentiality

Statistics Canada protects the confidentiality of individuals’ tax data. Only aggregated information that conforms to the confidentiality provision of the Statistics Act is released. The LAD resides within Statistics Canada and all retrievals are done on site. Only employees of Statistics Canada can access such data directly. More information on the confidentiality procedures can be obtained from Client Services.

3 Geography

Data from the LAD are available for various levels of geography including Canada, provinces/territories, and regions such as Census Division (CD), Census Metropolitan Area/Census Agglomeration (CMA/CA), Census Subdivision (CSD) and Census Tract (CT). Many other levels of geography, for example Economic Region (ER) and Federal Electoral District (FED), are not included on the main LAD database; however, these may be available in the LAD using the Postal Code Conversion File. Note that geography classifications on the LAD are based on converting postal code areas to other geographic boundaries.

4 Dictionary format and contents

Outlined below is a brief description of the next eight sections of the LAD Dictionary.

The LAD register (Section 5) is a file that is used in conjunction with the yearly LAD files. The Register outlines the years that an individual is on the LAD and provides information on the taxfiler’s sex, year of birth, and year of death. This section provides a brief description of this file and describes how it can be used to enhance LAD data analysis.

The Programming tips section (Section 6) provides information on writing programs for LAD retrievals. This information will assist those who want to access data from LAD files more effectively by using an appropriate programming structure.

The Design of LAD variable acronyms (Section 7) is a description of the variable acronym structure. It provides insight into how to interpret the variable acronyms and information on the aggregation levels.

The What’s New section (Section 8) is a description of changes to the LAD database since the previous LAD release. It also provides a list of the new variables added to the LAD for the present income year. These new variables may also be available for previous years. Users are encouraged to check each new variable to determine the years available.

The LAD variable definitions section (Section 9) typologically lists each variable by name. In addition, the following information is provided for each variable:

The Variable counts and amounts for individuals section (Section 10) outlines, for many variables at the individual aggregate level, the count of individuals and the dollar amounts reported for the two most recent years of LAD data. Persons included in these counts and amounts are those who have been selected into the LAD sample.

The Definition of total income variables (Section 11) identifies and defines total income variables and highlights historical changes. Also provided are tables that outline and compare the variables that comprise market income and the Canada Revenue Agency’s (CRA) and Income Statistics Division’s (ISD) definitions of total income.

The tables outlined in this section are the following:

Finally, How to obtain more information on the inside cover provides information on how to contact us by telephone, mail, fax, or e-mail from across Canada.

5 LAD register

The LAD register is a companion data file to the yearly LAD files. It contains a selected number of variables for all individuals who are present at any time in the LAD. These variables have characteristics that should remain constant over time and thus may not be identified in a particular yearly file. A new LAD register is created every year with the addition of a new LAD yearly file from taxfiler information provided from living or deceased taxfilers and imputed individuals. Thus, the current register contains the most up-to-date information on individuals present in the LAD. On rare occasions, new information on individuals may differ from that on the existing file. In these instances, current information supersedes information in the existing LAD register.

The LAD register is a quick reference tool that can provide basic data without accessing the yearly files. For example, information such as the number of individuals in the LAD by age and sex in a given year can be tabulated directly from the register. Further, the LAD register can be employed in conjunction with the yearly files.
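
As an illustration of this kind of tabulation, the following SAS sketch counts individuals by age and sex for a single year using register variables only. The variable names follow those used elsewhere in this dictionary, but the input rows are invented, the dataset names work.reg_demo and work.reg_2002 are hypothetical, and the sex codes and the weight value of 5 are assumptions made for the example rather than documented LAD values.

```sas
* Mock register rows. All values are invented for illustration only ;
data work.reg_demo;
    input lin__i sxco_i yob__i flag_i2002 wgt2_i;
    datalines;
1 1 1960 1 5
2 2 1975 1 5
3 2 1988 0 5
;
run;

* Keep persons flagged as filers in 2002 and derive their age ;
data work.reg_2002;
    set work.reg_demo;
    if flag_i2002 = 1;
    age = 2002 - yob__i;          * age reached during 2002 ;
run;

* Weighted count of individuals by age and sex ;
proc freq data=work.reg_2002;
    tables age*sxco_i / missing;
    weight wgt2_i;
run;
```

In an actual retrieval the register file would replace the mock rows, and ages would normally be grouped into ranges before any output is screened for confidentiality.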

Following is a list of the variables that can be found on the register:

6 Programming tips

This section provides programming information for individuals who want to have a better understanding of the programming structure used to access data from the LAD files. Please note that individuals may undertake their own programming; however, only a small staff within Statistics Canada can carry out these retrievals. Access to the LAD files is restricted to protect the confidentiality of an individual’s tax data, and any data that are made available will be screened through a set of rules designed to prevent disclosure.

There are two types of LAD files: the yearly LAD data files and the LAD register (for more details on the LAD register, refer to section 5, LAD register). LAD variables are identified with a variable name that consists of three parts: 1) the acronym name, 2) the aggregate level, and 3) the year (the four-digit year extension exists in most, but not all cases). Observations in the LAD files are sorted by a variable, named lin__i (note that there is no year extension for this variable), which enables users to maintain a link across years.

Data access is undertaken with the SAS programming language. A sample SAS program designed to access LAD data follows this discussion. The library assignments on its first three lines give the locations of the input files (first two lines) and the output files (third line). The input files are in SAS format and can therefore be accessed with a SET or MERGE statement. This program, based on the 20% sample, retrieves the number of Social Assistance (SA) recipients in Ontario who did not have any earnings appearing on their T4 slips, by sex and year (in this case, 2000 to 2002). It is generally recommended that programs use the variables available in the register rather than those in the yearly files because the register contains the most recent data. For example, the sample program uses sxco_i, a variable found in the register, rather than sxco_i&yr, the variable found in the yearly LAD files. The flag_i&yr variables in the register are useful to identify individuals who have filed in a given year. In this program, only individuals who have filed every year from 2000 to 2002 are selected. At the end of the program, four tables are created from the output data file. Note that for confidentiality purposes, the weight variables wgt__i (with the LAD 10% sample) or wgt2_i (with the LAD 20% sample) must be used whenever a SAS procedure such as FREQ or LOGISTIC is invoked.

When programming in SAS, it is important to keep in mind the distinction between missing values and zeros in numeric fields. With SAS, most mathematical operations undertaken with missing values will return missing values. In LAD, in years that an individual is present, numeric variables not relevant to that individual have a value of zero. For example, if a non-family person has filed in 2000, then the value for RRSPSI2000 (contributions to a spouse’s RRSP) should be zero. If that individual has not filed in 2000, then the value will be missing. Thus, as a safety precaution, it is suggested that all numeric variables to be used in mathematical expressions be initialized to zero if missing, before using them.
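
The precaution described above can be sketched as follows. This is a minimal illustration only: the two input rows are invented, the dataset names work.demo and work.demo_clean are hypothetical, and the variable list would need to be adapted to the variables actually used in a given retrieval.

```sas
* Invented extract. Person 1 filed in 2000 and person 2 did not ;
data work.demo;
    input lin__i rrspsi2000 t4e__i2000;
    datalines;
1 0 25000
2 . .
;
run;

* Set missing numeric values to zero before using them in arithmetic ;
data work.demo_clean;
    set work.demo;
    array nums {*} rrspsi2000 t4e__i2000;
    do k = 1 to dim(nums);
        if missing(nums{k}) then nums{k} = 0;
    end;
    total = rrspsi2000 + t4e__i2000;   * would be missing for person 2 without the loop ;
    drop k;
run;
```

The array lists the income variables explicitly rather than using _numeric_, so that the identifier lin__i is never altered. Because zero and missing carry different meanings in the LAD, it may be preferable to keep the register filing flags alongside the cleaned values so that non-filers can still be distinguished from filers with zero amounts.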

Sample LAD program

* Sample SAS program using the LAD;

libname source1 '/LADdata/data1';          * first 10% sample ;
libname source2 '/LADdata/data2';          * second 10% sample ;
libname out '/LADuser/xxxx/data';          * user directory ;

* The objective of this sample program is to use the 20% LAD to retrieve the number of Social Assistance (SA) recipients in Ontario who did not have any earnings appearing on their T4 slips, by sex and year (in this case, 2000 to 2002). Data for provinces and earnings are from the yearly LAD files whereas the sex variable is from the 2002 LAD register. ;

* The first step is to create a data file containing all the information needed to produce the tables. This data file will be called SAOnt and will be saved in the out directory. The Longitudinal Identifier Number (LIN__I) is used to merge the annual LAD datasets. ;

data out.SAOnt;
merge
source1.lad2000(where=(prco_i2000 = 5) keep=lin__i  prco_i2000 saspyi2000 t4e__i2000)
source2.lad2000(where=(prco_i2000 = 5) keep=lin__i  prco_i2000 saspyi2000 t4e__i2000)
source1.lad2001(where=(prco_i2001 = 5) keep=lin__i prco_i2001 saspyi2001 t4e__i2001)
source2.lad2001(where=(prco_i2001 = 5) keep=lin__i  prco_i2001 saspyi2001 t4e__i2001)
source1.lad2002(where=(prco_i2002 = 5) keep=lin__i  prco_i2002 saspyi2002 t4e__i2002)
source2.lad2002(where=(prco_i2002 = 5) keep=lin__i  prco_i2002 saspyi2002 t4e__i2002)
source1.reg2002(keep=lin__i sxco_i flag_i2000-flag_i2002 wgt2_i)
source2.reg2002(keep=lin__i sxco_i flag_i2000-flag_i2002 wgt2_i);

by lin__i ;

If flag_i2000=1 and flag_i2001=1 and flag_i2002=1; *person must be taxfiler in all 3 years;

* We create a flag variable that identifies the SA recipients for each year. The result is three variables, flag_sa2000, flag_sa2001 and flag_sa2002, taking a value of either 1 or 0. ;

If (t4e__i2000=0 and saspyi2000>0) then flag_sa2000 = 1 ;
          else flag_sa2000 = 0 ;
if (t4e__i2001=0 and saspyi2001>0)  then flag_sa2001 = 1 ;
          else flag_sa2001 = 0 ;
if (t4e__i2002=0 and saspyi2002>0) then flag_sa2002 = 1 ;
          else flag_sa2002 = 0 ;

run ;

* The SAS FREQ procedure is used to produce the tables. Confidentiality guidelines must also be respected. ;

proc freq data = out.SAOnt;

          tables sxco_i*flag_sa2000*flag_sa2001*flag_sa2002 /missing;
          weight wgt2_i ;

run ;

* End of the sample program ;
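
The three pairs of IF/ELSE statements in the sample program could also be written with arrays indexed by year. The following is only a sketch of that alternative, run on invented rows; the dataset names work.flags_demo and work.flags_out are hypothetical and the block is not part of the original program.

```sas
* Invented rows standing in for the merged file ;
data work.flags_demo;
    input lin__i t4e__i2000-t4e__i2002 saspyi2000-saspyi2002;
    datalines;
1 0 0 12000 6000 6500 0
2 30000 31000 32000 0 0 0
;
run;

data work.flags_out;
    set work.flags_demo;
    * Arrays indexed by calendar year rather than by position ;
    array t4e  {2000:2002} t4e__i2000-t4e__i2002;
    array sa   {2000:2002} saspyi2000-saspyi2002;
    array flag {2000:2002} flag_sa2000-flag_sa2002;
    do yr = 2000 to 2002;
        * 1 when T4 earnings are zero and SA payments are positive, otherwise 0 ;
        flag{yr} = (t4e{yr} = 0 and sa{yr} > 0);
    end;
    drop yr;
run;
```

With the invented rows above, person 1 is flagged in 2000 and 2001 only, and person 2 is never flagged. The array form mainly helps when the range of years is long, since only the array bounds and the loop limits need to change.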

7 Design of LAD variable acronyms

Most LAD variables have a ten-character acronym. Each acronym consists of three parts, namely the variable name (five characters), the aggregate level (one character), and the calendar year (four characters), e.g. XTIRCI2000.

The variable name is the principal component of the acronym. The characters identify the type of information provided by the variable (see section 9 “LAD Variable Definitions”).

The one-character aggregate level identifies the members of the census family to whom the variable applies, according to the designated level of aggregation. There are four possibilities, namely ‘I’, ‘P’, ‘F’, and ‘K’, representing individual, parents, family and children (kids) respectively. The family types outlined in these aggregate levels refer to the status of the family at the end of the tax year. Following are details about each of these aggregate levels:

The four characters for the calendar year identify the year with which the variable is associated. The LAD data are stored in separate files for each calendar year; therefore all variables in a particular year file will have the same four-character calendar year reference. The only exception in the yearly files is the variable LIN__I, the LAD individual identification number, which is available for each observation present in each year file, but does not have a calendar year as part of the acronym (note that there is also a variable for spousal LIN (LIN__Pyyyy), which does have the year extension as part of the acronym name). In the register file, the exceptions to the four-character year are LIN__I, SXCO_I, YOB__I, YOD__I, LNDYRI, TTNFLI and IMMFLI, which are the individual’s LIN, sex, year of birth, year of death, landing year, temporary SIN flag, and immigrant flag, respectively.
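
The three-part structure of a ten-character acronym can be illustrated with a short SAS sketch. The dataset name work.acronym_parts and the label text are hypothetical, and the code applies only to acronyms that carry the four-character year extension.

```sas
* Split a ten-character acronym into name, aggregate level and year ;
data work.acronym_parts;
    length acronym $10 varname $5 level $1 level_label $10;
    acronym = 'XTIRCI2000';
    varname = substr(acronym, 1, 5);              * five-character variable name ;
    level   = upcase(substr(acronym, 6, 1));      * one-character aggregate level ;
    year    = input(substr(acronym, 7, 4), 4.);   * four-character calendar year ;
    select (level);
        when ('I') level_label = 'individual';
        when ('P') level_label = 'parents';
        when ('F') level_label = 'family';
        when ('K') level_label = 'children';
        otherwise  level_label = 'unknown';
    end;
run;

proc print data=work.acronym_parts noobs;
run;
```

For XTIRCI2000 this gives the variable name XTIRC, the aggregate level I (individual) and the year 2000.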

8 What’s New – LAD 2019

There have been a number of changes and improvements to the LAD and to the LAD data dictionary since the release of the 2018 LAD.

Updates to variable derivation information in the LAD

The T1 tax form and associated schedules were updated in 2019. The changes included an expansion of the T1 from four to six pages, new five-digit line numbers in place of the previous three-digit line references, and changes to the schedule forms, such as the removal of Schedule 1, which has been incorporated into the expanded T1 form. Data dictionary users will observe the effect of these changes, where applicable, for each affected LAD variable.

Modified variables

We have made only one modification to the variables on the LAD database since the 2018 release. Changes in the originating administrative data used to develop the landing year (LNDYRI) variable on the LAD have led to an expansion in the number of years of landing information available. Previously, the earliest year of landing was 1980. With the introduction of the new administrative data, the landing year (LNDYRI) variable now covers the period from 1952 to the present.

New variables

Three new variables are being added to the LAD 2019 database. These three variables are Scholarships fellowships bursaries amount (TSBAPG_), Other income exempt from tax under the Indian Act (SIEOIA_), and Maternity benefits exempt from tax under the Indian Act (SIEMBA_). Each of these three variables is included as part of total income (XTIRC). The table below lists the variable names and descriptions for the new additions to the 2019 LAD, with a fuller explanation provided in the main variable definition section.

New variables available on the LAD as of income year 2019

New variables | Years available
Scholarships fellowships bursaries amount (TSBAPG_) | 2019
Other income exempt from tax under the Indian Act (SIEOIA_) | 2019
Maternity benefits exempt from tax under the Indian Act (SIEMBA_) | 2019

9 LAD variable definitions

10 Selected income variable counts and medians for individuals, 2018 to 2019

11 Definition of total income variables

This section specifies the exact definitions of the three measures of total income that are available on the LAD, which are:

The first measure of total income is TIRC, which is the Canada Revenue Agency Taxation definition of total income as per the T1 form. The second measure, XTIRC, has been derived by the Small Area and Administrative Data Division of Statistics Canada as a more appropriate measure for statistical analysis. The components of income that are included in XTIRC are generally described in Table 1, Components of XTIRC in 2018, while the details are given in Table 5, Definition of XTIRC, 1982 to 2019.

The largest difference between XTIRC and TIRC occurs from 1986 onward because non-Taxable income is added to XTIRC. In 1986, the Government of Canada introduced the Federal Sales Tax (FST) Credit directed at the low-income population. In order to determine eligibility for the FST Credit, filers had to report their non-Taxable income. This was defined as Social Assistance payments, Guaranteed Income Supplement (GIS), Spouse’s Allowance (SPA), and Workers’ compensation payments. Because non-Taxable income was added to XTIRC in 1986, users should be cautious when comparing pre-1986 values of XTIRC with later values. For example, an increase in XTIRC from 1985 to 1986 may simply reflect the reporting of non-Taxable income on the 1986 T1 form but not on the 1985 T1; that is, no actual increase in income may have occurred.

Other new differences are the exclusion of RRSP income for people who are less than 65 years old and the inclusion of Indian exempt employment income to TIRC.

Another difference between TIRC and XTIRC is that capital gains are included in the former but not in the latter. The remaining differences are detailed in Table 4, Differences between TIRC and XTIRC.

The third measure of total income available from LAD is market income (MKINC). MKINC is derived from XTIRC by removing government transfer payments. The components of MKINC are generally described in Table 2, Components of MKINC, 1982 to 2019, while Table 6, Definition of MKINC, 1982 to 2019, gives the detailed derivation.

Besides the change to XTIRC in 1986 due to the addition of sales tax credits, changes in tax legislation and in the content of the T1 form itself have resulted in differences in the availability of the components of total income. The trend has been towards greater availability. For example, in 1992, the components of non-Taxable income are reported separately on the T1 form, adding three variables to the LAD: NFSL, denoting net federal supplements (GIS and SPA), WKCPY, denoting Workers’ compensation payments, and SASPY denoting social assistance payments. From 1986 to 1991, only the total of these three payments was reported. A history of the changes in XTIRC is given in Table 3, History of Components of XTIRC.

In summary, this part of the LAD Dictionary specifies the components of TIRC, XTIRC, and MKINC for each year of LAD from 1982 to 2019 via:

 