Longitudinal Administrative Data Dictionary, 2016


Release date: October 25, 2018

1 Introduction

The Longitudinal Administrative Databank (LAD) is a subset of the T1 Family File (T1FF). The T1FF is a yearly cross-sectional file of all taxfilers and their families. Census families are created from information provided annually to the Canada Revenue Agency in personal income tax returns. Both legal and common law spouses are attached by the spousal Social Insurance Number (SIN) listed on the tax form, or by matching based on name, address, age, sex, and marital status. Children are identified through a similar algorithm and supplementary files. Prior to 1993, non-filing children were identified from information on their parents’ tax form. Information from the Family Allowance Program was used to assist in the identification of children. Since 1993, information from the Child Tax Benefit Program has been used for this purpose.

The LAD is a random 20% sample of the T1FF. Selection for the LAD is based on an individual’s SIN. There is no age restriction, but people without a SIN can only be included in the family component. Once a person is selected for the LAD, the individual remains in the sample and is picked up each year from the T1FF if he or she appears on the T1 that year. Individuals selected for the LAD are linked across years by a unique LAD identification number (LIN__I) generated from the SIN, to create a longitudinal profile of each individual. The LAD is augmented each year with a sample of new taxfilers so that it consists of approximately 20% of taxfilers for every year. The 20% sample has increased from 3,227,485 people in 1982 to 5,579,280 in 2016 (an increase of 73%). This increase reflects growth in the Canadian population and a higher incidence of tax filing following the introduction of the Federal sales tax credit in 1986 and the Goods and Services Tax credit in 1989.

The LAD is organized into four levels of aggregation, namely the individual, spouse/parent, family, and child levels. The databank contains information on demographics, income, and other taxation data at the different levels of aggregation from 1982 to 2016, with new years of data being added as the information becomes available. Changes in tax legislation and in the design of the T1 form itself have resulted in some variables not being available for all years, as well as some minor definitional changes from one year to the next.

The LAD is currently linked with the Longitudinal Immigration Database (IMDB) containing immigration records from 1980 to 2016. This linkage has been approved by the Statistics Canada Executive Management Board (EMB). Further information is available at 2008 submissions.

The LAD has been designed to serve as a research tool from which custom tabulations can be prepared. This dictionary, in turn, has been created to assist researchers in identifying the type of information that is available from the LAD. It identifies and defines the LAD variables including historical changes.

2 Confidentiality

Statistics Canada protects the confidentiality of individuals’ tax data. Only aggregated information that conforms to the confidentiality provision of the Statistics Act is released. The LAD resides within Statistics Canada and all retrievals are done on site. Only a small staff within the Income Statistics Division (ISD) can access such data directly. This means that users must specify their data requirements to these persons who then carry out the retrieval. More information on the confidentiality procedures can be obtained from Client Services.

3 Geography

Data from the LAD are available for various levels of geography, including Canada, provinces/territories, and regions such as Census Division (CD), Census Metropolitan Area/Census Agglomeration (CMA/CA), Census Subdivision (CSD) and Census Tract (CT). Many other levels of geography, for example Economic Region (ER) and Federal Electoral District (FED), are not included on the main LAD database; however, these may be available in the LAD using the Postal Code Conversion File.

4 Dictionary format and contents

Outlined below is a brief description of the next nine sections of the LAD Dictionary.

The LAD register (Section 5) is a file that is used in conjunction with the yearly LAD files. The Register outlines the years that an individual is on the LAD and provides information on the taxfiler’s sex, year of birth, and year of death. This section provides a brief description of this file and describes how it can be used to enhance LAD data analysis.

The Programming tips section (Section 6) provides information on writing programs for LAD retrievals. This information will assist individuals who want to access data from LAD files more effectively by using an appropriate programming structure.

The Design of LAD variable acronyms (Section 7) is a description of the variable acronym structure. It provides insight into how to interpret the variable acronyms and information on the aggregation levels.

The What’s New section (Section 8) is a description of changes to the LAD database since the previous LAD release. It also provides a list of the new variables added to the LAD for the present income year. These new variables may also be available for previous years. Users are encouraged to check each new variable to determine the years available.

The LAD variable definitions section (Section 9) lists each variable by name, grouped typologically. In addition, the following information is provided for each variable:

The Variable counts and amounts for individuals section (Section 10) outlines, for many variables at the individual aggregate level, the count of individuals and the dollar amounts reported for the two most recent years of LAD data. Persons included in these counts and amounts are those who have been selected into the LAD sample.

The Correspondence with the IMDB variables (section 11) presents the variables from the Longitudinal Immigration Database (IMDB) that are linked with the LAD.

The Definition of total income variables (Section 12) identifies and defines total income variables and highlights historical changes. Also provided are tables that outline and compare the variables that comprise market income and the Canada Revenue Agency’s (CRA) and Income Statistics Division’s (ISD) definitions of total income.

The tables outlined in this section are the following:

Finally, How to obtain more information on the inside cover provides information on how to contact us by telephone, mail, fax, or e-mail from across Canada.

5 LAD register

The LAD register is a companion data file to the yearly LAD files. It contains a selected number of variables for all individuals who are present at any time in the LAD. These variables have characteristics that should remain constant over time and thus may not be identified in a particular yearly file. A new LAD register is created every year with the addition of a new LAD yearly file from taxfiler information provided from living or deceased taxfilers and imputed individuals. Thus, the current register contains the most up-to-date information on individuals present in the LAD. On rare occasions, new information on individuals may differ from that on the existing file. In these instances, current information supersedes information in the existing LAD register.

The LAD register is a quick reference tool that can provide basic data without accessing the yearly files. For example, information such as the number of individuals in the LAD by age and sex in a given year can be tabulated directly from the register. Further, the LAD register can be employed in conjunction with the yearly files. In particular, it is recommended that the age of an individual be calculated from the register’s information on the year of birth rather than relying on the age information in the yearly files in order to ensure that it is consistent across years.
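The recommendation to calculate age from the register can be sketched as follows. This is a minimal illustration only: the dataset work.reg_demo, its values, the macro variable refyear and the output variable age_ref are invented for the example, and the age convention used (reference year minus year of birth) is an assumption rather than a documented LAD rule. Only lin__i and yob__i are names taken from this dictionary.

```sas
* Toy stand-in for the register with invented values ;
data work.reg_demo;
   input lin__i yob__i;
   datalines;
1 1950
2 1975
3 .
;
run;

%let refyear = 2002;   * hypothetical reference year ;

data work.age_demo;
   set work.reg_demo;
   * Age is left missing when the year of birth is missing ;
   if not missing(yob__i) then age_ref = &refyear - yob__i;
run;
```

Because the year of birth comes from a single register record, the derived age moves forward by exactly one for each additional reference year, which is the consistency the text above is after.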

Following is a list of the variables that can be found on the register:

6 Programming tips

This section provides programming information for individuals who want a better understanding of the programming structure used to access data from the LAD files. Please note that individuals may undertake their own programming; however, only a small staff within Statistics Canada can carry out these retrievals. Access to the LAD files is restricted to protect the confidentiality of individuals’ tax data, and any data that are made available will be screened through a set of rules designed to prevent disclosure.

There are two types of LAD files: the yearly LAD data files and the LAD register (for more details on the LAD register, refer to section 5, LAD register). LAD variables are identified with a variable name that consists of three parts: 1) the acronym name, 2) the aggregate level, and 3) the year (the four-digit year extension exists in most, but not all cases). Observations in the LAD files are sorted by a variable named lin__i (note that there is no year extension for this variable), which enables users to maintain a link across years.

Data access is undertaken with the SAS programming language. A sample SAS program designed to access LAD data appears at the end of this section. The library assignments on the first three lines are the locations for the input files (first two lines) and the output files (the third line). The input files are in SAS format and can therefore be accessed with a SET or MERGE statement. This program, based on the 20% sample, retrieves the number of Social Assistance (SA) recipients in Ontario who did not have any earnings appearing on their T4 slips, according to sex and year (in this case, 2000 to 2002). It is generally recommended that programs use the variables available in the register rather than the yearly files because the register information contains the most recent data. For example, the program uses sxco_i, a variable found in the register, rather than sxco_i&yr, the variable found in the yearly LAD files. The flag_i&yr variables in the register are useful to identify individuals who have filed in a given year. In this program, only individuals who have filed every year from 2000 to 2002 are selected. At the end of the program, four tables are created from the output data file. Note that for confidentiality purposes, the weight variables wgt__i (with the LAD 10% sample) or wgt2_i (with the LAD 20% sample) must be used whenever a SAS procedure such as FREQ or LOGISTIC is invoked.

When programming in SAS, it is important to keep in mind the distinction between missing values and zeros in numeric fields. With SAS, most mathematical operations undertaken with missing values will return missing values. In LAD, in years that an individual is present, numeric variables not relevant to that individual have a value of zero. For example, if a non-family person has filed in 2000, then the value for RRSPSI2000 (contributions to a spouse’s RRSP) should be zero. If that individual has not filed in 2000, then the value will be missing. Thus, as a safety precaution, it is suggested that all numeric variables to be used in mathematical expressions be initialized to zero if missing, before using them.
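A minimal sketch of that precaution is shown here, assuming a small invented dataset (work.demo) that reuses two variable names from the sample program, t4e__i2000 and saspyi2000. The dataset, its values, the loop counter k and the derived variable total are hypothetical.

```sas
* Toy yearly records with invented values ;
data work.demo;
   input lin__i t4e__i2000 saspyi2000;
   datalines;
1 25000 0
2 . 4000
3 . .
;
run;

data work.demo_zeroed;
   set work.demo;
   array amt {*} t4e__i2000 saspyi2000;
   do k = 1 to dim(amt);
      * Replace a missing amount with zero ;
      if missing(amt{k}) then amt{k} = 0;
   end;
   drop k;
   * The addition no longer returns a missing value ;
   total = t4e__i2000 + saspyi2000;
run;
```

Since a missing value indicates that the individual did not file in that year, it may be preferable to restrict the data to filers first (for example with the flag_i&yr variables) so that zeros created this way are not mistaken for reported zero amounts.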

Sample LAD program

* Sample SAS program using the LAD;

libname source1 '/LADdata/data1';          * first 10% sample ;
libname source2 '/LADdata/data2';          * second 10% sample ;
libname Out '/LADuser/xxxx/data';          * user directory ;

* The objective of this sample program is to use the 20% LAD to retrieve the number of Social Assistance (SA) recipients in Ontario that did not have any earnings appearing on their T4 slips, according to sex and year (in this case, 2000 to 2002). Data for provinces and earnings are from the yearly LAD files whereas the sex variable is from the 2002 LAD register. ;

* The first step is to create a datafile containing all the information that we need to produce our tables. This datafile will be called SAOnt and will be saved in the out directory. The Longitudinal Identifier Number (LIN__I) is used to merge the annual LAD datasets. ;

data out.SAOnt;
merge
source1.lad2000(where=(prco_i2000 = 5) keep=lin__i prco_i2000 saspyi2000 t4e__i2000)
source2.lad2000(where=(prco_i2000 = 5) keep=lin__i prco_i2000 saspyi2000 t4e__i2000)
source1.lad2001(where=(prco_i2001 = 5) keep=lin__i prco_i2001 saspyi2001 t4e__i2001)
source2.lad2001(where=(prco_i2001 = 5) keep=lin__i prco_i2001 saspyi2001 t4e__i2001)
source1.lad2002(where=(prco_i2002 = 5) keep=lin__i prco_i2002 saspyi2002 t4e__i2002)
source2.lad2002(where=(prco_i2002 = 5) keep=lin__i prco_i2002 saspyi2002 t4e__i2002)
source1.reg2002(keep=lin__i sxco_i flag_i2000-flag_i2002 wgt2_i)
source2.reg2002(keep=lin__i sxco_i flag_i2000-flag_i2002 wgt2_i);

by lin__i ;

If flag_i2000=1 and flag_i2001=1 and flag_i2002=1; *person must be taxfiler in all 3 years;

* We create a flag variable that identifies the SA recipients for each year. The result is three variables, flag_sa2000, flag_sa2001 and flag_sa2002, taking a value of either 1 or 0. ;

If (t4e__i2000=0 and saspyi2000>0) then flag_sa2000 = 1 ;
          else flag_sa2000 = 0 ;
if (t4e__i2001=0 and saspyi2001>0)  then flag_sa2001 = 1 ;
          else flag_sa2001 = 0 ;
if (t4e__i2002=0 and saspyi2002>0) then flag_sa2002 = 1 ;
          else flag_sa2002 = 0 ;

run ;

* The SAS FREQ procedure is used to produce our tables. We would also need to make sure that confidentiality guidelines are respected. ;

proc freq data = out.SAOnt;

          tables sxco_i*flag_sa2000*flag_sa2001*flag_sa2002 /missing;
          weight wgt2_i ;

run ;

* End of the sample program ;
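When more years are involved, the three flag assignments in the sample program could be generated with a macro loop instead of being written out by hand. The following is a sketch under stated assumptions, not part of the official sample: the macro name sa_flags, the datasets work.merged_demo and work.flags_demo, and all data values are invented. It relies on the fact that a SAS comparison expression evaluates to 1 when true and 0 when false.

```sas
* Toy merged file with invented values ;
data work.merged_demo;
   input lin__i t4e__i2000 saspyi2000 t4e__i2001 saspyi2001
         t4e__i2002 saspyi2002;
   datalines;
1 0 5000 0 5200 12000 0
2 30000 0 31000 0 0 800
;
run;

%macro sa_flags(first, last);
   %local yr;
   data work.flags_demo;
      set work.merged_demo;
      %do yr = &first %to &last;
         /* 1 if SA was received and no T4 earnings were reported, else 0 */
         flag_sa&yr = (t4e__i&yr = 0 and saspyi&yr > 0);
      %end;
   run;
%mend sa_flags;

%sa_flags(2000, 2002)
```

With the toy data, the first individual is flagged in 2000 and 2001 but not in 2002, and the second individual is flagged only in 2002.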

7 Design of LAD variable acronyms

Most LAD variables have a ten-character acronym. Each acronym consists of three parts, namely the variable name (five characters), the aggregate level (one character), and the calendar year (four characters), e.g. XTIRCI2000.

The variable name is the principal component of the acronym. The characters identify the type of information provided by the variable (see section 9 “LAD Variable Definitions”).

The one-character aggregate level provides information on individuals of the census family according to the designated level of aggregation. There are four possibilities, namely ‘I’, ‘P’, ‘F’, and ‘K’, representing individual, parents, family and children (kids) respectively. The family types outlined in these aggregate levels refer to the status of the family at the end of the tax year. Following are details about each of these aggregate levels:

The four characters for the calendar year identify the year with which the variable is associated. The LAD data are stored in separate files for each calendar year; therefore all variables in a particular year file will have the same four-character calendar year reference. The only exception in the yearly files is the variable LIN__I, the LAD individual identification number, which is available for each observation present in each year file but does not have a calendar year as part of the acronym (note that there is also a variable for spousal LIN, LIN__Pyyyy, which does have the year extension as part of the acronym name). In the register file, the exceptions to the four-character year are LIN__I, SXCO_I, YOB__I, YOD__I, LNDYRI, TTNFLI and IMMFLI, which are the individual’s LIN, sex, year of birth, year of death, landing year, temporary SIN flag, and immigrant flag, respectively.
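The acronym structure described in this section can be illustrated with a short SAS step that splits a name into its parts. This is only a sketch: the dataset work.acronym_demo and the variables acronym, varname, level and year are invented for the example, and it covers only names that follow the five-plus-one-plus-four pattern (or the six-character form without a year, such as LIN__I). Names with a different layout, such as CCPCFLGI_, are not handled.

```sas
data work.acronym_demo;
   length acronym $10 varname $5 level $1;
   input acronym $;
   * Characters 1 to 5 hold the variable name ;
   varname = substr(acronym, 1, 5);
   * Character 6 holds the aggregate level, I, P, F or K ;
   level = substr(acronym, 6, 1);
   * Characters 7 to 10 hold the year when it is present ;
   if length(acronym) = 10 then year = input(substr(acronym, 7, 4), 4.);
   datalines;
XTIRCI2000
LIN__I
SASPYI2002
;
run;
```

For XTIRCI2000 this yields the name XTIRC, level I and year 2000; for LIN__I the year is left missing because the acronym carries no year extension.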

8 What’s New – LAD 2016

There have been a number of changes and improvements to the LAD and to the LAD data dictionary since the release of the 2015 LAD.

Restructured documentation

The LAD data dictionary variable description section, Section 9, now groups all the LAD variables and their descriptions by income theme, enabling researchers to quickly find specific groups of variables relevant to particular aspects of income research.

The new overall layout is visible in the table of contents and provides a simple entry into the dictionary’s income concepts. The major variable headings are total income, income taxes, personal characteristics, and TFSA. Within these four major headings are subheadings and associated variables. For example, under the Total Income section heading are a number of subheadings and various total income variables, such as total income without capital gains (XTIRC), total income including capital gains (XTIIC), after-tax income (AFTAX), and after-tax income including capital gains (AFTIC).

For those who still wish to search for their LAD variables alphabetically, there is now an index at the end of the data dictionary which reproduces a full alphabetical listing of the LAD variables. The remaining sections of the data dictionary are maintained, which allow users to gain a short introduction to the LAD while also understanding some of the components of the database and some control counts and totals.

Modified variables

In addition to the format changes in the data dictionary, there have been some corrections and changes made to the LAD database.

Changes have been made to the year of birth variable (YOB__) on the register, as well as to other variables relating to the age of the taxfiler, their spouse, or their children (age__i, age__p, age__k). These changes improve the consistency of these variables across years; inconsistencies had been identified, particularly in the values associated with imputed records.

The variable “Old age security guaranteed income supplement recipient indicator” (OASFL) has been removed from the LAD. Analyses determined that there were data inconsistencies and quality concerns with the variable.

Changes have been introduced to the two LAD low income measure (LIM) variables, LIMXT and LIMAT. These changes ensure a consistent LAD-based methodology to calculate these two LIM measurements for all LAD years, employing a new LAD family weighting variable (see “New Variables” below for more information on the weight variable). These updated variables replace the existing LIM variables.

New variables

Several new variables have been added to the LAD database. A group of variables has been added relating to individuals and their ownership of shares in Canadian Controlled Private Corporations (CCPCFLGI_, CCPCOWNI_, CCPCCNTI_).

As well, we have added a new family weight variable (famwgt_F). Users wishing to produce estimates of census family information, such as the number of tax-filing families, can now do so by applying this weight. There are a number of conditions which users must take into account when applying this weight, all of which are explained in the variable description.

The table below lists the variable names and descriptions for the new additions to the 2016 LAD, with a fuller explanation provided in the main variable definition section.

New variables available on the LAD as of income year 2016

New variables | Years available
Flag: Owned shares in a CCPC (CCPCFLGI_) | 2002-2016
Flag: Sole owner of a CCPC (CCPCOWNI_) | 2002-2016
Number of CCPCs owned or partly owned by an individual filer (CCPCCNTI_) | 2002-2016
Family weight variable (famwgt_F) | 1982-2016

9 LAD variable definitions

10 Selected income variable counts and medians for individuals, 2015 to 2016

11 Correspondence with the IMDB variables

Correspondence with the IMDB variables

LAD Acronym | IMDB Acronym | IMDB Acronym (Old) | Description
LNGOF | OFFICIAL_LANGUAGE | CAN_LANG, Official_Language_Cd | Immigrant’s official languages ability indicator
PAYSC | COUNTRY_CITIZENSHIP | F03FCITZ, CITZ, Citizenship_Country_Cd | Immigrant’s country of citizenship at landing
PAYSR | COUNTRY_RESIDENCE | F03FCLPR, FCLPR, CLPR_Country_Cd | Immigrant’s country of last permanent residence
PAYSN | COUNTRY_BIRTH | F03FCOB, FCOB, Birth_Country_Cd | Immigrant’s country of birth
IEDCD | LEVEL_OF_EDUCATION | F03FEDUC, FEDUC, Level_Of_Education_Cd | Immigrant’s level of education at landing
IMCAT | IMM_CATEGORY_STC_ROLLUP2 | F03IMCAT, IMCAT, Imm_Category_Rollup2_Cd | Immigrant category
STATM | MARITAL_STATUS | M_STAT, Marital_Status_Cd | Immigrant’s marital status at landing
LNGMA | MOTHER_TONGUE | NAT_LANG, Mother_Tongue_Cd | Immigrant’s native language (or mother tongue)
IPRMR | DESTINATION_CMA | F03NCMA296NCHA3, Destination_CMA_Cd11 | Immigrant’s intended place of destination
CNP4_ | NOC2_CD11 | F03NOC4, NOC4, NOC4_Cd | Immigrant’s intended occupation
IEDAN | YEARS_OF_SCHOOLING | SCH_YR, SPECIAL_PROGRAM | Immigrant’s years of schooling at landing
IPSPC | SPECIAL_PROGRAM | F03SPC_P, SPC_P, Special_Program_Cd | Immigrant’s special program code

12 Definition of total income variables

This section specifies the exact definitions of the three measures of total income that are available on the LAD, which are TIRC, XTIRC and MKINC.

The first measure of total income is TIRC, which is the Canada Revenue Agency Taxation definition of total income as per the T1 form. The second measure, XTIRC, has been derived by the Small Area and Administrative Data Division of Statistics Canada as a more appropriate measure for statistical analysis. The components of income that are included in XTIRC are generally described in Table 1, Components of XTIRC in 2016, while the details are given in Table 5, Definition of XTIRC, 1982 to 2016.

The largest difference between XTIRC and TIRC occurs from 1986 onward because non-Taxable income is added to XTIRC. In 1986, the Government of Canada introduced the Federal Sales Tax (FST) Credit directed at the low-income population. In order to determine eligibility for the FST Credit, filers had to report their non-Taxable income. This was defined as Social Assistance payments, Guaranteed Income Supplement (GIS), Spouse’s Allowance (SPA), and Workers’ compensation payments. As a result of adding non-Taxable income to XTIRC in 1986, the user is cautioned in comparing pre-1986 values of XTIRC with later values. For example, an increase in XTIRC from 1985 to 1986 may simply reflect the reporting of non-Taxable income on the 1986 T1 form but not on the 1985 T1, i.e. perhaps no increase in income occurred.

Other new differences are the exclusion of RRSP income for people who are less than 65 years old and the inclusion of Indian exempt employment income to TIRC.

Another difference between TIRC and XTIRC is that capital gains are included in the former but not in the latter. The remaining differences are detailed in Table 4, Differences between TIRC and XTIRC.

The third measure of total income available from LAD is market income (MKINC). MKINC is derived from XTIRC by removing government transfer payments. The components of MKINC are generally described in Table 2, Components of MKINC, 1982 to 2016, while Table 6, Definition of MKINC, 1982 to 2016, gives the detailed derivation.

Besides the change to XTIRC in 1986 due to the addition of sales tax credits, changes in tax legislation and in the content of the T1 form itself have resulted in differences in the availability of the components of total income. The trend has been towards greater availability. For example, in 1992, the components of non-Taxable income are reported separately on the T1 form, adding three variables to the LAD: NFSL, denoting net federal supplements (GIS and SPA), WKCPY, denoting Workers’ compensation payments, and SASPY denoting social assistance payments. From 1986 to 1991, only the total of these three payments was reported. A history of the changes in XTIRC is given in Table 3, History of Components of XTIRC.

In summary, this part of the LAD Dictionary specifies the components of TIRC, XTIRC, and MKINC for each year of LAD from 1982 to 2016 via:
