Historical Data Linkage Quality: The Longitudinal and International Study of Adults, and Tax Records on Labour and Income
by James Hemeon
Abstract
Linkages between survey and administrative data are an increasingly common practice, due in part to the reduced burden to respondents, and to the data that can be obtained at a relatively low cost. Historical linkage, or the linkage of administrative data from previous years to the year of the survey, compounds these benefits by providing additional years of data. This paper examines the Longitudinal and International Study of Adults (LISA), which was linked to historical tax data from the T1 Family File (T1FF) and those collected from employers’ files (T4), among others not mentioned in this paper. It presents trends in historical linkage rates, compares the coherence of administrative data between the T1FF and T4, presents the ability to use the data to create balanced panels, and uses the T1FF data to produce age-earnings profiles by sex. The results show that the historical linkage rate is high (over 90% in most cases) and stable over time for respondents who are likely to file a tax return, and that the T1FF and T4 administrative sources show similar earnings. Moreover, long balanced panels of up to 30 years in length (at the time of writing) can be created using LISA administrative linkage data.
1. Introduction
Statistics Canada has collected administrative data for statistical purposes since its inception as the Dominion Bureau of Statistics in 1918. The linkage of survey data to administrative sources is becoming increasingly common as a means to reduce respondent burden, to replace survey questions whose answers could otherwise be subject to respondent recall bias, and to obtain data that a respondent may not feel comfortable disclosing during a survey interview. Because linked data do not have to be collected in an interview, linkage can also reduce the costs associated with survey collection. Statistical agencies around the world have been using administrative data to replace questionnaires for decades (Economic and Social Council, 2009).
Survey-collected data from the Longitudinal and International Study of Adults (LISA) is linked to tax and other administrative data sources for each year of survey collection. In addition, the LISA performs a historical linkage to tax files preceding the first year of LISA collection, providing a significant amount of longitudinal data with no added response burden, and at no extra collection cost. Though cross-sectional socioeconomic data, which represents a moment in time, can be extremely useful, the availability of a high-quality longitudinal dataset such as the LISA allows for the analysis of trends over the course of people’s lives, which provides additional insight for public policy decision-making.
The purpose of this paper is to explore the data quality of the LISA historical linkage data. Heisz et al. (2013), using data from a pilot study, analyzed tax data linkage rates and data accuracy, and presented the benefits of historical linkage data. The current paper will apply some of the same methods, and expand upon these findings, using the LISA dataset. More specifically, this paper will analyze the linkage rate, the degree to which it decreases as the administrative data years go back in time, and the potential of historical linkage data in analyzing phenomena that require a longitudinal data series.
2. Sample
The LISA is a sample survey with a stratified multi-stage, multi-phase design. The sample was drawn in 2011 by selecting dwellings from 2011 Canadian Census of Population data, and is therefore a representation of the population at that time. The first LISA interviews took place in late 2011 and early 2012. For simplicity, this first collection wave is referred to as LISA 2012, and the LISA 2012 database is used in this study. The sample included dwellings in Canada’s ten provinces, and excluded regular members of the Canadian Forces, individuals living in institutions, and individuals living on reserves and other Aboriginal settlements in the provinces. The data includes information on respondent demographics; family and household composition; literacy, numeracy and problem-solving skills; education and training; health; income and wealth; and labour market participation (Statistics Canada, 2014).
The database contains data from 23,926 respondents, aged 15 years and older. The file also contains 2,943 non-respondents and 5,264 non-responding children (under the age of 15). Upon completing the LISA interview, respondents were informed of the data replacement plans to link their survey data to administrative sources - a practice referred to as “informed replacement”.
Following collection, Social Insurance Numbers (SIN) were retrieved for respondents from the tax databases of 2010 and 2011, using the respondent’s first name, last name, date of birth, sex, marital status, address and postal code (Social Insurance Numbers are not collected directly from respondents). If no direct match was found, a SIN was linked using probabilistic linkage with the aforementioned auxiliary variables.
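The two-step retrieval described above can be illustrated with a minimal sketch. It is not Statistics Canada's procedure: the `Person` record, the `agreement_score` function and the 0.7 threshold are all hypothetical, and the share-of-matching-fields score is only a crude stand-in for a real probabilistic linkage model. The sketch tries a direct match on every identifying field first, and falls back to the best-scoring candidate otherwise.

```python
from dataclasses import dataclass
from typing import Optional

# Identifying fields named in the text; the field names are illustrative.
MATCH_FIELDS = ("first_name", "last_name", "date_of_birth", "sex",
                "marital_status", "address", "postal_code")


@dataclass(frozen=True)
class Person:
    first_name: str
    last_name: str
    date_of_birth: str
    sex: str
    marital_status: str
    address: str
    postal_code: str
    sin: Optional[str] = None  # only populated on tax records


def agreement_score(a: Person, b: Person) -> float:
    """Share of identifying fields that agree (case-insensitive)."""
    hits = sum(
        getattr(a, f).strip().lower() == getattr(b, f).strip().lower()
        for f in MATCH_FIELDS
    )
    return hits / len(MATCH_FIELDS)


def find_sin(respondent: Person, tax_records: list[Person],
             threshold: float = 0.7) -> Optional[str]:
    """Return a SIN from a direct match, else from the best fallback match."""
    best, best_score = None, 0.0
    for record in tax_records:
        score = agreement_score(respondent, record)
        if score == 1.0:  # direct match on every field
            return record.sin
        if score > best_score:
            best, best_score = record, score
    # Assumed rule: accept the fallback only if agreement is high enough.
    if best is not None and best_score >= threshold:
        return best.sin
    return None  # no SIN found; the respondent cannot be linked
```

A respondent for whom `find_sin` returns `None` corresponds to the "linkage attempted but no SIN found" cases discussed in Section 3.1.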
Once a SIN was identified, the LISA data was linked with different tax files of individuals: (i) the T1 Family File (T1FF), (ii) the statement and summary of compensation paid by employers (T4 file), and (iii) the Pension Plans in Canada file (Note 1). Two different types of linkage were carried out: (i) a yearly linkage (renewable for each new wave of the survey) and (ii) a historical linkage of tax data for all years going back to 1982 (T1 Family File) or 2000 (T4 file, Pension Plans in Canada file).
3. Results
3.1 Linkage rate between 1982 and 2011
Data linkage fails when no linkage key can be established, or the linkage key does not find a match in the administrative data file.
As indicated above, SIN codes were retrieved for individuals using the tax databases of 2010 and 2011. If no SIN was found during this linkage process, this may be attributed to a respondent having not filed a personal tax return during those two years, or it may be due to no suitable link being found in the probabilistic linkage. For 7.5% of LISA respondents, a linkage was attempted but no SIN was found. Of these, 55.1% were 17 years of age or less, and 64.9% were 20 years of age or less. Therefore, the majority of these cases can likely be attributed to respondents being young, and not having established a need to file a tax return.
Historical data linkage faces the additional problem that the linkage key may not be constant. The Social Insurance Number (SIN) is a relatively stable linkage key, but in some cases it changes over time. For example, immigrants to Canada are assigned a temporary SIN on arrival and are later assigned a permanent SIN. When a respondent's SIN is not consistent over time, the linkage fails for the years in which that SIN cannot be found. In such cases the longitudinal dataset may be missing information: an income tax return may have been filed in the earlier years, but it cannot be associated with the respondent.
A further problem in historical linkage is that, going back to older administrative files, a person may drop out of the file because they were too young to file in that year. Respondents who are immigrants may have been living in another country, or may not yet have established stable filing patterns using a permanent SIN in Canada. The LISA sample includes respondents as young as 15 years of age (as of 2011), while the historical linkage to T1FF data in the 2012 LISA release covers 30 years of tax data. The tax data therefore begin before the year of birth for some LISA respondents, and before the year of immigration to Canada for others. For example, in 1982, 24.4% of LISA respondents were not yet born, 10.0% had not yet immigrated to Canada, and 28.6% were 20 years of age or less or had immigrated to Canada within the past 3 years. By 1997, all LISA respondents had been born, 7.5% had not yet immigrated to Canada, and 30.2% of respondents were 20 years of age or less or had immigrated within the past 3 years (see Figure 3.1-1). A linkage is therefore impossible, or unlikely, for a subset of the LISA sample in some years, and historical linkage rates should be expected to decline going back in time.
Data table for Figure 3.1-1 (number of respondents)
Year | Not born yet | Immigrants who have not yet immigrated | 20 years of age or less, or immigrated within the past 3 years | Over 20 years of age, non-immigrant or immigrated over 3 years ago |
---|---|---|---|---|
1982 | 5,849 | 2,398 | 6,846 | 8,833 |
1983 | 5,550 | 2,412 | 6,625 | 9,339 |
1984 | 5,206 | 2,420 | 6,443 | 9,857 |
1985 | 4,900 | 2,431 | 6,259 | 10,336 |
1986 | 4,581 | 2,429 | 6,124 | 10,792 |
1987 | 4,225 | 2,375 | 6,142 | 11,184 |
1988 | 3,851 | 2,359 | 6,173 | 11,543 |
1989 | 3,463 | 2,299 | 6,242 | 11,922 |
1990 | 3,002 | 2,225 | 6,400 | 12,299 |
1991 | 2,522 | 2,193 | 6,504 | 12,707 |
1992 | 2,038 | 2,138 | 6,674 | 13,076 |
1993 | 1,564 | 2,071 | 6,818 | 13,473 |
1994 | 1,002 | 2,040 | 7,046 | 13,838 |
1995 | 487 | 1,976 | 7,263 | 14,200 |
1996 | 37 | 1,910 | 7,419 | 14,560 |
1997 | 0 | 1,798 | 7,236 | 14,892 |
1998 | 0 | 1,708 | 7,002 | 15,216 |
1999 | 0 | 1,608 | 6,756 | 15,562 |
2000 | 0 | 1,475 | 6,537 | 15,914 |
2001 | 0 | 1,337 | 6,321 | 16,268 |
2002 | 0 | 1,241 | 6,100 | 16,585 |
2003 | 0 | 1,107 | 5,874 | 16,945 |
2004 | 0 | 984 | 5,644 | 17,298 |
2005 | 0 | 845 | 5,363 | 17,718 |
2006 | 0 | 682 | 5,169 | 18,075 |
2007 | 0 | 525 | 4,924 | 18,477 |
2008 | 0 | 386 | 4,632 | 18,908 |
2009 | 0 | 225 | 4,331 | 19,370 |
2010 | 0 | 105 | 3,944 | 19,877 |
2011 | 0 | 9 | 3,454 | 20,463 |
Source: Longitudinal and International Study of Adults (2012). |
A linkage rate was calculated between survey respondents and the T1FF data files for all years from 1982 to 2011, to determine the extent of its decline going back over time. Three different linkage rates were calculated, including those respondents for whom no SIN was found: (1) a gross rate (using all respondents in the sample), (2) an adjusted rate using a sample excluding respondents under 20 years of age in a given tax year, and (3) a second adjusted rate using a sample excluding respondents under 20 years of age in a given tax year, as well as immigrants landed in the three years preceding a given tax year (see Figure 3.1-2). The adjustment based on age reflects the fact that this group is less likely to produce an income tax return during a given year. The adjustment based on immigrant status reflects the fact that this group is unlikely to have filed Canadian taxes in pre-immigration years, and three years is chosen to give the immigrant respondents time to establish stable filing patterns with a permanent SIN.
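The three rates can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the record layout (`birth_year`, `landing_year`, `linked_years`) is hypothetical, and because the text describes the age exclusion both as "under 20" and as "20 years of age or less", the cut-off is left as a `min_age` parameter.

```python
def linkage_rates(respondents, tax_year, min_age=21, settle_years=3):
    """Gross and adjusted linkage rates (in %) for one tax year.

    respondents: list of dicts with keys
      birth_year   - int
      landing_year - int, or None for non-immigrants
      linked_years - set of tax years with a linked T1FF record
    min_age and settle_years are assumed parameters (see lead-in).
    """
    def rate(group):
        if not group:
            return None  # rate undefined for an empty sample
        linked = sum(tax_year in r["linked_years"] for r in group)
        return 100.0 * linked / len(group)

    # Adjusted rate 1: drop respondents too young in this tax year.
    adults = [r for r in respondents
              if tax_year - r["birth_year"] >= min_age]

    # Adjusted rate 2: also drop immigrants who had not yet landed,
    # or who landed within the preceding `settle_years` years.
    settled = [r for r in adults
               if r["landing_year"] is None
               or tax_year - r["landing_year"] > settle_years]

    return {
        "gross": rate(respondents),
        "adjusted_1": rate(adults),
        "adjusted_2": rate(settled),
    }
```

Respondents for whom no SIN was found simply have an empty `linked_years` set, so they stay in every denominator, as the text specifies.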
Data table for Figure 3.1-2 (linkage rate, %)
Year | Gross rate | Adjusted rate 1 | Adjusted rate 2 |
---|---|---|---|
1982 | 39.1 | 83.6 | 91.3 |
1983 | 40.4 | 83.2 | 91.2 |
1984 | 41.8 | 82.6 | 90.6 |
1985 | 43.2 | 82.3 | 90.3 |
1986 | 45.8 | 83.5 | 91.5 |
1987 | 47.3 | 83.1 | 91.0 |
1988 | 49.0 | 83.6 | 91.5 |
1989 | 50.7 | 84.1 | 91.9 |
1990 | 52.1 | 84.3 | 91.8 |
1991 | 53.5 | 84.6 | 91.7 |
1992 | 55.2 | 85.4 | 92.2 |
1993 | 57.0 | 86.4 | 93.1 |
1994 | 58.4 | 86.6 | 93.1 |
1995 | 59.8 | 86.8 | 93.1 |
1996 | 60.9 | 86.8 | 92.8 |
1997 | 62.6 | 87.3 | 92.9 |
1998 | 63.9 | 87.3 | 92.8 |
1999 | 65.5 | 87.4 | 92.7 |
2000 | 67.2 | 87.9 | 93.0 |
2001 | 68.6 | 88.2 | 92.8 |
2002 | 70.0 | 88.4 | 92.7 |
2003 | 71.4 | 88.8 | 92.7 |
2004 | 73.3 | 89.3 | 92.7 |
2005 | 75.1 | 89.6 | 92.7 |
2006 | 77.2 | 90.1 | 92.6 |
2007 | 79.4 | 90.8 | 92.8 |
2008 | 82.2 | 91.8 | 93.2 |
2009 | 84.7 | 92.8 | 93.5 |
2010 | 87.5 | 93.8 | 94.1 |
2011 | 90.3 | 94.8 | 94.8 |
Source: LISA (2012) and linked data from the T1FF (1982 to 2011). |
Results in Figure 3.1-2 indicate that the linkage rate decreases going back in time, regardless of the sample used for the calculation. The largest reduction occurs when the calculation is based on a sample without exclusions: the gross rate falls by 1.76 percentage points per year, on average, from 90.3% in 2011 to only 39.1% in 1982. Excluding respondents under the age of 20 in a given tax year, the rate decreases much less (0.39 percentage points per year, on average), from 94.8% in 2011 to 83.6% in 1982, remaining over 82% for all years. When respondents who immigrated to Canada in the 3 years prior to a given tax year are also excluded, the decrease in linkage rate is small (0.12 percentage points per year, on average), from 94.8% in 2011 to 91.3% in 1982, remaining over 90% for all years. In other words, when the sample is limited to the population that is likely to file a tax return and have a constant SIN over time, the linkage rate remains high across all years.
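The average annual declines quoted above can be approximated from the endpoints in the data table for Figure 3.1-2. The short sketch below assumes a simple endpoint difference divided by the 29 year-to-year steps between 1982 and 2011. Because it starts from the rounded table values, the last digit can differ slightly from the published figures.

```python
# Endpoint linkage rates (%) taken from the data table for Figure 3.1-2:
# (rate in 1982, rate in 2011)
RATES = {
    "gross": (39.1, 90.3),
    "adjusted_1": (83.6, 94.8),
    "adjusted_2": (91.3, 94.8),
}


def average_annual_decline(rate_1982, rate_2011,
                           first_year=1982, last_year=2011):
    """Average change in percentage points per year, going back in time."""
    return (rate_2011 - rate_1982) / (last_year - first_year)


declines = {name: average_annual_decline(*pair)
            for name, pair in RATES.items()}

for name, value in declines.items():
    print(f"{name}: {value:.2f} percentage points per year")
```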
In the LISA sample, 8.5% of respondents were not linked to tax data for any year between 1982 and 2011. This group is composed of respondents for whom no SIN was found, and those who opted out of linkage (Note 2).
3.2 Linkage rate examined
To analyze whether the linkage data is representative of the sample, several linkage rates were calculated for respondents who were over the age of 20 in a given tax year and were non-immigrants or who had immigrated over 3 years prior to the given tax year (see ‘Adjusted rate 2’ in Figure 3.1-2). In addition to the overall ‘Adjusted 2’ linkage rate, linkage rates were calculated for sub-samples by age (in a given tax year), sex, immigrant status, and province of residence (as of 2011) for the tax years 1982, 1996, and 2011. Because of the slight decrease in linkage rate in 1985, as shown in Figure 3.1-2, 1985 was also included (see Figure 3.2-1 – Figure 3.2-4). Note that these rates are based on unweighted frequency counts, as the objective of the present paper is to analyze linkage quality, rather than representativeness to the population. For the total number of observations linked in 1982, 1985, 1996, and 2011, see Table 3.2-5.
Data table for Figure 3.2-1
Category | Rate |
---|---|
All | 91.3 |
Female | 92.2 |
Male | 90.4 |
21 to 30 | 90.1 |
31 to 40 | 93.3 |
41 to 50 | 91.9 |
51 or over | 90.4 |
Immigrants | 90.2 |
Non-Immigrants | 91.5 |
Alberta | 91.0 |
BC | 90.1 |
Manitoba | 90.4 |
Maritimes | 92.8 |
Ontario | 90.3 |
Quebec | 91.3 |
Saskatchewan | 92.7 |
Source: LISA (2012) and linked data from the T1FF (1982). |
Data table for Figure 3.2-2
Category | Rate |
---|---|
All | 90.3 |
Female | 91.5 |
Male | 89.1 |
21 to 30 | 89.2 |
31 to 40 | 91.4 |
41 to 50 | 92.2 |
51 or over | 89.3 |
Immigrants | 89.2 |
Non-Immigrants | 90.5 |
Alberta | 90.2 |
BC | 86.7 |
Manitoba | 91.1 |
Maritimes | 91.6 |
Ontario | 89.6 |
Quebec | 90.4 |
Saskatchewan | 92.6 |
Source: LISA (2012) and linked data from the T1FF (1985). |
Data table for Figure 3.2-3
Category | Rate |
---|---|
All | 92.8 |
Female | 94.3 |
Male | 91.2 |
21 to 30 | 88.5 |
31 to 40 | 93.4 |
41 to 50 | 94.2 |
51 or over | 95.2 |
Immigrants | 91.8 |
Non-Immigrants | 93.0 |
Alberta | 91.3 |
BC | 89.4 |
Manitoba | 92.7 |
Maritimes | 94.4 |
Ontario | 91.0 |
Quebec | 95.0 |
Saskatchewan | 94.5 |
Source: LISA (2012) and linked data from the T1FF (1996). |
Data table for Figure 3.2-4
Category | Rate |
---|---|
All | 94.8 |
Female | 95.7 |
Male | 93.8 |
21 to 30 | 91.8 |
31 to 40 | 94.1 |
41 to 50 | 95.0 |
51 or over | 96.1 |
Immigrants | 95.9 |
Non-Immigrants | 94.6 |
Alberta | 93.0 |
BC | 93.1 |
Manitoba | 95.0 |
Maritimes | 95.8 |
Ontario | 93.6 |
Quebec | 97.0 |
Saskatchewan | 95.6 |
Source: LISA (2012) and linked data from the T1FF (2011). |
The overall linkage rates for 1982, 1985, 1996, and 2011 are 91.3%, 90.3%, 92.8%, and 94.8%, respectively.
Males have a slightly lower linkage rate than females in all years, with 95.7% of females linked in 2011, compared to 93.8% of males.
The results suggest that the linkage rate generally increases with age, similar to findings by Li et al. (2006); however, the rate for the youngest respondents in 2011, while lower, is still reasonably high at 91.8%. Respondents aged 51 or over in a given tax year have the highest linkage rate of all age groups in 2011, but this rate is lower in earlier years, as the number of respondents in that age group in a given tax year decreases sharply from 8,879 in 2011 to 544 in 1982. This is to be expected, as respondents in the ‘51 or over’ age group in 1982 were aged 81 or over at the time of their LISA interview, and may have been less likely to file a tax return in 2010 or 2011 (and therefore less likely to have had a SIN retrieved for linkage).
Immigrants have a slightly higher linkage rate than non-immigrants in 2011, although the reverse holds in the earlier years examined. Respondents who resided in Ontario and British Columbia in 2011 generally have slightly lower linkage rates than respondents in other provinces. This is particularly apparent in 1985, when the linkage rate for British Columbia was 86.7%, a drop of roughly 3.4 percentage points from 90.1% in 1982. The reason for this decrease is unclear. An analysis of the 312 respondents who had a linkage in 1982 but a missed linkage in 1985 found no trend by age, sex, or immigrant status.
Table 3.2-5: Number of observations linked, 1982, 1985, 1996 and 2011
Category | 1982 | 1985 | 1996 | 2011 |
---|---|---|---|---|
All | 8,068 | 9,334 | 13,514 | 19,403 |
Female | 4,153 | 4,878 | 7,220 | 10,247 |
Male | 3,915 | 4,456 | 6,294 | 9,156 |
21-30 | 3,991 | 4,168 | 2,911 | 3,040 |
31-40 | 2,460 | 3,102 | 4,630 | 3,043 |
41-50 | 1,073 | 1,246 | 3,581 | 4,441 |
51 or over | 544 | 818 | 2,392 | 8,879 |
Immigrants | 889 | 1,038 | 1,829 | 3,558 |
Non-Immigrants | 7,179 | 8,296 | 11,685 | 15,845 |
Alberta resident 2011 | 735 | 860 | 1,313 | 2,056 |
BC resident 2011 | 860 | 948 | 1,396 | 2,129 |
Manitoba resident 2011 | 576 | 679 | 963 | 1,373 |
Maritimes resident 2011 | 2,043 | 2,379 | 3,278 | 4,138 |
Ontario resident 2011 | 1,716 | 1,963 | 2,930 | 4,493 |
Quebec resident 2011 | 1,593 | 1,869 | 2,734 | 3,920 |
Saskatchewan resident 2011 | 545 | 636 | 900 | 1,294 |
Source: LISA (2012) and linked data from the T1FF (1982, 1985, 1996, 2011). |
3.3 Balanced Panels
A longitudinal dataset requires data on its sample over a period of time. A break in the continuity of data could limit its usability for researchers for some purposes. A longitudinal dataset is said to be a “balanced panel” when all observations (respondents) are present in the dataset in all periods (in the case of LISA, each year). For historical linkage, a balanced panel requires the linkage of a tax record for each year.
If a researcher required balanced panels from those who were likely to file a tax return and likely to have a constant SIN over time (see ‘Adjusted rate 2’ in Figure 3.1-2), a 30-year panel could be constructed with a 74.3% linkage rate, and would contain 6,564 respondents (using years 1982-2011). If a researcher required a 25-year panel, it could be constructed with a 78.1% linkage rate (1987-2011), containing 8,735 respondents. A 20-year panel could be constructed with an 82.1% linkage rate (1992-2011), containing 10,733 respondents. A 15-year panel could be constructed with an 84.5% linkage rate (1997-2011), containing 12,579 respondents. A 10-year panel could be constructed with an 86.7% linkage rate (2002-2011), containing 14,371 respondents. If a researcher required only a 5-year panel, it could be constructed with an 89.7% linkage rate (2007-2011), and would contain 16,568 respondents (see Appendix A). Thus, LISA can be used to create long, balanced panels of a size sufficient for many analyses.
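A minimal sketch of how such a panel could be selected is shown below. The input layout (a mapping from a respondent identifier to the set of tax years with a linked record) is an assumption made for illustration. The denominator used for the rate is also an assumption, although the published figures are consistent with using the respondents eligible in the first year of the panel (for example, 6,564 of the 8,833 eligible respondents in 1982 is about 74.3%).

```python
def balanced_panel(linked_years_by_id, start_year, end_year):
    """Return the ids of respondents linked in every year of the window."""
    required = set(range(start_year, end_year + 1))
    return sorted(
        rid for rid, years in linked_years_by_id.items()
        if required <= set(years)  # subset test: no year may be missing
    )


def panel_rate(panel_ids, eligible_ids):
    """Share (%) of eligible respondents retained in the balanced panel."""
    eligible = set(eligible_ids)
    if not eligible:
        return None
    kept = sum(1 for rid in panel_ids if rid in eligible)
    return 100.0 * kept / len(eligible)


# Toy example: three respondents, five-year window 2007-2011.
links = {
    1: {2007, 2008, 2009, 2010, 2011},
    2: {2007, 2008, 2010, 2011},        # 2009 missing: excluded
    3: set(range(1982, 2012)),          # linked in all 30 years
}
panel = balanced_panel(links, 2007, 2011)
```

In the toy data, respondents 1 and 3 form the five-year panel, while only respondent 3 would remain in a 30-year panel starting in 1982.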
3.4 Comparison of earnings from T1FF and T4 files
One way to verify the reliability of administrative files is to compare their data to the values in administrative files from another source.
Earnings amounts in T1FF data files and in T4 data files were compared for the period from 2000 (Note 3) to 2011. The results show that the majority of cases (97% each year, on average) present a similar earnings situation in both the T1 Family File (T1FF) and the employer file (T4) (see Table 3.4-1). In other words, only about 3% of cases show earnings in one file without showing earnings in the other. Approximately 71% of cases show earnings in both the T1FF and T4. Another 26% of respondents had $0 earnings in the T1FF and no T4 information. The number of cases with no T1FF information and a T4 earnings value of $0 is negligible.
Table 3.4-1 (percentage of cases)
Earnings situation | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Both sources - earnings (T1FF and T4 >= $0) | 72.7 | 72.2 | 71.9 | 71.6 | 71.0 | 71.3 | 71.0 | 70.9 | 70.5 | 69.7 | 69.7 | 70.1 |
Single source - earnings (either T1FF or T4 > $0) | 3.1 | 3.2 | 3.1 | 3.3 | 3.5 | 3.2 | 3.6 | 3.6 | 3.5 | 2.9 | 2.3 | 1.3 |
Single source - no earnings (T1FF = $0, no T4) | 24.2 | 24.6 | 25.0 | 25.2 | 25.4 | 25.5 | 25.4 | 25.5 | 26.0 | 27.4 | 28.1 | 28.6 |
Single source - no earnings (no T1FF, T4 = $0) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Source: LISA (2012) and linked data from the T1FF and T4 files (2000-2011). |
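The grouping used in Table 3.4-1 can be sketched as a small classification function. The exact rules behind the table are not given in the text, so the boundaries below are assumptions: a missing record is represented by `None`, "earnings" means a value above $0, and any combination not covered by the four table rows is returned as "other".

```python
from collections import Counter


def classify(t1ff, t4):
    """Assign one respondent-year to an earnings-coherence category.

    t1ff, t4: earnings in dollars, or None when there is no record.
    """
    t1ff_positive = t1ff is not None and t1ff > 0
    t4_positive = t4 is not None and t4 > 0
    if t1ff_positive and t4_positive:
        return "both sources, earnings"
    if t1ff_positive or t4_positive:
        return "single source, earnings"
    if t1ff == 0 and t4 is None:
        return "T1FF = $0, no T4"
    if t1ff is None and t4 == 0:
        return "no T1FF, T4 = $0"
    return "other"  # e.g. both missing; not a row of the table


def shares(cases):
    """Percentage of cases in each category; cases is a list of pairs."""
    counts = Counter(classify(t1ff, t4) for t1ff, t4 in cases)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Under these rules, the first and third categories correspond to agreement between the two files, which is how the text arrives at roughly 97% of cases.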
The vast majority of cases show earnings in both the T1FF and T4 files, or $0 earnings in the T1FF and no T4 information, which indicates agreement between the two files. When earnings are reported in both the T1FF and T4 files, approximately 95% of cases show a difference of no more than one dollar between data sources (see Table 3.4-2). Differences of under a dollar are expected, because T4 earnings values include cents while T1FF earnings do not. Approximately 98% of cases show a difference of no more than one thousand dollars between data sources.
Table 3.4-2 (percentage of cases with earnings in both sources)
Difference between T1FF and T4 earnings | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
$0.00 to $1.00 | 92.7 | 95.0 | 94.2 | 94.6 | 94.5 | 94.4 | 94.2 | 94.7 | 95.1 | 95.7 | 96.1 | 96.5 |
$1.01 to $100.00 | 2.2 | 1.9 | 1.7 | 1.6 | 1.6 | 1.8 | 1.8 | 1.4 | 1.3 | 1.0 | 0.8 | 1.0 |
$100.01 to $1,000.00 | 1.8 | 1.5 | 1.8 | 1.9 | 1.8 | 1.7 | 1.8 | 1.9 | 1.6 | 1.5 | 1.3 | 1.1 |
T1FF < T4 by over $1000 | 2.6 | 0.9 | 1.5 | 1.2 | 1.3 | 1.3 | 1.5 | 1.3 | 1.3 | 1.2 | 1.2 | 0.9 |
T1FF > T4 by over $1000 | 0.7 | 0.7 | 0.8 | 0.6 | 0.9 | 0.9 | 0.7 | 0.7 | 0.8 | 0.6 | 0.6 | 0.5 |
Source: LISA (2012) and linked data from the T1FF and T4 files (2000-2011). |
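For cases with earnings in both files, the banding used in Table 3.4-2 could be reproduced along the following lines. The band edges are an assumption based on the table's row labels, with each case falling into exactly one band, and the function name is hypothetical.

```python
def difference_band(t1ff, t4):
    """Band the gap between T1FF and T4 earnings for one respondent-year.

    Both arguments are dollar amounts; T4 values may include cents.
    """
    diff = t1ff - t4
    gap = abs(diff)
    if gap <= 1:
        return "within $1"
    if gap <= 100:
        return "over $1, up to $100"
    if gap <= 1000:
        return "over $100, up to $1,000"
    # Beyond $1,000 the table also records which source is larger.
    if diff < 0:
        return "T1FF < T4 by over $1,000"
    return "T1FF > T4 by over $1,000"
```

A T1FF value of 30,000 against a T4 value of 30,000.49 falls in the first band, which is the kind of sub-dollar gap produced when one source carries cents and the other does not.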
From 2000 to 2011, the difference in median employment earnings calculated from the two data sources is, on average, $116 (see Table 3.4-3). The median earnings, when present in the T1FF and T4 sources, are very similar, which suggests that the T1FF linkage data is accurate, and that the T4 is also present where expected.
The median earnings, when found in one data source only, are significantly lower than median earnings when found in both data sources. Examining this in more detail shows that the majority of values from a single source are attributed to T1FF values of $0, in which case a T4 may not be available. The majority of single source earnings values greater than $0 are attributed to T4 earnings values with no T1FF information.
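Table 3.4-3 (N and median employment earnings, $)
Year | Both sources, N | Both sources, T1FF median | Both sources, T4 median | Single source >$0, T1FF N | Single source >$0, T1FF median | Single source >$0, T4 N | Single source >$0, T4 median |
---|---|---|---|---|---|---|---|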
2000 | 12,000 | 31,979 | 32,584 | 84 | 7,248 | 430 | 6,294 |
2001 | 12,192 | 33,109 | 33,152 | 78 | 11,396 | 462 | 6,822 |
2002 | 12,377 | 32,517 | 32,634 | 67 | 11,545 | 461 | 8,273 |
2003 | 12,601 | 32,727 | 32,815 | 57 | 6,559 | 515 | 5,374 |
2004 | 12,808 | 32,814 | 32,851 | 141 | 26,926 | 494 | 5,940 |
2005 | 13,162 | 33,317 | 33,330 | 89 | 9,903 | 495 | 5,901 |
2006 | 13,523 | 33,541 | 33,653 | 98 | 11,574 | 588 | 5,311 |
2007 | 13,917 | 33,816 | 33,942 | 78 | 12,624 | 628 | 4,787 |
2008 | 14,287 | 33,983 | 34,052 | 123 | 12,612 | 589 | 5,564 |
2009 | 14,473 | 33,844 | 34,009 | 113 | 14,004 | 489 | 5,728 |
2010 | 14,853 | 33,322 | 33,410 | 94 | 10,724 | 390 | 8,912 |
2011 | 15,290 | 33,073 | 33,133 | 96 | 11,457 | 181 | 19,543 |
3.5 Profiles of earnings by age and sex
To demonstrate the potential of LISA linked data in creating a long data series, an age-earnings profile for each sex was created for different birth cohorts.
Because the T1 Family File (T1FF) and employers’ (T4) files show similar earnings, the earnings used were those from the T1FF, which provides a longer data series. The sample was divided into seven birth cohorts of ten years each, and the change in mean employment earnings was tracked across five-year age groups.
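The construction of such a profile can be sketched as a grouped mean. The input layout, the function names and the restriction to ages 20 to 69 and birth years 1921 to 1990 are assumptions chosen to mirror the data tables that follow. The text does not say whether respondent-years with $0 earnings enter the mean, so the sketch simply averages whatever records it is given.

```python
from collections import defaultdict


def cohort_label(birth_year, first=1921, width=10):
    """Ten-year birth cohort, e.g. 1955 -> '1951 to 1960'."""
    start = first + ((birth_year - first) // width) * width
    return f"{start} to {start + width - 1}"


def age_group_label(age, first=20, width=5):
    """Five-year age group, e.g. 27 -> '25 to 29'."""
    start = first + ((age - first) // width) * width
    return f"{start} to {start + width - 1}"


def age_earnings_profile(records):
    """Mean earnings by (birth cohort, age group).

    records: iterable of (birth_year, tax_year, earnings) tuples,
    one per linked respondent-year.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for birth_year, tax_year, earnings in records:
        age = tax_year - birth_year
        if not (20 <= age <= 69 and 1921 <= birth_year <= 1990):
            continue  # outside the ranges shown in the tables
        key = (cohort_label(birth_year), age_group_label(age))
        totals[key][0] += earnings
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}
```

Running the function separately on male and female respondents would give one profile per sex, matching the two data tables.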
Data table for Figure 3.5-1 (mean T1FF earnings, $, by age group and birth cohort)
Age group | All | 1921 to 1930 | 1931 to 1940 | 1941 to 1950 | 1951 to 1960 | 1961 to 1970 | 1971 to 1980 | 1981 to 1990 |
---|---|---|---|---|---|---|---|---|
20 to 24 | 20,789 | ... | ... | ... | 29,146 | 21,179 | 18,831 | 20,392 |
25 to 29 | 36,897 | ... | ... | ... | 39,449 | 34,719 | 37,240 | 36,677 |
30 to 34 | 47,703 | ... | ... | ... | 47,817 | 45,017 | 50,742 | ... |
35 to 39 | 55,665 | ... | ... | 55,159 | 53,318 | 56,768 | 61,472 | ... |
40 to 44 | 61,278 | ... | ... | 60,766 | 58,690 | 64,990 | ... | ... |
45 to 49 | 65,163 | ... | 58,070 | 63,414 | 65,774 | 68,229 | ... | ... |
50 to 54 | 67,023 | ... | 60,941 | 65,362 | 69,656 | ... | ... | ... |
55 to 59 | 60,651 | 54,731 | 56,717 | 60,391 | 64,038 | ... | ... | ... |
60 to 64 | 49,472 | 54,942 | 48,758 | 48,258 | ... | ... | ... | ... |
65 to 69 | 35,872 | 40,460 | 35,142 | 35,524 | ... | ... | ... | ... |
Note: ... not applicable
Source: LISA (2012) and linked data from the T1FF from 1982 to 2011. |
Data table for Figure 3.5-2 (mean T1FF earnings, $, by age group and birth cohort)
Age group | All | 1921 to 1930 | 1931 to 1940 | 1941 to 1950 | 1951 to 1960 | 1961 to 1970 | 1971 to 1980 | 1981 to 1990 |
---|---|---|---|---|---|---|---|---|
20 to 24 | 15,911 | ... | ... | ... | 22,494 | 16,899 | 13,954 | 15,148 |
25 to 29 | 25,878 | ... | ... | ... | 26,640 | 24,421 | 26,385 | 28,064 |
30 to 34 | 29,827 | ... | ... | ... | 28,845 | 28,863 | 32,559 | ... |
35 to 39 | 33,208 | ... | ... | 28,854 | 32,074 | 34,184 | 37,176 | ... |
40 to 44 | 36,320 | ... | ... | 30,283 | 35,969 | 39,512 | ... | ... |
45 to 49 | 38,554 | ... | 28,994 | 33,133 | 39,954 | 43,268 | ... | ... |
50 to 54 | 38,817 | ... | 29,256 | 34,837 | 43,099 | ... | ... | ... |
55 to 59 | 34,069 | 25,957 | 27,812 | 32,386 | 40,738 | ... | ... | ... |
60 to 64 | 26,284 | 25,372 | 23,988 | 27,177 | ... | ... | ... | ... |
65 to 69 | 18,750 | 21,339 | 16,649 | 19,883 | ... | ... | ... | ... |
Note: ... not applicable
Source: LISA (2012) and linked data from the T1FF from 1982 to 2011. |
The age-earnings profiles by birth year cohort show a trend of lower earnings at the beginning of a career, and faster growth in earnings, for workers in more recent cohorts (e.g., 1971-1980 cohort versus 1961-1970 cohort). There is also a trend of higher peak earnings for workers in more recent cohorts (e.g., 1951-1960 cohort versus 1941-1950 cohort). These trends are similar to those found in related literature (Gill et al., 2014; Beach and Finnie, 2004).
Most notably, females show higher career earnings progression for each successive cohort. Females aged 50-54 in the 1951-1960 cohort earned 47% more than females in the 1931-1940 cohort did when they were aged 50-54, compared to a 14% increase in earnings between the respective cohorts for males aged 50-54. These results support previous findings of a greater increase in female earnings, relative to males (Williams, 2010; Suh, 2010; Blau and Kahn, 2006).
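The two cohort comparisons can be checked against the data tables with a short calculation. The sketch assumes that the data table for Figure 3.5-1 refers to males and the data table for Figure 3.5-2 to females. The tables are not labelled by sex, but this is the assignment that reproduces the percentages stated above.

```python
def percent_increase(earlier, later):
    """Relative increase from `earlier` to `later`, in percent."""
    return 100.0 * (later - earlier) / earlier


# Mean T1FF earnings at ages 50 to 54, from the two data tables:
# 1931-1940 cohort first, then 1951-1960 cohort.
female_increase = percent_increase(29256, 43099)  # Figure 3.5-2 (assumed females)
male_increase = percent_increase(60941, 69656)    # Figure 3.5-1 (assumed males)

print(f"Females: {female_increase:.1f}%")
print(f"Males:   {male_increase:.1f}%")
```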
4. Conclusions
This study provides a partial assessment of the quality of administrative data from 1982 to 2011 which was linked to 2012 LISA data. In particular, linkage rates were analyzed, the data was compared across administrative sources, and the ability to use the linkage data to analyze selected phenomena requiring a longitudinal data series was assessed.
Linkage rates to administrative data were examined in multiple ways, and the results indicate that the linkage rates are high, with more than 90% of LISA respondents aged 15 and over being linked in 2011. Linkage rates to prior years were also high, especially when rates were calculated for respondents who were over 20 years of age and who were either non-immigrants or had immigrated more than three years prior to the year linked. Among key demographic sub-groups, the linkage rate remains high. However, data users must consider that certain sub-groups may not have as many observations historically. For example, immigrants will only have linkage data on or after the year in which they entered Canada.
The results also suggest that the data obtained through historical linkage is coherent across administrative data sources, and can be used to observe phenomena that require a longitudinal data series and to build long panel datasets.
As the data is based on a sample drawn in 2011, it is most appropriately used for studies describing the life-course histories of that particular cohort, as opposed to cross-sectional referencing to individual years. The linkage allows for analysis of retrospective income data that would not otherwise have been possible without 30 years of survey collection, or without introducing a significant recall bias. Furthermore, upcoming data releases will be coupled with additional years of LISA survey data, which will increase the analytical potential of the dataset.
Bibliography
Beach, C. and Finnie, R. (2004), “A Longitudinal Analysis of Earnings Change in Canada”, Analytical Studies Branch. Research Paper, Statistics Canada.
http://www.statcan.gc.ca/pub/11f0019m/11f0019m2004227-eng.pdf
Blau, Francine D.; Kahn, Lawrence M. (2006) “The US gender pay gap in the 1990s: slowing convergence”, IZA Discussion Papers, no. 2176
www.econstor.eu/dspace/bitstream/10419/34046/1/51436131X.pdf
Economic and Social Council. (2009) “Main Results Of The UNECE-UNSD Survey On The 2010 Round of Population and Housing Censuses”, Economic Commission for Europe, Conference of European Statisticians. Twelfth Meeting, 28-30 October 2009.
http://unstats.un.org/unsd/censuskb20/Attachment459.aspx
Gill, Vijay, Knowles, James, Stewart-Patterson, David. (2014). “The Bucks Stop Here: Trends in Income Inequality Between Generations”, Ottawa: The Conference Board of Canada.
Heisz, Andrew, Langevin, Manon, Randle, Jeffrey. (2013). “Historical data linkage of tax records on labour and income: The case of the Living in Canada Survey pilot”. Statistics Canada Catalogue no. 89-648-X (2).
http://www.statcan.gc.ca/pub/89-648-x/89-648-x2013002-eng.htm
Li, Bing, Quan, Hude, Fong, Andrew, Lu, Mingshan. (2006) “Assessing record linkage between health care and Vital Statistics databases using deterministic methods”, BioMed Central Health Services Research 2006, 6:48.
http://www.biomedcentral.com/1472-6963/6/48/
Sakshaug, Joseph W., Couper, Mick P., Ofstedal, Mary B., Weir, David R. (2012) “Linking Survey and Administrative Records: Mechanisms of Consent”, Sociological Methods & Research, 41(4) 535-569.
http://smr.sagepub.com/content/41/4/535.full.pdf
Statistics Canada. (2014). LISA Detailed information for 2014 (Wave 2).
http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=5144
Suh, Jingyo. (2010) “Decomposition of the Change in the Gender Wage Gap”, Research in Business and Economics Journal, 2-18.
http://www.aabri.com/manuscripts/08076.pdf
Williams, Cara. (2010) “Women in Canada: A Gender-based Statistical Report. Sixth Edition”. Economic Well-being. Statistics Canada Catalogue no. 89-503-X. p. 32-33.
http://www.statcan.gc.ca/pub/89-503-x/2010001/article/11388-eng.pdf
Appendix A
Panel start year | Measure | 5yr | 10yr | 15yr | 20yr | 25yr | 30yr |
---|---|---|---|---|---|---|---|
1982 | % | 85.3% | 80.6% | 78.9% | 77.3% | 75.9% | 74.3% |
1982 | N | 7,531 | 7,123 | 6,966 | 6,831 | 6,700 | 6,564 |
1987 | % | 85.7% | 83.2% | 81.5% | 79.7% | 78.1% | ... |
1987 | N | 9,582 | 9,307 | 9,112 | 8,918 | 8,735 | ... |
1992 | % | 88.8% | 86.2% | 84.0% | 82.1% | ... | ... |
1992 | N | 11,606 | 11,265 | 10,978 | 10,733 | ... | ... |
1997 | % | 89.3% | 86.6% | 84.5% | ... | ... | ... |
1997 | N | 13,305 | 12,889 | 12,579 | ... | ... | ... |
2002 | % | 89.2% | 86.7% | ... | ... | ... | ... |
2002 | N | 14,798 | 14,371 | ... | ... | ... | ... |
2007 | % | 89.7% | ... | ... | ... | ... | ... |
2007 | N | 16,568 | ... | ... | ... | ... | ... |
Note: ... not applicable
Source: LISA (2012) and linked data from the T1FF from 1982 to 2011. |
Notes