5 Data and sample

The Longitudinal Administrative Databank (LAD) is a 20% random sample based on annual information provided on personal tax returns. Once selected, individuals are in the sample whenever they file a tax return. To keep the sample current, a part of each year's sample consists of individuals who file their returns for the first time. For instance, the first year of LAD is 1982, so the 1982 LAD is simply a 20% sample of all filers in 1982. The 1983 sample consists of those selected in 1982 who also filed in 1983 plus a sample of those who filed for the first time in 1983. The total of these two groups is a 20% sample of all filers in 1983. This scheme allows annual increases in the LAD sample to parallel the annual increases in the Canadian population.
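The refresh scheme described above can be sketched in a few lines of Python. This is only an illustration, not the LAD's actual selection procedure: the source does not say how individuals are drawn, so the rule `is_selected` (every fifth identifier) and the function `lad_sample` are hypothetical stand-ins for a 20% random draw.

```python
from typing import Dict, Set


def is_selected(person_id: int) -> bool:
    """Hypothetical 1-in-5 selection rule standing in for a 20% random draw."""
    return person_id % 5 == 0


def lad_sample(filers_by_year: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Return, for each year, the sampled individuals who filed that year."""
    seen: Set[int] = set()       # everyone who has filed at least once so far
    selected: Set[int] = set()   # everyone ever drawn into the sample
    sample_by_year: Dict[int, Set[int]] = {}
    for year in sorted(filers_by_year):
        filers = filers_by_year[year]
        first_time_filers = filers - seen
        # The draw is made only once, in the year of the first filing.
        selected |= {p for p in first_time_filers if is_selected(p)}
        seen |= filers
        # Selected individuals appear only in the years in which they file.
        sample_by_year[year] = selected & filers
    return sample_by_year


# Illustrative, made-up filer populations for two years.
filers = {1982: set(range(100)), 1983: set(range(10, 150))}
sample = lad_sample(filers)
```

Under these assumptions the yearly sample stays at 20% of that year's filers, because previously selected filers and newly selected first-time filers are each one fifth of their respective groups.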

The Longitudinal Immigration Database (IMDB) is a database that, when merged with LAD, provides a direct link between immigration records and the economic performance of immigrants. A person is included in the database only if he or she obtained landed-immigrant status since 1980 and filed at least one tax return after becoming a landed immigrant. Each year the IMDB is updated with a new cohort of landings. Moreover, in each new tax year there are new entrants from previous landing cohorts, not just the newly added cohort, who have filed (or are matched) for the first time. There are also those immigrants who have filed previously, but have not filed in that year. These immigrants remain in the IMDB as they could file in future years.

By linking the IMDB (1980 to 2000) with LAD (1982 to 2004), we can observe, over the 1982-to-2004 period, the earnings of those who became landed immigrants between 1980 and 2000. Seven immigrant cohorts are considered: 1980 to 1982, 1983 to 1985, 1986 to 1988, 1989 to 1991, 1992 to 1994, 1995 to 1997 and 1998 to 2000. The three-year band is chosen as a trade-off between the size of each cohort and the total number of cohorts.
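The mapping from a landing year to its three-year cohort band can be written as a short Python sketch. The function name `cohort_band` and the constants are illustrative choices, not taken from the study.

```python
from typing import Optional, Tuple

FIRST_LANDING_YEAR = 1980
LAST_LANDING_YEAR = 2000
BAND_WIDTH = 3  # years per cohort band


def cohort_band(landing_year: int) -> Optional[Tuple[int, int]]:
    """Return (first_year, last_year) of the cohort band, or None if out of range."""
    if not FIRST_LANDING_YEAR <= landing_year <= LAST_LANDING_YEAR:
        return None
    offset = (landing_year - FIRST_LANDING_YEAR) // BAND_WIDTH
    first_year = FIRST_LANDING_YEAR + offset * BAND_WIDTH
    return (first_year, first_year + BAND_WIDTH - 1)


# All distinct bands covered by landing years 1980 to 2000.
bands = sorted({cohort_band(y) for y in range(FIRST_LANDING_YEAR, LAST_LANDING_YEAR + 1)})
```

With 21 landing years and a band width of three, `bands` contains the seven cohorts listed above.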

The earnings variable used in the study is the sum of two LAD variables. The first variable is the employment income from T4 slips issued to the individual, that is, all paid-employment income (excluding self-employment income), including wages, salaries and commissions before deductions. The second variable is the so-called 'other employment income,' which captures taxable employment income other than wages, salaries and commissions (for example, tips, gratuities or directors' fees that are not reported on a T4 slip).

The immigrant's years of schooling at landing are the number of years of formal schooling—top coded at 25 years—successfully completed by the time of arrival in Canada. The official languages ability indicator is the self-reported ability to communicate in either French or English, or both. Finally, the immigrant's country of birth is identified, based on a list of countries including countries that no longer exist or are not recognized as a nation state.5 All countries are divided into nine regions of birth, based on religious, ethnic and historical considerations (Appendix A).

The sample includes all male immigrants in the IMDB who were at least 24 years old in the year they became landed immigrants and had positive earnings in the year following the last year in the cohort band.6 This restriction ensures that the persons in the sample had completed all or most of their schooling outside Canada and entered the Canadian labour market soon after arrival. Persons were kept in the sample for as long as they had positive earnings and were under 55 years, for a minimum of two periods. The structure of the resulting panel is similar to the one adopted by Haider (2001). Although it has its drawbacks, the alternatives (a fully balanced or a fully unbalanced panel) appear to be worse. A fully balanced panel, for instance, would require immigrants from the 1980-to-1982 cohort to have 22 years of positive earnings to be in the sample, leaving us with a very narrow sample of immigrants from this cohort: those who entered the Canadian labour market at a young age and had a strong attachment to the labour market. At the other extreme, immigrants from the 1998-to-2000 cohort would only need four years of positive earnings to be in the sample and could have been in their late forties when entering Canada. These differences in the 'age-at-arrival' distributions would make cross-cohort comparisons very difficult. A fully unbalanced panel, on the other hand, which would allow for later entry into, or re-entry into, the sample by those who had zero earnings in some years, would also allow for the possibility of school attendance during these years. At a minimum, a 'delayed entry' of those who attended school in Canada prior to entering the labour market would create differences in the timing of the earnings profiles within each arrival cohort, making the cohorts' inequality and instability profiles difficult to interpret. There is also evidence that the earnings profiles of immigrants who attended school in Canada may be quite different from the earnings profiles of those with only a foreign education (see Schaafsma and Sweetman 2001 for a discussion).
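The inclusion rule above can be illustrated with a small Python sketch. It rests on several assumptions that the text does not spell out: age in a given year is approximated as age at landing plus the years elapsed, a "period" is one tax year, and a spell ends at the first year with non-positive earnings, with no re-entry. The function `sample_years` and its arguments are hypothetical names.

```python
from typing import Dict, List

MIN_AGE_AT_LANDING = 24
MAX_AGE = 55           # persons are kept only while under 55
MIN_PERIODS = 2        # minimum number of years in the sample
LAST_LAD_YEAR = 2004   # last tax year in the linked data


def sample_years(landing_year: int,
                 age_at_landing: int,
                 cohort_last_year: int,
                 earnings_by_year: Dict[int, float]) -> List[int]:
    """Years a person contributes to the panel, or [] if excluded."""
    if age_at_landing < MIN_AGE_AT_LANDING:
        return []
    years: List[int] = []
    # The first observed year is the year after the cohort band ends.
    year = cohort_last_year + 1
    while year <= LAST_LAD_YEAR:
        age = age_at_landing + (year - landing_year)  # assumed age rule
        if earnings_by_year.get(year, 0.0) <= 0 or age >= MAX_AGE:
            break  # assumed: the spell ends here and is not resumed
        years.append(year)
        year += 1
    return years if len(years) >= MIN_PERIODS else []


# Made-up example: landed in 1984 (cohort 1983 to 1985) at age 30.
example = sample_years(1984, 30, 1985,
                       {1986: 21000.0, 1987: 24000.0, 1988: 0.0, 1989: 30000.0})
```

In this example the person contributes 1986 and 1987 only; the positive earnings in 1989 are not used because the spell ended in 1988.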

As we focus on immigrants whose main income source is employment income (wages and salaries), we exclude immigrants with self-employment income greater than $100 (in 2004 dollars) in absolute terms. Some immigrants report very small annual earnings. Retaining these observations in the sample would allow some persons with effectively zero earnings to escape deletion on a technicality. To avoid this, annual earnings of less than $50 were reset to zero.
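The earnings definition and the two cleaning rules can be combined in a brief Python sketch. The function and argument names are hypothetical, and two points are assumptions rather than statements from the text: the $50 threshold is applied to the sum of the two earnings components, and the self-employment screen is shown for a single annual amount (the text does not say whether exclusion is decided per person or per person-year).

```python
SELF_EMPLOYMENT_LIMIT = 100.0  # dollars (2004), compared in absolute value
MIN_EARNINGS = 50.0            # annual earnings below this are reset to zero


def annual_earnings(t4_income: float, other_employment_income: float) -> float:
    """Sum the two employment-income components and zero out tiny totals."""
    total = t4_income + other_employment_income
    return total if total >= MIN_EARNINGS else 0.0


def has_material_self_employment(self_employment_income: float) -> bool:
    """True if self-employment income (gain or loss) exceeds the limit."""
    return abs(self_employment_income) > SELF_EMPLOYMENT_LIMIT


# Made-up amounts for illustration.
small = annual_earnings(30.0, 10.0)       # total of 40 is reset to zero
regular = annual_earnings(25000.0, 400.0)
loss_flag = has_material_self_employment(-150.0)
```

The absolute value matters because a self-employment loss of more than $100 triggers the exclusion just as a gain does.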

Sample averages and the percentages of immigrants in different categories are summarized in Appendix B, Table B.1.

5 The list, for instance, includes the Czech Republic, the Slovak Republic and Czechoslovakia.

6 For instance, if the person arrived between 1983 and 1985, he would be included in the sample if he filed a tax return and had positive earnings in 1986.