3 Data and a framework for the analysis

The empirical approach is motivated by the regression to the mean model used in economic analysis to measure mobility in earnings, income, and other indicators of socioeconomic status across the generations as described, for example, in Corak (2004) and Mulligan (1997). This is depicted in Equation (1), where Y represents an outcome of interest, in our case years of education attained, and t is an index of generations.

$$Y_{i,t} = \alpha_t + \beta Y_{i,t-1} + \varepsilon_{i,t} \qquad (1)$$

To use the example of education, in this equation the educational attainment of family i's child would be $Y_{i,t}$, which is equal to the average years of education of generation t children, as represented by $\alpha_t$, plus two factors determining the deviation from this average: a fraction of parental education ($\beta Y_{i,t-1}$) and other influences not associated with parental education ($\varepsilon_{i,t}$).

Average educational attainment will evolve through time, and it is very likely that many or all members of a generation will have more education than their parents. This is captured in Equation (1) by the value of α. However, and just as importantly, the equation reflects the idea that an individual's education is nonetheless related to his or her parents' education. This is captured by the value of β, which represents the fraction of education advantage that is, on average, transmitted across the generations. In other words, β summarizes in a single number the degree of intergenerational education mobility in a society. It could conceivably be any real number. A positive value would indicate intergenerational persistence of education in which higher parental education is associated with higher child education; a negative number would indicate intergenerational reversal in which higher parental education is associated with lower child education. In fact, the published research shows that this coefficient has always been found to be positive, though varying significantly across countries and with the level of development as, for example, in the analysis of over 30 countries by Hertz et al. (2007).3
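To make the roles of α and β concrete, the following minimal sketch simulates parent and child years of schooling under Equation (1) and recovers the two parameters by least squares. It is an illustration only: the parameter values, the normal distributions, and the names `ols`, `TRUE_ALPHA` and `TRUE_BETA` are assumptions introduced here, not figures or methods taken from the data used in this paper.

```python
import random


def ols(x, y):
    """Return (alpha, beta) from a least squares fit of y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical parameter values, chosen only for illustration.
TRUE_ALPHA, TRUE_BETA = 9.0, 0.4

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Parental years of schooling, Y_{i,t-1} (assumed roughly normal).
parent_years = [rng.gauss(11.0, 3.0) for _ in range(5000)]
# Child years of schooling, Y_{i,t}, generated as in Equation (1).
child_years = [
    TRUE_ALPHA + TRUE_BETA * p + rng.gauss(0.0, 2.0) for p in parent_years
]

alpha_hat, beta_hat = ols(parent_years, child_years)
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
```

In this simulated setting a value of β between 0 and 1 means that children of more educated parents have more schooling on average, but that only a fraction of the parental advantage carries over to the next generation.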

We implement this framework in two separate ways: indirectly, using a grouped estimator from the census; and directly, using reported individual information on parental education from the Ethnic Diversity Survey. We follow the U.S. analysis of Card, DiNardo and Estes (2000) and define second-generation immigrants to be those Canadian-born individuals whose mother and father were both born outside of Canada. First-generation immigrants are defined as those who immigrated to Canada regardless of the age of arrival. To begin, it should be underscored that the 2001 Census does not permit a direct link between the adult outcomes of children and the status of their parents when they were raising their families. It does, however, permit the construction of a 'grouped' estimator relating the average outcomes of second-generation adults in 2001 to the average background characteristics of immigrant adults from the 1981 Census who were potentially their parents. An analysis of the intergenerational mobility of immigrants using detailed country of origin along these lines is offered for the United States in Borjas (1993) and Card, DiNardo and Estes (2000), and for Canada in the research on the intergenerational earnings mobility of the children of immigrants in Aydemir, Chen and Corak (forthcoming).

The analytical files from the census are constructed as follows. Immigrant fathers are drawn from the 1981 Census and restricted to those individuals whose spouse is also an immigrant, and who have Canadian-born children aged 5 to 17 years. Using least squares regression, we compute predicted values of $Y_{i,t-1}$ for each country of origin for individuals matching these criteria. Correspondingly, the second-generation sample consists of individuals aged 25 to 37 years in 2001 whose parents were both immigrants. Similarly, predicted values of $Y_{i,t}$ are calculated for each country that respondents reported as their father's country of origin.

Since the variation in the outcome variables may arise from the differences in demographic characteristics among country groups, we construct age- and region-adjusted years of schooling and earnings outcomes for each country of origin. For the immigrant parents, we regress the variables of interest—years of education and also the logarithm of weekly earnings—on age, age-squared, country of origin dummies, dummies for the Canadian province of residence, and country of origin dummies interacted with age and age-squared. The inclusion of these interaction terms controls for differences in life-cycle profiles across countries. We then calculate predicted schooling or earnings for each source country at age 40 for those residing in Ontario, the most populous province.4 For the second-generation sons and daughters, we construct age- and region-adjusted outcomes by regressing schooling on age, age-squared, dummies for father's country of origin, and region dummies, and then predict outcomes for each country group for a 31-year-old living in Ontario. These points in the life cycle correspond to those used in Aydemir, Chen and Corak (forthcoming) and also in much of the Canadian intergenerational earnings mobility literature, as well as roughly to the suggestion of Haider and Solon (2006), who examine life-cycle biases in the derivation of permanent income.
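The adjustment described above can be written out as follows. This is a sketch of one way to express the specification, not necessarily the exact parameterization used in the estimation. The symbols $E$, $A$, $\gamma$, $\delta$, $\pi$, $\rho$ and $u$ are notation introduced here for illustration. Country-specific age coefficients are used as a compact equivalent of common age terms plus country-by-age interactions.

```latex
% Immigrant parents (1981 Census): parent j, source country c, province p
E_{j} = \gamma_{c} + \delta_{1c} A_{j} + \delta_{2c} A_{j}^{2} + \pi_{p} + u_{j}

% Adjusted parental outcome for country c: age 40, residing in Ontario
\hat{Y}_{c,t-1} = \hat{\gamma}_{c} + 40\,\hat{\delta}_{1c} + 1600\,\hat{\delta}_{2c} + \hat{\pi}_{\mathrm{Ontario}}

% Second generation (2001 Census): child i, father's country c, region r
E_{i} = \gamma'_{c} + \delta'_{1} A_{i} + \delta'_{2} A_{i}^{2} + \rho_{r} + u'_{i}

% Adjusted child outcome for country c: age 31, residing in Ontario
\hat{Y}_{c,t} = \hat{\gamma}'_{c} + 31\,\hat{\delta}'_{1} + 961\,\hat{\delta}'_{2} + \hat{\rho}_{\mathrm{Ontario}}
```

Here $1600 = 40^2$ and $961 = 31^2$. Evaluating every country group at the same age and province removes differences in outcomes that reflect only the age or regional composition of each group.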

To avoid small sample-size problems, we aggregate countries with fewer than 30 observations into groups and arrive at a total of 70 countries/regions. This is done separately for sons and daughters. These 70 data points are used to estimate Equation (1) for sons and daughters using years of education as the outcome, with observations weighted by population shares. As mentioned, we also calculate parental earnings in the same way, opening up the possibility of relating both parental education and earnings to the educational attainment of the children.
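The grouped estimation step can be sketched as a weighted least squares fit across country/region cells. The example below is a minimal illustration under stated assumptions: the four cells and all of their values are made up, and the function name `weighted_ols` is hypothetical. The actual analysis uses 70 cells built from the census files.

```python
def weighted_ols(x, y, w):
    """Weighted least squares fit of y = alpha + beta * x with weights w."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(
        wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)
    )
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up cells for illustration only:
# (label, fathers' adjusted years, children's adjusted years, population share)
cells = [
    ("group A", 8.5, 14.8, 0.40),
    ("group B", 11.0, 15.2, 0.30),
    ("group C", 13.0, 15.9, 0.20),
    ("group D", 15.0, 16.5, 0.10),
]

fathers = [c[1] for c in cells]
children = [c[2] for c in cells]
shares = [c[3] for c in cells]

alpha_hat, beta_hat = weighted_ols(fathers, children, shares)
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
```

Weighting by population share means that large source-country groups influence the fitted slope more than small ones, so the estimate describes the typical second-generation individual rather than the typical country.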

This grouped data estimator of Equation (1) has both advantages and disadvantages, which are discussed in Card, DiNardo and Estes (2000). The most obvious disadvantage is the potential slippage between the generations. The 'parents' are the potential parents of the children, and they may not be fully representative of the actual parents because of death or emigration. At the same time, however, it should be noted that the large sample size available to us through the use of the full 20% census file reduces this problem to the greatest extent possible in the literature with which we are familiar. In particular, this is a tighter fit than is possible with U.S. data. For example, Card, DiNardo and Estes (2000) are able to develop a similar structure for only 30 source countries, and the data require them to relate the earnings and education of all immigrants to all second-generation individuals aged from 16 to 65. Furthermore, as discussed in Aydemir and Borjas (unpublished), since the within-cell means are calculated from samples, their accuracy will vary with the number of observations available. The implication is that the sampling variation associated with the independent variable will cause an attenuation bias. Aydemir and Borjas (unpublished) examine the nature and extent of this bias, and also show that the use of the 20% census file, as opposed to smaller sampling rates available in public-use versions of the census, affords a sufficiently large sample size to minimize its impact.
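The attenuation bias mentioned above can be illustrated with a small simulation. The sketch below is not the procedure of Aydemir and Borjas. It applies the standard errors-in-variables result, under which the estimated slope shrinks toward zero by the ratio of the variance of the true cell means to the variance of the observed cell means. All numerical values and the names `grouped_slope` and `expected_slope` are assumptions made for illustration. Sampling error in each cell mean is drawn directly rather than by sampling individuals.

```python
import random

TRUE_BETA = 0.4    # hypothetical slope in Equation (1)
SD_BETWEEN = 2.0   # assumed spread of true cell means across countries
SD_WITHIN = 3.0    # assumed spread of individuals within a cell


def ols_slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def grouped_slope(cell_size, n_cells=2000, seed=1):
    """Slope estimated from cell means observed with sampling error."""
    rng = random.Random(seed)
    true_means = [rng.gauss(11.0, SD_BETWEEN) for _ in range(n_cells)]
    # A mean of cell_size draws has standard error SD_WITHIN / sqrt(cell_size).
    noise_sd = SD_WITHIN / cell_size ** 0.5
    observed_means = [m + rng.gauss(0.0, noise_sd) for m in true_means]
    # Child outcomes depend on the true, not the observed, parental mean.
    child_means = [9.0 + TRUE_BETA * m for m in true_means]
    return ols_slope(observed_means, child_means)


def expected_slope(cell_size):
    """Slope implied by the classical errors-in-variables formula."""
    signal = SD_BETWEEN ** 2
    noise = SD_WITHIN ** 2 / cell_size
    return TRUE_BETA * signal / (signal + noise)


for size in (5, 50, 500):
    print(size, round(grouped_slope(size), 3), round(expected_slope(size), 3))
```

As the number of observations per cell grows, the sampling variance of the cell mean falls and the estimated slope approaches the true value, which is the reason a larger census file limits the bias.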

On the other hand, the advantage of this estimator is that it is more robust to measurement error. This is a particularly important concern in the analysis of the intergenerational transmission of earnings inequality as discussed, for example, in Solon (1999, 1992). In this literature, researchers are faced with the difficulty of having to infer information on permanent income from annual earnings, and they try to minimize a classical errors-in-variables problem through instrumental variables or through multi-year averages from panel data on individual annual earnings. At first glance it might be reasonable to suppose that the measurement error problems in an outcome like education are not as severe as with earnings. Much of the literature implicitly and even explicitly assumes that it is in fact absent, but Ermisch and Francesconi (2004), using U.K. data on a commonly employed measure of socioeconomic status, point out that this need not be the case.

All of this said, we use the census jointly with, and as a complement to, the Ethnic Diversity Survey (EDS), which has the advantage of offering individual-level information on educational attainment across two generations. This is a post-censal survey representative of the entire population, but with the objective of providing information on the ethnic and cultural background of Canadians. A sample of just under 42,500 people, 15 years of age and over, was interviewed in 2002 using the one-in-five 2001 Census data as the sampling frame, with sample selection based on ethnic origin, place of birth and parental place of birth. Those who did not report Canadian, British, French, American, Australian or New Zealand origins in their response to ethnic origin questions were over-sampled (Statistics Canada 2003). The limitations of the EDS are that there is no information on earnings and income of parents, and the smaller sample size somewhat limits the degree to which specific countries of origin can be examined. It is in these ways that the census can be a useful complement. The advantages over the census are the retrospective information on parental education collected from survey respondents, and also the capacity to estimate Equation (1) for the children of immigrants, for the entire population of Canadians, and for different birth cohorts.

The EDS contains all the information from the 2001 Census for each survey respondent including, most importantly for our purposes, the years of education attained. The information on parental educational attainment, however, is recorded as one of nine categories. In converting this information into years of schooling, we rely on the fact that, in addition to actual years of education, the census also reports education categorically, and in more detail, with 16 categories being used. We re-code both the EDS categories and those in the 1981 Census into seven common categories.5 We then match years of schooling from the census to the EDS by cells defined according to gender, country of origin, education category, and age (25 to 44 years, 45 to 54 years, and 55 and older). Within each of these cells we calculate from the 1981 Census the mode of the years of schooling and match this statistic to the individuals in the EDS in similarly defined cells according to the information they provided on their mothers and fathers.6
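The cell-matching step can be sketched as follows. This is a minimal illustration, not the code used to build the analytical files. The record layout, field names, country label, and sample rows are hypothetical, and breaking ties in favour of the lower number of years is an assumption made here so that the result is deterministic.

```python
from collections import Counter, defaultdict


def age_band(age):
    """Map an age to one of the three bands used to define cells."""
    if 25 <= age <= 44:
        return "25-44"
    if 45 <= age <= 54:
        return "45-54"
    if age >= 55:
        return "55+"
    return None  # outside the age range used for matching


def cell_modes(census_rows):
    """Modal years of schooling per (gender, country, category, age band)."""
    by_cell = defaultdict(Counter)
    for row in census_rows:
        band = age_band(row["age"])
        if band is None:
            continue
        key = (row["gender"], row["country"], row["category"], band)
        by_cell[key][row["years"]] += 1
    modes = {}
    for key, counts in by_cell.items():
        top = max(counts.values())
        # Assumed tie-break: take the smallest value among tied modes.
        modes[key] = min(y for y, n in counts.items() if n == top)
    return modes


# Hypothetical 1981 Census records (category codes 1 to 7 as re-coded).
census = [
    {"gender": "M", "country": "X", "category": 2, "age": 40, "years": 12},
    {"gender": "M", "country": "X", "category": 2, "age": 38, "years": 12},
    {"gender": "M", "country": "X", "category": 2, "age": 31, "years": 11},
    {"gender": "F", "country": "X", "category": 6, "age": 50, "years": 16},
    {"gender": "F", "country": "X", "category": 6, "age": 47, "years": 16},
    {"gender": "F", "country": "X", "category": 6, "age": 52, "years": 17},
    {"gender": "M", "country": "X", "category": 2, "age": 20, "years": 10},
]

modes = cell_modes(census)

# A hypothetical EDS respondent reports a father in this cell.
father_cell = ("M", "X", 2, "25-44")
imputed_father_years = modes.get(father_cell)
print(imputed_father_years)
```

Replacing `min` over the tied modes with a median or mean of the cell gives the alternative statistics that could be compared against the mode.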

A summary of this information by broad region of origin is offered in Table 5 along with information from the census. The average years of education for second-generation men and women in panels 3 and 4 of the table are essentially the same across the two data sources, never differing by more than about 0.4 of a year. This is not surprising, since the EDS information is extracted from the census; the differences likely reflect sampling error. Second-generation Canadians, regardless of the region of the world in which their parents were born, all have more years of education than Canadians with parents also born in Canada. The advantage is greatest for those with African and Asian origins.

The information in panels 1 and 2 compares the direct measures of the years of schooling from the census with the data calculated from the categories reported in the EDS. The averages across these two sources are similar, with the possible exception of those from Africa, the census reporting an average of 14.9 years and our derivations from the EDS implying 16.1 years. But the EDS information is based upon a rather small sample of just 68 observations, so it is likely that this difference is due to sampling variation. The next largest difference is 0.7 years for those from Asia.

Further, the information as a whole suggests that all groups made gains over their parents. Canadians 25 to 37 years of age with Canadian-born parents have roughly two to three more years of education on average than their parents. Gains are also made by second-generation Canadians, though in some cases not as great in absolute levels because of the higher starting point of their parents. However, the gains are particularly high for those whose parents were born in Southern and Eastern Europe: on average, fathers had just less than nine years of schooling, but the children obtained 15 years. Those with parents born in Asia also obtained significantly more education than their parents, about two to three years more on average. A more refined examination of this type of mobility, in the context of Equation (1), using both grouped data and individual data is discussed in the remainder of the paper.

3 Intergenerational mobility in education has, of course, been a longstanding concern in both economics and sociology. Some of the most related Canadian work in this area includes de Broucker and Lavallée (1998) using the International Adult Literacy Survey, Fournier, Butlin and Giles (1995) using the Survey of Labour and Income Dynamics, and Sen and Clemente (unpublished) using the General Social Survey. The last mentioned is closest in spirit to the methodology that we employ, but all of these studies find a strong positive association between parent and child education, though none focuses on immigrants. More recently attention has also shifted to the relationship between family background and actual literacy and numeracy outcomes for children, as opposed to formal schooling. See for example OECD and UNESCO (2003) based upon the Programme for International Student Assessment.

4 The exclusion restrictions imposed on the underlying data differ slightly across the two variables of interest. For education, we use all available observations; for weekly earnings, we use only those observations in which respondents report positive earnings.

5 These are: (1) less than high school, including no schooling; (2) high school diploma; (3) some college without a diploma or certificate; (4) some university without a diploma or certificate; (5) college graduation with a diploma or certificate; (6) undergraduate university degree; and (7) graduate university degree.

6 We also calculated the cell medians and cell means. These all led to similar results, but the mode came closest to the census results in a comparison across broad regions of origin.