Data sources, methods and definitions

Data sources

Data from the 2011 National Household Survey (NHS) and census data from 1991 and 2006 were used in the study. The study covered employed men and women aged 25 to 34. For the NHS, a random sample of 4.5 million dwellings was selected, accounting for almost 30% of all private dwellings in Canada. The survey excluded persons living in institutional collective dwellings such as hospitals, nursing homes and penitentiaries; Canadian citizens living in other countries; and full-time members of the Canadian Forces stationed outside Canada. The overall response rate for the NHS, a voluntary survey, was 68.6%. The final responses are weighted so that the data from the sample accurately represent the NHS target population.

The census is conducted every five years. All households receive the short form, which asks for basic information only. Prior to 2011, a 20% sample of households received the long form which, in addition to the basic information, also asked more detailed questions on matters including labour market activities.

Census and NHS data were chosen because this paper focuses only on men and women aged 25 to 34. Other sources with a smaller sample, such as the Labour Force Survey, would not allow for a detailed analysis of the occupational mix among young graduates and non-graduates.


Methods

Following the method first developed by Duncan and Duncan (1955), the index of segregation can be defined as

$$S_t = 0.5 \sum_i \left| m_{it} - f_{it} \right|$$

where $m_{it}$ ($f_{it}$) is the proportion of all working males (females) who are employed in occupation $i$ at time $t$. This index is usually expressed as a percentage and indicates the proportion of women (or men) who would have to change occupations for the occupational distributions of men and women to be the same. A value of zero implies complete integration, while a value of 100 means complete segregation.
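The index can be illustrated with a short computation. The following is a minimal Python sketch, not the study's own code: the function name `segregation_index`, the dictionary inputs and the occupation counts are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def segregation_index(men, women):
    """Duncan and Duncan index of segregation, expressed as a percentage.

    men, women: dicts mapping an occupation label to the number of
    employed men or women in it (hypothetical inputs for illustration).
    """
    total_men = sum(men.values())
    total_women = sum(women.values())
    occupations = set(men) | set(women)
    # 0.5 * sum of absolute differences in occupational shares, times 100.
    return 50.0 * sum(
        abs(men.get(occ, 0) / total_men - women.get(occ, 0) / total_women)
        for occ in occupations
    )


# Illustrative counts only; these are not data from the study.
men = {"A": 60, "B": 30, "C": 10}
women = {"A": 20, "B": 30, "C": 50}
index = segregation_index(men, women)
print(round(index, 1))
```

With these made-up counts, the male shares are 0.6, 0.3 and 0.1 and the female shares are 0.2, 0.3 and 0.5, so the absolute differences sum to 0.8 and the index is 40: under this reading, 40% of women (or men) would have to change occupations to equalize the two distributions.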

To study changes in segregation over time, it is helpful to decompose the index into a sex composition effect (the change due to shifts in the sex composition within occupations) and an occupation mix effect (the change due to shifts in the relative size of occupations); see note 1. The segregation index at time t can be expressed as

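$$S_t = 0.5 \sum_i \left| \frac{q_{it} T_{it}}{\sum_j q_{jt} T_{jt}} - \frac{p_{it} T_{it}}{\sum_j p_{jt} T_{jt}} \right|$$

where $p_{it}$ ($q_{it}$) is the percentage of women (men) in occupation $i$ at time $t$, $T_{it}$ is the total employment for occupation $i$ at time $t$, and the sums in the denominators run over all occupations $j$. To study the changes in segregation between time periods 1 and 2, the decomposition can be carried out as follows:

$$\text{sex composition effect} = 0.5 \sum_i \left| \frac{q_{i2} T_{i1}}{\sum_j q_{j2} T_{j1}} - \frac{p_{i2} T_{i1}}{\sum_j p_{j2} T_{j1}} \right| - S_1$$

$$\text{occupation mix effect} = S_2 - 0.5 \sum_i \left| \frac{q_{i2} T_{i1}}{\sum_j q_{j2} T_{j1}} - \frac{p_{i2} T_{i1}}{\sum_j p_{j2} T_{j1}} \right|.$$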
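The decomposition rests on a counterfactual index that combines the period 2 sex composition with the period 1 occupation sizes, so the two effects add up to the total change, S2 minus S1. The Python sketch below illustrates this under stated assumptions: the function names and all numbers are hypothetical, shares are entered as fractions rather than percentages, and the share of men in an occupation is taken to be one minus the share of women.

```python
def index_from_shares(share_women, totals):
    """Segregation index (in percent) from within-occupation shares.

    share_women: occupation -> fraction of workers who are women (p_i).
    totals: occupation -> total employment (T_i).
    Assumes the share of men is q_i = 1 - p_i.
    """
    women = {occ: share_women[occ] * totals[occ] for occ in totals}
    men = {occ: (1 - share_women[occ]) * totals[occ] for occ in totals}
    total_women = sum(women.values())
    total_men = sum(men.values())
    return 50.0 * sum(
        abs(men[occ] / total_men - women[occ] / total_women) for occ in totals
    )


def decompose(p1, t1, p2, t2):
    """Split the change in the index between periods 1 and 2."""
    s1 = index_from_shares(p1, t1)
    s2 = index_from_shares(p2, t2)
    # Period 2 sex composition applied to period 1 occupation sizes.
    counterfactual = index_from_shares(p2, t1)
    return {
        "sex_composition": counterfactual - s1,
        "occupation_mix": s2 - counterfactual,
        "total_change": s2 - s1,
    }


# Illustrative values only; these are not data from the study.
p1 = {"A": 0.25, "B": 0.5, "C": 0.75}
t1 = {"A": 100, "B": 100, "C": 100}
p2 = {"A": 0.4, "B": 0.5, "C": 0.6}
t2 = {"A": 50, "B": 200, "C": 50}
effects = decompose(p1, t1, p2, t2)
print(effects)
```

Because the counterfactual term enters one effect with a plus sign and the other with a minus sign, it cancels when the two effects are summed, which is why the decomposition is exact.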

To construct the index, consistent occupational groups had to be defined for the entire period, so some of the four-digit occupations were regrouped (see note 2).
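One way to picture the regrouping is as a lookup table that maps each year's occupation titles to a common group before the index is computed. The Python sketch below is an illustration only: the occupation titles are those named in this box for computer and information systems professionals, while the function name `regroup`, the mapping tables and the counts are hypothetical.

```python
from collections import Counter

# Common group label; the titles below are taken from the text,
# but the mapping tables themselves are a hypothetical illustration.
GROUP = "computer and information systems professionals"

REGROUP_2011 = {
    "information systems analysts and consultants": GROUP,
    "database analysts and data administrators": GROUP,
    "software engineers and designers": GROUP,
    "computer programmers and interactive media developers": GROUP,
    "web designers and developers": GROUP,
}

REGROUP_1991 = {
    "computer systems analysts": GROUP,
    "computer programmers": GROUP,
}


def regroup(counts, mapping):
    """Sum employment counts over consistent occupational groups.

    Occupations that do not appear in the mapping keep their own title.
    """
    grouped = Counter()
    for occupation, n in counts.items():
        grouped[mapping.get(occupation, occupation)] += n
    return dict(grouped)


# Made-up counts, used only to show the mechanics.
counts_1991 = {
    "computer systems analysts": 3,
    "computer programmers": 2,
    "other occupation": 1,
}
print(regroup(counts_1991, REGROUP_1991))
```

Applying the year-specific mapping to each year's counts yields occupation categories that line up across periods, which the index and its decomposition both require.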


Definitions

Employed: A person is considered employed if he or she had a job in the reference week (the week preceding the census or survey). This includes persons who were temporarily absent for the entire week because of vacation, illness, a labour dispute at work, maternity/parental leave, bad weather, fire or family responsibilities, or for some other reason.

Occupations: Occupation classifications are based on the four-digit National Occupational Classification (NOC), according to the following:

  • 2011 NHS and 2006 Census: Occupations based on NOC 2006 (Human Resources and Skills Development Canada)
  • 1991 Census: Occupations based on NOC 1990 (Human Resources and Skills Development Canada).

Some occupations were reclassified over time. For example, among computer and information systems professionals, there were five occupations in 2011: information systems analysts and consultants; database analysts and data administrators; software engineers and designers; computer programmers and interactive media developers; and web designers and developers. In 1991 there were two occupations: computer systems analysts and computer programmers.


Notes

  1. This technique was first developed by Fuchs (1975) and subsequently used by many researchers, including Blau and Hendricks (1979) and Blau et al. (2013).
  2. In 2011, information systems analysts and consultants; database analysts and data administrators; software engineers and designers; computer programmers and interactive media developers; and web designers and developers were combined to create one group. The corresponding group for 1991 was formed by combining computer systems analysts and computer programmers. Also in 2011, the following occupational groups were excluded because comparable categories could not be found in 1991: construction estimators; computer network technicians; user support technicians; systems testing technicians; casino occupations; machine operators; and mineral and metal processing. Similarly, the following were excluded from 1991: records and file clerks; computer operators; and elemental medical and hospital assistants. These occupations represented a very small proportion of the overall group of workers in both years.