Methodology of the Canadian Labour Force Survey
Chapter 2 Sample design

2.0 Introduction

As mentioned in Chapter 1, the objective of the Labour Force Survey (LFS) is to produce reliable and timely data on employment, unemployment and characteristics of the working-age population at various levels of geography. In theory, such data could be acquired through an administrative source, a census of the population, or a sample survey. As there is currently no administrative source available that can produce the required estimates, nor is it feasible to conduct a census and contact everyone of working age every month to determine their employment status, a sample of the population is contacted, and their responses are used to produce monthly labour force estimates.

The sample design consists of all the steps to be carried out when selecting a sample. It impacts the quality of estimates produced and the survey costs. Since a significant portion of a survey’s budget is spent on data collection, the sample design tries to minimize collection costs while maximizing data quality.

This chapter describes the various strategies the LFS uses to achieve this objective in the ten provinces. First, Section 2.1 presents some basic concepts of survey theory that will be used throughout this chapter. Section 2.2 outlines the overall LFS sample design. The sample allocation is described in Section 2.3. Section 2.4 describes how the clusters are formed and Section 2.5 describes how they are stratified. Finally, Section 2.6 describes the sample selection process and rotation methodology.

2.1 Some basic survey theory concepts

This section presents some concepts required in order to understand the sample design that is described in the following sections. For further information, a conceptual overview of survey theory is available in Survey Methods and Practices (Statistics Canada (2003)). More technical details can be found in one of the many books on sampling theory (e.g., Cochran 1977 or Särndal, Swensson and Wretman 1992).

Data collected from a sample survey are used to produce estimates for the target population - the group or population of interest. Selecting a sample requires a survey frame, which should correspond as closely as possible to the target population although practical constraints may prevent this. The LFS selects a probabilistic sample, i.e., a subset of the population for which the surveyed units are selected at random. Estimates for the population are calculated based on the information provided from this sample.

Estimates can differ depending on which individuals are selected in the sample. Also, the estimate produced from a sample differs from the estimate that would be produced if the entire population were interviewed. These types of differences are called sampling errors. Survey results also have other errors not associated with the sample design, called non-sampling errors.

Two important measures of sampling error are bias and sampling variance. Suppose that it is possible to select several different samples using the same sample design. For each sample, an estimate of the characteristic of interest (e.g., the number of unemployed, average number of hours worked) can be produced from the observed data. Bias is the difference between the average of the estimates produced from all of the possible samples and the corresponding true value for the whole population. The variability between the sample estimates, or how different they are from one another, is the sampling variance.

Bias can be caused by a number of sources, such as an imperfect survey frame, the method used to produce the estimate, or survey nonresponse. The sample design can add bias when some regions are excluded from survey coverage (e.g. due to prohibitive collection costs). This error component can be difficult to measure in practice because the true value for the population is generally unknown.

Sampling variance measures the spread between the estimates produced from all the possible samples. It reflects the degree of precision of an estimate: the smaller the sampling variance, the more precise the estimate. Sampling variance can be estimated from a single observed sample, even though it reflects variability between many theoretical samples.
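The definitions of bias and sampling variance can be illustrated with a minimal sketch, assuming a tiny hypothetical population of five values, simple random sampling without replacement and the sample mean as the estimator. None of these choices comes from the LFS; they only make it possible to list every possible sample.

```python
from itertools import combinations
from statistics import mean, pvariance

def bias_and_sampling_variance(population, n):
    """Enumerate every possible sample of size n and summarise the estimates."""
    true_value = mean(population)
    # One estimate (the sample mean) per possible sample.
    estimates = [mean(sample) for sample in combinations(population, n)]
    # Bias: average of all possible estimates minus the true population value.
    bias = mean(estimates) - true_value
    # Sampling variance: spread of the estimates across all possible samples.
    variance = pvariance(estimates)
    return bias, variance

population = [2, 4, 6, 8, 10]  # hypothetical values, not LFS data
bias, variance = bias_and_sampling_variance(population, n=2)
print(bias, variance)
```

In this toy case the sample mean is unbiased, and the variance across the ten possible samples matches the textbook value for simple random sampling without replacement.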

Other measures of variability are derived from sampling variance. Standard error is obtained by taking the square root of the sampling variance and is often used to determine a confidence interval or to carry out a statistical test. Standard error is an absolute measure of variation, since it is measured in the same units as the estimate. Another measure is the Coefficient of Variation (CV), which is defined as the standard error divided by the estimate. The CV is a relative measure, since it is unit-free and calculated relative to the estimate. A third measure is the design effect, a relative measure calculated by dividing the sampling variance of an estimate obtained under the survey design by the sampling variance of a Simple Random Sample (SRS) of the same sample size. It can be used to compare the effectiveness of one sample design to another. The smaller the standard error, confidence interval length, CV or design effect is, the more precise the estimate is.
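As a minimal sketch, the three measures defined above can be computed as follows. The function name and the input values are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def precision_measures(estimate, sampling_variance, srs_variance):
    """Return the standard error, CV and design effect of an estimate."""
    # Standard error: square root of the sampling variance (same units as the estimate).
    standard_error = math.sqrt(sampling_variance)
    # Coefficient of variation: standard error relative to the estimate (unit-free).
    cv = standard_error / estimate
    # Design effect: variance under the actual design relative to an SRS of the same size.
    design_effect = sampling_variance / srs_variance
    return standard_error, cv, design_effect

# Hypothetical inputs: an estimate of 50,000 unemployed persons.
se, cv, deff = precision_measures(
    estimate=50_000, sampling_variance=6_250_000, srs_variance=5_000_000
)
print(se, cv, deff)
```

With these inputs the standard error is 2,500 persons, the CV is 5% and the design effect is 1.25, meaning the design is less efficient than a simple random sample of the same size.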

The primary goal of an effective sample design is to reduce the sampling variance given limited budget and operational constraints. A more efficient sample design can obtain the same precision of the estimates (as measured by sampling variance) using a smaller sample size than another less efficient design. Similarly, given a fixed total sample size, a more efficient sample design has lower sampling variance than a less efficient sample design.

Several factors influence the sampling variance of an estimate. The most influential factors are the number of individuals in the population, the number of individuals in the sample, the sampling method used to draw the sample, the response rate, and the homogeneity of the characteristic of interest in the population. The size of the population cannot be controlled. Response rates can sometimes be changed by data collection, but usually not by the sample design. However, by controlling the number of individuals in the sample, the sampling method used to draw the sample, and the homogeneity within sampled groups, a more effective sample design can be obtained.

2.2 Overview of the sample design

The LFS uses a complex sample design. A simpler sampling method would be simple random sampling (SRS), where units are selected at random from a list with equal probability. However, a simple random sample of individuals requires a list of all individuals in the target population, which may be difficult to obtain in practice. Also, operational constraints may make an SRS design infeasible, requiring a more complex sampling method to be used.

For most LFS estimates the target population is all persons in Canada aged 15 and over. It is impossible to directly select a sample of such persons to interview since a complete and up-to-date list of persons residing in the ten provinces is not available. Instead of selecting persons directly, it is easier to select dwellings and then identify and interview persons living in the selected dwellings. Although a fairly complete and regularly updated list of dwelling addresses is now available (see Section 2.2.1), selecting dwellings by simple random sampling would lead to a sample that would be too geographically spread out. As a result, travel costs associated with in-person collection could be exceedingly high.

To reduce travel costs, the sample of dwellings is taken through two consecutive selection stages. This method is called two-stage sampling. In the first stage, the provinces are divided into geographic regions called clusters or primary sampling units (PSUs). A random selection of these PSUs makes up the first stage sample. In the second stage, for each selected PSU, a list of dwellings in the region is established either by an extraction from the Address Register (the Dwelling Universe File; see Section 2.2.1) or through field listing. A second-stage sample of dwellings is selected from these lists. Dwellings are the secondary sampling units (SSUs). All residents of the selected dwellings within the selected clusters who are part of the target population make up the LFS sample of persons. This two-stage selection method is more complex but reduces the geographic spread of the sampled persons by clustering them, thereby reducing costs.
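The two-stage idea can be sketched as follows. This is an illustration only: it assumes equal-probability random selection at both stages and a hypothetical frame of 30 PSUs with 200 dwellings each, whereas the selection method actually used by the LFS is described in Section 2.6.

```python
import random

def two_stage_sample(frame, n_psus, n_dwellings, seed=0):
    """Select PSUs first, then dwellings within each selected PSU.

    `frame` maps a PSU identifier to the list of dwelling identifiers it contains.
    """
    rng = random.Random(seed)
    # Stage 1: random selection of PSUs (clusters).
    selected_psus = rng.sample(sorted(frame), n_psus)
    sample = {}
    for psu in selected_psus:
        dwellings = frame[psu]
        # Stage 2: random selection of dwellings from the list for this PSU.
        take = min(n_dwellings, len(dwellings))
        sample[psu] = rng.sample(dwellings, take)
    return sample

# Hypothetical frame: 30 PSUs of 200 dwellings each.
frame = {f"PSU{p}": [f"PSU{p}-D{d}" for d in range(200)] for p in range(30)}
sample = two_stage_sample(frame, n_psus=6, n_dwellings=8)
for psu, dwellings in sample.items():
    print(psu, dwellings)
```

All sampled dwellings fall inside the six selected PSUs, which is what limits the geographic spread of the sample.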

Starting in January 2015, the sample design in Prince Edward Island (PEI) was changed to one-stage sampling. This means that dwellings are selected directly from a list, without any clusters. Section 2.5.6 has more information about the one-stage sample design in PEI.

In addition to the monthly estimates described in Section 1.2, the LFS produces change estimates between two given reference periods. To improve the quality of these estimates, it is preferable to increase the overlap between the samples of these two periods, which is only possible by keeping the same dwellings in the sample for several months. Unfortunately, when the sample overlap is increased, the burden imposed on respondents rises because they must participate in the survey several times. This increased burden could lead to a lower response rate. On the other hand, in addition to improved quality, a bigger overlap also reduces survey collection costs, since it costs less to obtain a response in subsequent months than in the first month. Therefore, the sample overlap is a compromise between the quality of the change estimates and the cost of survey operations versus the burden imposed on respondents.

It was decided to keep each dwelling in the LFS sample for six consecutive months. Subject to this limitation, the maximum overlap of the sample between two consecutive months is five-sixths. Therefore, it is necessary to replace one-sixth of the sample of dwellings each month. To implement this strategy, the LFS PSU population is divided into six rotation groups^{Note 1}, with a sample selected in each group representing the whole population. The first rotation group is initially contacted in January. These dwellings then remain in the sample until June inclusively. In July, all the dwellings in rotation group 1 are replaced by a new sample of dwellings from the same rotation group. The second rotation group is made up of the dwellings surveyed from February to July inclusively, and so on for the other rotation groups. The rotation pattern is illustrated in Figure 2.1 at the end of the chapter. More information about the rotation of the dwelling sample is provided in Section 2.6.4.
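The rotation pattern can be sketched in a few lines. The sketch assumes a steady state (all six groups already in the sample), rotation groups of equal size, and months indexed from 0 for a January; the function names are illustrative.

```python
from fractions import Fraction

def months_in_sample(month_index):
    """For each rotation group (1-6), return which month of its six-month stay
    the current dwellings are in. Month index 0 is a January, 1 a February, etc."""
    return {group: (month_index - (group - 1)) % 6 + 1 for group in range(1, 7)}

def overlap_with_previous_month(month_index):
    """Share of rotation groups whose dwellings were already in the sample
    the month before (assumes groups of equal size)."""
    status = months_in_sample(month_index)
    retained = sum(1 for month in status.values() if month > 1)
    return Fraction(retained, len(status))

print(months_in_sample(0))  # January: group 1 is in its first month
print(months_in_sample(6))  # July: group 1 starts again with new dwellings
print(overlap_with_previous_month(6))
```

In every month exactly one group is in its first month in the sample, so five of the six groups are common to consecutive months.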

The strategy of overlapping rotation groups has some advantages. First, it allows for more effective processing and estimation methods (described in Chapters 5 and 6). It also permits a simple method for selecting a subset of the LFS sample for other Statistics Canada surveys. Since each rotation group represents the whole population, it is straightforward to build the sample for another survey by grouping together the dwellings from an appropriate number of rotation groups. Information on using the LFS survey frame and sample for other household surveys is given in Chapter 9.

2.2.1 The Dwelling Universe File and its impact on the sample design

For the sample design, it is important to know approximately how many dwellings and how many occupied dwellings (i.e., dwellings that correspond to households of persons) are in the LFS population. The counts are used for PSU creation, sample allocation and stratification. For previous LFS designs, these counts came from the most recent Census. However, the most recent Census counts are from May 2011 while this redesign was phased in starting January 2015.

To have a more up-to-date count of the total number of dwellings in the population, the Dwelling Universe File (DUF) was used for this design. The DUF is an extraction from Statistics Canada’s Address Register (AR) database that contains residential addresses (dwellings). It is updated quarterly, using the latest administrative files available and the results of field listing and verification.

For planning the redesign, the June 2013 extract of the DUF was used. Since the DUF did not identify occupied dwellings, the number needed to be estimated. This was done by multiplying the total number of dwellings on the DUF by the dwelling occupancy rate from the 2011 Census.
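The occupied-dwelling estimate described above is a simple product. The sketch below uses hypothetical figures (1,200,000 dwellings and a 90% occupancy rate), not actual DUF or Census values.

```python
def estimate_occupied_dwellings(duf_dwelling_count, census_occupancy_rate):
    """Estimate occupied dwellings as total DUF dwellings times the Census occupancy rate."""
    if not 0 <= census_occupancy_rate <= 1:
        raise ValueError("occupancy rate must be between 0 and 1")
    return duf_dwelling_count * census_occupancy_rate

# Hypothetical inputs for one region.
occupied = estimate_occupied_dwellings(1_200_000, 0.90)
print(round(occupied))
```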

2.3 Sample allocation

As described in Chapter 1, the LFS is the official source of monthly estimates of total employment and unemployment. The LFS is also one of the main sources of information on socio-demographic characteristics of the working-age population such as age, marital status, level of education and family status.

The LFS produces data for a variety of geographic regions, including national, provincial and sub-provincial regions such as Economic Regions (ERs), Census Metropolitan Areas (CMAs) and Employment Insurance Economic Regions (EIERs). The sample allocation step specifies the target number of households to select in each of these regions^{Note 2}. It is established to ensure that the sample can produce estimates that satisfy various LFS precision objectives. This is a crucial step because the subsequent steps depend on it, and it ensures that the survey resources are effectively used. More information on the Census-related geographies used by the LFS can be found in the Census Dictionary (Statistics Canada (2012)).

As explained in Section 2.1, the number of units sampled has a direct impact on the quality of the estimates produced by the survey. Since the total sample size is fixed, too much sample assigned to a given region will produce estimates for that region that are of better quality than required by the survey objectives to the detriment of the data quality in other regions. The LFS produces estimates at various geographical levels (Canada, provinces, economic regions, etc.), so it is necessary to reach a suitable compromise for all these estimates when allocating the finite-budgeted sample.

In order to meet the survey objectives and maintain the overall efficacy of the survey design, the LFS sample is allocated in two steps. In the first step, sample funded by Statistics Canada is allocated. In the second step, additional sample funded by Employment and Social Development Canada (ESDC) is added. This two-step approach is based on the hypothesis that the Statistics Canada LFS budget is ensured over a long period of time, but that the funding from ESDC could fluctuate over time. Thus, each part of the sample should be allocated separately to meet the appropriate objectives for which it is funded. Table B.4 in Appendix B provides the LFS sample allocation based on various geographical units.

2.3.1 Allocation of the sample funded by Statistics Canada

The first step consists of allocating the sample funded by Statistics Canada (36,000 households) among the 10 provinces. Statistics Canada has established LFS quality objectives for the provinces, Economic Regions (ERs) and Census Metropolitan Areas (CMAs). All targets are based on estimates of the number of unemployed persons. This is because unemployment is an issue of high interest and because unemployment, being rarer than employment, takes more resources to measure at the same quality in terms of CV than employment would. The purpose of the allocation is to ensure that the sample will be able to meet these objectives.

To make adjustments to the sample allocation of the previous design, it was necessary to predict the precision of the estimates of the number of unemployed persons (monthly and three-month moving average) for each province, ER, and CMA for a given sample size. These predictions were based on a CV estimation model involving the sample size, the estimate of the number of unemployed persons, and the estimated variance of that estimate. This model is based on data from 80 previous months of LFS. Therefore, it is implicitly assumed that the new sample design will have comparable efficiency to the previous one. It is also assumed that response rates, vacancy rates, and the number of adults per household will remain constant over time. Model assumptions were validated by analyzing LFS trends over the last nine years. Using this model, it was possible to predict the impact of allocation changes on the quality of future unemployment estimates.

The allocation strategy for the sample funded by Statistics Canada is based on the following criteria:

For each province, the CV of the monthly estimate of the number of unemployed should be less than 7%;^{Note 3}
For each ER, the CV of the three-month moving average estimate of the number of unemployed should be less than 25%;
The minimum sample size for each ER is 200;
For CMAs that do not correspond well to EIERs, that is for five CMAs and Lethbridge^{Note 4}, the CV of the three-month moving average estimate of the number of unemployed should be less than 25%.

In previous designs, it was expected that CVs should be below the target for all months. However, in practice this does not happen. Over a ten-year design, some months can be outliers for some domains, which is difficult to predict. Allocating a larger sample to control for outliers can be both difficult and potentially a waste of finite resources better spent in a different region. For the redesign, the sample was allocated such that the CV targets would be met for at least 90% of the monthly estimates over time.
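The 90% rule can be checked with a minimal sketch such as the following. The monthly CVs are invented for illustration, and the 7% target is the provincial target listed above.

```python
def share_meeting_target(monthly_cvs, target):
    """Share of months in which the CV is below the target."""
    met = sum(1 for cv in monthly_cvs if cv < target)
    return met / len(monthly_cvs)

# Hypothetical monthly CVs for one province; one month is an outlier above 7%.
monthly_cvs = [0.062, 0.065, 0.068, 0.071, 0.066, 0.064, 0.069, 0.063, 0.067, 0.061]
share = share_meeting_target(monthly_cvs, target=0.07)
meets_objective = share >= 0.90
print(share, meets_objective)
```

Here nine of the ten months are below the target, so the objective is met even though one month exceeds 7%.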

Using the CV estimation model and non-linear programming, the 36,000 households were allocated to provinces, ERs and CMAs to meet the above constraints while minimizing the variance of the national monthly estimate of the number of unemployed. Further details on allocation to the provinces, ERs, and CMAs follow.

Allocation to the ten provinces

For the provinces, the previous design aimed to have CVs less than 7% for the monthly estimates of the number of unemployed. In practice, this was not achieved for many provinces. Simulations showed that it would not be possible to meet a 7% CV target for all provinces and in all months using only the 36,000 sampled households funded by Statistics Canada. Given that the sample size was fixed, the only solution was to modify the targets.

As mentioned earlier, the CV is a relative quality measure. For very low levels of unemployment, CVs tend to be higher. On the other hand, standard error is an absolute quality measure, in that it is measured in the same units as the estimate. For a fixed sample size, CVs increase as the unemployment rate decreases, even if the standard error remains the same. Because of a low unemployment rate in the Prairie provinces over the last few years, CVs were higher than the 7% target, even though standard errors were comparable to months with higher unemployment and lower CVs. A decision was made to adopt two allocation procedures:

Allocate to provinces based on CVs when the unemployment rate is above 5%;
Allocate based on a comparable standard error size when the unemployment rate is below 5%.
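The relationship between the CV and the standard error can be seen in a short sketch. The standard error of 3,000 and the three unemployment levels are hypothetical values chosen for illustration.

```python
def coefficient_of_variation(standard_error, estimate):
    """CV as defined in Section 2.1: standard error divided by the estimate."""
    return standard_error / estimate

# The same hypothetical standard error applied to falling unemployment levels.
standard_error = 3_000
cvs = {
    unemployed: coefficient_of_variation(standard_error, unemployed)
    for unemployed in (60_000, 40_000, 30_000)
}
for unemployed, cv in cvs.items():
    print(unemployed, f"{cv:.1%}", "meets 7% target" if cv < 0.07 else "misses 7% target")
```

With an unchanged standard error, the CV rises from 5% to 10% as the number of unemployed is halved, so a CV target can be missed without any loss of absolute precision.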

The exception to this rule was PEI. The sample size in PEI was kept at the same level as in the previous design. A sample increase was avoided because a further increase would have led to dwellings being selected a second time over the planned ten-year life of the new design, which would impose considerable burden on respondents.

Allocation to Economic Regions

The CV target for ERs remained unchanged from the previous LFS designs. However, some changes were made to the geographic regions targeted, since it is difficult to ensure that these targets are met for all ERs. ERs that are small in terms of their household count were combined and the precision objective was applied to the combined ER. Four groups of ERs were combined with the last redesign. They were located in the northern regions of Quebec, Manitoba, Saskatchewan and British Columbia. For this redesign, three additional pairs of small rural ERs were combined in Newfoundland and Labrador, Manitoba and Alberta.

Allocation to Census Metropolitan Areas

Ensuring that the sample from Statistics Canada supports CV targets for all Census Metropolitan Areas (CMAs) was considered. However, simulations showed that imposing such targets would detract from the precision of the more important provincial and national estimates. In most cases, the Census Metropolitan Areas correspond to Employment Insurance Economic Regions (EIERs). The additional ESDC sample allocated for EIER estimates (see Section 2.3.2) provides enough sample to have sufficient quality for most CMA estimates. Two CMAs (Moncton and Saint John) together form a single EIER and each has sufficient quality due to the ESDC sample.

There are six areas that do not correspond well to EIERs and where the CV requirements are not met even with the additional ESDC sample. Five of these areas are the CMAs of Peterborough, Barrie, Brantford, Guelph and Kelowna. The sixth area is Lethbridge. After each census, Statistics Canada reviews the list of CMAs. It was expected that Lethbridge would become a CMA following the 2016 Census, so it was important to ensure that the sample drawn in Lethbridge would be sufficient to produce good estimates after 2016. Additional sample was allocated to these six areas so they would have sufficient sample for the CVs of the three-month moving average estimate of the number of unemployed to be under 25%.

2.3.2 Allocation of the sample funded by ESDC

This step of the LFS sample allocation involves adding sample funded by ESDC to the core sample funded by Statistics Canada. LFS unemployment rate estimates for EIERs are used by ESDC to establish employment insurance eligibility criteria and duration of benefits in each region. To help improve precision of unemployment rate estimates for EIERs, ESDC pays for an additional sample of 16,600 households.

ESDC eligibility criteria and benefits are determined based on ranges of the unemployment rate. ESDC needs comparable precision of estimates in all EIERs, except when a region's unemployment rate is in the highest or lowest range. The lowest range is 6% and lower; therefore, in regions with low unemployment, ESDC just needs to be able to determine that the unemployment rate is below 6%. Similarly, the highest range is above 13%; therefore, in regions with very high unemployment, ESDC just needs to be able to determine that the unemployment rate is above 13%.

Before allocating the sample funded by ESDC, the number of units allocated to each EIER by the sample funded by Statistics Canada had to be determined. First, the sample in an ER or CMA was proportionally allocated to the ER-EIER-CMA intersections based on the size of each intersection. By summing the intersections, the Statistics Canada sample allocated to each EIER was determined. For the sample funded by ESDC, the target sample size in each EIER was based on the following criteria:

For each EIER^{Note 5}, the CV of the three-month moving average estimate of the unemployment rate must be less than 15%. However, in regions with low unemployment, ESDC just needs sufficient quality to conclude that the unemployment rate is below 6%. Therefore, for rates less than 4.8%, the required sample size is such that the standard error is small enough to conclude that the unemployment rate is less than 6%. The value of 4.8% was chosen because it is the maximum value for which the upper bound of the confidence interval on the unemployment rate will be 6% when the CV is 15%.
The minimum sample size for each EIER is 500;
The quality of the estimates produced for each EIER must be similar from one EIER to another.
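The 4.8% threshold in the first criterion can be reproduced with a short calculation. The text does not state the confidence level; the sketch below assumes a normal-approximation interval with z = 1.645 (a two-sided 90% interval), which is the assumption under which the arithmetic gives 4.8%.

```python
def max_rate_for_upper_bound(threshold, cv, z):
    """Largest rate whose upper confidence bound, rate * (1 + z * cv),
    does not exceed the threshold."""
    return threshold / (1 + z * cv)

# z = 1.645 is an assumption; the confidence level is not given in the text.
rate = max_rate_for_upper_bound(threshold=0.06, cv=0.15, z=1.645)
upper_bound = rate * (1 + 1.645 * 0.15)
print(f"{rate:.2%}", f"{upper_bound:.2%}")
```

Under this assumption, an unemployment rate of about 4.8% with a CV of 15% has an upper confidence bound of 6%. Below that rate, a CV of 15% is more than sufficient to conclude that the rate is under 6%.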

Once again, non-linear programming was used to solve this problem. After allocating the sample funded by ESDC, the total sample size (Statistics Canada and ESDC) assigned to each EIER was allocated to the ER-EIER-CMA intersections, proportionally to the size of each intersection. This new allocation was then compared to the one used before the redesign to identify potential errors in the model used and to predict the effectiveness of the new design. Changes in regional sample sizes and predicted CVs were used to evaluate the new design.
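Proportional allocation to the ER-EIER-CMA intersections can be sketched as follows. The intersection names, household counts and the sample of 600 are hypothetical, and no rounding is applied because the text does not specify a rounding rule.

```python
def allocate_proportionally(total_sample, intersection_sizes):
    """Split a sample across intersections in proportion to their household counts."""
    total_size = sum(intersection_sizes.values())
    return {
        name: total_sample * size / total_size
        for name, size in intersection_sizes.items()
    }

# Hypothetical EIER made up of three intersections (household counts).
intersection_sizes = {"A": 30_000, "B": 15_000, "C": 5_000}
allocation = allocate_proportionally(600, intersection_sizes)
print(allocation)
```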

After this final step of sample allocation, two parameters were produced: the inverse sampling ratio (ISR) and the number of sampled households required for each intersection. The inverse sampling ratio is the number of households in the intersection divided by the number of households allocated to the sample for the intersection. It is used to determine the size and number of design strata (see Section 2.5) and during the sample selection process (see Section 2.6).
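As a minimal sketch, the inverse sampling ratio follows directly from its definition. The counts below are hypothetical.

```python
def inverse_sampling_ratio(households_in_intersection, households_allocated):
    """ISR: households in the intersection divided by households allocated to the sample."""
    return households_in_intersection / households_allocated

# Hypothetical intersection: 36,000 households, 360 of them allocated to the sample.
isr = inverse_sampling_ratio(36_000, 360)
print(isr)
```

An ISR of 100 means that each sampled household in the intersection stands for about 100 households in the population.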

2.4 Creation of PSUs

In Section 2.2, the basic design was presented where the LFS has a selection of clusters as primary sampling units (PSUs) followed by a selection of dwellings as the secondary sampling units (SSUs)^{Note 6}. The first step of the two-stage sample design is determining the geographic boundaries of the PSUs used for the first stage of sample selection based on size, shape and other factors.

The determination of the size and shape of the PSUs is a compromise between collection costs and the sample design’s efficiency. From a cost perspective, collection is cheaper if the shape of PSUs is geographically compact and contiguous, reducing travel time between the sampled dwellings within the PSU. If PSUs are made too large, it is too costly for an interviewer to visit all the sampled dwellings frequently enough to get responses. On the other hand, small PSUs remain in the sample for less time, which increases costs associated with PSU replacement. Also, when the PSUs are small, many PSUs will need to be selected. The selected PSUs will be generally further away from each other, again increasing the travel costs.

From a design perspective, the LFS could select either a few PSUs with many dwellings selected in each or many PSUs with a few dwellings selected in each. The latter case leads to a more efficient survey design^{Note 7}. However, moving towards such a design negates the advantages of a clustered design.

To determine the ideal size of a PSU given the above considerations, two elements are necessary. The first is a tool to evaluate the sampling variance resulting from different scenarios. This tool can be built using census data. The second is a relatively accurate model to estimate the collection costs for different scenarios of PSU size and the number of dwellings selected per PSU. To build this model, detailed information on costs is needed. Due to the complexity and recent major changes in the collection strategy (see Chapter 4), it was virtually impossible to build a valid updated cost model. Therefore, it was not possible to re-evaluate the ideal size of the PSUs. The ideal size of the PSUs that was used in the two previous redesigns (200 households) was maintained and used once again as a target for this design.

Once a target PSU size was established, PSUs were built from the standard geographical unit of Dissemination Areas (DAs) from Census 2011. This has several advantages. Using DAs as a basis for PSUs removes the need to create new geography definitions for LFS, as was done in previous designs. This streamlines the PSU creation process and reduces redesign costs. For analysis, using standard geographical units helps in comparing estimates across surveys, linking to auxiliary data from Census and other sources, and multi-level modeling. Also, since many household surveys typically sample for regions defined by standard geographical units, using DAs as PSUs simplifies the use of the LFS frame by other household surveys. It also makes it possible to update the frame as new DAs are defined, as will be the case with the 2016 Census.

Unfortunately, some DAs are too large and others are too small for the ideal size of 200 households per PSU. Simulation showed that having a lot of variability in PSU size would increase sampling variance under the LFS sample selection strategy (see Section 2.6). Some variability was inevitable in order to stay close to standard geographical units. An acceptable range of 100 to 600 households per PSU was determined. DAs below this range were joined with other contiguous DAs to form larger PSUs. DAs above this range were split into smaller contiguous and compact PSUs at the level of Census 2011 Dissemination Blocks or block faces.
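The size rule can be expressed as a simple classification. The sketch assumes that the bounds of 100 and 600 households are inclusive, which the text does not specify, and it does not attempt the geographic joining or splitting itself.

```python
MIN_HOUSEHOLDS = 100  # lower bound of the acceptable PSU size range
MAX_HOUSEHOLDS = 600  # upper bound of the acceptable PSU size range

def psu_action(da_households):
    """Decide how a Dissemination Area is turned into one or more PSUs."""
    if da_households < MIN_HOUSEHOLDS:
        return "join with contiguous DAs"
    if da_households > MAX_HOUSEHOLDS:
        return "split into smaller PSUs"
    return "use as a PSU"

# Hypothetical DA household counts.
actions = {size: psu_action(size) for size in (60, 100, 250, 600, 900)}
for size, action in actions.items():
    print(size, action)
```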

An exception to this rule was made in Toronto. Toronto contained many DAs with more than 600 households, yet many were difficult to split into smaller PSUs due to the presence of high-rise apartment buildings with more than 600 units. It is inconvenient to split a single building into more than one PSU. First, unit occupancy changes frequently, making it difficult to accurately control for the number of households in each piece when dividing up the building. Also, once an interviewer has established regular access to an apartment building, it is quite efficient to continue to collect from other units in the building. This means that this large PSU would not have the same high costs as a large PSU of detached houses. Therefore, in Toronto many PSUs were created containing between 600 and 1000 households. To avoid a negative impact on design efficiency, these PSUs were grouped together into special strata (see Section 2.5.4).

Once all PSUs were created, a detailed analysis was done to identify those that were far from an urban centre and would probably have a very high collection cost. Depending on the situation, these PSUs were either stratified separately (see Section 2.5) or excluded from the survey frame. Under the previous LFS design, less than 1% of households in Canada were excluded. For this redesign, approximately 100,000 additional households were excluded in northern areas of the ten provinces, bringing the rate of exclusions up to 1.5%. See Appendix B.1 for more details on excluded areas. Excluding persons belonging to the target population of a survey inherently introduces bias into the survey estimates; however, the cost required to cover these regions was deemed too high relative to the potential impact on estimates.

2.5 Stratification

Stratification is the process whereby the population is divided into homogeneous, mutually exclusive groups called strata, in order to improve the efficiency of the sample design. In many surveys, strata are defined based on geographic domains of interest. In the case of the LFS, strata are formed within each domain of interest: ER-EIER-CMA intersections. This extra stratification ensures that the survey can accommodate the rotation, allocation and selection constraints that are described in this chapter.

The first step is to determine how many strata are needed within a domain. Once the number of strata is determined, the strata can be defined based on geographic, socio-economic and efficiency constraints and the PSUs can be grouped into these strata. Stratification will improve the sample design’s efficiency if the PSUs grouped together are homogeneous, meaning that the households therein have similar characteristics. Once this process is complete, a survey frame can be created, containing all of the PSUs and their corresponding strata.

2.5.1 Changes made during this redesign

Two past strategies, which were previously introduced to the LFS stratification methodology in order to reduce the costs associated with in-person collection and listing, were discontinued for this design. First, in isolated urban areas, a three-stage sample design had been used so that, in the first stage, only one of a group of population centres would need to be visited at a time. Second, the rural PSUs that were the most expensive for in-person collection (due to high vacancy, distance from urban centres or lack of road access) had been stratified separately and given a reduced sampling rate to minimize the number of in-person visits needed. These innovations reduced collection and listing costs, but also decreased the efficiency of the sample design. In addition, they could lead to shifts in some local industry employment figures, especially when a particular PSU with many workers associated with a given industry was replaced by another PSU in a different area with a different dominant industry. Now, with more interviews handled by telephone instead of in-person (see Chapter 4), the expanded exclusion of high-cost remote areas (mentioned in Section 2.4), and the reliance on the DUF to provide dwelling lists, the potential cost savings offered by these two innovations appeared less favourable compared to the loss of design efficiency and the impact on estimates; therefore, they were discontinued for this redesign.

For the redesign, two-stage sampling was used in all provinces except PEI. Other changes in the stratification methodology include: PEI being stratified differently to facilitate the new one-stage design (see Section 2.5.6), a new type of special stratum being introduced to deal with the large PSUs in Toronto (see Section 2.5.4), and the specific needs of the Canadian Community Health Survey (CCHS) being taken into account (see Section 2.5.3).

2.5.2 Stratum size

The size and number of strata in each ER-EIER-CMA intersection is determined based on the sample allocation, the number of PSUs to select in each stratum and the number of households to select in each PSU (also called the sample take or density factor). The allocation to each intersection was explained earlier. The number of PSUs is based on the rotation strategy where one sixth of the sample rotates every month. To implement this approach, it is preferable to select six PSUs (or sometimes twelve) in each stratum. Finally, past studies have determined that in order to improve the sample design’s efficiency, the sample take for a PSU should be ten in rural strata, eight in urban strata, and six in strata covering the Montréal, Toronto and Vancouver CMAs. Selecting more households per PSU in the rural strata reduces the travel costs per unit for in-person collection. At the other end of the spectrum, selecting six households per PSU in the largest CMAs helps to increase the number of PSUs required in the sample, improving the precision of the estimates. This reduction of the sample take also increases the number of strata needed. More and smaller strata should lead to an increase in the homogeneity of PSUs within, which should also improve the efficiency of the sample design.

By combining these constraints (allocation requirements, six PSUs selected per stratum and a fixed number of households selected in each PSU), the size requirement of each stratum within an intersection, in terms of households, can be calculated as:

$M_{h} = ISR \times 6 \times m_{h}^{*} \qquad (2.1)$

where

$M_{h}$ is the number of households to group together in each stratum of an intersection.

$ISR$ is the inverse sampling ratio as established during the first two steps of the sample allocation.

$m_{h}^{*}$ is the number of households to select per PSU. As explained in the previous paragraph, this number varies by the population density of the region (rural, urban, three largest CMAs).

The number of strata needed in each region can be determined by dividing the number of households in a region by this result and rounding the result to the most appropriate integer. Usually, the strata within an ER-EIER-CMA intersection that this process creates are approximately the same size.
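The arithmetic behind Equation 2.1 can be sketched as follows. This is an illustrative sketch only: the function names and the example figures (an ISR of 100 and a region of 30,000 households) are hypothetical, and rounding to the nearest integer is an assumption standing in for "the most appropriate integer".

```python
def stratum_size(isr, take, psus_per_stratum=6):
    """Target number of households per stratum, following Equation 2.1."""
    return isr * psus_per_stratum * take


def number_of_strata(households_in_region, isr, take):
    """Households in the region divided by the target stratum size.

    Assumption: round to the nearest integer and create at least one stratum.
    """
    return max(1, round(households_in_region / stratum_size(isr, take)))


# Hypothetical urban region: ISR of 100 and a sample take of 8 households per PSU.
print(stratum_size(100, 8))              # 4800 households per stratum
print(number_of_strata(30000, 100, 8))   # 30000 / 4800 = 6.25, so 6 strata
```

With a rural sample take of ten, the same ISR gives a target of 6,000 households per stratum, which is why rural strata are larger than urban ones.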

2.5.3 Adjustments to geographic boundaries

Using the stratum size expression described above, it was not possible to create strata in some small ER-EIER-CMA intersections. Consequently, these small intersections were combined with a neighbouring intersection. Combining was done so that the combined group respected the boundaries of the CMA or EIER as much as possible. This approach implicitly gives more importance to the estimates by CMA and by EIER than by ER; therefore, the efficiency of the sample design decreased at the ER level, but was maintained for the EIERs and CMAs. In cases where 2011 Census boundaries for a CMA no longer matched the boundaries for the EIER representing that city^{Note 8}, the resulting small intersections were treated as if the EIER boundaries matched the CMA boundaries. After combining the small pieces, there were 120 intersections covering the ten provinces in which stratification occurred.

Outside CMAs, it was favourable to create separate strata for urban and rural areas for three reasons: rural strata have more households than urban strata (see Equation 2.1 in Section 2.5.2); persons residing in rural areas have different characteristics from those residing in urban areas; and stratification that respects these areas allows for the implementation of more appropriate collection strategies. In some cases, an urban or rural area is too small to create a stratum using the size determined in Section 2.5.2. In such cases, it is necessary to combine the area with a neighbouring urban area or a rural area. Each case was evaluated separately.

The CCHS is a regular user of the area frame and relies on the LFS stratification methodology to identify the PSUs for its sample. For the LFS redesign, it was beneficial to make some adjustments that support CCHS needs while minimizing the impact on the LFS. The CCHS samples at a much higher rate than the LFS in some rural regions, and its geographic domains do not always correspond well with LFS regions. In the past, the CCHS would either select more dwellings within the limited PSUs that were available, causing much faster PSU rotation than planned (see Section 2.6.4), or select more PSUs than required by the LFS. For the redesign, the solution was to create additional CCHS-specific strata to ensure that enough PSUs were selected in those regions to meet the CCHS requirements. This has less impact on LFS operations.

After the adjustments to the geographic boundaries, PSUs can be grouped into strata. Some PSUs were assigned to special strata (see Section 2.5.4), and the remaining PSUs in each intersection were stratified geographically and then optimally (see Section 2.5.5).

2.5.4 Special strata

Special strata can be divided into two categories: those created to improve efficiency, and those created to target specific populations. The first category is used to group remote PSUs as well as PSUs with a large number of dwellings in Toronto. The second category of special strata helps target sub-populations of interest for analysts who use LFS data.

Strata used to group inconvenient PSUs

Two special strata were created to group inconvenient PSUs: remote strata containing PSUs that are geographically isolated and difficult for in-person collection, and strata in Toronto containing PSUs with a large number of dwellings. By grouping these PSUs, the rate at which these PSUs are selected can be controlled.

A significant part of Canada is inhabited by a small portion of the population. Collection costs are high in regions with a small population, while the impact of these regions on the main LFS estimates is relatively low. Such PSUs were identified using data from Census 2011 on population density, distances to urban centres and accessibility by road. If there were enough of these PSUs in a province, they were grouped together into a remote stratum. By assigning these regions to specific strata, the number of these PSUs selected in a given sample can be better controlled, thereby better controlling the assignment of LFS resources.

As mentioned in Section 2.4, Toronto contains many PSUs with between 600 and 1000 dwellings, unlike other LFS PSUs in the provinces. If PSUs with 1000 dwellings were stratified with PSUs with 100 dwellings, there would be a considerable increase in sampling variance given the sample selection strategy used. However, the design stays efficient if PSU sizes are relatively homogeneous within a stratum. Since these PSUs could not be split any further, they were instead grouped together as special strata. That way, the variability of PSU sizes within Toronto strata is reduced leading to a more efficient design.

Table B.2 in Appendix B presents the number of households in the first-category special strata.

Strata to target certain sub-populations

Three types of sub-populations were targeted by special strata: households with high income, Aboriginal people, and recent immigrants. For simplification, the terms high-income strata, Aboriginal strata, and immigrant strata will be used from now on, although this is technically incorrect since these strata do not only contain high-income households, Aboriginal people, or immigrants. High-income strata were created in most large CMAs. Aboriginal strata were created in British Columbia, Alberta and Saskatchewan. Immigrant strata were created in Manitoba only.

Because the LFS samples clusters of dwellings, instead of persons directly, it is difficult to target these rare sub-populations, especially when they do not all live in the same neighbourhood. Even in a neighbourhood with a higher prevalence of the sub-population, many households still will not have any members of the sub-population, so a sample of these dwellings may not yield many more members than a sample from a different region. Since there is no better tool available within the constraints of the LFS design to target sub-populations, special strata can help by at least ensuring that a PSU with higher prevalence is selected.

For special strata to effectively cover a target population, they need to have a higher prevalence of the target population and represent a large proportion of the overall target population. However, even if they produce good estimates for their target population, special strata cannot be justified if their introduction leads to a significant decline in the quality of the main LFS estimates. In order to find a viable compromise, two guidelines from a study done for the last redesign were used. The first guideline states that the strata must be created based on the prevalence of specific characteristics. For example, it would be futile to create an immigrant stratum in northern Manitoba, where the proportion of immigrants is very low. The second guideline states that no more than 8% of a domain can be used to create each type of special stratum. This limitation guarantees that the creation of these strata will not have a major adverse effect on the main LFS estimates, based on a study conducted using 1996 and 2001 Census data.

Using these two guidelines, the special strata were created in sequence. For each category, they were created by identifying the PSUs with the highest prevalence of the sub-population of interest. PSUs were then continually added to strata in decreasing order of prevalence until the 8% limit for the domain was reached. Using this approach, these strata are not contiguous and may be quite spread out geographically.
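The selection rule described above (sort by prevalence, then add PSUs until the 8% limit is reached) can be sketched as follows. The function name, the tuple layout and the example figures are hypothetical, and stopping at the first PSU that would push the total past the limit is an assumption; the source does not say how the boundary case is handled.

```python
def select_special_psus(psus, domain_households, cap=0.08):
    """Pick PSUs for one type of special stratum.

    psus: list of (psu_id, households, prevalence) tuples.
    Returns the chosen PSU ids and the number of households they contain.
    """
    limit = cap * domain_households
    chosen, total = [], 0
    # Work through the PSUs in decreasing order of prevalence.
    for psu_id, households, prevalence in sorted(psus, key=lambda p: p[2], reverse=True):
        if total + households > limit:
            break  # assumption: stop before the cap would be exceeded
        chosen.append(psu_id)
        total += households
    return chosen, total


# Hypothetical domain of 10,000 households, so the cap is 800 households.
example = [("a", 300, 0.5), ("b", 400, 0.1), ("c", 300, 0.4), ("d", 500, 0.3)]
print(select_special_psus(example, 10000))  # (['a', 'c'], 600)
```

Because PSUs are chosen on prevalence alone, nothing in this rule keeps the chosen PSUs adjacent, which is why the resulting strata may be spread out geographically.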

High-income strata were created first. PSUs in a given CMA were first sorted in descending order of the proportion of households with an income over $150,000, based on the 2012 T1 Family File (T1FF) generated from 2012 tax returns received by the Canada Revenue Agency^{Note 9}. PSUs at the top of this list were assigned to a high-income stratum until the stratum’s pre-determined size had been reached (see Section 2.5.2). If the limit of 8% was not attained, another high-income stratum was created for the same CMA. In this way, high-income strata were created for most CMAs.

To create the Aboriginal strata, the basic strategy had to be slightly modified. The high-income strata respect the CMA boundaries, but a significant number of Aboriginal people live outside these boundaries, so special strata were created separately in CMAs and outside CMAs. Furthermore, some ER-EIER intersections outside CMAs were too small to form an Aboriginal stratum, although several PSUs in these intersections had a high proportion of Aboriginal households. To remedy this problem, the Aboriginal strata created outside CMAs respect the boundaries of just the EIERs, rather than those of the ER-EIER intersections. Finally, PSUs already assigned to a remote stratum could not also be assigned to an Aboriginal stratum. Among the remaining PSUs in the combined intersections, PSUs were put into the strata in descending order based on the proportion of households with at least one person who reported having an Aboriginal identity on the 2011 National Household Survey until the 8% limit was reached. As with high-income strata, multiple strata were created where necessary to reach the 8% limit.

To create the immigrant strata in Manitoba, strata also had to be created both inside and outside CMAs. Since most of the recent immigrant population of Manitoba resides in Winnipeg and the prevalence of immigrants was low elsewhere, only two immigrant strata were created outside Winnipeg. For these two strata, PSUs outside Winnipeg were put into descending order based on the proportion of households with at least one person who had immigrated to Canada in the last ten years according to the 2011 National Household Survey.

Using only 8% of Winnipeg and strata outside Winnipeg would have provided inadequate representation of the target population, since recent immigrants are disproportionately located in Winnipeg. However, using more than 8% of the CMA to create strata by sorting PSUs in descending order of prevalence would have had a major adverse effect on main LFS estimates for Winnipeg. The end decision was that the top 25% of PSUs^{Note 10} in terms of prevalence were isolated and stratified into twelve strata using the same optimization algorithm used to stratify PSUs outside special strata (see Section 2.5.5). That way, more strata could be created to cover the target population without as much impact on LFS estimates. Simulations were conducted to evaluate alternate strategies and this was the most favourable option.

Tables B.3 in Appendix B give the number of households in the special strata, the prevalence of the target population and the proportion of the sub-population covered by the special strata.

2.5.5 Stratification of the remaining PSUs

After forming special strata, which only cover a small portion of the Canadian territory, the remaining PSUs in the nine provinces other than PEI^{Note 11} are stratified within the geographic regions discussed in Section 2.5.3. To determine the number of strata that needed to be created in a region, the number of households not in special strata was divided by the targeted stratum size from Equation (2.1) in Section 2.5.2. Since this quotient is generally not an integer, the result was rounded up^{Note 12}. If more than one stratum was needed for a region, the PSUs were stratified first geographically and then optimally (both described below).

Geographic stratification

In the case of CMAs, each was divided into several pieces that would serve as the basis for stratification. The regions considered were the largest Census Subdivision (CSD), the second-largest CSD, the third-largest CSD, the remaining urban PSUs, and the rural PSUs. These regions were created only if the CMA required several strata and if the region in question met the targeted stratum size. Otherwise, they were combined with other regions.

Within a region, if only one stratum was needed, stratification was complete. If between two and nine strata were needed, the PSUs were stratified optimally (described below) within the piece. If ten or more strata were needed, the piece was first divided into super-strata (compact areas with a similar number of households) to ensure better geographic distribution of the selected PSUs in a sample. PSUs were then optimally stratified within the super-strata.

Outside CMAs, the regions were defined using the largest Census Agglomeration (CA) in an EIER, the remaining urban PSUs and the remaining rural PSUs. Within each piece, PSUs were assigned to the required number of strata using optimal stratification.

Optimal stratification

After geographic stratification, PSUs in regions that needed two or more strata were stratified optimally. The purpose of optimal stratification is to reduce the sampling variance of several variables of interest by grouping together PSUs with similar characteristics, creating strata that are as homogeneous as possible while conforming to the stratum size constraints determined in Section 2.5.2. This was achieved using the iterative process described below.

The algorithm used for optimal stratification is based on an iterative method developed by Friedman and Rubin (1967) and modified by Drew, Bélanger and Foy (1985). Starting with a random initial stratification with equal-sized strata, the algorithm exchanges a PSU between two strata and checks whether this new stratification decreases a weighted sum of squares of auxiliary data. If the sum of squares decreased, the new stratification replaces the previous one; otherwise, the previous stratification is retained. PSU exchanges continue iteratively until no exchange leads to a decrease. The process is then repeated using different initial stratifications. The stratification associated with the smallest variance is retained^{Note 13}.

The weighted sum of squares is calculated over several auxiliary characteristics. The list of these characteristics of interest (29 in total) is identical to the list from the last redesign and is available at the end of Appendix B. Household income was given three times the weight compared to the rest of the characteristics in the weighted sum of squares because income is correlated with several LFS variables. All other variables were given equal weight in the process.
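The exchange idea can be sketched in a few lines. This is a simplified illustration and not the Friedman-Rubin or Drew-Bélanger-Foy implementation: it assumes that an "exchange" swaps one PSU from each of two strata, that strata hold equal numbers of PSUs (the LFS constrains stratum size in households), and that the criterion is the weighted within-stratum sum of squares. All names are hypothetical.

```python
import random


def weighted_ss(strata, data, weights):
    """Weighted within-stratum sum of squares over all auxiliary variables."""
    total = 0.0
    for members in strata:
        for v, w in enumerate(weights):
            values = [data[p][v] for p in members]
            mean = sum(values) / len(values)
            total += w * sum((x - mean) ** 2 for x in values)
    return total


def exchange_stratify(data, n_strata, weights, n_starts=5, seed=0):
    """Repeat a swap-based local search from several random initial stratifications."""
    rng = random.Random(seed)
    ids = list(range(len(data)))
    best, best_ss = None, float("inf")
    for _ in range(n_starts):
        rng.shuffle(ids)
        strata = [ids[k::n_strata] for k in range(n_strata)]  # random, near-equal sizes
        current = weighted_ss(strata, data, weights)
        improved = True
        while improved:  # stop when no swap lowers the criterion
            improved = False
            for a in range(n_strata):
                for b in range(a + 1, n_strata):
                    for i in range(len(strata[a])):
                        for j in range(len(strata[b])):
                            strata[a][i], strata[b][j] = strata[b][j], strata[a][i]
                            trial = weighted_ss(strata, data, weights)
                            if trial < current - 1e-12:
                                current, improved = trial, True  # keep the swap
                            else:  # undo the swap
                                strata[a][i], strata[b][j] = strata[b][j], strata[a][i]
        if current < best_ss:
            best, best_ss = [sorted(s) for s in strata], current
    return best, best_ss


# Hypothetical single auxiliary variable with two obvious clusters of PSUs.
toy = [[1.0], [1.1], [0.9], [10.0], [10.2], [9.8]]
print(exchange_stratify(toy, n_strata=2, weights=[1.0]))
```

To mimic the weighting described above, household income would be given a weight of 3.0 and every other characteristic a weight of 1.0 in `weights`.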

After both geographic and optimal stratification are complete, the LFS geographic variable is defined by assigning unique identifiers to each created stratum and each PSU assigned to that stratum. The result is a completed LFS area frame in all provinces except for PEI. This frame is used for the first stage of sample selection (Section 2.6).

2.5.6 Creating strata in Prince Edward Island

For the redesign, it was decided to use one-stage sampling in PEI. As explained in Sections 2.1 and 2.2, the primary gain from two-stage sampling is to reduce the geographic spread of selected samples, thus reducing travel costs in survey collection, though with a loss in efficiency of the design. However, since PEI is a small province and many cases are now handled by telephone instead of in-person, travel costs in PEI are minimal. Historically, the other reason for two-stage sampling was the lack of a complete list of dwellings for an entire province, which limited the survey design options. However, with recent improvements to the Dwelling Universe File (Section 2.2.1), a reasonably up-to-date list of all dwellings in PEI is now available. While the CVs of monthly unemployment estimates for PEI were often far above the 7% target, it was not prudent to increase the sample size as PEI already has the highest sampling rate in the country. Therefore, it was decided to simply select dwellings in PEI from a list at random (one-stage systematic random sampling) to improve the efficiency of the design. Since CCHS needs to sample dwellings in PEI at a slightly higher rate than LFS, the ISR (Section 2.3.2) was adjusted so that each sample would contain enough dwellings for either LFS or CCHS.

Even though there are no PSUs to stratify, it was still advantageous to stratify the province for several reasons. Without strata, selecting dwellings at random could result in samples where only one dwelling is selected in one part of the island. In terms of the collection strategy, it would be hard to create a full-time workload for an interviewer covering that region and potentially very costly if that interviewer was also covering other parts of the island. In terms of sample design, that part of the island would be poorly represented in the sample and sampling variance would likely increase if that part of the province had different characteristics. Creating strata ensures that each LFS sample better represents the whole province. To stay consistent with standard geography units, geographically contiguous strata were created using Census DAs.

As explained in Section 2.2, one-sixth of the sample is replaced every month, so the PEI sample needed to be split into six rotation groups. This is more challenging without clusters in a stratum.^{Note 14} To address this, PEI geographic strata were pooled together in groups of six to form super-strata. The process controlled for the number of households in the super-strata and minimized the distance between the strata within. A modified version of the Friedman-Rubin algorithm discussed above was used. These super-strata also respected the boundaries of Charlottetown and Summerside, the two major cities in the province. Then, each stratum in a super-stratum was randomly assigned to one of six rotation groups in a way that balanced the overall number of occupied dwellings in each rotation group. Thus, one-sixth of the PEI sample can be replaced each month.

2.6 Sample selection strategy

Once allocation and stratification are completed, all the pieces are in place to select the sample. This section provides a conceptual description of the selection and rotation method used by the LFS in the provinces other than PEI. For PEI, systematic random sampling of dwellings within strata is used. Additional information on processing the growth and maintenance of the survey frame is given in Chapter 3. For a more detailed description of the sampling probabilities and the sampling weights, refer to Chapter 6.

2.6.1 Sample allocation of PSUs to strata

When a two-stage design is used, survey theory stipulates that it is preferable to select the PSUs with a probability proportional to their size when this size measurement is also correlated to the estimates of interest. This is the case for the LFS. For example, the number of persons who work in a PSU is strongly correlated to the number of persons who live in the PSU. Therefore, the PSUs for the LFS are ideally selected with a probability proportional to their size. The size measure used for LFS is based on the number of households in the PSU as estimated using the DUF (explained in Section 2.2.1)^{Note 15}.

The first step is to determine the number of PSUs to select in each stratum. By design, as described with determining the size of the strata, this should be six. However, due to rounding, the creation of special strata and other factors, the number of strata defined and the required sample size may not correspond to six PSUs being selected. Also, to simplify the sample rotation process – where one-sixth of the sample rotates out every month – it is preferable to select a multiple of six PSUs in each stratum. The rotation method will be discussed in detail later in the chapter.

To determine the number of PSUs to select in a stratum, the number of households to survey in the stratum is needed. Up to this point the allocation is only known at the ER-EIER-CMA intersection level. Strata are given the same sampling rate as the intersection in which they are located. Thus, within an ER-EIER-CMA intersection, the constant sampling rate implies that the sample is allocated to strata proportionally to the number of households. The household stratum allocation is given by the number of households in the stratum divided by the inverse sampling ratio (ISR) for the ER-EIER-CMA intersection.

The number of PSUs to select in the stratum is determined by the household stratum allocation divided by the target number of households to survey per selected PSU. As discussed in Section 2.5.2, this target number is six households per PSU in the Montréal, Toronto and Vancouver CMA strata, eight in the urban strata outside these three CMAs, and ten in the rural strata. If the result of the second division is closer to six PSUs than to twelve, six PSUs will be selected in the stratum. Otherwise, twelve PSUs will be selected. This can result in eighteen PSUs in rare situations.
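The two divisions described above can be sketched as follows. The function names and example figures are hypothetical, and rounding to the nearest multiple of six (with ties going to the larger value and a minimum of six) is an assumption that extends the six-or-twelve rule to the rare eighteen-PSU case.

```python
import math


def household_allocation(stratum_households, isr):
    """Households to survey in the stratum: stratum size divided by the ISR."""
    return stratum_households / isr


def psus_to_select(stratum_households, isr, take):
    """Number of PSUs to select, as a multiple of six.

    Assumption: the nearest multiple of six is used, ties go to the larger
    multiple, and at least six PSUs are always selected.
    """
    ratio = household_allocation(stratum_households, isr) / take
    multiples = math.floor(ratio / 6 + 0.5)
    return 6 * max(1, multiples)


# Hypothetical urban strata with an ISR of 100 and a sample take of 8.
print(psus_to_select(4800, 100, 8))    # 48 households / 8 = 6 PSUs
print(psus_to_select(7200, 100, 8))    # 72 / 8 = 9, not closer to six, so 12 PSUs
print(psus_to_select(14400, 100, 8))   # 144 / 8 = 18 PSUs
```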

The approach described below is based on selecting six PSUs. The same approach applies when twelve or eighteen PSUs are selected.

2.6.2 Overview of the RHC method

The LFS selects the PSU sample using the Rao-Hartley-Cochran (RHC) method. The RHC method is used because it allows the selection probabilities to be updated when strong growth is observed in some PSUs. The method described in Keyfitz (1951) can be combined with the RHC method to update the probabilities while maximizing the overlap of the selected PSUs before and after the update. For more information on the RHC method, see Rao, Hartley and Cochran (1962).

When using the RHC method to select multiple PSUs in a stratum, all the PSUs must first be distributed into groups, each containing roughly the same number of PSUs – plus or minus one. In the case of the LFS, the groups used are the six rotation groups. After the PSUs have been distributed to the rotation groups, one PSU is selected per group with probability proportional to size within the group. This can be summarized by the following equation:

$\pi_{hij} = \frac{M_{hij}}{\sum_{j \in hi} M_{hij}} \qquad (2.2)$

where

$M_{hij}$ is the number of households in PSU j in rotation group i of stratum h.

$\sum_{j \in hi} M_{hij}$ is the total number of households in all the PSUs in rotation group i of stratum h.

$\pi_{hij}$ is the selection probability of PSU j in rotation group i of stratum h.

Rather than using the number of households, the LFS uses the rounded inverse sampling ratio of the PSU, $ISR_{hij}^{*}$ (described below), as the size measure for the PSU. These values are used mainly because of the way the sample is selected in the second stage. It will be shown later that, in terms of the sampling probability for the PSU, this is not an extreme departure from using the number of households.
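The basic RHC step (random grouping, then one draw per group with probability proportional to size) can be sketched as follows. This is a minimal illustration with hypothetical names; it uses household counts as the size measure, as in Equation 2.2, and does not cover the Keyfitz update or the rounded ratios discussed next.

```python
import random


def rhc_select(psu_sizes, n_groups=6, seed=None):
    """Randomly split PSUs into groups, then draw one PSU per group with PPS.

    psu_sizes: dict mapping PSU id to its size measure (here, households).
    Returns the groups and the PSU selected in each group.
    """
    rng = random.Random(seed)
    ids = list(psu_sizes)
    if len(ids) < n_groups:
        raise ValueError("need at least one PSU per group")
    rng.shuffle(ids)
    # Dealing the shuffled ids round-robin keeps group counts within one of each other.
    groups = [ids[g::n_groups] for g in range(n_groups)]
    selected = []
    for group in groups:
        weights = [psu_sizes[p] for p in group]
        # Selection probability within the group is size / total group size.
        selected.append(rng.choices(group, weights=weights, k=1)[0])
    return groups, selected


# Hypothetical stratum with 20 PSUs of varying size.
sizes = {f"psu{k}": 100 + 10 * k for k in range(20)}
groups, selected = rhc_select(sizes, seed=1)
print(selected)
```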

Second-stage selection probabilities

Dwellings are selected from within the selected PSUs with probabilities that ensure that all households in the stratum have the same overall probability of selection. This is often referred to as a self-weighted design.

At the second stage, dwellings are selected from the PSU listing line generated by the Address Register and/or field listing of the PSU^{Note 16} using systematic sampling where households are selected at regular intervals. This method is recommended because it is simple to use, ensures a good distribution of the households selected in the PSU, controls the overlap of samples and facilitates adding new dwellings to the PSU. To select the systematic sample of dwellings, the PSU ISR, $ISR_{hij}$, and a starting point on the list must be determined. $ISR_{hij}$ can be obtained from the number of households in the PSU and the ISR of the stratum, $ISR_{h}$, using the following equation:

$ISR_{hij} = \left( \frac{M_{hij}}{\sum_{j \in hi} M_{hij}} \right) ISR_{h} \qquad (2.3)$

where

$ISR_{hij}$ is the inverse sampling ratio in PSU j in rotation group i of stratum h.

$ISR_{h}$ is the inverse sampling ratio of stratum h established during the allocation of the sample.

Since $I S R_{h}$ is constant for all the PSUs of a group, $I S R_{h i j}$ is proportional to the number of households in the PSU. For the redesign, a switch to simple random sampling was considered. However, a study showed that adjacent dwellings have correlated responses. This implies that for monthly estimates, systematic sampling should lead to a better representation of the PSU since the sample is guaranteed to be spread out over the entire region. Also, systematic sampling can give reduced variance of estimates of month-to-month change as neighbouring households are rotated in to replace those who are rotated out.

The LFS selection system cannot use these ISRs directly and instead is configured to use integer inverse sampling ratios $ISR^{*}$. The result of Equation 2.3 is therefore rounded up or down so that

$\sum_{j \in hi} ISR_{hij}^{*} = ISR_{hi}^{*} = ISR_{h}^{*}, \quad \forall i \in h \qquad (2.4)$

This is called controlled rounding. The second-stage selection probability of a household when PSU j in rotation group i of stratum h is selected is $1 / ISR_{hij}^{*}$.

The rounded value $I S R_{h i j}^{*}$ has some useful interpretations. First, it is the sampling interval to use in systematic sampling if the corresponding PSU is selected in the first stage. By applying this sampling interval, the appropriate number of households will be selected in the PSU^{Note 17}. Second, $I S R_{h i j}^{*}$ is the number of distinct samples available in the PSU. In LFS terminology, this concept is called the number of random starts.
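One way to satisfy Equation 2.4 is sketched below. The source does not say which controlled rounding procedure the LFS uses, so the largest-remainder rule here is an assumption chosen only because it guarantees that the rounded values sum to the stratum ratio. The function name and the figures are hypothetical.

```python
import math


def controlled_round(households, isr_stratum):
    """Round the PSU ratios of Equation 2.3 so that they sum to isr_stratum.

    households: household counts for the PSUs of one rotation group.
    isr_stratum: integer inverse sampling ratio of the stratum.
    Assumption: largest-remainder rounding (round down, then give the
    leftover units to the PSUs with the largest fractional parts).
    """
    total = sum(households)
    exact = [m / total * isr_stratum for m in households]  # Equation 2.3
    rounded = [math.floor(x) for x in exact]
    shortfall = isr_stratum - sum(rounded)
    by_fraction = sorted(range(len(exact)), key=lambda j: exact[j] - rounded[j], reverse=True)
    for j in by_fraction[:shortfall]:
        rounded[j] += 1
    return rounded


# Hypothetical rotation group of three PSUs in a stratum with a ratio of 80.
print(controlled_round([260, 410, 330], 80))  # [21, 33, 26]
```

In this example the unrounded ratios are 20.8, 32.8 and 26.4; each rounded value is both the sampling interval and the number of random starts for its PSU.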

First-stage selection probabilities

As stated earlier, a self-weighted design is achievable when the first- and second-stage probabilities are in agreement. The second-stage probabilities were defined earlier as $1 / ISR_{hij}^{*}$. In order to preserve the self-weighting aspect, the values $ISR_{hij}^{*}$ must be used as size values in the probability proportional to size sample. The first-stage selection probability associated with each PSU is therefore:

$\pi_{hij}^{*} = \frac{ISR_{hij}^{*}}{\sum_{j \in hi} ISR_{hij}^{*}} = \frac{ISR_{hij}^{*}}{ISR_{h}^{*}}. \qquad (2.5)$

This is not an extreme change from using the number of households as a size measure. As stated earlier, $ISR_{hij}$ is proportional to the number of households in the PSU. In this case,

$\pi_{hij}^{*} = \frac{ISR_{hij}^{*}}{\sum_{j \in hi} ISR_{hij}^{*}} \approx \frac{M_{hij}}{\sum_{j \in hi} M_{hij}} = \pi_{hij}. \qquad (2.6)$

The only difference between these two probabilities is due to the controlled rounding of $ISR_{hij}^{*}$. As a result, the overall selection probability of household k in PSU j in rotation group i of stratum h is:

$\pi_{hijk}^{*} = \frac{ISR_{hij}^{*}}{\sum_{j \in hi} ISR_{hij}^{*}} \times \frac{1}{ISR_{hij}^{*}} = \frac{1}{ISR_{h}^{*}}. \qquad (2.7)$

As required, Equation 2.7 shows that the selection probability is the same for all households in the same stratum. The LFS sample design is therefore self-weighted within the stratum.
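As a worked check with hypothetical values (not taken from the LFS frame), suppose a rotation group contains three PSUs with rounded ratios of 21, 33 and 26, so that $ISR_{h}^{*} = 80$. The overall selection probability of a household in each PSU is then:

$\frac{21}{80} \times \frac{1}{21} = \frac{1}{80}, \qquad \frac{33}{80} \times \frac{1}{33} = \frac{1}{80}, \qquad \frac{26}{80} \times \frac{1}{26} = \frac{1}{80}$

The PSU-level ratio cancels in every case, so a larger PSU is more likely to be selected but is then sampled at a proportionally lower rate.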

2.6.3 PSU and start selection

In practice, to select a PSU in a rotation group, the PSUs of the rotation group are put in random order. A random whole number U is then drawn from a uniform distribution on the interval $[1, ISR_{h}^{*}]$. This random number U has two functions. First, it is used to identify the first PSU selected. This PSU, j*, is the first for which the cumulative total of the $ISR_{hij}^{*}$ is greater than or equal to U (that is, $\sum_{j \leq j^{*}} ISR_{hij}^{*} \geq U$, where the index j follows the random order).

U also determines the number of random starts to use in this first PSU before moving on to the next PSU. The number of starts to use in the first PSU is $D_{j^{*}} = (\sum_{j \leq j^{*}} ISR_{hij}^{*}) - U + 1$. Lastly, a second random whole number $U_{j^{*}} \in [1, ISR_{hij^{*}}^{*}]$^{Note 18} is selected. This number indicates the first random start to use to select the sample of dwellings for the PSU j*. The systematic sample for a selected PSU hij is composed of the dwelling whose line number on the dwelling frame is equal to the starting point $U_{j^{*}}$, and of other dwellings whose additional lines are in intervals of $ISR_{hij}^{*}$. Therefore, dwellings are selected with line numbers such that $d = U_{j^{*}} + t \times ISR_{hij}^{*}$, t = 0, 1, 2, … until d exceeds the number of lines available.
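The selection steps above can be sketched as follows. The function names and figures are hypothetical; the PSUs are assumed to be already in random order, indices are zero-based in the code, and the number of lines on each dwelling list is supplied by the caller.

```python
import random


def select_psu(isr_rounded, u):
    """Return (index of the first selected PSU j*, number of starts to use D)."""
    cumulative = 0
    for j_star, isr in enumerate(isr_rounded):
        cumulative += isr
        if cumulative >= u:  # first PSU whose cumulative total reaches U
            return j_star, cumulative - u + 1
    raise ValueError("u must lie in [1, sum(isr_rounded)]")


def systematic_lines(start, interval, n_lines):
    """Line numbers start, start + interval, ... that do not exceed n_lines."""
    return list(range(start, n_lines + 1, interval))


def draw_first_sample(isr_rounded, n_lines, seed=None):
    """Draw U, find j* and D, draw the first random start, list the dwellings."""
    rng = random.Random(seed)
    u = rng.randint(1, sum(isr_rounded))
    j_star, starts_to_use = select_psu(isr_rounded, u)
    first_start = rng.randint(1, isr_rounded[j_star])
    lines = systematic_lines(first_start, isr_rounded[j_star], n_lines[j_star])
    return j_star, starts_to_use, first_start, lines


# Hypothetical rotation group: rounded ratios 21, 33, 26 (sum 80).
print(select_psu([21, 33, 26], 30))         # (1, 25): second PSU, 25 starts to use
print(len(systematic_lines(5, 33, 410)))    # 13 dwellings: lines 5, 38, ..., 401
```

Because U is uniform on 1 to 80 in this example, the second PSU is picked whenever U falls between 22 and 54, that is, with probability 33/80, in line with Equation 2.5.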

These dwellings will remain in the sample for a period of six months.

Gray (1973) and Alexander, Ernst and Haas (1982) use two different approaches to illustrate that this method produces a sample that respects the selection probabilities specified. Laflamme (2003) demonstrates the sample selection process using a diagram.

2.6.4 Sample rotation

Section 2.6.3 describes how the first sample of dwellings was selected in each group created using the RHC method. After a period of six months, it is necessary to replace this sample with new dwellings. By continuing with the example given at the end of the previous section, the first sample corresponded to the random start $U_{j^{*}}$ of the PSU j*.

If the number of random starts to use from PSU j* is $D_{j^{*}} = 1$, the second sample of dwellings will correspond to the start $U_{j^{*} + 1}$ of the next PSU, j*+1, where $U_{j^{*} + 1} \in [1, ISR_{hi(j^{*} + 1)}^{*}]$. Otherwise, if $D_{j^{*}} > 1$, the second sample will correspond to the start $U_{j^{*}} + 1$ of PSU j* (i.e., the neighbours of the previous sample). If $U_{j^{*}} + 1 > ISR_{hij^{*}}^{*}$, the selection loops back to use start 1 of PSU j*. Generally speaking, with this method, PSU j remains in the sample for $D_{j}$ periods of six months. When it is necessary to replace the surveyed dwellings, the next random start is used. After $D_{j}$ periods, the sample moves to the value $U_{j + 1}$ of PSU j+1. This PSU will remain in the sample until all its random starts have been used. The same goes for the PSUs that are added to the sample at a later date.
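The rotation rule can be sketched as a sequence of (PSU, random start) pairs, one per six-month period. This is an illustration only: the names are hypothetical, indices are zero-based, the correction for small values of D is not included, and wrapping back to the first PSU after the last one in the group is an assumption the source does not state.

```python
import random


def rotation_schedule(isr_rounded, j_star, starts_to_use, first_start, n_periods, seed=0):
    """List the (PSU index, random start) used in each six-month period."""
    rng = random.Random(seed)
    schedule = []
    psu, start, remaining = j_star, first_start, starts_to_use
    for _ in range(n_periods):
        schedule.append((psu, start))
        remaining -= 1
        if remaining == 0:
            # Move to the next PSU with a fresh random start; it then stays
            # in the sample until all of its random starts have been used.
            psu = (psu + 1) % len(isr_rounded)  # assumption: wrap around
            start = rng.randint(1, isr_rounded[psu])
            remaining = isr_rounded[psu]
        else:
            # Next start in the same PSU, looping back to 1 after the last one.
            start = start % isr_rounded[psu] + 1
    return schedule


# Hypothetical group with 3, 4 and 2 random starts; first PSU used for D = 2 periods.
print(rotation_schedule([3, 4, 2], j_star=0, starts_to_use=2, first_start=3, n_periods=4))
```

In this example the first PSU uses start 3 and then loops back to start 1, after which the second PSU enters the sample at a randomly drawn start.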

This method produces the expected results: the selection probabilities are always respected over time. Unfortunately, it has a major drawback. As discussed, the first PSU remains in the sample for a random number of periods, and sometimes this number is small. Such rapid rotation of the first PSU selected would be an inefficient use of the survey’s limited resources. Adding a PSU to the sample requires a great deal of work, including preparing the material, possibly listing the PSU and sometimes hiring and training an interviewer. For efficiency, it is preferable to amortize this investment by avoiding, as much as possible, a too-rapid rotation of the first PSU.

To overcome this problem, the LFS developed a correction that increases the number of random starts to use from the first PSU without introducing a bias into the selection probabilities. When $D_{j^{*}}$ is too small, based on a pre-determined criterion, it is increased in order to keep this PSU in the sample longer. In this case, the number of starts to use for PSU $j^{*}+1$ must be reduced proportionally so that the selection probabilities remain unbiased. Some constraints are required to ensure that increasing the number of starts for the first PSU does not reduce the number of starts to survey from the second PSU by too much. Gray (1973) shows that this approach does not bias the selection probabilities, while Laflamme (2003) explains these constraints.

This method is applied separately to each rotation group. However, the samples are not all rotated at the same time. The RHC groups in rotation group 1 are rotated in January and July of every year, the RHC groups in rotation group 2 are rotated in February and August, and so on. With this method, a list of all the starts that will be in the LFS sample for each month over the next ten years can be produced at the start of the redesign.

Figure 2.1 LFS Sample Rotation

Description for Figure 2.1

This diagram illustrates the LFS sample rotation design. The colors indicate which rotation group the dwellings belong to (orange is rotation group 1, pink is rotation group 2, blue is rotation group 3, green is rotation group 4, grey is rotation group 5 and yellow is rotation group 6). The numbers in the boxes indicate the number of months that the dwellings associated with a given rotation group have been part of the survey. As shown by the diagram, one-sixth of the sample is renewed every month. For example, in April, dwellings from the blue rotation group are in their second month of the survey, while dwellings from the grey rotation group are in their sixth and last month of participation. Dwellings from the grey rotation group will be replaced by new dwellings in May, as seen in the diagram. Dwellings that rotate out are generally replaced by dwellings from the same primary sampling unit.
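The rotation pattern in Figure 2.1 can be expressed as a small calculation. The sketch below assumes that "and so on" means rotation group $g$ is rotated in calendar months $g$ and $g + 6$, and derives the month in sample from that rule. The function names are hypothetical.

```python
def rotation_months(rotation_group):
    """Calendar months (1 = January) in which a rotation group is renewed."""
    return (rotation_group, rotation_group + 6)


def month_in_sample(rotation_group, calendar_month):
    """Months (1 to 6) that a group's current dwellings have been surveyed."""
    return (calendar_month - rotation_group) % 6 + 1


APRIL, MAY = 4, 5
print(month_in_sample(3, APRIL))  # blue group (3): second month
print(month_in_sample(5, APRIL))  # grey group (5): sixth and last month
print(month_in_sample(5, MAY))    # grey group (5): new dwellings, first month
```

In any given month, the six rotation groups are in six different months of participation, which is another way of stating that one-sixth of the sample is renewed every month.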

Notes

Footnote 1.

They can also be called rotation panels. This is commonly referred to as a rotating panel survey design.


Footnote 2.

The LFS samples dwellings, not households (occupied dwellings). However, the allocation is expressed in terms of a targeted number of households. This number is used to determine the sampling rate of households in a region. When that sampling rate is applied to a list of dwellings, the number of dwellings sampled should, on average, yield roughly the target number of households. How closely the target is met depends on the precision of the occupancy rate and the coverage of the DUF.


Footnote 3.

There is a slight modification to the quality target when unemployment is low, explained further below.


Footnote 4.

It was expected that Lethbridge would become a CMA after the 2016 Census.


Footnote 5.

The Northern Manitoba EIER is an exception. Despite having some CVs above 15%, this region has historically had an unemployment rate far above 13%, so better quality was not needed to determine EI benefits. Thus, to avoid high collection costs in Northern Manitoba, the sample size was kept at the same level as in the previous design.


Footnote 6.

PEI, which has a one-stage design, does not have clusters.


Footnote 7.

Taking this argument to the extreme, the ideal solution would be to create PSUs containing one dwelling each. This is equivalent to one-stage sampling. However, as discussed in Section 2.2, one-stage sampling is currently too expensive to implement outside of PEI.


Footnote 8.

EIER geography definitions date back to 2000. Due to urban sprawl, many CMAs that were once EIERs have now grown beyond the EIER boundaries.


Footnote 9.

Since a T1FF record is not available for each household in a PSU, the proportion of high-income households had to be estimated from the proportion among households with T1FF records in the PSU. In the majority of PSUs, T1FF records were available for 90% of households or more, so this estimate should be reliable. PSUs where only a few T1FF records were available were excluded.


Footnote 10.

PSUs already assigned to high income strata were excluded. No Aboriginal strata were created in Winnipeg.


Footnote 11.

Stratification in PEI is explained in Section 2.5.6.


Footnote 12.

In some cases this number was rounded down, usually if there were too few PSUs to create the extra stratum.


Footnote 13.

This optimization method is known as random restart hill climbing.


Footnote 14.

See Section 2.6.1 for how rotation groups are formed in the other provinces.


Footnote 15.

In practice, the size measurement used is the inverse sampling ratio, which is derived from the number of households. More information on this calculation is provided in Section 2.7.1.


Footnote 16.

Chapter 3 describes how the Address Register and field listing are used to create the sample frame.


Footnote 17.

This target corresponds to the number of households in the group divided by the stratum inverse sampling ratio.


Footnote 18.

This second random number has two functions. First, it takes into account the fact that the sample size associated with the last random start is sometimes smaller than that of the earlier starts, which helps stabilize the overall sample size over time. Second, it lays the groundwork for applying the rule of the minimum number of starts to use.


Date modified: 2017-12-21


