Methodology of the Canadian Labour Force Survey
Chapter 7 Variance estimation

7.0 Introduction

In a survey based on a probability sample such as the Labour Force Survey (LFS), statistical inferences need to account for the sampling error. The variance measures the precision of an estimator. Because of the complexity of the estimation method and sample design, an explicit form of the variance estimator is not readily available for the LFS. The survey therefore uses a resampling method for variance estimation.

With the 2015 redesign, a major change to the LFS variance estimation methodology was introduced. Previously, variance estimation was based on a resampling method called the jackknife. A variance estimation system custom-built for the LFS used the jackknife method to produce variance estimates of totals, rates or proportions, changes, and moving averages. As of January 2015, variance estimation is based on a resampling method called the bootstrap. Each month, 1,000 sets of LFS bootstrap weights are generated, and these bootstrap weights can be used with various standard software packages to produce variance estimates. The variance estimates obtained using the new methodology are similar in value to those obtained using the old methodology. The main advantage of the new methodology is that once bootstrap weights are generated, they can be used to produce variance estimates for a much wider variety of analyses than the old system.

This chapter will describe how variance is estimated for the LFS. Section 7.1 presents the particular bootstrap method that is implemented, the Rao-Wu bootstrap. Sections 7.2 and 7.3 describe how the LFS bootstrap samples and bootstrap weights are generated. Section 7.4 discusses how the bootstrap weights are used to compute variance estimates.

7.1 The Rao-Wu bootstrap

The LFS uses the Rao-Wu bootstrap, as proposed in Rao and Wu (1988) and Rao, Wu and Yue (1992). The method was proposed for stratified multistage designs where the primary sampling units (PSUs) are selected using probability proportional to size with replacement (PPSWR) sampling. For the LFS, the PSUs are actually selected using PPS without replacement (PPSWOR). Särndal, Swensson and Wretman (1992, p. 154) state that the variance estimator for multistage sampling with PSUs selected without replacement can be approximated by the variance estimator for multistage sampling with PSUs selected with replacement, and that the approximation is conservative if the selection of PSUs without replacement is more efficient than the selection of PSUs with replacement. This is the case for the LFS.

The first step in applying the Rao-Wu bootstrap is to select bootstrap samples. For each stratum $h$, $m_{h}$ PSUs are drawn using simple random sampling with replacement (SRSWR) from the original set of $n_{h}$ sampled PSUs. For most applications of the Rao-Wu bootstrap at Statistics Canada, including the LFS, $m_{h}$ is set to $n_{h} - 1$. This process of selecting bootstrap samples is repeated $B$ times. The number of times the $j$th PSU is selected in the bootstrap sample of the $b$th replicate, called the multiplicity of the PSU, is denoted as $m_{h j}^{(b)}$, where $b = 1, ..., B$. The multiplicities $m_{h j}^{(b)}$ have values between 0 and $n_{h} - 1$ inclusive, and satisfy $\sum_{j = 1}^{n_{h}} m_{h j}^{(b)} = n_{h} - 1$ for each bootstrap replicate and each stratum.

The next step is to produce B sets of bootstrap weights by applying an adjustment factor to the original survey weight. They are calculated as follows:

$w_{h j k}^{(b)} = \frac{n_{h}}{n_{h} - 1} m_{h j}^{(b)} w_{h j k}, \qquad (7.1)$

where $w_{h j k}$ is the survey weight for unit $k$ in PSU $j$ and stratum $h$, and $w_{h j k}^{(b)}$ is the bootstrap weight for the $b$th replicate.
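As a minimal sketch of the two steps above, the following Python code draws the Rao-Wu multiplicities for one stratum and applies Equation (7.1). The function names, the seed and the example weights are illustrative assumptions and are not taken from the LFS production system.

```python
import random


def rao_wu_multiplicities(n_h, rng):
    """Draw n_h - 1 PSUs by SRSWR from n_h PSUs and count how often each is drawn."""
    if n_h < 2:
        raise ValueError("a stratum needs at least two sampled PSUs")
    mult = [0] * n_h
    for _ in range(n_h - 1):
        mult[rng.randrange(n_h)] += 1
    return mult


def bootstrap_weights(weights_by_psu, mult):
    """Apply Equation (7.1): bootstrap weight = n_h / (n_h - 1) * m_hj * w_hjk."""
    n_h = len(weights_by_psu)
    factor = n_h / (n_h - 1)
    return [[factor * m * w for w in units] for m, units in zip(mult, weights_by_psu)]


rng = random.Random(2015)  # arbitrary seed, for reproducibility only
stratum = [[100.0, 120.0], [90.0], [110.0, 95.0, 105.0]]  # hypothetical weights, 3 PSUs
mult = rao_wu_multiplicities(len(stratum), rng)
replicate = bootstrap_weights(stratum, mult)
print(mult, replicate)
```

A PSU with multiplicity zero receives a bootstrap weight of zero for all of its units in that replicate, while the factor $n_{h} / (n_{h} - 1)$ rescales the selected PSUs to compensate.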

The B sets of bootstrap weights can be used to produce variance estimates for a variety of analyses. For an estimate, $\hat{θ}$, of a population parameter, $θ$, the bootstrap variance estimate is computed as follows. The estimate is calculated using each set of bootstrap weights, resulting in B estimates denoted as ${\hat{θ}}^{* (1)}, ..., {\hat{θ}}^{* (B)}$. For example, suppose $\hat{θ}$ is an estimate of a total, given by $\hat{θ} = \sum_{h} \sum_{j} \sum_{k} w_{h j k} y_{h j k}$, where $y_{h j k}$ is the value of a variable of interest $y$ for unit $k$ in PSU $j$ and stratum $h$. Then the estimate for the $b$th bootstrap replicate is ${\hat{θ}}^{* (b)} = \sum_{h} \sum_{j} \sum_{k} w_{h j k}^{(b)} y_{h j k}$. The bootstrap variance estimate is given by the variance of the B estimates:

${\hat{V}}_{BOOT} (\hat{θ}) = \frac{1}{B} \sum_{b = 1}^{B} {({\hat{θ}}^{* (b)} - {\hat{θ}}^{* (.)})}^{2}, \qquad (7.2)$

where ${\hat{θ}}^{* (.)} = \frac{1}{B} \sum_{b = 1}^{B} {\hat{θ}}^{* (b)}$ .
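The calculation in Equation (7.2) can be sketched in a few lines of Python. The data below are made up, and only three replicates are used to keep the arithmetic easy to follow; the function names are illustrative assumptions.

```python
def weighted_total(weights, y):
    """Estimate of a total: sum of weight times value over all sampled units."""
    return sum(w * v for w, v in zip(weights, y))


def bootstrap_variance(replicate_estimates):
    """Equation (7.2): mean squared deviation of the B replicate estimates."""
    B = len(replicate_estimates)
    mean = sum(replicate_estimates) / B
    return sum((t - mean) ** 2 for t in replicate_estimates) / B


y = [1, 0, 1, 1]  # hypothetical indicator variable for four sampled units
replicate_weights = [  # hypothetical bootstrap weights, one row per replicate
    [150.0, 0.0, 120.0, 90.0],
    [0.0, 300.0, 120.0, 90.0],
    [150.0, 150.0, 0.0, 180.0],
]
estimates = [weighted_total(w, y) for w in replicate_weights]
variance = bootstrap_variance(estimates)
print(estimates, variance)  # [360.0, 210.0, 330.0] 4200.0
```

Note that the divisor is B, as in Equation (7.2), rather than B - 1.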

7.2 LFS bootstrap samples

To obtain stable variance estimates for various types of analyses, as many bootstrap replicates as possible should be made available. A compromise has to be reached between ensuring stability, and limiting the execution time and the size of files. The LFS has opted to generate 1,000 LFS bootstrap replicates each month. This ensures the stability of the variance estimates for the key survey estimates.

As described in Section 7.1, the first step in applying the Rao-Wu bootstrap consists of drawing 1,000 bootstrap samples at the PSU level, with $n_{h} - 1$ PSUs selected with replacement per stratum. A two-stage sample design is used for all provinces except Prince Edward Island (PEI), so for those provinces the bootstrap samples are selected at the cluster level. Since a one-stage sample design is used for PEI, its bootstrap samples are selected at the dwelling level.

The remainder of this section provides details on various considerations related to the generation of the LFS bootstrap samples. This is followed by Section 7.3, which describes the generation of the 1,000 sets of LFS bootstrap weights.

7.2.1 Strata with one selected PSU

To estimate the variance, each stratum should contain at least two sampled PSUs. This is usually the case for the LFS and it is always the case for the one-stage strata in PEI. Most two-stage strata in the provinces contain six sampled PSUs, one for each rotation. However, for various reasons, some strata may only have one sampled PSU on the final tabulation file. This can happen by design (a few three-stage strata in the previous design, transition between redesigns), or due to survey results (out-of-scope and non-responding dwellings). The single-PSU strata are handled in one of three different ways.

First, the three-stage strata in the provinces with only one selected PSU are handled by splitting the selected PSU. The PSU is split by the rotation group or by the second sampling stage unit (SSU). For these strata, the bootstrap samples are selected at the rotation group level or at the SSU level instead of the PSU level.

Second, the single-PSU strata that occur during the redesign transition period are handled by collapsing strata; this is discussed in more detail in Section 7.2.3.

Finally, the remaining single-PSU strata are handled by temporarily splitting the PSU into two parts based on whether the household identifier is even or odd. The strategy was chosen because it is easy to implement and requires no manual intervention. This situation happens rarely enough that the strategy used has no impact on the variance estimates at the provincial level.

7.2.2 Bootstrap sample coordination

The LFS produces estimates involving multiple survey months, such as estimates of change between periods and moving averages. The sample overlap and dependence that exist between months can be taken into account in the variance estimation through the coordinated bootstrap method proposed by Roberts, Kovacevic, Mantel and Phillips (2001). Their method takes the dependence into account by retaining the same bootstrap samples of PSUs from one month to the next.

In practice, the sampled PSUs in a stratum are not always the same from one month to the next, and the coordinated bootstrap needs to be adapted. A strategy is proposed in Neusy (2013) and Benhin and Mantel (2012) to adapt the coordinated bootstrap in the presence of change. There are potentially four different situations:

i. When the sampled PSUs in the stratum are the same in the current month as in the previous month, the previous month’s bootstrap sample can be used for the current month without any further work.
ii. When the PSUs are not all the same but the number of sampled PSUs in the stratum remains the same for the two months, the coordinated bootstrap can be implemented by pairing each PSU in the current month with a PSU in the previous month. PSUs that are common to both months’ samples are paired; new PSUs replacing retired PSUs are paired with the PSU that they are replacing; and all remaining PSUs are randomly paired. The current-month bootstrap samples for the stratum are generated by transferring the multiplicities of the previous month to the current month: each current-month PSU receives the multiplicities of the previous-month PSU with which it is paired. This results in a Rao-Wu bootstrap sample with the same multiplicities in the current month as in the previous month for the PSUs that are common to both months.
iii. When there are fewer sampled PSUs in the stratum for the current month than for the previous month, the coordinated bootstrap is adapted as follows. Each PSU in the current month is first paired with a PSU in the previous month as described in ii, leaving one or more previous-month PSUs unpaired. Each current-month PSU receives the multiplicities of the previous-month PSU with which it is paired, resulting in preliminary bootstrap samples for the current month. The sum of the multiplicities for the preliminary bootstrap samples is not necessarily $n_{h} - 1$ for all the bootstrap replicates. This is because the multiplicities of the unpaired previous-month PSUs are not carried forward to the current month and because $n_{h}$ is smaller than it was in the previous month. For the bootstrap replicates where the sum is less than $n_{h} - 1$, PSUs are randomly added to the bootstrap sample using SRSWR (i.e., the PSU multiplicities are increased) until the sum of the multiplicities is $n_{h} - 1$. Conversely, for the bootstrap replicates where the sum is greater than $n_{h} - 1$, PSUs are randomly dropped from the bootstrap sample (i.e., the PSU multiplicities are decreased) until the sum of the multiplicities is $n_{h} - 1$.
iv. When there are more sampled PSUs in the stratum for the current month than for the previous month, an extra step is required to adapt the coordinated bootstrap. The current-month PSUs are paired, as many as possible, with the previous-month PSUs as described in ii. The current month has more sampled PSUs than the previous month, so not all current-month PSUs can be paired. The paired current-month PSUs receive their multiplicities from the previous-month PSUs with which they are paired. The multiplicities of the unpaired current-month PSUs (new PSUs) are generated using the $\mathrm{Binomial}(n_{h}^{*} - 1, 1 / n_{h}^{*})$ distribution, where $n_{h}^{*}$ is the number of sampled PSUs in the previous month. This ensures that the expected multiplicities of the unpaired PSUs are the same as those of the paired PSUs. The multiplicities for the paired and unpaired current-month PSUs together form preliminary bootstrap samples for the current month. The sum of the multiplicities for the preliminary bootstrap samples is not necessarily $n_{h} - 1$ for all the bootstrap replicates. PSUs are randomly added to or dropped from the bootstrap samples, as described in iii, until the sum of the multiplicities is $n_{h} - 1$ for all bootstrap samples.
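The four situations above can be sketched for a single stratum and a single bootstrap replicate as follows. This is an illustration under stated assumptions, not the LFS implementation: the pairing of PSUs is supplied as an input rather than derived, the function and variable names are hypothetical, and because the text does not specify how PSUs are "randomly dropped", the sketch assumes that one existing selection is removed uniformly at random.

```python
import random


def adjust_to_target(mult, rng):
    """Add or drop selections at random until the multiplicities sum to n_h - 1."""
    mult = list(mult)
    target = len(mult) - 1
    while sum(mult) < target:
        mult[rng.randrange(len(mult))] += 1  # add one PSU by SRSWR
    while sum(mult) > target:
        # Assumption: drop one existing selection uniformly at random, so a
        # PSU is chosen with probability proportional to its multiplicity.
        j = rng.choices(range(len(mult)), weights=mult)[0]
        mult[j] -= 1
    return mult


def coordinate(prev_mult, pairing, rng):
    """Carry one replicate's multiplicities forward from the previous month.

    prev_mult: previous-month PSU id -> multiplicity
    pairing:   current-month PSU id -> paired previous-month PSU id, or None
               for a new PSU that could not be paired (situation iv)
    """
    n_prev = len(prev_mult)
    prelim = []
    for prev_id in pairing.values():
        if prev_id is not None:
            prelim.append(prev_mult[prev_id])
        else:
            # Binomial(n_prev - 1, 1 / n_prev) draw for an unpaired new PSU
            prelim.append(sum(rng.random() < 1 / n_prev for _ in range(n_prev - 1)))
    return dict(zip(pairing, adjust_to_target(prelim, rng)))


rng = random.Random(7)  # arbitrary seed
prev = {"A": 2, "B": 0, "C": 1, "D": 1, "E": 1, "F": 0}  # sums to 6 - 1
same = coordinate(prev, {k: k for k in prev}, rng)  # situations i and ii
fewer = coordinate(prev, {k: k for k in "ABCDE"}, rng)  # situation iii
more = coordinate(prev, {**{k: k for k in prev}, "G": None}, rng)  # situation iv
print(same, fewer, more)
```

When the set of PSUs is unchanged, the preliminary multiplicities already sum to $n_{h} - 1$ and are returned as they are, which is what retains the dependence between months.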

The strategies for handling increases or decreases in the number of sampled PSUs described in iii and iv maintain correct cross-sectional variance estimates, and provide some coordination for variance estimates involving multiple months.

For the LFS, the coordination for two- and three-stage strata in the provinces is implemented as follows. The LFS bootstrap samples are based on the PSUs present in the current month’s final tabulation file, and the number of PSUs within most strata remains the same from one month to the next. This means that the coordination strategies described in i and ii are the most commonly used. However, there are sometimes differences in the number of PSUs, usually caused by a PSU with temporarily no respondents in the final tabulation file. If the number of PSUs decreases by one, then the adaptation to the coordinated bootstrap described in iii is used. If the number of PSUs decreases by more than one or if it increases, then new bootstrap samples are randomly selected using a fixed random seed that is assigned to each LFS stratum and kept until the next redesign. These fixed random seeds are used so that the same bootstrap samples are selected for a given stratum and number of PSUs.
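The role of the fixed random seed can be illustrated with a short sketch: when the generator is re-seeded with the stratum's seed, the same number of PSUs always yields the same bootstrap samples. The seed value, the function name and the use of Python's `random` module are assumptions made for the example only.

```python
import random


def fresh_bootstrap_samples(stratum_seed, n_psus, n_replicates=1000):
    """Select new Rao-Wu bootstrap samples reproducibly from a per-stratum seed."""
    rng = random.Random(stratum_seed)  # re-seeded on every call
    samples = []
    for _ in range(n_replicates):
        mult = [0] * n_psus
        for _ in range(n_psus - 1):  # n_h - 1 draws by SRSWR
            mult[rng.randrange(n_psus)] += 1
        samples.append(mult)
    return samples


seed = 48213  # hypothetical seed assigned to one stratum
first = fresh_bootstrap_samples(seed, n_psus=4, n_replicates=5)
again = fresh_bootstrap_samples(seed, n_psus=4, n_replicates=5)
print(first == again)  # True
```

If the stratum later returns to the same number of PSUs, regenerating the samples from the seed reproduces the earlier multiplicities, which preserves some coordination across months.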

Starting with the 2015 redesign, PEI uses a one-stage sample design, and PSUs are at the dwelling level. As described in Section 2.5.6, the PEI strata were formed based on the Census Dissemination Areas (DAs), and then assigned to one of six rotation groups. All dwellings in the same one-stage stratum belong to the same rotation group. Every six months, when a new sample of dwellings is rotated into a stratum, new bootstrap samples are also selected for that stratum. For the other five months, the bootstrap samples are coordinated using the strategies described in i, ii, iii and iv, depending on the situation.

7.2.3 Redesign transition period

The LFS sample was redesigned in January 2015 and the new sample was gradually phased in from January to June 2015. Each month during the transition period, a rotation group from the old design was rotated out and replaced by a rotation group from the new design. In the end, the estimates are based on an integration of the two designs. The new LFS sample was selected independently from the old sample, so the bootstrap samples for the new design were also selected independently and separately from the bootstrap samples for the old design, i.e., without coordination.

As the old sample was gradually phased out during the transition period, the number of sampled PSUs in the old design strata decreased each month. The bootstrap samples for the old design were coordinated during this period using the coordination strategy described in iii of Section 7.2.2. By the fifth month of the transition period, only one rotation group from the old design remained in the LFS sample, and therefore many strata were left with only one PSU. The single-PSU strata were randomly paired within the province and collapsed to form two-PSU strata. Preliminary bootstrap samples for the collapsed strata were generated using the previous-month multiplicities of each PSU. Each collapsed stratum had two PSUs, so PSUs were randomly added or dropped from the bootstrap samples until the sum of the multiplicities was one for all the bootstrap samples of each collapsed stratum.

The bootstrap samples for the new sample were created by first generating bootstrap samples for the June 2015 sample, when the new sample was completely phased in. The bootstrap samples were generated using a new set of random seeds that will be kept until the next redesign. Next, bootstrap samples for the new design were generated moving backwards from May to January 2015. While moving backwards, the number of sampled PSUs decreases each month. The same methodology used to coordinate the bootstrap samples for the old design moving forward through time was used to coordinate the bootstrap samples for the new design moving backward through time. The January 2015 sample contained only one rotation group from the new sample, and therefore many new design strata contained only one PSU. The single-PSU strata were collapsed and handled as described previously for the old design strata in the fifth month of the transition.

7.3 LFS bootstrap weights

In order to properly estimate the sampling variability of an estimator, each of the weighting steps leading to the computation of the final weights should be repeated for each bootstrap replicate. Currently, only the final weighting step, composite calibration (see Section 6.3.1), is repeated for each bootstrap replicate. This was also the case for the previous variance estimation system based on the jackknife.

The following steps are performed to generate 1,000 sets of final LFS bootstrap weights for the provinces:

1. Initial bootstrap weights are generated for each household by applying Equation (7.1), using the multiplicities from the 1,000 LFS bootstrap samples and the household subweights from the current month LFS final tabulation file.
2. A separate set of composite control totals is required for each bootstrap replicate. The 1,000 sets of totals are calculated using the previous month’s final bootstrap weights. To do this, first, each set of replicate weights from the previous month’s bootstrap weight file is calibrated to the current month’s demographic control totals. Next, for each set of weights, provincial-level estimates for the 28 labour characteristics listed in Appendix G are calculated.
3. Composite auxiliary variables that correspond to the composite control totals are derived for the current month households, as described in Section 6.3.1. The characteristics are derived using the previous month’s final tabulation file for households that are common to both the current and previous month. The auxiliary variables of households missing from the previous month’s final tabulation file are imputed using donor imputation in the case of nonrespondents, and using mean imputation in the case of households from the birth rotation. The donor imputation for the nonrespondents is only performed once, whereas the mean imputation is performed separately for each of the 1,000 bootstrap replicates.
4. The initial bootstrap weights generated in step 1 are calibrated to the current month’s demographic control totals and to the composite control totals computed in step 2, using the composite auxiliary variables derived in step 3. The calibration is repeated for each bootstrap replicate.

Note that if negative weights are obtained in step 4, then the calibration is applied a second time to the calibrated weights, with the negative weights replaced by their value in the initial bootstrap weights file. If after this second round of calibration there are still negative weights, these negative weights are set to one and it is accepted that the control totals will not be satisfied.
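The handling of negative weights described in the note above can be sketched as a small wrapper around a calibration routine. The composite calibration itself is not reproduced here: `calibrate` is a hypothetical callable standing in for it, and the toy function in the example exists only to exercise the fallback logic.

```python
def calibrate_with_fallback(initial_weights, calibrate):
    """Calibrate one replicate, applying the two-round rule for negative weights.

    calibrate: hypothetical function mapping a list of weights to calibrated weights.
    """
    weights = calibrate(initial_weights)
    if min(weights) >= 0:
        return weights
    # Second round: negative weights are replaced by their initial bootstrap
    # weight, and the calibration is applied again to the resulting weights.
    restart = [w0 if w < 0 else w for w0, w in zip(initial_weights, weights)]
    weights = calibrate(restart)
    # Any weight still negative is set to one; the control totals are then
    # no longer met exactly.
    return [1.0 if w < 0 else w for w in weights]


def toy_calibrate(weights):
    """Stand-in for calibration (NOT a real calibration): shifts every weight down."""
    return [w - 5.0 for w in weights]


print(calibrate_with_fallback([10.0, 8.0], toy_calibrate))       # [5.0, 3.0]
print(calibrate_with_fallback([10.0, 3.0, 6.0], toy_calibrate))  # [0.0, 1.0, 1.0]
```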

Monthly LFS bootstrap weights have been generated beginning from 1998 for the ten provinces. They are now generated every month, as part of monthly production.

7.4 Variance estimation

The LFS bootstrap weights are used to compute variance estimates using Equation (7.2). The variance estimates can be produced using software packages, such as SAS (PROC SURVEYMEANS), Stata 9 or newer, SUDAAN and WesVar. Gagné, Roberts, and Keown (2014) and Phillips (2004) provide guidance on how to use bootstrap weights with these software packages.

In order to reduce the size of the LFS bootstrap weight files, the files contain one record per household. Like the survey weights, the bootstrap weights are the same for all household members, so a person-level bootstrap weight file can be generated by assigning the household-level bootstrap weights to each member of the household.
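A person-level file can be derived from the household-level file with a simple lookup, as in the sketch below. The field names (`household_id`, `person_id`, `bootstrap_weights`) and the values are hypothetical and do not reflect the actual LFS file layout.

```python
def to_person_level(persons, household_weights):
    """Attach each household's bootstrap weights to every member of the household."""
    return [
        {**person, "bootstrap_weights": household_weights[person["household_id"]]}
        for person in persons
    ]


household_weights = {  # one record per household, here with 3 replicates each
    "H1": [150.0, 0.0, 300.0],
    "H2": [0.0, 225.0, 112.5],
}
persons = [
    {"person_id": 1, "household_id": "H1"},
    {"person_id": 2, "household_id": "H1"},
    {"person_id": 3, "household_id": "H2"},
]
person_file = to_person_level(persons, household_weights)
print(person_file[1]["bootstrap_weights"])  # [150.0, 0.0, 300.0]
```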

As described in Section 7.1, the bootstrap variance estimate for an estimate, $\hat{θ}$, is obtained by first computing the estimate with each set of bootstrap weights to obtain ${\hat{θ}}^{* (1)}, ..., {\hat{θ}}^{* (1{,}000)}$, and then applying Equation (7.2).

For estimates involving multiple survey months, each of ${\hat{θ}}^{* (1)}, ..., {\hat{θ}}^{* (1{,}000)}$ should be computed using multiple survey months as well. For example, consider an estimate of change of the form ${\hat{θ}}_{C} = {\hat{θ}}_{2} - {\hat{θ}}_{1}$, where ${\hat{θ}}_{1}$ is an estimate of $θ_{1}$, the population parameter for the first month, and ${\hat{θ}}_{2}$ is an estimate of $θ_{2}$, the population parameter for the second month. The bootstrap variance estimate of ${\hat{θ}}_{C}$ is obtained by first computing ${\hat{θ}}_{C}^{* (b)} = {\hat{θ}}_{2}^{* (b)} - {\hat{θ}}_{1}^{* (b)}$ for $b = 1, ..., 1{,}000$, where ${\hat{θ}}_{1}^{* (b)}$ is an estimate of $θ_{1}$ based on the $b$th set of first-month bootstrap weights, and ${\hat{θ}}_{2}^{* (b)}$ is an estimate of $θ_{2}$ based on the $b$th set of second-month bootstrap weights. Equation (7.2) is then applied to ${\hat{θ}}_{C}^{* (1)}, ..., {\hat{θ}}_{C}^{* (1{,}000)}$. Because the bootstrap weights are based on coordinated bootstrap samples, this approach to handling estimates involving multiple periods takes into account the overlap and dependence that exist between months. In practice, software packages are not usually designed to deal with multiple datasets from different periods. A solution to this problem is to create an input file containing the data and bootstrap weights from all the months of interest in the same file. It may be necessary to create dummy variables to identify the different months.
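The replicate-by-replicate differencing described above can be sketched as follows. The replicate estimates are made up and only four replicates are shown; the function names are illustrative assumptions.

```python
def bootstrap_variance(replicate_estimates):
    """Equation (7.2): mean squared deviation of the replicate estimates."""
    B = len(replicate_estimates)
    mean = sum(replicate_estimates) / B
    return sum((t - mean) ** 2 for t in replicate_estimates) / B


def change_variance(month1_reps, month2_reps):
    """Variance of month 2 minus month 1, differencing replicate b with replicate b."""
    if len(month1_reps) != len(month2_reps):
        raise ValueError("both months need the same number of replicates")
    diffs = [t2 - t1 for t1, t2 in zip(month1_reps, month2_reps)]
    return bootstrap_variance(diffs)


month1 = [100.0, 104.0, 96.0, 100.0]  # hypothetical replicate estimates, month 1
month2 = [103.0, 106.0, 100.0, 103.0]  # hypothetical replicate estimates, month 2
print(change_variance(month1, month2))  # 0.5
print(bootstrap_variance(month1) + bootstrap_variance(month2))  # 12.5
```

In this made-up example the replicate estimates move together across the two months, so the variance of the change (0.5) is far smaller than the sum of the two monthly variances (12.5) that would result from treating the months as independent.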

Date modified: 2017-12-21
