5 Testing new advance letters for the Dutch Labor Force Survey

Jan A. van den Brakel

In this section an experiment with different advance letters embedded in the Dutch Labor Force Survey (LFS) is described, which serves as a numerical example to illustrate the methodology developed in this paper.

5.1 Survey design

The LFS is based on a rotating panel survey. Each month a stratified two-stage cluster sample of about 6,000 addresses is drawn from a register of all known addresses in the Netherlands. Strata are formed by geographical regions, municipalities are considered as primary sampling units, and addresses as secondary sampling units. All households residing at an address, with a maximum of three, are included in the sample. In the first wave, data are collected by means of computer assisted personal interviewing. The respondents are re-interviewed four times at quarterly intervals by means of computer assisted telephone interviewing.

The weighting procedure of the LFS is based on the GREG estimator of Särndal et al. (1992). The inclusion probabilities reflect the sample design used to select households as well as the different response rates between geographical regions. The weighting scheme is based on a combination of different socio-demographic categorical variables. One of the most important parameters of the LFS is the unemployed labor force, which is defined as the ratio of total unemployment to the total labor force.

5.2 Experimental design

Advance letters are one of the design parameters of a survey that affect response rates and cooperation of respondents (De Leeuw, Callegaro, Hox, Korendijk and Lensvelt-Mulders (2007)). The standard advance letter of the LFS is addressed to the occupants of the accommodation and the tone is formal and high-handed. As a result, this letter does not conform to social psychological theories regarding survey participation proposed by Groves et al. (1992) and Groves and Couper (1998). In an attempt to improve the LFS response rates, Luiten et al. (2008) proposed different advance letters for the LFS that better meet these principles about survey participation. The effects of these alternative letters are investigated empirically by means of a large-scale field experiment embedded in the LFS.

The first factor considered in this experiment, say $A$ , concerns the salutation of the respondent on two levels, i.e. the standard approach where the letter is addressed to the occupants of the accommodation ( $A_{1}$ ) versus a named letter ( $A_{2}$ ). It is anticipated that named letters are more likely to be read and therefore increase response rates and survey participation. The second factor, say $B$ , concerns the content of the letter on three levels, i.e. the standard formal letter ( $B_{1}$ ) versus two alternative letters ( $B_{2}$ and $B_{3}$ ). In the first alternative, the content of the standard letter is adapted by explaining why the survey is conducted, what the respondent gains by participating and why it is important for Statistics Netherlands that the respondent participates in the survey. The second alternative attempts to improve the formal tone of the standard letter. The three versions of the advance letters can be found in van den Brakel (2010).

A new letter is only considered for implementation as a standard in the LFS if its positive effect on response behavior has been demonstrated and if its effect on the main parameter estimates has been quantified in a randomized experiment. Both factors are tested in a $2 \times 3$ factorial design resulting in six treatment combinations. This experiment is embedded in the first wave of the LFS for a period of five months (December 2007 through April 2008). During this period the monthly gross sample is randomized over six subsamples according to a randomized block design (RBD) with interviewers as the block variable. About 220 interviewers were available for the field work. In the analysis, adjacent interviewer regions were collapsed into 13 blocks. A fraction of 0.8 of the sample is assigned to the regular advance letter, i.e. treatment combination $A_{1} \times B_{1}$ . A fraction of 0.04 of the sample is assigned to each of the other five alternative treatment combinations.
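
The randomization described above can be sketched as follows. This is only an illustration under stated assumptions: the block sizes, the unit identifiers, the seed and the function name `randomize_within_blocks` are hypothetical, and the actual LFS field procedure may differ in its details. Only the allocation fractions (0.8 and five times 0.04) are taken from the text.

```python
import random
from collections import Counter

# Allocation fractions from the text: 0.8 to the control letter A1xB1 and
# 0.04 to each of the five alternative treatment combinations.
FRACTIONS = {
    "A1xB1": 0.80, "A1xB2": 0.04, "A1xB3": 0.04,
    "A2xB1": 0.04, "A2xB2": 0.04, "A2xB3": 0.04,
}

def randomize_within_blocks(units_by_block, fractions, seed=0):
    """Randomly assign the units of each block to treatments in fixed fractions."""
    rng = random.Random(seed)
    treatments = list(fractions)
    assignment = {}
    for block, units in units_by_block.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        start, cumulative = 0, 0.0
        for i, treatment in enumerate(treatments):
            cumulative += fractions[treatment]
            # The last treatment takes whatever remains, so every unit is assigned.
            if i == len(treatments) - 1:
                end = len(shuffled)
            else:
                end = round(cumulative * len(shuffled))
            for unit in shuffled[start:end]:
                assignment[unit] = treatment
            start = end
    return assignment

# Hypothetical example: 13 blocks of 100 addresses each.
blocks = {b: [(b, i) for i in range(100)] for b in range(1, 14)}
assignment = randomize_within_blocks(blocks, FRACTIONS, seed=2008)
counts_block_1 = Counter(assignment[unit] for unit in blocks[1])
print(counts_block_1)
```

Because the shuffle is carried out separately within each block, every block contributes to all six treatment combinations in the same proportions, which is what allows block effects to be separated from treatment effects in the analysis.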

The allocation of the sampling units over the treatments is predominantly based on practical arguments. Embedding experiments in ongoing sample surveys serves two competing purposes. To estimate official figures as precisely as possible it is beneficial to allocate as many sampling units as possible to the control group, since this subsample is also used for regular publication purposes. To estimate the contrasts in the experiment as precisely as possible it is, on the other hand, beneficial to divide the total sample equally over the different treatment combinations. In this application it was decided that a loss of at most 20% of the sample size for regular publication purposes could be tolerated. This led to the aforementioned allocation over the treatment combinations. Under a response rate of 56% and a monthly sample size of 6,000 households it is expected that, over the five months of the experiment, about 13,440 households are observed in the control group $A_{1} \times B_{1}$ and 670 households in each of the alternative treatment combinations.
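
The expected numbers of responding households follow directly from the figures in this paragraph. The short calculation below makes the arithmetic explicit; the variable names are illustrative only.

```python
MONTHLY_SAMPLE = 6_000   # households drawn per month
MONTHS = 5               # December 2007 through April 2008
RESPONSE_RATE = 0.56     # response rate assumed in the text

gross_total = MONTHLY_SAMPLE * MONTHS

# 80% of the gross sample goes to the control letter, 4% to each alternative.
expected_control = round(0.80 * gross_total * RESPONSE_RATE)
expected_alternative = round(0.04 * gross_total * RESPONSE_RATE)

print(expected_control)      # 13440
print(expected_alternative)  # 672, i.e. about 670 as quoted in the text
```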

Although the allocation is based on practical considerations, it is important to have a notion of the power of the planned experiment. The target variable analyzed in this paper is the unemployed labor force, expressed as a percentage. Ignoring the block design of this experiment, the variance of the estimate for treatment combination $kl$ equals ${\hat{d}}_{kl} = {\hat{S}}_{kl}^{2} / n_{kl}$ , where ${\hat{S}}_{kl}^{2}$ is implicitly defined by (2.26). It is assumed that ${\hat{S}}_{kl}^{2}$ is equal to a common value, say ${\hat{S}}^{2}$ , for each treatment combination. From available sample data it follows for the unemployed labor force that ${\hat{S}}^{2} = 285$ . The minimal observable difference for a contrast that would reject the null hypothesis under a pre-specified significance and power level then equals

$Δ = \sqrt{\mathrm{var}(Δ)} \, (Z_{(1 - α / 2)} + Z_{(1 - β)}),$ (5.1)

where $Z_{(γ)}$ denotes the $γ$-th percentile point of the standard normal distribution, $α$ the significance level of the test and $(1 - β)$ the power. The main effect of factor $A$ concerns one contrast ${\hat{Δ}}_{A} = ({\hat{\bar{Y}}}_{1.; greg} - {\hat{\bar{Y}}}_{2.; greg})$ . From (2.29) it follows that the variance of this contrast equals $\widehat{\mathrm{var}} ({\hat{Δ}}_{A}) = ({\hat{S}}^{2} / 9) \sum_{l = 1}^{3} (1 / n_{1l} + 1 / n_{2l})$ . The main effect of factor $B$ concerns two contrasts ${\hat{Δ}}_{B_{l}} = ({\hat{\bar{Y}}}_{.1; greg} - {\hat{\bar{Y}}}_{.l; greg})$ , $l = 2, 3$ , with variances $\widehat{\mathrm{var}} ({\hat{Δ}}_{B_{l}}) = ({\hat{S}}^{2} / 4) \sum_{k = 1}^{2} (1 / n_{k1} + 1 / n_{kl})$ , $l = 2, 3$ . The interactions between factors $A$ and $B$ concern the two contrasts ${\hat{Δ}}_{AB_{l}} = ({\hat{\bar{Y}}}_{11; greg} - {\hat{\bar{Y}}}_{1l; greg} - {\hat{\bar{Y}}}_{21; greg} + {\hat{\bar{Y}}}_{2l; greg})$ with variances $\widehat{\mathrm{var}} ({\hat{Δ}}_{AB_{l}}) = {\hat{S}}^{2} (1 / n_{11} + 1 / n_{1l} + 1 / n_{21} + 1 / n_{2l})$ , $l = 2, 3$ .

Inserting the variances of the different contrasts in (5.1) gives minimum values of differences that would reject the null hypothesis for main effects and interactions for pre-specified sample sizes, significance levels and power levels. In Table 5.1 these differences for the unemployed labor force are calculated for the aforementioned applied allocation, and for a balanced design where the sample size for each treatment combination is equal to 2,800. Values are given for unspecified alternative hypotheses at a 5% significance level and a power of 50%, 80% and 90%. In experimental design theory, 80% is a widely accepted power level for sample size determination. In survey sampling, minimum sample size requirements are generally based on significance level requirements only, which corresponds to a power level of 50%. Differences are specified for separate tests of the contrasts. The main effect of factor $B$ and the interaction effects both contain two contrasts. To preserve an overall significance level of 5%, differences for both tests are also calculated using Bonferroni's simultaneous comparison procedure.
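
The calculation behind Table 5.1 can be sketched in a few lines. The sketch rests on assumptions that are not spelled out in the text: the Bonferroni procedure is taken to divide $α$ by the number of contrasts, the variance of a simple contrast between the control group and one alternative is taken to be ${\hat{S}}^{2} (1 / n_{11} + 1 / n_{kl})$, and all function names are illustrative. Under these assumptions the results agree with the tabulated values to within about 0.01.

```python
from math import sqrt
from statistics import NormalDist

S2 = 285.0  # common variance S^2 assumed in the text

def z(gamma):
    """gamma-th percentile point of the standard normal distribution."""
    return NormalDist().inv_cdf(gamma)

def min_difference(variance, alpha=0.05, power=0.80, n_contrasts=1):
    """Formula (5.1); alpha is divided by n_contrasts for a Bonferroni test."""
    return sqrt(variance) * (z(1 - alpha / (2 * n_contrasts)) + z(power))

def var_main_a(n):
    return (S2 / 9) * sum(1 / n[1, l] + 1 / n[2, l] for l in (1, 2, 3))

def var_main_b(n, l):
    return (S2 / 4) * sum(1 / n[k, 1] + 1 / n[k, l] for k in (1, 2))

def var_interaction(n, l):
    return S2 * (1 / n[1, 1] + 1 / n[1, l] + 1 / n[2, 1] + 1 / n[2, l])

def var_simple(n, k, l):
    # Assumed form for the contrast A1xB1 - AkxBl (not stated in the text).
    return S2 * (1 / n[1, 1] + 1 / n[k, l])

cells = [(k, l) for k in (1, 2) for l in (1, 2, 3)]
applied = {cell: 670 for cell in cells}
applied[1, 1] = 13_440
balanced = {cell: 2_800 for cell in cells}

for name, n in (("applied", applied), ("balanced", balanced)):
    for power in (0.50, 0.80, 0.90):
        print(name, power,
              round(min_difference(var_main_a(n), power=power), 2),
              round(min_difference(var_main_b(n, 2), power=power), 2),
              round(min_difference(var_interaction(n, 2), power=power), 2))
```

Passing `n_contrasts=2` (main effect $B$, interactions) or `n_contrasts=5` (control versus each alternative) gives the Bonferroni-adjusted differences. At a power of 50% the term $Z_{(1 - β)}$ is zero, so the difference reduces to the usual significance-only criterion mentioned in the text.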

Table 5.1
Observable difference for the unemployed labor force in percentages at 5% significance levels and different power levels
Contrast	Number of contrasts	Separate t-test, power 50%	Separate t-test, power 80%	Separate t-test, power 90%	Bonferroni t-test, power 50%	Bonferroni t-test, power 80%	Bonferroni t-test, power 90%
Applied design
Main effect A	1	0.96	1.36	1.58	0.96	1.36	1.58
Main effect B	2	1.12	1.59	1.85	1.27	1.75	2
Interaction	2	2.23	3.19	3.69	2.55	3.51	4
$A_{1} \times B_{1} - A_{k} \times B_{l}$	5	1.31	1.87	2.17	1.72	2.28	2.57
Balanced design
Main effect A	1	0.51	0.73	0.84	0.51	0.73	0.84
Main effect B	2	0.63	0.89	1.03	0.71	0.98	1.12
Interaction	2	1.25	1.79	2.07	1.43	1.97	2.25
$A_{1} \times B_{1} - A_{k} \times B_{l}$	5	0.88	1.26	1.46	1.16	1.54	1.74

Table 5.1 illustrates different aspects of embedded experiments and factorial designs. First, it illustrates the cost benefits of a factorial setup. Twice as many experimental units are required if the main effects of both factors are tested at the same precision in two separate single-factor experiments. Table 5.1 also shows that the power for the test of interactions is much smaller than for the tests of the two main effects. The more treatment factors that are combined in one experiment, the smaller the sample size allocated to each treatment combination and the smaller the power for the tests of interactions. This puts the often cited advantage that factorial designs also allow testing of interactions between the different treatment factors into perspective. In practice, sample sizes are based on power calculations for the tests on the main effects. Consequently, only large interactions can be detected with sufficient power. A factorial design still has the advantage that the validity of observed main effects increases, since they are tested over a wider range of conditions.

If the null hypothesis of no interactions is rejected, then main effects are difficult to interpret. In that situation it is more useful to compare the control group, i.e. $A_{1} \times B_{1}$ , with the five alternative treatment combinations. The minimum observable differences of these five contrasts that reject the null hypothesis at a 5% significance level and different power levels are also included in Table 5.1.

Comparing minimum values for the differences under the applied design and the balanced design illustrates the loss of power if an extremely skewed allocation over the treatment combinations is chosen. Minimizing the risk of losing too much precision for the regular publication is the motivation behind the choice for this allocation. It clearly illustrates the duality of combining two competing purposes in an embedded experiment: estimation for regular publication purposes versus testing contrasts of different treatment combinations.

To assess the value of the results that can be obtained with this experiment, the minimum observable differences with this experiment are related to the standard errors of the regular survey estimates. Standard errors for the survey estimates at the national level will generally be much smaller than the minimum observable differences with an experiment since the sample size allocated to the alternative treatments is generally much smaller than the regular sample size. If, however, the assumption is adopted that differences observed with an experiment at the national level also apply to the survey estimates for important domains, then the differences observable with the experiment might become comparable with the standard errors of these domain estimates. This assumes no interaction between domains and treatment effects. The standard error for the monthly unemployed labor force figures at the national level equals 0.15 percent points. The standard errors for the domains vary between 0.3 and 1.0 percent points. Comparing these standard errors with the differences in Table 5.1 shows that the main effects are still larger than the standard errors at the national level but become comparable with the precision of the regular monthly domain estimates.

5.3 Results

Table 5.2 contains an overview of the response rates of the households in the six subsamples of the experiment. It follows that the different advance letters result in relatively small differences in the response rates. Factor $A$ results in an increase of the response of 2.4 percent points by using a personalized letter (after correcting proportions for the unbalanced allocation of the sample over the treatment combinations). The alternative letters considered in factor $B$ resulted in a decrease of 1.5 percent points (alternative $B_{2}$ ) and 1.9 percent points (alternative $B_{3}$ ).

Table 5.2
Response rates experiment with advance letters
Treatment	Response	Refusal	Rest	Total
$A_{1} \times B_{1}$	13,234 56.69%	5,127 21.96%	4,985 21.35%	23,346
$A_{1} \times B_{2}$	604 53.59%	271 24.05%	252 22.36%	1,127
$A_{1} \times B_{3}$	635 56.34%	254 22.54%	238 21.12%	1,127
$A_{2} \times B_{1}$	662 59.00%	256 22.82%	204 18.18%	1,122
$A_{2} \times B_{2}$	663 59.09%	236 21.03%	223 19.88%	1,122
$A_{2} \times B_{3}$	627 55.64%	259 22.98%	241 21.38%	1,127
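
The main effects on the response rate quoted in the text can be recovered from Table 5.2. The sketch below assumes that "correcting for the unbalanced allocation" amounts to averaging the six cell response rates with equal weight rather than pooling the raw counts; this interpretation is an assumption, although it reproduces the reported 2.4, 1.5 and 1.9 percent points after rounding.

```python
# Response rates (in %) from Table 5.2, indexed by (k, l) for A_k x B_l.
rates = {
    (1, 1): 56.69, (1, 2): 53.59, (1, 3): 56.34,
    (2, 1): 59.00, (2, 2): 59.09, (2, 3): 55.64,
}

def mean_a(k):
    """Unweighted mean response rate for level k of factor A."""
    return sum(rates[k, l] for l in (1, 2, 3)) / 3

def mean_b(l):
    """Unweighted mean response rate for level l of factor B."""
    return sum(rates[k, l] for k in (1, 2)) / 2

effect_a = mean_a(2) - mean_a(1)    # named versus standard salutation
effect_b2 = mean_b(2) - mean_b(1)   # first alternative versus standard content
effect_b3 = mean_b(3) - mean_b(1)   # second alternative versus standard content

print(round(effect_a, 2), round(effect_b2, 2), round(effect_b3, 2))
```

Pooling the raw counts instead would let the large control group dominate the marginal rates of $A_{1}$ and $B_{1}$, which is why equal weighting of the cells is the natural choice under this skewed allocation.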

Response behavior is modeled in a logistic regression model to test hypotheses about the effect of the two treatment factors. This is a typical conditional analysis that does not account for sample design features like unequal selection probabilities and clustering of households within municipalities. Clustering induced by the two-stage sample design is ignored, since households are randomized over the treatments in the experiment. In this logistic regression analysis interest is focussed on differences in the observed sample, in this case due to differences in selective non-response. This gives additional information on whether the factors increase the response across the entire target population or whether specific groups react differently to the treatments. Second and higher order interactions between the two treatment factors and socio-demographic categorical variables in the logistic regression model indicate that the variation in response between different subpopulations increases and that they react differently to the treatments.

In the logistic regression model, the dependent binary variable indicates whether a household completely responded versus the remaining response categories. The response behavior is assumed to depend upon:

a general mean,
treatment factors $A$ (name) and $B$ (content),
a block variable in 13 categories,
auxiliary variables:
- urbanization level in five categories,
- gender in three categories, specifying whether a household consists of men only, women only, or a mixture of men and women,
- age as a quantitative variable containing the average age of the household members,
- ethnicity in seven categories, specifying household compositions of native, western background, non-western background, and all possible mixtures,
- family composition in four categories: partners, single-parent family, single, and a remainder category.

All third order interactions between the variables are initially considered for backward model selection. The final selected model contains the terms that are given in the first column of Table 5.3. For brevity, the regression coefficients with their standard errors and test statistics for separate categories are only reported for the treatment factors. The hypothesis that there are no interactions between the two treatment factors cannot be rejected (the p-value of the Wald statistic equals 0.121). From Table 5.3 it follows that factor $A$ , i.e. using a letter addressed to a named individual, has a positive but non-significant effect on the response rate. Factor $B$ , i.e. two alternative letters with an improved content, even has a slightly negative but non-significant effect on the response rates. This is a remarkable result, since the two alternative letters attempt to improve the formal tone of the standard letter, but it is in line with the results of an earlier experiment where a more informal advance letter for the LFS also resulted in significantly smaller response rates (van den Brakel (2008)). Since there are no interactions between the treatment factors and the auxiliary variables, there are also no indications that the treatment factors selectively affect the response of specific subpopulations.

Table 5.3
Logistic regression analysis for response rates
Parameter	Coefficient	Standard error	Wald statistic	D.f.	p-value
Mean	0.287	0.078	13.604	1	0.000
Block			212.425	12	0.000
Treatment A (name, A₂)	0.083	0.045	3.394	1	0.065
Treatment B (content)			2.965	2	0.227
Alternative 1 (B₂)	-0.046	0.051	0.816	1	0.366
Alternative 2 (B₃)	-0.083	0.051	2.678	1	0.102
Urbanization			16.589	4	0.002
Ethnic			127.734	6	0.000
Gender			48.076	2	0.000
Family composition			27.339	3	0.000
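
As a reading aid for Table 5.3, the p-values of the treatment terms can be recomputed from the reported Wald statistics, and the coefficient of factor $A$ can be expressed as an odds ratio. The sketch uses the standard closed forms of the chi-square tail probability for one and two degrees of freedom only; the function name is illustrative, and small deviations from the table are to be expected because the published figures are rounded.

```python
from math import exp, sqrt
from statistics import NormalDist

def wald_p_value(wald, df):
    """Upper tail probability of a chi-square distribution with 1 or 2 d.f."""
    if df == 1:
        # A chi-square variable with 1 d.f. is a squared standard normal.
        return 2 * (1 - NormalDist().cdf(sqrt(wald)))
    if df == 2:
        # With 2 d.f. the chi-square distribution is exponential with mean 2.
        return exp(-wald / 2)
    raise ValueError("this sketch only covers 1 or 2 degrees of freedom")

p_treatment_a = wald_p_value(3.394, 1)   # Treatment A (name)
p_treatment_b = wald_p_value(2.965, 2)   # Treatment B (content)
p_alternative_1 = wald_p_value(0.816, 1)
p_alternative_2 = wald_p_value(2.678, 1)

# Odds ratio of responding for a named letter (A2) versus the standard one.
odds_ratio_a2 = exp(0.083)

print(round(p_treatment_a, 3), round(p_treatment_b, 3))
print(round(odds_ratio_a2, 3))
```

An odds ratio slightly above one for $A_{2}$ corresponds to the small positive, non-significant effect of the named letter described in the text.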

In the second step of this analysis it is tested whether the estimates for the unemployed labor force obtained with the six subsamples under the different advance letters are significantly different. The design-based analysis procedure developed in this paper is used to account for the sampling design, the experimental design and the estimation procedure of the LFS. The GREG estimator is applied to obtain estimates for the unemployed labor force under the six different treatment combinations. With this unconditional analysis it is tested whether the different advance letters introduce differences in selection bias, after correcting for the differences in response rates using the design-based estimation procedure applied in the regular LFS.

With this analysis, the linear measurement error model (2.1) is applied to a binary response variable. This might appear to be rigid, since logistic models are more natural in this case. Under the model-assisted approach, however, linear regression models are frequently applied to derive a GREG estimator for binary response variables. In the Dutch LFS, too, a linear regression model is assumed to derive a GREG estimator for official labor force figures. To develop a design-based analysis procedure for embedded experiments that also accounts for the GREG estimator used in the regular survey, a linear measurement error model is assumed in a similar way. A detailed discussion about the use and interpretation of a linear measurement error model applied to binary response variables is given by van den Brakel (2008).

The inclusion probabilities in the GREG estimator (2.18) reflect the sampling design of the LFS and the experimental design used to divide the initial sample into six subsamples. The following weighting scheme was applied to calibrate the design weights: age + region + marital status + gender + urbanization level, where the five variables are categorical. This is a reduced version of the regular weighting scheme of the LFS.

The estimation results for the six subsamples are summarized in Table 5.4, where the unemployed labor force is expressed in percentages. It appears that there are no systematic patterns between subsample estimates. The subsample estimates and their variance estimates indicate that there are no significant differences between the control group and the five alternative treatment combinations. Finally, the main effects and the interaction effects of the two treatment factors are tested, taking into account that the experiment was designed as an RBD where adjacent interviewer regions are collapsed into 13 blocks. The analysis results are summarized in Table 5.5.

Table 5.4
Point estimates and standard errors unemployed labor force (expressed in percentages)
Treatment combination, $k$ ($A_{k}$)	Treatment combination, $l$ ($B_{l}$)	Estimate ${\hat{\bar{Y}}}_{kl; greg}$	Standard error $\sqrt{{\hat{d}}_{kl}}$
1	1	4.10%	0.15%
1	2	3.76%	0.65%
1	3	5.26%	0.75%
2	1	3.61%	0.61%
2	2	4.55%	0.67%
2	3	3.39%	0.66%

Table 5.5
Analysis main effects and interactions unemployed labor force (expressed in percentages)
Source	Estimate $C {\hat{\bar{Y}}}_{g r e g}$	Wald statistic	D.f.	p-value
Treatment A (name) A₁ - A₂	0.528	1.109	1	0.292
Treatment B (content)		0.732	2	0.694
B₁ - B₂	-0.300
B₁ - B₃	-0.471
Interaction		3.801	2	0.150
AB₁₁ - AB₁₂ - AB₂₁ + AB₂₂	1.276
AB₁₁ - AB₁₃ - AB₂₁ + AB₂₃	-1.388
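
The contrasts in Table 5.5 can be approximately recomputed from the rounded entries of Table 5.4. The sketch below makes two simplifying assumptions that are not part of the procedure described in this paper: the six subsample estimates are treated as independent, so that their covariance matrix is diagonal with the squared standard errors of Table 5.4, and the block structure is not modeled separately. The helper names are illustrative. With these simplifications and rounded inputs, the results agree with Table 5.5 only up to about the second decimal.

```python
# Table 5.4, cells ordered (1,1), (1,2), (1,3), (2,1), (2,2), (2,3).
estimates = [4.10, 3.76, 5.26, 3.61, 4.55, 3.39]
std_errors = [0.15, 0.65, 0.75, 0.61, 0.67, 0.66]
variances = [se ** 2 for se in std_errors]

def contrast_estimates(C, y):
    """Apply each row of the contrast matrix C to the vector of estimates y."""
    return [sum(c * v for c, v in zip(row, y)) for row in C]

def contrast_covariance(C, d):
    """C D C' for diagonal D, i.e. assuming independent subsample estimates."""
    return [[sum(a * dj * b for a, dj, b in zip(ri, d, rk)) for rk in C]
            for ri in C]

def wald_statistic(C, y, d):
    """Quadratic form e' V^{-1} e for one or two contrasts."""
    e = contrast_estimates(C, y)
    V = contrast_covariance(C, d)
    if len(e) == 1:
        return e[0] ** 2 / V[0][0]
    if len(e) == 2:
        det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
        return (V[1][1] * e[0] ** 2 - 2 * V[0][1] * e[0] * e[1]
                + V[0][0] * e[1] ** 2) / det
    raise ValueError("this sketch handles one or two contrasts only")

C_A = [[1/3, 1/3, 1/3, -1/3, -1/3, -1/3]]                          # A1 - A2
C_B = [[0.5, -0.5, 0, 0.5, -0.5, 0], [0.5, 0, -0.5, 0.5, 0, -0.5]]  # B1 - B2, B1 - B3
C_AB = [[1, -1, 0, -1, 1, 0], [1, 0, -1, -1, 0, 1]]                 # interactions

for name, C in (("A", C_A), ("B", C_B), ("AB", C_AB)):
    print(name,
          [round(e, 3) for e in contrast_estimates(C, estimates)],
          round(wald_statistic(C, estimates, variances), 3))
```

The contrast matrices encode the same definitions as the contrasts introduced in Section 5.2: main effects average over the levels of the other factor, and the interaction contrasts are differences of differences.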

From the analysis results, summarized in Table 5.5, it can be concluded that there are no indications that the different advance letters result in different parameter estimates. This is in line with the analysis results of the response rates. Since there is no empirical evidence that the different advance letters affect response rates of the entire population or a subpopulation, it might be expected that no significant differences between the parameter estimates occur.

There are no indications that the alternative letters, considered in this experiment, improve response behavior or result in systematic effects in the estimates for target variables like the unemployed labor force. Therefore it was decided not to adapt the standard advance letter of the LFS.

Date modified: 2017-09-20