5 Testing new advance letters for the Dutch Labor Force Survey
Jan A. van den Brakel
In this
section an experiment with different advance letters embedded in the Dutch
Labor Force Survey (LFS) is described, which serves as a numerical example to
illustrate the methodology developed in this paper.
5.1 Survey design
The LFS is
based on a rotating panel survey. Each month a stratified two-stage cluster
sample of about 6,000 addresses is drawn from a register of all known addresses
in the Netherlands.
Strata are formed by geographical regions, municipalities are considered as
primary sampling units, and addresses as secondary sampling units. All
households residing at an address, with a maximum of three, are included in the
sample. In the first wave, data are collected by means of computer assisted
personal interviewing. The respondents are re-interviewed four times at
quarterly intervals by means of computer assisted telephone interviewing.
The
weighting procedure of the LFS is based on the GREG estimator of Särndal et al.
(1992). The inclusion probabilities reflect the sample design used to select
households as well as the different response rates between geographical
regions. The weighting scheme is based on a combination of different
socio-demographic categorical variables. One of the most important parameters
of the LFS is the unemployed labor force, which is defined as the ratio of the
total unemployment and the total labor force.
5.2 Experimental design
Advance
letters are one of the design parameters of a survey that affect response rates
and cooperation of respondents (De Leeuw, Callegaro, Hox, Korendijk and Lensvelt-Mulders 2007). The standard advance
letter of the LFS is addressed to the occupants of the accommodation and the
tone is formal and high-handed. As a result, this letter does not conform to
social psychological theories regarding survey participation proposed by Groves
et al. (1992) and Groves and Couper (1998). In an attempt to improve the LFS response rates, Luiten et
al. (2008) proposed different advance letters for the LFS that better meet
these principles about survey participation. The effects of these alternative
letters are investigated empirically by means of a large-scale field experiment
embedded in the LFS.
The first factor considered in this experiment, say $A$, concerns the salutation of the
respondent on two levels, i.e. the standard approach where the letter is addressed to the
occupants of the accommodation ($A_1$) versus a named letter ($A_2$). It is
anticipated that named letters are more likely to be read and therefore
increase response rates and survey participation. The second factor, say $B$, concerns
the content of the letter on three levels, i.e. the standard formal letter ($B_1$) versus two
alternative letters ($B_2$ and $B_3$). In the first
alternative, the content of the standard letter is adapted by explaining why
the survey is conducted, what the respondent gains by participating and why it
is important for Statistics Netherlands that the respondent participates in the
survey. The second alternative attempts to improve the formal tone of the
standard letter. The three versions of the advance letters can be found in van
den Brakel (2010).
A new
letter is only considered for implementation as a standard in the LFS, if its
positive effect on response behavior has been demonstrated and if its effect on
the main parameter estimates is quantified in a randomized experiment. Both
factors are tested in a $2 \times 3$ factorial design resulting in six treatment
combinations. This experiment is embedded in the first wave of the LFS for a
period of five months (December 2007 through April 2008). During this period the
monthly gross sample size is randomized over six subsamples according to an RBD
with interviewers as the block variables. About 220 interviewers were available
for the field work. In the analysis, adjacent interviewer regions were
collapsed into 13 blocks. A fraction of 0.8 of the sample is assigned to the
regular advance letter, i.e. treatment combination $A_1B_1$. A fraction of 0.04 of the sample is assigned to each of the other five
alternative treatment combinations.
The
allocation of the sampling units over the treatments is predominantly based on
practical arguments. Embedding experiments in ongoing sample surveys serves two
competing purposes. To estimate official figures as precisely as possible it is
beneficial to allocate as many sampling units as possible to the control group,
since this subsample is also used for regular publication purposes. To estimate
the contrasts in the experiment as precisely as possible it is, on the other
hand, beneficial to divide the total sample equally over the different
treatment combinations. In this application it was decided that a loss of at
most 20% of the sample size for regular publication purposes could be
tolerated. This led to the aforementioned allocation over the treatment
combinations. Under a response rate of 56% and a monthly sample size of 6,000
households it is expected that about 13,440 households are observed in the
control group and 670
households in each of the alternative treatment combinations.
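The expected numbers of observed households quoted above follow directly from the allocation fractions. The short sketch below redoes that arithmetic; the cell labels `A1B1` to `A2B3` are shorthand for the six treatment combinations, and all input figures are taken from the text.

```python
# Expected number of responding households per treatment combination,
# using the figures quoted in the text.
MONTHLY_SAMPLE = 6_000   # households per month
MONTHS = 5               # December 2007 through April 2008
RESPONSE_RATE = 0.56     # anticipated response rate

# Allocation fractions: 0.8 to the control letter, 0.04 to each alternative.
FRACTIONS = {
    "A1B1": 0.80,
    "A1B2": 0.04, "A1B3": 0.04,
    "A2B1": 0.04, "A2B2": 0.04, "A2B3": 0.04,
}

def expected_respondents(fraction):
    """Expected respondents for a cell that receives `fraction` of the sample."""
    return MONTHLY_SAMPLE * MONTHS * fraction * RESPONSE_RATE

expected = {cell: expected_respondents(f) for cell, f in FRACTIONS.items()}

for cell, n in expected.items():
    print(f"{cell}: {n:,.0f}")
```

The control group comes out at 13,440 households and each alternative at 672, which the text rounds to about 670.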
Although
the allocation is based on practical considerations, it is important to have a
notion of the power of the planned experiment. The target variable analyzed in
this paper is the unemployed labor force, expressed as a percentage. Ignoring
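the block design of this experiment, it follows that the variance of the
estimate for treatment combination $A_kB_l$ equals $S_{kl}^2 / n_{kl}$, with $n_{kl}$ the
number of households observed under that combination and $S_{kl}^2$
implicitly defined by (2.26). It is assumed that $S_{kl}^2$ is equal
to, say, $S^2$ for each treatment combination; its value for the unemployed
labor force is obtained from available sample data. Now the minimal observable difference for a
contrast that would reject the null hypothesis under a pre-specified
significance and power level equals

$$\Delta = \left( Z_{(1-\alpha/2)} + Z_{(\beta)} \right) \sqrt{\operatorname{var}(\text{contrast})}, \qquad (5.1)$$

where $Z_{(\gamma)}$ denotes the $\gamma$th percentile point of the standard normal
distribution, $\alpha$ the significance level of the test and $\beta$ the
power. The main effect of factor $A$ concerns
one contrast, $A_1 - A_2$. From (2.29) it follows that the variance of
this contrast equals $\frac{S^2}{9} \sum_{k=1}^{2} \sum_{l=1}^{3} \frac{1}{n_{kl}}$. The main effect of factor $B$ concerns
two contrasts, $B_1 - B_l$, with
variances $\frac{S^2}{4} \left( \frac{1}{n_{11}} + \frac{1}{n_{21}} + \frac{1}{n_{1l}} + \frac{1}{n_{2l}} \right)$, $l = 2, 3$. The interactions between factors $A$ and $B$ concern
the two contrasts $AB_{11} - AB_{1l} - AB_{21} + AB_{2l}$ with variances $S^2 \left( \frac{1}{n_{11}} + \frac{1}{n_{1l}} + \frac{1}{n_{21}} + \frac{1}{n_{2l}} \right)$, $l = 2, 3$.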
Inserting the variances of the different contrasts in (5.1) gives the minimum
differences that would reject the null hypothesis for main effects and
interactions for pre-specified sample sizes, significance levels and power
levels. In Table 5.1 these differences for the unemployed labor force are
calculated for the aforementioned applied allocation, and for a balanced design
where the sample size for each treatment combination is equal to 2,800. Values
are given for unspecified alternative hypotheses at a 5% significance level and
a power of 50%, 80% and 90%. In experimental design theory, 80% is a widely
accepted power level for sample size determination. In survey sampling, minimum
sample size requirements are generally based on significance level requirements
only, which corresponds to a power level of 50%. Differences are specified for
separate tests of the contrasts. The main effect of factor $B$ and the
interaction effects both contain two contrasts. To preserve an overall
significance level of 5%, differences for both tests are also calculated using
Bonferroni's simultaneous comparison procedure.
Table 5.1
Observable difference for the unemployed labor force in percentages at 5% significance levels and different power levels

| Design | Contrast | Number of contrasts | Separate t-test, power 50% | Separate t-test, power 80% | Separate t-test, power 90% | Bonferroni t-test, power 50% | Bonferroni t-test, power 80% | Bonferroni t-test, power 90% |
|---|---|---|---|---|---|---|---|---|
| Applied design | Main effect A | 1 | 0.96 | 1.36 | 1.58 | 0.96 | 1.36 | 1.58 |
| Applied design | Main effect B | 2 | 1.12 | 1.59 | 1.85 | 1.27 | 1.75 | 2.00 |
| Applied design | Interaction | 2 | 2.23 | 3.19 | 3.69 | 2.55 | 3.51 | 4.00 |
| Applied design | Control ($A_1B_1$) versus alternatives | 5 | 1.31 | 1.87 | 2.17 | 1.72 | 2.28 | 2.57 |
| Balanced design | Main effect A | 1 | 0.51 | 0.73 | 0.84 | 0.51 | 0.73 | 0.84 |
| Balanced design | Main effect B | 2 | 0.63 | 0.89 | 1.03 | 0.71 | 0.98 | 1.12 |
| Balanced design | Interaction | 2 | 1.25 | 1.79 | 2.07 | 1.43 | 1.97 | 2.25 |
| Balanced design | Control ($A_1B_1$) versus alternatives | 5 | 0.88 | 1.26 | 1.46 | 1.16 | 1.54 | 1.74 |
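The entries of Table 5.1 can be approximated with a few lines of code. The sketch below rests on several assumptions that are not stated in the text: the minimum observable difference is taken to be (z(1 - alpha/2) + z(power)) times the standard deviation of the contrast, main effects are taken as averages over the levels of the other factor, blocks are ignored, and the common variance `S2 = 285` (in squared percent points) is a value back-solved from the table itself rather than the figure used by the author.

```python
from math import sqrt
from statistics import NormalDist

S2 = 285.0    # ASSUMPTION: back-solved from Table 5.1, not stated in the text
ALPHA = 0.05  # overall significance level used in Table 5.1

def contrast_variances(n):
    """Contrast variances ignoring blocks; n maps (k, l) to the cell size."""
    inv = {cell: 1.0 / size for cell, size in n.items()}
    # Cells entering a contrast of B1 versus B2 (B3 is identical here,
    # because all alternative cells have the same size).
    four_cells = inv[1, 1] + inv[1, 2] + inv[2, 1] + inv[2, 2]
    return {
        "Main effect A": S2 / 9 * sum(inv.values()),
        "Main effect B": S2 / 4 * four_cells,
        "Interaction": S2 * four_cells,
        "Control vs alternative": S2 * (inv[1, 1] + inv[1, 2]),
    }

def min_difference(variance, power, n_contrasts=1):
    """Minimum observable difference; Bonferroni divides alpha by n_contrasts."""
    z = NormalDist()
    alpha = ALPHA / n_contrasts
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(variance)

cells = [(k, l) for k in (1, 2) for l in (1, 2, 3)]
applied = contrast_variances({c: 13_440 if c == (1, 1) else 670 for c in cells})
balanced = contrast_variances({c: 2_800 for c in cells})

for name, var in applied.items():
    print(name, [round(min_difference(var, p), 2) for p in (0.5, 0.8, 0.9)])
```

Under these assumptions the computed values agree with the tabulated ones to within a few hundredths of a percent point; small deviations are expected because `S2` is only an approximation.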
Table 5.1
illustrates different aspects of embedded experiments and factorial designs.
First it illustrates the cost-benefits of a factorial setup. Twice as many
experimental units are required if the main effects of both factors are tested
at the same precision in two separate single factor experiments. Table 5.1 also
shows that the power for the test of interactions is much smaller than for the
tests of the two main effects. The more treatment factors that are combined in
one experiment, the smaller the sample size allocated to each treatment
combination and the smaller the power for the tests of interactions. This puts
the often cited advantage that factorial designs also allow testing of
interactions between the different treatment factors into perspective. In
practice, sample sizes are based on power calculations for the tests on the
main effects. Consequently, only large interactions can be detected with
sufficient power. A factorial design still has the advantage that the validity
of observed main effects increases, since they are tested over a wider range of
conditions.
If the null
hypothesis of no interactions is rejected, then main effects are difficult to
interpret. In that situation it is more useful to compare the control group,
i.e. $A_1B_1$, with the five alternative treatment
combinations. The minimum observable differences of these five contrasts that
reject the null hypothesis at a 5% significance level and different power
levels are also included in Table 5.1.
Comparing the minimum values for the differences under the applied design and the balanced
design illustrates the loss of power if an extremely skewed allocation over the
treatment combinations is chosen. Minimizing the risk of losing too much
precision for the regular publication is the motivation behind the choice for
this allocation. It clearly illustrates the duality of combining two competing
purposes in an embedded experiment: estimation for the regular publication
purposes versus testing contrasts of different treatment combinations.
To assess
the value of the results that can be obtained with this experiment, the minimum
observable differences with this experiment are related to the standard errors
of the regular survey estimates. Standard errors for the survey estimates at
the national level will generally be much smaller than the minimum observable
differences with an experiment since the sample size allocated to the
alternative treatments is generally much smaller than the regular sample size.
If, however, the assumption is adopted that differences observed with an
experiment at the national level also apply to the survey estimates for
important domains, then the differences observable with the experiment might
become comparable with the standard errors of these domain estimates. This
assumes no interaction between domains and treatment effects. The standard
error of the monthly unemployed labor force figures at the national level
equals 0.15 percent points.
The standard errors for the domains vary between 0.3 and 1.0 percent points.
Comparing these standard errors with the differences in Table 5.1 shows that
the minimum observable main effects are still larger than the standard error at the national
level but become comparable with the precision of the regular monthly domain
estimates.
5.3 Results
Table 5.2
contains an overview of the response rates of the households in the six
subsamples of the experiment. It follows that the different advance letters
result in relatively small differences in the response rates. Factor $A$ results
in an increase of the response of 2.4 percent points by using a personalized
letter (after correcting proportions for the unbalanced allocation of the
sample over the treatment combinations). The alternative letters considered in
factor $B$ resulted in a decrease of 1.5 percent points (alternative $B_2$) and 1.9
percent points (alternative $B_3$).
Table 5.2
Response rates experiment with advance letters

| Treatment | Response | Refusal | Rest | Total |
|---|---|---|---|---|
| $A_1B_1$ | 13,234 (56.69%) | 5,127 (21.96%) | 4,985 (21.35%) | 23,346 |
| $A_1B_2$ | 604 (53.59%) | 271 (24.05%) | 252 (22.36%) | 1,127 |
| $A_1B_3$ | 635 (56.34%) | 254 (22.54%) | 238 (21.12%) | 1,127 |
| $A_2B_1$ | 662 (59.00%) | 256 (22.82%) | 204 (18.18%) | 1,122 |
| $A_2B_2$ | 663 (59.09%) | 236 (21.03%) | 223 (19.88%) | 1,122 |
| $A_2B_3$ | 627 (55.64%) | 259 (22.98%) | 241 (21.38%) | 1,127 |
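The marginal effects on the response rate quoted in the text can be recovered from the counts in Table 5.2. The sketch below assumes that the rows of the table are ordered A1B1, A1B2, A1B3, A2B1, A2B2, A2B3, and that "correcting for the unbalanced allocation" amounts to taking unweighted averages of the cell response rates; both are assumptions, although they do reproduce the reported 2.4, 1.5 and 1.9 percent points.

```python
from statistics import mean

# (responding households, total households) per treatment combination,
# with the row order of Table 5.2 assumed to be A1B1, ..., A2B3.
counts = {
    ("A1", "B1"): (13_234, 23_346),
    ("A1", "B2"): (604, 1_127),
    ("A1", "B3"): (635, 1_127),
    ("A2", "B1"): (662, 1_122),
    ("A2", "B2"): (663, 1_122),
    ("A2", "B3"): (627, 1_127),
}

# Response rate per cell, in percent.
rate = {cell: 100.0 * resp / total for cell, (resp, total) in counts.items()}

def level_mean(position, level):
    """Unweighted mean response rate over all cells with the given factor level."""
    return mean(r for cell, r in rate.items() if cell[position] == level)

effect_a2 = level_mean(0, "A2") - level_mean(0, "A1")
effect_b2 = level_mean(1, "B2") - level_mean(1, "B1")
effect_b3 = level_mean(1, "B3") - level_mean(1, "B1")

print(f"A2 - A1: {effect_a2:+.1f} percent points")
print(f"B2 - B1: {effect_b2:+.1f} percent points")
print(f"B3 - B1: {effect_b3:+.1f} percent points")
```

Unweighted averaging gives each treatment combination the same influence, so the large control group does not dominate the comparison.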
Response behavior
is modeled in a logistic regression model to test hypotheses about the effect
of the two treatment factors. This is a typical conditional analysis that does
not account for sample design features like unequal selection probabilities and
clustering of households within municipalities. Clustering induced by the
two-stage sample design is ignored, since households are randomized over the
treatments in the experiment. In this logistic regression analysis interest is
focused on differences in the observed sample, in this case due to differences
in selective non-response. This gives additional information on whether the
factors increase the response across the entire target population or whether
specific groups react differently to the treatments. Second and higher order
interactions between the two treatment factors and socio-demographic
categorical variables in the logistic regression model would indicate that the
variation in response between different subpopulations increases and that they
react differently to the treatments.
In the logistic regression model, the dependent binary variable indicates whether a
household completely responded versus the remaining response categories. The
response behavior is assumed to depend upon:
- a general mean,
- treatment factors $A$ (name) and $B$ (content),
- a block variable in 13 categories,
- auxiliary variables:
  - urbanization level in five categories,
  - gender in three categories, specifying whether a household consists of men only, women only, or a mixture of men and women,
  - age as a quantitative variable containing the average age of the household members,
  - ethnicity in seven categories, specifying household compositions of native, western background, non-western background, and all possible mixtures,
  - family composition in four categories: partners, single-parent family, single, and a remainder category.
All third
order interactions between the variables are initially considered for backward
model selection. The final selected model contains the terms that are given in
the first column of Table 5.3. For brevity, the regression coefficients with
their standard errors and test statistics for separate categories are only
given for the treatment factors. The hypothesis that there are no
interactions between the two treatment factors cannot be rejected (the p-value of the Wald statistic equals 0.121).
From Table 5.3 it follows that factor $A$, i.e. using a letter addressed to a named
individual, has a positive but non-significant effect on the response rate.
Factor $B$, i.e. two alternative letters with an improved
content, even has a slightly negative but non-significant effect on the
response rates. This is a remarkable result, since the two alternative letters
attempt to improve the formal tone of the standard letter, but it is in line with the
results of an earlier experiment where a more informal advance
letter for the LFS also resulted in significantly smaller response rates (van
den Brakel 2008). Since there are no interactions between the treatment
factors and the auxiliary variables, there are also no indications that the
treatment factors selectively affect the response of specific subpopulations.
Table 5.3
Logistic regression analysis for response rates

| Parameter | Coefficient | Standard error | Wald statistic | D.f. | p-value |
|---|---|---|---|---|---|
| Mean | 0.287 | 0.078 | 13.604 | 1 | 0.000 |
| Block | | | 212.425 | 12 | 0.000 |
| Treatment A (name, A2) | 0.083 | 0.045 | 3.394 | 1 | 0.065 |
| Treatment B (content) | | | 2.965 | 2 | 0.227 |
| Alternative 1 (B2) | -0.046 | 0.051 | 0.816 | 1 | 0.366 |
| Alternative 2 (B3) | -0.083 | 0.051 | 2.678 | 1 | 0.102 |
| Urbanization | | | 16.589 | 4 | 0.002 |
| Ethnic | | | 127.734 | 6 | 0.000 |
| Gender | | | 48.076 | 2 | 0.000 |
| Family composition | | | 27.339 | 3 | 0.000 |
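The treatment rows of Table 5.3 can be checked by hand. For a single coefficient the Wald statistic is the squared ratio of the coefficient to its standard error, and its p-value comes from a chi-square distribution. The sketch below uses only closed-form chi-square tail probabilities for 1 and 2 degrees of freedom; the function names are illustrative and are not taken from the paper.

```python
from math import erfc, exp, sqrt

def wald_statistic(coefficient, standard_error):
    """Wald statistic for a single coefficient: (estimate / s.e.) squared."""
    return (coefficient / standard_error) ** 2

def chi2_p_value(wald, df):
    """Upper tail probability of a chi-square variable (closed forms for df 1 and 2)."""
    if df == 1:
        return erfc(sqrt(wald / 2.0))
    if df == 2:
        return exp(-wald / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")

# Treatment A (name, A2): coefficient 0.083 with standard error 0.045.
w_a = wald_statistic(0.083, 0.045)
print(f"Wald from rounded inputs: {w_a:.3f} (Table 5.3 reports 3.394)")

# p-values for the Wald statistics as reported in Table 5.3.
print(f"Treatment A:      p = {chi2_p_value(3.394, 1):.3f}")
print(f"Treatment B:      p = {chi2_p_value(2.965, 2):.3f}")
print(f"Alternative 1 B2: p = {chi2_p_value(0.816, 1):.3f}")
print(f"Alternative 2 B3: p = {chi2_p_value(2.678, 1):.3f}")
```

The Wald statistic recomputed from the rounded coefficient and standard error differs slightly from the tabulated value, which was presumably computed from unrounded estimates.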
In the
second step of this analysis it is tested whether the estimates for the
unemployed labor force obtained with the six subsamples under the different
advance letters are significantly different. The design-based analysis
procedure developed in this paper is used to account for the sampling design,
the experimental design and the estimation procedure of the LFS. The GREG
estimator is applied to obtain estimates for the unemployed labor force under
the six different treatment combinations. With this unconditional analysis it
is tested whether the different advance letters introduce differences in
selection bias, after correcting for the differences in response rates using
the design-based estimation procedure applied in the regular LFS.
With this
analysis, the linear measurement error model (2.1) is applied to a binary
response variable. This might appear to be rigid, since logistic models are
more natural in this case. Under the model-assisted approach, however, linear regression
models are frequently applied to derive a GREG estimator for binary
response variables. Also in the Dutch LFS a linear regression model is assumed
to derive a GREG estimator for official labor force figures. To develop a
design-based analysis procedure for embedded experiments that also accounts for
the GREG estimator used in the regular survey, a linear measurement error model
is assumed in a similar way. A detailed discussion about the use and
interpretation of a linear measurement error model applied to binary response
variables is given by van den Brakel (2008).
The
inclusion probabilities in the GREG estimator (2.18) reflect the sampling
design of the LFS and the experimental design used to divide the initial sample
into six subsamples. The following weighting scheme was applied to calibrate
the design weights: age + region +
marital status + gender + urbanization level, where the five variables are
categorical. This is a reduced version of the regular weighting scheme of the
LFS.
The
estimation results for the six subsamples are summarized in Table 5.4, where
the unemployed labor force is expressed in percentages. It appears that there
are no systematic patterns between subsample estimates. The subsample estimates
and their variance estimates indicate that there are no significant differences
between the control group and the five alternative treatment combinations.
Finally the main effects and the interaction effects of the two treatment
factors are tested, taking into account that the experiment was designed as an
RBD where adjacent interviewer regions are collapsed into 13 blocks. The analysis
results are summarized in Table 5.5.
Table 5.4
Point estimates and standard errors unemployed labor force (expressed in percentages)

| Treatment combination: level of A | Treatment combination: level of B | Estimate | Standard error |
|---|---|---|---|
| 1 | 1 | 4.10% | 0.15% |
| 1 | 2 | 3.76% | 0.65% |
| 1 | 3 | 5.26% | 0.75% |
| 2 | 1 | 3.61% | 0.61% |
| 2 | 2 | 4.55% | 0.67% |
| 2 | 3 | 3.39% | 0.66% |
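A rough check of the statement that none of the alternatives differs significantly from the control group can be made from Table 5.4 alone. The sketch below assumes the six subsample estimates are independent and ignores the block structure, so it only approximates the design-based tests of the paper; the first two columns of Table 5.4 are read as the levels of factors A and B.

```python
from math import sqrt

# (estimate, standard error) in percent, from Table 5.4.
estimates = {
    "A1B1": (4.10, 0.15),
    "A1B2": (3.76, 0.65),
    "A1B3": (5.26, 0.75),
    "A2B1": (3.61, 0.61),
    "A2B2": (4.55, 0.67),
    "A2B3": (3.39, 0.66),
}

control_est, control_se = estimates["A1B1"]

# z-score of each alternative against the control group,
# ASSUMING independent subsample estimates (an approximation).
z_scores = {
    cell: (est - control_est) / sqrt(se ** 2 + control_se ** 2)
    for cell, (est, se) in estimates.items()
    if cell != "A1B1"
}

for cell, z in z_scores.items():
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{cell} vs A1B1: z = {z:+.2f} ({verdict} at the 5% level)")
```

The largest absolute z-score, for the first salutation level combined with the third letter content, stays below the two-sided 5% critical value of 1.96 even before any correction for multiple comparisons.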
Table 5.5
Analysis main effects and interactions unemployed labor force (expressed in percentages)

| Source | Estimate | Wald statistic | D.f. | p-value |
|---|---|---|---|---|
| Treatment A (name), A1 - A2 | 0.528 | 1.109 | 1 | 0.292 |
| Treatment B (content) | | 0.732 | 2 | 0.694 |
| B1 - B2 | -0.300 | | | |
| B1 - B3 | -0.471 | | | |
| Interaction | | 3.801 | 2 | 0.150 |
| AB11 - AB12 - AB21 + AB22 | 1.276 | | | |
| AB11 - AB13 - AB21 + AB23 | -1.388 | | | |
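The contrast estimates in Table 5.5 can be approximately recovered from the six subsample estimates in Table 5.4. The sketch below assumes that main effects are unweighted averages over the levels of the other factor and that the first two columns of Table 5.4 give the levels of A and B; small deviations from the published values are expected because Table 5.4 is rounded to two decimals.

```python
from statistics import mean

# Subsample estimates (percent) from Table 5.4, keyed by (level of A, level of B).
est = {
    (1, 1): 4.10, (1, 2): 3.76, (1, 3): 5.26,
    (2, 1): 3.61, (2, 2): 4.55, (2, 3): 3.39,
}

def mean_a(k):
    """Average estimate for level k of factor A over the three levels of B."""
    return mean(est[k, l] for l in (1, 2, 3))

def mean_b(l):
    """Average estimate for level l of factor B over the two levels of A."""
    return mean(est[k, l] for k in (1, 2))

contrasts = {
    "A1 - A2": mean_a(1) - mean_a(2),
    "B1 - B2": mean_b(1) - mean_b(2),
    "B1 - B3": mean_b(1) - mean_b(3),
    "AB11 - AB12 - AB21 + AB22": est[1, 1] - est[1, 2] - est[2, 1] + est[2, 2],
    "AB11 - AB13 - AB21 + AB23": est[1, 1] - est[1, 3] - est[2, 1] + est[2, 3],
}

for name, value in contrasts.items():
    print(f"{name}: {value:+.3f}")
```

The recomputed contrasts match the Estimate column of Table 5.5 to within about 0.01 percent points.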
From the
analysis results, summarized in Table 5.5, it can be concluded that there are
no indications that the different advance letters result in different parameter
estimates. This is in line with the analysis results of the response rates.
Since there is no empirical evidence that the different advance letters affect
response rates of the entire population or a subpopulation, it might be
expected that no significant differences between the parameter estimates occur.
There are
no indications that the alternative letters, considered in this experiment,
improve response behavior or result in systematic effects in the estimates for
target variables like the unemployed labor force. Therefore it was decided not
to adapt the standard advance letter of the LFS.