Non-response follow-up for business surveys
Section 4. Simulation study
We conducted a simulation
study to evaluate the properties of the non-response-adjusted estimator (2.4) under different
response scenarios and follow-up sampling designs.
4.1 The simulation setup
Data used to create the sample
The data used for the
simulation study are sample data from an actual business survey: Statistics
Canada’s Monthly Survey of Food Services and Drinking Places (MSFSDP). As is
typical for business surveys, the MSFSDP is
stratified by province, industry and revenue (one take-all and one or more take-some
strata within each province/industry combination). For greater detail on the MSFSDP, see Statistics Canada (2017). Each “Take
All” stratum within a province/industry combination consists of the large and
important businesses, which are usually all followed up. These units are
excluded from the simulation study to focus on the follow-up strategy for the “Take
some” strata. The set of sample units included in the simulation study is thus
the original sample of 2,375 units selected in the “Take some” strata.
Two variables are used for
the simulation study: “Revenue” and “Sales”. The first variable, Revenue, comes
from the sampling frame (Statistics Canada’s Business Register) and is present
for all units selected in the MSFSDP sample. We use Revenue as an auxiliary
variable for sampling the non-respondents
to the mail-out (see below). The second variable, Sales, is one of the
variables collected by the survey; it is the variable of interest. Both unit and item
non-response are handled by imputation in the MSFSDP; thus Sales is available
for all units in the simulation study and is imputed for 15% of the sample
units. The correlation between Revenue and Sales is about 83% for both the
respondent-only data and the fully imputed data.
In our
simulation experiments, the sample is not randomly generated multiple times from
MSFSDP data. Instead, the sample is fixed and consists of the set of all units in the
original MSFSDP sample. The stratum
identifier, the design weight, the variable of interest (Sales) and the
auxiliary variable (Revenue) for each
unit of the sample are taken from the
MSFSDP sample file. Units with imputed values are included in the sample, and imputed values are treated as observed
values. This allows us to compute the full sample estimate given in (2.1). This estimate is used as a
benchmark to evaluate the properties of the non-response-adjusted estimator (2.4) for different
response scenarios and follow-up sampling designs, as detailed below.
Generation of the set of mail-out non-respondents
Next, for each unit of the sample, response to the
mail-out is generated independently from one unit to another using a Bernoulli
distribution with a unit-specific mail-out response probability. Two response
probability scenarios are considered:
- Uniform: the mail-out response probability is 0.5 for all sample
units. Under this scenario, the expected number of non-respondents to the
mail-out is 2,375/2 = 1,187.5.
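- Correlated to the variable of interest: the mail-out response probability $p_i$ of unit $i$ is determined using the logit function
$\log\{p_i/(1-p_i)\} = -0.31 + 0.000004\,y_i$,
where $y_i$ denotes the Sales value of unit $i$. The constants -0.31 and 0.000004 are chosen by trial and
error so that the expected number of non-respondents to the mail-out is again
approximately half of the sample size. Note that the expected number of
non-respondents to the mail-out can be written as $\sum_{i=1}^{2{,}375}(1-p_i)$. As a result, the constants are such that the average mail-out response probability over the 2,375 sample units is approximately 0.5.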
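The mail-out step can be sketched as follows, assuming the logit form written above with Sales as the only covariate. The function names are hypothetical, and the actual MSFSDP Sales values are not available here, so only the uniform scenario is exercised in the usage lines.

```python
import math
import random

def mailout_response_probs(sales, intercept=-0.31, slope=0.000004):
    """Mail-out response probability per unit, assuming
    logit(p_i) = intercept + slope * y_i with y_i = Sales."""
    return [1.0 / (1.0 + math.exp(-(intercept + slope * y))) for y in sales]

def expected_nonrespondents(probs):
    """Expected number of mail-out non-respondents: sum of (1 - p_i)."""
    return sum(1.0 - p for p in probs)

def draw_mailout_nonrespondents(probs, rng):
    """Independent Bernoulli(p_i) response draws; returns the indices
    of the units that do NOT respond to the mail-out."""
    return [i for i, p in enumerate(probs) if rng.random() >= p]

# Uniform scenario: p_i = 0.5 for all 2,375 sample units.
uniform = [0.5] * 2375
print(expected_nonrespondents(uniform))  # 1187.5
nonresp = draw_mailout_nonrespondents(uniform, random.Random(1))
print(len(nonresp))  # varies around 1,187.5 from one replicate to the next
```

Each simulation replicate would draw a fresh set of non-respondents from the same fixed sample, which is what makes the mail-out response mechanism the only source of variation at this step.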
Selection of the follow-up sample
The next step in the
simulation is to select a follow-up sample from the set of mail-out
non-respondents, generated from one of the two response
probability scenarios above. Five different sampling
designs are considered for the selection of the follow-up sample:
- Census of the
mail-out non-respondents;
- Simple Random
Sampling (SRS) without replacement, ignoring
the original stratification;
- Stratified SRS
without replacement using the original stratification, with sample allocation
to strata proportional to the number of mail-out non-respondents;
- Systematic
sampling with probability proportional to Revenue, ignoring the
original stratification;
- Systematic
sampling with probability proportional to Revenue multiplied by the initial
design weight, ignoring the
original stratification.
Note that the size variables
used for the two Probability Proportional to Size (PPS) sampling designs are
trimmed from below at a low percentile to
eliminate zero values and some
extremely small values that caused instability. On average, there are 1,188
non-respondents to the mail-out. For the first design, all non-respondents are
followed up. For the remaining four designs, the follow-up sample sizes used
for the simulation are 100, 200, 300, 400, 500, 700 and 900.
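A minimal sketch of the two PPS designs is given below: systematic selection with probability proportional to a size measure (Revenue, or Revenue multiplied by the design weight) after trimming from below. The cutoff passed to `trim_below` stands in for the percentile used in the study, which is not reproduced here; all names and the example values are hypothetical.

```python
import random

def trim_below(values, floor):
    """Raise values below `floor` (a hypothetical lower-percentile cutoff)."""
    return [max(v, floor) for v in values]

def systematic_pps(sizes, n, rng):
    """Systematic PPS selection of n units, in list order, from positive sizes.

    A unit whose size exceeds the sampling interval can be hit more than
    once; such units would need special handling (not shown here).
    """
    total = float(sum(sizes))
    interval = total / n
    start = rng.random() * interval          # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cum, j = [], 0.0, 0
    for i, size in enumerate(sizes):
        cum += size
        while j < n and points[j] < cum:     # selection point falls in unit i
            selected.append(i)
            j += 1
    while j < n:                             # guard against round-off at the end
        selected.append(len(sizes) - 1)
        j += 1
    return selected

def pps_inclusion_probs(sizes, n):
    """First-order inclusion probabilities n * size_i / total size."""
    total = float(sum(sizes))
    return [n * s / total for s in sizes]

# Hypothetical Revenue values and design weights for five non-respondents.
revenue = [0.0, 120.0, 45.0, 150.0, 80.0]
weights = [10.0, 10.0, 25.0, 5.0, 25.0]
size_rev = trim_below(revenue, 20.0)                        # design 4
size_rev_w = [r * w for r, w in zip(size_rev, weights)]     # design 5
print(systematic_pps(size_rev, 2, random.Random(3)))
```

Trimming matters here because a zero size would give a unit no chance of selection, and an extremely small size would give it a huge follow-up weight.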
Generation of call outcomes
The outcomes of the telephone
follow-up collection procedure are simulated at the call attempt level. For
each sample unit, the probabilities of the three
possible outcomes (“response”, “final non-response” and “still-in-progress”; see Section 2) are assigned before the start of the
simulation and do not vary as data collection progresses. Two response
scenarios are considered:
- Uniform: the probabilities of “response”, “final non-response” and “still-in-progress” are 0.25, 0.05 and 0.70, respectively,
for all units.
These values were taken from Xie, Godbout, Youn and Lavallée (2011).
- Correlated to the variable of interest: the probability $p_{r,i}$ of a “response” for unit $i$ is based on the following logit function:
$\log\{p_{r,i}/(1-p_{r,i})\} = -1.29 + 0.000002\,y_i + 0.3\,z_i$,
where $y_i$ is the Sales value of unit $i$ and $z_i$ is generated from the standard normal
distribution. The constants -1.29, 0.000002 and 0.3 are chosen by trial and
error so that the average response probability over all units in the sample is approximately 25%. Note that the coefficient of correlation
between the response probability and the variable of
interest is 61%. The other two probabilities
are defined as: and This ensures that
For a given follow-up sample
unit, the three outcome probabilities are used to
randomly generate the outcome of each call. After a call attempt, the unit
returns to the end of the calling queue unless it is finalized, that is, unless an outcome
of “response” or “final non-response” is obtained. Outcomes are generated
independently from one call to another. There is no explicit upper limit on the
number of call attempts made to the same unit in our simulation study.
Note that for the response
scenario with varying response probabilities, the units that respond to the
first call attempt are typically units with a higher response probability. As a
result, the units that remain in the calling queue for the second attempt tend
to be units with a lower response probability. It follows that the proportion
of units that respond in the second attempt tends to be lower than in the first
attempt. Similarly, the proportion of units that respond in the third attempt
tends to be lower than in the second attempt, and so on. The proportion of
units that respond decreases with each call attempt, as the units that remain
in the calling queue are those that are harder to reach. Therefore, estimates
may suffer from substantial bias if data collection ends prematurely and if
the units that are harder to reach tend to have Sales values larger or
smaller than the other sample units.
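The selection effect described above can be made concrete with a short calculation. Assuming independent calls and no budget limit, a unit is still in the queue at attempt j + 1 with probability equal to its “still-in-progress” probability raised to the power j, so the expected share of queued units that respond at each attempt is a weighted average of response probabilities whose weights shift towards hard-to-reach units. The function name and the two-unit example are hypothetical.

```python
def attempt_response_proportions(units, attempts):
    """Expected share of still-queued units that respond at attempts 1..attempts.

    units: list of (p_response, p_final_nonresponse); a unit is still in the
    queue at attempt j + 1 with probability p_still ** j. Assumes at least
    one unit has p_still > 0 so the queue is never empty in expectation.
    """
    props = []
    for j in range(attempts):
        alive = [(1.0 - p_r - p_f) ** j for p_r, p_f in units]
        responding = sum(a * p_r for a, (p_r, _) in zip(alive, units))
        props.append(responding / sum(alive))
    return props

# Two hypothetical units: one easy to reach, one hard to reach.
print(attempt_response_proportions([(0.45, 0.05), (0.05, 0.05)], 4))
# decreasing: about 0.25, 0.19, 0.14, 0.11
```

With identical probabilities for all units the share stays constant from one attempt to the next, which is why the decline only appears under the non-uniform scenario.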
The total budget for
follow-up is fixed at 3,000 units (monetary or time units) in our study. A cost
is charged for each call attempt. The amount charged depends on the outcome of
the attempt: a “response” outcome has a cost of 5 units, a “final
non-response” outcome has a cost of 2 units and a “still-in-progress”
outcome has a cost of 1 unit. The collection ends when the budget runs out
or when there are no more cases left in the calling queue (i.e., all units are
resolved), whichever occurs first. The cost values and
budget have been chosen somewhat arbitrarily, as they are survey-specific.
However, we ensured that the cost of a “response” outcome exceeds that of a “final non-response” outcome, which in turn exceeds that of a “still-in-progress” outcome, as this relation is generally expected to hold
in telephone surveys.
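The calling queue and budget rule can be sketched as a small call-level simulation. This is an illustration under stated assumptions, not the code used for the study: calls are placed only while the amount spent is below the budget, so the last call may overshoot by at most one call cost (the text does not say how the final call is handled), and the uniform probabilities 0.25 and 0.05 are used for “response” and “final non-response”. The function name is hypothetical.

```python
import random
from collections import deque

COSTS = {"response": 5, "final_nonresponse": 2, "in_progress": 1}

def simulate_followup(units, budget, rng, costs=COSTS):
    """Call-level simulation of the telephone follow-up.

    units: list of (p_response, p_final_nonresponse) for the follow-up sample;
    the remaining probability is "still-in-progress". A call is placed only
    while the amount spent is below the budget (assumed stopping rule).
    """
    queue = deque(range(len(units)))
    spent = 0
    finalized = {}
    while queue and spent < budget:
        i = queue.popleft()
        p_r, p_f = units[i]
        u = rng.random()
        if u < p_r:
            outcome = "response"
        elif u < p_r + p_f:
            outcome = "final_nonresponse"
        else:
            outcome = "in_progress"
        spent += costs[outcome]
        if outcome == "in_progress":
            queue.append(i)           # back to the end of the calling queue
        else:
            finalized[i] = outcome
    respondents = sum(1 for o in finalized.values() if o == "response")
    return {"respondents": respondents, "finalized": len(finalized),
            "left_in_queue": len(queue), "spent": spent}

# Uniform follow-up probabilities, budget of 3,000 units, 900 sampled units.
print(simulate_followup([(0.25, 0.05)] * 900, 3000, random.Random(2024)))
```

With a large follow-up sample the budget binds long before the queue empties, so units left with a “still-in-progress” outcome remain in the queue, as described above.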
Monte Carlo measures
The generation of responses
to the mail-out, the selection of the follow-up sample and the generation of
responses to the follow-up are repeated independently 1,000 times for each
combination of mail-out response scenario, follow-up sampling design and follow-up
response scenario described above. The non-response-adjusted estimator (2.4) is computed for
each replicate. The non-response weight adjustments are computed using
(2.5), as the inverse of the overall weighted response rate. We use the adjustment given in (2.5), rather than the adjustment given in (2.6), to avoid a few cases where
some of the sets involved in (2.6) are empty, which would lead to infinite adjustment values. The non-response weight adjustment (2.5) can
be viewed as an extreme form of collapsing. Less extreme collapsing could be
applied in practice and might show better properties. We choose (2.5) in this
simulation study for its simplicity.
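The contrast between an overall adjustment and a group-level one can be illustrated as follows. This sketch assumes that each follow-up unit carries a weight (hypothetically, its design weight times the inverse of its follow-up selection probability) and that the group-level version computes the same ratio within each group; the exact forms of (2.5) and (2.6) are given in Section 2 and are not reproduced here. Function names and group labels are hypothetical.

```python
import math
from collections import defaultdict

def overall_adjustment(weights, responded):
    """Inverse of the overall weighted response rate among follow-up units."""
    total = sum(weights)
    resp = sum(w for w, r in zip(weights, responded) if r)
    return math.inf if resp == 0 else total / resp

def group_adjustments(weights, responded, groups):
    """Same ratio computed separately within each group."""
    tot, resp = defaultdict(float), defaultdict(float)
    for w, r, g in zip(weights, responded, groups):
        tot[g] += w
        if r:
            resp[g] += w
    return {g: (math.inf if resp[g] == 0 else tot[g] / resp[g]) for g in tot}

w = [10.0, 10.0, 20.0, 20.0]
r = [True, True, False, False]
print(overall_adjustment(w, r))                       # 3.0
print(group_adjustments(w, r, ["a", "a", "b", "b"]))  # {'a': 1.0, 'b': inf}
```

Group "b" has no respondents, so its group-level adjustment is infinite, while the fully collapsed adjustment stays finite as long as at least one follow-up unit responds.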
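Using the $R = 1{,}000$ replicates, the Monte Carlo
Relative Bias (RB) and Relative Root Mean Square Error (RRMSE) of the estimator (2.4) are computed as

$$\mathrm{RB} = \frac{1}{R}\sum_{r=1}^{R} e^{(r)} \quad\text{and}\quad \mathrm{RRMSE} = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\bigl(e^{(r)}\bigr)^{2}},$$

where $e^{(r)} = \bigl(\hat{Y}^{(r)} - \hat{Y}\bigr)/\hat{Y}$ is the
relative error for the $r$-th simulation replicate, $\hat{Y}^{(r)}$ is the non-response-adjusted Hansen-Hurwitz estimator for the $r$-th replicate, and $\hat{Y}$ is the full sample estimate (2.1).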
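The two Monte Carlo measures amount to the mean and the root mean square of the replicate relative errors. A minimal sketch, with hypothetical function names and made-up estimates:

```python
import math

def relative_errors(estimates, benchmark):
    """Relative error of each replicate estimate against the full sample estimate."""
    return [(est - benchmark) / benchmark for est in estimates]

def rb_rrmse(estimates, benchmark):
    """Monte Carlo relative bias and relative root mean square error."""
    e = relative_errors(estimates, benchmark)
    rb = sum(e) / len(e)
    rrmse = math.sqrt(sum(x * x for x in e) / len(e))
    return rb, rrmse

print(rb_rrmse([90.0, 110.0], 100.0))  # RB 0, RRMSE about 0.1
```

Because the benchmark is the fixed full sample estimate rather than a population total, these measures isolate the error due to mail-out non-response, follow-up sampling and follow-up non-response.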
As pointed out above, the initial
sample is fixed for each of the 1,000 replicates to
focus on the mail-out and follow-up response mechanisms and the follow-up
sampling design. While it could have been possible to create an artificial
population and draw a different initial sample at each replicate, it was felt
that this additional complexity would not change our main conclusions, except
for systematically increasing the variance of the estimator (2.4). Our simulation setup also has the advantage of
being conditional on real sample data.
4.2 Simulation results
In this section, we discuss
the simulation results for four scenarios of mail-out and follow-up response:
- The response
probability is uniform for both the mail-out and the follow-up. This serves as
a baseline scenario with which to compare the other scenarios.
- The response
probability is correlated to Sales for the mail-out and uniform for the
follow-up.
- The response
probability is uniform for the mail-out and correlated to Sales for the
follow-up.
- The response
probability is correlated to Sales for both the mail-out and the follow-up.
This scenario is probably the most realistic.
Response Scenario 1: Uniform response probability for
both the mail-out and the follow-up
Figure 4.1 shows the relative bias versus the
follow-up sample size for the five sampling designs. Figure 4.2 shows the
RRMSE versus the follow-up sample size. Note that the results for the follow-up
of all mail-out non-respondents are given by the last point on the figures
(i.e., a sample size of 1,188).

Figure 4.1
Relative bias (RB) versus follow-up sample size for the five sampling designs, Scenario 1

Figure 4.2
Relative root mean square error (RRMSE) versus follow-up sample size for the five sampling designs, Scenario 1
The following observations can be made by examining Figures 4.1
and 4.2:
- The RB is approximately zero for all follow-up
sample sizes and designs. The only exception is stratified SRS with a follow-up
sample size of 100. The proportional allocation strategy for the follow-up
sample does not ensure that at least one unit is selected from each stratum. Therefore,
for smaller follow-up sample sizes (e.g., 100), some strata end up with no
follow-up sample although they may contain mail-out non-respondents. This
causes a negative bias for the estimation of a population total.
- As the sample size increases from 100 to 400,
the RRMSE decreases for all designs. This can be explained by an increase in
the average number of respondents as the sample size increases (not shown in
the figures).
- For sample sizes greater than 400, the RRMSE
remains roughly constant for the SRS and stratified SRS designs. For those
sample sizes, the average number of respondents remains roughly constant. This
is consistent with equation (3.8). It indicates that, under uniform
response to the follow-up, the expected number of respondents does not vary
with the maximum number of call attempts per unit, and thus with the follow-up sample size,
provided the budget is expended.
- The PPS designs seem to be more efficient than
the SRS and stratified SRS designs. However, for sample sizes greater than 400,
the gains in efficiency diminish as the sample size increases.
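The zero-allocation problem behind the bias at a follow-up sample size of 100 is easy to reproduce. The sketch below rounds each proportional share to the nearest integer; this rounding rule and the stratum counts are hypothetical, and the study's actual allocation rule may differ.

```python
def proportional_allocation(nonresp_by_stratum, n):
    """Allocate n follow-up units proportionally to the number of mail-out
    non-respondents in each stratum, rounding to the nearest integer
    (one simple rounding rule; totals may not add up to n exactly)."""
    total = sum(nonresp_by_stratum.values())
    return {h: round(n * c / total) for h, c in nonresp_by_stratum.items()}

# Hypothetical strata: two large ones and two with only 4 non-respondents each.
alloc = proportional_allocation({"A": 600, "B": 580, "C": 4, "D": 4}, 100)
print(alloc)                                    # {'A': 51, 'B': 49, 'C': 0, 'D': 0}
print([h for h, k in alloc.items() if k == 0])  # strata left without follow-up
```

Non-respondents in strata C and D have no chance of being followed up, so their contribution to the total is missing from the estimate, which pushes it downwards.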
Response Scenario 2: Response probability correlated to
Sales for the mail-out and uniform for the follow-up
Figures 4.3 and 4.4 show the relative bias and the
RRMSE for Scenario 2, respectively.

Figure 4.3
Relative bias (RB) versus follow-up sample size for the five sampling designs, Scenario 2

Figure 4.4
Relative root mean square error (RRMSE) versus follow-up sample size for the five sampling designs, Scenario 2
The following observations can be made by examining Figures 4.3
and 4.4:
- The results show that if the mail-out response
probability is correlated to Sales, but the follow-up response probability is
uniform, the bias can be nearly eliminated through the follow-up sampling
design. This can be explained by observing that the Hansen and Hurwitz (1946)
estimator (2.3) is unbiased for any mail-out response mechanism.
- The observations given for Scenario 1 apply to Scenario 2 as well.
Response Scenario 3: Response probability uniform for
the mail-out and correlated to Sales for the follow-up
Figures 4.5 and 4.6 show the relative bias and the
RRMSE for Scenario 3, respectively.

Figure 4.5
Relative bias (RB) versus follow-up sample size for the five sampling designs, Scenario 3

Figure 4.6
Relative root mean square error (RRMSE) versus follow-up sample size for the five sampling designs, Scenario 3
The following observations can be made by examining
Figures 4.5 and 4.6:
- The RB is lowest for sample sizes less than or
equal to 400, where we observed that all the units were finalized before the
budget ran out. The lower RB for stratified SRS with a follow-up sample size of
100 is due to strata with no follow-up sample (see Response Scenario 1).
- The RRMSE is minimized for a sample size of 400.
- For sample sizes greater than 400, both RB and
RRMSE increase as the sample size increases. For those sample sizes, we
observed a decrease in the average response rate as the sample size increases
(see the discussion below equation (3.5) for a theoretical justification).
This explains the increase of RB and RRMSE as the sample size increases.
- The PPS designs seem again to be more efficient
than the SRS and stratified SRS designs. However, for sample sizes greater than
400, the gains in efficiency diminish as the sample size increases.
Response Scenario 4: Response probability correlated to
Sales for both the mail-out and the follow-up
Figures 4.7 and 4.8 show the relative bias and the
RRMSE for Scenario 4, respectively.
Figures 4.7 and 4.8 are similar to Figures 4.5
and 4.6. The observations given for Scenario 3 apply to Scenario 4 as
well.

Figure 4.7
Relative bias (RB) versus follow-up sample size for the five sampling designs, Scenario 4

Figure 4.8
Relative root mean square error (RRMSE) versus follow-up sample size for the five sampling designs, Scenario 4
4.3 Remarks on the
simulation results
We observed that for
follow-up sample sizes smaller than or equal to 400, and for all sampling
designs and response scenarios, all the units were finalized with an outcome of
“response” or “final non-response” before the budget was exhausted, except for
two simulation replicates. As a result, the follow-up response rate remained
roughly constant, whereas the number of respondents increased as the follow-up sample size increased from 100 to 400, reducing the variance and mean square error of the
estimator (2.4).
For sample sizes of 500 or
over, the follow-up budget always ran out before all the units were finalized.
As the follow-up sample size increased, the number of respondents and finalized
units remained roughly constant. On average, between 430 and 445 cases were
finalized at the end of data collection,
depending on the sampling design and response scenario; the other units
were left in the calling queue with an outcome of “still-in-progress”. It thus
appears that the follow-up budget used for the simulation study was just large
enough to finalize around 440 units for sample sizes greater than or equal to
500. Given that the number of respondents remained roughly constant as the
sample size increased, the response rate decreased. The reduction of the
response rate can be explained by a smaller average number of call attempts per
sample unit as the follow-up sample size increases. This has the undesirable
consequence of increasing the bias and mean square error of the estimator (2.4) under the non-uniform follow-up response
mechanisms.
From Figures 4.2, 4.4, 4.6 and 4.8, we also observe
that the RRMSE reaches a minimum for a sample size of 400 or 500 depending on
the response scenario and sampling design. The sample size that minimizes the
RRMSE seems to correspond roughly to the minimum sample size that expends the
follow-up budget on average. As discussed above, a smaller sample size
increases the variance of the estimator (2.4) due to a smaller number of respondents,
whereas a larger sample size may increase the bias due to a reduced response
rate. The minimum sample size to expend the follow-up budget appears to be the
same as the expected number of resolved units, which was around 440 in our
simulation study for sample sizes of 500 or above.
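The theory developed in Section 3 supports the
above empirical observations for uniform response to the follow-up. Table 4.1 provides values of the sample size
(3.7), the expected number of respondents (3.8), the expected response rate
(3.9) and the expected number of resolved units (3.10) for different values of the maximum number of call attempts per unit,
and for the outcome probabilities, costs and budget used in the simulation study: probabilities of 0.25, 0.05 and 0.70 for “response”, “final non-response” and “still-in-progress”, costs of 5, 2 and 1 units, respectively, and a budget of 3,000 units. The minimum sample size to expend the budget and the expected number of resolved units are both equal to 439; this agrees with the
simulation results.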
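The entries of Table 4.1 can be reproduced with a short calculation, under our reading that its rows are indexed by a cap on the number of call attempts per unit. Each attempt has the same expected cost, the expected number of attempts per unit is a truncated geometric sum, and the sample size is the budget divided by the expected cost per unit, capped at the 1,188 mail-out non-respondents. This is a sketch under those assumptions, not a transcription of formulas (3.7) to (3.10); the function name and arguments are hypothetical.

```python
import math

def uniform_followup_design(budget, n_nonresp, p_r, p_f, costs, max_attempts=math.inf):
    """Expected follow-up quantities under uniform response, assuming each unit
    is called until resolved or until `max_attempts` calls have been made.

    costs = (cost of "response", cost of "final non-response",
             cost of "still-in-progress").
    """
    c_r, c_f, c_s = costs
    p_s = 1.0 - p_r - p_f
    # Probability that a unit is still unresolved after max_attempts calls.
    tail = 0.0 if math.isinf(max_attempts) else p_s ** max_attempts
    calls = (1.0 - tail) / (p_r + p_f)            # expected calls per unit
    cost_per_unit = (p_r * c_r + p_f * c_f + p_s * c_s) * calls
    n = min(n_nonresp, budget / cost_per_unit)     # census if budget not binding
    response_rate = p_r * calls                    # P(response within the cap)
    resolved_rate = 1.0 - tail
    return {"n": n, "response_rate": response_rate,
            "respondents": n * response_rate, "resolved": n * resolved_rate}

for k in (math.inf, 20, 10, 6, 5, 4, 3, 2, 1):
    d = uniform_followup_design(3000, 1188, 0.25, 0.05, (5, 2, 1), k)
    print(k, round(d["n"]), f'{100 * d["response_rate"]:.1f}%',
          round(d["respondents"]), round(d["resolved"]))
```

Whenever the budget binds (every cap except 1), the expected respondents and resolved units stay near 366 and 439 because the cap cancels out of budget times response probability over expected cost. The printed values agree with Table 4.1 up to rounding in the last digit.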
As shown in Table 4.1, a
small maximum number of call attempts may significantly reduce the expected response rate, whereas the expected number of
respondents does not vary with the maximum number of call attempts, provided the budget
is expended. Therefore, under uniform response to the follow-up, there does not
seem to be any advantage to using a follow-up sample size larger than the minimum sample
size to expend the budget on average, which is 439 in this scenario. This
choice maximizes the expected response rate without reducing the expected
number of respondents. Under moderate departures from uniform response, choosing
a sample size close to 439 (or, equivalently, a large maximum number of call attempts) would help keep the
non-response bias under better control.
Our simulation results indicate that the conclusions
drawn from Table 4.1 hold approximately for non-uniform response to the
follow-up. In particular, the minimum sample size that expends the budget was
close to 439 and the expected number of respondents and resolved units stayed
roughly constant when the follow-up sample size increased. As a result,
incorrectly assuming uniform response when it is not uniform leads to an
appropriate sample size in our simulation setup. Another conclusion of our
simulation study is that choosing a follow-up sample size close to the minimum sample size that expends the budget (439) appears to minimize both the non-response bias
and mean square error of the estimator (2.4). However, we will show in the next two examples
that our conclusions may not always hold under larger departures from uniform
response.
Suppose that there are exactly 1,188 mail-out
non-respondents and that the budget, costs and outcome probabilities are exactly the same as those used in the
simulation study and Table 4.1. However, for one of the 1,188 units,
the outcome probabilities are replaced with values that make its probability of being resolved at any call very small. The response mechanism is thus almost
uniform, except for one unit with a very small probability of being resolved.
For simplicity, we assume that the follow-up sample is selected using simple
random sampling without replacement. For this
scenario, Table 4.2 shows the sample size (3.3), the expected
number of respondents (3.4), the expected response rate (3.5) and the expected
number of resolved units (3.6) for different values of the maximum number of call attempts per unit.
Table 4.1
Sample size, expected response rate, and expected number of respondents and resolved units for different values of the maximum number of call attempts per unit under uniform response to the follow-up
| Maximum number of call attempts | Sample size (3.7) | Expected response rate (3.9) | Expected number of respondents (3.8) | Expected number of resolved units (3.10) |
|---|---|---|---|---|
| ∞ | 439 | 83.3% | 366 | 439 |
| 20 | 439 | 83.3% | 366 | 439 |
| 10 | 452 | 81.0% | 366 | 439 |
| 6 | 498 | 73.5% | 366 | 439 |
| 5 | 528 | 69.3% | 366 | 439 |
| 4 | 578 | 63.3% | 366 | 439 |
| 3 | 668 | 54.8% | 366 | 439 |
| 2 | 861 | 42.5% | 366 | 439 |
| 1* | 1,188 | 25.0% | 297 | 356 |
Table 4.2
Sample size, expected response rate, and expected number of respondents and resolved units for different values of the maximum number of call attempts per unit when one unit has a very small probability of being resolved
| Maximum number of call attempts | Sample size (3.3) | Expected response rate (3.5) | Expected number of respondents (3.4) | Expected number of resolved units (3.6) |
|---|---|---|---|---|
| ∞ | 20 | 83.3% | 17 | 20 |
| 20 | 439 | 83.2% | 365 | 438 |
| 10 | 452 | 80.9% | 365 | 438 |
| 6 | 498 | 73.5% | 366 | 439 |
| 5 | 528 | 69.3% | 366 | 439 |
| 4 | 578 | 63.3% | 366 | 439 |
| 3 | 668 | 54.7% | 366 | 439 |
| 2 | 861 | 42.5% | 366 | 439 |
| 1* | 1,188 | 25.0% | 297 | 356 |
The minimum sample size to expend the budget, on
average, is 20 in that scenario. It is significantly smaller
than 439, the corresponding value for uniform response shown in Table 4.1.
As pointed out in Section 3, using a finite maximum number of call attempts may avoid spending too large a portion of the
budget on a few units with a very small probability of being resolved (the single hard-to-resolve unit in this example). Indeed, Table 4.2 shows
that the expected response rate decreases only marginally when the maximum number of call attempts is reduced from infinity to 20, whereas the expected
number of respondents drastically increases from 17 to 365. Using a finite
maximum number of call attempts seems desirable in this scenario, as it may
substantially reduce the variance of the estimator (2.4). The impact on non-response bias is likely to
be negligible unless the Sales value of the hard-to-resolve unit is extremely different from those of the other units. Incorrectly
assuming uniform response for all units would lead to choosing a sample size of
439, as shown in Table 4.1. This choice appears to remain appropriate for
this non-uniform follow-up response mechanism.
Suppose again that there are 1,188 mail-out
non-respondents, that the budget and costs are the same as those used in the simulation
study and Table 4.1, and that the follow-up sample is selected using simple
random sampling without replacement. Suppose now that the 1,188 mail-out
non-respondents can be divided into two response homogeneous groups, each of
size 594. The outcome probabilities take one set of values for all 594 units in the first group and a different set of values for the remaining 594 units. The response
mechanism is therefore not uniform; it is uniform only within each of the two response
homogeneous groups. The average probabilities over the 1,188 mail-out
non-respondents are the same as those given in the uniform response scenario. Table 4.3 shows the sample size (3.3), the
expected number of respondents (3.4), the expected response rate (3.5) and the
expected number of resolved units (3.6) for different values of the maximum number of call attempts per unit.
Table 4.3
Sample size, expected response rate, and expected number of respondents and resolved units for different values of the maximum number of call attempts per unit under uniform response within groups
| Maximum number of call attempts | Sample size (3.3) | Expected response rate (3.5) | Expected number of respondents (3.4) | Expected number of resolved units (3.6) |
|---|---|---|---|---|
| ∞ | 235 | 83.3% | 196 | 235 |
| 20 | 305 | 71.2% | 217 | 261 |
| 10 | 409 | 60.9% | 249 | 299 |
| 6 | 519 | 54.2% | 281 | 338 |
| 5 | 566 | 51.9% | 294 | 352 |
| 4 | 629 | 48.9% | 308 | 370 |
| 3 | 727 | 44.7% | 325 | 390 |
| 2 | 914 | 37.7% | 344 | 413 |
| 1* | 1,188 | 25.0% | 297 | 356 |
The minimum sample size to expend the budget, on
average, is 235, which is much smaller than the corresponding
value of 439 for uniform response. In this scenario, using a finite maximum number of call attempts does not seem advantageous. By decreasing the
maximum number of call attempts from infinity to 20, the expected number of
respondents only increases by 21, whereas the expected response rate decreases
by more than 10 percentage points. The small variance reduction could possibly be offset by a
larger increase in non-response bias. The magnitude of non-response bias
depends on the strength of the association between the variable of interest and the response homogeneous groups.
A small maximum number of call attempts (a large sample size) might be appropriate if
this association is weak, so as to benefit from a larger expected number of
respondents. However, this is a risky choice, as the expected response rate
would drop significantly, thereby offering reduced protection against
departures from the assumed response mechanism. Therefore, a sample size of 439
in this scenario might not be appropriate due to the increased risk of
non-response bias. The non-response bias can be dampened at the estimation
stage, at least asymptotically, by computing the non-response weight adjustment
(2.5) separately for each response homogeneous group. This weighting strategy
is standard and should be used when response homogeneous groups can be
identified; yet it does not offer full protection against departures from the
assumed response mechanism. It is for this reason that a large maximum number of call attempts, even an infinite one, may be preferable in this
scenario.
As pointed out in Section 3, plots of the expected
response rate and the expected number of respondents as a function of the maximum number of call attempts may be useful to determine a suitable
trade-off between maximizing the expected response rate and maximizing the expected number of
respondents, as illustrated in the above examples. An infinite maximum number of call attempts should be the default, as it minimizes
non-response bias. However, a large finite maximum might be appropriate if it sharply increases
the expected number of respondents with minimal impact on the expected response
rate.
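The quantities needed for such plots can be tabulated with a short sketch for non-uniform outcome probabilities. It assumes simple random sampling without replacement, so that the expected follow-up cost is the sample size times the average expected cost per unit, and that each unit is called until resolved or until the maximum number of call attempts is reached. The two-group probabilities below are hypothetical and are not the values behind Table 4.3; the function names are also hypothetical.

```python
import math

def unit_expectations(p_r, p_f, costs, max_attempts):
    """Expected cost, P(response) and P(resolved) for one unit called until
    resolved or until max_attempts calls (requires p_r + p_f > 0)."""
    c_r, c_f, c_s = costs
    p_s = 1.0 - p_r - p_f
    tail = 0.0 if math.isinf(max_attempts) else p_s ** max_attempts
    calls = (1.0 - tail) / (p_r + p_f)
    cost = (p_r * c_r + p_f * c_f + p_s * c_s) * calls
    return cost, p_r * calls, 1.0 - tail

def srs_tradeoff(units, budget, costs, caps):
    """For each cap: (cap, n, expected response rate, expected respondents,
    expected resolved units) under SRS without replacement from `units`."""
    N = len(units)
    rows = []
    for k in caps:
        q = [unit_expectations(p_r, p_f, costs, k) for p_r, p_f in units]
        mean_cost = sum(c for c, _, _ in q) / N
        rate = sum(r for _, r, _ in q) / N
        resolved = sum(v for _, _, v in q) / N
        n = min(N, budget / mean_cost)   # each unit has inclusion prob n / N
        rows.append((k, n, rate, n * rate, n * resolved))
    return rows

# Hypothetical two-group population with the same average probabilities
# (0.25 and 0.05) as the uniform scenario.
groups = [(0.40, 0.08)] * 594 + [(0.10, 0.02)] * 594
for k, n, rate, resp, res in srs_tradeoff(groups, 3000, (5, 2, 1), [math.inf, 20, 10, 5, 2]):
    print(k, round(n), f"{100 * rate:.1f}%", round(resp), round(res))
```

Plotting the third and fourth columns against the cap shows directly how much expected response rate is given up for each gain in expected respondents.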