Science and survey management
Section 2. Responsive and adaptive design

Responsive and adaptive designs refer to a family of methods for tailoring field work to reduce bias, variance, or cost (see Chun, Heeringa and Schouten (2018); Schouten, Peytchev and Wagner (2017); and Tourangeau, Brick, Lohr and Li (2017), for reviews). With responsive designs, researchers use multiple phases of data collection to reduce survey costs or errors. Adaptive designs use various forms of case prioritization, tailoring, and rules for stopping data collection to achieve similar goals.

Groves and Heeringa (2006) got this particular ball rolling with their description of responsive designs:

Responsive designs are organized about design phases. A design phase is a time period of a data collection during which the same set of sampling frame, mode of data collection, sample design, recruitment protocols, and measurement conditions are extant. For example, a survey may start with a mail questionnaire attempt in the first phase, follow it with a telephone interview phase on non-respondents to the first phase and then have a final third phase of face-to-face interviewing. … Note that this use of “phase” includes more design features than merely the sample design, which are common to the term “multi-phase sampling”. (Pages 440-441)

Of course, the American Community Survey had been using a three-phase design of exactly this kind (mail, followed by telephone follow-up, followed by face-to-face follow-up with a subsample of the remaining cases) for years before Groves and Heeringa dubbed such designs “responsive” (U.S. Census Bureau, 2014).

Groves and Heeringa cite several surveys that used responsive designs but focus mainly on Cycle 6 of the National Survey of Family Growth (NSFG). Most of the surveys they discuss, including Cycle 6 of the NSFG, applied two-phase sampling (that is, they selected a subsample of the nonrespondents remaining at a certain point in the field period and restricted further follow-up to this subsample) and offered larger incentives or made other changes to the data collection protocol for these final-phase cases. The real innovation in the NSFG was not its use of multiple phases of sampling (which had been around since Hansen and Hurwitz (1946)) or multiple modes of data collection (in fact, in the NSFG, all the cases were interviewed face-to-face) but its application of paradata and real-time propensity modeling to guide the field work. The subsampling of nonrespondents in Cycle 6 of the NSFG was based on propensity models that were updated frequently and that incorporated information gleaned from prior contacts with the sample case. In the final phase of Cycle 6 of the NSFG, data collection was restricted to certain sample areas, with areas that had larger numbers of active cases, or cases with relatively high estimated propensities, more likely to be retained for further follow-up field work.

Another difference between responsive designs and more traditional multi-phase designs, at least conceptually, is the notion of phase capacity. Groves and Heeringa argue that a given phase of data collection approaches a limit in its ability to change the survey estimates (and reduce any biases). Once it reaches this capacity limit, a change in protocol may be needed to improve the representativeness of the sample and reduce bias. Ideally, the later phases of data collection bring in different types of respondents from the earlier phases, reducing any remaining nonresponse biases. The people inclined to respond by mail may differ from those who respond to a face-to-face interview; larger incentives may help recruit those who are not interested in the topic (Groves, Singer and Corning, 2000). In the best case, the different phases of data collection are complementary and, together, create a more representative sample than any of the individual phases would.

2.1    Case prioritization and related strategies

Cycle 6 of the NSFG is an early example of a strategy known as case prioritization ‒ deliberately allocating more effort to some sample cases than to others. Of course, survey managers have always given priority to some cases over others. Interviewers are instructed to make sure they keep appointments, for example, or to set “soft” refusal cases aside for a while. What is different about the recent uses of case prioritization is that they are based not on a case’s disposition but on models of the case’s response propensity. In Cycle 6 of the NSFG, a probability subsample of cases was kept for further work, with the second-phase sampling probabilities based partly on the predicted propensities of the remaining cases. Later efforts have been explicit in their use of response propensities to guide the field work.
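
To make the propensity-based approach concrete, here is a minimal sketch (in Python) of how active cases might be scored with a response propensity model and flagged for priority follow-up. All of the paradata fields, outcomes, and the rule of flagging the lowest-propensity quartile are invented for illustration; none of them come from the NSFG or any actual survey system.

    # Illustrative sketch of propensity-based case prioritization.
    # The paradata fields and outcomes below are simulated, not real survey data.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)

    # Simulated paradata for 1,000 active cases (hypothetical fields)
    cases = pd.DataFrame({
        "prior_attempts": rng.poisson(4, 1000),
        "ever_contacted": rng.integers(0, 2, 1000),
        "urban": rng.integers(0, 2, 1000),
    })
    # Simulated historical outcome used to train the model (1 = completed interview)
    completed = (rng.random(1000) < 0.2 + 0.3 * cases["ever_contacted"]).astype(int)

    model = LogisticRegression().fit(cases, completed)
    cases["propensity"] = model.predict_proba(cases)[:, 1]

    # Prioritize the hardest cases: flag the lowest-propensity quartile for extra effort
    cutoff = cases["propensity"].quantile(0.25)
    cases["priority"] = cases["propensity"] <= cutoff
    print(cases["priority"].mean())   # roughly a quarter of the cases are flagged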

Depending on which cases are prioritized, case prioritization can serve a variety of goals. For example, focusing field work on cases with high response propensities may maximize the final sample size or reduce the costs per case. Beaumont, Bocci and Haziza (2014) distinguish three potential goals for such designs:

  1. Minimizing variance;
  2. Minimizing nonresponse bias or some proxy for it, such as sample imbalance (Särndal, 2011; see also Schouten, Cobben and Bethlehem, 2009); or
  3. Maximizing response rates.

The first and third goals are related in that maximizing response rates tends to produce larger samples and, as a result, lower sample variances. Although some researchers have begun looking at the use of such designs to reduce measurement errors (Calinescu, Bhulai and Schouten, 2013), most efforts to date have been attempts to reduce nonresponse bias or costs.

With Cycle 6 of the NSFG, it is not completely clear what the statistical goal was. Oversampling areas with larger numbers of remaining cases and those with higher-propensity cases would tend to maximize the final sample size and reduce costs per case. Consistent with this, Groves, Benson, Mosher, Rosenbaum, Granda, Axinn, Lepkowski and Chandra (2005) noted that “this design option placed large emphasis on the cost efficiency of the … [final] phase design to produce interviews, not on minimizing standard errors of the resulting data set”. However, Groves et al. (2005) also said that the final phase of data collection was intended to produce a “more representative” sample (page 38) by altering the data collection protocol to appeal to sample members who had failed to respond earlier. Yet targeting areas with more high-propensity cases ‒ that is, the cases predicted to be easiest to get ‒ might actually exacerbate any problems with representativeness by bringing in additional respondents similar to those who had already responded.

Most later applications of case prioritization have taken the opposite tack, attempting to equalize the overall response propensities by focusing the field effort on the hardest cases. To see why this is a reasonable strategy, it is useful to take a closer look at the mathematics of nonresponse bias.

2.2    Factors affecting nonresponse bias

Under a stochastic perspective (e.g., Bethlehem, 1988), the bias of the unadjusted estimator of a mean or proportion $(\hat{\bar{y}})$ can be expressed as

\[
\operatorname{Bias}(\hat{\bar{y}}) \;\approx\; \frac{\sigma_\phi \, \sigma_y \, \rho_{\phi, y}}{\bar{\phi}}, \qquad (2.1)
\]

where $\bar{\phi}$ and $\sigma_\phi$ are the mean and standard deviation of the response propensities, $\sigma_y$ is the standard deviation of a survey variable, and $\rho_{\phi, y}$ is the correlation between the response propensities and that survey variable. As (2.1) clearly demonstrates, both the overall response rate $(\bar{\phi})$ and the variation in the response rates $(\sigma_\phi)$ play a role in the bias, so that trying to maximize the response rates (e.g., by prioritizing the relatively easy cases) or to equalize the response propensities (by prioritizing the harder cases) are both reasonable things to do.
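
A small numerical illustration of (2.1) may be helpful. In the sketch below (Python), the propensities and the survey variable are simulated; the point is simply that the propensity-weighted respondent mean departs from the full-sample mean by essentially the amount the right-hand side of the formula gives.

    # Numerical illustration of the approximation in (2.1); all values are simulated.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    phi = rng.beta(2, 3, n)                      # response propensities
    y = 10 + 5 * phi + rng.normal(0, 2, n)       # survey variable correlated with phi

    # Right-hand side of (2.1)
    rho = np.corrcoef(phi, y)[0, 1]
    approx_bias = phi.std() * y.std() * rho / phi.mean()

    # Direct check: expected respondent mean (propensity-weighted) minus full-sample mean
    direct_bias = np.average(y, weights=phi) - y.mean()

    print(approx_bias, direct_bias)              # the two quantities agree closely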

As a number of researchers have pointed out, nonresponse bias is a property of a survey estimate, not of a survey, and, as (2.1) makes explicit, two variable-level properties also affect the bias ‒ the correlation between the survey variable and the response propensities $(\rho_{\phi, y})$ and the variability of the survey variable $(\sigma_y)$, both of which vary from one survey variable to the next. Given that two of the ingredients in the bias expression are study-level factors and two are variable-level, the question arises how much of the variation in nonresponse bias is between surveys and how much is within surveys.

Brick and I (Brick and Tourangeau, 2017) attempted to address this issue by reanalyzing data from a study done by Groves and Peytcheva (2008). They examined 959 nonresponse bias estimates from 59 studies. Eight hundred and four of these bias estimates involved proportions; almost all the others were means. (Four of the estimates seemed problematic to us, so we dropped them from our reanalysis.) Like Groves and Peytcheva, we examined the absolute relative bias statistic (absolute relbias), or the absolute difference between the respondent estimate and the full sample estimate divided by the full sample estimate:

\[
R_i \;=\; \frac{\left| \theta_{ri} - \theta_{ni} \right|}{\theta_{ni}}, \qquad (2.2)
\]

in which $R_i$ is the absolute relbias for statistic $i$, $\theta_{ri}$ is the estimated value for that statistic based on the respondents, and $\theta_{ni}$ is the corresponding full sample estimate. The absolute relbias is useful in that it puts all the bias estimates on the same metric: the percentage by which the estimate is off. Our reanalysis also examined the absolute differences (the numerator in (2.2)) for the estimated proportions.
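
A tiny worked example of (2.2), with hypothetical numbers:

    # Worked example of the absolute relbias in (2.2), using invented estimates
    def absolute_relbias(theta_r, theta_n):
        return abs(theta_r - theta_n) / theta_n

    # A proportion of 0.42 among respondents versus 0.40 in the full sample
    print(absolute_relbias(0.42, 0.40))   # 0.05: the respondent estimate is off by 5 percent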

Table 2.1 displays various statistics from the reanalysis. For example, we calculated the correlation between the individual bias estimates and the study-level response rates; these results are shown in the top panel of the table. The middle three panels of the table show what happens when the average bias from the study is used in place of the individual bias estimates. Some of the correlations based on study-level averages are considerably higher than those based on the individual estimates, particularly when the data are weighted by the number of estimates from each study ($r$’s of 0.40 to 0.55). The bottom two panels of the table show that there is a substantial study-level component to the nonresponse bias. For example, the $R^2$ estimates from a one-way ANOVA indicate that the between-study component accounts for 21 to 40 percent of the overall variation in the nonresponse bias estimates. The results from multi-level models lead to similar conclusions. This between-study component of the bias presumably reflects two main variables ‒ the mean response propensity (reflected in the overall response rate) and the variation across respondents in the response propensities.
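
For readers who want to see how such a between-study share is obtained, the sketch below (Python) computes the one-way ANOVA $R^2$ (between-study sum of squares over total sum of squares) on simulated bias estimates. The data are invented; they are not the Groves and Peytcheva (2008) estimates.

    # Sketch: between-study share of variance in bias estimates, via one-way ANOVA R^2.
    # The bias estimates here are simulated purely for illustration.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n_studies, est_per_study = 59, 16

    study = np.repeat(np.arange(n_studies), est_per_study)
    study_effect = rng.normal(0, 0.05, n_studies)[study]       # study-level component
    relbias = np.abs(study_effect + rng.normal(0, 0.08, study.size))

    df = pd.DataFrame({"study": study, "relbias": relbias})
    grand_mean = df["relbias"].mean()
    ss_total = ((df["relbias"] - grand_mean) ** 2).sum()
    ss_between = (df.groupby("study")["relbias"]
                    .apply(lambda x: len(x) * (x.mean() - grand_mean) ** 2)
                    .sum())
    print(ss_between / ss_total)   # share of variation attributable to studies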


Table 2.1
Relationship between response rates and bias measures at the estimate and study level

                                                            All statistics      Proportions only
Estimate-level correlations
  Response rate and absolute relbias                        -0.191 (n = 955)    -0.256 (n = 802)
  Response rate and absolute difference                     -                   -0.323 (n = 802)
Unweighted study-level correlations
  Response rate and mean absolute relbias                   -0.255 (n = 57)     -0.315 (n = 43)
  Response rate and mean absolute difference                -                   -0.246 (n = 43)
Study-level correlations weighted by number of estimates
  Response rate and mean absolute relbias                   -0.402 (n = 57)     -0.552 (n = 43)
  Response rate and mean absolute difference                -                   -0.508 (n = 43)
Study-level correlations weighted by mean sample size
  Response rate and mean absolute relbias                   -0.413 (n = 57)     -0.247 (n = 43)
  Response rate and mean absolute difference                -                   -0.208 (n = 43)
Estimate-level ICCs from multilevel model
  Absolute relbiases                                        0.164 (n = 955)     0.161 (n = 802)
  Absolute differences                                      -                   0.509 (n = 802)
Estimate-level R² from one-way ANOVA
  Absolute relbiases                                        0.221 (n = 955)     0.211 (n = 802)
  Absolute differences                                      -                   0.395 (n = 802)

Note: Dashes indicate statistics computed for the estimated proportions only.

The results in Table 2.1 are important because responsive and adaptive designs work primarily at the study level. For example, case prioritization generally either increases the overall response propensities or reduces the variation in the propensities, and these are the two main study-level variables affecting the level of nonresponse bias. In addition, if a design succeeds in reducing the overall variation in the response propensities, this will tend to attenuate the correlations between the propensities and the survey variables across the board. At the extreme, if there is no variation in the response propensities, the correlation with all the survey variables will be zero and there won’t be any nonresponse bias. The results in Table 2.1 seem to contradict the view that response rates don’t matter. Nonresponse rates are clearly an imperfect proxy for nonresponse bias, but they are an important predictor of the average level of bias in the estimates from a survey.

2.3    Experimental evaluations of responsive and adaptive designs

How well do responsive and adaptive designs achieve their goals? At the outset, I should note that our expectations shouldn’t be too high. As we noted in an earlier paper (Tourangeau et al., 2017, page 208), these designs “represent an attempt to do more with less or at least to do as much as possible with less” in an increasingly unfavorable survey environment. To date, studies have used four basic strategies to achieve one or more statistical goals ‒ multi-phase designs (like the one described by Groves and Heeringa, 2006), other types of case prioritization (in which different cases are slated to receive different levels of effort), adaptive contact strategies (changing the timing of contact attempts, based on propensity models, to maximize the chances of making contact), and tailoring of the field work or mode of data collection based on what is known about the cases before they are fielded. I briefly review some of the major efforts to evaluate each of these approaches.

Multi-phase designs and case prioritization. Peytchev, Baxter and Carley-Baxter (2009) report another study that, like Cycle 6 of the NSFG, employed a multi-phase design. They conducted a telephone study with two phases. The second phase used a much shorter questionnaire and offered a larger incentive than the first. Cases received up to twenty calls during Phase 1, with some cases getting even more. Overall, this phase produced a response rate of 28.5 percent. In Phase 2, the researchers subsampled the remaining nonrespondents, shortened the questionnaire from 30 to 14 minutes, gave a prepaid incentive of $5, and offered a conditional incentive of $20. (Phase 1 had offered only conditional incentives.) Phase 2 produced a response rate of 9.8 percent (or 35.5 percent overall). The evaluation of the design was based on two sets of comparisons: Peytchev and his colleagues compared early and late respondents within Phase 1, and they compared Phase 1 respondents to Phase 2 respondents. They reasoned that the late respondents from Phase 1 (interviewed after at least six call attempts) were unlikely to differ on the key study variables ‒ reported crime victimizations of various sorts ‒ from the early respondents (interviewed in five or fewer attempts) because they were recruited via the same protocol. The results indicated that the addition of the late Phase 1 respondents did not significantly change the estimates. In contrast, the authors believed the Phase 2 respondents were likely to differ from the Phase 1 respondents, because the changes in protocol would attract different types of respondents. There was some support for this line of argument for males. The Phase 1 male respondents were more likely to report victimizations than the Phase 2 male respondents, with significant differences on four of six victimization rates. However, there was less evidence that the change in protocol in Phase 2 affected the estimates for females. In addition, even within the Phase 1 sample, there were differences between male cases who never refused and those who were converted after refusing. Like the Phase 2 male respondents, the converted Phase 1 male refusals also showed significantly lower victimization rates on four of six key estimates. This suggests that the refusal conversion protocols changed the make-up of the Phase 1 sample and did not just bring in more of the same type of respondents.

Peytchev, Riley, Rosen, Murphy and Lindblad (2010) report a study that tailored the data collection protocol for different groups of cases from the outset. Their study involved a panel survey, and the response propensity for each case was estimated using information from the prior round. Cases with low predicted response propensities were randomly assigned to an experimental or control treatment. For most of the data collection period, interviewers got a $10 bonus for each completed interview with one of the control cases, but $20 for each completed interview with one of the experimental cases. (During Phase 1, there was no bonus for control interviews and a $10 bonus for experimental interviews.) There was little difference in the final response rates for the two groups of cases (89.8 percent for the control cases versus 90.8 percent for the experimental cases) or in the average number of contact attempts per case (5.0 for the controls versus 4.9 for the experimental cases). Although the variance in the estimated response propensities was lower among the experimental cases, the estimated nonresponse biases (based on the correlations between the survey variables and the fitted response propensities) were higher.

Another set of experiments illustrates some of the practical difficulties with case prioritization. Wagner, West, Kirgis, Lepkowski, Axinn and Kruger Ndiaye (2012; see also Lepkowski, Mosher, Groves, West, Wagner and Gu (2013)) carried out 16 experiments over the course of Cycle 7 of the NSFG, which fielded 20 quarterly samples. The experiments examined the effectiveness of “assigning a random subset of active cases with specific characteristics to receive higher priority from the interviewers… The first objective of these experiments was to determine whether interviewers would respond to a request to prioritize particular cases” (Wagner et al., 2012, page 482). In only seven of the 16 experiments did the priority cases actually receive significantly more calls than the control cases, and only twice did this lead to a significant increase in response rates for the priority cases. Additional experiments attempted to shift the effort of NSFG interviewers from trying to complete main interviews to trying to complete screeners during one week of the field period. This intervention did lead to more screener calls than in prior or later weeks, but the impact on the number of completed screeners varied across quarters. In both cases, the efforts at case prioritization in Cycle 7 of the NSFG had some impact on what the interviewers did, but less impact on the intended survey outcomes, such as response rates.

Statistics Canada has also begun implementing responsive designs for its CATI surveys and carried out two experiments assessing these designs. Both experiments used three phases of data collection with case prioritization in one phase (Laflamme and Karaganis, 2010; Laflamme and St-Jean, 2011). In Phase 1, cases were categorized by response propensities; in Phase 2, cases were randomly assigned either to the responsive collection condition (in which cases were assigned priorities and the high priority cases got more calls) or the control condition; and in Phase 3, all remaining cases got the same treatment. In Phase 2, the priority cases in the responsive collection group were apparently those with high predicted response propensities. The goal in Phase 3 was to equalize response propensities across key subgroups. Once again, the results indicated modest effects. The overall response rates were essentially unaffected by case prioritization. In one survey, the response rates were 74.0 percent for the control group versus 74.1 for the responsive collection group; in the other, the control group had a slightly higher response rate (73.0 versus 72.8 percent). This is a little surprising since the responsive collection targeted the easier cases in Phase 2. In addition, neither the new three-phase design nor the responsive collection protocol had a clear effect on the representativeness of the samples, but may have decreased the number of interviewer hours (see Table 2.2 in Laflamme and St-Jean (2011)). Still, reducing costs without reducing representativeness may represent a worthwhile, if modest, advance.

Adaptive contact strategies. Can survey managers improve the rate at which sample members are contacted by modelling the best time to contact them? Although many papers have explored optimal times for contacting sample members in surveys, few have examined whether these “optimal” call schedules produce gains empirically. Wagner (2013) is an exception. He reported five experiments that used models to predict whether a given sample household would be contacted on the next call attempt in each of four call “windows” (e.g., Tuesday through Thursday from 4 p.m. to 9 p.m.). Similar models were used in telephone (the Survey of Consumer Attitudes, or SCA) and face-to-face (Cycle 7 of the NSFG) surveys. The models were used to identify the best call window (the one with the highest probability of a contact) for each sample household. In the experimental groups, cases were moved to the top of the list for calling in that window (in the SCA) or field interviewers received that window as the recommended time to contact the household (in the NSFG).
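
The basic logic of such a call-window model can be sketched as follows (Python): fit one contact model per window on historical call records and recommend, for each active case, the window with the highest predicted contact probability. The windows, features, and call records here are invented; the actual SCA and NSFG models were considerably richer.

    # Simplified sketch of an adaptive contact strategy: one contact model per call
    # window, with the recommended window being the one with the highest predicted
    # contact probability. All data and window labels below are simulated.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(7)
    windows = ["weekday_day", "weekday_eve", "saturday", "sunday"]

    # Simulated historical call records: case features plus the window attempted
    X_hist = rng.normal(size=(2000, 3))       # e.g., household characteristics
    win_hist = rng.integers(0, 4, 2000)       # window of each past attempt
    contacted = rng.integers(0, 2, 2000)      # 1 = the attempt produced a contact

    models = {w: LogisticRegression().fit(X_hist[win_hist == i], contacted[win_hist == i])
              for i, w in enumerate(windows)}

    # Recommend a window for each active case
    X_active = rng.normal(size=(5, 3))
    pred = np.column_stack([models[w].predict_proba(X_active)[:, 1] for w in windows])
    recommended = [windows[i] for i in pred.argmax(axis=1)]
    print(recommended)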

Three experiments involved the SCA. In the first, the proportion of calls producing a contact was higher for the experimental cases than for the controls (12.0 percent versus 9.9 percent), but the strategy seemed to backfire for cases who had initially refused, with lower contact rates among the initial refusals in the experimental group. A second experiment varied the call window for experimental cases after an initial refusal, but this strategy lowered the overall proportion of calls producing a contact. The final SCA experiment again found that the contact rate for refusal conversion calls was lower in the experimental group than in the control group. The results in the NSFG were also somewhat disappointing. The field interviewers apparently ignored the recommended call windows; only 23.6 percent of the experimental cases were contacted in the recommended window (versus 23.0 percent in the control group). We had a similar experience in our effort to get interviewers to follow an optimal route in their trips to the field (see Section 3.1 below).

Tailored field work. Luiten and Schouten (2013) report an experiment that tailored the data collection approach to different subgroups in the Dutch Survey of Consumer Sentiments (SCS). The goal was to equalize response propensities across the subgroups. The SCS consists of repeated cross-sectional surveys and, using data from earlier rounds, Luiten and Schouten fit contact and cooperation propensity models based on demographic characteristics of the sample members; these variables were available for the entire sample from the population registry. There were two phases of data collection. In the initial phase, cases with the lowest estimated cooperation propensities were sent a mail questionnaire; those with the highest estimated propensities were invited to complete a web survey; and those in the middle were given a choice between mail and web. The second phase consisted of following up nonrespondents by telephone. Cases in different contact propensity quartiles were assigned to different call schedules. Those with the highest estimated contact propensities were fielded later in the field period and called during the day; those in the second highest quartile were called twice at night and then switched to a schedule alternating daytime and nighttime calls; and those in the lowest two contact propensity quartiles were called on every shift of every day. Finally, the best telephone interviewers were assigned to the cases with the lowest estimated cooperation propensities and the worst telephone interviewers were assigned to the cases with the highest estimated cooperation propensities. The control group for the experiment was the regular SCS, which is a CATI-only survey.

Although the adaptive field work group had only a slightly higher response rate than the regular SCS (63.8 percent versus 62.8 percent, a non-significant difference), the representativeness of the experimental sample, as measured by the R-indicator, was significantly higher than that of the control sample. (The R-indicator, introduced by Schouten, Cobben and Bethlehem (2009), is based on the variation in the estimated response propensities. A higher number indicates less variation and therefore a more representative sample.) Table 2.2 below shows that the adaptive field work did lower the variation in both contact and cooperation rates. Across contact propensity quartiles, the contact rates ranged from 84.2 percent to 96.9 percent in the regular SCS; in the experimental sample, the range was from 87.1 to 95.3. The adaptive design also lowered variation in the cooperation rates. Still, the costs for the adaptive design were marginally higher than those of the SCS and the overall cooperation rate was significantly lower in the experimental sample. Unfortunately, as this study illustrates, reducing the variability in the response propensities often means not trying as hard to get the easiest cases and this may lower the overall response rate.
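
For reference, a minimal (unweighted) version of the R-indicator is one minus twice the standard deviation of the estimated response propensities. The sketch below (Python) uses simulated propensities simply to show that less propensity variation yields a value closer to one; real applications estimate the propensities from a model and typically use survey weights.

    # Minimal sketch of the R-indicator of Schouten, Cobben and Bethlehem (2009):
    # R = 1 - 2 * S(propensities), where S is the standard deviation of the
    # estimated response propensities (unweighted here, for illustration only).
    import numpy as np

    def r_indicator(propensities):
        return 1 - 2 * np.std(propensities)

    uniform_design = np.full(1000, 0.6)                                   # no variation
    varied_design = np.clip(np.random.default_rng(3).normal(0.6, 0.15, 1000), 0, 1)

    print(r_indicator(uniform_design))   # 1.0: maximally "representative"
    print(r_indicator(varied_design))    # lower, reflecting more propensity variation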


Table 2.2
Contact and cooperation rates, by propensity quartile groups

Contact propensity quartile                  Contact rates
                                             Experimental    Control
  Lowest contact propensity                  87.1            84.2
  Second lowest contact propensity           96.6            94.5
  Second highest contact propensity          93.7            95.7
  Highest contact propensity                 95.3            96.9

Cooperation propensity quartile              Cooperation rates
                                             Experimental    Control
  Lowest cooperation propensity              65.1            62.7
  Second lowest cooperation propensity       71.4            68.4
  Second highest cooperation propensity      72.8            75.3
  Highest cooperation propensity             74.7            79.2

2.4    Simulation studies

Besides the experiments discussed in the previous section, three additional studies have used simulations to explore the properties of responsive and adaptive designs.

Stopping rules. Lundquist and Särndal (2013) used data from the 2009 Swedish Living Conditions Survey (LCS) to explore the impact of various “stopping rules”, rules for ending data collection. The LCS follows a two-phase data collection strategy, with up to 20 telephone contact attempts in the first phase of data collection followed by ten more in the second phase. They noted that continuing to follow the same data collection protocol “will produce very little change in the estimates beyond a certain ‘stability point’ reached quite early in the data collection” (page 561). This is quite similar to Groves and Heeringa’s (2006) notion of “phase capacity”, or the point at which a given data collection protocol begins to achieve diminishing (or vanishing) returns. Sturgis, Williams, Brunton-Smith and Moore (2017) present results suggesting that this stability point may be reached quite early during the field period. They examined estimates derived from 541 questions from six face-to-face surveys in the U.K. They found that the expected proportions were, on average, off by only 1.6 percent from the final estimate after a single contact attempt and by only 0.4 percent after five attempts. These results suggest that, from the vantage point of reducing bias, a lot of field effort is wasted.

Lundquist and Särndal show that the estimated nonresponse bias (based on three variables available for both respondents and nonrespondents from the Swedish population register) in the LCS was lowest after five to ten call attempts and actually got progressively worse thereafter. The second phase of data collection, which increased the response rate from 60.4 percent to 67.4 percent, made the nonresponse biases worse for two of the three register variables. They examined three alternatives to continuing the same protocol up to 30 attempts. They divided the sample into eight subgroups based on education, property ownership, and national origin. Under the first alternative, response rates for each of the eight subgroups would be checked at call 12 of the initial phase of data collection and again at call 2 of the second phase; data collection would end for subgroups with response rates of 65 percent or better at these points. This strategy would have yielded a lower response rate (63.9 percent) than the actual protocol but a sample that was more closely aligned with the population on eight demographic characteristics. The second alternative they examined would have ended data collection for a subgroup as soon as its response rate reached 60 percent, and the third alternative, as soon as the subgroup response rate reached 50 percent. The 50 percent strategy would have produced the most balanced sample of all and would have reduced the total number of call attempts by more than a third. In part, this strategy worked so well because it would have lowered the response rates in the high propensity subgroups so that they were closer to those in the low propensity subgroups. As in the study by Peytchev and colleagues (Peytchev et al., 2009), continuing with the same data collection protocol seemed to do little to improve the representativeness of the sample, and may in fact have reduced it.
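
The logic of such a subgroup stopping rule is simple enough to sketch directly (Python). The subgroup labels, counts, and the 50 percent threshold below are invented for illustration; they are not the values Lundquist and Särndal used.

    # Sketch of a subgroup stopping rule: stop data collection for any subgroup
    # whose response rate has reached a threshold. All values here are invented.
    respondents = {"own_low_ed": 310, "own_high_ed": 520, "rent_low_ed": 180, "rent_high_ed": 260}
    sampled =     {"own_low_ed": 600, "own_high_ed": 800, "rent_low_ed": 450, "rent_high_ed": 500}

    THRESHOLD = 0.50   # e.g., the 50 percent rule that produced the most balanced sample

    def still_active(group):
        """Return True if the subgroup should stay in the field."""
        return respondents[group] / sampled[group] < THRESHOLD

    for g in sampled:
        rate = respondents[g] / sampled[g]
        status = "continue" if still_active(g) else "stop"
        print(f"{g}: response rate {rate:.2f} -> {status}")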

In a related effort, in 2017, the Medical Expenditure Panel Survey (MEPS) used a stopping rule based on a propensity model. MEPS is a rotating panel study. Each year, a new panel of about 10,000 addresses is selected from a sample of households that completed the National Health Interview Survey the previous year. Sample households are asked to complete two MEPS interviews in their first year, two more in the second, and a fifth in the third year. The survey is continuous, with interviews conducted throughout the year. The stopping rules were applied in two stages in the first half of 2017: first to cases in their third round (a relatively soft start, since most Round 3 interviews were scheduled by telephone and most respondents were cooperative, having already participated twice), and then to Round 1 cases. Interviewers are often reluctant to comply with directions to stop contacting a case after a specific number of attempts. The MEPS approach was to remove low propensity cases with too many attempts ‒ generally six ‒ from the interviewer assignment and have a supervisor review them. Supervisors could move a case back into the interviewer’s assignment if there was some reason to believe the case might be completed, but most of the time these cases were closed out (Hubbard, 2018). Overall, implementing the stopping rule reduced the number of in-person attempts by 8,500, producing a large saving in field costs.

Different case prioritization strategies. In a later paper, Särndal and Lundquist (2014b) simulated the effects of two methods for equalizing response propensities across cases, using data from the Living Conditions Survey and the Party Preference Survey. Under the first method (the threshold method), no further follow-up attempts are made to cases whose response propensities have reached some threshold (lower than the overall target response rate). This is similar to the strategies examined in their earlier paper (Lundquist and Särndal, 2013). Under the other method (the equal proportions method), at various points during the field period (e.g., after three, six, or nine call attempts), the portion of the sample with the highest response propensities is set aside and field work continues only for the remaining cases. In both surveys, both methods for equalizing the response propensities reduced the distance between the respondents and the full sample on a set of auxiliary variables, as compared to continuing to field all remaining nonrespondents, as was done in the actual surveys. Another conclusion from this study is that calibrating the sample using the auxiliary variables removed some of the nonresponse bias, but that bias was reduced even further when the set of respondents was more closely aligned with the population in the first place. This is an important finding, since the same variables available for fitting propensity models are also available for post-survey adjustments, and it is not clear whether equalizing response rates (or response propensities) during data collection is more effective than simply adjusting the case weights afterwards. Särndal and Lundquist (2014b) find gains for both.
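
The “equal proportions” idea can likewise be sketched in a few lines (Python). The propensities and the 20 percent slice set aside at the checkpoint are assumptions made purely for illustration, not the values Särndal and Lundquist used.

    # Sketch of the "equal proportions" idea: at a checkpoint, set aside the
    # highest-propensity slice of the remaining nonrespondents and keep working
    # the rest. The propensities and the 20 percent cutoff are invented.
    import numpy as np

    rng = np.random.default_rng(11)
    propensities = rng.beta(2, 2, 500)         # estimated propensities of remaining nonrespondents

    cutoff = np.quantile(propensities, 0.80)   # top 20 percent set aside at this checkpoint
    keep_working = propensities < cutoff

    print(keep_working.sum(), "cases remain in active follow-up")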

Beaumont, Bocci and Haziza (2014) report another simulation study that examines the impact of case prioritization. They contrasted four strategies: 1) constant effort (no case prioritization); 2) optimal effort (reducing calls to members of groups approaching their target response rate); 3) equalizing response rates across groups (concentrating calls on low response propensity groups); and 4) maximizing the overall response rate (concentrating calls on high propensity groups). The simulations by Beaumont and his colleagues assumed three different scenarios ‒ uniform response propensities, uniform response propensities within groups, and response propensities that are highly correlated $(r = 0.67)$ with the survey variable of interest. (In addition, the simulation assumed that the sample consisted of three subgroups, that calls yielding an interview were 25 times more expensive than ones that didn’t, that calls to a case were capped at 25, and that the survey had a fixed data collection budget.)

The simulations supported three major conclusions. First, when response propensities were constant overall or constant within each group, all the effort strategies produced unbiased estimates, but when the propensities were strongly related to the survey variable, all of them produced biased estimates. Second, neither the R-indicator nor the nonresponse rate was a good indicator of nonresponse bias or nonresponse variance. Finally, when response propensities were known, the optimal effort strategy produced somewhat lower root mean square error than the other strategies (see Table 2.2 in Beaumont et al. (2014)), and the strategy that attempted to maximize response rates produced the highest. The optimal effort strategy resembles the approaches explored by Lundquist and Särndal (2013). Of course, a practical difficulty is that response propensities are not known in real surveys, and they may not be accurately estimated from the available auxiliary variables.

2.5    Summary

Table 2.3 summarizes the results from the experimental and simulation studies. In general, they show how hard it is to raise response rates in the current environment. For example, only two of the 16 experiments described by Wagner and his colleagues significantly raised response rates in the NSFG (Wagner et al., 2012). Some studies (e.g., Luiten and Schouten, 2013) demonstrate reductions in variation in response rates across subgroups of the sample, although in one study (Peytchev et al., 2010) this apparent reduction in the variation in estimated response propensities appeared to increase nonresponse bias rather than reduce it. Laflamme and St-Jean (2011) reported that responsive design reduced costs relative to the standard protocol, but Luiten and Schouten (2013) reported that an adaptive design increased the costs per case. Across all the studies (including Cycle 6 of the NSFG), then, responsive and adaptive designs appeared to produce some gains in sample representativeness, but had little effect on overall response rates or overall costs.

Several non-experimental studies come to similar conclusions. These studies compare the final survey estimates with those that would have been obtained without the final phase of data collection, when a major change in the data collection protocol was introduced. For example, Groves and his colleagues (Groves et al., 2005) showed that the final phase of data collection in Cycle 6 of the NSFG, which boosted the overall response rate from 64 to 80 percent, also decreased variation in the response rates across subgroups (see also Axinn, Link and Groves, 2011). This is similar to the experimental results reported by Peytchev, Baxter and Carley-Baxter (2009) who found that major changes in protocol (larger incentives and a shorter questionnaire) produced changes in the study estimates, at least for males. However, the changes were generally small ‒ less than two percentage points.


Table 2.3
Selected study characteristics and outcomes, by study

Experimental studies

Peytchev et al. (2010)
  Statistical goal: Equalize response propensities
  Intervention: Bonus for interviewers for completing high-priority cases
  Results:
    • Variance in response propensities lower in experimental group
    • Estimated bias higher in experimental group
    • Response rate 1.5% higher in experimental group

Wagner et al. (2012)
  Statistical goal: Increase response rates, improve representativeness
  Intervention: Case prioritization
  Results:
    • Significantly increased number of calls to priority cases in seven of 16 experiments
    • Significantly increased response rate in two experiments
  Intervention: Screener week
  Results:
    • Increased number of screening calls

Laflamme and St-Jean (2011)
  Statistical goal: Increase response rates (Phase 2), equalize response propensities (Phase 3)
  Intervention: Categorization and prioritization of cases
  Results:
    • Less variance in response propensities in experimental group
    • Response rate 1.5% higher in experimental group

Wagner (2013)
  Statistical goal: Increase contact rate per call
  Intervention: Models used to assign cases to optimal call window
  Results (SCA):
    • Contact rate improved (12.0 vs. 9.9 percent)
    • No change in response rate
  Results (NSFG):
    • Interviewers did not follow recommended call window

Luiten and Schouten (2013)
  Statistical goal: Equalize response propensities
  Interventions: Initial mode (mail versus Web) varied by propensity quartile; hard cases assigned to best telephone interviewers, easiest cases to worst telephone interviewers
  Results:
    • Lower cooperation rate in adaptive group
    • R-indicator significantly improved in adaptive group
    • Reduced variation in contact and cooperation rates in adaptive group
    • No significant difference in costs or response rates

Simulation studies

Lundquist and Särndal (2013)
  Statistical goal: Increase sample balance, reduce nonresponse bias
  Intervention: Stopping data collection for a subgroup once a target rate achieved for that subgroup
  Results:
    • Lowest response rate threshold produced the highest balance
    • Lowest threshold also achieved lowest nonresponse bias (on three registry variables)

Särndal and Lundquist (2014a, b)
  Statistical goal: Increase sample balance, reduce nonresponse bias
  Intervention: 12 stopping rules
  Results:
    • Lowest response rate threshold again produced the highest balance
    • Lowest threshold also achieved lowest nonresponse bias on three registry variables
    • Both balance in data collection and calibration reduce nonresponse bias

Beaumont, Bocci and Haziza (2014)
  Statistical goal: Optimal effort, equalize response rates, maximize overall response rate
  Intervention: Four case prioritization strategies (constant effort, reduce effort for groups approaching target response rate, prioritize low propensity cases, prioritize high propensity cases)
  Results:
    • With uniform response propensities, all four strategies yield unbiased estimates
    • When response propensities strongly related to survey variables, all strategies produce biased estimates
    • With known propensities, optimal strategy yields best root mean square error (RMSE); maximizing the response rate, the worst
