Strategies for subsampling nonrespondents for economic programs
Section 4. Conclusion

In general, the nonresponse follow-up (NRFU) procedures for economic programs conducted by the U.S. Census Bureau follow a calendar schedule. Budget is tied to the fiscal year, and contact strategies are budgeted accordingly. Since economic populations are highly skewed and the statistics of interest are totals, a large fraction of the NRFU budget is allocated to the larger units. The smaller units are believed to be homogeneous, at least in size. However, it is difficult to validate that belief in the absence of collected respondent data. Because the NRFU procedures rely on obtaining response data from the larger units, the response rates from smaller units tend to be much lower. It is quite likely that the realized respondent set is neither “balanced…which means (the selected sample has) the same or almost the same characteristics as the whole population” for selected items (Särndal, 2011) nor “representative… with respect to the sample if the response propensities $\rho_{i}$ are the same for all units in the population” (Schouten et al., 2009). The emphasis on obtaining responses from the larger units at the cost of lower unit response among the smaller units in turn creates a bias in the estimates, as imputed or adjusted values for smaller units resemble the large unit values (Thompson and Washington, 2013).

By limiting the target domain for nonrespondent subsampling to the smaller units, we can reduce this unmeasurable bias. Our allocation method increases the potential for obtaining a balanced and representative sample by targeting the low-responding areas that usually would not receive any special treatment. It can be implemented at any stage of the data collection process and with any sample design, making it quite flexible although not necessarily optimal for specific sample designs and estimators. It is a “safe” approach for a multi-purpose survey, presumably designed to obtain reliable estimates for a variety of items. Moreover, selecting a systematic subsample from a list sorted by a unit measure of size avoids the additional nonresponse bias incurred by focusing NRFU efforts on high response propensity cases (Tourangeau et al., 2016; Beaumont et al., 2014). We acknowledge that the increased variability in design weights and the reduction in response rates are undesirable effects of subsampling. However, these effects can be lessened via the choice of estimator, as demonstrated by our improved results with a ratio estimator. More sophisticated calibration estimators or other collapsed estimators could likewise be considered at the estimation stage.
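As an illustration of the systematic selection described above, the following is a minimal sketch, not the production procedure. It assumes an integer sampling interval `k`, a random start, and a list of nonrespondents carrying a size measure; the function name `systematic_subsample` and the record layout are hypothetical.

```python
import random


def systematic_subsample(units, size_key, k, seed=None):
    """Select a 1-in-k systematic subsample from units sorted by size.

    units    : list of nonrespondent records (hypothetical layout)
    size_key : function returning the unit measure of size for a record
    k        : integer sampling interval (assumed; 1 means take every unit)
    seed     : optional seed so the random start is reproducible
    """
    if k < 1:
        raise ValueError("sampling interval k must be at least 1")
    # Sort by the measure of size so the subsample spans the size range.
    ordered = sorted(units, key=size_key)
    # Random start in [0, k), then take every k-th unit.
    start = random.Random(seed).randrange(k)
    return ordered[start::k]


# Hypothetical nonrespondent list with an illustrative size measure.
nonrespondents = [
    {"id": i, "size": s}
    for i, s in enumerate([50, 3, 20, 7, 11, 2, 90, 35, 5, 14])
]
subsample = systematic_subsample(nonrespondents, lambda u: u["size"], k=2, seed=1)
```

Because the list is sorted by size before selection, every subsample spreads across small and large units rather than concentrating on the cases most likely to respond.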

Without probability subsampling, the contention that the realized respondent set of small businesses remains a probability sample is debatable. Several discussions of the summary report of the AAPOR Task Force on non-probability sampling (Baker, Brick, Bates, Battaglia, Couper, Dever, Gile and Tourangeau, 2013) specifically question whether “a probability sample with less than full coverage and high nonresponse should still be considered a probability sample”. That question is certainly relevant in our studied context, where sampled smaller units truly “opt in” to respond. Selecting a probability subsample of nonrespondents and instructing survey analysts to limit NRFU contact to these cases may limit this phenomenon. In addition, with a probability subsample, one can use accepted quality measures such as sampling error or response rates for evaluation.

All of the results presented for our case study assume that the existing NRFU contact strategies are used with the subsampled designs. However, subsampling nonrespondents without changing the data collection procedure may have minimal tangible benefits besides cost reduction. The reverse is also true: for example, Kirgis and Lepkowski (2013) present improved response data results for targeted small domains obtained with probability samples and revised contact strategies.

Tourangeau et al. (2016) note that “it is not always clear how to intervene to obtain cases, particularly cases with low underlying propensities, to respond”. This is especially relevant in the business survey context. Business surveys can draw on a wealth of cognitive research on data collection strategies for large companies: see Paxson, Dillman and Tarnai, 1995; Tuttle, Morrison and Willimack, 2010; Willimack and Nichols, 2010; Snijkers, Haraldsen, Jones and Willimack, 2013. In contrast, the smaller businesses receive very little personal contact (if any) and there is limited cognitive research on preferable contact strategies to draw upon. That said, the literature suggests that there are differences in collected data quality between large and small businesses: see Thompson and Washington (2013), Willimack and Nichols (2010), Bavdaž (2010), Torres van Grinsven, Bolko and Bavdaž (2014), and Thompson, Oliver and Beck (2015). Additional cognitive research for small establishments combined with field tests could yield better contact strategies. Subsampling nonrespondents paired with a new contact strategy for these “hard to reach” establishments would create a truly adaptive approach for all units, not just the larger ones. To this point, in response to these presented analyses, the Census Bureau conducted an embedded field experiment to test alternative NRFU strategies for selected small units in the 2014 ASM (Thompson and Kaputa, 2017). The outcome of that study was a new NRFU protocol implemented in the 2015 ASM and a second embedded field experiment that paired our proposed nonrespondent subsampling design with the most effective follow-up procedures determined from the 2014 test (Kaputa, Thompson and Beck, 2017).

Acknowledgements

Any views expressed are those of the author(s) and not necessarily those of the U.S. Census Bureau. The authors thank Eric Fink, Xijian Liu, Jared Martin, Edward Watkins III, Hannah Thaw, the Associate Editor, and two referees for their review of an earlier version of the manuscript, David Haziza for his thoughtful discussion of the paper, and Barry Schouten for his useful suggestions on the optimization problems.

Appendix

Our objective is to estimate $Y,$ the population total of characteristic $y,$ from the realized sample of respondents. Let

$S_{h i} =$ 1 if unit $i$ in domain $h$ was in the original sample; 0 otherwise.

$\theta_{h i} =$ the probability of sampling unit $i$ in domain $h$ into the original sample $(w_{h i} = 1 / \theta_{h i}).$

$R_{h i} =$ 1 if unit $i$ in domain $h$ provided a response before subsampling time $t$ (value for $y$); 0 otherwise.

$I_{h i} =$ 1 if unit $i$ in domain $h$ was selected for NRFU (i.e., was a subsampled nonrespondent); 0 otherwise.

$J_{h i} =$ 1 if unit $i$ in domain $h$ responds, given that it was selected into the nonrespondent subsample; 0 otherwise.

$f_{h i} =$ adjustment factor for nonrespondent subsampling and unit nonresponse after NRFU.

$y_{h i} =$ value of characteristic $y$ for unit $i$ in domain $h,$ available only for respondents.

$x_{h i} =$ value of characteristic $x$ for unit $i$ in domain $h,$ available for all sampled units considered for nonrespondent subsampling (i.e., the nonrespondent subsampling frame).

Then

$\hat{Y} = \sum_{h} \sum_{i} w_{h i} y_{h i} S_{h i} R_{h i} + \sum_{h} \sum_{i} w_{h i} f_{h i} y_{h i} S_{h i} (1 - R_{h i}) I_{h i} J_{h i} = {\hat{Y}}_{R 1} + {\hat{Y}}_{R 2} .$

We consider three different adjustment-to-sample reweighting estimators of ${\hat{Y}}_{R 2} :$

Double Expansion (DE): ${\hat{Y}}_{R 2}^{DE} = \sum_{h} \sum_{i \in h} w_{h i} K_{h} (\frac{m_{1 h}}{r_{2 h}}) y_{h i} S_{h i} (1 - R_{h i}) I_{h i} J_{h i}$

Separate Ratio (SR): ${\hat{Y}}_{R 2}^{SR} = \sum_{h} \sum_{i \in h} w_{h i} K_{h} (\frac{\sum_{i \in m_{1 h}} x_{h i}}{\sum_{i \in r_{2 h}} x_{h i}}) y_{h i} S_{h i} (1 - R_{h i}) I_{h i} J_{h i}$

Combined Ratio (CR): ${\hat{Y}}_{R 2}^{CR} = \sum_{h} \sum_{i \in h} w_{h i} K_{h} (\frac{m_{1 h}}{r_{2 h}}) (\frac{\sum_{i \in m_{1 h}} w_{h i} K_{h} x_{h i}}{\sum_{i \in r_{2 h}} w_{h i} K_{h} (\frac{m_{1 h}}{r_{2 h}}) x_{h i}}) y_{h i} S_{h i} (1 - R_{h i}) I_{h i} J_{h i} .$
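The three adjustments can be compared on a small example. The sketch below evaluates the DE, SR and CR formulas exactly as written above, for a single domain $h.$ It rests on assumptions that this section does not state: that $K_{h}$ is the inverse of the nonrespondent subsampling rate, that $m_{1 h}$ is the number of nonrespondents selected into the subsample, and that $r_{2 h}$ is the number of those who responded to NRFU. The function name and data layout are hypothetical.

```python
def reweighted_totals(subsample, K):
    """Compute DE, SR and CR estimates of Y_R2 for one domain.

    subsample : list of dicts with keys "w" (design weight), "x" (auxiliary
                value) and "y" (response, or None if the unit did not
                respond to NRFU); hypothetical layout
    K         : assumed inverse subsampling rate for the domain
    """
    respondents = [u for u in subsample if u["y"] is not None]
    m1, r2 = len(subsample), len(respondents)
    if r2 == 0:
        raise ValueError("no subsampled respondents; a collapsed estimator is needed")

    # Double expansion: inflate by the unweighted inverse response rate m1/r2.
    de = sum(u["w"] * K * (m1 / r2) * u["y"] for u in respondents)

    # Separate ratio: unweighted ratio of x totals, subsample over respondents.
    sr_factor = sum(u["x"] for u in subsample) / sum(u["x"] for u in respondents)
    sr = sum(u["w"] * K * sr_factor * u["y"] for u in respondents)

    # Combined ratio: DE weights times a weighted ratio of x totals.
    cr_num = sum(u["w"] * K * u["x"] for u in subsample)
    cr_den = sum(u["w"] * K * (m1 / r2) * u["x"] for u in respondents)
    cr = sum(u["w"] * K * (m1 / r2) * (cr_num / cr_den) * u["y"] for u in respondents)

    return {"DE": de, "SR": sr, "CR": cr}


# Illustrative data: four subsampled nonrespondents, two of whom respond.
domain = [
    {"w": 10.0, "x": 4.0, "y": 8.0},
    {"w": 10.0, "x": 6.0, "y": None},
    {"w": 10.0, "x": 2.0, "y": 5.0},
    {"w": 10.0, "x": 8.0, "y": None},
]
estimates = reweighted_totals(domain, K=2)
```

In this example the design weights are equal within the domain, so the weights cancel in the CR ratio and the CR estimate coincides with the SR estimate. The two differ only when weights vary across the subsampled units.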

Note that the DE and CR estimators are variations of the recommended reweighting procedure described in Brick (2013) and are discussed in Binder et al. (2000) among others. The DE estimator is the InfoS estimator presented in Särndal and Lundström (2005), studied in Shao and Thompson (2009), among others; the SR estimator is a variation of the InfoP estimator presented in Särndal and Lundström (2005), treating the realized sample as the “population”. Sampling weights were not included in the SR so that the adjustment reduces to the DE adjustment when $x_{h i} \equiv 1 \forall i \in h;$ note that this unweighted response rate adjustment is recommended in Little and Vartivarian (2005). The CR estimator is presented in Binder et al. (2000), and is also studied in Shao and Thompson (2009). In our case study, a better choice might have been the quasi-randomization estimator from Oh and Scheuren (1983), which incorporates sampling weights in the adjustment factor, thus reducing their variability.
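The reduction of the SR adjustment to the DE adjustment can be checked directly. Reading $i \in m_{1 h}$ and $i \in r_{2 h}$ as sums over the $m_{1 h}$ subsampled units and the $r_{2 h}$ responding units (an assumption about the notation), setting $x_{h i} \equiv 1$ gives

$\frac{\sum_{i \in m_{1 h}} x_{h i}}{\sum_{i \in r_{2 h}} x_{h i}} = \frac{\sum_{i \in m_{1 h}} 1}{\sum_{i \in r_{2 h}} 1} = \frac{m_{1 h}}{r_{2 h}} ,$

so that ${\hat{Y}}_{R 2}^{SR} = {\hat{Y}}_{R 2}^{DE}$ in that case. Had the sampling weights been included in the SR ratio, the adjustment would instead reduce to a weighted response rate, which equals $m_{1 h} / r_{2 h}$ only when the weights are constant within the domain.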

Collapsed estimators are used in three scenarios: (1) All units in the domain receive NRFU (no subsampling); (2) No units in the domain receive NRFU because response rate targets have been achieved (no subsampling); and (3) A single subsampled unit responded to NRFU (subsampling). The collapsed estimator analogues are given as follows:

Collapsed DE: ${\hat{Y}}_{h}^{DE, C} = \sum_{i \in h} w_{h i} (\frac{n_{h}}{r_{1 h} + r_{2 h}}) y_{h i} S_{h i} R_{h i}$

Collapsed SR: ${\hat{Y}}_{h}^{SR, C} = \sum_{i \in h} w_{h i} (\frac{\sum_{i \in n_{h}} x_{h i}}{\sum_{i \in r_{1 h} + r_{2 h}} x_{h i}}) y_{h i} S_{h i} R_{h i}$

Collapsed CR: ${\hat{Y}}_{h}^{CR, C} = \sum_{i \in h} w_{h i} (\frac{n_{h}}{r_{1 h} + r_{2 h}}) (\frac{\sum_{i \in n_{h}} w_{h i} x_{h i}}{\sum_{i \in r_{1 h} + r_{2 h}} w_{h i} (\frac{n_{h}}{r_{1 h} + r_{2 h}}) x_{h i}}) y_{h i} S_{h i} R_{h i} .$

References

Baker, R., Brick, J.M., Bates, N., Battaglia, M., Couper, M., Dever, J., Gile, K. and Tourangeau, R. (2013). Summary report of the AAPOR task force on non-probability sampling – Report and rejoinder. Journal of Survey Statistics and Methodology, 1, 90-137.

Bavdaž, M. (2010). The multidimensional integral business survey response model. Survey Methodology, 36, 1, 81-93. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2010001/article/11245-eng.pdf.

Beaumont, J.-F., Bocci, C. and Haziza, D. (2014). An adaptive data collection procedure for call prioritization. Journal of Official Statistics, 30(4), 607-621.

Bechtel, L., and Thompson, K.J. (2013). Optimizing unit nonresponse adjustment procedures after subsampling nonrespondents in the Economic Census. Proceedings of the Federal Committee on Statistical Methods Research Conference, https://nces.ed.gov/FCSM/index.asp.

Biemer, P. (2010). Total survey error: Design, implementation, and evaluation. The Public Opinion Quarterly, 74(5), 817-848.

Binder, D., Babyak, C., Brodeur, M., Hidiroglou, M. and Wisner, J. (2000). Variance estimation for two-phase stratified sampling. The Canadian Journal of Statistics, 28, 751-764.

Brick, J.M. (2013). Unit nonresponse and weighting adjustments: A critical review. Journal of Official Statistics, 29, 329-353.

Federal Register Notice (2006). OMB Standards and Guidelines for Statistical Surveys, Washington, DC.

Fink, E., and Lineback, J.F. (2013). Using paradata to understand business survey reporting patterns. Proceedings of the Federal Committee on Statistical Methods Research Conference, https://nces.ed.gov/FCSM/index.asp.

Groves, R., and Heeringa, S. (2006). Responsive design for household surveys: Tools for actively controlling survey errors and costs. Journal of the Royal Statistical Society Series A, 169(3), 439-457.

Hansen, M.H., and Hurwitz, W.N. (1946). The problem of non-response in sample surveys. Journal of the American Statistical Association, 41, 517-529.

Harter, R.M., Mach, T.L., Chaplin, J.F. and Wolken, J.D. (2007). Determining subsampling rates for nonrespondents. Proceedings of the Third International Conference on Establishment Surveys, American Statistical Association.

Haziza, D., Thompson, K.J. and Yung, W. (2010). The effect of nonresponse adjustments on variance estimation. Survey Methodology, 36, 1, 35-43. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2010001/article/11246-eng.pdf.

Kalton, G., and Flores-Cervantes, I. (2003). Weighting methods. Journal of Official Statistics, 19, 2, 81-97.

Kaputa, S., Thompson, K.J. and Beck, J. (2017). An embedded experiment for targeted nonresponse follow-up in establishment surveys. Proceedings of the Section on Survey Research Methods, American Statistical Association.

Kirgis, N., and Lepkowski, J. (2013). Design and management strategies for paradata-driven responsive design: Illustrations for the 2006-2010 National Survey of Family Growth. Improving Surveys with Paradata, (Ed., Frauke Kreuter). Hoboken, NJ: John Wiley & Sons, Inc.

Kish, L. (1992). Weighting for unequal P_i. Journal of Official Statistics, 8(2), 183-200.

Kott, P. (1994). A note on handling nonresponse in sample surveys. Journal of the American Statistical Association, 89, 693-696.

Little, R.J., and Vartivarian, S. (2005). Does weighting for nonresponse increase the variance of survey means? Survey Methodology, 31, 2, 161-168. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2005002/article/9046-eng.pdf.

Lohr, S.L. (2010). Sampling: Design and Analysis. Boston: Brooks/Cole.

Oh, H.L., and Scheuren, F.J. (1983). Weighting adjustment of unit nonresponse. Incomplete Data in Sample Surveys. New York: Academic Press, 20, 143-184.

Olson, K., and Groves, R.M. (2012). An examination of within-person variation in response propensity over the data collection field period. Journal of Official Statistics, 28, 29-51.

Paxson, M.C., Dillman, D.A. and Tarnai, J. (1995). Improving response to business mail surveys. In Business Survey Methods, (Eds., B.G. Cox, D. Binder, B. Nanajamma Chinnappa, M. Colledge and P. Kott). New York: John Wiley & Sons, Inc.

Särndal, C.-E. (2011). The 2010 Morris Hansen lecture: Dealing with survey nonresponse in data collection, in estimation. Journal of Official Statistics, 27, 1-21.

Särndal, C.-E., and Lundquist, P. (2014). Accuracy in estimation with nonresponse: A function of degree of imbalance and degree of explanation. Journal of Survey Statistics and Methodology, 2(4), 361-387.

Särndal, C.-E., and Lundström, S. (2005). Estimation in Surveys with Nonresponse. Hoboken, NJ: John Wiley & Sons, Inc.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer Verlag.

Schouten, B., Calinescu, M. and Luiten, A. (2013). Optimizing quality of response through adaptive survey designs. Survey Methodology, 39, 1, 29-58. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2013001/article/11824-eng.pdf.

Schouten, B., Cobben, F. and Bethlehem, J. (2009). Indicators for the representativeness of survey response. Survey Methodology, 35, 1, 101-113. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2009001/article/10887-eng.pdf.

Shao, J., and Thompson, K.J. (2009). Variance estimation in the presence of nonrespondents and certainty strata. Survey Methodology, 35, 2, 215-225. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2009002/article/11043-eng.pdf.

Snijkers, G., Haraldsen, G., Jones, J. and Willimack, D.K. (2013). Designing and Conducting Business Surveys. Hoboken, NJ: John Wiley & Sons, Inc.

Thompson, K.J., and Kaputa, S. (2017). Investigating adaptive nonresponse follow-up strategies for small businesses through embedded experiments. Journal of Official Statistics, 33(3), 1-23.

Thompson, K.J., and Oliver, B. (2012). Response rates in business surveys: Going beyond the usual performance measure. Journal of Official Statistics, 27, 221-237.

Thompson, K.J., Oliver, B. and Beck, J. (2015). An analysis of the mixed collection modes for two business surveys conducted by the US Census Bureau. Public Opinion Quarterly, 79(3), 769-789.

Thompson, K.J., and Washington, K.T. (2013). Challenges in the treatment of unit nonresponse for selected business surveys: A case study. Survey Methods: Insights from the Field. Retrieved from http://surveyinsights.org/?p=2991.

Torres van Grinsven, V., Bolko, I. and Bavdaž, M. (2014). In search of motivation for the business survey response task. Journal of Official Statistics, 30(4), 579-606.

Tourangeau, R., Brick, J.M., Lohr, S. and Li, J. (2016). Adaptive and responsive survey designs: A review and assessment. Journal of the Royal Statistical Society A, 180, 203-223.

Tuttle, A., Morrison, R. and Willimack, D. (2010). From start to pilot: A multi-method approach to the comprehensive redesign of an economic survey questionnaire. Journal of Official Statistics, 26, 87-103.

Wagner, J. (2012). Research synthesis: A comparison of alternative indicators for the risk of nonresponse bias. Public Opinion Quarterly, 76(3), 555-575.

Willimack, D., and Nichols, E. (2010). A hybrid response process model for business surveys. Journal of Official Statistics, 26, 3-24.

Zhang, L.C. (2008). On some common practices of systematic sampling. Journal of Official Statistics, 24, 557-569.

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2018-06-21
