Estimation of response propensities and indicators of representative response using population-level information
Section 1. Introduction

Nonresponse bias in surveys is of increasing concern with declining response rates and tighter budgets. National Statistical Institutes (NSIs), charged with conducting national surveys to portray the state of their country’s economic, social and demographic characteristics, face increasing challenges in maintaining the quality of their survey response. In this paper, we focus on one particular survey conducted since 1998 by Statistics Netherlands, the Dutch Health Survey, which up until 2010 was a face-to-face survey; in 2010, online data collection was added as a sequential mode before the face-to-face interviews. Response rates have gradually declined from values close to 70% to values around 60%. Other NSIs and survey organizations have also reported declining response rates, particularly when moving to mixed modes of data collection to reduce budgets, with respondents pushed towards cheaper modes. However, response rates alone are not enough to judge the quality of the survey response, as nonresponse bias results from the contrast between those responding and those not responding to the survey. Nonresponse bias in the Dutch Health Survey is conjectured to arise from persons with weaker health, poorer living conditions, and particular behaviours such as smoking or less frequent dentist visits. Important predictors are age, marital status, income and ethnicity.

A number of indirect measures of nonresponse bias have been developed recently to supplement the traditional response rate. Wagner (2012) provides a taxonomy of such measures: indicators that include only observed auxiliary variables and indicators that also include observed survey variables which may or may not account for nonresponse weighting. The most prominent indicators that only use observed auxiliary variables are R-indicators (Schouten, Cobben and Bethlehem, 2009; Schouten, Shlomo and Skinner, 2011) and balance indicators (Särndal, 2011; Lundquist and Särndal, 2013).
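The R-indicator of Schouten, Cobben and Bethlehem (2009) is defined as R(ρ) = 1 − 2S(ρ), where S(ρ) is the standard deviation of the response propensities: the less the propensities vary, the more representative the response. A minimal sketch of this computation, assuming the propensities have already been estimated (the function name and the optional design weights are illustrative only):

```python
import numpy as np

def r_indicator(propensities, weights=None):
    """R-indicator R = 1 - 2 * S(rho), where S(rho) is the
    (optionally weighted) standard deviation of the estimated
    response propensities (Schouten, Cobben and Bethlehem, 2009)."""
    rho = np.asarray(propensities, dtype=float)
    w = np.ones_like(rho) if weights is None else np.asarray(weights, dtype=float)
    mean = np.average(rho, weights=w)
    var = np.average((rho - mean) ** 2, weights=w)
    return 1.0 - 2.0 * np.sqrt(var)

# Fully representative response: equal propensities give R = 1
print(r_indicator([0.5, 0.5, 0.5]))  # -> 1.0
# More dispersed propensities give a lower R
print(r_indicator([0.2, 0.8]))       # -> 0.4
```

The indicator attains its maximum of 1 when all propensities are equal, i.e., when response is missing completely at random with respect to the auxiliary variables used.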

The development of these measures comes at a time of increased interest in adapting data collection (Schouten, Calinescu and Luiten, 2013; Wagner, 2013; Wagner and Hubbard, 2014; Beaumont, Bocci and Haziza, 2014), so that the level of effort targeted at different subgroups, as defined by auxiliary variables, may be varied over time according to patterns of response, possibly through a change of strategy (Schouten, Bethlehem, Beulens, Kleven, Loosveldt, Rutar, Shlomo and Skinner, 2012; Särndal and Lundquist, 2014). Both R-indicators and balance indicators must be viewed in conjunction with the auxiliary data that are employed: different auxiliary variables may lead to different values of the indicators.

In addition, Schouten, Cobben, Lundquist and Wagner (2016) present empirical evidence that it is beneficial for samples to be more balanced with respect to auxiliary variables, even when these variables are used in nonresponse adjustment afterwards. Based on 14 survey data sets, they show that, on average, a design with a more representative response has smaller nonresponse biases, even after adjustment on the characteristics for which representativeness was evaluated. Särndal and Lundquist (2014) also found gains from balancing the respondent set over and above those obtained by calibrating the sample. Further, a more balanced response leads to less variable adjustment weights, a desirable property since large variation in adjustment weights may inflate the standard errors of estimates. Of course, nonresponse adjustment weighting will still be necessary, as some imbalance will always remain in the final response dataset.
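The loss of precision caused by variable weights is commonly approximated by Kish's design effect due to unequal weighting, 1 + cv², where cv is the coefficient of variation of the weights. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def kish_deff(weights):
    """Kish's approximate design effect from unequal weighting:
    deff = 1 + cv^2(w), which simplifies to n * sum(w^2) / (sum(w))^2."""
    w = np.asarray(weights, dtype=float)
    n = w.size
    return n * np.sum(w ** 2) / np.sum(w) ** 2

print(kish_deff([1, 1, 1, 1]))  # equal weights -> 1.0 (no inflation)
print(kish_deff([1, 1, 1, 5]))  # variable weights -> 1.75
```

A deff of 1.75 means variance estimates are inflated by roughly 75% relative to an equally weighted sample of the same size, which is why a more balanced response, requiring less extreme adjustment weights, is attractive.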

The auxiliary data used for the response indicator measures may stem from sampling frame data, administrative data and data about the data collection process, called paradata (Kreuter, 2013). Balance indicators and R-indicators are very similar and are often proportional in size. In this paper, we focus on R-indicators. However, much of the discussion and results can easily be translated to balance indicators.

R-indicators presume the availability of auxiliary variables obtained by linking data from, for example, sampling frames or registers, to the survey sample. This presumption of linked survey samples may be infeasible in many settings and hampers application. While national statistical institutes often have access to government registrations, university and market researchers usually do not. For indicators to become useful for these researchers, they must be based on different forms of auxiliary information. The only form of auxiliary information that is generally accessible is the set of statistics produced by the national statistical institutes, which disseminate tables on a wide range of population statistics. This paper develops R-indicators that are based solely on such population statistics and can be computed without any knowledge about the non-respondents. As an example, market research companies compare the response distributions of a fixed, prescribed set of auxiliary variables to national statistics, termed the gold standard. The R-indicator estimators proposed here allow gold standard variables to be monitored and evaluated during and after data collection.
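A gold-standard comparison of this kind amounts to contrasting the respondent distribution of an auxiliary variable with published population shares. A hypothetical sketch with invented data (the helper and category names are illustrative, not part of the proposed estimators):

```python
def distribution_gap(respondent_categories, population_shares):
    """Compare the respondent distribution of one 'gold standard'
    variable (e.g. age group) with published population shares.
    Returns per-category differences (respondent share minus
    population share) and their maximum absolute value."""
    cats = sorted(population_shares)
    counts = {c: 0 for c in cats}
    for c in respondent_categories:
        counts[c] += 1
    n = len(respondent_categories)
    gaps = {c: counts[c] / n - population_shares[c] for c in cats}
    return gaps, max(abs(g) for g in gaps.values())

# Hypothetical data: young persons are underrepresented among respondents
resp = ["young"] * 20 + ["old"] * 80
pop = {"young": 0.35, "old": 0.65}
gaps, max_gap = distribution_gap(resp, pop)
print(gaps, max_gap)  # young: -0.15, old: +0.15, max gap 0.15
```

Such marginal comparisons are informative but variable-by-variable; the population-based R-indicators developed in this paper summarize the deviation across auxiliary variables jointly, via the estimated response propensities.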

Although the R-indicators based on population auxiliary information are motivated in this paper by survey data collection practice, they can be applied to any setting with missing data on variables of interest and (almost) complete auxiliary data. They can, for instance, be used to monitor and evaluate the completion of administrative data, which is useful if the data are streamed and gradually accumulated over time. In this case, population-based R-indicators would provide an assessment of the representativeness of the streamed administrative data. Another useful application is to assess the representativeness of linked records. Van der Laan and Bakker (2015) proposed a Linkage Representativeness Indicator (LR-indicator), which examines the similarity of linked records to the target population under investigation.

R-indicators and their statistical properties, as discussed in Shlomo, Skinner and Schouten (2012), relate to the case where we have linked sample-level auxiliary information for non-respondents. To develop R-indicators based on population statistics, we propose a new method for estimating response propensities that does not require auxiliary information for non-respondents to the survey; we call these population-based response propensities. To our knowledge, no models for response propensities that employ population information only have appeared in the literature. In this respect, the present paper is innovative and may be valuable and relevant to other statistical areas as well. Here, we concentrate on the use of population-based response propensities in the computation of R-indicators.

With respect to adapting data collection, settings that require population-based R-indicators are clearly harder for the implementation of adaptive designs, since the covariate values of nonrespondents are unknown. However, using R-indicators based on population-level auxiliary information, we can make design features more salient to groups that are lagging behind in response. For example, if young people have lower response rates, we can send a general reminder with more focus on young persons, or instruct interviewers to monitor more carefully those addresses where they expect younger persons.

The auxiliary information for population-based response propensities is obtained from population tables and population counts. We first propose estimating response propensities from population values by replacing sample covariance matrices and sample means with known population covariances and population means. Next, using the population-based response propensities, we compute estimates of the R-indicator. We call the resulting indicator a population-based R-indicator, and we call the traditional R-indicator a sample-based R-indicator. We focus on three research questions, which we address in the sections that follow.
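The moment-replacement idea can be illustrated with a small simulation. The sketch below assumes a linear propensity model fitted by least squares under simple random sampling, with fully simulated data; it is an illustration of the general idea only, not the estimators developed in Sections 2 and 3. The key point is that the sample moment matrix, which would require covariates for nonrespondents, is replaced by its population counterpart, while the remaining term involves respondents only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "population" (illustration only): intercept plus standardized age
N = 100_000
age = rng.integers(18, 80, size=N)
X_U = np.column_stack([np.ones(N), (age - age.mean()) / age.std()])

# Population-level information: the moment matrix (1/N) * sum_U x x',
# derivable from published means and covariances of the auxiliaries
M_pop = X_U.T @ X_U / N

# Draw a simple random sample; response depends on (standardized) age
n = 2_000
idx = rng.choice(N, size=n, replace=False)
X_s = X_U[idx]
true_rho = np.clip(0.6 + 0.15 * X_s[:, 1], 0.05, 0.95)
r = rng.random(n) < true_rho  # 0/1 response indicators

# Sample-based fit: needs covariates for ALL sample units, including
# nonrespondents, via the sample moment matrix X_s' X_s
beta_sample = np.linalg.solve(X_s.T @ X_s, X_s.T @ r)

# Population-based fit: replace X_s' X_s by n * M_pop; the right-hand
# side, the sum of x_i over responding units, uses RESPONDENTS ONLY
beta_pop = np.linalg.solve(n * M_pop, X_s[r].sum(axis=0))

print(beta_sample, beta_pop)  # similar coefficient vectors
```

Under simple random sampling the sample moment matrix is close to its population counterpart, so the two fits nearly coincide; the population-based version, however, never touches nonrespondent covariates.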

In Section 2, we propose a new method for estimating population-based response propensities. In Section 3, we briefly review the definitions and methodology behind R-indicators and then consider their estimation in the population-based setting. In Section 4, we present an evaluation study in which samples are drawn from real Census data under realistic assumptions about nonresponse in social surveys, and we evaluate the properties of the population-based R-indicators. In Section 5, we demonstrate the proposed R-indicators in an application to the Dutch Health Survey. In Section 6, we conclude with a discussion of some caveats related to the proposed indicators and of future work.

