A note on multiply robust predictive mean matching imputation with complex survey data
Section 1. Introduction

Predictive mean matching (PMM), a procedure closely related to nearest-neighbour imputation (NNI; Chen and Shao, 2000; Beaumont and Bocci, 2009; Yang and Kim, 2019), is a popular imputation procedure in practice (Little, 1988; Yang and Kim, 2020). In NNI, a missing value of a survey variable $y$ is replaced by the $y$-value of the closest respondent with respect to a vector of fully observed variables $x$. However, with NNI, the resulting imputed estimator may suffer from a non-negligible bias when the dimension of $x$ is large (Yang and Kim, 2019), a problem often referred to as the curse of dimensionality. In contrast, PMM starts by fitting a parametric model (e.g., a linear regression model) to the responding units, with $y$ as the response variable and $x$ as the set of explanatory variables. This leads to a set of predicted values, or scores, $\hat{m}$, for all the sample units (respondents and nonrespondents). A missing value of the survey variable $y$ is then replaced by the $y$-value of the closest respondent with respect to $\hat{m}$. The score $\hat{m}$ may be viewed as a scalar summary of the information contained in the vector $x$. Therefore, unlike NNI, PMM is not sensitive to the dimension of $x$, which is a desirable feature.
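
The two PMM steps described above (fit a model on the respondents, then borrow the $y$-value of the respondent with the closest score) can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes an ordinary least squares working model, a single nearest donor, and no survey weights, and the function name `pmm_impute` and its arguments are hypothetical.

```python
import numpy as np

def pmm_impute(x, y, responded):
    """Single-donor PMM sketch (unweighted; OLS working model is an assumption).

    x         : (n, p) array of fully observed variables
    y         : (n,) array, arbitrary (e.g. NaN) where responded is False
    responded : (n,) boolean mask, True for respondents
    """
    # Step 1: fit the working model on the respondents only.
    X = np.column_stack([np.ones(len(x)), x])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X[responded], y[responded], rcond=None)

    # Scores m_hat are computed for all sample units.
    m_hat = X @ beta

    # Step 2: each nonrespondent borrows y from the respondent
    # whose score is closest to its own.
    donors = np.flatnonzero(responded)
    y_imp = y.copy()
    for i in np.flatnonzero(~responded):
        j = donors[np.argmin(np.abs(m_hat[donors] - m_hat[i]))]
        y_imp[i] = y[j]
    return y_imp, m_hat
```

Because the matching is done on the scalar score rather than on $x$ itself, the distance computation does not depend on the dimension of $x$.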

Both NNI and PMM belong to the class of nonparametric procedures. Therefore, both procedures are less vulnerable to model misspecification than parametric methods (e.g., linear regression imputation). Also, both NNI and PMM belong to the class of donor imputation procedures; that is, they produce eligible imputed values because they use actual observed values “borrowed” from the respondents.

In the first step of PMM, the information contained in the vector $x$ is compressed into a single score $\hat{m}$ through the use of a parametric model (e.g., a linear regression model). If the specified model provides an accurate description of the relationship linking $y$ and $x$, we expect PMM to perform well in terms of bias. On the other hand, if the model is grossly misspecified, PMM may yield biased estimators.

Multiply robust approaches with multiple outcome regression and nonresponse models have been shown to improve robustness against model misspecification; see Han and Wang (2013), Han (2014), and Chen and Haziza (2019a), among others. In this note, we propose a novel PMM procedure that allows for multiple models, each of which may be based on a different functional form and/or a different set of explanatory variables. Postulating multiple models may prove useful in a number of situations; e.g., see Chen and Haziza (2017) and Chen and Haziza (2019b) for a discussion. The specified models may be parametric or nonparametric. The rationale behind the proposed method is to fit each of these specified models based on the responding units, which leads to multiple sets of predicted values (scores) for all the sample units. After describing the theoretical setup in Section 2, we show how to combine these scores to construct the imputed values in Section 3. The proposed PMM procedure is multiply robust in the sense that the resulting estimator remains consistent even if all but one of the specified models are misspecified. Because the true model linking $y$ and $x$ is unknown, the proposed approach is attractive in that it provides some protection against model misspecification. Also, unlike the multiply robust imputation procedure considered in Chen and Haziza (2017), the proposed method belongs to the class of donor imputation procedures. In Section 4, we conduct a simulation study to assess the performance of the proposed method in terms of bias and efficiency.
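
As a rough illustration of the first stage of this idea, the sketch below fits several candidate working models on the respondents and returns one set of scores per model for all sample units. It is only a sketch under stated assumptions: the three candidate models, the helper names `fit_scores` and `multiple_scores`, and the unweighted least squares fits are all hypothetical choices made for illustration. The rule for combining the scores into imputed values is the subject of Section 3 and is deliberately not reproduced here.

```python
import numpy as np

def fit_scores(design, y, responded):
    """Fit OLS on the respondents and return predictions for all units."""
    X = np.column_stack([np.ones(len(design)), design])  # add an intercept
    beta, *_ = np.linalg.lstsq(X[responded], y[responded], rcond=None)
    return X @ beta

def multiple_scores(x, y, responded):
    """Return one score vector per candidate model (hypothetical candidates)."""
    candidates = {
        # Different sets of explanatory variables ...
        "linear_all": x,
        "linear_first": x[:, :1],
        # ... and a different functional form.
        "quadratic_first": np.column_stack([x[:, 0], x[:, 0] ** 2]),
    }
    return {name: fit_scores(d, y, responded) for name, d in candidates.items()}
```

Each entry of the returned dictionary plays the role of one set of scores for all sample units; any of the models could be replaced by a nonparametric fit without changing the overall structure.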

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2021-06-24
