A note on multiply robust predictive mean matching imputation with complex survey data
Section 1. Introduction
Predictive mean matching (PMM), a procedure closely related to nearest-neighbour imputation (NNI; Chen and Shao, 2000; Beaumont and Bocci, 2009; Yang and Kim, 2019), is a popular imputation procedure in practice (Little, 1988; Yang and Kim, 2020). In NNI, a missing value of a survey variable $y$ is replaced by the $y$-value of the closest respondent with respect to a vector $\mathbf{x}$ of fully observed variables. However, with NNI, the resulting imputed estimator may suffer from a non-negligible bias when the dimension of $\mathbf{x}$ is large (Yang and Kim, 2019), a problem often referred to as the curse of dimensionality. In contrast, PMM starts by fitting a parametric model (e.g., a linear regression model) to the responding units, with $y$ as the response variable and $\mathbf{x}$ as the set of explanatory variables. This leads to a set of predicted values, or scores, for all the sample units (respondents and nonrespondents). A missing value of the survey variable $y$ is then replaced by the $y$-value of the closest respondent with respect to the score. The score may be viewed as a scalar summary of the information contained in the vector $\mathbf{x}$. Therefore, unlike NNI, PMM is not sensitive to the dimension of $\mathbf{x}$, which is a desirable feature.
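The three PMM steps described above (fit a working model on the respondents, score every sample unit, borrow the value of the respondent with the closest score) can be illustrated with a minimal sketch. It assumes an unweighted linear working model fitted by ordinary least squares and ignores the survey weights and design features that a complex survey would require. The function name `pmm_impute` and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np


def pmm_impute(x, y, observed):
    """Predictive mean matching with a linear working model (illustrative only).

    x        : (n, p) array of fully observed covariates
    y        : (n,) array; entries where `observed` is False are ignored
    observed : (n,) boolean array, True for respondents
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Step 1: fit the working model on the respondents only (OLS with intercept).
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)

    # Step 2: compute a scalar score for every sample unit.
    scores = design @ beta

    # Step 3: each nonrespondent borrows the y-value of the respondent
    # whose score is closest to its own.
    donors = np.flatnonzero(observed)
    recipients = np.flatnonzero(~observed)
    gaps = np.abs(scores[recipients, None] - scores[donors][None, :])
    nearest = donors[np.argmin(gaps, axis=1)]

    y_imputed = y.copy()
    y_imputed[recipients] = y[nearest]
    return y_imputed, scores


# Hypothetical simulated data: 5 covariates, roughly 30% nonresponse.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 5))
y = 1.0 + x @ np.array([1.0, -0.5, 0.25, 0.0, 2.0]) + rng.normal(scale=0.5, size=n)
observed = rng.random(n) < 0.7
y_masked = np.where(observed, y, np.nan)  # hide the nonrespondents' values

y_imputed, scores = pmm_impute(x, y_masked, observed)
```

Matching is done on the one-dimensional score rather than on the five-dimensional covariate vector, which is the feature that distinguishes PMM from NNI in this sketch. Every imputed value is an observed respondent value, so the procedure remains a donor imputation method.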
Both NNI and PMM belong to the class of nonparametric procedures. Therefore, both are less vulnerable to model misspecification than parametric methods (e.g., linear regression imputation). Also, both NNI and PMM belong to the class of donor imputation procedures; that is, they produce eligible imputed values because they use actual observed values “borrowed” from the respondents.
In the first step of PMM, the information contained in the vector $\mathbf{x}$ is compressed into a single score through the use of a parametric model (e.g., a linear regression model). If the specified model provides an accurate description of the relationship linking $y$ and $\mathbf{x}$, we expect PMM to perform well in terms of bias. On the other hand, if the specified model is grossly misspecified, PMM may yield biased estimators.
Multiply robust approaches with multiple outcome regression and nonresponse models have been shown to improve robustness against model misspecification; see Han and Wang (2013), Han (2014), and Chen and Haziza (2019a), among others. In this note, we propose a novel PMM procedure that allows for multiple models, each of which may be based on a different functional form and/or a different set of explanatory variables. Postulating multiple models may prove useful in a number of situations; e.g., see Chen and Haziza (2017) and Chen and Haziza (2019b) for a discussion. The specified models may be parametric or nonparametric. The rationale behind the proposed method is to fit each of these specified models based on the responding units, which leads to multiple sets of predicted values (scores) for all the sample units. After describing the theoretical setup in Section 2, we show how to combine these scores to construct the imputed values in Section 3. The proposed PMM procedure is multiply robust in the sense that the resulting estimator remains consistent even if all but one of the specified models are misspecified. Because the true model linking $y$ and $\mathbf{x}$ is unknown, the proposed approach is attractive: it provides some protection against model misspecification. Also, unlike the multiply robust imputation procedure considered in Chen and Haziza (2017), the proposed method belongs to the class of donor imputation procedures. In Section 4, we conduct a simulation study to assess the performance of the proposed method in terms of bias and efficiency.
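The first stage of the proposed procedure, fitting several working models on the respondents to obtain several sets of scores, might look like the sketch below. The three candidate models (linear, quadratic, and a model using only the first covariate) are hypothetical choices made for illustration, as are the function names and the simulated data. The fits are unweighted, and the sketch deliberately stops before the combination step, which is the subject of Section 3 and is not reproduced here.

```python
import numpy as np


def fit_scores(design, y, observed):
    """Fit an OLS working model on respondents and score all sample units."""
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    return design @ beta


def multiple_scores(x, y, observed):
    """Return one score vector per candidate working model (hypothetical models)."""
    ones = np.ones(len(x))
    designs = {
        # Different functional forms and different sets of explanatory variables.
        "linear": np.column_stack([ones, x]),
        "quadratic": np.column_stack([ones, x, x**2]),
        "first_covariate_only": np.column_stack([ones, x[:, 0]]),
    }
    return {name: fit_scores(d, y, observed) for name, d in designs.items()}


# Hypothetical simulated data in which the true relationship is quadratic.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=(n, 2))
y = 2.0 + x[:, 0] ** 2 + 0.5 * x[:, 1] + rng.normal(scale=0.3, size=n)
observed = rng.random(n) < 0.6
y_masked = np.where(observed, y, np.nan)  # hide the nonrespondents' values

score_sets = multiple_scores(x, y_masked, observed)
```

Each entry of `score_sets` holds a score for every sample unit, respondents and nonrespondents alike, so the collection can be passed on to whatever rule is used to combine the scores.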