A note on multiply robust predictive mean matching imputation with complex survey data
Section 2. Basic setup
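Consider a finite population $U$ of size $N$, assumed to have been generated from the following superpopulation model:
$$y_i = m(\mathbf{x}_i) + \epsilon_i, \quad i \in U,$$
where $m(\cdot)$ is an unknown function, $\mathbf{x}_i$ is a vector of fully observed variables attached to unit $i$, and the $\epsilon_i$ are mutually independent random variables such that $E(\epsilon_i \mid \mathbf{x}_i) = 0$ and $V(\epsilon_i \mid \mathbf{x}_i) = \sigma^2$. For simplicity, we assume that the variance structure is homoscedastic, but our method can easily be extended to the case of unequal variances.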
The interest lies in estimating the population mean, $\mu = N^{-1} \sum_{i \in U} y_i$. Given the finite population, a probability sample $S$ of size $n$ is selected according to a sampling design with first-order inclusion probabilities $\pi_i$ and second-order inclusion probabilities $\pi_{ij}$. The sampling weight attached to unit $i$ is denoted by $w_i = \pi_i^{-1}$.
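To make the design quantities concrete, the following is a minimal sketch assuming simple random sampling without replacement, for which the first-order inclusion probabilities are n/N and the second-order ones are n(n-1)/(N(N-1)). The linear mean function, the error variance, and all variable names are illustrative assumptions, not part of the paper. The last line computes the usual weighted (Horvitz-Thompson-type) estimator of the mean that would be available under full response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population generated from y = m(x) + error,
# with an assumed linear m and homoscedastic errors.
N, n = 1000, 100
x = rng.uniform(0.0, 2.0, size=N)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=N)
mu = y.mean()  # population mean, the target parameter

# Simple random sampling without replacement (an assumed design).
sample = rng.choice(N, size=n, replace=False)
pi_i = np.full(n, n / N)                 # first-order inclusion probabilities
pi_ij = n * (n - 1) / (N * (N - 1))      # second-order probability, i != j
w = 1.0 / pi_i                           # sampling weights

# Weighted estimator of the mean under full response.
mu_hat = np.sum(w * y[sample]) / N
```

Under this design every weight equals N/n, so the weights sum to N and the estimator reduces to the sample mean.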
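Let $r_i$ be the response indicator attached to unit $i$, such that $r_i = 1$ if $y_i$ is observed and $r_i = 0$ if $y_i$ is missing. Let $S_r$ denote the set of respondents to the survey variable $y$. We assume that the data are Missing At Random (MAR): $P(r_i = 1 \mid y_i, \mathbf{x}_i) = P(r_i = 1 \mid \mathbf{x}_i)$.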
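As an illustration of the MAR assumption, the sketch below generates response indicators whose probability depends on the auxiliary variable only, never on the survey variable itself. The logistic form of the response probability and its coefficients are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample data: auxiliary variable x and survey variable y.
n = 200
x = rng.uniform(0.0, 2.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)

# Response probability is a function of x alone (MAR); the logistic
# form and its coefficients are illustrative assumptions.
p = 1.0 / (1.0 + np.exp(-(0.5 + 0.8 * x)))
r = (rng.uniform(size=n) < p).astype(int)   # response indicators

S_r = np.flatnonzero(r == 1)                # indices of the respondents
y_obs = np.where(r == 1, y, np.nan)         # y is unavailable when r = 0
```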
The customary PMM procedure can be described as follows. We first postulate a parametric outcome regression model $m(\mathbf{x}; \boldsymbol{\beta})$, where $\boldsymbol{\beta}$ is a vector of unknown parameters (Yang and Kim, 2020). For each sampled unit $i \in S$, we compute the score $\hat{m}_i = m(\mathbf{x}_i; \hat{\boldsymbol{\beta}})$, where $\hat{\boldsymbol{\beta}}$ is a suitable estimator of $\boldsymbol{\beta}$ based on the responding units. Then, the imputed value for the missing $y_i$ is $y_i^* = y_{i(1)}$, where $i(1)$ is the index of the nearest neighbour of unit $i$ among the respondents, which satisfies $d(\hat{m}_{i(1)}, \hat{m}_i) \le d(\hat{m}_j, \hat{m}_i)$ for any respondent $j \in S_r$, where $d(\cdot, \cdot)$ denotes a distance function; e.g., the Euclidean distance. In order for PMM to be robust against misspecification, the specified parametric model must satisfy the Lipschitz continuity condition (Yang and Kim, 2020). This condition may not be satisfied for some commonly used models and functional forms, including quadratic models; see Yang and Kim (2020) for a discussion.
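The PMM steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the working model is taken to be linear in a single auxiliary variable, the parameter estimator is survey-weighted least squares on the respondents, and the distance is the absolute difference between scores. The function name `pmm_impute` and the toy data are hypothetical.

```python
import numpy as np

def pmm_impute(x, y, r, w):
    """Nearest-neighbour imputation on predicted means (PMM sketch).

    x, y, r, w are 1-D arrays over the sample: auxiliary variable,
    survey variable (NaN allowed where r == 0), response indicator,
    and sampling weight.
    """
    X = np.column_stack([np.ones(len(x)), x])   # assumed linear working model
    resp = r == 1

    # Weighted least squares on the respondents (an assumed estimator).
    sw = np.sqrt(w[resp])
    beta_hat, *_ = np.linalg.lstsq(X[resp] * sw[:, None], y[resp] * sw,
                                   rcond=None)

    score = X @ beta_hat                        # predicted mean for every unit
    donors = np.flatnonzero(resp)

    y_imp = y.astype(float).copy()
    for i in np.flatnonzero(~resp):
        # Donor whose score is closest to that of the nonrespondent.
        nearest = donors[np.argmin(np.abs(score[donors] - score[i]))]
        y_imp[i] = y[nearest]                   # impute the donor's observed y
    return y_imp, score

# Toy usage: units 1 and 3 (zero-based) are nonrespondents.
x = np.array([1.0, 2.2, 3.0, 4.4, 5.0])
y = np.array([2.1, np.nan, 6.2, np.nan, 9.9])
r = np.array([1, 0, 1, 0, 1])
w = np.ones(5)
y_imp, score = pmm_impute(x, y, r, w)
```

Because the imputed values are always observed values borrowed from donors, the completed data stay within the range of the respondents' values; the working model only determines which donor is selected.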