A note on multiply robust predictive mean matching imputation with complex survey data
Section 4. Simulation study
To assess the performance of the proposed method in terms of bias and efficiency, we conducted a limited simulation study. We generated 2,000 finite populations, each of size 20,000. First, the explanatory variables were generated from a multivariate standard normal distribution. Then, given these explanatory variables, we generated the survey variable according to the following outcome regression models:
(M1).
where
(M2).
where
Note that both (M1) and (M2) are linear models based on the explanatory variables, except that (M2) also includes quadratic terms and an interaction term.
From each finite population, a probability sample was selected according to probability proportional-to-size (PPS) systematic sampling based on a size variable, so that the first-order inclusion probabilities were proportional to that size variable. We used sample sizes of 200, 500 and 1,000.
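As an illustration of the sampling step, the following is a minimal sketch of PPS systematic sampling. It is not the authors' code: the size variable `z` used in the demonstration, the function name `pps_systematic_sample` and the reduced population size are assumptions made only for this example.

```python
import numpy as np

def pps_systematic_sample(z, n, rng):
    """Select a PPS systematic sample of size n.

    Returns the selected indices and the first-order inclusion
    probabilities pi_i = n * z_i / sum(z).
    """
    z = np.asarray(z, dtype=float)
    pi = n * z / z.sum()
    if np.any(pi >= 1.0):
        raise ValueError("some units have pi >= 1; treat them as take-all units")
    cum = np.cumsum(pi)                  # cumulated probabilities, total is n
    start = rng.uniform(0.0, 1.0)        # random start in [0, 1)
    points = start + np.arange(n)        # n selection points, one unit apart
    idx = np.searchsorted(cum, points, side="right")
    idx = np.minimum(idx, len(z) - 1)    # guard against floating-point round-off
    return idx, pi

# Hypothetical size variable (an assumption, not taken from the paper).
rng = np.random.default_rng(0)
z = 1.0 + rng.exponential(size=2000)
sample_idx, pi = pps_systematic_sample(z, 100, rng)
weights = 1.0 / pi[sample_idx]           # design weights of the sampled units
print(len(sample_idx), weights.sum())
```

Because every inclusion probability is below 1 and the selection points are one unit apart, no unit can be selected twice, so the sample size is fixed.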
In each sample, the response indicators were generated from a Bernoulli distribution. We used two sets of values for the parameters of the response probability model; these led to response rates approximately equal to 70% and 50%, respectively.
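The generation of response indicators can be sketched as follows. The logistic form of the response probability, the parameter names `alpha0` and `alpha1`, and the values used below are assumptions for illustration only; the response model actually used in the study is not reproduced here.

```python
import numpy as np

def response_indicators(x, alpha0, alpha1, rng):
    """Draw Bernoulli response indicators under an assumed logistic model."""
    p = 1.0 / (1.0 + np.exp(-(alpha0 + alpha1 * x)))  # response probabilities
    r = rng.binomial(1, p)                            # 1 = respondent, 0 = nonrespondent
    return r, p

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
# Hypothetical parameter values; in practice they are tuned to hit a target rate.
r, p = response_indicators(x, alpha0=0.0, alpha1=1.0, rng=rng)
print("observed response rate:", r.mean())
```

In a setup like this, changing the intercept shifts the overall response rate, which is one way to obtain scenarios with different rates.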
We computed the following estimators of the population mean:
(Naive). The weighted mean of the respondents.
(Reg). The imputed estimator based on deterministic linear regression imputation under a linear working model.
(PMM1). The imputed estimator based on PMM, where the score was obtained by fitting a single working model.
(New1). The imputed estimator based on the proposed multiply robust PMM procedure using both models (M1) and (M2).
(New2). The imputed estimator based on the proposed multiply robust PMM procedure using models (M1), (M2), and two additional models (M3) and (M4), each of which uses only one of the explanatory variables as the predictor.
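To make the first three estimators concrete, the sketch below implements a weighted respondent mean, deterministic regression imputation and a single-model PMM with one nearest donor. It is an illustration under assumptions, not the authors' implementation: the weighted least squares fit, the ratio-type (weighted mean) form of the estimators, all function names and the simulated data are hypothetical choices. The multiply robust procedure (New1, New2) is not reproduced.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta

def predict(X, beta):
    return np.column_stack([np.ones(X.shape[0]), X]) @ beta

def naive_mean(y, w, r):
    """Weighted mean of the respondents only."""
    resp = r == 1
    return np.sum(w[resp] * y[resp]) / np.sum(w[resp])

def reg_imputed_mean(X, y, w, r):
    """Missing values are replaced by fitted values from the respondent fit."""
    resp = r == 1
    beta = wls_fit(X[resp], y[resp], w[resp])
    y_imp = np.where(resp, y, predict(X, beta))
    return np.sum(w * y_imp) / np.sum(w)

def pmm_imputed_mean(X, y, w, r):
    """Missing values are replaced by the observed value of the respondent
    whose predicted score is closest to that of the nonrespondent."""
    resp = r == 1
    beta = wls_fit(X[resp], y[resp], w[resp])
    score = predict(X, beta)
    donors = np.flatnonzero(resp)
    dist = np.abs(score[:, None] - score[donors][None, :])
    nearest = donors[np.argmin(dist, axis=1)]
    y_imp = np.where(resp, y, y[nearest])
    return np.sum(w * y_imp) / np.sum(w)

# Simulated data for illustration only (hypothetical model and weights).
rng = np.random.default_rng(1)
n = 2000
X = rng.standard_normal((n, 2))
y = 1.0 + X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n)
w = rng.uniform(1.0, 3.0, size=n)
r = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0]))))
y_obs = np.where(r == 1, y, np.nan)      # survey variable missing for nonrespondents

full = np.sum(w * y) / np.sum(w)         # full-sample benchmark
naive = naive_mean(y_obs, w, r)
reg = reg_imputed_mean(X, y_obs, w, r)
pmm = pmm_imputed_mean(X, y_obs, w, r)
print(full, naive, reg, pmm)
```

PMM imputes only values that were actually observed, whereas regression imputation imputes fitted values; this is the structural difference between the two imputed estimators in this sketch.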
We computed the Monte Carlo relative bias (MCRB), the Monte Carlo relative standard error (MCRSE) and the Monte Carlo relative root mean squared error (MCRMSE) of each estimator. These measures compare the estimator obtained in each sample with the population mean of the corresponding population, over the 2,000 replications.
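One plausible way of computing the three Monte Carlo measures is sketched below. The exact normalization is an assumption: here the bias, the standard deviation of the errors and the root mean squared error are all divided by the average population mean and expressed in percent. The function name `mc_measures` and the synthetic inputs are hypothetical.

```python
import numpy as np

def mc_measures(estimates, targets):
    """Return (MCRB, MCRSE, MCRMSE) in percent.

    estimates[k] is the estimate from the k-th sample and targets[k] the
    population mean of the k-th population. Normalizing by the average
    population mean is an assumption of this sketch.
    """
    est = np.asarray(estimates, dtype=float)
    tgt = np.asarray(targets, dtype=float)
    err = est - tgt
    scale = tgt.mean()
    rb = 100.0 * err.mean() / scale
    rse = 100.0 * err.std() / scale                 # ddof=0: divide by the number of replications
    rrmse = 100.0 * np.sqrt(np.mean(err ** 2)) / scale
    return rb, rse, rrmse

# Synthetic inputs, not simulation results from the paper.
rng = np.random.default_rng(2)
targets = 10.0 + 0.1 * rng.standard_normal(2000)
estimates = targets + 0.2 + 0.5 * rng.standard_normal(2000)
rb, rse, rrmse = mc_measures(estimates, targets)
print(rb, rse, rrmse)

# Under this normalization, MCRMSE^2 = MCRB^2 + MCRSE^2 holds exactly.
check = float(np.hypot(-3.7, 3.9))                  # PMM1, Table 4.2, sample size 1,000, 70%
print(round(check, 1))
```

The entries of Tables 4.1 and 4.2 are consistent with this decomposition up to rounding. For example, for PMM1 in Table 4.2 with a sample size of 1,000 and a response rate of 70%, an MCRB of -3.7 and an MCRSE of 3.9 combine to the reported MCRMSE of 5.4.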
The results are presented in Tables 4.1 and 4.2. The naive estimator exhibited a substantial bias in all the scenarios, as expected. When the true model was given by (M1), we note from Table 4.1 that linear regression imputation performed very well in terms of bias, as expected. Both PMM and the proposed methods showed negligible bias for a sample size of 1,000 and a slight bias for sample sizes of 500 and 200. For instance, for a sample size of 200 and a response rate of 70%, the value of MCRB was equal to 2.4% for PMM, New1 and New2. In terms of efficiency, linear regression imputation slightly outperformed both PMM and the proposed methods, as expected. For instance, for a sample size of 1,000 and a response rate of 70%, the value of MCRMSE was equal to 7.5% for linear regression imputation and to 8.0% for PMM, New1 and New2. It is worth pointing out that PMM and the proposed methods exhibited almost identical performances in all the scenarios presented in Table 4.1. Therefore, incorporating two additional models did not seem to affect the efficiency of the resulting estimator (New2).
When the true model was given by (M2), we note from Table 4.2 that both linear regression imputation and PMM led to substantial biases in all the scenarios, as expected. Being a parametric imputation procedure, linear regression imputation is vulnerable to model misspecification. On the other hand, PMM showed smaller biases than linear regression imputation, suggesting some robustness against model misspecification. For instance, for a sample size of 1,000 and a response rate of 70%, the value of MCRB was equal to -9.2% for linear regression imputation and -3.7% for PMM. The proposed methods outperformed both linear regression imputation and PMM in terms of bias, standard error and root mean squared error in all the scenarios. Finally, New1 and New2 exhibited almost identical performances.
Table 4.1
Monte Carlo relative bias (MCRB), relative standard error (MCRSE), and relative root mean squared error (MCRMSE) when the true model is (M1)
| Response rate | Sample size | Measure (%) | Naive | Reg | PMM1 | New1 | New2 |
|---|---|---|---|---|---|---|---|
| 70% | 1,000 | MCRB | 64.7 | -0.1 | 0.4 | 0.4 | 0.4 |
| 70% | 1,000 | MCRSE | 7.5 | 7.5 | 8.0 | 8.0 | 8.0 |
| 70% | 1,000 | MCRMSE | 65.1 | 7.5 | 8.0 | 8.0 | 8.0 |
| 70% | 500 | MCRB | 65.3 | 0.5 | 1.4 | 1.4 | 1.4 |
| 70% | 500 | MCRSE | 10.7 | 10.4 | 11.2 | 11.2 | 11.2 |
| 70% | 500 | MCRMSE | 66.1 | 10.4 | 11.3 | 11.3 | 11.3 |
| 70% | 200 | MCRB | 64.6 | 0.3 | 2.4 | 2.4 | 2.4 |
| 70% | 200 | MCRSE | 16.5 | 16.7 | 17.5 | 17.5 | 17.6 |
| 70% | 200 | MCRMSE | 66.7 | 16.7 | 17.7 | 17.7 | 17.7 |
| 50% | 1,000 | MCRB | 99.3 | 0.0 | 0.7 | 0.7 | 0.6 |
| 50% | 1,000 | MCRSE | 8.8 | 8.1 | 9.0 | 9.0 | 9.0 |
| 50% | 1,000 | MCRMSE | 99.7 | 8.1 | 9.1 | 9.1 | 9.1 |
| 50% | 500 | MCRB | 98.9 | -0.1 | 1.3 | 1.3 | 1.3 |
| 50% | 500 | MCRSE | 12.1 | 11.2 | 12.5 | 12.5 | 12.5 |
| 50% | 500 | MCRMSE | 99.6 | 11.2 | 12.6 | 12.6 | 12.6 |
| 50% | 200 | MCRB | 99.8 | 0.8 | 4.3 | 4.3 | 4.4 |
| 50% | 200 | MCRSE | 19.3 | 17.7 | 19.6 | 19.6 | 19.6 |
| 50% | 200 | MCRMSE | 101.6 | 17.7 | 20.1 | 20.1 | 20.0 |
Table 4.2
Monte Carlo relative bias (MCRB), relative standard error (MCRSE), and relative root mean squared error (MCRMSE) when the true model is (M2)
| Response rate | Sample size | Measure (%) | Naive | Reg | PMM1 | New1 | New2 |
|---|---|---|---|---|---|---|---|
| 70% | 1,000 | MCRB | 7.5 | -9.2 | -3.7 | 0.1 | 0.1 |
| 70% | 1,000 | MCRSE | 3.5 | 3.5 | 3.9 | 3.1 | 3.1 |
| 70% | 1,000 | MCRMSE | 8.2 | 9.9 | 5.4 | 3.1 | 3.1 |
| 70% | 500 | MCRB | 7.5 | -9.4 | -4.0 | 0.2 | 0.2 |
| 70% | 500 | MCRSE | 5.0 | 5.1 | 5.6 | 4.5 | 4.5 |
| 70% | 500 | MCRMSE | 9.0 | 10.7 | 6.9 | 4.5 | 4.5 |
| 70% | 200 | MCRB | 7.6 | -9.2 | -4.0 | 0.1 | 0.1 |
| 70% | 200 | MCRSE | 7.8 | 7.9 | 8.5 | 6.8 | 6.8 |
| 70% | 200 | MCRMSE | 10.9 | 12.1 | 9.4 | 6.8 | 6.8 |
| 50% | 1,000 | MCRB | 16.6 | -11.3 | -3.1 | 0.3 | 0.3 |
| 50% | 1,000 | MCRSE | 4.0 | 4.5 | 5.0 | 3.3 | 3.3 |
| 50% | 1,000 | MCRMSE | 17.1 | 12.2 | 5.9 | 3.3 | 3.3 |
| 50% | 500 | MCRB | 16.5 | -11.5 | -3.5 | 0.3 | 0.3 |
| 50% | 500 | MCRSE | 5.7 | 6.3 | 7.0 | 4.8 | 4.7 |
| 50% | 500 | MCRMSE | 17.5 | 13.2 | 7.8 | 4.8 | 4.8 |
| 50% | 200 | MCRB | 16.5 | -12.0 | -3.9 | -0.1 | -0.1 |
| 50% | 200 | MCRSE | 9.1 | 9.9 | 11.0 | 7.4 | 7.4 |
| 50% | 200 | MCRMSE | 18.8 | 15.6 | 11.7 | 7.4 | 7.4 |
Acknowledgements
S. Chen was supported by the National Institute on Minority Health and Health Disparities (NIMHD) at the National Institutes of Health (NIH) (1R21MD014658-01A1) and the Oklahoma Shared Clinical and Translational Resources (U54GM104938) with an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The work of D. Haziza was supported by grants from the Natural Sciences and Engineering Research Council of Canada.