A note on multiply robust predictive mean matching imputation with complex survey data
Section 4. Simulation study


To assess the performance of the proposed method in terms of bias and efficiency, we conducted a limited simulation study. We generated $B =$ 2,000 finite populations, each of size $N =$ 20,000. First, the explanatory variables $x_{1}, \dots, x_{4}$ were generated from a multivariate standard normal distribution. Then, given $x_{1}, \dots, x_{4},$ we generated the survey variable $y$ according to one of the following outcome regression models:

(M1). $y = 1 + x_{1} + x_{2} + x_{3} + x_{4} + ε,$ where $ε \sim N(0, 1).$

(M2). $y = 1 + x_{1}^{2} + x_{2}^{2} + x_{3} + x_{4} + x_{3} x_{4} + ε,$ where $ε \sim N(0, 1).$

Note that (M1) is linear in $x_{1}, \dots, x_{4},$ whereas (M2) replaces $x_{1}$ and $x_{2}$ by the quadratic terms $x_{1}^{2}$ and $x_{2}^{2}$ and adds the interaction term $x_{3} x_{4}.$ Both models are nonetheless linear in their regression coefficients.
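The population-generation step can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code: the function name `generate_population` is hypothetical, and the sketch assumes that the multivariate standard normal distribution has identity covariance, so that $x_{1}, \dots, x_{4}$ are independent.

```python
import numpy as np


def generate_population(model, N=20_000, rng=None):
    """Generate one finite population (x, y) under outcome model (M1) or (M2).

    Assumption: x1, ..., x4 are independent N(0, 1), i.e. the multivariate
    standard normal distribution has identity covariance.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal((N, 4))      # columns hold x1, x2, x3, x4
    eps = rng.standard_normal(N)         # epsilon ~ N(0, 1)
    x1, x2, x3, x4 = x.T
    if model == "M1":
        y = 1 + x1 + x2 + x3 + x4 + eps
    elif model == "M2":
        y = 1 + x1**2 + x2**2 + x3 + x4 + x3 * x4 + eps
    else:
        raise ValueError("model must be 'M1' or 'M2'")
    return x, y


# One population under (M2); its mean should be close to E(y) = 3.
x, y = generate_population("M2", rng=np.random.default_rng(1))
print(round(y.mean(), 2))
```

Under (M1) the expected value of $y$ is 1, and under (M2) it is 3, since $E(x_{1}^{2}) = E(x_{2}^{2}) = 1$ and $E(x_{3} x_{4}) = 0$ for independent standard normal variables.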

From each finite population, a probability sample $S$ was selected by probability-proportional-to-size (PPS) systematic sampling with size variable $z_{i} = \log(0.1 |y_{i} + v_{i}| + 4),$ where $v_{i} \sim N(0, 1).$ The first-order inclusion probabilities are given by $π_{i} = n z_{i} / \sum_{j=1}^{N} z_{j},$ with $n =$ 200, 500 and 1,000.
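A minimal sketch of PPS systematic sampling with this size variable follows. It illustrates the design under stated assumptions and is not the authors' implementation: `pps_systematic_sample` is a hypothetical name, the population is taken in its generated (hence random) order, and a random start $u \sim U(0, 1)$ is laid over the cumulated $π_{i},$ so that unit $i$ is selected when one of the points $u, u + 1, \dots, u + n - 1$ falls in its interval. This requires $\max_{i} π_{i} \le 1,$ which holds comfortably here: $z_{i} \ge \log 4$ and grows only logarithmically in $|y_{i} + v_{i}|,$ so $π_{i}$ stays close to $n / N \le 0.05.$

```python
import numpy as np


def pps_systematic_sample(y, n, rng):
    """Systematic PPS sample of size n with size z_i = log(0.1|y_i + v_i| + 4).

    Returns the selected unit indices and their inclusion probabilities pi_i.
    """
    N = len(y)
    v = rng.standard_normal(N)                     # v_i ~ N(0, 1)
    z = np.log(0.1 * np.abs(y + v) + 4)            # size variable z_i
    pi = n * z / z.sum()                           # pi_i = n z_i / sum_j z_j
    if pi.max() > 1:
        raise ValueError("some pi_i exceed 1; systematic PPS would select a unit twice")
    cum = np.cumsum(pi)                            # unit i covers (cum[i-1], cum[i]]
    points = rng.uniform() + np.arange(n)          # random start in [0, 1), then unit steps
    idx = np.searchsorted(cum, points)             # unit whose interval contains each point
    idx = np.minimum(idx, N - 1)                   # guard against round-off in cum[-1]
    return idx, pi[idx]


rng = np.random.default_rng(2021)
# Hypothetical (M1)-type population, used only to exercise the sampler.
y = 1 + rng.standard_normal((20_000, 4)).sum(axis=1) + rng.standard_normal(20_000)
idx, pi_s = pps_systematic_sample(y, n=500, rng=rng)
print(len(idx), round(np.sum(1 / pi_s)))  # sample size and Horvitz-Thompson estimate of N
```

Because $\sum_{i} π_{i} = n$ and no $π_{i}$ exceeds 1, exactly $n$ distinct units are selected, and the Horvitz-Thompson estimate $\sum_{i \in S} π_{i}^{-1}$ of $N$ printed at the end should be close to 20,000.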

In each sample, the response indicators $r_{i}$ were generated from a Bernoulli distribution with probability $p_{i},$ where

$p_{i} = 0.1 + 0.9 \times \frac{\exp(α_{0} + α_{1} x_{1i} + α_{2} x_{2i} + α_{3} x_{3i} + α_{4} x_{4i})}{1 + \exp(α_{0} + α_{1} x_{1i} + α_{2} x_{2i} + α_{3} x_{3i} + α_{4} x_{4i})}.$ (4.1)

We used two sets of values for $(α_{0}, α_{1}, α_{2}, α_{3}, α_{4})$: $(0, 1, 1, 1, 1)$ and $(1.38, 1, 1, 1, 1).$ These led to response rates of approximately 50% and 70%, respectively.
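The response mechanism (4.1) can be checked numerically with the sketch below. The helper `response_probability` is hypothetical, and the covariates are assumed to be independent standard normal. When $α_{0} = 0$ the linear predictor is symmetric about zero and the logistic function satisfies $\operatorname{expit}(-t) = 1 - \operatorname{expit}(t),$ so the expected value of $p_{i}$ is exactly $0.1 + 0.9 \times 0.5 = 0.55$; when $α_{0} = 1.38$ it is about 0.73. The code prints both the average $p_{i}$ and the realized response rate for each parameter set.

```python
import numpy as np


def response_probability(x, alpha):
    """p_i from (4.1): 0.1 + 0.9 * expit(alpha_0 + alpha_1 x_1i + ... + alpha_4 x_4i)."""
    alpha = np.asarray(alpha, dtype=float)
    eta = alpha[0] + x @ alpha[1:]
    return 0.1 + 0.9 / (1 + np.exp(-eta))


rng = np.random.default_rng(7)
x = rng.standard_normal((200_000, 4))     # assumed independent N(0, 1) covariates
for alpha in [(0, 1, 1, 1, 1), (1.38, 1, 1, 1, 1)]:
    p = response_probability(x, alpha)
    r = rng.binomial(1, p)                # response indicators r_i ~ Bernoulli(p_i)
    print(alpha, round(p.mean(), 3), round(r.mean(), 3))
```

The floor of 0.1 in (4.1) guarantees that every unit has at least a 10% chance of responding, whatever its covariate values.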

We computed the following estimators of the finite population mean $θ = N^{-1} \sum_{i=1}^{N} y_{i}$:

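(Naive). The weighted mean of the respondents, $\hat{θ}_{naive} = \sum_{i \in S_{r}} w_{i} y_{i} / \sum_{i \in S_{r}} w_{i},$ where $S_{r}$ denotes the set of respondents and $w_{i}$ is the sampling weight attached to unit $i.$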

(Reg). The imputed estimator based on deterministic linear regression imputation, assuming model (M1).

(PMM1). The imputed estimator based on PMM, where the score $\hat{m}_{i},$ $i \in S,$ was obtained by fitting model (M1).

(New1). The imputed estimator based on the proposed multiply robust PMM procedure using models (M1) and (M2).

(New2). The imputed estimator based on the proposed multiply robust PMM procedure using models (M1) and (M2) and two additional models, (M3) and (M4), where (M3) uses $x_{1}$ as the only predictor and (M4) uses $x_{1}^{2}$ as the only predictor.
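The three benchmark estimators (Naive), (Reg) and (PMM1) can be sketched as follows. This is an illustrative reading under assumptions the section does not spell out, not the authors' code: the helper names are hypothetical, the working model (M1) is fitted by weighted least squares on the respondents, PMM1 uses a single nearest donor on the score $\hat{m}_{i},$ and each imputed estimator takes the same weighted-mean form as (Naive), computed over the full sample $S$ with imputed values filled in. The proposed multiply robust procedures (New1, New2) are not reproduced here.

```python
import numpy as np


def weighted_mean(values, w):
    """Ratio-form weighted mean sum(w * values) / sum(w)."""
    return np.sum(w * values) / np.sum(w)


def naive_estimator(y, w, r):
    """(Naive): weighted mean of the respondents (r_i = 1)."""
    resp = r == 1
    return weighted_mean(y[resp], w[resp])


def reg_and_pmm1_estimators(x, y, w, r):
    """(Reg) and (PMM1) imputed estimators with working model (M1).

    Assumptions: coefficients fitted by weighted least squares on the
    respondents; imputed estimators use the same weighted-mean form as
    (Naive). Values y_i with r_i = 0 never enter the result.
    """
    resp = r == 1
    X = np.column_stack([np.ones(len(y)), x])        # intercept + x1..x4, i.e. (M1)
    sw = np.sqrt(w[resp])
    beta, *_ = np.linalg.lstsq(X[resp] * sw[:, None], y[resp] * sw, rcond=None)
    m_hat = X @ beta                                  # score m_hat_i for every i in S

    # (Reg): each missing y_i is replaced by its predicted value m_hat_i.
    y_reg = np.where(resp, y, m_hat)

    # (PMM1): each nonrespondent receives the observed y_j of the respondent
    # whose score m_hat_j is closest to m_hat_i (nearest neighbour on the score).
    donors = np.flatnonzero(resp)
    y_pmm = np.where(resp, y, np.nan)
    for i in np.flatnonzero(~resp):
        j = donors[np.argmin(np.abs(m_hat[donors] - m_hat[i]))]
        y_pmm[i] = y[j]

    return weighted_mean(y_reg, w), weighted_mean(y_pmm, w)
```

In a simulation run, `x`, `y`, `w` and `r` would hold the sampled rows, the sampling weights and the response indicators. Unlike (Reg), PMM1 always imputes an observed value, which is what gives it some protection when (M1) is misspecified; ties in the score distance are broken by taking the first donor returned by `np.argmin`.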

We computed the Monte Carlo relative bias (MCRB), the Monte Carlo relative standard error (MCRSE) and the Monte Carlo relative root mean squared error (MCRMSE), defined respectively as

$MCRB = \frac{B^{-1} \sum_{b=1}^{B} (\hat{θ}_{b} - θ_{b})}{θ_{MC}},$

$MCRSE = \frac{\sqrt{(B - 1)^{-1} \sum_{b=1}^{B} (\hat{θ}_{b} - \hat{θ}_{MC})^{2}}}{θ_{MC}}$

and

$MCRMSE = \frac{\sqrt{(B - 1)^{-1} \sum_{b=1}^{B} (\hat{θ}_{b} - θ_{MC})^{2}}}{θ_{MC}},$

where $θ_{b}$ denotes the population mean in the $b^{th}$ population, $\hat{θ}_{b}$ denotes the estimator $\hat{θ}$ computed from the $b^{th}$ sample, $b = 1, \dots, B,$ and

$θ_{MC} = \frac{1}{B} \sum_{b=1}^{B} θ_{b}, \quad \hat{θ}_{MC} = \frac{1}{B} \sum_{b=1}^{B} \hat{θ}_{b}.$
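The three Monte Carlo measures translate directly into code. The sketch below uses a hypothetical function `mc_relative_measures` that takes the $B$ estimates $\hat{θ}_{b}$ and population means $θ_{b}$ and returns fractions; multiplying by $10^{2}$ gives the scale used in Tables 4.1 and 4.2.

```python
import numpy as np


def mc_relative_measures(theta_hat, theta):
    """Return (MCRB, MCRSE, MCRMSE) as fractions (not multiplied by 10^2).

    theta_hat[b] is the estimate from the b-th sample and theta[b] the mean
    of the b-th finite population, b = 1, ..., B.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta = np.asarray(theta, dtype=float)
    B = theta_hat.size
    theta_mc = theta.mean()                  # theta_MC
    theta_hat_mc = theta_hat.mean()          # hat(theta)_MC
    mcrb = np.mean(theta_hat - theta) / theta_mc
    mcrse = np.sqrt(np.sum((theta_hat - theta_hat_mc) ** 2) / (B - 1)) / theta_mc
    mcrmse = np.sqrt(np.sum((theta_hat - theta_mc) ** 2) / (B - 1)) / theta_mc
    return mcrb, mcrse, mcrmse


# Toy input with B = 4 (not simulation output): estimates scatter around 1.2, truth is 1.
print(mc_relative_measures([1.1, 1.3, 1.1, 1.3], [1.0, 1.0, 1.0, 1.0]))
```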

The results are presented in Tables 4.1 and 4.2. As expected, the naive estimator exhibited a large bias in all scenarios. When the true model was (M1), Table 4.1 shows that linear regression imputation performed very well in terms of bias, as expected, since its working model is then correctly specified. Both PMM1 and the proposed methods showed negligible bias for $n =$ 1,000 and a slight bias for $n =$ 500 and $n =$ 200. For instance, for $n =$ 200 and a response rate of 70%, the MCRB was 2.4% for PMM1, New1 and New2. In terms of efficiency, linear regression imputation slightly outperformed PMM1 and the proposed methods, as expected. For instance, for $n =$ 1,000 and a response rate of 70%, the MCRMSE was 7.5% for linear regression imputation and 8.0% for PMM1, New1 and New2. PMM1 and the proposed methods performed almost identically in all the scenarios of Table 4.1. In particular, incorporating the two additional models (M3) and (M4) in New2 did not appear to affect its efficiency.

When the true model was (M2), Table 4.2 shows that both linear regression imputation and PMM1 led to substantial biases in all scenarios, as expected, since both rely on the misspecified working model (M1). Being a parametric imputation procedure, linear regression imputation is vulnerable to model misspecification. PMM1, on the other hand, showed smaller biases than linear regression imputation, suggesting some robustness against model misspecification. For instance, for $n =$ 1,000 and a response rate of 70%, the MCRB was -9.2% for linear regression imputation and -3.7% for PMM1. The proposed methods outperformed both linear regression imputation and PMM1 in terms of bias, standard error and root mean squared error in all scenarios. Finally, New1 and New2 performed almost identically.
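The three measures reported in the tables are linked by an exact identity. Because $θ_{MC}$ is the average of the $θ_{b},$ the numerator of MCRB equals $\hat{θ}_{MC} - θ_{MC}.$ Expanding the square in MCRMSE around $\hat{θ}_{MC}$ (the cross term sums to zero) gives

$\sum_{b=1}^{B} (\hat{θ}_{b} - θ_{MC})^{2} = \sum_{b=1}^{B} (\hat{θ}_{b} - \hat{θ}_{MC})^{2} + B (\hat{θ}_{MC} - θ_{MC})^{2},$

and dividing both sides by $(B - 1) θ_{MC}^{2}$ yields

$MCRMSE^{2} = MCRSE^{2} + \frac{B}{B - 1} MCRB^{2} \approx MCRSE^{2} + MCRB^{2} \quad \text{for } B = 2{,}000.$

For example, for the naive estimator in Table 4.1 with $n =$ 1,000 and a 70% response rate, $\sqrt{64.7^{2} + 7.5^{2}} \approx 65.1,$ and for PMM1 in Table 4.2 in the same scenario, $\sqrt{3.7^{2} + 3.9^{2}} \approx 5.4,$ both matching the reported MCRMSE values.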

Table 4.1
Monte Carlo relative bias (MCRB), relative standard error (MCRSE), and relative root mean squared error (MCRMSE) when the true model is (M1)
Response rate	Sample size	Measure ($\times 10^{2}$)	Naive	Reg	PMM1	New1	New2
70%	1,000	MCRB	64.7	-0.1	0.4	0.4	0.4
		MCRSE	7.5	7.5	8.0	8.0	8.0
		MCRMSE	65.1	7.5	8.0	8.0	8.0
70%	500	MCRB	65.3	0.5	1.4	1.4	1.4
		MCRSE	10.7	10.4	11.2	11.2	11.2
		MCRMSE	66.1	10.4	11.3	11.3	11.3
70%	200	MCRB	64.6	0.3	2.4	2.4	2.4
		MCRSE	16.5	16.7	17.5	17.5	17.6
		MCRMSE	66.7	16.7	17.7	17.7	17.7
50%	1,000	MCRB	99.3	0.0	0.7	0.7	0.6
		MCRSE	8.8	8.1	9.0	9.0	9.0
		MCRMSE	99.7	8.1	9.1	9.1	9.1
50%	500	MCRB	98.9	-0.1	1.3	1.3	1.3
		MCRSE	12.1	11.2	12.5	12.5	12.5
		MCRMSE	99.6	11.2	12.6	12.6	12.6
50%	200	MCRB	99.8	0.8	4.3	4.3	4.4
		MCRSE	19.3	17.7	19.6	19.6	19.6
		MCRMSE	101.6	17.7	20.1	20.1	20.0

Table 4.2
Monte Carlo relative bias (MCRB), relative standard error (MCRSE), and relative root mean squared error (MCRMSE) when the true model is (M2)
Response rate	Sample size	Measure ($\times 10^{2}$)	Naive	Reg	PMM1	New1	New2
70%	1,000	MCRB	7.5	-9.2	-3.7	0.1	0.1
		MCRSE	3.5	3.5	3.9	3.1	3.1
		MCRMSE	8.2	9.9	5.4	3.1	3.1
70%	500	MCRB	7.5	-9.4	-4.0	0.2	0.2
		MCRSE	5.0	5.1	5.6	4.5	4.5
		MCRMSE	9.0	10.7	6.9	4.5	4.5
70%	200	MCRB	7.6	-9.2	-4.0	0.1	0.1
		MCRSE	7.8	7.9	8.5	6.8	6.8
		MCRMSE	10.9	12.1	9.4	6.8	6.8
50%	1,000	MCRB	16.6	-11.3	-3.1	0.3	0.3
		MCRSE	4.0	4.5	5.0	3.3	3.3
		MCRMSE	17.1	12.2	5.9	3.3	3.3
50%	500	MCRB	16.5	-11.5	-3.5	0.3	0.3
		MCRSE	5.7	6.3	7.0	4.8	4.7
		MCRMSE	17.5	13.2	7.8	4.8	4.8
50%	200	MCRB	16.5	-12.0	-3.9	-0.1	-0.1
		MCRSE	9.1	9.9	11.0	7.4	7.4
		MCRMSE	18.8	15.6	11.7	7.4	7.4

Acknowledgements

S. Chen was supported by the National Institute on Minority Health and Health Disparities (NIMHD) at the National Institutes of Health (NIH) (1R21MD014658-01A1) and the Oklahoma Shared Clinical and Translational Resources (U54GM104938) with an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The work of D. Haziza was supported by grants from the Natural Sciences and Engineering Research Council of Canada.


ISSN: 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2021-06-24
