5. Summary

Jeroen Pannekoek and Li-Chun Zhang


In this paper we have formulated an optimization approach to the micro-level inconsistency problem that may be caused by measurement errors and/or imputation of missing values. This provides a general methodology that extends beyond the traditional single-constraint adjustment methods such as prorating. All constraints are handled simultaneously; if a variable appears in more than one constraint then it is adjusted according to all of them. Besides being optimal according to the chosen distance (or discrepancy) function, the approach also has the practical advantage that there is no need to specify the order in which the constraints are to be applied.
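
To make the simultaneous treatment of constraints concrete, the following minimal sketch (in Python with NumPy; not the implementation used for the paper, and with purely illustrative variables, weights and edit rules) adjusts a record by the smallest weighted squared change so that two overlapping balance edits hold at once.

import numpy as np

def wls_adjust(x0, A, b, w):
    # Minimize sum_j w_j * (x_j - x0_j)^2 subject to A x = b,
    # using the standard Lagrange-multiplier solution.
    W_inv = np.diag(1.0 / w)                   # inverse of the diagonal weight matrix
    r = b - A @ x0                             # constraint violations of the original record
    lam = np.linalg.solve(A @ W_inv @ A.T, r)  # Lagrange multipliers
    return x0 + W_inv @ A.T @ lam              # additive adjustment

# Illustrative record (turnover, costs, profit, wages, other costs) violating two
# overlapping balance edits: turnover = costs + profit and costs = wages + other costs.
x0 = np.array([1000.0, 700.0, 350.0, 500.0, 230.0])
A = np.array([[1.0, -1.0, -1.0,  0.0,  0.0],
              [0.0,  1.0,  0.0, -1.0, -1.0]])
b = np.zeros(2)
w = np.ones(5)   # equal weights; other weight choices lead to different adjustments

print(wls_adjust(x0, A, b, w))
# Adjusted record: 1022.5, 695.0, 327.5, 482.5, 212.5 -- both edits now hold exactly.

The variable costs appears in both edits and is therefore adjusted with respect to both of them, and no order of application of the edits needs to be specified.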

Several distance (or discrepancy) functions are analysed. It is shown that minimizing a weighted least squares (WLS) criterion leads to additive adjustments, whereas minimizing the Kullback-Leibler (KL) divergence leads to multiplicative adjustments. Nevertheless, for a specific choice of weights the WLS solution of the optimization problem approximates the KL solution.
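
In generic notation (which may differ in detail from the notation used in the body of the paper), the two solutions for an inconsistent record $x^0$ subject to linear constraints $Ax = b$ take the familiar Lagrangian forms
\[
\min_x \tfrac{1}{2}\sum_j w_j (x_j - x_j^0)^2 \ \text{s.t.}\ Ax=b \quad\Rightarrow\quad x_j = x_j^0 + w_j^{-1}\,(A^\top\lambda)_j ,
\]
\[
\min_x \sum_j \Bigl( x_j \log\frac{x_j}{x_j^0} - x_j + x_j^0 \Bigr) \ \text{s.t.}\ Ax=b \quad\Rightarrow\quad x_j = x_j^0 \exp\bigl\{(A^\top\lambda)_j\bigr\} ,
\]
where $\lambda$ denotes the vector of Lagrange multipliers, so that the WLS adjustment is additive and the KL adjustment is multiplicative. A second-order expansion of the KL terms around $x_j^0$, namely $x_j \log(x_j/x_j^0) - x_j + x_j^0 \approx (x_j - x_j^0)^2/(2x_j^0)$, indicates the specific weight choice $w_j = 1/x_j^0$ for which the WLS solution approximates the KL solution.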

Adjustments based on statistical assumptions, in addition to the logical constraints, are introduced under the generalized ratio (GR) approach. The GR adjustments can be regarded as a generalization of the single-ratio adjustment under a ratio model. All the observed variable-specific ratios between the receptor and donor records are utilized; a variable that does not appear in any constraint can also be adjusted, provided it is included in the distance function.
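
For reference, in the special case of a single sum constraint $\sum_j x_j = b$, the single-ratio (prorating) adjustment of imputed values $\tilde{x}_j$ scales all components by one common ratio (again in generic notation, not necessarily that of the paper),
\[
x_j = \tilde{x}_j \,\frac{b}{\sum_k \tilde{x}_k} ,
\]
whereas the GR approach summarized above lets the adjustment draw on the variable-specific ratios observed between the receptor and donor records instead of a single common ratio.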

Also discussed are adjustments involving categorical data, unit-missing records and macro-level benchmark constraints in addition to the micro-level consistency constraints. Taken together, the proposed optimization approach is applicable to continuous data in a number of situations.

Acknowledgements

The views expressed in this paper are those of the authors and do not necessarily reflect the policies of Statistics Netherlands.

