Integration of data from probability surveys and big found data for finite population inference using mass imputation

Articles and reports: 12-001-X202100100004

Description:

Multiple data sources are becoming increasingly available for statistical analysis in the era of big data. As an important example in finite population inference, we consider an imputation approach to combining data from a probability survey and big found data. We focus on the case where the study variable is observed only in the big data, while the auxiliary variables are observed in both data sources. Unlike the usual imputation for missing data analysis, we create imputed values for all units in the probability sample. Such mass imputation is attractive in the context of survey data integration (Kim and Rao, 2012). We extend mass imputation as a tool for integrating survey data and big non-survey data. The mass imputation methods and their statistical properties are presented. The matching estimator of Rivers (2007) is also covered as a special case. Variance estimation with mass-imputed data is discussed. The simulation results demonstrate that the proposed estimators outperform existing competitors in terms of robustness and efficiency.
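To make the idea concrete, the sketch below shows the basic mechanics of mass imputation under simplifying assumptions. A model for the study variable y given the auxiliary variables x is fitted on the big data (here, an ordinary least squares linear working model, an illustrative choice), y is imputed for every unit in the probability sample, and the imputed values are combined with the survey design weights. A nearest-neighbour option, which borrows y from the closest big-data unit, illustrates matching in the spirit of Rivers (2007). The function name `mass_imputation_mean`, the Hajek-type normalisation by the sum of weights, and the Euclidean distance metric are assumptions for illustration; this is not the authors' estimator, and no variance estimation is attempted.

```python
import numpy as np


def mass_imputation_mean(x_big, y_big, x_prob, d_prob, method="regression"):
    """Hypothetical sketch: estimate the population mean of y by mass imputation.

    x_big, y_big : auxiliary variables and study variable observed in the big data
    x_prob       : auxiliary variables observed in the probability sample (y unobserved)
    d_prob       : design weights of the probability sample
    method       : "regression" (linear working model) or "nn" (nearest-neighbour matching)
    """
    y_big = np.asarray(y_big, dtype=float)
    d_prob = np.asarray(d_prob, dtype=float)
    # Coerce covariates to 2-D arrays (one row per unit)
    x_big = np.asarray(x_big, dtype=float).reshape(len(y_big), -1)
    x_prob = np.asarray(x_prob, dtype=float).reshape(len(d_prob), -1)

    if method == "regression":
        # Fit y = b0 + x'b by ordinary least squares using the big data only
        X_big = np.column_stack([np.ones(len(x_big)), x_big])
        beta, *_ = np.linalg.lstsq(X_big, y_big, rcond=None)
        # Impute y for ALL units in the probability sample (mass imputation)
        X_prob = np.column_stack([np.ones(len(x_prob)), x_prob])
        y_imp = X_prob @ beta
    elif method == "nn":
        # Squared Euclidean distance from each survey unit to each big-data unit
        dist = ((x_prob[:, None, :] - x_big[None, :, :]) ** 2).sum(axis=2)
        # Borrow the observed y of the closest big-data unit
        y_imp = y_big[dist.argmin(axis=1)]
    else:
        raise ValueError("method must be 'regression' or 'nn'")

    # Design-weighted (Hajek-type) mean of the imputed values
    return float(np.sum(d_prob * y_imp) / np.sum(d_prob))


# Toy usage: y = 1 + 2x holds exactly in the big data
x_big = [0.0, 1.0, 2.0, 3.0]
y_big = [1.0, 3.0, 5.0, 7.0]
x_prob = [0.9, 2.2]   # survey units: x observed, y not observed
d_prob = [1.0, 3.0]   # design weights

print(mass_imputation_mean(x_big, y_big, x_prob, d_prob))
print(mass_imputation_mean(x_big, y_big, x_prob, d_prob, method="nn"))
```

With the regression model the imputed values are 2.8 and 5.4, giving a weighted mean of 4.75; nearest-neighbour matching borrows y = 3 and y = 5, giving 4.5. The key feature is that the big data are used only to build the imputation model, while inference about the finite population rests on the probability sample and its design weights.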

Issue Number: 2021001
Author(s): Yang, Shu; Kim, Jae Kwang; Hwang, Youngdeok

Main Product: Survey Methodology

Release date by format:
HTML: June 24, 2021
PDF: June 24, 2021
