1. Introduction

Jae-kwang Kim, Seunghwan Park and Seo-young Kim

Combining information from different sources is an important problem in statistics. In survey sampling, combining information from multiple surveys can improve the quality of small area estimates. The information can come from a probability sample with direct measurements, from another probability sample with indirect measurements (such as self-reported health status), or from auxiliary area-level information. Many approaches to combining information, such as the multiple-frame and statistical matching methods, require access to individual-level data, which is not always feasible in practice.

We consider an area-level model approach to small area estimation when there are several sources of auxiliary information. Pfeffermann (2002) and Rao (2003) provided thorough reviews of methods used in small area estimation. Lohr and Prasad (2003) used multivariate models to combine information from several surveys. Ybarra and Lohr (2008) considered the small area estimation problem when the area-level auxiliary information has measurement errors. Merkouris (2010) discussed small area estimation by combining information from multiple surveys. Raghunathan, Xie, Schenker, Parsons, Davis, Dodd and Feuer (2007) and Manzi, Spiegelhalter, Turner, Flowers and Thompson (2011) used Bayesian hierarchical models to combine information from multiple surveys for small area estimation. Kim and Rao (2012) considered a design-based approach to combining information from two independent surveys.

To describe the setup, suppose that the finite population consists of $H$ subpopulations, denoted by $U_{1}, \dots, U_{H},$ and that we are interested in estimating the subpopulation totals $X_{h} = \sum_{i \in U_{h}} x_{i}$ of a variable $x$ for each area $h.$ We assume that there is a survey that measures $x_{i}$ directly for the sampled units, but its sample size is not large enough to estimate $X_{h}$ with reasonable accuracy. We treat this survey, called survey $A,$ as the main survey, and let ${\hat{X}}_{h}$ denote a design-consistent estimator of $X_{h}$ obtained from survey $A.$ Often, we compute ${\hat{X}}_{h} = \sum_{i \in A_{h}} w_{i a} x_{i},$ where $A_{h}$ is the set of units in sample $A$ that belong to subpopulation $h$ and $w_{i a}$ is the weight of unit $i$ in sample $A.$
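The weighted direct estimator described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `direct_area_totals`, the integer area labels 0 to $H - 1$, and the toy weights and values are all assumptions made for the example.

```python
import numpy as np

def direct_area_totals(area, weights, x, n_areas):
    """Weighted direct estimates: X_hat[h] = sum of w_i * x_i over sampled units in area h.

    `area` holds integer area labels 0..n_areas-1 (a labeling assumed for this sketch).
    Areas with no sampled units get an estimate of 0.
    """
    area = np.asarray(area)
    wx = np.asarray(weights, dtype=float) * np.asarray(x, dtype=float)
    # bincount sums the weighted values within each area label
    return np.bincount(area, weights=wx, minlength=n_areas)

# Hypothetical sample A: five units spread over H = 3 areas.
area = [0, 0, 1, 2, 2]
w_a = [10.0, 12.0, 8.0, 15.0, 15.0]   # survey weights w_ia (made up)
x = [1.0, 0.0, 1.0, 1.0, 0.0]         # direct measurements x_i (made up)

x_hat = direct_area_totals(area, w_a, x, n_areas=3)
print(x_hat)
```

With only one or two sampled units per area, as in this toy sample, each estimate rests on very little data, which is the small-sample problem that motivates borrowing information from other sources.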

In addition to the main survey, suppose that there is another survey, called survey $B,$ that provides a rough measurement of $x_{i}.$ Let $y_{1 i}$ be the measurement taken from survey $B,$ which we regard as $x_{i}$ observed with some level of measurement error. Thus, we may assume

$y_{1 i} = β_{0} + β_{1} x_{i} + e_{1 i}$    (1.1)

for some $(β_{0}, β_{1}),$ where $e_{1 i} \sim (0, σ_{e 1}^{2}).$ Model (1.1) is variable-specific, and the linearity and equal-variance assumptions can be relaxed later. If $(β_{0}, β_{1}) = (0,1),$ then model (1.1) means that there is no measurement bias. Note that the model parameters $(β_{0}, β_{1})$ in (1.1) are not area-specific, but may differ across groups of areas, as demonstrated in the Korean labor force survey application in Section 5. Separate regression models for different groups may lead to smaller model errors and thus improve the statistical efficiency of the proposed method. From survey $B,$ we can obtain another estimator ${\hat{Y}}_{1 h} = \sum_{i \in B_{h}} w_{i b} y_{1 i}$ of $X_{h},$ where $w_{i b}$ is the weight of unit $i$ in the sample from survey $B$ and $B_{h}$ is the set of units in sample $B$ that belong to subpopulation $h.$ Note that ${\hat{Y}}_{1 h}$ can be obtained, for each area, if the same areas are identified in both surveys $A$ and $B.$ Model (1.1) can be used to combine information from the two surveys.
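One way to see how model (1.1) links the two sources at the area level is to sum it over the units in $U_{h}$: the population total of $y_{1 i}$ then equals $β_{0} N_{h} + β_{1} X_{h}$ plus the summed errors, so it matches $X_{h}$ (up to error) only when $(β_{0}, β_{1}) = (0, 1).$ The simulation below checks this numerically. It is a sketch under stated assumptions, not the authors' estimation procedure: $N_{h}$ (the number of units in $U_{h}$), the parameter values, the gamma distribution for $x_{i},$ and the normal errors are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values for model (1.1); not taken from the paper.
beta0, beta1, sigma_e1 = 0.5, 0.9, 0.3

H = 4
N_h = np.array([50, 80, 120, 60])       # assumed area sizes N_h
area = np.repeat(np.arange(H), N_h)     # area label of every population unit

# Simulated population: true values x_i and rough measurements y_1i.
x = rng.gamma(shape=2.0, scale=1.0, size=area.size)
e1 = rng.normal(0.0, sigma_e1, size=area.size)   # normality is an assumption
y1 = beta0 + beta1 * x + e1

# Population area totals of x, y1 and the errors.
X_h = np.bincount(area, weights=x, minlength=H)
Y1_h = np.bincount(area, weights=y1, minlength=H)
E1_h = np.bincount(area, weights=e1, minlength=H)

# Summing (1.1) over U_h: Y1_h = beta0 * N_h + beta1 * X_h + E1_h.
implied = beta0 * N_h + beta1 * X_h + E1_h

# With known (beta0, beta1), the total of y1 can be mapped back to the x scale;
# what remains is only the summed model error divided by beta1.
X_from_y1 = (Y1_h - beta0 * N_h) / beta1

print(np.column_stack([X_h, Y1_h, X_from_y1]))
```

The raw totals of $y_{1 i}$ differ systematically from $X_{h}$ whenever $(β_{0}, β_{1}) \neq (0, 1),$ while the back-transformed totals differ from $X_{h}$ only through the aggregated error term.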

Finally, another source of information can be the Census. Census information does not suffer from coverage error or sampling error. However, it may have measurement errors, and it does not provide updated information for each month or year. Let $y_{2 i}$ be the measurement for unit $i$ from the Census. The subpopulation total $Y_{2 h} = \sum_{i \in C_{h}} y_{2 i}$ is available, where $C_{h}$ is the set of Census units in subpopulation $h.$

Table 1.1 summarizes the major sources of information that we can consider in small area estimation.

Table 1.1
Available information for small area estimation
Data	Observation	Area-level estimate	Properties
Survey $A$	direct obs. $(x_{i})$	${\hat{X}}_{h}, \hat{V} ({\hat{X}}_{h})$	Sampling error (large)
Survey $B$	aux. obs. $(y_{1 i})$	${\hat{Y}}_{1 h}, \hat{V} ({\hat{Y}}_{1 h})$	Bias; measurement error; sampling error
Census	aux. obs. $(y_{2 i})$	$Y_{2 h}$	Measurement error; no updated information

In this paper, we consider an area-level model approach to small area estimation that combines all available information. The proposed approach is based on measurement error models, where the sampling errors of the direct estimators are treated as measurement errors and all the other auxiliary information is combined through a set of linking models. The proposed approach is applied to the small area estimation problem for labor force surveys in Korea, where three estimates are combined to produce small area estimates of unemployment rates.

The paper is organized as follows. In Section 2, the basic setup is introduced and the small area estimation problem is viewed as a measurement error model prediction problem. In Section 3, parameter estimation for the area-level small area model is discussed. In Section 4, estimation of the mean squared error is briefly discussed. In Section 5, the proposed method is applied to the labor force survey data in Korea. Concluding remarks are made in Section 6.

Date modified: 2015-11-27

Survey Methodology

Browse by

1. Introduction