1. Introduction
Jae-kwang Kim, Seunghwan Park and Seo-young Kim
Combining information from different sources is an important problem in
statistics. In survey sampling, combining information from multiple surveys can
improve the quality of small area estimates. The information can come from a
probability sample with direct measurements, from another probability sample
with indirect measurements (such as self-reported health status), or from
auxiliary area-level information. Many approaches to combining information,
such as the multiple-frame and statistical matching methods, require access to
individual-level data, which is not always feasible in practice.
We consider an area-level model approach to small area estimation when there are
several sources of auxiliary information. Pfeffermann (2002) and Rao (2003)
provided thorough reviews of methods used in small area estimation. Lohr and
Prasad (2003) used multivariate models to combine information from several
surveys. Ybarra and Lohr (2008) considered the small area estimation problem
when the area-level auxiliary information has measurement errors. Merkouris
(2010) discussed small area estimation by combining information from
multiple surveys. Raghunathan, Xie, Schenker, Parsons, Davis, Dodd and Feuer (2007)
and Manzi, Spiegelhalter, Turner, Flowers and Thompson (2011) used Bayesian
hierarchical models to combine information from multiple surveys for small area
estimation. Kim and Rao (2012) considered a design-based approach to combining
information from two independent surveys.
To describe the setup, suppose that the finite population consists of $H$
subpopulations, denoted by $U_1, \ldots, U_H$, and that we are interested in
estimating the subpopulation totals $X_h = \sum_{i \in U_h} x_i$ of a variable
$x$ for each area $h$. We assume that there is a survey that measures $x_i$
from the sample, but its sample size is not large enough to obtain estimates for
$X_h$ with reasonable accuracy. Consider one of the surveys, called survey $A$,
as the main survey, and let $\hat{X}_h$ denote a design-consistent estimator of
$X_h$ obtained from survey $A$. Often, we compute
$\hat{X}_h = \sum_{i \in A_h} w_{ia} x_i$, where $A_h$ is the set of sampled
units for subpopulation $h$ and $w_{ia}$ is the weight of unit $i$ in sample $A$.
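The weighted direct estimator described above can be sketched in a few lines of Python. This is only an illustration: the function name `direct_area_totals`, the record layout (area label, design weight, observed value) and the toy data are assumptions made for this sketch and do not come from the paper.

```python
from collections import defaultdict


def direct_area_totals(records):
    """Weighted direct estimates of area totals.

    `records` is an iterable of (area, weight, x) tuples, one per sampled
    unit. The result maps each area to the sum of weight * x over the
    sampled units that fall in that area.
    """
    totals = defaultdict(float)
    for area, weight, x in records:
        totals[area] += weight * x
    return dict(totals)


# Hypothetical main-survey sample: (area label, design weight, observed value).
sample_a = [
    ("h1", 10.0, 1.0),
    ("h1", 12.0, 0.0),
    ("h2", 8.0, 1.0),
    ("h2", 8.0, 1.0),
    ("h2", 9.0, 0.0),
]

estimates = direct_area_totals(sample_a)
print(estimates)
```

With only a handful of sampled units per area, each estimate rests on very few weighted observations, which is the small-sample instability that motivates borrowing strength from other sources.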
In addition to the main survey, suppose that there is another survey, called
survey $B$, that measures a rough estimate for $x_i$. Let $y_{1i}$ be the
measurement taken from survey $B$. We may assume that $y_{1i}$ is a rough
measurement of $x_i$ with some level of measurement error. Thus, we may assume
$$ y_{1i} = \beta_0 + \beta_1 x_i + e_{1i} \qquad (1.1) $$
for some $(\beta_0, \beta_1)$, where $e_{1i} \sim (0, \sigma_{e1}^2)$. Model
(1.1) is variable-specific, and the linear regression assumption or the equal
variance assumption can be relaxed later. If $(\beta_0, \beta_1) = (0, 1)$, then
model (1.1) means that there is no measurement bias. Note that the model
parameters $(\beta_0, \beta_1)$ in (1.1) are not area specific, but may be
different for groups of areas, as demonstrated in the Korean labor force survey
application in Section 5. Separate regression models for different groups may
lead to smaller model errors and thus improve the statistical efficiency of the
proposed method. From survey $B$, we can obtain another estimator
$\hat{Y}_{1h} = \sum_{i \in B_h} w_{ib} y_{1i}$ of
$Y_{1h} = \sum_{i \in U_h} y_{1i}$, where $w_{ib}$ is the weight of unit $i$ in
the sample from survey $B$ and $B_h$ is the sample for subpopulation $h$. Note
that $(\hat{X}_h, \hat{Y}_{1h})$ can be obtained, for each area, if the same
areas are identified in both surveys, and model (1.1) can be used to combine
information from the two surveys.
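To see how a unit-level linear measurement model carries over to area totals, the following sketch simulates one area and checks the aggregation numerically. It assumes that model (1.1) takes the form $y_{1i} = \beta_0 + \beta_1 x_i + e_{1i}$ with mean-zero errors; the parameter values, the binary study variable, the area size and all function and variable names are illustrative assumptions and are not taken from the paper.

```python
import random


def simulate_area(n_units, beta0, beta1, sigma, rng):
    """Simulate one area under the assumed linear measurement model."""
    x = [rng.choice([0.0, 1.0]) for _ in range(n_units)]     # true values
    e = [rng.gauss(0.0, sigma) for _ in range(n_units)]      # measurement errors
    y1 = [beta0 + beta1 * xi + ei for xi, ei in zip(x, e)]   # rough measurements
    return x, y1, e


rng = random.Random(2015)
beta0, beta1, sigma = 0.05, 0.9, 0.1   # illustrative values only
x, y1, e = simulate_area(500, beta0, beta1, sigma, rng)

n_h = len(x)
x_total = sum(x)     # area total of the true values
y1_total = sum(y1)   # area total of the rough measurements

# Summing the unit-level model over the area gives
#   Y_1h = N_h * beta0 + beta1 * X_h + (sum of errors).
implied_total = n_h * beta0 + beta1 * x_total + sum(e)

# Systematic part of the gap between the two totals; it is exactly zero
# when (beta0, beta1) = (0, 1), the no-measurement-bias case.
systematic_gap = n_h * beta0 + (beta1 - 1.0) * x_total

print(y1_total, implied_total, systematic_gap)
```

Under these assumptions, the area total of the rough measurement is a linear function of the true area total plus an aggregated error, which is what allows the second survey to act as auxiliary information in an area-level model.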
Finally, another source of information can be the Census information. Census
information does not suffer from coverage error or sampling error. However, it
may have measurement errors, and it does not provide updated information for
each month or year. Let $y_{2i}$ be the measurement for unit $i$ from the
Census. The subpopulation total $Y_{2h} = \sum_{i \in U_h} y_{2i}$ is available,
where $U_h$ is the set of Census units for subpopulation $h$.
Table 1.1 summarizes the major sources of information that we can incorporate
into small area estimation.
Table 1.1
Available information for small area estimation
| Data | Observation | Area level estimate | Properties |
| Survey $A$ | direct obs. $x_i$ | $\hat{X}_h$ | Sampling error (large) |
| Survey $B$ | aux. obs. $y_{1i}$ | $\hat{Y}_{1h}$ | Bias; measurement error; sampling error |
| Census | aux. obs. $y_{2i}$ | $Y_{2h}$ | Measurement error; no updated information |
In this paper, we consider an area-level model approach for small area
estimation that combines all available information. The proposed approach is
based on measurement error models, where the sampling errors of the direct
estimators are treated as measurement errors, and all other auxiliary
information is combined through a set of linking models. The proposed approach
is applied to the small area estimation problem for labor force surveys in
Korea, where three estimates are combined to produce small area estimates for
unemployment rates.
The paper is organized as follows. In Section 2, the basic setup is introduced and
the small area estimation problem is viewed as a measurement error model
prediction problem. In Section 3, parameter estimation for the area level small
area model is discussed. In Section 4, estimation of mean squared error is
briefly discussed. In Section 5, the proposed method is applied to the labor
force survey data in Korea. Concluding remarks are made in Section 6.