6. Concluding remark
Jae-kwang Kim, Seunghwan Park and Seo-young Kim
In this paper, a small area estimation problem is treated as a measurement error
model prediction problem where the covariates, which are the direct estimates
for small areas, are subject to sampling errors. In our measurement error model
approach, the sampling errors of the direct estimators are treated as
measurement errors and the structural error model can be used to link the other
auxiliary estimates to the direct estimators. The proposed model is actually
the opposite of the model of Ybarra and Lohr (2008), where the direct estimator
is treated as a dependent variable in the regression model and the nonsampling
errors of auxiliary estimates are treated as measurement errors.
In our approach, each auxiliary estimate is treated as a dependent variable in the
regression model using the direct estimate as the covariate and the sampling
error of the direct estimator is treated as measurement error. The measurement
error variance is easy to estimate because it is essentially the sampling
variance of the direct estimate. The measurement error model approach is also
very useful when there are several sources of area-level auxiliary
information. Unlike the Bayesian approach, the resulting estimator does not
rely on parametric model assumptions about the structural error model and is
still optimal in the sense of minimizing the mean squared error among the
class of unbiased estimators that are linear in the available data.
In the example of the Korean labor survey application, two sample estimates and
the Census information are used to compute the GLS estimates for small area
parameters and the two sample estimates are correlated due to the two-phase
sampling structure. We simply used linear regression models for the linking
models, mainly for the sake of computational simplicity. Instead of the linear
model, one may consider a generalized linear model to improve model prediction
power. Such an extension would involve the theory for nonlinear measurement error
models. Further investigation of this extension will be a topic of future
research.
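As a rough illustration of why correlated sample estimates can still be combined optimally, the following is a minimal sketch, not the full estimator used in the paper. It assumes two unbiased estimates of the same area parameter with a known sampling variance-covariance matrix, and computes the GLS (best linear unbiased) combination. The function name `gls_combine` and all numbers are hypothetical.

```python
import numpy as np

def gls_combine(estimates, cov):
    """GLS combination of unbiased estimates of a single parameter.

    estimates : the individual estimates (hypothetical inputs)
    cov       : their sampling variance-covariance matrix, assumed known
    """
    y = np.asarray(estimates, dtype=float)
    V = np.asarray(cov, dtype=float)
    ones = np.ones_like(y)
    w = np.linalg.solve(V, ones)   # V^{-1} 1
    w = w / (ones @ w)             # normalize so the weights sum to one
    estimate = float(w @ y)
    variance = float(w @ V @ w)    # equals 1 / (1' V^{-1} 1)
    return estimate, variance, w

# Hypothetical direct estimates for one area and their covariance matrix;
# the off-diagonal term mimics correlation induced by the sampling design.
y = [0.62, 0.58]
V = [[0.0004, 0.0001],
     [0.0001, 0.0009]]
est, var, w = gls_combine(y, V)
print(est, var, w)
```

The combined variance is no larger than the smaller of the two input variances, which is the sense in which a linear unbiased combination is optimal in this simplified setting.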
Acknowledgements
We thank an anonymous referee and the Associate Editor for their constructive
comments. The research of the first author was partially supported by a grant
from NSF (MMS-121339).
Appendix
Reversed two-phase sampling
In classical two-phase sampling, the second-phase sample is a subset of
the first-phase sample. We consider
another type of sampling design that has a reversed structure of the two-phase
sampling design. In the reversed two-phase sampling design, we have the
following sampling steps:
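- Step 1 From the finite population, we select the first-phase sample $A_1$ of size $n_1$.
- Step 2 In the second-phase sample, we select $A_2$ from $A_1^c$, the complement of $A_1$ in the finite population, of size $n_2$.
The final sample $A$ consists of $A_1$ and $A_2$. That is, $A = A_1 \cup A_2$ and $A_1 \cap A_2 = \emptyset$.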
The reversed two-phase sampling is used when the sample is augmented by an
additional sampling procedure.
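The two sampling steps can be sketched in code. The example below is only an illustration under an added assumption: both steps use simple random sampling without replacement, and the second-phase sample is drawn from the units not selected in the first phase. The function name `reversed_two_phase_srs`, the population size, and the sample sizes are hypothetical. Under these assumptions the overall inclusion probability of a unit is n1/N + (1 - n1/N) n2/(N - n1) = (n1 + n2)/N, which the Monte Carlo loop checks empirically.

```python
import numpy as np

def reversed_two_phase_srs(N, n1, n2, rng):
    """Draw a reversed two-phase sample under SRS at both steps (assumed design)."""
    units = np.arange(N)
    A1 = rng.choice(units, size=n1, replace=False)      # Step 1: first-phase sample
    remaining = np.setdiff1d(units, A1)                 # units not yet selected
    A2 = rng.choice(remaining, size=n2, replace=False)  # Step 2: augmentation sample
    return A1, A2

rng = np.random.default_rng(2015)
N, n1, n2 = 50, 5, 10          # hypothetical sizes
reps = 20000
counts = np.zeros(N)
for _ in range(reps):
    A1, A2 = reversed_two_phase_srs(N, n1, n2, rng)
    counts[A1] += 1
    counts[A2] += 1

empirical_pi = counts / reps   # Monte Carlo inclusion frequencies
theoretical_pi = n1 / N + (1 - n1 / N) * n2 / (N - n1)
print(theoretical_pi, empirical_pi.min(), empirical_pi.max())
```

Because the two samples never overlap, each unit is counted at most once per replicate, so the frequencies estimate the inclusion probability of the final sample directly.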
To discuss parameter estimation under reversed two-phase sampling, let
be the
first-order inclusion probability for
Let
be the
conditional first-order inclusion probability for
given
To compute the
inclusion probability for
Thus, we can use
to compute the
Horvitz-Thompson estimator of the form
Note that, instead of (A.1), we can consider the following class of
estimators:
Since
and
are both
unbiased for
is also unbiased
regardless of the choice of
A reasonable
choice of
is
Under
simple random sampling in both designs, the two estimators are equal to
where
is the sample
mean of
in
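The equality of the estimators under simple random sampling can be checked numerically. The sketch below rests on assumptions made here for illustration: the single-sample estimators are the expansion estimators N times each sample mean, the composite weight is n1/(n1 + n2), and the overall inclusion probability is (n1 + n2)/N. The population values are simulated and purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
N, n1, n2 = 200, 20, 30                        # hypothetical sizes
y = rng.normal(loc=10.0, scale=2.0, size=N)    # hypothetical population values

# Reversed two-phase sample with SRS at both steps (assumed design)
A1 = rng.choice(N, size=n1, replace=False)
rest = np.setdiff1d(np.arange(N), A1)
A2 = rng.choice(rest, size=n2, replace=False)
A = np.concatenate([A1, A2])

# Horvitz-Thompson estimator of the total with pi_i = (n1 + n2) / N
pi = (n1 + n2) / N
Y_ht = np.sum(y[A] / pi)

# Composite of the two single-sample expansion estimators (assumed forms)
Y1 = N * y[A1].mean()
Y2 = N * y[A2].mean()
alpha = n1 / (n1 + n2)                         # assumed weight
Y_alpha = alpha * Y1 + (1 - alpha) * Y2

# Both reduce to N times the mean of y over the combined sample
Y_mean = N * y[A].mean()
print(Y_ht, Y_alpha, Y_mean)
```

With other weights the composite estimator remains unbiased but no longer coincides with the Horvitz-Thompson estimator.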
Writing
and
we have
where
Using
where
we have, for
Also,
If
does not hold,
then (A.5) and (A.6) do not hold.
In the KLF application in Section 5, since
and
are measuring
the same item, we may assume
and the
variance-covariance matrix of the sampling errors can be smoothed as
References
Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83, 28-36.
Carroll, R.J., Ruppert, D. and Stefanski, L.A. (1995). Measurement error in nonlinear models. New York: Chapman & Hall.
Fay, R.E., and Herriot, R.A. (1979). Estimates of income for small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74, 269-277.
Fuller, W.A. (1987). Measurement error models. New York: John Wiley & Sons, Inc.
Fuller, W.A. (1991). Small area estimation as a measurement error problem. In Economic Models, Estimation, and Socioeconomic Systems: Essays in Honor of Karl A. Fox, (Eds., Tej K. Kaul and Jati K. Sengupta), Elsevier Science Publishers, 333-352.
Fuller, W.A. (2009). Sampling Statistics. John Wiley & Sons, Inc., Hoboken, NJ.
Jiang, J., Lahiri, P. and Wan, S. (2002). A unified jackknife theory for empirical best prediction with M-estimation. Annals of Statistics, 30, 1782-1810.
Kackar, R.N., and Harville, D.A. (1984). Approximations for standard errors of estimators of fixed and random effects in mixed linear models. Journal of the American Statistical Association, 79, 853-862.
Kim, J.K., and Rao, J.N.K. (2012). Combining data from two independent surveys: A model-assisted approach. Biometrika, 99, 85-100.
Lohr, S.L., and Prasad, N.G.N. (2003). Small area estimation with auxiliary survey data. The Canadian Journal of Statistics, 31, 383-396.
Manzi, G., Spiegelhalter, D.J., Turner, R.M., Flowers, J. and Thompson, S.G. (2011). Modelling bias in combining small area prevalence estimates from multiple surveys. Journal of the Royal Statistical Society A, 174, 31-50.
Merkouris, T. (2010). Combining information from multiple surveys by using regression for efficient small domain estimation. Journal of the Royal Statistical Society B, 68, 509-521.
Pfeffermann, D. (2002). Small area estimation - New developments and directions. International Statistical Review, 70, 125-144.
Quenouille, M.H. (1956). Notes on bias in estimation. Biometrika, 43, 353-360.
Raghunathan, T.E., Xie, D., Schenker, N., Parsons, V.I., Davis, W.W., Dodd, K.W. and Feuer, E.J. (2007). Combining information from two surveys to estimate county-level prevalence rates of cancer risk factors and screening. Journal of the American Statistical Association, 102, 474-486.
Rao, J.N.K. (2003). Small Area Estimation. John Wiley & Sons, Inc., Hoboken, NJ.
Schafer, D.W. (2001). Semiparametric maximum likelihood for measurement error model regression. Biometrics, 57, 53-61.
Ybarra, L.M.R., and Lohr, S.L. (2008). Small area estimation when auxiliary information is measured with error. Biometrika, 95, 919-931.