Small area benchmarked estimation under the basic unit level model when the sampling rates are non‑negligible
Section 6. Conclusion
In general, the sum of model-based small area estimates
is not equal to a direct estimate obtained across the union of these small
areas. The weight that is associated with the direct estimator can be the
sampling weight or one obtained as a result of using the GREG estimator. The
auxiliary data that are used to obtain the GREG and the unit-level small area
estimates may not necessarily coincide. In this paper, we have suggested
several benchmarking procedures for two well-known small area estimators (EBLUP
and YR) that are based on the unit level model. We considered the case where the
sampling rates are not negligible and the sample design is ignorable. If the
sample design is deemed non-ignorable for some of the survey items, the
auxiliary data vector
in model (2.2) could be augmented with an additional variable, defined as a
specified function of the survey weights, to offset the potential bias of the
EBLUP or YR estimators. Verret et al. (2015) proposed a number of choices for
this function, including the survey weight itself.
In the case of the EBLUP estimator,
benchmarking is achieved by adding the variable
Since
should be highly correlated to
the suggested procedure for benchmarking EBLUP
should provide good protection against possible non-ignorable sampling. The
simulations in Verret et al. (2015) illustrated that the YR procedure, on
its own, also provides good protection against possible non-ignorable
sampling. Their simulations also showed that further protection can be obtained
by setting
equal to
We extended the benchmarking procedures in Stefan and
Hidiroglou (2020) to the case of non‑negligible sampling rates within each
small area. These procedures are based on estimators that were initially
developed by Battese et al. (1988) (EBLUP estimator), and You and Rao
(2002) (YR estimator) when the sampling rates within each small area are
negligible. Ugarte et al. (2009) proposed a different benchmarked
estimator which is a restricted EBLUP estimator. We extended the procedure in
Ugarte et al. (2009) to obtain a benchmarked estimator that incorporates
the survey weights, and that is essentially a restricted YR estimator. We also
considered two benchmarked estimators based on simple ratio adjustments applied
to the EBLUP and YR estimators, respectively. We carried out a simulation study
to compare the properties of these six benchmarked estimators.
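The simple ratio adjustment mentioned above can be illustrated with a short sketch. It assumes that the model-based estimates are small area means, that the area population sizes are known, and that the benchmark is a direct (for example GREG) estimate of the overall total. The function name `ratio_benchmark` and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def ratio_benchmark(area_means, area_sizes, direct_total):
    """Scale model-based area means so their implied total matches a direct estimate.

    Assumption (not from the paper): the benchmark constraint is
    sum_i N_i * mean_i = direct_total, with known population sizes N_i.
    """
    area_means = np.asarray(area_means, dtype=float)
    area_sizes = np.asarray(area_sizes, dtype=float)
    model_total = np.sum(area_sizes * area_means)
    if model_total == 0:
        raise ValueError("model-based total is zero; ratio adjustment is undefined")
    factor = direct_total / model_total  # one common multiplicative adjustment
    return factor * area_means

# Hypothetical toy inputs: three areas and a direct estimate of the total.
means = [10.0, 12.0, 8.0]
sizes = [100, 200, 300]
benchmarked = ratio_benchmark(means, sizes, direct_total=6000.0)
implied_total = np.sum(np.asarray(sizes) * benchmarked)
```

Because every area is multiplied by the same factor, the adjustment preserves the relative ordering of the area estimates while forcing the implied total to agree with the direct estimate.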
If the auxiliary data used to estimate the small area
means are the same as those used in the GREG, and if the model is correct, the
restricted procedure in Ugarte et al. (2009) and the ratio-adjusted EBLUP
estimator will have the smallest
and
If, instead, the model is incorrect
and the auxiliary data are the same, the YR estimator based on the Stefan and
Hidiroglou (2020) procedure, adapted to non‑negligible sampling rates, has the
smallest
whereas
the restricted YR estimator has the smallest
On the other hand, if the auxiliary data used
to estimate the small area means are not the same as those used in the GREG, we
come to the following conclusions. The restricted EBLUP and the ratio-adjusted
EBLUP estimators are the benchmarked estimators that have the smallest
and
if the model is correct. If the model is not
correct, the restricted YR estimator is the preferred choice both in terms of
and
Benchmarking should be based on the EBLUP procedure if
the linear mixed effects model is appropriate. If the linear model and the
benchmark (the GREG estimator) share a large amount of auxiliary
information, the benchmarked estimators are similar to their non-benchmarked
versions; otherwise, the loss of efficiency due to benchmarking may be
substantial. If the model is not correct, the YR procedure should be used to
achieve benchmarking. In this case, benchmarking may bring about important
gains in terms of
and
especially if the small area model and the
GREG estimator share only a small number of auxiliary variables. The finite
populations associated with incorrect modeling were generated based on model (4.2),
with the mean function incorrectly specified. However, there are many ways in which
a model may be wrong, and the conclusions for such cases
may be different.
Acknowledgements
We would like to thank the two anonymous referees and
the associate editor for their constructive suggestions.
Appendix A
Proof of Result 1. The EBLUP estimators
and
that are
based on model (3.6) satisfy the equation
Equation (A.1)
has the form of equation (2.10) and corresponds to augmented model (3.6).
Expanding the second equation in (A.1), we obtain that
The variable
is
defined as
The
right-hand side of (A.2) is
The sums that
appear on the left-hand side of (A.2) are given by
In
establishing the last equality of (A.4), we used that
and that the
weights
satisfy
equation (3.3). Result 1 follows by substituting (A.3), (A.4), (A.5) and (A.6)
into (A.2).
Proof of Result 2. The survey-weighted estimating equations that
define
and
are
given by (2.12) constructed with the weights
Since the
first term of
is one
(representing an intercept), it follows that
The terms in
(A.7) are given by:
and
Plugging (A.8),
(A.9) and (A.10) into (A.7) leads to
Equation (A.11)
proves Result 2.
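The role of the intercept in the argument above can be checked numerically in a simplified setting. The sketch below is only a fixed-effects analogue, namely plain survey-weighted least squares with no random area effects, so it does not reproduce the estimating equations (2.12). The data, the weights, and the variable names are simulated and hypothetical. It illustrates that when the first covariate is one, the weighted residuals sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Design matrix whose first column is one (the intercept).
x = np.column_stack([np.ones(n), rng.normal(size=n)])
w = rng.uniform(1.0, 10.0, size=n)          # hypothetical survey weights
y = 2.0 + 3.0 * x[:, 1] + rng.normal(size=n)

# Solve the weighted estimating equations sum_j w_j x_j (y_j - x_j' beta) = 0.
xtw = x.T * w                                # each column j of x.T scaled by w_j
beta = np.linalg.solve(xtw @ x, xtw @ y)

residuals = y - x @ beta
# The first estimating equation (the intercept row) is exactly this sum.
weighted_residual_sum = np.sum(w * residuals)
```

The quantity `weighted_residual_sum` is zero up to floating-point error, because it is the intercept component of the estimating equations that `beta` solves.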
Appendix B
Re-parameterized REML estimation of variance components
Let
be the vector of variance components, where
and
We define the vector
such that
and
The restricted maximum log-likelihood
function, denoted as
is
where
is a
generic constant,
and
Notice
that
The
solution to the maximization of
is
obtained iteratively using the Fisher-scoring algorithm by updating the
following equation
Here,
is the
vector of first-order partial derivatives of
with
respect to
and
is the
matrix of expected second-order derivatives of
with
respect to
where
Under the BHF model, the first-order partial derivatives
of
are given by
where
and
The
expected values of the second-order partial derivatives of
are
The
re-parameterized REML estimator of
is
obtained as
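As an illustration of the Fisher-scoring iteration described in this appendix, the following sketch fits the variance components of a nested error regression model by REML. It is a sketch under stated assumptions rather than the authors' algorithm: the re-parameterization used here is the logarithm of each variance component (chosen so that the estimates stay positive), and the function name `reml_fisher_scoring`, the starting values, and the simulated data are hypothetical. The score and expected information are the standard REML expressions for a linear mixed model, transformed to the log scale by the chain rule.

```python
import numpy as np

def reml_fisher_scoring(y, X, area, max_iter=100, tol=1e-8):
    """REML for y_ij = x_ij' beta + v_i + e_ij by Fisher scoring.

    Assumption (not from the paper): the working parameters are
    delta = (log sigma_v^2, log sigma_e^2).
    Returns the estimates (sigma_v^2, sigma_e^2).
    """
    n = len(y)
    areas = np.unique(area)
    Z = (area[:, None] == areas[None, :]).astype(float)  # area indicators
    derivs = [Z @ Z.T, np.eye(n)]        # dV/dsigma_v^2 and dV/dsigma_e^2
    delta = np.log(np.array([0.5, 0.5]) * np.var(y))     # starting values
    for _ in range(max_iter):
        s2 = np.exp(delta)
        V = s2[0] * derivs[0] + s2[1] * derivs[1]
        Vinv = np.linalg.inv(V)
        XtVinv = X.T @ Vinv
        # P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1
        P = Vinv - XtVinv.T @ np.linalg.solve(XtVinv @ X, XtVinv)
        Py = P @ y
        # REML score and expected information on the variance scale.
        score = np.array([-0.5 * np.trace(P @ D) + 0.5 * Py @ D @ Py
                          for D in derivs])
        info = np.array([[0.5 * np.trace(P @ Dj @ P @ Dk) for Dk in derivs]
                         for Dj in derivs])
        # Chain rule: d sigma^2 / d delta = sigma^2.
        step = np.linalg.solve(info * np.outer(s2, s2), score * s2)
        delta = delta + step
        if np.max(np.abs(step)) < tol:
            break
    return np.exp(delta)

# Hypothetical balanced example: 30 areas with 5 units each, intercept only.
rng = np.random.default_rng(1)
m, k = 30, 5
area = np.repeat(np.arange(m), k)
y = 1.0 + np.repeat(rng.normal(size=m), k) + rng.normal(size=m * k)
X = np.ones((m * k, 1))
sigma_v2_hat, sigma_e2_hat = reml_fisher_scoring(y, X, area)
```

Working on the log scale keeps both variance estimates strictly positive at every iteration, at the cost of being unable to return an exact zero for the area-level variance.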
References
Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An
error component model for prediction of county crop areas using survey and
satellite data. Journal of the American
Statistical Association, 83, 28-36.
Bell, W.R., Datta, G.S. and Ghosh, M. (2013). Benchmarking
small area estimators. Biometrika,
100, 189-202.
Datta, G.S., Ghosh, M., Steorts, R. and Maples, J.
(2011). Bayesian benchmarking with applications to small area estimation. Test, 20, 574-588.
Deville, J.-C., and Särndal, C.-E. (1992). Calibration
estimators in survey sampling. Journal of
the American Statistical Association, 87, 376-382.
Fay, R.E., and Herriot, R.A. (1979). Estimation of
income for small places: An application of James-Stein procedures to census
data. Journal of the American Statistical
Association, 74, 269-277.
Hidiroglou, M.A., and Estevao, V.M. (2016). A comparison
of small area and traditional estimators via simulation. Statistics in Transition, 17, 133-154.
Huang, R., and Hidiroglou, M.A. (2003). Design
consistent estimators for a mixed linear model on survey data. Proceedings of the Section on Survey
Research Methods, American Statistical Association, 1897-1904.
Moghtased-Azar, K., Tehranchi, R. and Amiri-Simkooei, A.R.
(2014). An alternative method for non-negative estimation of variance
components. Journal of Geodesy, 88,
427-439.
Nandram, B., and Sayit, H. (2011). A
Bayesian analysis of small area probabilities under a constraint. Survey Methodology, 37, 2, 137-152.
Paper available at
https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2011002/article/11603-eng.pdf.
Pfeffermann, D., and Barnard, C. (1991). Some new
estimators for small area means with applications to the assessment of farmland
values. Journal of Business and Economic
Statistics, 9, 73-83.
Prasad, N.G.N., and Rao, J.N.K. (1990). The estimation
of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85, 163-171.
Rao, J.N.K., and Molina, I. (2015). Small Area Estimation. New York: John Wiley & Sons, Inc.
Särndal, C.-E., Swensson, B. and Wretman, J.H. (1989).
The weighted residual technique for estimating the variance of the general
regression estimator of the finite population total. Biometrika, 76, 527-537.
Stefan, M., and Hidiroglou, M.A. (2020). Benchmarked
estimators for a small area mean under a one-fold nested regression model. International Statistical Review, (To appear).
Tillé, Y. (2006). Sampling
Algorithms. New York: Springer.
Ugarte, M.D., Militino, A.F. and Goicoa, T. (2009).
Benchmarked estimates in small areas using linear mixed models with
restrictions. Test, 18, 342-364.
Verret, F., Rao, J.N.K. and Hidiroglou, M.A. (2015). Model-based
small area estimation under informative sampling. Survey Methodology, 41, 2, 333-347.
Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2015002/article/14248-eng.pdf.
Wang, J., Fuller, W.A. and Qu, Y. (2008). Small
area estimation under a restriction. Survey Methodology, 34, 1, 29-36. Paper available at
https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2008001/article/10619-eng.pdf.
You, Y., and Rao, J.N.K. (2002). A pseudo-empirical best
linear unbiased prediction approach to small area estimation using survey
weights. The Canadian Journal of
Statistics, 30, 431-439.
You, Y., Rao, J.N.K. and Dick, P. (2004). Benchmarking
hierarchical Bayes small area estimators in the Canadian census undercoverage
estimation. Statistics in Transition,
6, 631-640.
You, Y., Rao, J.N.K. and Hidiroglou, M.A. (2013). On
the performance of self benchmarked small area estimators under the Fay-Herriot
area level model. Survey
Methodology, 39, 1, 217-229. Paper available at
https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2013001/article/11830-eng.pdf.