
6. Concluding remark

Jae-kwang Kim, Seunghwan Park and Seo-young Kim

In this paper, a small area estimation problem is treated as a measurement error model prediction problem where the covariates, which are the direct estimates for small areas, are subject to sampling errors. In our measurement error model approach, the sampling errors of the direct estimators are treated as measurement errors and the structural error model can be used to link the other auxiliary estimates to the direct estimators. The proposed model is actually the opposite of the model of Ybarra and Lohr (2008), where the direct estimator is treated as a dependent variable in the regression model and the nonsampling errors of auxiliary estimates are treated as measurement errors.

In our approach, each auxiliary estimate is treated as a dependent variable in a regression model that uses the direct estimate as the covariate, and the sampling error of the direct estimator is treated as measurement error. The measurement error variance is easy to estimate because it is essentially the sampling variance of the direct estimate. The measurement error model approach is also very useful when there are several sources of area-level auxiliary information. Unlike the Bayesian approach, the resulting estimator does not rely on parametric model assumptions about the structural error model, and it is still optimal in the sense of minimizing the mean squared error among the class of unbiased estimators that are linear in the available data.

In the Korean labor survey application, two sample estimates and the Census information are used to compute the GLS estimates of the small area parameters. The two sample estimates are correlated because of the two-phase sampling structure. We used linear regression models for the linking models, mainly for the sake of computational simplicity. Instead of the linear model, one may consider a generalized linear model to improve the prediction power of the model. Such an extension would involve the theory for nonlinear measurement error models and will be a topic of future research.

Acknowledgements

We thank an anonymous referee and the Associate Editor for their constructive comments. The research of the first author was partially supported by a grant from NSF (MMS-121339).

Appendix

Reversed two-phase sampling

In the classical two-phase sampling, the second-phase sample $(A_{2})$ is a subset of the first-phase sample $(A_{1}) .$ We consider another type of sampling design that has a reversed structure of the two-phase sampling design. In the reversed two-phase sampling design, we have the following sampling steps:

Step 1 From the finite population $U,$ we select the first-phase sample $A_{1}$ of size $n_{1}.$
Step 2 In the second phase, we select a sample $A_{2}$ of size $n_{2}$ from $U - A_{1}.$ The final sample $A$ consists of $A_{1}$ and $A_{2}.$ That is, $A = A_{1} \cup A_{2}$ and $| A | = n = n_{1} + n_{2}.$

The reversed two-phase sampling is used when the sample is augmented by an additional sampling procedure.
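The two steps above can be sketched in a few lines of Python. This is only an illustration under the assumption of simple random sampling without replacement in both phases; the function name `reversed_two_phase_srs` and the sample sizes are hypothetical and do not come from the paper.

```python
import random

def reversed_two_phase_srs(population_size, n1, n2, seed=None):
    """Draw A1 by SRS from U, then A2 by SRS from U - A1 (assumed design)."""
    if n1 + n2 > population_size:
        raise ValueError("n1 + n2 must not exceed the population size")
    rng = random.Random(seed)
    units = list(range(population_size))
    # Step 1: first-phase sample A1 of size n1 from the whole population U.
    a1 = set(rng.sample(units, n1))
    # Step 2: second-phase sample A2 of size n2 from the complement U - A1.
    remainder = [i for i in units if i not in a1]
    a2 = set(rng.sample(remainder, n2))
    # The final sample is the union A = A1 u A2, with |A| = n1 + n2.
    return a1, a2, a1 | a2

a1, a2, a = reversed_two_phase_srs(population_size=100, n1=10, n2=5, seed=1)
```

Because $A_{2}$ is drawn from the complement of $A_{1},$ the two samples never overlap, which is what distinguishes this design from classical two-phase sampling.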

To discuss parameter estimation under reversed two-phase sampling, let $π_{1 i} = Pr (i \in A_{1})$ be the first-order inclusion probability for $A_{1},$ and let $π_{2 i | 1} = Pr (i \in A_{2} | A_{1}^{c})$ be the conditional first-order inclusion probability for $A_{2}$ given $A_{1}^{c} = U - A_{1}.$ Because $A_{1}$ and $A_{2}$ are disjoint, the inclusion probability for $A$ is

$Pr (i \in A) = Pr (i \in A_{1}) + Pr (i \in A_{2} | A_{1}^{c}) Pr (i \in A_{1}^{c}) .$

Thus, we can use $π_{i} = π_{1 i} + (1 - π_{1 i}) π_{2 i | 1}$ to compute the Horvitz-Thompson estimator of the form

${\hat{Y}}_{r, HT} = \sum_{i \in A} \frac{1}{π_{i}} y_{i} . \qquad (A.1)$
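As a small numerical illustration of (A.1), the following Python sketch computes $π_{i}$ from the two phase-wise probabilities and then the Horvitz-Thompson total. The function names and the sample values are hypothetical; the probabilities are chosen, as an assumption, to mimic simple random sampling with $N = 20,$ $n_{1} = 3$ and $n_{2} = 2.$

```python
def inclusion_prob(pi1, pi2_given_1):
    """pi_i = pi_1i + (1 - pi_1i) * pi_2i|1 for a single unit."""
    return pi1 + (1.0 - pi1) * pi2_given_1

def ht_estimate(y, pi1, pi2_given_1):
    """Horvitz-Thompson total over the combined sample A, as in (A.1)."""
    return sum(
        yi / inclusion_prob(p1, p2)
        for yi, p1, p2 in zip(y, pi1, pi2_given_1)
    )

# Hypothetical combined sample of n = 5 units from a population of N = 20.
N, n1, n2 = 20, 3, 2
y_sample = [12.0, 15.0, 9.0, 20.0, 14.0]
pi1 = [n1 / N] * len(y_sample)                 # first-phase probabilities
pi2_given_1 = [n2 / (N - n1)] * len(y_sample)  # conditional second-phase probabilities

y_ht = ht_estimate(y_sample, pi1, pi2_given_1)
```

With these equal probabilities each $π_{i}$ equals $n / N,$ so the estimate coincides with $N$ times the sample mean.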

Note that, instead of (A.1), we can consider the following class of estimators:

${\hat{Y}}_{w} = W \sum_{i \in A_{1}} \frac{1}{π_{1 i}} y_{i} + (1 - W) \sum_{i \in A_{2}} \frac{1}{π_{2 i | 1} (1 - π_{1 i})} y_{i} := W {\hat{Y}}_{1} + (1 - W) {\hat{Y}}_{2} . \qquad (A.2)$

Since ${\hat{Y}}_{1}$ and ${\hat{Y}}_{2}$ are both unbiased for $Y,$ ${\hat{Y}}_{w}$ is also unbiased regardless of the choice of $W .$ A reasonable choice of $W$ is $W = n_{1} / n .$
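The unbiasedness claim can be checked exactly on a toy population by enumerating every possible pair $(A_{1}, A_{2}).$ The sketch below assumes simple random sampling in both phases, so that all pairs are equally likely; the population values and the function name `expected_y_w` are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

def expected_y_w(y, n1, n2, w):
    """Exact expectation of Y_w in (A.2) under SRS in both phases."""
    N = len(y)
    pi1 = Fraction(n1, N)        # first-phase inclusion probability
    pi2 = Fraction(n2, N - n1)   # conditional second-phase probability
    total, count = Fraction(0), 0
    for a1 in combinations(range(N), n1):
        rest = [i for i in range(N) if i not in a1]
        y1_hat = sum(Fraction(y[i]) for i in a1) / pi1
        for a2 in combinations(rest, n2):
            y2_hat = sum(Fraction(y[i]) for i in a2) / (pi2 * (1 - pi1))
            total += w * y1_hat + (1 - w) * y2_hat
            count += 1
    # Every (A1, A2) pair has the same probability under this design.
    return total / count

y_pop = [3, 1, 4, 1, 5]  # hypothetical population, total Y = 14
weights = [Fraction(0), Fraction(1, 4), Fraction(2, 3), Fraction(1)]
expectations = [expected_y_w(y_pop, n1=2, n2=1, w=w) for w in weights]
```

Exact rational arithmetic is used so that the comparison with the population total does not depend on floating-point rounding.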

Under simple random sampling in both phases, (A.1) and (A.2) with $W = n_{1} / n$ both reduce to $\hat{Y} = N {\bar{y}}_{n},$ where ${\bar{y}}_{n}$ is the sample mean of $y$ in $A.$ Writing ${\bar{y}}_{1} = \sum_{i \in A_{1}} y_{i} / n_{1}$ and ${\bar{y}}_{2} = \sum_{i \in A_{2}} y_{i} / n_{2},$ we have

${\bar{y}}_{n} = W {\bar{y}}_{1} + (1 - W) {\bar{y}}_{2} \qquad (A.3)$

where $W = n_{1} / n .$ Using

$\begin{array}{lcl} V ({\bar{y}}_{1}) & = & \left(\frac{1}{n_{1}} - \frac{1}{N}\right) S_{y}^{2}, \qquad (A.4) \\ V ({\bar{y}}_{2}) & = & \left(\frac{1}{n_{2}} - \frac{1}{N}\right) S_{y}^{2}, \\ Cov ({\bar{y}}_{1}, {\bar{y}}_{2}) & = & Cov ({\bar{y}}_{1}, {\bar{y}}_{1}^{c}) = - \frac{n_{1}}{N - n_{1}} \left(\frac{1}{n_{1}} - \frac{1}{N}\right) S_{y}^{2} = - \frac{1}{N} S_{y}^{2}, \end{array}$

where ${\bar{y}}_{1}^{c} = \sum_{i \in A_{1}^{c}} y_{i} / (N - n_{1}),$ we have, for $W = n_{1} / n,$

$V ({\bar{y}}_{n}) = \left(\frac{1}{n} - \frac{1}{N}\right) S_{y}^{2} . \qquad (A.5)$
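For completeness, the intermediate algebra behind this variance can be written out as follows. It uses only the variances and covariance of ${\bar{y}}_{1}$ and ${\bar{y}}_{2}$ given above, together with $W = n_{1} / n$ and $n = n_{1} + n_{2}.$

```latex
\begin{align*}
V(\bar{y}_{n})
  &= W^{2} V(\bar{y}_{1}) + (1 - W)^{2} V(\bar{y}_{2})
     + 2 W (1 - W)\, Cov(\bar{y}_{1}, \bar{y}_{2}) \\
  &= \left[ \frac{n_{1}^{2}}{n^{2}} \left( \frac{1}{n_{1}} - \frac{1}{N} \right)
     + \frac{n_{2}^{2}}{n^{2}} \left( \frac{1}{n_{2}} - \frac{1}{N} \right)
     - \frac{2 n_{1} n_{2}}{n^{2} N} \right] S_{y}^{2} \\
  &= \left[ \frac{n_{1} + n_{2}}{n^{2}}
     - \frac{(n_{1} + n_{2})^{2}}{n^{2} N} \right] S_{y}^{2}
   = \left( \frac{1}{n} - \frac{1}{N} \right) S_{y}^{2}.
\end{align*}
```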

Also,

$Cov ({\bar{y}}_{1}, {\bar{y}}_{n}) = Cov [{\bar{y}}_{1}, W {\bar{y}}_{1} + (1 - W) {\bar{y}}_{2}] = \left(\frac{1}{n} - \frac{1}{N}\right) S_{y}^{2} . \qquad (A.6)$
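The covariance follows by the same kind of substitution. The steps below again take $W = n_{1} / n$ and $n = n_{1} + n_{2}.$

```latex
\begin{align*}
Cov(\bar{y}_{1}, \bar{y}_{n})
  &= W\, V(\bar{y}_{1}) + (1 - W)\, Cov(\bar{y}_{1}, \bar{y}_{2}) \\
  &= \left[ \frac{n_{1}}{n} \left( \frac{1}{n_{1}} - \frac{1}{N} \right)
     - \frac{n_{2}}{n} \cdot \frac{1}{N} \right] S_{y}^{2} \\
  &= \left[ \frac{1}{n} - \frac{n_{1} + n_{2}}{n N} \right] S_{y}^{2}
   = \left( \frac{1}{n} - \frac{1}{N} \right) S_{y}^{2}.
\end{align*}
```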

If $W = n_{1} / n$ does not hold, then (A.5) and (A.6) do not hold.
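Both identities can also be confirmed numerically by brute force. The sketch below enumerates every $(A_{1}, A_{2})$ pair for a small made-up population under simple random sampling in both phases and compares the exact moments with $(1 / n - 1 / N) S_{y}^{2}.$ The population values and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def exact_moments(y, n1, n2):
    """Return (V(ybar_n), Cov(ybar_1, ybar_n)) by enumerating all samples."""
    N, n = len(y), n1 + n2
    draws = []
    for a1 in combinations(range(N), n1):
        rest = [i for i in range(N) if i not in a1]
        ybar1 = Fraction(sum(y[i] for i in a1), n1)
        for a2 in combinations(rest, n2):
            # ybar_n is the mean over the combined sample A = A1 u A2.
            ybar_n = Fraction(sum(y[i] for i in a1 + a2), n)
            draws.append((ybar1, ybar_n))
    m = len(draws)  # all draws are equally likely under this design
    mean1 = sum(d[0] for d in draws) / m
    mean_n = sum(d[1] for d in draws) / m
    var_n = sum((d[1] - mean_n) ** 2 for d in draws) / m
    cov_1n = sum((d[0] - mean1) * (d[1] - mean_n) for d in draws) / m
    return var_n, cov_1n

def srs_formula(y, n):
    """(1/n - 1/N) * S_y^2, with S_y^2 using the N - 1 divisor."""
    N = len(y)
    ybar = Fraction(sum(y), N)
    s2 = sum((Fraction(v) - ybar) ** 2 for v in y) / (N - 1)
    return (Fraction(1, n) - Fraction(1, N)) * s2

y_pop = [2, 7, 1, 8, 2, 8]  # hypothetical population of N = 6
var_n, cov_1n = exact_moments(y_pop, n1=2, n2=2)
target = srs_formula(y_pop, n=4)
```

Here `var_n` and `cov_1n` are computed directly from the sampling distribution, so agreement with `target` is an independent check of the algebra rather than a restatement of it.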

In the KLF application in Section 5, since $x$ and $y$ are measuring the same item, we may assume $S_{x}^{2} = S_{y}^{2} = S_{x y}$ and the variance-covariance matrix of the sampling errors can be smoothed as

$V (a_{h}, b_{h}) = \begin{pmatrix} n_{1}^{- 1} & n^{- 1} \\ n^{- 1} & n^{- 1} \end{pmatrix} S_{y}^{2} .$
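A minimal Python sketch of this smoothed matrix is given below. It assumes, as our reading of the matrix rather than something stated in this appendix, that $a_{h}$ is based on the $n_{1}$ first-phase units and $b_{h}$ on all $n$ units, and that the $1 / N$ terms in the variance and covariance formulas are dropped. The function name and the numerical inputs are hypothetical.

```python
def smoothed_cov(n1, n, s2):
    """2 x 2 smoothed covariance matrix of (a_h, b_h), without 1/N terms."""
    if not 0 < n1 <= n:
        raise ValueError("need 0 < n1 <= n")
    return [
        [s2 / n1, s2 / n],  # V(a_h) and Cov(a_h, b_h)
        [s2 / n, s2 / n],   # Cov(a_h, b_h) and V(b_h)
    ]

# Hypothetical area with n1 = 40 first-phase units, n = 100 units in total.
cov = smoothed_cov(n1=40, n=100, s2=0.25)
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
```

Since $n_{1} \le n,$ the determinant $S_{y}^{4} n^{- 1} (n_{1}^{- 1} - n^{- 1})$ is nonnegative, so the matrix is a valid covariance matrix.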

References

Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83, 28-36.

Carroll, R.J., Ruppert, D. and Stefanski, L.A. (1995). Measurement error in nonlinear models. New York: Chapman & Hall.

Fay, R.E., and Herriot, R.A. (1979). Estimation of income from small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74, 269-277.

Fuller, W.A. (1987). Measurement error models. New York: John Wiley & Sons, Inc.

Fuller, W.A. (1991). Small area estimation as a measurement error problem. In Economic Models, Estimation, and Socioeconomic Systems: Essays in Honor of Karl A. Fox, (Eds., Tej K. Kaul and Jati K. Sengupta), Elsevier Science Publishers, 333-352.

Fuller, W.A. (2009). Sampling Statistics. John Wiley & Sons, Inc., Hoboken, NJ.

Jiang, J., Lahiri, P. and Wan, S. (2002). A unified jackknife theory for empirical best prediction with M-estimation. Annals of Statistics, 30, 1782-1810.

Kackar, R.N., and Harville, D.A. (1984). Approximations for standard errors of estimators of fixed and random effects in mixed linear models. Journal of the American Statistical Association, 79, 853-862.

Kim, J.K., and Rao, J.N.K. (2012). Combining data from two independent surveys: A model-assisted approach. Biometrika, 99, 85-100.

Lohr, S.L., and Prasad, N.G.N. (2003). Small area estimation with auxiliary survey data. The Canadian Journal of Statistics, 31, 383-396.

Manzi, G., Spiegelhalter, D.J., Turner, R.M., Flowers, J. and Thompson, S.G. (2011). Modelling bias in combining small area prevalence estimates from multiple surveys. Journal of the Royal Statistical Society A, 174, 31-50.

Merkouris, T. (2010). Combining information from multiple surveys by using regression for efficient small domain estimation. Journal of the Royal Statistical Society B, 68, 509-521.

Pfeffermann, D. (2002). Small area estimation - New developments and directions. International Statistical Review, 70, 125-144.

Quenouille, M.H. (1956). Notes on bias in estimation. Biometrika, 43, 353-360.

Raghunathan, T.E., Xie, D., Schenker, N., Parsons, V.I., Davis, W.W., Dodd, K.W. and Feuer, E.J. (2007). Combining information from two surveys to estimate county-level prevalence rates of cancer risk factors and screening. Journal of the American Statistical Association, 102, 474-486.

Rao, J.N.K. (2003). Small Area Estimation. John Wiley & Sons, Inc., Hoboken, NJ.

Schafer, D.W. (2001). Semiparametric maximum likelihood for measurement error model regression. Biometrics, 57, 53-61.

Ybarra, L.M.R., and Lohr, S.L. (2008). Small area estimation when auxiliary information is measured with error. Biometrika, 95, 919-931.

Date modified: 2015-11-27


Survey Methodology