Sample empirical likelihood approach under complex survey design with scrambled responses
Section 6. Conclusions
In this paper, we proposed a sample empirical
likelihood (SEL)-based approach using scrambled responses to protect the
confidentiality of complex survey data. The proposed SEL approach is easy to
implement in practice and can be used as a general tool for statistical
disclosure control. The idea of our proposed approach is to replace the true
values with scrambled values generated through a random device; the existing
sample empirical likelihood approach can then be applied to the scrambled values
to obtain the point estimator. However, variance estimation and confidence
interval estimation differ from those obtained by treating the scrambled values
as true values, since the randomness due to the random device must be
incorporated into the statistical inference. These theoretical properties have
been investigated and verified through a simulation study and a real data
application. The SEL approach outperforms traditional approaches, such as HJ, by
improving coverage rates and reducing the lengths of confidence intervals. Chen
and Kim (2014) compared Wald-type and Wilks-type confidence intervals in
simulation studies using the sample empirical likelihood method. In general, the
Wilks-type confidence intervals showed better coverage rates than the Wald-type
confidence intervals. We would expect similar results for our proposed
approach. In future research, we will extend the proposed approach to
estimate more general parameters, such as population quantiles and distribution
functions. Corresponding computational tools, such as an R package, will also
be developed.
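To make the role of the random device concrete, the following minimal sketch illustrates why scrambled values cannot simply be treated as true values in variance estimation. It is not the SEL estimator of this paper. It assumes a multiplicative scrambling device in the spirit of Eichhorn and Hayre (1983), with a scrambling variable drawn from a known Uniform(0.5, 1.5) distribution, and it uses a simple weighted mean as a stand-in point estimator. The data, the design weights, and the names scramble, weighted_mean and weighted_var are all hypothetical.

```python
import random

S_LOW, S_HIGH = 0.5, 1.5          # assumed known range of the scrambling variable
S_MEAN = (S_LOW + S_HIGH) / 2.0   # known mean of the scrambling variable


def scramble(y, rng):
    """Return a scrambled response y * s / E(s), so that E(z | y) = y."""
    s = rng.uniform(S_LOW, S_HIGH)   # draw from the random device
    return y * s / S_MEAN


def weighted_mean(values, weights):
    """Weighted (ratio-type) mean using the design weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_var(values, weights):
    """Weighted variance of the values around their weighted mean."""
    m = weighted_mean(values, weights)
    return weighted_mean([(v - m) ** 2 for v in values], weights)


rng = random.Random(2024)
n = 20000
y = [rng.gammavariate(2.0, 10.0) for _ in range(n)]   # hypothetical sensitive variable
w = [rng.uniform(1.0, 3.0) for _ in range(n)]         # hypothetical design weights
z = [scramble(yi, rng) for yi in y]                   # what the analyst observes

est_true = weighted_mean(y, w)        # unavailable in practice
est_scrambled = weighted_mean(z, w)   # point estimate from scrambled values
var_true = weighted_var(y, w)
var_scrambled = weighted_var(z, w)    # inflated by the random device

print("point estimates:", est_true, est_scrambled)
print("variances:", var_true, var_scrambled)
```

In this sketch the point estimates based on the true and the scrambled values are close, because the scrambled value is unbiased for the true value under the random device. The spread of the scrambled values is larger, however, which is the extra component of variability that inference based on scrambled responses has to account for.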
Acknowledgements
Dr. Sixia Chen was
partially supported by the Oklahoma Shared Clinical and Translational Resources
(U54GM104938) with an Institutional Development Award (IDeA) from the National
Institute of General Medical Sciences. The content is solely the responsibility
of the authors and does not necessarily represent the official views of the
National Institutes of Health. The
research of Yichuan Zhao was supported by the National Security Agency (NSA)
Grant (H98230-12-1-0209) and the National Science Foundation Grant
(DMS-1613176).
Appendix
A. Regularity conditions
We present the regularity conditions needed for proving Theorems 1 to 3
as follows:
(C1).
for
with
(C2).
as
and
where
(C3).
as
and
where
(C4).
and
are bounded.
(C5).
and
B. Sketched proof of Theorem 1
can be written as the solution of estimating
equation
where
Under the assumptions that
converges to
uniformly,
and because of
it can be shown that
By using a Taylor expansion,
After some algebra, it can be shown that
Because
According to (B.1), (B.2), and after some algebra, we can show that
where
is defined in equation (2.4). Under the
regularity conditions in Fuller and Isaki (1981), the asymptotic normality can
be derived.
C. Sketched proof of Theorem 2
Define
and
Then,
and
are the solutions of
By using techniques similar to those of Chen
and Kim (2014), it can be shown that
and
Then, by using a Taylor expansion, we have
and
According to (C.1), (C.2), and after some algebra, it can be shown that
and
where
is defined in Theorem 2. Because
where
is defined in Theorem 1,
is defined in Theorem 2 and
After some algebra, we can show that
with
defined in Theorem 2. Furthermore, under
the regularity conditions in Fuller and Isaki (1981), we obtain the asymptotic
normality.
D. Sketched proof of Theorem 3
Because
and by using a
Taylor expansion of
at
and (C.3), we
have
with
We now consider maximizing
subject to the following constraints
and
where
The above constraints are equivalent to the original
constraints (3.2) and (3.3). Define
. Therefore, by using a
similar argument, we have
provided
According to (D.1), (D.4), and after some algebra, we have
Therefore, Theorem 3 is proven.
References
Bar-Lev, S.K., Bobovitch,
E. and Boukai, B. (2004). A note on randomized response models for quantitative
data. Metrika, 60, 255-260.
Berger, Y.G. (2018a).
Empirical likelihood approaches in survey sampling. The Survey Statistician, 78, 22-31.
Berger, Y.G. (2018b). An
empirical likelihood approach under cluster sampling with missing observations. Annals of the Institute of Statistical Mathematics,
doi:10.1007/s10463-018-0681-x.
Berger, Y.G., and Torres,
O.D.L.R. (2016). An empirical likelihood approach for inference under complex
sampling design. Journal of the Royal Statistical Society, Series B, 78(2), 319-341.
Chen, S., and Kim, J.K.
(2014). Population empirical likelihood for nonparametric inference in survey
sampling. Statistica Sinica, 24,
335-355.
Cochran, W.G. (1977). Sampling
Techniques, 3rd Ed. New York: John Wiley & Sons, Inc.
Eichhorn, B.H., and
Hayre, L.S. (1983). Scrambled randomized response methods for obtaining
sensitive quantitative data. Journal of Statistical Planning and Inference, 7, 307-316.
Fienberg, S.E., and
McIntyre, J. (2005). Data swapping: Variations on a theme by Dalenius and
Reiss. Journal of Official Statistics, 21, 309-323.
Fox, J.A., and Tracy,
P.E. (1986). Randomized Response: A Method for Sensitive Surveys.
Beverly Hills, CA: Sage.
Fuller, W.A. (2009). Sampling
Statistics. Hoboken, NJ: John Wiley & Sons, Inc.
Fuller, W.A., and Isaki,
C.T. (1981). Survey design under superpopulation models. In Current Topics
in Survey Sampling, (Eds., D. Krewski, J.N.K. Rao, and
R. Platek). New York: Academic Press, 199-226.
Gouweleeuw, J.M.,
Kooiman, P., Willenborg, L.C.R. and Wolf, P. (1998). Post randomization for
statistical disclosure control: Theory and implementation. Journal of
Official Statistics, 14, 463-478.
Hájek, J. (1971). Comment
on “An essay on the logical foundations
of survey sampling, Part one”. In The Foundations of Survey Sampling, (Eds.,
V.P. Godambe and D.A. Sprott), Holt, Rinehart, and Winston, 236.
Hartley, H.O., and Rao,
J.N.K. (1968). A new estimation theory for sample surveys. Biometrika, 55, 547-557.
Horvitz, D.G., Shah, B.V.
and Simmons, W.R. (1967). The unrelated question randomized response model. In Proceedings
of the Social Statistics Section, American
Statistical Association, 65-72.
Hundepool, A.,
Domingo-Ferrer, J., Franconi, L., Giessing, S., Nordholt, E., Spicer, K. and
Wolf, P. (2012). Statistical Disclosure Control. Wiley Series in Survey
Methodology.
Kim, J.K., and Yang, S.
(2017). A note on multiple imputation under informative sampling. Biometrika, 104, 221-228.
Kish, L. (1965). Survey
Sampling. New York: John Wiley & Sons, Inc.
Krenzke, T., Li, J.,
Freedman, M., Judkins, D., Hubble, D., Roisman, R. and Larsen, M. (2011).
Producing transportation data products from the American Community Survey that
comply with disclosure rules. Washington, DC: National Cooperative Highway
Research Program, Transportation Research Board, National Academy of Sciences.
Meng, X.L. (1994).
Multiple-imputation inferences with uncongenial sources of input (with
discussion). Statistical Science,
9, 538-573.
Montanari, G.E. (1987).
Post-sampling efficient Q-R prediction in large-sample surveys. International
Statistical Review, 55, 191-202.
Owen, A.B. (1988).
Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237-249.
Owen, A.B. (2001). Empirical
Likelihood. New York: Chapman and Hall.
Qin, J., and Lawless, J.
(1994). Empirical likelihood and general estimating equations. The Annals of
Statistics, 22, 300-325.
Raghunathan, T.E.,
Reiter, J.P. and Rubin, D.B. (2003). Multiple imputation for statistical
disclosure limitation. Journal of Official Statistics, 19, 1-16.
Raghunathan, T.E.,
Lepkowski, J.M., van Hoewyk, J., and Solenberger, P. (2001). A
multivariate technique for multiply imputing missing values using a sequence of
regression models. Survey Methodology, 27, 1, 85-95. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2001001/article/5857-eng.pdf.
Rao, J.N.K. (1994).
Estimating totals and distribution functions using auxiliary information at the
estimation stage. Journal of Official Statistics, 10, 153-165.
Saha, A. (2011). An
optional scrambled randomized response technique for practical surveys. Metrika, 73, 139-149.
Singh, S., and Kim, J.M.
(2011). A pseudo-empirical log-likelihood estimator using scrambled responses. Statistics
and Probability Letters, 81,
345-351.
Tracy, D.S., and Mangat,
N.S. (1996). Some developments in randomized response sampling during the last
decade - a follow up of review by Chaudhuri and Mukerjee. Journal of Applied
Statistical Science, 4, 147-158.
Warner, S.L. (1965).
Randomized response: A survey technique for eliminating evasive answer bias. Journal
of the American Statistical Association, 60, 63-69.
Wu, C., and Rao, J.N.K.
(2006). Pseudo-empirical likelihood ratio confidence intervals for complex
surveys. Canadian Journal of Statistics, 34, 359-375.