Linearization versus bootstrap for variance estimation of the change between Gini indexes
Section 5. Conclusion
In this paper, we considered the estimation of the change between Gini indexes. We presented the class of composite estimators introduced by Goga, Deville and Ruiz-Gazen (2009), and studied in particular the intersection estimator, which uses the common sample only, and the union estimator, which uses all the available samples. We justified, both heuristically and through the simulation study in Section 4.2, that the intersection estimator can be close to the optimal estimator, while the union estimator performs poorly in all the scenarios considered. The intersection estimator is also easy to compute, whereas the optimal estimator involves unknown quantities that need to be estimated in practice. We therefore advocate the use of the intersection estimator for estimating the change between Gini indexes.
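To make the idea of an estimator based on the common sample concrete, the following is a minimal sketch rather than the implementation used in the paper. It assumes a plug-in weighted Gini index written in its pairwise-difference form, and a single set of weights supplied by the user for the units observed at both times. The function names, the membership indicators in_s1 and in_s2, and the choice of weights are all illustrative assumptions.

```python
from typing import Sequence


def weighted_gini(y: Sequence[float], w: Sequence[float]) -> float:
    """Plug-in weighted Gini index in pairwise form:
    sum_i sum_j w_i w_j |y_i - y_j| / (2 * N_hat * Y_hat).
    The double sum is O(n^2), which is fine for a small illustration."""
    n_hat = sum(w)                                # estimated population size
    y_hat = sum(wi * yi for wi, yi in zip(w, y))  # estimated total of y
    pairwise = sum(
        wi * wj * abs(yi - yj)
        for wi, yi in zip(w, y)
        for wj, yj in zip(w, y)
    )
    return pairwise / (2.0 * n_hat * y_hat)


def gini_change_common_sample(
    y1: Sequence[float],
    y2: Sequence[float],
    w: Sequence[float],
    in_s1: Sequence[bool],
    in_s2: Sequence[bool],
) -> float:
    """Change in the Gini index (time 2 minus time 1), computed only on
    units that belong to both samples. The weights w are an assumption:
    they stand for whatever weights suit the common sample."""
    common = [k for k in range(len(w)) if in_s1[k] and in_s2[k]]
    w_common = [w[k] for k in common]
    gini_1 = weighted_gini([y1[k] for k in common], w_common)
    gini_2 = weighted_gini([y2[k] for k in common], w_common)
    return gini_2 - gini_1


# Toy data (hypothetical): units 0 and 1 are observed at both times.
y_time1 = [1.0, 1.0, 1.0, 5.0]
y_time2 = [0.0, 1.0, 2.0, 9.0]
weights = [1.0, 1.0, 1.0, 1.0]
sample1 = [True, True, False, True]
sample2 = [True, True, True, False]
change = gini_change_common_sample(y_time1, y_time2, weights, sample1, sample2)
```

In this toy case the common sample has equal incomes at time 1 (Gini index 0) and incomes 0 and 1 at time 2 (Gini index 0.5), so the estimated change is 0.5. An estimator using all the available units would instead evaluate each index on its own full sample.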
We also compared linearization and bootstrap for variance estimation and for producing confidence intervals. In the scenarios considered in the simulation study, linearization performed better: the variance estimator usually had a smaller relative bias, and normality-based confidence intervals had better coverage rates than percentile confidence intervals. Bootstrap confidence intervals (not considered in the simulation study) would be a competitor of interest, but the intensive computational work they involve makes them less attractive for a data user. Linearization also has the advantage of offering a unified approach suitable for any sampling design, whereas a specific sampling design usually requires a specific bootstrap procedure, as illustrated with the BWO for SI sampling and the BWR for multistage sampling.
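The two kinds of confidence intervals compared above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the simulation study. The normality-based interval takes a point estimate and a variance estimate (for instance from linearization). The percentile interval takes a list of bootstrap replicates produced by whichever bootstrap procedure suits the design. The function names and the linear-interpolation quantile rule are assumptions; other quantile conventions are also in common use.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def normal_ci(estimate: float, variance: float, level: float = 0.95) -> Tuple[float, float]:
    """Normality-based interval: estimate +/- z * sqrt(variance estimate)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


def _quantile(ordered: List[float], p: float) -> float:
    """Empirical quantile with linear interpolation between order statistics
    (one convention among several; assumed here for simplicity)."""
    position = p * (len(ordered) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def percentile_ci(replicates: Sequence[float], level: float = 0.95) -> Tuple[float, float]:
    """Percentile interval: lower and upper empirical quantiles of the
    bootstrap replicates of the estimator."""
    ordered = sorted(replicates)
    alpha = 1.0 - level
    return _quantile(ordered, alpha / 2.0), _quantile(ordered, 1.0 - alpha / 2.0)
```

The normality-based interval needs only one variance estimate, while the percentile interval needs the full set of replicates. This is one reason why the computational cost of the two approaches differs.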
From the simulation study, we note that the nominal coverage rates may not be well respected with either linearization or bootstrap, particularly in the multistage context and even with large sample sizes. There is a need for confidence intervals with better coverage rates under a reasonable computational burden. This is a matter for further research.
Acknowledgements
We thank Anne Ruiz-Gazen for helpful discussion. We also thank two referees and an Associate Editor for useful comments and suggestions, which led to a significant improvement of the paper.
Appendix
Proof of equation (3.6)
From (3.3), we have
where
and
This leads to
We compute the elements in
separately. We have
Also, since
we have
and
Similar arguments lead to
Finally, we consider
We first compute
which may be written as
Similar arguments lead to
We obtain
In summary, we obtain
which, along with (A.1), leads to (3.6).
References
Antal, E., and Tillé, Y. (2011). A direct bootstrap method for complex sampling designs from a finite population. Journal of the American Statistical Association, 106, 534-543.
Barrett, G.F., and Donald, S.G. (2009). Statistical inference with generalized Gini indices of inequality, poverty, and welfare. Journal of Business and Economic Statistics, 27, 1-17.
Beaumont, J.-F., and Patak, Z. (2012). On the generalized bootstrap for sample surveys with special attention to Poisson sampling. International Statistical Review, 80, 127-148.
Berger, Y.G. (2004). Variance estimation for measures of change in probability sampling. Canadian Journal of Statistics, 32, 451-467.
Berger, Y.G. (2008). A note on the asymptotic equivalence of jackknife and linearization variance estimation for the Gini coefficient. Journal of Official Statistics, 24, 541-555.
Bertail, P., and Combris, P. (1997). Bootstrap généralisé d’un sondage. Annales d’Économie et de Statistique, 46, 49-83.
Bhattacharya, D. (2007). Inference on inequality from household survey data. Journal of Econometrics, 137, 674-707.
Bickel, P.J., and Freedman, D.A. (1984). Asymptotic normality and the bootstrap in stratified sampling. The Annals of Statistics, 12, 470-482.
Booth, J.G., Butler, R.W. and Hall, P. (1994). Bootstrap methods for finite populations. Journal of the American Statistical Association, 89, 1282-1289.
Brändén, P., and Jonasson, J. (2012). Negative dependence in sampling. Scandinavian Journal of Statistics, 39, 830-838.
Campbell, C. (1980). A different view of finite population estimation. Proceedings of the Survey Research Methods Section, American Statistical Association, 319-324.
Chao, M.-T., and Lo, S.-H. (1985). A Bootstrap method for finite population. Sankhyā, Series A, 47, 3, 399-405.
Chauvet, G. (2007). Méthodes de Bootstrap en population finie. Ph.D. dissertation, Université Rennes 2.
Chauvet, G. (2015). Coupling methods for multistage sampling. The Annals of Statistics, 43, 6, 2484-2506.
Chen, J., and Rao, J.N.K. (2007). Asymptotic normality under two-phase sampling designs. Statistica Sinica, 17, 1047-1064.
Davison, A.C., and Hinkley, D.V. (1997). Bootstrap Methods and their Application. Cambridge University Press.
Davison, A.C., and Sardy, S. (2007). Resampling variance estimation in surveys with missing data. Journal of Official Statistics, 23, 3, 371-386.
Demnati, A., and Rao, J.N.K. (2004). Linearization variance estimators for survey data. Survey Methodology, 30, 1, 17-26. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2004001/article/6991-eng.pdf.
Deville, J.-C. (1997). Estimation de la variance du coefficient de Gini mesurée par sondage. Actes des Journées de Méthodologie Statistique, Insee Méthodes.
Deville, J.-C. (1999). Variance estimation for complex statistics and estimators: Linearization and residual techniques. Survey Methodology, 25, 2, 193-203. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/1999002/article/4882-eng.pdf.
Druckman, A., and Jackson, T. (2008). Measuring resource inequalities: The concepts and methodology for an area-based Gini coefficient. Ecological Economics, 65, 242-252.
Gini, C. (1914). Sulla misura della concentrazione e della variabilità dei caratteri. Atti del Reale Istituto Veneto di Scienze Lettere ed Arti.
Glasser, G.J. (1962). Variance formulas for the mean difference and coefficient of concentration. Journal of the American Statistical Association, 57, 648-654.
Goga, C. (2003). Estimation de la variance dans les sondages à plusieurs échantillons et prise en compte de l’information auxiliaire par des modèles nonparamétriques. Ph.D. dissertation, Université Rennes 2.
Goga, C., and Ruiz-Gazen, A. (2014). Efficient estimation of nonlinear finite population parameters using nonparametrics. Journal of the Royal Statistical Society B, 76, 113-140.
Goga, C., Deville, J.-C. and Ruiz-Gazen, A. (2009). Composite estimation and linearization method for two-sample survey data. Biometrika, 96, 691-709.
Gordon, L. (1983). Successive sampling in large finite populations. Annals of Statistics, 11, 702-706.
Graczyk, P.P. (2007). Gini coefficient: A new way to express selectivity of kinase inhibitors against a family of kinases. Journal of Medicinal Chemistry, 50, 5773-5779.
Gross, S.T. (1980). Median estimation in sample surveys. ASA Proceedings of Survey Research, 181-184.
Groves-Kirkby, C.J., Denman, A.R. and Phillips, P.S. (2009). Lorenz Curve and Gini coefficient: Novel tools for analysing seasonal variation of environmental radon gas. Journal of Environmental Management, 90, 2480-2487.
Hájek, J. (1960). Limiting distributions in simple random sampling from a finite population. Tud. Akad. Mat. Kutató Int. Közl., 5, 361-374.
Hájek, J. (1961). Some extensions of the Wald-Wolfowitz-Noether theorem. Annals of Mathematical Statistics, 32, 506-523.
Hájek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35, 1491-1523.
Isaki, C.T., and Fuller, W.A. (1982). Survey design under the regression superpopulation model. Journal of the American Statistical Association, 77, 89-96.
Karagiannis, E., and Kovačević, M.S. (2000). A method to calculate the jackknife variance estimator for the Gini coefficient. Oxford Bulletin of Economics and Statistics, 62, 119-122.
Kovačević, M.S., and Binder, D.A. (1997). Variance estimation for measures of income inequality and polarization - The estimating equation approach. Journal of Official Statistics, 13, 41-58.
Krewski, D., and Rao, J.N.K. (1981). Inference from stratified samples: Properties of the linearization, jackknife and balanced repeated replication methods. Annals of Statistics, 9, 1010-1019.
Lai, D., Huang, J., Risser, J.M. and Kapadia, A.S. (2008). Statistical properties of generalized Gini coefficient with application to health inequality measurement. Social Indicators Research, 87, 249-258.
Langel, M., and Tillé, Y. (2013). Variance estimation of the Gini index: Revisiting a result several times published. Journal of the Royal Statistical Society, Series A, 176, 521-540.
Lisker, T. (2008). Is the Gini coefficient a stable measure on galaxy structure? The Astrophysical Journal Supplement Series, 179, 319-325.
Navarro, V., Muntaner, C., Borrell, C., Benach, J., Quiroga, A., Rodríguez-Sanz, M., Vergès, N. and Pasarín, M.I. (2006). Politics and health outcomes. The Lancet, 18, 1033-1037.
Nygård, F., and Sandström, A. (1985). The estimation of the Gini and the entropy inequality parameters in finite populations. Journal of Official Statistics, 1, 4, 399-412.
Ohlsson, E. (1986). Asymptotic normality of the Rao-Hartley-Cochran estimator: An application of the martingale CLT. Scandinavian Journal of Statistics, 13, 17-28.
Ohlsson, E. (1989). Asymptotic normality for two-stage sampling from a finite population. Probability Theory and Related Fields, 81, 341-352.
Pires, A.M., and Branco, J.A. (2002). Partial influence functions. Journal of Multivariate Analysis, 83, 451-468.
Presnell, B., and Booth, J.G. (1994). Resampling Methods for Sample Surveys. Technical report.
Qin, Y., Rao, J.N.K. and Wu, C. (2010). Empirical likelihood confidence intervals for the Gini measure of income inequality. Economic Modelling, 27, 1429-1435.
Qualité, L., and Tillé, Y. (2008). Variance estimation of changes in repeated surveys and its application to the Swiss survey of value added. Survey Methodology, 34, 2, 173-181. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2008002/article/10758-eng.pdf.
Rao, J.N.K., and Wu, C.F.J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 231-241.
Reid, N. (1981). Influence functions for censored data. The Annals of Statistics, 9, 78-92.
Rosén, B. (1972). Asymptotic theory for successive sampling with varying probabilities without replacement. I, II. Annals of Mathematical Statistics, 43, 373-397, 748-776.
Saegusa, T., and Wellner, J.A. (2013). Weighted likelihood estimation under two-phase sampling. The Annals of Statistics, 41, 269-295.
Sandström, A., Wretman, J.H. and Waldén, B. (1985). Variance estimators of the Gini coefficient - Simple random sampling. Metron, 43, 41-70.
Sandström, A., Wretman, J.H. and Waldén, B. (1988). Variance estimators of the Gini coefficient - Probability sampling. Journal of Business and Economic Statistics, 6, 113-119.
Särndal, C.-E., Swensson, B. and Wretman, J.H. (1992). Model Assisted Survey Sampling. Springer-Verlag.
Sen, P.K. (1980). Limit theorems for an extended coupon collector’s problem and for successive subsampling with varying probabilities. Calcutta Statistical Association Bulletin, 29, 113-132.
Shao, J., and Tu, D. (1995). The Jackknife and the Bootstrap. Springer.
Sitter, R.R. (1992a). A resampling procedure for complex survey data. Journal of the American Statistical Association, 87, 755-765.
Sitter, R.R. (1992b). Comparing three bootstrap methods for survey data. Canadian Journal of Statistics, 20, 135-154.
Tam, S.M. (1984). On covariances from overlapping samples. The American Statistician, 38, 288-289.
Yitzhaki, S. (1991). Calculating jackknife variance estimators for parameters of the Gini method. Journal of Business and Economic Statistics, 9, 235-239.