Linearization versus bootstrap for variance estimation of the change between Gini indexes
Section 4. Simulation study

In this section, five artificial populations are first generated as described in Section 4.1. In Section 4.2, the union estimator is compared with the intersection estimator in terms of asymptotic variance. A Monte Carlo experiment is then presented in Section 4.3, where the performances of linearization and of the bootstrap are compared under the SI2 sampling design. A similar comparison is made in Section 4.4 under the two-dimensional two-stage (MULT2) sampling design.

4.1 Simulation set-up

We generated five finite populations of size $N =$ 40,000, each containing two study variables $y_{1}$ and $y_{2}$. The $y_{1 k}$ values and the $y_{2 k}$ values were generated according to the lognormal model

$y_{d k} = \exp (α_{d} ε_{k}). \qquad (4.1)$

The $ε_{k}$'s were generated according to a standard normal distribution. The values of the Gini coefficients for the five populations are presented in Table 4.1.

Table 4.1
Gini coefficients for 5 populations
Population	Pop. 1	Pop. 2	Pop. 3	Pop. 4	Pop. 5
$G_{1}$	0.249	0.298	0.348	0.397	0.447
$G_{2}$	0.259	0.318	0.378	0.437	0.496
$Δ G$	0.010	0.020	0.030	0.040	0.049
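
As a rough illustration of model (4.1), the sketch below generates one population and computes its two Gini indexes with the usual rank-based formula. The values `alpha_1 = 0.45` and `alpha_2 = 0.47` and the random seed are assumptions chosen only for illustration, since the $α_{d}$ used in the study are not reported here. The closed form $\operatorname{erf}(α / 2)$ for the Gini index of $\exp (α ε)$ with standard normal $ε$ is a standard result and serves as a sanity check.

```python
import math
import numpy as np

def gini(y):
    """Gini index of a positive variable, computed from the sorted values."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * y) / (n * np.sum(y)) - (n + 1.0) / n

def lognormal_gini(alpha):
    """Theoretical Gini index of exp(alpha * eps) with eps ~ N(0, 1)."""
    return math.erf(alpha / 2.0)

rng = np.random.default_rng(12345)   # arbitrary seed (assumption)
N = 40_000                           # population size from Section 4.1
eps = rng.standard_normal(N)         # the same eps_k drives both variables, as in (4.1)

alpha_1, alpha_2 = 0.45, 0.47        # hypothetical values, not taken from the paper
y1 = np.exp(alpha_1 * eps)
y2 = np.exp(alpha_2 * eps)

g1, g2 = gini(y1), gini(y2)
delta_g = g2 - g1                    # change between the two Gini indexes
```

With these illustrative parameters, `g1` and `g2` should land near `lognormal_gini(alpha_1)` and `lognormal_gini(alpha_2)`, with the remaining gap due to the finite population.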

In each of the 5 populations, the units were grouped into $M =$ 500 clusters of equal size $N_{0} =$ 80. The clusters were built so that the intra-cluster correlation coefficient with respect to the variable $y_{1}$ was approximately equal to 0.20 in each population.
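
The text does not say how the clusters were constructed, so the sketch below only shows how an intra-cluster correlation coefficient can be checked once clusters of equal size exist. It uses the identity $ρ = 1 - \frac{N_{0}}{N_{0} - 1} \frac{SSW}{SST}$, which holds for equal-size clusters; the function name and the assumption that consecutive units form a cluster are illustrative choices, and the paper may use a different definition of the coefficient.

```python
import numpy as np

def intra_cluster_correlation(y, cluster_size):
    """Intra-cluster correlation for clusters of equal size.

    Assumes (for illustration) that units are ordered so that each run of
    `cluster_size` consecutive values forms one cluster.
    """
    y = np.asarray(y, dtype=float)
    clusters = y.reshape(-1, cluster_size)
    sst = np.sum((y - y.mean()) ** 2)                                      # total sum of squares
    ssw = np.sum((clusters - clusters.mean(axis=1, keepdims=True)) ** 2)   # within-cluster sum of squares
    return 1.0 - cluster_size / (cluster_size - 1.0) * ssw / sst

# Random grouping of a lognormal variable: the coefficient should be close to 0.
rng = np.random.default_rng(2018)            # arbitrary seed (assumption)
y_random = np.exp(0.45 * rng.standard_normal(40_000))
icc_random = intra_cluster_correlation(y_random, cluster_size=80)
```

A value near 0.20, as targeted in the study, would require the units to be grouped so that similar $y_{1}$ values tend to share a cluster.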

4.2 Comparison of the union estimator and of the intersection estimator

In this section, we compare the union estimator with the intersection estimator for the change between Gini indexes in terms of asymptotic variance. We consider two sampling designs: the SI2 design presented in Section 3.1.1 with $(n_{1 •} , n_{3} , n_{2 •}) =$ (1,000; 1,000; 1,000), (1,000; 2,000; 1,000) or (1,000; 4,000; 1,000); and the MULT2 design presented in Section 3.1.2 with $m =$ 300 and $(n_{1 •}^{i} , n_{3}^{i} , n_{2 •}^{i}) =$ (10; 10; 10), (10; 20; 10) or (10; 40; 10).

For each population, we compute the asymptotic variance $V_{lin} ({\hat{Δ G}}^{uni})$ of the union estimator, and the asymptotic variance $V_{lin} ({\hat{Δ G}}^{int})$ of the intersection estimator. To compare them, we compute the relative efficiency defined as

$RE \{{\hat{Δ G}}^{(\cdot)}\} = \frac{V_{lin} \{{\hat{Δ G}}^{(\cdot)}\}}{V_{lin} \{{\hat{Δ G}}^{opt}\}}, \qquad (4.2)$

with ${\hat{Δ G}}^{opt}$ the optimal estimator.

The results are presented in Table 4.2. The union estimator is highly inefficient: its asymptotic variance is 15 to 244 times higher than that of the intersection estimator for SI2, and 2 to 44 times higher for MULT2. The gap between the two estimators tends to narrow when the size of the common sample increases and/or when $Δ G$ increases. The intersection estimator, on the other hand, is less efficient than the optimal estimator for SI2, with RE ranging from 1.33 to 2.46, and approximately as efficient as the optimal estimator for MULT2, with RE ranging from 1.02 to 1.16. This supports the heuristic reasoning in Section 3.1.1. Given the poor performance of the union estimator and the good performance of the intersection estimator, we confine our attention to the latter in the remainder of the simulation study.

Table 4.2
Relative efficiency of the union estimator and of the intersection estimator for 5 populations
Design	Sample size	Pop. 1		Pop. 2		Pop. 3		Pop. 4		Pop. 5
Design	Sample size	${\hat{Δ G}}^{uni}$	${\hat{Δ G}}^{int}$	${\hat{Δ G}}^{uni}$	${\hat{Δ G}}^{int}$	${\hat{Δ G}}^{uni}$	${\hat{Δ G}}^{int}$	${\hat{Δ G}}^{uni}$	${\hat{Δ G}}^{int}$	${\hat{Δ G}}^{uni}$	${\hat{Δ G}}^{int}$
SI2	$n_{3} =$ 1,000	600.22	2.46	200.23	2.27	96.72	2.10	58.73	1.96	39.35	1.85
	$n_{3} =$ 2,000	410.23	1.84	141.71	1.76	70.71	1.68	44.18	1.61	30.33	1.54
	$n_{3} =$ 4,000	250.02	1.47	88.40	1.43	45.17	1.40	28.86	1.36	20.23	1.33
MULT2	$n_{3}^{i} =$ 10	49.10	1.12	19.89	1.13	11.83	1.14	8.84	1.15	7.28	1.16
	$n_{3}^{i} =$ 20	23.08	1.05	9.75	1.05	6.08	1.05	4.73	1.06	4.04	1.07
	$n_{3}^{i} =$ 40	9.15	1.02	4.25	1.02	2.90	1.02	2.41	1.02	2.16	1.02
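
Because both relative efficiencies in (4.2) share the same denominator, the ratio of the union RE to the intersection RE equals the ratio of the two asymptotic variances. The short check below recomputes that ratio from the figures in Table 4.2. The array and dictionary names are illustrative.

```python
import numpy as np

# RE values copied from Table 4.2 (rows: sample sizes, columns: Pop. 1 to Pop. 5).
re_union = {
    "SI2": np.array([[600.22, 200.23, 96.72, 58.73, 39.35],
                     [410.23, 141.71, 70.71, 44.18, 30.33],
                     [250.02, 88.40, 45.17, 28.86, 20.23]]),
    "MULT2": np.array([[49.10, 19.89, 11.83, 8.84, 7.28],
                       [23.08, 9.75, 6.08, 4.73, 4.04],
                       [9.15, 4.25, 2.90, 2.41, 2.16]]),
}
re_inter = {
    "SI2": np.array([[2.46, 2.27, 2.10, 1.96, 1.85],
                     [1.84, 1.76, 1.68, 1.61, 1.54],
                     [1.47, 1.43, 1.40, 1.36, 1.33]]),
    "MULT2": np.array([[1.12, 1.13, 1.14, 1.15, 1.16],
                       [1.05, 1.05, 1.05, 1.06, 1.07],
                       [1.02, 1.02, 1.02, 1.02, 1.02]]),
}

# RE(uni) / RE(int) = V_lin(uni) / V_lin(int), since the optimal variance cancels.
ratio_range = {}
for design in re_union:
    ratio = re_union[design] / re_inter[design]
    ratio_range[design] = (float(ratio.min()), float(ratio.max()))
```

Rounded to the nearest integer, the ranges in `ratio_range` are 15 to 244 for SI2 and 2 to 44 for MULT2.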

4.3 Comparison of linearization and bootstrap for the SI2 design

In this section, we compare linearization and the bootstrap for variance estimation and for producing confidence intervals, for the intersection estimator of the change between Gini indexes under the SI2 sampling design. From each population, we selected $B =$ 10,000 two-dimensional samples by means of the SI2 design with $(n_{1 •} , n_{3} , n_{2 •}) =$ (1,000; 1,000; 1,000), (1,000; 2,000; 1,000) or (1,000; 4,000; 1,000). In each sample, we computed the intersection estimator ${\hat{Δ G}}^{int}$ of the change between Gini indexes. For this estimator, we computed (i) the linearization variance estimator $v_{int} ({\hat{Δ G}}^{int})$ given in (3.24), and (ii) the bootstrap variance estimator $v_{BWO} (\hat{Δ G})$, following the bootstrap procedure described in Section 3.4.1.

To measure the bias of a variance estimator $v (\hat{Δ G}),$ we used the Monte Carlo Percent Relative Bias

$RB \{v (\hat{Δ G})\} = 100 \times \frac{B^{- 1} \sum_{b =1}^{B} v ({\hat{Δ G}}_{b}) - MSE (\hat{Δ G})}{MSE (\hat{Δ G})}, \qquad (4.3)$

where $v ({\hat{Δ G}}_{b})$ denotes the estimator $v (\hat{Δ G})$ in the $b^{th}$ sample, and $MSE (\hat{Δ G})$ is a simulation-based approximation of the true mean square error of $\hat{Δ G},$ obtained from an independent run of 100,000 simulations. As a measure of stability of $v (\hat{Δ G}),$ we used the Relative Stability

$RS \{v (\hat{Δ G})\} = \frac{{[B^{- 1} \sum_{b =1}^{B} {\{v ({\hat{Δ G}}_{b}) - MSE (\hat{Δ G})\}}^{2}]}^{1 / 2}}{MSE (\hat{Δ G})}. \qquad (4.4)$
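
The two Monte Carlo measures can be sketched as follows, assuming the $B$ variance estimates are stored in an array and the simulation-based MSE is available as a scalar. The function names and the toy inputs are illustrative and not taken from the paper.

```python
import numpy as np

def relative_bias(v, mse):
    """Monte Carlo percent relative bias of a variance estimator, as in (4.3)."""
    v = np.asarray(v, dtype=float)
    return 100.0 * (v.mean() - mse) / mse

def relative_stability(v, mse):
    """Relative stability as in (4.4): root mean squared deviation from the MSE, over the MSE."""
    v = np.asarray(v, dtype=float)
    return np.sqrt(np.mean((v - mse) ** 2)) / mse

# Toy example with four variance estimates and a reference MSE of 1.0.
v_toy = [0.9, 1.1, 1.0, 0.8]
rb_toy = relative_bias(v_toy, mse=1.0)        # about -5.0 (percent)
rs_toy = relative_stability(v_toy, mse=1.0)   # about 0.122
```

Judging from their magnitudes, the RS columns in Tables 4.3 and 4.4 appear to be on a percentage scale, that is, (4.4) multiplied by 100. This is an inference from the tables rather than something stated in the text.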

Finally, we compared the coverage rates of (i) the normality-based confidence interval using the linearization variance estimator and (ii) the percentile bootstrap confidence interval. The bootstrap variance estimators and the bootstrap confidence intervals are based on $C =$ 1,000 bootstrap replications. We compare the error rates of the confidence intervals, with a nominal error rate of 2.5% in each tail. The comparison with a nominal error rate of 5% showed no qualitative difference and is thus omitted.
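
The two confidence intervals and the tail error rates can be sketched as below. The convention that L counts intervals lying entirely above the true value and U counts intervals lying entirely below it is an assumption, since the labels are not defined in this section, and all function names are illustrative.

```python
import numpy as np

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal distribution

def normal_ci(estimate, variance):
    """Normality-based 95% interval from a point estimate and a variance estimate."""
    half_width = Z_975 * np.sqrt(variance)
    return estimate - half_width, estimate + half_width

def percentile_ci(boot_estimates, level=0.95):
    """Percentile bootstrap interval from the bootstrap replicates of the estimator."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(boot_estimates, [tail, 100.0 - tail])
    return float(lower), float(upper)

def tail_error_rates(lower, upper, true_value):
    """Percent of intervals missing the true value on each side (assumed L/U convention)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    rate_l = 100.0 * np.mean(true_value < lower)   # interval entirely above the truth
    rate_u = 100.0 * np.mean(true_value > upper)   # interval entirely below the truth
    return float(rate_l), float(rate_u), float(rate_l + rate_u)

# Toy usage: four intervals around a true value of 0.
rates = tail_error_rates([0.1, -1.0, -1.0, -1.0], [1.0, 1.0, 1.0, -0.5], true_value=0.0)
```

In the simulation, `lower` and `upper` would hold the $B =$ 10,000 interval bounds, and each percentile interval would use the $C =$ 1,000 bootstrap replicates of one sample.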

The results are presented in Table 4.3. Both variance estimators are negatively biased. This bias is moderate (less than 5%) in most cases, except for the smaller sample size $n_{3} =$ 1,000 and for the population $U_{5}$ with the highest value of $Δ G$. The bootstrap variance estimator is systematically slightly more biased than the linearization variance estimator, but the difference decreases as the sample size increases. For both variance estimators, the instability increases with $Δ G$. The bootstrap variance estimator is slightly more stable for the smaller sample size $n_{3} =$ 1,000, but the situation is reversed when the sample size increases. Turning to the coverage of the confidence intervals, both methods lead to under-coverage, which is consistent with the negative bias of both variance estimators. The normality-based confidence intervals show slightly better coverage than the bootstrap percentile confidence intervals. For both confidence intervals, the under-coverage is more acute when $Δ G$ increases, and is reduced when the sample size increases.

Table 4.3
Relative Bias, Relative Stability and Nominal One-Tailed Error Rates for linearization and Bootstrap variance estimation of the intersection estimator of the change between Gini indexes for 5 populations and with the SI2 sampling design
Pop.	Linearization					Bootstrap
Pop.	RB	RS	L	U	L+U	RB	RS	L	U	L+U
	Sample size $(n_{1 •} , n_{3} , n_{2 •}) =$ (1,000; 1,000; 1,000)
Pop. 1	-1.41	24.6	1.8	4.5	6.3	-1.83	24.6	1.8	4.9	6.7
Pop. 2	-1.98	32.4	1.6	5.2	6.8	-2.64	32.1	1.7	5.9	7.6
Pop. 3	-2.80	41.9	1.3	6.3	7.7	-3.83	40.9	1.3	7.0	8.3
Pop. 4	-4.00	52.5	1.0	7.7	8.7	-5.57	50.6	1.1	8.2	9.3
Pop. 5	-5.80	64.0	1.0	9.2	10.1	-8.11	60.6	0.8	9.9	10.7
	Sample size $(n_{1 •} , n_{3} , n_{2 •}) =$ (1,000; 2,000; 1,000)
Pop. 1	-1.38	17.3	1.6	3.7	5.3	-1.67	17.8	1.8	4.1	5.9
Pop. 2	-1.64	23.0	1.4	4.3	5.8	-2.05	23.2	1.4	4.7	6.1
Pop. 3	-1.99	30.1	1.2	5.0	6.2	-2.58	30.0	1.1	5.3	6.4
Pop. 4	-2.50	38.4	1.0	6.0	6.9	-3.38	37.9	1.0	6.3	7.3
Pop. 5	-3.30	47.9	0.7	7.2	7.9	-4.62	46.7	0.7	7.5	8.2
	Sample size $(n_{1 •} , n_{3} , n_{2 •}) =$ (1,000; 4,000; 1,000)
Pop. 1	-0.60	11.9	2.0	3.4	5.3	-0.68	12.8	2.1	3.4	5.5
Pop. 2	-0.67	15.9	1.8	3.7	5.6	-0.80	16.5	2.0	3.9	5.9
Pop. 3	-0.83	20.8	1.8	4.4	6.2	-1.03	21.3	1.9	4.4	6.3
Pop. 4	-1.13	26.7	1.5	5.0	6.6	-1.46	26.9	1.6	5.0	6.6
Pop. 5	-1.64	33.4	1.4	5.8	7.1	-2.18	33.5	1.4	5.8	7.1

4.4 Comparison of linearization and bootstrap for the MULT2 design

In this section, we compare linearization and the bootstrap for variance estimation and for producing confidence intervals, for the intersection estimator of the change between Gini indexes under the MULT2 sampling design presented in Section 3.1.2. From each population, we selected $B =$ 10,000 two-dimensional two-stage samples by means of the MULT2 design with $m =$ 300 and $(n_{1 •}^{i} , n_{3}^{i} , n_{2 •}^{i}) =$ (10; 10; 10), (10; 20; 10) or (10; 40; 10). In each sample, we computed the intersection estimator ${\hat{Δ G}}^{int}$ of the change between Gini indexes. For this estimator, we computed (i) the linearization variance estimator $v^{HH} \{{\hat{Δ G}}^{co} (a, b)\}$ given in (3.26), and (ii) the bootstrap variance estimator $v_{BWR} ({\hat{Δ G}}^{int})$, following the bootstrap procedure described in Section 3.4.2.

To measure the bias and the stability of a variance estimator $v (\hat{Δ G})$, we used the Monte Carlo Percent Relative Bias defined in equation (4.3) and the Relative Stability defined in equation (4.4). The true mean square error of $\hat{Δ G}$ was approximated from an independent run of 100,000 simulations. We also compared the coverage rates of (i) the normality-based confidence interval using the linearization variance estimator and (ii) the percentile bootstrap confidence interval. The bootstrap variance estimators and the bootstrap confidence intervals are based on $C =$ 1,000 bootstrap replications. We compare the error rates of the confidence intervals, with a nominal error rate of 2.5% in each tail. The comparison with a nominal error rate of 5% showed no qualitative difference and is thus omitted.

The results are presented in Table 4.4. Both variance estimators are approximately unbiased for small values of $Δ G$, but show a moderate negative bias which increases with $Δ G$. The bootstrap variance estimator is more biased than the linearization variance estimator. For both variance estimators, the instability increases with $Δ G$. The bootstrap variance estimator is slightly more stable than the linearization variance estimator. Both methods lead to under-coverage, which is consistent with the negative bias of both variance estimators. The normality-based confidence intervals perform slightly better. For both confidence intervals, the under-coverage is more acute when $Δ G$ increases, and is reduced when the sample size increases.

Table 4.4
Relative Bias, Relative Stability and Nominal One-Tailed Error Rates for linearization and Bootstrap variance estimation of the intersection estimator of the Gini Coefficient Change for 5 populations and with the MULT2 sampling design
Pop.	Linearization					Bootstrap
Pop.	RB	RS	L	U	L+U	RB	RS	L	U	L+U
	Sample sizes $m =$ 300 and $(n_{1 •}^{i} , n_{3}^{i} , n_{2 •}^{i}) =$ (10; 10; 10)
Pop. 1	1.23	33.8	0.6	4.9	5.5	1.09	33.2	0.6	6.0	6.6
Pop. 2	0.64	41.1	0.8	5.5	6.3	-0.20	39.7	0.6	6.5	7.1
Pop. 3	-0.42	48.7	0.7	7.1	7.8	-2.05	46.6	0.7	8.4	9.1
Pop. 4	-2.07	56.4	0.8	8.4	9.2	-4.47	53.3	0.6	9.6	10.2
Pop. 5	-4.44	63.7	0.9	9.2	10.1	-7.56	59.5	0.4	10.3	10.7
	Sample sizes $m =$ 300 and $(n_{1 •}^{i} , n_{3}^{i} , n_{2 •}^{i}) =$ (10; 20; 10)
Pop. 1	1.70	32.6	1.5	4.9	6.4	-1.70	32.3	1.5	6.0	7.5
Pop. 2	1.10	39.0	1.4	5.4	6.8	-1.91	38.3	1.5	6.9	8.4
Pop. 3	0.17	45.6	1.2	7.4	8.6	-2.49	44.4	1.1	7.7	8.8
Pop. 4	-1.17	52.0	1.0	9.0	10.0	-3.58	50.3	0.8	9.7	10.5
Pop. 5	-3.03	57.9	0.9	10.4	11.3	-5.35	55.4	0.7	11.0	11.7
	Sample sizes $m =$ 300 and $(n_{1 •}^{i} , n_{3}^{i} , n_{2 •}^{i}) =$ (10; 40; 10)
Pop. 1	-0.99	32.1	1.2	6.1	7.3	-3.21	32.2	1.7	6.7	8.4
Pop. 2	-1.68	38.3	1.4	6.7	8.1	-3.70	38.3	1.4	7.6	9.0
Pop. 3	-2.58	44.6	1.3	7.5	8.8	-4.40	44.5	1.2	8.9	10.1
Pop. 4	-3.78	50.6	1.1	8.9	10.0	-5.50	50.1	0.9	10.6	11.5
Pop. 5	-5.39	55.9	0.8	10.9	11.7	-7.16	54.8	0.6	12.8	13.4

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2018-06-21
