Linearization versus bootstrap for variance estimation of the change between Gini indexes
Section 4. Simulation study
In this section, five artificial populations are first generated as described in Section 4.1. In Section 4.2, the union estimator is compared with the intersection estimator in terms of asymptotic variance. A Monte Carlo experiment is then presented in Section 4.3, where the performances of the linearization and the bootstrap are compared under the SI2 sampling design. A similar comparison is made in Section 4.4 for the bi-dimensional two-stage sampling design.
4.1 Simulation set-up
We generated 5 finite populations of size 40,000, each containing
two study variables
and
The
values and the
values were generated according to the
lognormal model
The
were generated according to a standard normal
distribution. The values of the Gini coefficients for the five populations are
presented in Table 4.1.
Table 4.1
Gini coefficients for 5 populations
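| Population | Pop. 1 | Pop. 2 | Pop. 3 | Pop. 4 | Pop. 5 |
|---|---|---|---|---|---|
| Gini coefficient, first study variable | 0.249 | 0.298 | 0.348 | 0.397 | 0.447 |
| Gini coefficient, second study variable | 0.259 | 0.318 | 0.378 | 0.437 | 0.496 |
| Change between Gini coefficients | 0.010 | 0.020 | 0.030 | 0.040 | 0.049 |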
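As an illustration of how the Gini coefficients of a lognormal finite population can be obtained, the sketch below generates one population and computes its Gini coefficient with the usual rank-based formula. The shape parameter `sigma`, the seed and the helper name `gini` are hypothetical choices made for this example; they are not the parameters of the lognormal model used in the paper.

```python
import math
import random


def gini(values):
    """Gini coefficient of a finite population of non-negative values."""
    y = sorted(values)
    n = len(y)
    total = sum(y)
    # G = sum_k (2k - n - 1) * y_(k) / (n * sum_k y_k), y_(k) sorted increasingly
    weighted = sum((2 * k - n - 1) * v for k, v in enumerate(y, start=1))
    return weighted / (n * total)


rng = random.Random(2018)  # arbitrary seed, for reproducibility only
sigma = 0.5                # hypothetical shape parameter, not taken from the paper

# One lognormal population of size 40,000 built from standard normal draws
population = [math.exp(sigma * rng.gauss(0.0, 1.0)) for _ in range(40_000)]

g_population = gini(population)
# For a lognormal distribution the Gini coefficient depends only on sigma:
# G = 2 * Phi(sigma / sqrt(2)) - 1 = erf(sigma / 2)
g_theory = math.erf(sigma / 2)

print(round(g_population, 3), round(g_theory, 3))
```

Because the Gini coefficient of a lognormal distribution depends only on the shape parameter, changing `sigma` is enough to move the coefficient up or down in such a simulation.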
In each of the 5 populations, the units were grouped into 500 clusters of equal size 80. The clusters were
built so that the intra-cluster correlation coefficient with respect to the
variable
was approximately equal to 0.20 in each
population.
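The intra-cluster correlation coefficient mentioned above can be measured, for equal-size clusters, with the classical analysis-of-variance expression 1 - M/(M - 1) x SSW/SST, where M is the cluster size. The sketch below is only an illustration under that assumption: the population values, the seed and the helper names are hypothetical, and it does not reproduce the way the clusters were actually built in the paper. It shows the two extreme cases, random grouping and grouping of sorted values.

```python
import math
import random


def intra_cluster_correlation(clusters):
    """ICC for equal-size clusters: 1 - M/(M-1) * SSW/SST."""
    m = len(clusters[0])
    all_values = [v for cluster in clusters for v in cluster]
    grand_mean = sum(all_values) / len(all_values)
    sst = sum((v - grand_mean) ** 2 for v in all_values)  # total sum of squares
    ssw = 0.0                                             # within-cluster sum of squares
    for cluster in clusters:
        cluster_mean = sum(cluster) / m
        ssw += sum((v - cluster_mean) ** 2 for v in cluster)
    return 1.0 - (m / (m - 1)) * ssw / sst


def make_clusters(values, size):
    """Cut a list into consecutive clusters of the given size."""
    return [values[i:i + size] for i in range(0, len(values), size)]


rng = random.Random(1)  # arbitrary seed
# Hypothetical lognormal values; 40,000 units = 500 clusters of size 80
y = [math.exp(0.5 * rng.gauss(0.0, 1.0)) for _ in range(40_000)]

icc_random = intra_cluster_correlation(make_clusters(y, 80))          # close to 0
icc_sorted = intra_cluster_correlation(make_clusters(sorted(y), 80))  # close to 1

print(round(icc_random, 3), round(icc_sorted, 3))
```

A value near 0.20 lies between these two extremes, so any construction that only partially sorts the units before cutting them into clusters can be tuned toward it.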
4.2 Comparison of the union estimator and of the intersection estimator
In this section, we compare the union estimator with the intersection estimator for the change between Gini indexes in terms of asymptotic variance. We consider two sampling designs: the SI2 design presented in Section 3.1.1, with sample sizes (1,000; 1,000; 1,000), (1,000; 2,000; 1,000) or (1,000; 4,000; 1,000); and the MULT2 design presented in Section 3.1.2, with sample sizes 300 and (10; 10; 10), (10; 20; 10) or (10; 40; 10).
For each population, we compute the asymptotic variance of the union estimator and the asymptotic variance of the intersection estimator. To compare them, we compute the relative efficiency (RE) of each estimator, defined as the ratio of its asymptotic variance to the asymptotic variance of the optimal estimator.
The
results are presented in Table 4.2. The union estimator is highly
inefficient. Its asymptotic variance is 15 to 244 times higher than that of the
intersection estimator for SI2, and 2 to 44 times higher than that of the
intersection estimator for MULT2. The difference between both estimators tends
to decrease when the sample size of the common sample increases and/or when
increases. On the other hand, the intersection
estimator is slightly less efficient than the optimal estimator for SI2, with RE ranging from 1.33 to 2.46, and approximately as efficient as the optimal estimator for MULT2, with RE ranging from 1.02 to 1.16. This supports the heuristic reasoning in Section 3.1.1. Given the poor performance of the union estimator and the good performance of the intersection estimator, we confine our attention to the latter in the remainder of the simulation study.
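The ranges quoted above for the union estimator relative to the intersection estimator can be checked directly from Table 4.2, since both relative efficiencies share the same denominator. The short sketch below copies the table entries as (union, intersection) pairs; the variable names are chosen for this example only.

```python
# (RE of the union estimator, RE of the intersection estimator), from Table 4.2
table_4_2 = {
    "SI2": [
        (600.22, 2.46), (200.23, 2.27), (96.72, 2.10), (58.73, 1.96), (39.35, 1.85),
        (410.23, 1.84), (141.71, 1.76), (70.71, 1.68), (44.18, 1.61), (30.33, 1.54),
        (250.02, 1.47), (88.40, 1.43), (45.17, 1.40), (28.86, 1.36), (20.23, 1.33),
    ],
    "MULT2": [
        (49.10, 1.12), (19.89, 1.13), (11.83, 1.14), (8.84, 1.15), (7.28, 1.16),
        (23.08, 1.05), (9.75, 1.05), (6.08, 1.05), (4.73, 1.06), (4.04, 1.07),
        (9.15, 1.02), (4.25, 1.02), (2.90, 1.02), (2.41, 1.02), (2.16, 1.02),
    ],
}

ratio_range = {}
for design, pairs in table_4_2.items():
    # Ratio of asymptotic variances: union over intersection
    ratios = [union / intersection for union, intersection in pairs]
    ratio_range[design] = (round(min(ratios)), round(max(ratios)))
    print(design, ratio_range[design])
```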
Table 4.2
Relative efficiency of the union estimator and of the intersection estimator for 5 populations
| Design | Sample size | Pop. 1 union | Pop. 1 intersection | Pop. 2 union | Pop. 2 intersection | Pop. 3 union | Pop. 3 intersection | Pop. 4 union | Pop. 4 intersection | Pop. 5 union | Pop. 5 intersection |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SI2 | 1,000 | 600.22 | 2.46 | 200.23 | 2.27 | 96.72 | 2.10 | 58.73 | 1.96 | 39.35 | 1.85 |
| SI2 | 2,000 | 410.23 | 1.84 | 141.71 | 1.76 | 70.71 | 1.68 | 44.18 | 1.61 | 30.33 | 1.54 |
| SI2 | 4,000 | 250.02 | 1.47 | 88.40 | 1.43 | 45.17 | 1.40 | 28.86 | 1.36 | 20.23 | 1.33 |
| MULT2 | 10 | 49.10 | 1.12 | 19.89 | 1.13 | 11.83 | 1.14 | 8.84 | 1.15 | 7.28 | 1.16 |
| MULT2 | 20 | 23.08 | 1.05 | 9.75 | 1.05 | 6.08 | 1.05 | 4.73 | 1.06 | 4.04 | 1.07 |
| MULT2 | 40 | 9.15 | 1.02 | 4.25 | 1.02 | 2.90 | 1.02 | 2.41 | 1.02 | 2.16 | 1.02 |
4.3 Comparison of linearization and bootstrap for the SI2 design
In this section, we compare linearization and bootstrap for variance estimation and for producing confidence intervals, for the intersection estimator of the change between Gini indexes under the SI2 sampling design. From each population, we selected 10,000 two-dimensional samples by means of the SI2 design with sample sizes (1,000; 1,000; 1,000), (1,000; 2,000; 1,000) or (1,000; 4,000; 1,000). In each sample, we computed the intersection estimator of the change between Gini indexes. For this estimator, we computed (i) the linearization variance estimator given in (3.24), and (ii) the bootstrap variance estimator obtained with the bootstrap procedure described in Section 3.4.1.
To measure the bias of a variance estimator, we used the Monte Carlo Percent Relative Bias of equation (4.3): the average of the variance estimator over the 10,000 simulated samples minus the true mean square error of the intersection estimator, expressed as a percentage of that mean square error. The true mean square error was replaced by a simulation-based approximation obtained from an independent run of 100,000 simulations. As a measure of stability of the variance estimator, we used the Relative Stability of equation (4.4). Finally, we compared the coverage rates of (i) the normality-based confidence interval using the linearization variance estimator and (ii) the percentile bootstrap confidence interval. The bootstrap variance estimators and the bootstrap confidence intervals are based on 1,000 bootstrap replications. Error rates of the confidence intervals are compared for a nominal error rate of 2.5% in each tail. The comparison with a nominal error rate of 5% gave no qualitative difference and is thus omitted.
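The two Monte Carlo measures can be sketched as follows, assuming the forms commonly used in this kind of simulation study: the relative bias is the mean of the simulated variance estimates minus the true mean square error, divided by that mean square error, and the relative stability is the root mean square deviation of the variance estimates from the true mean square error, divided by it, both in percent. The function names and the toy numbers are hypothetical.

```python
import math


def percent_relative_bias(variance_estimates, true_mse):
    """100 * (mean of the variance estimates - true MSE) / true MSE."""
    mean_v = sum(variance_estimates) / len(variance_estimates)
    return 100.0 * (mean_v - true_mse) / true_mse


def relative_stability(variance_estimates, true_mse):
    """100 * sqrt(mean squared deviation from the true MSE) / true MSE."""
    mean_sq_dev = sum((v - true_mse) ** 2 for v in variance_estimates) / len(variance_estimates)
    return 100.0 * math.sqrt(mean_sq_dev) / true_mse


# Toy illustration: four simulated variance estimates, true MSE equal to 1
toy_estimates = [0.9, 1.1, 1.0, 0.8]
rb = percent_relative_bias(toy_estimates, 1.0)
rs = relative_stability(toy_estimates, 1.0)
print(round(rb, 2), round(rs, 2))
```

In the simulation study, `variance_estimates` would hold the 10,000 values of a variance estimator and `true_mse` the approximation obtained from the independent run of 100,000 simulations.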
The
results are presented in Table 4.3. Both variance estimators are
negatively biased. This bias is moderate (less than 5% ) in most cases, except
for the smaller sample size
1,000, and for the
population
with the highest value of
The bootstrap variance estimator is
systematically slightly more biased than the linearization variance estimator,
but the difference decreases as the sample size increases. For both variance
estimators, the instability increases with
The Bootstrap variance estimator is slightly
more stable for the smaller sample size
1,000, but the situation
is reversed when the sample size increases. Turning to the coverage of the confidence intervals, both methods lead to under-coverage, which is consistent with the negative bias of both variance estimators. The normality-based confidence intervals show slightly better coverage than the bootstrap percentile confidence intervals. For both confidence intervals, the
under-coverage is more acute when
increases, and reduces when the sample size
increases.
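The two confidence intervals compared in this section, and the tail error rates reported as L, U and L+U, can be sketched as below. This is a minimal illustration under stated assumptions: the percentile interval uses one simple order-statistic convention among several, L is taken to be the percentage of intervals lying entirely above the true value and U the percentage lying entirely below it, and all function names are hypothetical.

```python
import math


def normal_interval(estimate, variance, z=1.96):
    """Normality-based interval: estimate +/- z * sqrt(variance estimate)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


def percentile_interval(bootstrap_estimates, alpha=0.05):
    """Percentile bootstrap interval from the ordered bootstrap estimates."""
    ordered = sorted(bootstrap_estimates)
    b = len(ordered)
    lower_index = round(alpha / 2 * b)            # e.g. 25 when b = 1,000
    upper_index = round((1 - alpha / 2) * b) - 1  # e.g. 974 when b = 1,000
    return ordered[lower_index], ordered[upper_index]


def tail_error_rates(intervals, true_value):
    """Percentages of intervals missing the true value on each side (assumed definition)."""
    n = len(intervals)
    lower = 100.0 * sum(true_value < lo for lo, _ in intervals) / n
    upper = 100.0 * sum(true_value > hi for _, hi in intervals) / n
    return lower, upper, lower + upper


# Toy illustration with made-up numbers
print(normal_interval(1.0, 0.04))
print(percentile_interval(list(range(1000))))
print(tail_error_rates([(0, 1), (2, 3), (-2, -1), (0, 2)], 0.5))
```

With a nominal error rate of 2.5% in each tail, a well-calibrated interval would give L and U both close to 2.5 and L+U close to 5.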
Table 4.3
Relative Bias, Relative Stability and Nominal One-Tailed Error Rates for linearization and Bootstrap variance estimation of the intersection estimator of the change between Gini indexes for 5 populations and with the SI2 sampling design
| Sample size | Pop. | Linearization RB | Linearization RS | Linearization L | Linearization U | Linearization L+U | Bootstrap RB | Bootstrap RS | Bootstrap L | Bootstrap U | Bootstrap L+U |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (1,000; 1,000; 1,000) | Pop. 1 | -1.41 | 24.6 | 1.8 | 4.5 | 6.3 | -1.83 | 24.6 | 1.8 | 4.9 | 6.7 |
| (1,000; 1,000; 1,000) | Pop. 2 | -1.98 | 32.4 | 1.6 | 5.2 | 6.8 | -2.64 | 32.1 | 1.7 | 5.9 | 7.6 |
| (1,000; 1,000; 1,000) | Pop. 3 | -2.80 | 41.9 | 1.3 | 6.3 | 7.7 | -3.83 | 40.9 | 1.3 | 7.0 | 8.3 |
| (1,000; 1,000; 1,000) | Pop. 4 | -4.00 | 52.5 | 1.0 | 7.7 | 8.7 | -5.57 | 50.6 | 1.1 | 8.2 | 9.3 |
| (1,000; 1,000; 1,000) | Pop. 5 | -5.80 | 64.0 | 1.0 | 9.2 | 10.1 | -8.11 | 60.6 | 0.8 | 9.9 | 10.7 |
| (1,000; 2,000; 1,000) | Pop. 1 | -1.38 | 17.3 | 1.6 | 3.7 | 5.3 | -1.67 | 17.8 | 1.8 | 4.1 | 5.9 |
| (1,000; 2,000; 1,000) | Pop. 2 | -1.64 | 23.0 | 1.4 | 4.3 | 5.8 | -2.05 | 23.2 | 1.4 | 4.7 | 6.1 |
| (1,000; 2,000; 1,000) | Pop. 3 | -1.99 | 30.1 | 1.2 | 5.0 | 6.2 | -2.58 | 30.0 | 1.1 | 5.3 | 6.4 |
| (1,000; 2,000; 1,000) | Pop. 4 | -2.50 | 38.4 | 1.0 | 6.0 | 6.9 | -3.38 | 37.9 | 1.0 | 6.3 | 7.3 |
| (1,000; 2,000; 1,000) | Pop. 5 | -3.30 | 47.9 | 0.7 | 7.2 | 7.9 | -4.62 | 46.7 | 0.7 | 7.5 | 8.2 |
| (1,000; 4,000; 1,000) | Pop. 1 | -0.60 | 11.9 | 2.0 | 3.4 | 5.3 | -0.68 | 12.8 | 2.1 | 3.4 | 5.5 |
| (1,000; 4,000; 1,000) | Pop. 2 | -0.67 | 15.9 | 1.8 | 3.7 | 5.6 | -0.80 | 16.5 | 2.0 | 3.9 | 5.9 |
| (1,000; 4,000; 1,000) | Pop. 3 | -0.83 | 20.8 | 1.8 | 4.4 | 6.2 | -1.03 | 21.3 | 1.9 | 4.4 | 6.3 |
| (1,000; 4,000; 1,000) | Pop. 4 | -1.13 | 26.7 | 1.5 | 5.0 | 6.6 | -1.46 | 26.9 | 1.6 | 5.0 | 6.6 |
| (1,000; 4,000; 1,000) | Pop. 5 | -1.64 | 33.4 | 1.4 | 5.8 | 7.1 | -2.18 | 33.5 | 1.4 | 5.8 | 7.1 |
4.4 Comparison of linearization and bootstrap for the MULT2 design
In this section, we compare linearization and bootstrap for variance estimation and for producing confidence intervals, for the intersection estimator of the change between Gini indexes under the MULT2 sampling design presented in Section 3.1.2. From each population, we selected 10,000 two-dimensional two-stage samples by means of the MULT2 design with sample sizes 300 and (10; 10; 10), (10; 20; 10) or (10; 40; 10). In each sample, we computed the intersection estimator of the change between Gini indexes. For this estimator, we computed (i) the linearization variance estimator given in (3.26), and (ii) the bootstrap variance estimator obtained with the bootstrap procedure described in Section 3.4.2.
To measure the bias of a variance estimator, we used the Monte Carlo Percent Relative Bias defined in equation (4.3), and the Relative Stability defined in equation (4.4). The true mean square error of the intersection estimator was approximated from an independent run of 100,000 simulations. We also compared the coverage rates of (i) the normality-based confidence interval using the linearization variance estimator and (ii) the percentile bootstrap confidence interval. The bootstrap variance estimators and the bootstrap confidence intervals are based on 1,000 bootstrap replications. Error rates of the confidence intervals are compared for a nominal error rate of 2.5% in each tail. The comparison with a nominal error rate of 5% gave no qualitative difference and is thus omitted.
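To make the bootstrap side of this comparison concrete, the sketch below computes a bootstrap variance estimate and a percentile interval for a change between Gini indexes, each based on 1,000 replications. It deliberately swaps in a naive with-replacement bootstrap of paired units from a single simple sample; it is not the two-stage procedure of Section 3.4.2, and the data-generating values, seed and function names are hypothetical.

```python
import math
import random


def gini(values):
    """Rank-based Gini coefficient of a list of non-negative values."""
    y = sorted(values)
    n = len(y)
    weighted = sum((2 * k - n - 1) * v for k, v in enumerate(y, start=1))
    return weighted / (n * sum(y))


def gini_change(pairs):
    """Gini of the second variable minus Gini of the first, on paired units."""
    return gini([b for _, b in pairs]) - gini([a for a, _ in pairs])


def naive_bootstrap(pairs, n_boot, rng):
    """Resample units with replacement; return the variance and sorted replicates."""
    n = len(pairs)
    replicates = []
    for _ in range(n_boot):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        replicates.append(gini_change(resample))
    mean_rep = sum(replicates) / n_boot
    variance = sum((r - mean_rep) ** 2 for r in replicates) / (n_boot - 1)
    return variance, sorted(replicates)


rng = random.Random(7)  # arbitrary seed
# Hypothetical paired lognormal sample of 300 units (not the paper's model)
sample = []
for _ in range(300):
    z = rng.gauss(0.0, 1.0)
    sample.append((math.exp(0.5 * z), math.exp(0.6 * z + 0.2 * rng.gauss(0.0, 1.0))))

estimate = gini_change(sample)
boot_var, boot_reps = naive_bootstrap(sample, 1000, rng)
lower, upper = boot_reps[25], boot_reps[974]  # 2.5% in each tail

print(round(estimate, 4), round(boot_var, 6), (round(lower, 4), round(upper, 4)))
```

Under a two-stage design, a bootstrap that ignores the clustering would generally misstate the variance, which is why a design-specific procedure is needed in practice.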
The
results are presented in Table 4.4. Both variance estimators are
approximately unbiased for small values of
but show a moderate negative bias which
increases with
The bootstrap variance estimator is more
biased than the linearization variance estimator. For both variance estimators,
the instability increases with
The bootstrap variance estimator is slightly more stable than the linearization variance estimator. Both methods lead to under-coverage, which is consistent with the negative bias of both variance estimators. The normality-based confidence intervals perform slightly better. For both confidence intervals, the under-coverage is more acute when
increases, and reduces when the sample size
increases.
Table 4.4
Relative Bias, Relative Stability and Nominal One-Tailed Error Rates for linearization and Bootstrap variance estimation of the intersection estimator of the Gini Coefficient Change for 5 populations and with the MULT2 sampling design
| Sample sizes | Pop. | Linearization RB | Linearization RS | Linearization L | Linearization U | Linearization L+U | Bootstrap RB | Bootstrap RS | Bootstrap L | Bootstrap U | Bootstrap L+U |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 300 and (10; 10; 10) | Pop. 1 | 1.23 | 33.8 | 0.6 | 4.9 | 5.5 | 1.09 | 33.2 | 0.6 | 6.0 | 6.6 |
| 300 and (10; 10; 10) | Pop. 2 | 0.64 | 41.1 | 0.8 | 5.5 | 6.3 | -0.20 | 39.7 | 0.6 | 6.5 | 7.1 |
| 300 and (10; 10; 10) | Pop. 3 | -0.42 | 48.7 | 0.7 | 7.1 | 7.8 | -2.05 | 46.6 | 0.7 | 8.4 | 9.1 |
| 300 and (10; 10; 10) | Pop. 4 | -2.07 | 56.4 | 0.8 | 8.4 | 9.2 | -4.47 | 53.3 | 0.6 | 9.6 | 10.2 |
| 300 and (10; 10; 10) | Pop. 5 | -4.44 | 63.7 | 0.9 | 9.2 | 10.1 | -7.56 | 59.5 | 0.4 | 10.3 | 10.7 |
| 300 and (10; 20; 10) | Pop. 1 | 1.70 | 32.6 | 1.5 | 4.9 | 6.4 | -1.70 | 32.3 | 1.5 | 6.0 | 7.5 |
| 300 and (10; 20; 10) | Pop. 2 | 1.10 | 39.0 | 1.4 | 5.4 | 6.8 | -1.91 | 38.3 | 1.5 | 6.9 | 8.4 |
| 300 and (10; 20; 10) | Pop. 3 | 0.17 | 45.6 | 1.2 | 7.4 | 8.6 | -2.49 | 44.4 | 1.1 | 7.7 | 8.8 |
| 300 and (10; 20; 10) | Pop. 4 | -1.17 | 52.0 | 1.0 | 9.0 | 10.0 | -3.58 | 50.3 | 0.8 | 9.7 | 10.5 |
| 300 and (10; 20; 10) | Pop. 5 | -3.03 | 57.9 | 0.9 | 10.4 | 11.3 | -5.35 | 55.4 | 0.7 | 11.0 | 11.7 |
| 300 and (10; 40; 10) | Pop. 1 | -0.99 | 32.1 | 1.2 | 6.1 | 7.3 | -3.21 | 32.2 | 1.7 | 6.7 | 8.4 |
| 300 and (10; 40; 10) | Pop. 2 | -1.68 | 38.3 | 1.4 | 6.7 | 8.1 | -3.70 | 38.3 | 1.4 | 7.6 | 9.0 |
| 300 and (10; 40; 10) | Pop. 3 | -2.58 | 44.6 | 1.3 | 7.5 | 8.8 | -4.40 | 44.5 | 1.2 | 8.9 | 10.1 |
| 300 and (10; 40; 10) | Pop. 4 | -3.78 | 50.6 | 1.1 | 8.9 | 10.0 | -5.50 | 50.1 | 0.9 | 10.6 | 11.5 |
| 300 and (10; 40; 10) | Pop. 5 | -5.39 | 55.9 | 0.8 | 10.9 | 11.7 | -7.16 | 54.8 | 0.6 | 12.8 | 13.4 |
Survey Methodology, ISSN: 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, 2018
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa