On a new estimator for the variance of the ratio estimator with small sample corrections
Section 3. A simulation study
3.1 Set-up and main results
In this section we apply the above results to eleven populations. Populations 1-5 are taken from Cochran (1977, pages 152, 182, 203, 325), populations 6 and 7 from Sukhatme (1954, pages 183-184), population 8 from Kish (1995, page 42) and populations 9-11 from Koop (1968). The population sizes vary between 10 and 49. The correlation coefficients between $x$ and $y$ vary between 0.32 and 0.98, while the coefficients of variation of $x$ vary between 0.14 and 1.19. For further details, see Table 3.1.
We considered simple random samples without replacement of several sizes $n$ from these populations (excluding cases where $n$ is too large for the population size $N$). For each population, we simulated all $\binom{N}{n}$ possible samples of size $n$, provided that this number is not larger than one million. When it is larger, we restricted ourselves to drawing one million random samples of size $n$ from the population. From these simulated samples, we computed (an accurate estimate of) the true mean square error of the ratio estimator for a given population and a given sample size, to be used as a benchmark.
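The benchmark computation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the ratio estimator is assumed to target the population mean of $y$ (the known population mean of $x$ times the ratio of the sample totals), and the function names, the `max_samples` parameter and the `seed` parameter are hypothetical.

```python
import itertools
import math
import random


def ratio_estimates(x, y, n, max_samples=1_000_000, seed=0):
    """Return ratio estimates of the mean of y over SRSWOR samples of size n.

    All C(N, n) samples are enumerated when that count is at most
    max_samples; otherwise max_samples random samples are drawn.
    """
    N = len(x)
    x_bar_pop = sum(x) / N  # population mean of the auxiliary variable (assumed known)
    if math.comb(N, n) <= max_samples:
        samples = itertools.combinations(range(N), n)
    else:
        rng = random.Random(seed)
        samples = (rng.sample(range(N), n) for _ in range(max_samples))
    estimates = []
    for s in samples:
        r_hat = sum(y[i] for i in s) / sum(x[i] for i in s)
        estimates.append(x_bar_pop * r_hat)
    return estimates


def benchmark_mse(x, y, n, **kwargs):
    """Mean squared deviation of the ratio estimates from the true mean of y."""
    y_bar_pop = sum(y) / len(y)
    estimates = ratio_estimates(x, y, n, **kwargs)
    return sum((e - y_bar_pop) ** 2 for e in estimates) / len(estimates)
```

With full enumeration the result is the exact mean square error under simple random sampling without replacement; with random draws it is only a Monte Carlo approximation, which is why the text speaks of "an accurate estimate of" the true value.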
For each sample, we calculated the standard variance estimator for the ratio estimator, based on (1.2) with the unknown population quantity replaced by its sample estimate. This estimator is also the standard estimator of the mean square error of the ratio estimator, up to an approximation error. Furthermore, we calculated the two new estimators for the mean square error of the ratio estimator from (2.11) and (2.14). It is expected that these estimators are more accurate than the standard estimator, as their error is of a smaller order.
Table 3.1
Key features of the eleven populations used in the simulation study
| Population | Source | $N$ | $\bar{Y}$ | $\bar{X}$ | $R$ | | $CV_x$ | $\rho$ | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Cochran, page 152 | 49 | 128 | 103 | 1.24 | 621 | 1.01 | 0.98 | -0.34 | 0.02 |
| 2 | Cochran, page 182 | 34 | 2.91 | 8.37 | 0.35 | 5.72 | 1.03 | 0.72 | -0.24 | 0.56 |
| 3 | Cochran, page 182 | 34 | 2.59 | 4.92 | 0.53 | 4.81 | 1.02 | 0.73 | -0.14 | 0.38 |
| 4 | Cochran, page 203 | 10 | 54.3 | 56.9 | 0.95 | 6.71 | 0.17 | 0.97 | 0.38 | -0.01 |
| 5 | Cochran, page 325 | 10 | 101 | 58.8 | 1.72 | 150 | 0.14 | 0.65 | -0.29 | -0.29 |
| 6 | Sukhatme, pages 183-184 | 34 | 201 | 218 | 0.92 | 3,304 | 0.77 | 0.93 | -0.23 | 0.93 |
| 7 | Sukhatme, pages 183-184 | 34 | 218 | 765 | 0.29 | 8,735 | 0.62 | 0.83 | 0.05 | 0.44 |
| 8 | Kish, page 42 | 20 | 12.8 | 21.8 | 0.59 | 17.8 | 1.19 | 0.97 | 0.23 | 0.75 |
| 9 | Koop, population 1 | 20 | 4.40 | 6.30 | 0.70 | 0.41 | 0.67 | 0.98 | -0.06 | 0.50 |
| 10 | Koop, population 2 | 20 | 4.50 | 51.2 | 0.09 | 4.87 | 0.44 | 0.42 | -0.50 | -0.85 |
| 11 | Koop, population 3 | 20 | 15.6 | 30.0 | 0.52 | 36.3 | 0.40 | 0.32 | -0.88 | 0.11 |
To compare the accuracy of these three estimators, we evaluated their relative bias with respect to the benchmark value for the true mean square error of the ratio estimator. The mean square error consists of the variance and the squared bias of the ratio estimator. For all populations in this study we found that, in spite of the small sample sizes, the bias of the ratio estimator as an estimator for the population parameter was more or less negligible. In fact, the largest relative bias of the ratio estimator in each population varied between -4% and +4%. In other words, in this study the true and estimated mean square errors were dominated by their variance components.
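The decomposition of the mean square error and the relative bias of a mean square error estimator can be illustrated with a small sketch. It assumes that the ratio estimator targets the population mean of $y$ and uses the textbook form of the standard variance estimator, $(1-f)/n$ times the sample variance of the residuals $y_i - \hat{R}x_i$, as a stand-in for the estimator based on (1.2); the new estimators from (2.11) and (2.14) are not implemented here. The function name and the dictionary keys are hypothetical.

```python
import itertools


def standard_estimator_summary(x, y, n):
    """Enumerate all SRSWOR samples of size n and summarise the ratio estimator.

    Returns the variance, bias and mean square error of the ratio estimator,
    and the relative bias of the (assumed) standard variance estimator with
    respect to the true mean square error.
    """
    N = len(x)
    x_bar_pop = sum(x) / N
    y_bar_pop = sum(y) / N
    f = n / N  # sampling fraction
    estimates, v_std = [], []
    for s in itertools.combinations(range(N), n):
        xs = [x[i] for i in s]
        ys = [y[i] for i in s]
        r_hat = sum(ys) / sum(xs)
        estimates.append(x_bar_pop * r_hat)
        # Sample variance of the residuals y_i - r_hat * x_i (divisor n - 1).
        s2_e = sum((yi - r_hat * xi) ** 2 for xi, yi in zip(xs, ys)) / (n - 1)
        v_std.append((1 - f) / n * s2_e)
    m = len(estimates)
    mean_est = sum(estimates) / m
    variance = sum((e - mean_est) ** 2 for e in estimates) / m
    bias = mean_est - y_bar_pop
    mse = sum((e - y_bar_pop) ** 2 for e in estimates) / m
    relative_bias = (sum(v_std) / m - mse) / mse
    return {"variance": variance, "bias": bias, "mse": mse,
            "relative_bias": relative_bias}
```

Because every sample is enumerated, the returned mean square error equals the variance plus the squared bias up to rounding, and a negative `relative_bias` means that the variance estimator underestimates the true mean square error on average.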
Table 3.2 gives the results. Firstly, it is seen that the standard estimator usually underestimates the true mean square error. The negative bias of this estimator can be very large (up to more than -60% for population 8). Secondly, it is striking that for the three populations in Koop’s paper (populations 9-11), the new estimator from (2.14) always estimates the true mean square error of the ratio estimator with a relative bias of less than 5% in absolute value. For the other populations, its relative bias is always less than 7% in absolute value, except for populations 1, 6 and 8 at the two smallest sample sizes. For the three largest sample sizes, the new estimator from (2.14) is always more accurate than the standard estimator, and in fact this is also true for most cases with smaller samples.
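The new estimator from (2.14) also performs better than the new estimator from (2.11) in most cases, which shows that correcting for the bias is useful. Furthermore, it can be seen from Table 3.2 that, in general, the new estimator from (2.14) suffers much less from a negative bias than the standard estimator, while the new estimator from (2.11) suffers from a positive bias.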
Table 3.2
Relative bias for the three estimators of the mean square error of the ratio estimator (one column per sample size)

| Population | Estimator | | | | | | |
|---|---|---|---|---|---|---|---|
| 1 | standard | -48.2% | -35.6% | -27.1% | -21.6% | -17.2% | -14.2% |
| | new, from (2.11) | 27.4% | 15.8% | 10.9% | 7.7% | 6.3% | 5.1% |
| | new, from (2.14) | -30.9% | -11.7% | -5.6% | -3.5% | -2.1% | -1.4% |
| 2 | standard | -34.9% | -27.7% | -22.3% | -18.7% | -16.1% | -13.6% |
| | new, from (2.11) | 32.6% | 10.1% | 3.3% | 0.5% | -0.9% | -0.9% |
| | new, from (2.14) | 2.8% | 3.4% | 1.7% | 0.4% | -0.5% | -0.5% |
| 3 | standard | -37.2% | -28.4% | -22.4% | -17.9% | -14.4% | -11.6% |
| | new, from (2.11) | 26.1% | 7.7% | 2.6% | 1.0% | 0.6% | 0.7% |
| | new, from (2.14) | -2.8% | -0.6% | -1.3% | -1.3% | -1.1% | -0.6% |
| 4 | standard | -1.0% | -0.4% | -0.1% | | | |
| | new, from (2.11) | 1.4% | 0.5% | 0.2% | | | |
| | new, from (2.14) | 0.7% | 0.3% | 0.1% | | | |
| 5 | standard | 0.4% | 0.7% | 0.8% | | | |
| | new, from (2.11) | 2.0% | 1.0% | 0.5% | | | |
| | new, from (2.14) | 0.8% | 0.4% | 0.2% | | | |
| 6 | standard | -19.2% | -17.3% | -15.8% | -14.7% | -14.1% | -13.5% |
| | new, from (2.11) | 21.1% | 0.8% | -5.4% | -7.4% | -7.9% | -7.8% |
| | new, from (2.14) | 20.6% | 10.2% | 4.9% | 2.3% | 0.7% | -0.3% |
| 7 | standard | -17.8% | -12.0% | -8.7% | -6.7% | -5.3% | -4.3% |
| | new, from (2.11) | 4.9% | 0.3% | -0.1% | 0.0% | 0.0% | 0.0% |
| | new, from (2.14) | 0.0% | -0.6% | -0.5% | -0.3% | -0.3% | -0.2% |
| 8 | standard | -62.3% | -45.8% | -34.9% | -28.0% | -23.4% | -20.3% |
| | new, from (2.11) | -11.1% | -8.2% | -6.5% | -5.7% | -5.3% | -4.8% |
| | new, from (2.14) | -34.4% | -13.3% | -6.4% | -4.0% | -3.3% | -3.2% |
| 9 | standard | -20.1% | -13.2% | -9.7% | -7.6% | -6.2% | -5.2% |
| | new, from (2.11) | 7.4% | 1.0% | -0.5% | -0.8% | -0.8% | -0.7% |
| | new, from (2.14) | 0.4% | 0.1% | -0.2% | -0.3% | -0.4% | -0.4% |
| 10 | standard | -8.9% | -2.0% | 0.9% | 2.5% | 3.5% | 4.2% |
| | new, from (2.11) | 21.1% | 15.4% | 10.9% | 7.7% | 5.4% | 3.7% |
| | new, from (2.14) | 0.9% | 2.1% | 2.0% | 1.7% | 1.4% | 1.1% |
| 11 | standard | -17.5% | -10.1% | -6.5% | -4.4% | -3.0% | -2.1% |
| | new, from (2.11) | 3.4% | 3.0% | 2.3% | 1.7% | 1.2% | 0.8% |
| | new, from (2.14) | -4.3% | -1.2% | -0.3% | 0.0% | 0.0% | 0.1% |
| mean | standard | -24.2% | -17.4% | -13.3% | -13.0% | -10.7% | -8.9% |
| | new, from (2.11) | 12.4% | 4.3% | 1.7% | 0.5% | -0.1% | -0.4% |
| | new, from (2.14) | -4.2% | -1.0% | -0.5% | -0.6% | -0.6% | -0.6% |
3.2 Discussion of two specific results
Referring back to Table 3.1, it may be noted that both populations 1 and 8, where the largest negative relative biases in Table 3.2 occur, involve a strong correlation between $x$ and $y$ (0.98 and 0.97, respectively) in combination with a relatively large coefficient of variation of $x$ in comparison to the other populations in our study (1.01 and 1.19, respectively). It is therefore interesting to examine the effect of these quantities on the accuracy of the estimated mean square error more closely.
Firstly, suppose that a transformation is applied to the values of $x$ and $y$ in a given population under which the ratio of the two variables does not change but their correlation coefficient in general does. Using expressions (1.2), (2.8), (2.11) and (2.14), it is not difficult to see that the three mean square error estimators are all rescaled in the same way for every sample. Moreover, it can be seen from (2.1) that the error in the ratio estimator behaves linearly under the transformation, and hence the true mean square error is rescaled in exactly the same way. Thus, it is seen that this transformation has no effect on the relative bias of any of the mean square error estimators in this study. This suggests that this bias is not affected by a change in the correlation between $x$ and $y$ when other features of the population remain constant. In particular, this suggests that the large values of the correlation in populations 1 and 8 alone do not explain the lack of accuracy of the proposed estimator in these populations.
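Secondly, consider an alternative transformation of the values of $x$ and $y$ that depends on a single parameter. In this case, it can be shown that the ratio and the correlation of $x$ and $y$ remain the same, while the coefficient of variation of $x$ changes. Thus, this transformation can be used to reduce the coefficient of variation of $x$ in a given population, while holding the ratio and correlation of $x$ and $y$ fixed.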
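As a minimal sketch, one transformation with these properties (assumed here for illustration, and not necessarily the exact form used in the study) shrinks every value towards its population mean by a factor `lam` between 0 and 1. Both means are unchanged, so the ratio of the means is unchanged; both standard deviations are multiplied by `lam`, so the correlation is unchanged and the coefficient of variation of $x$ is multiplied by `lam`. This is consistent with Table 3.3, where the coefficient of variation is proportional to the value in the first column. All function names below are hypothetical.

```python
import math


def shrink_towards_means(x, y, lam):
    """Shrink each value towards its population mean by the factor lam."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    x_new = [x_bar + lam * (xi - x_bar) for xi in x]
    y_new = [y_bar + lam * (yi - y_bar) for yi in y]
    return x_new, y_new


def coefficient_of_variation(v):
    """Standard deviation (divisor N - 1) divided by the mean."""
    mean = sum(v) / len(v)
    var = sum((vi - mean) ** 2 for vi in v) / (len(v) - 1)
    return math.sqrt(var) / mean


def correlation(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, calling `shrink_towards_means(x, y, 0.5)` halves the coefficient of variation of `x` while `sum(y) / sum(x)` and `correlation(x, y)` stay the same up to rounding.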
We have applied this transformation to populations 1 and 8 for the smallest sample size considered, with the transformation parameter (the first column of Table 3.3) running from 1.0 down to 0.2 in steps of 0.1. Table 3.3 shows the resulting relative bias of the three mean square error estimators for the transformed populations, obtained by simulating all possible samples from each population. It is seen that all three estimators for the mean square error tend to become less biased as the coefficient of variation of $x$ is reduced. In particular, the new estimator from (2.14) becomes reasonably accurate (considering the small sample size) once the coefficient of variation of $x$ drops below 0.8 for population 1 and below 1 for population 8.
This suggests that the coefficient of variation of $x$, which is known in practice, is an important factor for the (negative) bias of our proposed estimator. Assuming that the set of natural populations in this simulation study contains sufficient variation to represent most populations that will be encountered in practice, we may tentatively conclude that even for very small samples the proposed estimator is an accurate estimator of the mean square error of the ratio estimator, without a large negative bias, when the coefficient of variation of $x$ is sufficiently small. For larger values of this coefficient, this need not be the case.
Table 3.3
Relative bias of the three mean square error estimators for transformed versions of populations 1 and 8

| Transformation parameter | Population 1: coefficient of variation of $x$ | Population 1: standard | Population 1: new, from (2.11) | Population 1: new, from (2.14) | Population 8: coefficient of variation of $x$ | Population 8: standard | Population 8: new, from (2.11) | Population 8: new, from (2.14) |
|---|---|---|---|---|---|---|---|---|
| 1.0 | 1.01 | -48.2% | 27.4% | -30.9% | 1.19 | -62.3% | -11.1% | -34.4% |
| 0.9 | 0.91 | -39.1% | 32.0% | -16.5% | 1.07 | -48.4% | 7.6% | -12.9% |
| 0.8 | 0.81 | -31.0% | 31.8% | -6.2% | 0.95 | -38.3% | 14.2% | -0.7% |
| 0.7 | 0.71 | -24.0% | 28.5% | 0.3% | 0.83 | -30.0% | 15.0% | 5.9% |
| 0.6 | 0.61 | -17.8% | 23.4% | 3.6% | 0.72 | -23.1% | 12.5% | 8.4% |
| 0.5 | 0.51 | -12.5% | 17.6% | 4.6% | 0.60 | -17.2% | 8.6% | 8.0% |
| 0.4 | 0.40 | -8.2% | 11.9% | 4.1% | 0.48 | -12.3% | 4.2% | 6.0% |
| 0.3 | 0.30 | -4.7% | 6.8% | 2.8% | 0.36 | -8.1% | 0.4% | 3.5% |
| 0.2 | 0.20 | -2.1% | 3.0% | 1.4% | 0.24 | -4.7% | -1.9% | 1.2% |
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, 2019
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa