A method to find an efficient and robust sampling strategy under model uncertainty
Section 6. Numerical examples
In Sections 2 and 3 we established that the strategy πps–dif is optimal under a superpopulation model, but that it is not robust to misspecifications of this model. In Subsection 6.1 we present a small Monte Carlo simulation study that illustrates these results by comparing the optimal strategy with three alternatives.
In Sections 4 and 5 we introduced a measure that quantifies the risk of implementing a sampling design and can therefore guide the choice of design. In Subsection 6.2 we illustrate the use of the risk measure with real survey data.
6.1 Simulation study under a misspecified model
We compare the efficiency and robustness of four strategies through a simulation study. The strategies to be compared are πps together with the difference estimator (which is optimal when the model is correct), πps together with the GREG estimator (optimal design), stratified simple random sampling (STSI) together with the difference estimator (optimal estimator) and STSI together with the GREG estimator.
Our implementation of πps makes use of Pareto πps (Rosén, 1997). There is a host of other schemes for drawing πps samples. Nevertheless, Pareto πps is a convenient method with good properties; see, for example, Rosén (2000).
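The Pareto scheme mentioned above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' implementation: the function name `pareto_pips_sample` and the example sizes are hypothetical, and the sketch assumes that every target inclusion probability lies strictly between 0 and 1. Each unit receives a ranking variable equal to the odds of a uniform random number divided by the odds of its target inclusion probability, and the units with the smallest ranking variables form the sample. The resulting inclusion probabilities match the targets only approximately.

```python
import random

def pareto_pips_sample(sizes, n, rng=None):
    """Draw a fixed-size sample of n unit indices by Pareto sampling.

    Target inclusion probabilities are proportional to `sizes`.
    Assumption of this sketch: all targets lie strictly in (0, 1).
    """
    rng = rng or random.Random()
    total = sum(sizes)
    targets = [n * s / total for s in sizes]
    if any(p <= 0.0 or p >= 1.0 for p in targets):
        raise ValueError("target inclusion probabilities must lie in (0, 1)")
    ranking = []
    for k, p in enumerate(targets):
        u = rng.random()  # uniform on [0, 1)
        # Ranking variable: odds of u divided by odds of the target probability.
        q = (u / (1.0 - u)) / (p / (1.0 - p))
        ranking.append((q, k))
    ranking.sort()
    # Keep the n units with the smallest ranking variables.
    return sorted(k for _, k in ranking[:n])

# Illustrative sizes only (hypothetical data).
example_sizes = [1.0 + i for i in range(50)]
example_sample = pareto_pips_sample(example_sizes, 10, random.Random(42))
```

The sample size is fixed by construction, which is one reason the scheme is convenient in practice.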
Our implementation of STSI makes use of model-based stratification (Wright, 1983). We consider 5 strata with boundaries defined using the cum √f rule of Dalenius and Hodges (1959), which is well described in Särndal et al. (1992, page 463), and the sample is allocated using Neyman allocation. Using the cum √f rule may be suboptimal (see Särndal et al., 1992, page 464), but the efficiency of stratification by a continuous size variable is fairly insensitive to the exact choice of boundaries.
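The two steps described above, boundaries by the cum √f rule followed by Neyman allocation, can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the number of histogram bins and the example data are assumptions, and the allocation is taken proportional to the stratum size times the stratum standard deviation of the stratification variable. The allocation is returned unrounded.

```python
import math
import random
import statistics

def cum_sqrt_f_boundaries(values, n_strata, n_bins=100):
    """Stratum boundaries from the cum sqrt(f) rule on an equal-width histogram.

    Assumes the values are not all equal.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    # Cumulate the square roots of the bin frequencies.
    cum, running = [], 0.0
    for c in counts:
        running += math.sqrt(c)
        cum.append(running)
    step = running / n_strata
    boundaries, j = [], 0
    for h in range(1, n_strata):
        # First bin whose cumulated sqrt(f) reaches h equal steps.
        while cum[j] < h * step:
            j += 1
        boundaries.append(lo + (j + 1) * width)  # upper edge of that bin
    return boundaries

def neyman_allocation(values, boundaries, n):
    """Unrounded Neyman allocation: n_h proportional to N_h * S_h."""
    strata = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        strata[sum(v > b for b in boundaries)].append(v)
    weights = [len(s) * statistics.stdev(s) if len(s) > 1 else 0.0 for s in strata]
    total = sum(weights)
    return [n * w / total for w in weights]

# Illustrative data only: a skewed, hypothetical size variable.
rng = random.Random(1)
x = [rng.gammavariate(2.0, 50.0) + 1.0 for _ in range(2000)]
bounds = cum_sqrt_f_boundaries(x, n_strata=5)
allocation = neyman_allocation(x, bounds, n=200)
```

With very coarse histograms two boundaries can coincide and leave a stratum empty, so a reasonably large number of bins is advisable.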
We consider only misspecification of the spread. The trend term has three parameters: the first is fixed at 1,000, the second at 1, and the third takes the values 0.75, 1 and 1.25. The parameter of the true spread takes the values 0.5, 0.75 and 1, and so does the parameter of the working spread. We use the difference estimator (2.1). For the GREG estimator, one parameter is held fixed whereas two coefficients are estimated.
The simulation is set out as follows. The population size is 5,000. The x-values are independent realizations from a gamma distribution with scale 1,200, plus one unit, whereas each y-value is a realization from a gamma distribution whose shape and scale depend on a parameter that was set in such a way that the correlation between x and y is 0.95. The design MSE of a sample of size 500 is then computed for each strategy. Holding the x-values fixed, the process is iterated 5,000 times.
Table 6.1 shows the results of the simulation study. The first three columns indicate the model parameters. The fourth column shows the (simulated) model expected MSE of the strategy πps–dif, whereas the last three columns show the (simulated) efficiency of the strategies πps–GREG, STSI–dif and STSI–GREG compared to πps–dif (as a percentage). Efficiency is defined as 100 times the ratio of the model expected MSE of πps–dif to the model expected MSE of the strategy in question, where the model expected MSEs are approximated by their simulated counterparts. A value of 100 thus indicates that the strategy is as efficient as πps–dif, and values smaller (larger) than 100 indicate that the strategy is less (more) efficient than πps–dif.
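As a small worked example of how the efficiency column can be read, the sketch below assumes that efficiency is 100 times the reference MSE divided by the MSE of the strategy, which is the reading consistent with values below 100 meaning "less efficient". The function names are hypothetical. Inverting the ratio gives the MSE implied for a strategy by a reported efficiency.

```python
def efficiency(mse_reference, mse_strategy):
    """Efficiency (in %) of a strategy relative to the reference strategy."""
    return 100.0 * mse_reference / mse_strategy

def implied_mse(mse_reference, efficiency_pct):
    """MSE of a strategy implied by its efficiency relative to the reference."""
    return 100.0 * mse_reference / efficiency_pct

# First row of Table 6.1: reference MSE 2.78e5 and an efficiency of 57.3%
# for STSI-dif imply a larger MSE for STSI-dif than for the reference.
mse_stsi_dif = implied_mse(2.78e5, 57.3)
```

Under this reading, an efficiency of 57.3 means the strategy's MSE is roughly 1.75 times that of the reference strategy.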
The upper part of Table 6.1 shows the case when the working model coincides with the true model. As expected, the strategy that couples πps with the difference estimator (πps–dif) was always more efficient than the remaining strategies. Nevertheless, the loss in efficiency due to estimating some parameters through the GREG estimator is negligible. On the other hand, there is a remarkable loss in efficiency due to the use of STSI instead of πps. Finally, it is noted from (2.6) that, as the anticipated MSE for all strategies does not depend on the trend but only on the spread, the efficiency remains constant for a given value of the spread parameter, independently of the value of the trend parameter.
Table 6.1
Efficiency of three strategies as a percentage of the model expected MSE of πps–dif
| | | | | MSE of πps–dif | πps–GREG (%) | STSI–dif (%) | STSI–GREG (%) |
|---|---|---|---|---|---|---|---|
| Correct model | 0.75 | 0.50 | 0.50 | 2.78 × 10^5 | 99.9 | 57.3 | 57.3 |
| | 0.75 | 0.75 | 0.75 | 4.82 × 10^4 | 99.6 | 77.9 | 77.9 |
| | 0.75 | 1.00 | 1.00 | 1.90 × 10^4 | 99.0 | 83.2 | 83.2 |
| | 1.00 | 0.50 | 0.50 | 7.64 × 10^6 | 99.9 | 57.3 | 57.3 |
| | 1.00 | 0.75 | 0.75 | 7.20 × 10^5 | 99.7 | 77.9 | 77.9 |
| | 1.00 | 1.00 | 1.00 | 2.14 × 10^5 | 99.1 | 83.1 | 83.1 |
| | 1.25 | 0.50 | 0.50 | 1.46 × 10^8 | 99.9 | 57.3 | 57.3 |
| | 1.25 | 0.75 | 0.75 | 7.85 × 10^6 | 99.7 | 77.9 | 78.0 |
| | 1.25 | 1.00 | 1.00 | 1.81 × 10^6 | 99.2 | 83.1 | 83.1 |
| Misspecified model | 0.75 | 0.50 | 0.75 | 3.98 × 10^5 | 99.9 | 98.9 | 98.9 |
| | 0.75 | 0.75 | 1.00 | 6.45 × 10^4 | 99.5 | 114.5 | 114.4 |
| | 0.75 | 1.00 | 0.50 | 4.73 × 10^4 | 100.1 | 133.9 | 134.0 |
| | 1.00 | 0.50 | 1.00 | 2.14 × 10^7 | 99.9 | 185.6 | 185.6 |
| | 1.00 | 0.75 | 0.50 | 1.03 × 10^6 | 100.1 | 93.1 | 93.2 |
| | 1.00 | 1.00 | 0.75 | 2.77 × 10^5 | 99.8 | 88.9 | 89.0 |
| | 1.25 | 0.50 | 0.75 | 2.09 × 10^8 | 99.9 | 98.9 | 98.9 |
| | 1.25 | 0.75 | 1.00 | 1.05 × 10^7 | 99.6 | 114.5 | 114.5 |
| | 1.25 | 1.00 | 0.50 | 4.50 × 10^6 | 100.3 | 134.0 | 134.2 |
The lower part of Table 6.1 shows some comparisons under a misspecified model, in particular, a misspecified spread. Even under this mild misspecification of the model, πps–dif is not necessarily the best strategy anymore, as the strategies using STSI were more efficient in several cases. However, it is not evident when STSI will be more efficient than πps or vice versa. The risk measure introduced in Section 4 can be used to guide the choice between designs. The results shown in this section agree with those shown by, for example, Holmberg and Swensson (2001).
6.2 Using the risk measure for choosing the design in a real survey
In this subsection we illustrate the implementation of the risk measure using data from a real survey. We want to estimate a parameter of the study variable y, the value of a property in 2017 in Colombian pesos (COP), over the set of residential properties in Bogotá, Colombia (of size 681,276). The built-up area of each property in square meters, x, is known for every property in the population. The auxiliary variable x has mean 184, standard deviation 110 and skewness 2.57. The desired sample size is 1,000.
We assume that a parametric model adequately describes the association between x and y, and we plan to use the GREG estimator. As this model has the form shown in Example 4, the model expected MSE can be approximated by expression (5.7).
We will use the risk (4.1) to assist the decision between πps and STSI. The distribution of the two uncertain model parameters is taken to be bivariate normal with no correlation between them. The integral is approximated using the package cubature (Narasimhan, Johnson, Hahn, Bouvier and Kiêu, 2019), developed for the statistical software environment R (R Core Team, 2020).
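To illustrate the kind of computation involved, the sketch below approximates the expectation of a function of two parameters under an uncorrelated bivariate normal distribution. It is not the method used in the paper, which relies on the R package cubature: here a plain midpoint rule on a truncated grid is swapped in, written in Python. The function name, the grid settings and the integrand `squared_distance` are hypothetical placeholders. The actual integrand of (4.1) is not reproduced here.

```python
import math

def expected_under_bivariate_normal(g, mean, sd, half_width=6.0, points=200):
    """Approximate E[g(a, b)] for independent a ~ N(mean[0], sd[0]^2) and
    b ~ N(mean[1], sd[1]^2) with a midpoint rule on a grid truncated at
    +/- half_width standard deviations."""
    step = 2.0 * half_width / points
    # Midpoints and quadrature weights on the standard normal scale.
    z = [-half_width + (i + 0.5) * step for i in range(points)]
    w = [math.exp(-0.5 * zi * zi) / math.sqrt(2.0 * math.pi) * step for zi in z]
    total = 0.0
    for zi, wi in zip(z, w):
        a = mean[0] + sd[0] * zi
        for zj, wj in zip(z, w):
            b = mean[1] + sd[1] * zj
            total += wi * wj * g(a, b)
    return total

# Hypothetical placeholder integrand: squared distance from the prior mean.
def squared_distance(a, b):
    return (a - 1.0) ** 2 + (b - 0.75) ** 2

check_mass = expected_under_bivariate_normal(lambda a, b: 1.0, (1.0, 0.75), (0.3, 0.3))
check_second_moment = expected_under_bivariate_normal(squared_distance, (1.0, 0.75), (0.3, 0.3))
```

The two checks integrate functions with known expectations (total mass 1, and the sum of the two variances), which is a quick way to validate the grid settings before plugging in a costlier integrand.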
We consider two cases with different degrees of confidence regarding the working model.
Case 1. In this case no information about the uncertain model parameters is available. Naive values of 1, 1 and 0.75 are considered for the model parameters. To reflect the uncertainty, the prior distribution should have a large variance; the variance was chosen in such a way that 99% of the mass lies in the circle of radius 1. Evaluation of (4.1) for the two designs suggests that a stratified design should be used. The design MSE of both strategies was then computed, and the strategy suggested by (4.1) was indeed the best choice.
Case 2. Using a sample from 2010, prior values of 1.9, 2 and 0.7 are proposed for the model parameters. As the uncertainty here is smaller than that in Case 1, we set a smaller variance, chosen in such a way that 99% of the mass lies in the circle of radius 0.75. Evaluation of (4.1) for the two designs again suggests that a stratified design should be used. The design MSE of both strategies was then computed. Note that the use of (4.1) prevented us from using πps, whose MSE is almost one thousand times bigger than the one under stratified sampling!
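The variance choices in Cases 1 and 2 can be illustrated with a short calculation. The sketch assumes, as the text suggests but does not spell out, that both components of the bivariate normal share a common variance and that the circle is centered at the prior mean. Under those assumptions the squared distance from the mean is exponentially distributed, so the probability of falling within radius r is 1 − exp(−r² / (2σ²)), which can be solved for σ². The function names are hypothetical.

```python
import math

def variance_for_mass_in_circle(radius, mass=0.99):
    """Common variance such that `mass` of an uncorrelated, equal-variance
    bivariate normal lies within `radius` of its mean."""
    # Solve 1 - exp(-r^2 / (2 sigma^2)) = mass for sigma^2.
    return radius ** 2 / (-2.0 * math.log(1.0 - mass))

def mass_in_circle(radius, variance):
    """Probability mass within `radius` of the mean for a given common variance."""
    return 1.0 - math.exp(-radius ** 2 / (2.0 * variance))

variance_case_1 = variance_for_mass_in_circle(1.0)   # radius used in Case 1
variance_case_2 = variance_for_mass_in_circle(0.75)  # radius used in Case 2
```

The smaller radius in Case 2 gives a smaller variance, in line with the higher confidence placed in the prior values there.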
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, 2021
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa