Coordination of spatially balanced samples Section 5. Empirical results

5.1  Overlap performance

Monte Carlo simulation was used to study the overlap performance of the proposed methods. A total of $m={10}^{4}$ runs were carried out for each of the four settings described below. In each run, samples were drawn using the proposed methods. The same permanent random numbers (PRNs) were employed for all methods. The Euclidean distance between units was used for all spatial sampling designs. In each run, for LPM with PRNs, an $N\times N$ matrix of PRNs was randomly generated; the diagonal elements of this matrix were used as PRNs for Poisson, SCPS and the transformed SCPS with PRNs. All sampling schemes were applied for positive and negative coordination, respectively, using in each run the same PRNs and the same matrix of distances. Samples ${s}_{1}$ and ${s}_{2}$ of the following types were drawn in each run:

• two Poisson samples selected respectively independently, positively coordinated with PRNs, and negatively coordinated with PRNs;
• two LP samples selected respectively independently, positively coordinated with PRNs, and negatively coordinated with PRNs;
• two SCP samples selected respectively independently, positively coordinated with PRNs, and negatively coordinated with PRNs;
• two transformed SCP samples selected respectively independently, positively coordinated with PRNs, and negatively coordinated with PRNs; the two strategies shown in Section 4.2 were employed using respectively $\alpha =$0.25, 0.50 and 0.75.
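To make the role of the PRNs concrete, the following minimal sketch shows Poisson sampling with PRNs for two occasions. It assumes the standard PRN technique (unit $i$ is selected when its PRN falls below its inclusion probability, and negative coordination uses the reflected PRN $1-u_i$ on the second occasion); the section itself does not restate the algorithm, so this is an assumption. The function names `poisson_prn_sample` and `overlap_bounds`, and the formulas used there for AUB and ALB (the usual upper and lower bounds on the expected overlap), are illustrative and not taken from the text.

```python
import numpy as np

def poisson_prn_sample(pi, u, negative=False):
    """Poisson sample driven by PRNs: select unit i when its PRN is below pi[i].

    With negative=True the reflected PRN 1 - u is used (assumed convention
    for negative coordination on the second occasion).
    """
    pi = np.asarray(pi, dtype=float)
    u = np.asarray(u, dtype=float)
    v = 1.0 - u if negative else u
    return np.flatnonzero(v < pi)

def overlap_bounds(pi1, pi2):
    """Assumed bounds on the expected overlap: sum of min(pi1, pi2) and
    sum of max(0, pi1 + pi2 - 1)."""
    pi1 = np.asarray(pi1, dtype=float)
    pi2 = np.asarray(pi2, dtype=float)
    aub = float(np.minimum(pi1, pi2).sum())
    alb = float(np.maximum(0.0, pi1 + pi2 - 1.0).sum())
    return aub, alb

# Illustrative inputs only (not one of the four settings of the study).
rng = np.random.default_rng(1)
N = 50
pi1 = rng.uniform(0.1, 0.9, size=N)
pi2 = rng.uniform(0.1, 0.9, size=N)
u = rng.uniform(size=N)  # one PRN per unit, reused on both occasions

s1 = poisson_prn_sample(pi1, u)
s2_pos = poisson_prn_sample(pi2, u)                 # positive coordination
s2_neg = poisson_prn_sample(pi2, u, negative=True)  # negative coordination

overlap_pos = len(np.intersect1d(s1, s2_pos))
overlap_neg = len(np.intersect1d(s1, s2_neg))
aub, alb = overlap_bounds(pi1, pi2)
```

Under this convention a unit is in both positively coordinated samples exactly when its PRN is below the smaller of its two inclusion probabilities, so averaging `overlap_pos` over many PRN draws approaches `aub`.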

Three measures were used to quantify the performance of the proposed methods, for positive and negative coordination, respectively:

• the Monte Carlo expected overlap

${E}_{\text{sim}}\left(c\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{1}{m}\text{\hspace{0.17em}}\sum _{l=1}^{m}\text{ }{c}_{l}^{1,\text{\hspace{0.17em}}2},$

where ${c}_{l}^{1,\,2}=|\,{s}_{1l}\cap {s}_{2l}\,|$ is the number of units common to ${s}_{1l}$ and ${s}_{2l},$ the samples drawn in the ${l}^{\text{th}}$ run;
• the Monte Carlo variance of the overlap

${V}_{\text{sim}}\left(c\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{1}{m-1}\text{\hspace{0.17em}}\sum _{l=1}^{m}\text{\hspace{0.17em}}{\left({c}_{l}^{1,\text{\hspace{0.17em}}2}\text{\hspace{0.17em}}-\text{\hspace{0.17em}}{E}_{\text{sim}}\left(c\right)\right)}^{2}\text{​};$

• the Monte Carlo coefficient of variation of the overlap

${\text{CV}}_{\text{sim}}\left(c\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{\sqrt{{V}_{\text{sim}}\left(c\right)}}{{E}_{\text{sim}}\left(c\right)}.$
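The three measures above can be computed from the simulated samples as in the following sketch. The function name `overlap_measures` and the representation of each sample as a collection of unit indices are assumptions made for illustration.

```python
import numpy as np

def overlap_measures(samples1, samples2):
    """Return (E_sim(c), V_sim(c), CV_sim(c)) for paired samples from m runs.

    samples1[l] and samples2[l] hold the unit indices of s_1 and s_2 in run l.
    """
    c = np.array(
        [len(np.intersect1d(a, b)) for a, b in zip(samples1, samples2)],
        dtype=float,
    )
    e_sim = c.mean()
    v_sim = c.var(ddof=1)  # divisor m - 1, as in the definition of V_sim(c)
    # The coefficient of variation is undefined when the mean overlap is zero.
    cv_sim = np.sqrt(v_sim) / e_sim if e_sim > 0 else float("nan")
    return e_sim, v_sim, cv_sim

# Tiny worked example with m = 3 runs: overlaps are 2, 1 and 0.
runs_s1 = [[0, 1, 2], [1, 2], [3]]
runs_s2 = [[1, 2], [2, 5], [4]]
e_sim, v_sim, cv_sim = overlap_measures(runs_s1, runs_s2)
```

In the worked example the overlaps are 2, 1 and 0, so the mean, variance and coefficient of variation are all equal to 1.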

The correlation between ${\pi }_{1}={\left({\pi }_{i1}\right)}_{i=1,\,\dots ,\,N}$ and ${\pi }_{2}={\left({\pi }_{i2}\right)}_{i=1,\,\dots ,\,N}$ is an important determinant of the degree of sample coordination. This correlation varies and takes extreme values in the following four settings used to study the performance of the proposed methods:

• the static MU284 population: from the MU284 data set (see Appendix B in Särndal et al., 1992), region 2 was selected. The population size is $N=48,$ and the expected sample sizes are ${n}_{1}=10$ and ${n}_{2}=6,$ respectively. The first-order inclusion probabilities ${\pi }_{i1}$ are computed using the variable P75 (population in 1975 in thousands), and ${\pi }_{i2}$ using the variable P85 (population in 1985 in thousands). The elements of the distance matrix were artificially generated using independent draws from the $N\left(0,1\right)$ distribution and taking their absolute values. The correlation coefficient between ${\pi }_{1}$ and ${\pi }_{2}$ is 0.99.
• the Baltimore data set on house sales prices and hedonics (see Dubin, 1992). The data set is available online from the GeoDa Center for Geospatial Analysis and Computation (2017). Information on $N=211$ houses is provided by 17 variables. The geographical coordinates of the houses are available. We use ${n}_{1}={n}_{2}=25.$ The first-order inclusion probabilities ${\pi }_{i1}$ are computed using the variable AGE (the house age) and ${\pi }_{i2}$ using AGE+5. The elements of the distance matrix are the Euclidean distances between the geographical coordinates, on the Maryland grid, of the houses included in this data set. The correlation coefficient between ${\pi }_{1}$ and ${\pi }_{2}$ is 1.
• the MU284 dynamic population: from the MU284 data set, regions 2 and 3 were used. A dynamic population was created using on the first occasion 50% of the units randomly selected from region 2 using simple random sampling without replacement (these units are the “persistents” and the remaining 50% of the units are “deaths”), and on the second occasion 50% of the units randomly selected from region 3 using simple random sampling without replacement (these units are the “births”). The elements of the distance matrix were artificially generated using independent draws from the $N\left(0,1\right)$ distribution and taking their absolute values. For a run, the correlation coefficient between ${\pi }_{1}$ and ${\pi }_{2}$ was 0.08.
• one artificial data set, with $N=100,\text{\hspace{0.17em}}{n}_{1}=10,\text{\hspace{0.17em}}{n}_{2}=25,$ ${\pi }_{1}$ and ${\pi }_{2}$ uncorrelated and randomly generated using independent draws from the $U\left(0,1\right)$ distribution and scaled to obtain the sum 10 and 25, respectively. The elements of the distance matrix were artificially generated using independent draws from the $N\left(0,1\right)$ distribution and taking their absolute values.
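The artificial setting can be generated along the following lines. This is a sketch under stated assumptions: the text does not say whether the artificial distance matrix was made symmetric or given a zero diagonal, so both are assumed here, and the names `scaled_probabilities` and `artificial_distances` are hypothetical.

```python
import numpy as np

def scaled_probabilities(raw, n):
    """Scale positive raw values so that they sum to the expected sample size n."""
    raw = np.asarray(raw, dtype=float)
    return n * raw / raw.sum()

def artificial_distances(N, rng):
    """Absolute values of independent N(0,1) draws arranged as a distance matrix.

    Assumption: only the upper triangle is drawn and then mirrored, which
    gives a symmetric matrix with a zero diagonal.
    """
    d = np.triu(np.abs(rng.standard_normal((N, N))), k=1)
    return d + d.T

rng = np.random.default_rng(2024)
N = 100
pi1 = scaled_probabilities(rng.uniform(size=N), 10)  # sums to n_1 = 10
pi2 = scaled_probabilities(rng.uniform(size=N), 25)  # sums to n_2 = 25
dist = artificial_distances(N, rng)

# pi1 and pi2 come from independent draws, so their correlation is near zero.
corr = np.corrcoef(pi1, pi2)[0, 1]
```

Because the two vectors of uniform draws are independent, `corr` fluctuates around zero from one generated population to another, which is the sense in which ${\pi }_{1}$ and ${\pi }_{2}$ are uncorrelated in this setting.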

A number of ${10}^{4}$ simulation runs was used to compute the Monte Carlo overlap measures using the nine methods in each setting. Tables 5.1, 5.2, 5.3, and 5.4 provide the results of the Monte Carlo studies based on the previous four settings. For TSCPS 1 and 2, the value of $\alpha$ is also specified in these tables.

Table 5.1
The static MU284 population, $N=48,$ expected sample sizes ${n}_{1}=10,\,{n}_{2}=6,$ ${\pi }_{i1}$ are computed using the variable P75 (population in 1975 in thousands), and ${\pi }_{i2}$ using the variable P85 (population in 1985 in thousands). The distance matrix was artificially generated. The values of AUB and ALB are 6 and 1.96, respectively.

| Method | $\alpha$ | Independent ${E}_{\text{sim}}(c)$ | Independent ${V}_{\text{sim}}(c)$ | Independent ${\text{CV}}_{\text{sim}}(c)$ | Positive ${E}_{\text{sim}}(c)$ | Positive ${V}_{\text{sim}}(c)$ | Positive ${\text{CV}}_{\text{sim}}(c)$ | Negative ${E}_{\text{sim}}(c)$ | Negative ${V}_{\text{sim}}(c)$ | Negative ${\text{CV}}_{\text{sim}}(c)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Poisson | | 3.04 | 1.89 | 0.45 | 6.03 | 4.06 | 0.33 | 1.96 | 1.13 | 0.54 |
| LPM | | 3.03 | 1.22 | 0.36 | 5.10 | 0.71 | 0.17 | 2.64 | 1.20 | 0.41 |
| SCPS | | 3.06 | 1.21 | 0.36 | 4.91 | 0.85 | 0.19 | 2.33 | 1.06 | 0.44 |
| TSCPS 1 | 0.25 | 3.06 | 1.28 | 0.37 | 5.84 | 0.93 | 0.17 | 2.09 | 1.13 | 0.51 |
| TSCPS 1 | 0.50 | 3.04 | 1.27 | 0.37 | 5.54 | 0.79 | 0.16 | 2.21 | 1.10 | 0.47 |
| TSCPS 1 | 0.75 | 3.06 | 1.25 | 0.37 | 5.20 | 0.80 | 0.17 | 2.27 | 1.06 | 0.45 |
| TSCPS 2 | 0.25 | 3.07 | 1.67 | 0.42 | 5.75 | 2.40 | 0.27 | 1.97 | 1.13 | 0.54 |
| TSCPS 2 | 0.50 | 3.06 | 1.45 | 0.39 | 5.40 | 1.57 | 0.23 | 2.05 | 1.10 | 0.51 |
| TSCPS 2 | 0.75 | 3.04 | 1.27 | 0.37 | 5.13 | 1.10 | 0.20 | 2.18 | 1.04 | 0.47 |

Table 5.2
Baltimore data, $N=211,$ expected sample sizes ${n}_{1}=25,\,{n}_{2}=25,$ ${\pi }_{i1}$ are computed using the variable AGE and ${\pi }_{i2}$ using AGE+5. The distance matrix uses real data. The values of AUB and ALB are 24.20 and 0.10, respectively.

| Method | $\alpha$ | Independent ${E}_{\text{sim}}(c)$ | Independent ${V}_{\text{sim}}(c)$ | Independent ${\text{CV}}_{\text{sim}}(c)$ | Positive ${E}_{\text{sim}}(c)$ | Positive ${V}_{\text{sim}}(c)$ | Positive ${\text{CV}}_{\text{sim}}(c)$ | Negative ${E}_{\text{sim}}(c)$ | Negative ${V}_{\text{sim}}(c)$ | Negative ${\text{CV}}_{\text{sim}}(c)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Poisson | | 4.08 | 3.93 | 0.49 | 24.20 | 20.63 | 0.19 | 0.10 | 0.09 | 3.00 |
| LPM | | 4.09 | 3.15 | 0.43 | 21.50 | 2.86 | 0.08 | 1.76 | 1.51 | 0.70 |
| SCPS | | 4.01 | 3.22 | 0.45 | 22.20 | 3.14 | 0.08 | 0.76 | 0.70 | 1.10 |
| TSCPS 1 | 0.25 | 4.05 | 3.02 | 0.43 | 23.10 | 2.60 | 0.07 | 0.26 | 0.26 | 1.96 |
| TSCPS 1 | 0.50 | 4.06 | 3.06 | 0.43 | 22.50 | 2.93 | 0.08 | 0.45 | 0.43 | 1.46 |
| TSCPS 1 | 0.75 | 4.05 | 3.22 | 0.44 | 22.30 | 3.10 | 0.08 | 0.57 | 0.55 | 1.30 |
| TSCPS 2 | 0.25 | 4.07 | 3.56 | 0.46 | 23.70 | 11.75 | 0.14 | 0.10 | 0.09 | 3.00 |
| TSCPS 2 | 0.50 | 4.07 | 3.37 | 0.45 | 23.20 | 6.35 | 0.11 | 0.29 | 0.27 | 1.79 |
| TSCPS 2 | 0.75 | 4.04 | 3.31 | 0.45 | 22.70 | 3.84 | 0.09 | 0.58 | 0.52 | 1.24 |

Table 5.3
The dynamic MU284 population: region 2 from the MU284 population, where 50% of the units are new in the second occasion (“births”), and 50% of the units change the stratum (“deaths”), $N=72,$ expected sample sizes ${n}_{1}=10,\,{n}_{2}=6.$ The distance matrix was artificially generated. The values of AUB and ALB are 3.56 and 1.33, respectively.

| Method | $\alpha$ | Independent ${E}_{\text{sim}}(c)$ | Independent ${V}_{\text{sim}}(c)$ | Independent ${\text{CV}}_{\text{sim}}(c)$ | Positive ${E}_{\text{sim}}(c)$ | Positive ${V}_{\text{sim}}(c)$ | Positive ${\text{CV}}_{\text{sim}}(c)$ | Negative ${E}_{\text{sim}}(c)$ | Negative ${V}_{\text{sim}}(c)$ | Negative ${\text{CV}}_{\text{sim}}(c)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Poisson | | 2.02 | 1.20 | 0.54 | 3.56 | 2.35 | 0.43 | 1.32 | 0.71 | 0.64 |
| LPM | | 2.03 | 0.95 | 0.48 | 2.37 | 1.00 | 0.42 | 1.87 | 0.89 | 0.50 |
| SCPS | | 2.02 | 1.02 | 0.50 | 3.01 | 1.19 | 0.36 | 1.54 | 0.79 | 0.58 |
| TSCPS 1 | 0.25 | 2.02 | 0.94 | 0.48 | 3.42 | 1.31 | 0.33 | 1.39 | 0.70 | 0.60 |
| TSCPS 1 | 0.50 | 2.03 | 1.02 | 0.50 | 3.27 | 1.33 | 0.35 | 1.42 | 0.79 | 0.63 |
| TSCPS 1 | 0.75 | 2.02 | 1.02 | 0.50 | 3.16 | 1.26 | 0.36 | 1.47 | 0.80 | 0.61 |
| TSCPS 2 | 0.25 | 2.02 | 1.04 | 0.50 | 3.36 | 1.67 | 0.38 | 1.33 | 0.64 | 0.60 |
| TSCPS 2 | 0.50 | 2.02 | 0.96 | 0.49 | 3.20 | 1.37 | 0.37 | 1.41 | 0.66 | 0.58 |
| TSCPS 2 | 0.75 | 2.02 | 0.94 | 0.48 | 3.10 | 1.24 | 0.36 | 1.50 | 0.71 | 0.56 |

Table 5.4
Artificial data, $N=100,$ expected sample sizes ${n}_{1}=10,\,{n}_{2}=25,$ ${\pi }_{i1}$ and ${\pi }_{i2}$ randomly generated, uncorrelated. The distance matrix was artificially generated. The values of AUB and ALB are 9.11 and 0, respectively. Empty cells indicate that no value is reported.

| Method | $\alpha$ | Independent ${E}_{\text{sim}}(c)$ | Independent ${V}_{\text{sim}}(c)$ | Independent ${\text{CV}}_{\text{sim}}(c)$ | Positive ${E}_{\text{sim}}(c)$ | Positive ${V}_{\text{sim}}(c)$ | Positive ${\text{CV}}_{\text{sim}}(c)$ | Negative ${E}_{\text{sim}}(c)$ | Negative ${V}_{\text{sim}}(c)$ | Negative ${\text{CV}}_{\text{sim}}(c)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Poisson | | 2.44 | 2.34 | 0.63 | 9.11 | 8.08 | 0.31 | $\sim 0$ | $\sim 0$ | |
| LPM | | 2.45 | 1.82 | 0.55 | 5.42 | 2.35 | 0.28 | 1.03 | 0.91 | 0.93 |
| SCPS | | 2.42 | 1.82 | 0.56 | 6.94 | 2.07 | 0.21 | 0.45 | 0.42 | 1.44 |
| TSCPS 1 | 0.25 | 2.44 | 1.76 | 0.54 | 8.53 | 2.05 | 0.17 | 0.06 | 0.07 | 4.41 |
| TSCPS 1 | 0.50 | 2.46 | 1.79 | 0.54 | 7.95 | 1.90 | 0.17 | 0.21 | 0.22 | 2.23 |
| TSCPS 1 | 0.75 | 2.43 | 1.80 | 0.55 | 7.40 | 1.97 | 0.19 | 0.31 | 0.31 | 1.80 |
| TSCPS 2 | 0.25 | 2.43 | 2.09 | 0.59 | 8.53 | 4.86 | 0.26 | $\sim 0$ | $\sim 0$ | |
| TSCPS 2 | 0.50 | 2.45 | 1.91 | 0.56 | 7.90 | 3.32 | 0.23 | 0.11 | 0.10 | 2.87 |
| TSCPS 2 | 0.75 | 2.44 | 1.83 | 0.55 | 7.34 | 2.51 | 0.22 | 0.28 | 0.26 | 1.82 |

Based on the results in Tables 5.1, 5.2, 5.3, and 5.4, SCPS generally performs better than LPM in terms of ${E}_{\text{sim}}\left(c\right),$ ${V}_{\text{sim}}\left(c\right)$ and ${\text{CV}}_{\text{sim}}\left(c\right)$ for both types of coordination; the exception is the static MU284 population with positive coordination. In this setting, the pairs used for the selection of ${s}_{1}$ are also used for the selection of ${s}_{2},$ since no births or deaths are assumed. Without such changes in the population, LPM may perform better than SCPS not only in terms of ${E}_{\text{sim}}\left(c\right),$ but also in terms of ${V}_{\text{sim}}\left(c\right)$ and ${\text{CV}}_{\text{sim}}\left(c\right).$

As expected, Poisson sampling achieves the AUB and ALB (minor differences are due to the sampling error) in all settings, but the overlap variance is very high in positive coordination. This is mainly due to the random sizes of ${s}_{1}$ and ${s}_{2}.$ The large values of ${V}_{\text{sim}}\left(c\right)$ impact the values of ${\text{CV}}_{\text{sim}}\left(c\right).$ In all the examples shown, the latter is in general larger than the values of ${\text{CV}}_{\text{sim}}\left(c\right)$ provided by the other sampling schemes.

The results in Tables 5.1, 5.2, 5.3, and 5.4 confirm that the value of $\alpha$ in the transformed SCPS determines the coordination degree; a smaller value of $\alpha$ provides a better coordination degree, since one gets closer to Poisson sampling (recall that $\alpha =0$ in the TSCPS designs leads to Poisson sampling).

For a given $\alpha ,$ the new strategies presented in Section 4.2 yield similar values of ${E}_{\text{sim}}\left(c\right)$ in positive coordination, but TSCPS 2 gives larger values of ${V}_{\text{sim}}\left(c\right)$ and ${\text{CV}}_{\text{sim}}\left(c\right).$ For all values of $\alpha$ used, TSCPS 1 and TSCPS 2 provide similar values of ${\text{CV}}_{\text{sim}}\left(c\right)$ in positive and negative coordination in our examples, except for TSCPS 2 with $\alpha =$0.25. The latter performs very close to Poisson sampling in negative coordination, as the results in Tables 5.1, 5.2, 5.3, and 5.4 show.

An interesting result for Poisson sampling arises from Tables 5.1, 5.2, 5.3, and 5.4 in terms of ${\text{CV}}_{\text{sim}}\left(c\right).$ While the values of ${V}_{\text{sim}}\left(c\right)$ are large for positive coordination compared to LPM and SCPS, this is not the case for negative coordination. However, in the latter case, if ${E}_{\text{sim}}\left(c\right)\sim {V}_{\text{sim}}\left(c\right)$ and both are small, as in Table 5.2, the corresponding value of ${\text{CV}}_{\text{sim}}\left(c\right)$ becomes very large. As mentioned above, this can also be the case for the TSCPS designs with small values of $\alpha .$ For these situations, the improvement brought by this new family of designs over Poisson sampling is measured in terms of the degree of spatial balance, as shown in the next section.

5.2  Spatial balance and variance of sample size

The transformed SCPS is compared to the other sampling designs in terms of degree of spatial balance using Monte Carlo simulation. The degree of spatial balance is measured using the $B$ measure shown in expression (3.3). For the transformed SCPS the two strategies presented in Section 4.2 are used, and the four previous settings are employed. The $B$ measure was computed on the same samples ${s}_{1}$ used to obtain the outcomes given in Tables 5.1, 5.2, 5.3, and 5.4, respectively. The following overall measure was used for each type of sample

${E}_{\text{sim}}\left(B\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{1}{m}\text{\hspace{0.17em}}\sum _{l=1}^{m}\text{ }{B}_{l},$

where ${B}_{l}$ represents the $B$ measure computed on the realised sample in the ${l}^{\text{th}}$ run. For comparison, the average of the $B$ measures computed over the Monte Carlo runs for Poisson sampling and LPM was also reported.
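Expression (3.3) is not reproduced in this section. The sketch below assumes that $B$ is the Voronoi-based balance measure commonly used in the spatially balanced sampling literature: each population unit is assigned to its nearest sampled unit, the inclusion probabilities are summed within each such cell, and $B$ is the mean squared deviation of these sums from 1. The function names are hypothetical, and ties in distance are simply broken by position rather than shared between sampled units.

```python
import numpy as np

def spatial_balance(sample, pi, dist):
    """Assumed Voronoi-based B measure for one sample (smaller is better)."""
    sample = np.asarray(sample, dtype=int)
    pi = np.asarray(pi, dtype=float)
    dist = np.asarray(dist, dtype=float)
    # Position within `sample` of the nearest sampled unit, for every unit.
    nearest = np.argmin(dist[:, sample], axis=1)
    # Total inclusion probability falling in each sampled unit's cell.
    v = np.bincount(nearest, weights=pi, minlength=len(sample))
    return float(np.mean((v - 1.0) ** 2))

def simulated_mean_balance(samples, pi, dist):
    """E_sim(B): average of the B measure over the simulated samples."""
    return float(np.mean([spatial_balance(s, pi, dist) for s in samples]))

# Four units on a line at 0, 1, 10 and 11, each with inclusion probability 0.5.
pos = np.array([0.0, 1.0, 10.0, 11.0])
dist = np.abs(pos[:, None] - pos[None, :])
pi = np.full(4, 0.5)

b_spread = spatial_balance([0, 2], pi, dist)   # one unit per cluster
b_clumped = spatial_balance([0, 1], pi, dist)  # both units in one cluster
e_sim_b = simulated_mean_balance([[0, 2], [0, 1]], pi, dist)
```

In the toy example the well-spread sample gives $B=0$ and the clumped sample gives $B=0.25$, which illustrates why smaller averages indicate better spatial balance.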

TSCPS is also compared with Poisson sampling in terms of variance of sample size computed over the Monte Carlo runs using:

${V}_{\text{sim}}\left(\text{size}\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{1}{m-1}\text{\hspace{0.17em}}\sum _{l=1}^{m}\text{\hspace{0.17em}}{\left(\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{s}_{l}\text{\hspace{0.17em}}|-|\text{\hspace{0.17em}}\overline{s}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}\right)}^{2}\text{​},$

where $|\text{\hspace{0.17em}}{s}_{l}\text{\hspace{0.17em}}|$ represents the sample size of a realised sample ${s}_{l}$ in the ${l}^{\text{th}}$ run and $|\text{\hspace{0.17em}}\overline{s}\text{\hspace{0.17em}}|=\frac{1}{m}{\sum }_{l=1}^{m}|\text{\hspace{0.17em}}{s}_{l}\text{\hspace{0.17em}}|.$
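As a quick check on this measure, the sketch below simulates Poisson sample sizes and compares the simulated variance with the known design variance of the Poisson sample size, $\sum_{i}{\pi }_{i}\left(1-{\pi }_{i}\right).$ The equal inclusion probabilities and the function name `sample_size_variance` are illustrative assumptions, not values from the study.

```python
import numpy as np

def sample_size_variance(sizes):
    """V_sim(size): variance of the realised sample sizes, with divisor m - 1."""
    return float(np.asarray(sizes, dtype=float).var(ddof=1))

rng = np.random.default_rng(7)
N, n, m = 100, 10, 10_000
pi = np.full(N, n / N)  # illustrative equal inclusion probabilities

# Poisson sampling: each unit is selected independently with probability pi[i].
sizes = (rng.uniform(size=(m, N)) < pi).sum(axis=1)

v_sim = sample_size_variance(sizes)
v_design = float((pi * (1.0 - pi)).sum())  # design variance of the sample size
```

Fixed-size designs such as LPM and SCPS always return samples of the same size, so the same function returns 0 for them.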

Tables 5.5, 5.6, 5.7 and 5.8 provide the results. From these results, we note that the choice of $\alpha$ determines the performance of the transformed SCPS in terms of the averaged $B$ measure: a larger value of $\alpha$ results in a better degree of spatial balance. However, in all settings, the resulting degree of spatial balance is worse than for LPM and SCPS, but better than for Poisson sampling, as expected, since the latter is not a spatially balanced sampling design.

For all four settings, the variance of sample size is much higher for Poisson sampling than for TSCPS 1 and TSCPS 2, for all values of $\alpha .$ While TSCPS 2 with $\alpha =$0.25 performs very close to Poisson sampling in the examples shown in Section 5.1 for negative coordination, we note however that the corresponding values of ${V}_{\text{sim}}\left(\text{size}\right)$ for the former method are much smaller than those provided by Poisson sampling.

As underlined in Section 4.2, TSCPS 1 shows smaller sample size variance than TSCPS 2 for the same $\alpha .$ The results in our settings confirm for both TSCPS 1 and TSCPS 2 that the variance of sample size decreases when $\alpha$ increases.

Table 5.5
The static MU284 population, $N=48,$ expected sample size 10, the inclusion probabilities are computed using the variable P75 (population in 1975 in thousands). The distance matrix was artificially generated.

| Design | $\alpha$ | ${E}_{\text{sim}}(B)$ | ${V}_{\text{sim}}(\text{size})$ |
|---|---|---|---|
| Poisson | | 0.301 | 4.806 |
| LPM | | 0.124 | 0 |
| SCPS | | 0.131 | 0 |
| TSCPS 1 | 0.25 | 0.209 | 0.727 |
| TSCPS 1 | 0.50 | 0.177 | 0.405 |
| TSCPS 1 | 0.75 | 0.146 | 0.187 |
| TSCPS 2 | 0.25 | 0.215 | 2.692 |
| TSCPS 2 | 0.50 | 0.159 | 1.211 |
| TSCPS 2 | 0.75 | 0.134 | 0.399 |

Table 5.6
Baltimore data, $N=211,$ expected sample size 25, the inclusion probabilities are computed using the variable AGE. The distance matrix uses real data.

| Design | $\alpha$ | ${E}_{\text{sim}}(B)$ | ${V}_{\text{sim}}(\text{size})$ |
|---|---|---|---|
| Poisson | | 0.416 | 21.107 |
| LPM | | 0.137 | 0 |
| SCPS | | 0.137 | 0 |
| TSCPS 1 | 0.25 | 0.256 | 0.909 |
| TSCPS 1 | 0.50 | 0.198 | 0.449 |
| TSCPS 1 | 0.75 | 0.162 | 0.222 |
| TSCPS 2 | 0.25 | 0.282 | 11.382 |
| TSCPS 2 | 0.50 | 0.195 | 4.811 |
| TSCPS 2 | 0.75 | 0.148 | 1.227 |

Table 5.7
The dynamic MU284 population, $N=48,$ expected sample size 10, the inclusion probabilities are computed using the variable P75 (population in 1975 in thousands). The distance matrix was artificially generated.

| Design | $\alpha$ | ${E}_{\text{sim}}(B)$ | ${V}_{\text{sim}}(\text{size})$ |
|---|---|---|---|
| Poisson | | 0.422 | 5.683 |
| LPM | | 0.202 | 0 |
| SCPS | | 0.210 | 0 |
| TSCPS 1 | 0.25 | 0.306 | 0.798 |
| TSCPS 1 | 0.50 | 0.255 | 0.427 |
| TSCPS 1 | 0.75 | 0.224 | 0.231 |
| TSCPS 2 | 0.25 | 0.315 | 3.128 |
| TSCPS 2 | 0.50 | 0.252 | 1.370 |
| TSCPS 2 | 0.75 | 0.213 | 0.446 |

Table 5.8
Artificial data, $N=100,$ expected sample size 10, the inclusion probabilities are randomly generated. The distance matrix was artificially generated.

| Design | $\alpha$ | ${E}_{\text{sim}}(B)$ | ${V}_{\text{sim}}(\text{size})$ |
|---|---|---|---|
| Poisson | | 0.485 | 8.892 |
| LPM | | 0.134 | 0 |
| SCPS | | 0.133 | 0 |
| TSCPS 1 | 0.25 | 0.286 | 0.938 |
| TSCPS 1 | 0.50 | 0.213 | 0.446 |
| TSCPS 1 | 0.75 | 0.167 | 0.230 |
| TSCPS 2 | 0.25 | 0.313 | 4.854 |
| TSCPS 2 | 0.50 | 0.204 | 2.121 |
| TSCPS 2 | 0.75 | 0.149 | 0.632 |

5.3  Variance estimation

In repeated surveys, estimates of net variation, period averages and gross change are of interest. Our proposed methods are suitable for estimating such parameters. Variance estimation for these estimates is, however, intractable under our methods and is not addressed here. We study only empirically the impact that each coordinated spatially balanced method has on the quality of the estimates of two of the above parameters. Note that approximate variance estimators for state estimators exist for LPM and SCPS (Grafström and Schelin, 2014), but further research is needed to derive an approximate estimator of the covariance between successive state estimators under coordination.

Consider a repeated survey over two time occasions. Let $y$ be the variable of interest, measured on the first and second time occasions. We denote by ${y}_{it}$ the value of this variable taken by the unit $i\in U$ on the time occasion $t,$ with $t\in \left\{1,\,2\right\}.$ Let ${x}_{it}$ be the value of an auxiliary variable taken by the unit $i\in U$ at occasion $t;$ the variable $x$ is well correlated with $y,$ and available for all units $i\in U$ on both time occasions. It is assumed that ${x}_{it}$ is known for all $i\in U$ from a previous census or that two-phase sampling is used: in the first phase the value of ${x}_{it}$ is obtained, while the coordination process is addressed in the second phase of the sampling. The notations ${E}_{M}\left(.\right)$ and ${\text{var}}_{M}\left(.\right)$ denote the expectation and variance under a model. We borrow from Grafström and Tillé (2013) the following cross-sectional superpopulation model with spatial correlation

${y}_{i,\text{\hspace{0.17em}}t-1}\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\beta }_{0}+{x}_{i,\text{\hspace{0.17em}}t-1}{\beta }_{1}+{\epsilon }_{i,\text{\hspace{0.17em}}t-1},\text{ }\text{ }\text{ }\text{ }\left(5.1\right)$

where ${\beta }_{0}$ and ${\beta }_{1}$ are parameters and ${\epsilon }_{i,\,t-1}$ are random variables with ${E}_{M}\left({\epsilon }_{i,\,t-1}\right)=0,$ ${\text{var}}_{M}\left({\epsilon }_{i,\,t-1}\right)={\sigma }_{i}^{2},$ and ${\text{cov}}_{M}\left({\epsilon }_{i},\,{\epsilon }_{j}\right)={\sigma }_{i}{\sigma }_{j}{\rho }^{d\left(i,\,j\right)},$ where $d\left(i,\,j\right)$ represents the distance between the units $i$ and $j,$ for $i,\,j\in U.$ The particular form of ${\text{cov}}_{M}\left({\epsilon }_{i},\,{\epsilon }_{j}\right)$ in model (5.1) is a decreasing function of the distance between $i$ and $j$ (for $0<\rho <1$), reflecting that closer units are more strongly spatially correlated. The following autoregressive model is considered

${y}_{it}\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\delta }_{0}+{\delta }_{1}{y}_{i,\text{\hspace{0.17em}}t-1}+{\gamma }_{it}\text{​},\text{ }\text{ }\text{ }\text{ }\left(5.2\right)$

with ${\delta }_{0}$ and ${\delta }_{1}$ being parameters, and with ${\gamma }_{it}$ being independent random variables, with ${E}_{M}\left({\gamma }_{it}\right)=0,\text{\hspace{0.17em}}{\text{var}}_{M}\left({\gamma }_{it}\right)={u}^{2}\text{​}.$ The following model is also assumed

${x}_{it}\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\alpha }_{0}+{\alpha }_{1}{x}_{i,\text{\hspace{0.17em}}t-1}+{\stackrel{˜}{\gamma }}_{it}\text{​},\text{ }\text{ }\text{ }\text{ }\left(5.3\right)$

where ${\alpha }_{0}$ and ${\alpha }_{1}$ are parameters and ${\stackrel{˜}{\gamma }}_{it}$ are independent random variables with ${E}_{M}\left({\stackrel{˜}{\gamma }}_{it}\right)=0$ and ${\text{var}}_{M}\left({\stackrel{˜}{\gamma }}_{it}\right)={\stackrel{˜}{u}}^{2}.$ We thus obtain a spatial-temporal dependence of the data through models (5.1), (5.2) and (5.3).

We consider that ${\pi }_{it}$ are constructed using the expression

${\pi }_{it}\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{{n}_{t}{x}_{it}}{{\sum }_{j\in U}{x}_{jt}},\text{\hspace{0.17em}}\text{\hspace{0.17em}}t\in \left\{1,\text{\hspace{0.17em}}2\right\},$

which leads to a correlation between ${\pi }_{t-1}$ and ${\pi }_{t}$ due to model (5.3).

The following parameters of interest are considered: the one period change $D={\sum }_{i\in {U}_{1}}{y}_{i1}-{\sum }_{i\in {U}_{2}}{y}_{i2}$ and the average over two periods $A=\frac{1}{2}\left({\sum }_{i\in {U}_{1}}{y}_{i1}+{\sum }_{i\in {U}_{2}}{y}_{i2}\right).$ The two parameters are estimated by

$\stackrel{^}{D}\,=\,\sum _{i\in {s}_{1}}\frac{{y}_{i1}}{{\pi }_{i1}}-\sum _{i\in {s}_{2}}\frac{{y}_{i2}}{{\pi }_{i2}},$

and

$\stackrel{^}{A}\,=\,\frac{1}{2}\left(\sum _{i\in {s}_{1}}\frac{{y}_{i1}}{{\pi }_{i1}}+\sum _{i\in {s}_{2}}\frac{{y}_{i2}}{{\pi }_{i2}}\right),$

respectively. We have

$\text{var}\left(\stackrel{^}{D}\right)\,=\,\text{var}\left(\sum _{i\in {s}_{1}}\frac{{y}_{i1}}{{\pi }_{i1}}\right)+\text{var}\left(\sum _{i\in {s}_{2}}\frac{{y}_{i2}}{{\pi }_{i2}}\right)-2\,\text{cov}\left(\sum _{i\in {s}_{1}}\frac{{y}_{i1}}{{\pi }_{i1}},\,\sum _{i\in {s}_{2}}\frac{{y}_{i2}}{{\pi }_{i2}}\right),\qquad \left(5.4\right)$

$\text{var}\left(\stackrel{^}{A}\right)\,=\,\frac{1}{4}\text{var}\left(\sum _{i\in {s}_{1}}\frac{{y}_{i1}}{{\pi }_{i1}}\right)+\frac{1}{4}\text{var}\left(\sum _{i\in {s}_{2}}\frac{{y}_{i2}}{{\pi }_{i2}}\right)+\frac{1}{2}\text{cov}\left(\sum _{i\in {s}_{1}}\frac{{y}_{i1}}{{\pi }_{i1}},\,\sum _{i\in {s}_{2}}\frac{{y}_{i2}}{{\pi }_{i2}}\right),\qquad \left(5.5\right)$

where $\text{var}\left(.\right)$ and $\text{cov}\left(.,.\right)$ represent the variance and the covariance operators, respectively.
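The decompositions (5.4) and (5.5) can be checked numerically on simulated pairs of estimated totals. In the sketch below the two series of totals are artificial (a shared component induces a positive covariance, mimicking positive coordination); they are not output of any of the sampling designs, and the function name is hypothetical.

```python
import numpy as np

def var_change_and_average(t1, t2):
    """Variances of D = t1 - t2 and A = (t1 + t2) / 2 from expressions (5.4) and (5.5)."""
    v1 = np.var(t1, ddof=1)
    v2 = np.var(t2, ddof=1)
    c = np.cov(t1, t2, ddof=1)[0, 1]
    var_d = v1 + v2 - 2.0 * c
    var_a = 0.25 * v1 + 0.25 * v2 + 0.5 * c
    return float(var_d), float(var_a)

rng = np.random.default_rng(0)
m = 10_000
shared = rng.standard_normal(m)  # common part: positive covariance
t1 = 100.0 + 5.0 * shared + rng.standard_normal(m)  # estimated total, occasion 1
t2 = 100.0 + 5.0 * shared + rng.standard_normal(m)  # estimated total, occasion 2

var_d, var_a = var_change_and_average(t1, t2)
direct_d = float(np.var(t1 - t2, ddof=1))
direct_a = float(np.var((t1 + t2) / 2.0, ddof=1))
```

With a strong positive covariance the variance of the change is small and the variance of the average is large, which is the trade-off discussed in the text: positive coordination favours $\stackrel{^}{D},$ while zero or negative covariance favours $\stackrel{^}{A}.$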

Following expression (5.4), if ${s}_{1}$ and ${s}_{2}$ are positively coordinated, the variance of $\stackrel{^}{D}$ is in general reduced through sample overlap compared to independent sample selection, since a positive covariance between ${\sum }_{i\in {s}_{1}}{y}_{i1}/{\pi }_{i1}$ and ${\sum }_{i\in {s}_{2}}{y}_{i2}/{\pi }_{i2}$ is achieved. Similarly, from expression (5.5), independent sample selection reduces the variance of $\stackrel{^}{A}$ compared to positively coordinated samples because this covariance is zero. Negative coordination of samples can lead to a negative covariance between ${\sum }_{i\in {s}_{1}}{y}_{i1}/{\pi }_{i1}$ and ${\sum }_{i\in {s}_{2}}{y}_{i2}/{\pi }_{i2},$ and the variance of $\stackrel{^}{A}$ can then be smaller than under independent sample selection.

A population of size $N=100$ was created using models (5.1), (5.2), and (5.3). No births or deaths were considered in the population. The distance matrix was artificially generated using absolute values of independent draws from the $N\left(0,1\right)$ distribution. We set ${\beta }_{0}=4,$ ${\beta }_{1}=2,$ $\rho =0.9,$ ${\delta }_{0}=0,$ ${\delta }_{1}=1,$ ${\alpha }_{0}=0,$ ${\alpha }_{1}=1,$ ${\stackrel{˜}{\gamma }}_{i}\sim N\left(0,1\right),\,i=1,\,\dots ,\,N,$ iid and ${\gamma }_{i}={\beta }_{1}{\stackrel{˜}{\gamma }}_{i},\,i=1,\,\dots ,\,N.$ We also generated artificially ${x}_{i1}$ as independent random draws from the $N\left(4,1\right)$ distribution. The correlation between ${y}_{1}$ and ${y}_{2}$ was approximately 0.72, while that between ${y}_{t}$ and ${x}_{t},\,t=1,\,2,$ was approximately 0.9. Based on this population, two different settings were created by varying ${n}_{1}$ and ${n}_{2}:$ in the first setting ${n}_{1}=10,\,{n}_{2}=25,$ while in the second one ${n}_{1}={n}_{2}=50.$ The correlation between ${\pi }_{1}$ and ${\pi }_{2}$ was approximately 0.7 in both settings.
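A population of this kind can be generated as in the sketch below, which follows models (5.1), (5.2) and (5.3) with the parameter values stated above. Several choices are assumptions not given in the text: ${\sigma }_{i}=1$ for all units; distances computed from random planar coordinates rather than from the artificial distance matrix (so that ${\rho }^{d\left(i,\,j\right)}$ is guaranteed to define a valid covariance matrix); and an eigendecomposition used to draw the correlated errors. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N, beta0, beta1, rho = 100, 4.0, 2.0, 0.9

# Assumption: planar coordinates give the distances d(i, j).
coords = rng.uniform(size=(N, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

# Model (5.1): spatially correlated errors, cov = sigma_i * sigma_j * rho**d(i, j).
sigma = np.ones(N)  # assumption: the text does not give sigma_i
cov = np.outer(sigma, sigma) * rho ** dist
w, vecs = np.linalg.eigh(cov)
eps = vecs @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(N))

x1 = rng.normal(4.0, 1.0, size=N)
y1 = beta0 + beta1 * x1 + eps

# Models (5.2) and (5.3) with delta_0 = alpha_0 = 0, delta_1 = alpha_1 = 1
# and gamma_i = beta_1 * gamma_tilde_i.
gamma_tilde = rng.standard_normal(N)
x2 = x1 + gamma_tilde
y2 = y1 + beta1 * gamma_tilde

def inclusion_probabilities(x, n):
    """pi_it = n_t * x_it / sum_j x_jt; values outside (0, 1] would need
    extra handling that the text does not describe."""
    x = np.asarray(x, dtype=float)
    return n * x / x.sum()

pi1 = inclusion_probabilities(x1, 10)  # first setting, n_1 = 10
pi2 = inclusion_probabilities(x2, 25)  # first setting, n_2 = 25
```

Because ${\gamma }_{i}={\beta }_{1}{\stackrel{˜}{\gamma }}_{i},$ the second-occasion values satisfy ${y}_{i2}={\beta }_{0}+{\beta }_{1}{x}_{i2}+{\epsilon }_{i}$ with the same spatial errors, so model (5.1) holds on both occasions.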

Monte Carlo simulation was used to study empirically the impact that each proposed method has on $\text{var}\left(\stackrel{^}{D}\right)$ and $\text{var}\left(\stackrel{^}{A}\right).$ For each setting, $m={10}^{4}$ samples were drawn as described at the beginning of Section 5.1. Figures 5.1 and 5.2 show boxplots corresponding to the $\stackrel{^}{D}$ values obtained through Monte Carlo simulation, for both settings. The white boxplots correspond to the $\stackrel{^}{D}$ values obtained from independent samples ${s}_{1}$ and ${s}_{2},$ while the grey ones correspond to positively coordinated samples ${s}_{1}$ and ${s}_{2}.$ The sampling design is specified below each boxplot (for example, TSCPS 1_indep_0.25 indicates TSCPS 1 with independent sample selection and $\alpha =$0.25 for both selections, while TSCPS 1_pos_0.25 indicates TSCPS 1 with positively coordinated samples and $\alpha =$0.25 for both selections).

Similarly, Figures 5.3 and 5.4 show boxplots corresponding to the $\stackrel{^}{A}$ values obtained through Monte Carlo simulation, for both settings, respectively. The white boxplots correspond to the $\stackrel{^}{A}$ values obtained from independent samples ${s}_{1}$ and ${s}_{2},$ while the grey ones correspond to negatively coordinated samples ${s}_{1}$ and ${s}_{2}.$ In all figures, LPM with PRNs as well as SCPS with PRNs show a smaller spread of the $\stackrel{^}{D}$ and $\stackrel{^}{A}$ values compared to Poisson sampling designs, since both provide fixed sample sizes and are able to manage the spatial correlation of the data.

Figures 5.1 and 5.2 show a similar pattern of the boxplots: a larger overlap between ${s}_{1}$ and ${s}_{2}$ leads to a smaller spread of the $\stackrel{^}{D}$ values. As expected, the spread of the $\stackrel{^}{D}$ values is reduced for each type of positively coordinated samples compared to independent sample selection. For LPM and SCPS designs this reduction is, however, less pronounced. This can be explained by the smaller overlap between positively coordinated samples in LPM and SCPS designs compared to the other designs, as the examples in Section 5.1 show. The larger sample sizes in the second setting reduce the spread of the $\stackrel{^}{D}$ values in the case of positively coordinated samples (grey boxplots) compared to independent sample selection (white boxplots). In Figures 5.3 and 5.4, negative coordination in general reduces the spread of the $\stackrel{^}{A}$ values compared to independent sample selection. As in Figures 5.1 and 5.2, this reduction is less pronounced for LPM and SCPS than, for example, for Poisson sampling and TSCPS 2.

Description for Figure 5.1

Figure showing the boxplots corresponding to the $\stackrel{^}{D}$ values obtained through Monte Carlo simulation, for the first setting, ${n}_{1}=10,\text{\hspace{0.17em}}{n}_{2}=25.$ The boxplots are presented for each sampling design, i.e. Poisson, LPM, SCPS, TSCPS 1 with $\alpha =0.25,$ TSCPS 1 with $\alpha =0.50,$ TSCPS 1 with $\alpha =0.75,$ TSCPS 2 with $\alpha =0.25,$ TSCPS 2 with $\alpha =0.50$ and TSCPS 2 with $\alpha =0.75,$ for independent sample selection and for positively coordinated samples. Values for $\stackrel{^}{D}$ are on the y-axis, ranging from -1,000 to 1,000. The spread of the $\stackrel{^}{D}$ values is reduced for each type of positively coordinated samples compared to independent sample selection. For the LPM and SCPS designs this reduction is, however, less pronounced. The spread is larger for Poisson and TSCPS 2 samples, but it is reduced more by positive coordination.

Description for Figure 5.2

Figure showing the boxplots corresponding to the $\stackrel{^}{D}$ values obtained through Monte Carlo simulation, for the second setting, ${n}_{1}=50,\text{\hspace{0.17em}}{n}_{2}=50.$ The boxplots are presented for each sampling design, i.e. Poisson, LPM, SCPS, TSCPS 1 with $\alpha =0.25,$ TSCPS 1 with $\alpha =0.50,$ TSCPS 1 with $\alpha =0.75,$ TSCPS 2 with $\alpha =0.25,$ TSCPS 2 with $\alpha =0.50$ and TSCPS 2 with $\alpha =0.75,$ for independent sample selection and for positively coordinated samples. Values for $\stackrel{^}{D}$ are on the y-axis, ranging from -600 to 600. The spread of the $\stackrel{^}{D}$ values is reduced for each type of positively coordinated samples compared to independent sample selection. For the LPM and SCPS designs this reduction is, however, less pronounced. The spread is larger for Poisson and TSCPS 2 samples, but it is reduced more by positive coordination. The larger sample sizes in this second setting reduce the spread of the $\stackrel{^}{D}$ values for positively coordinated samples compared to independent sample selection.

Description for Figure 5.3

Figure showing the boxplots corresponding to the $\stackrel{^}{A}$ values obtained through Monte Carlo simulation, for the first setting, ${n}_{1}=10,\text{\hspace{0.17em}}{n}_{2}=25.$ The boxplots are presented for each sampling design, i.e. Poisson, LPM, SCPS, TSCPS 1 with $\alpha =0.25,$ TSCPS 1 with $\alpha =0.50,$ TSCPS 1 with $\alpha =0.75,$ TSCPS 2 with $\alpha =0.25,$ TSCPS 2 with $\alpha =0.50$ and TSCPS 2 with $\alpha =0.75,$ for independent sample selection and for negatively coordinated samples. Values for $\stackrel{^}{A}$ are on the y-axis, ranging from 500 to 2,000. Negative coordination generally reduces the spread of the $\stackrel{^}{A}$ values compared to independent sample selection. This reduction is less pronounced for LPM and SCPS than, for example, for Poisson sampling and TSCPS 2.

Description for Figure 5.4

Figure showing the boxplots corresponding to the $\stackrel{^}{A}$ values obtained through Monte Carlo simulation, for the second setting, ${n}_{1}=50,\text{\hspace{0.17em}}{n}_{2}=50.$ The boxplots are presented for each sampling design, i.e. Poisson, LPM, SCPS, TSCPS 1 with $\alpha =0.25,$ TSCPS 1 with $\alpha =0.50,$ TSCPS 1 with $\alpha =0.75,$ TSCPS 2 with $\alpha =0.25,$ TSCPS 2 with $\alpha =0.50$ and TSCPS 2 with $\alpha =0.75,$ for independent sample selection and for negatively coordinated samples. Values for $\stackrel{^}{A}$ are on the y-axis, ranging from 1,000 to 1,400. Negative coordination generally reduces the spread of the $\stackrel{^}{A}$ values compared to independent sample selection. This reduction is less pronounced for LPM and SCPS than, for example, for Poisson sampling and TSCPS 2. Larger sample sizes seem to reduce the spread of the $\stackrel{^}{A}$ values, especially for negatively coordinated samples.

To quantify the performance of the proposed methods under positive and negative coordination, respectively, the Monte Carlo variance was used:

${\text{var}}_{\text{MC}}\left(\theta \right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{1}{m-1}\sum _{l=1}^{m}\text{\hspace{0.17em}}{\left({\theta }_{l}-{E}_{\text{sim}}\left(\theta \right)\right)}^{2},$

where ${\theta }_{l}$ is the value of $\stackrel{^}{D}$ or $\stackrel{^}{A}$ obtained in the ${l}^{\text{th}}$ run and ${E}_{\text{sim}}\left(\theta \right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}=\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{1}{m}{\sum }_{j=1}^{m}\text{\hspace{0.17em}}{\theta }_{j}.$ The reduction in the variance of $\stackrel{^}{D}$ achieved through sample overlap is summarized in Table 5.9, which shows the ratio between ${\text{var}}_{\text{MC}}\left(\stackrel{^}{D}\right)$ obtained using positively coordinated samples and ${\text{var}}_{\text{MC}}\left(\stackrel{^}{D}\right)$ obtained using independent samples, for both settings. For all sampling designs this ratio is less than 1, indicating a variance reduction through sample overlap. Table 5.10 shows the ratio between ${\text{var}}_{\text{MC}}\left(\stackrel{^}{A}\right)$ obtained using negatively coordinated samples and ${\text{var}}_{\text{MC}}\left(\stackrel{^}{A}\right)$ obtained using independent samples, for both settings. For the first setting, except for Poisson sampling, the ratio is close to 1 (between 0.83 and 0.95), showing little improvement of the negatively coordinated samples over independent selections. With its larger sample sizes, the second setting shows a substantial improvement for TSCPS 2 (ratios between 0.39 and 0.60), but not for LPM and SCPS (0.949 and 0.901).
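
The Monte Carlo variance and the ratios reported in the tables can be computed along the following lines. This is a minimal sketch, not the authors' code: the function names `mc_variance` and `variance_ratio` are hypothetical, and the two arrays of estimates are synthetic normal draws standing in for the $m$ simulated values of $\stackrel{^}{D}$ under independent and positively coordinated selection.

```python
import numpy as np

def mc_variance(theta):
    """Monte Carlo variance with the 1/(m-1) divisor used in the text."""
    theta = np.asarray(theta, dtype=float)
    m = theta.size
    e_sim = theta.sum() / m                      # E_sim(theta)
    return ((theta - e_sim) ** 2).sum() / (m - 1)

def variance_ratio(theta_coord, theta_indep):
    """var_MC under coordination divided by var_MC under independent selection."""
    return mc_variance(theta_coord) / mc_variance(theta_indep)

# Synthetic stand-ins for the m simulated estimates (assumption: NOT the paper's data).
rng = np.random.default_rng(0)
m = 10_000
d_indep = rng.normal(loc=0.0, scale=300.0, size=m)  # independent selection
d_pos = rng.normal(loc=0.0, scale=200.0, size=m)    # positively coordinated

ratio = variance_ratio(d_pos, d_indep)
print(f"ratio = {ratio:.3f}, variance reduction = {100 * (1 - ratio):.1f}%")
```

A ratio below 1 corresponds to a relative variance reduction of one minus the ratio; for instance, the Poisson ratio of 0.481 in the first setting of Table 5.9 corresponds to a reduction of about 52%.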

Table 5.9
Ratio between ${\text{var}}_{\text{MC}}\left(\stackrel{^}{D}\right)$ obtained using positively coordinated samples and ${\text{var}}_{\text{MC}}\left(\stackrel{^}{D}\right)$ using independent samples

| Design | Ratio, ${n}_{1}=10,\text{\hspace{0.17em}}{n}_{2}=25$ | Ratio, ${n}_{1}=50,\text{\hspace{0.17em}}{n}_{2}=50$ |
| --- | --- | --- |
| Poisson | 0.481 | 0.178 |
| LPM | 0.759 | 0.679 |
| SCPS | 0.760 | 0.778 |
| TSCPS 1, $\alpha =$ 0.25 | 0.695 | 0.545 |
| TSCPS 1, $\alpha =$ 0.50 | 0.739 | 0.700 |
| TSCPS 1, $\alpha =$ 0.75 | 0.806 | 0.752 |
| TSCPS 2, $\alpha =$ 0.25 | 0.513 | 0.217 |
| TSCPS 2, $\alpha =$ 0.50 | 0.571 | 0.319 |
| TSCPS 2, $\alpha =$ 0.75 | 0.634 | 0.491 |

Table 5.10
Ratio between ${\text{var}}_{\text{MC}}\left(\stackrel{^}{A}\right)$ obtained using negatively coordinated samples and ${\text{var}}_{\text{MC}}\left(\stackrel{^}{A}\right)$ using independent samples

| Design | Ratio, ${n}_{1}=10,\text{\hspace{0.17em}}{n}_{2}=25$ | Ratio, ${n}_{1}=50,\text{\hspace{0.17em}}{n}_{2}=50$ |
| --- | --- | --- |
| Poisson | 0.792 | 0.324 |
| LPM | 0.941 | 0.949 |
| SCPS | 0.921 | 0.901 |
| TSCPS 1, $\alpha =$ 0.25 | 0.932 | 0.679 |
| TSCPS 1, $\alpha =$ 0.50 | 0.950 | 0.840 |
| TSCPS 1, $\alpha =$ 0.75 | 0.953 | 0.876 |
| TSCPS 2, $\alpha =$ 0.25 | 0.828 | 0.387 |
| TSCPS 2, $\alpha =$ 0.50 | 0.834 | 0.463 |
| TSCPS 2, $\alpha =$ 0.75 | 0.919 | 0.597 |

In summary, LPM with PRNs, SCPS with PRNs and the TSCPS family reduce the Monte Carlo variance of the differences through sample overlap, compared to independent sample selection, in both settings. Under independent sample selection, these methods are more precise than Poisson sampling because they are able to manage the spatial trend present in the variable of interest, and their sample sizes are fixed (for LPM and SCPS using the “maximal weight strategy”) or less variable than for Poisson sampling. With negatively coordinated samples, LPM and SCPS reduce the Monte Carlo variance of the averages only negligibly compared to independent samples, in both settings. The transformed SCPS family shows a real improvement in the second setting, when ${n}_{1}$ and ${n}_{2}$ are relatively large, for all $\alpha .$

﻿