Using balanced sampling in creel surveys
Section 4. Comparison of the cube method and the rejective algorithm
Chauvet et al. (2015) studied several aspects of the cube method and the rejective algorithm as balancing techniques. They balanced on continuous auxiliary variables and documented how the balancing algorithm affects the selection probabilities and the sampling properties of estimators of population totals. The goal of this section is to compare the two sampling algorithms in a resource inventory where the balancing equations involve only indicator variables. The comparison is carried out in the context of a simplified creel survey with a stratified two-stage design. The days are the strata $h = 1, \dots, H,$ the sectors are the primary units $i = 1, 2, 3$ and the sites, indexed by $j,$ are the secondary units. This sampling plan is similar to the design described in Section 3.1, except that periods and subperiods do not enter the sampling design.
On each day, two of the three sectors are selected and, within each selected sector, two sites are sampled; thus four units are visited each day. The site importance variable ${x}_{ij}$ determines the inclusion probabilities ${\pi}_{hij} = \left(2{x}_{i\bullet}/{x}_{\bullet\bullet}\right)\times\left(2{x}_{ij}/{x}_{i\bullet}\right) = {\pi}_{hi}\times{\pi}_{hj|i}$ for the two stages. As two out of three units are selected at each stage, the joint selection probabilities are completely determined by $\left\{\left({\pi}_{hi},\,{\pi}_{hj|i}\right): i, j = 1, 2, 3\right\};$ see the Appendix. If ${Z}_{hij}$ denotes the indicator variable taking the value 1 if site $\left(i, j\right)$ is sampled on day $h$ and 0 otherwise, then the entries of the $9\times 9$ variance-covariance matrix of $\left\{{Z}_{hij}: i, j = 1, 2, 3\right\}$ are given by
$$\text{Cov}\left({Z}_{hij},\,{Z}_{h{i}^{\prime}{j}^{\prime}}\right) = \begin{cases}{\pi}_{hij}-{\pi}_{hij}^{2} & \text{if } i={i}^{\prime} \text{ and } j={j}^{\prime}\\ {\pi}_{hi}\,{\pi}_{hj{j}^{\prime}|i}-{\pi}_{hij}\,{\pi}_{hi{j}^{\prime}} & \text{if } i={i}^{\prime} \text{ and } j\ne{j}^{\prime}\\ {\pi}_{hi{i}^{\prime}}\,{\pi}_{hj|i}\,{\pi}_{h{j}^{\prime}|{i}^{\prime}}-{\pi}_{hij}\,{\pi}_{h{i}^{\prime}{j}^{\prime}} & \text{if } i\ne{i}^{\prime}\end{cases}\tag{4.1}$$
where ${\pi}_{hi{i}^{\prime}}$ is the joint selection probability of sectors $i$ and ${i}^{\prime}$ on a single day, ${\pi}_{hj|i}$ is the probability of selecting site $j$ of sector $i$ at the second stage, and ${\pi}_{hj{j}^{\prime}|i}$ is the joint selection probability of sites $j$ and ${j}^{\prime}$ within sector $i.$ All these probabilities are evaluated using the size measure $x;$ details are available in the Appendix, see also Ousmane Ida (2016). The corresponding matrix $\text{Var}\left(\tilde{n}\right)$ in (2.3) is singular because one of the nine constraints is redundant; a generalized inverse of the covariance matrix was therefore used in (2.3), and ${\gamma}^{2}$ was set equal to 2.73 and 7.34, the ${5}^{\text{th}}$ and ${50}^{\text{th}}$ percentiles of the ${\chi}_{8}^{2}$ distribution.
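To make the construction concrete, the following Python sketch (not from the paper) builds the two-stage inclusion probabilities and fills the $9\times 9$ matrix (4.1) for one day. The importance values $x_{ij}$ are hypothetical, and the 2-out-of-3 joint probability $\pi_a + \pi_b - 1$ uses the fact that exactly one of the three units is excluded, with probability $1 - \pi_k$:

```python
import numpy as np

# Hypothetical site importance values x_ij (3 sectors x 3 sites).
x = np.array([[3.0, 2.0, 3.0],
              [2.0, 2.0, 2.0],
              [3.0, 3.0, 4.0]])

x_i = x.sum(axis=1)                   # sector totals x_{i.}
pi_i = 2.0 * x_i / x.sum()            # first-stage probabilities pi_{hi}
pi_ji = 2.0 * x / x_i[:, None]        # second-stage probabilities pi_{hj|i}
pi = pi_i[:, None] * pi_ji            # pi_{hij} = pi_{hi} * pi_{hj|i}

def joint(p, a, b):
    # Joint probability in a 2-out-of-3 draw: exactly one unit is excluded,
    # so P(a and b selected) = P(third unit excluded) = p[a] + p[b] - 1.
    return p[a] + p[b] - 1.0

# Variance-covariance matrix of the 9 indicators Z_{hij}, equation (4.1).
V = np.zeros((9, 9))
for i in range(3):
    for j in range(3):
        for ip in range(3):
            for jp in range(3):
                r, c = 3 * i + j, 3 * ip + jp
                if (i, j) == (ip, jp):
                    V[r, c] = pi[i, j] - pi[i, j] ** 2
                elif i == ip:
                    V[r, c] = pi_i[i] * joint(pi_ji[i], j, jp) - pi[i, j] * pi[i, jp]
                else:
                    V[r, c] = (joint(pi_i, i, ip) * pi_ji[i, j] * pi_ji[ip, jp]
                               - pi[i, j] * pi[ip, jp])

print(np.linalg.matrix_rank(V))   # 8
```

Since exactly four sites are visited each day, the indicator total is constant, so $V$ annihilates the vector of ones; the rank check confirms that exactly one of the nine constraints is redundant, which is why a generalized inverse is needed in (2.3).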
4.1 Simulations comparing the cube method and the rejective algorithm
To investigate the impact of the algorithm on the sampling properties of survey estimators, we simulated the fishing effort ${y}_{hij}$ for site $\left(i, j\right)$ on day $h$ using independent Poisson random variables with mean $15\times{x}_{ij}.$ The total fishing effort for site $\left(i, j\right)$ is then
$${Y}_{Uij}={\displaystyle \sum _{h\mathrm{=1}}^{H}{y}_{hij}}\mathrm{.}$$
A calibrated estimator, as defined in Section 3.2, of the fishing effort at site $\left(i, j\right)$ is ${\widehat{Y}}_{ij} = H\,{\overline{y}}_{sij},$ that is, $H$ times the average fishing effort over the ${n}_{ij}$ units sampled at site $\left(i, j\right).$
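As a quick illustration of this design unbiasedness, the sketch below (with hypothetical values $H = 12,$ $x_{ij} = 2$ and a realized sample size $n_{ij} = 5$) uses the fact that, conditionally on $n_{ij},$ the sampled days behave like a simple random sample of the $H$ days:

```python
import numpy as np

rng = np.random.default_rng(0)
H, x_ij, n_ij = 12, 2.0, 5        # hypothetical values for one site

# Daily fishing efforts: independent Poisson variables with mean 15 * x_ij.
y = rng.poisson(15.0 * x_ij, size=H)
Y_U = y.sum()                     # true total effort for the site

# Conditionally on n_ij, the sampled days form a simple random sample of
# the H days, so H times the sample mean is unbiased for the total.
ests = [H * rng.choice(y, size=n_ij, replace=False).mean()
        for _ in range(20_000)]
print(Y_U, round(np.mean(ests), 1))   # the two values should be close
```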
To compare the balancing algorithms, we used designs with $H = 12$ strata and two importance variables $x,$ one with a small variation between sites and one with a medium variation. Under each scenario we generated $B = \text{100,000}$ random replications of a balanced sample using the cube method on the one hand, and two rejective algorithms on the other. The inclusion probability for site $\left(i, j\right)$ was estimated by
$${\widehat{\pi}}_{ij}\mathrm{=}\frac{1}{B\times H}{\displaystyle \sum _{b\mathrm{=1}}^{B}{n}_{ij}^{\left(b\right)}}\mathrm{.}$$
This estimator assumes that the inclusion probabilities ${\pi}_{hij}$ are constant in $h.$ This holds because the sampling design is invariant to a relabelling of the days; see Section 3.1.
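The day-level two-stage draw that both algorithms start from can be sketched as follows; this is a minimal illustration with hypothetical importance values, the balancing step (cube or rejective) is omitted, and the 2-out-of-3 draw exploits the fact that exactly one unit is left out, with probability $1 - \pi_k$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical site importance values x_ij (3 sectors x 3 sites).
x = np.array([[3.0, 2.0, 3.0],
              [2.0, 2.0, 2.0],
              [3.0, 3.0, 4.0]])
x_i = x.sum(axis=1)
pi_i = 2.0 * x_i / x.sum()            # first-stage pi_{hi}
pi_ji = 2.0 * x / x_i[:, None]        # second-stage pi_{hj|i}
pi = pi_i[:, None] * pi_ji            # target pi_{hij}

def draw_2_of_3(p):
    # Exclude one of the three units with probability 1 - p[k]; the
    # exclusion probabilities sum to 1 because sum(p) = 2.
    out = rng.choice(3, p=1.0 - p)
    return [k for k in range(3) if k != out]

def draw_day():
    # One day: select 2 sectors, then 2 sites within each selected sector.
    z = np.zeros((3, 3))
    for i in draw_2_of_3(pi_i):
        for j in draw_2_of_3(pi_ji[i]):
            z[i, j] = 1.0
    return z

B = 20_000
freq = sum(draw_day() for _ in range(B)) / B
print(np.round(freq, 3))   # empirical frequencies, close to pi
```

Averaging the realized sample sizes $n_{ij}^{(b)}$ over replications and days, as in the estimator above, then recovers ${\widehat{\pi}}_{ij}.$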
As argued in Section 3.2, the calibrated estimator ${\widehat{Y}}_{ij}$ is design unbiased under the two selection algorithms. We compare their standard deviations,
$${\text{Sd}}_{{\widehat{Y}}_{ij}}={\left\{\frac{1}{B-1}{\displaystyle \sum _{b\mathrm{=1}}^{B}{\left({\widehat{Y}}_{ij}^{\left(b\right)}-{\overline{\widehat{Y}}}_{ij}\right)}^{2}}\right\}}^{1/2}\text{}\mathrm{,}$$
where ${\overline{\widehat{Y}}}_{ij}$ is the average of the $B$ simulated values. The sample size standard deviations were also calculated using (3.2). Observe that ${\widehat{\pi}}_{ij}\mathrm{=}{\overline{n}}_{ij}/H.$ The simulation results are presented in Tables 4.1, 4.2 and 4.3.
Table 4.1 Simulation results when $x$ has a low variation between sites

| Sector | Site | $x_{ij}$ | $\pi_{ij}$ | CM $\overline{\widehat{\pi}}_{ij}$ | CM $\text{Sd}_{\widehat{Y}_{ij}}$ | R 5% $\overline{\widehat{\pi}}_{ij}$ | R 5% $\text{Sd}_{\widehat{Y}_{ij}}$ | R 50% $\overline{\widehat{\pi}}_{ij}$ | R 50% $\text{Sd}_{\widehat{Y}_{ij}}$ |
|---|---|---|---|---|---|---|---|---|---|
| $i=1$ | $j=1$ | 3 | 0.500 | 0.500 | 16.56 | 0.503 | 16.86 | 0.505 | 17.40 |
| | $j=2$ | 2 | 0.333 | 0.333 | 22.20 | 0.329 | 23.35 | 0.328 | 25.07 |
| | $j=3$ | 3 | 0.500 | 0.500 | 23.99 | 0.503 | 24.47 | 0.505 | 25.15 |
| $i=2$ | $j=4$ | 2 | 0.333 | 0.333 | 25.80 | 0.329 | 26.93 | 0.326 | 29.11 |
| | $j=5$ | 2 | 0.333 | 0.333 | 33.97 | 0.329 | 35.54 | 0.326 | 38.28 |
| | $j=6$ | 2 | 0.333 | 0.333 | 27.65 | 0.329 | 28.87 | 0.326 | 31.10 |
| $i=3$ | $j=7$ | 3 | 0.500 | 0.500 | 22.50 | 0.502 | 22.88 | 0.502 | 23.66 |
| | $j=8$ | 3 | 0.500 | 0.500 | 20.02 | 0.502 | 20.20 | 0.502 | 20.94 |
| | $j=9$ | 4 | 0.667 | 0.667 | 22.01 | 0.674 | 21.98 | 0.679 | 22.25 |
Table 4.2 Simulation results when $x$ has a medium variation between sites

| Sector | Site | $x_{ij}$ | $\pi_{ij}$ | CM $\overline{\widehat{\pi}}_{ij}$ | CM $\text{Sd}_{\widehat{Y}_{ij}}$ | R 5% $\overline{\widehat{\pi}}_{ij}$ | R 5% $\text{Sd}_{\widehat{Y}_{ij}}$ | R 50% $\overline{\widehat{\pi}}_{ij}$ | R 50% $\text{Sd}_{\widehat{Y}_{ij}}$ |
|---|---|---|---|---|---|---|---|---|---|
| $i=1$ | $j=1$ | 3 | 0.500 | 0.500 | 25.52 | 0.505 | 25.78 | 0.507 | 26.60 |
| | $j=2$ | 2 | 0.333 | 0.333 | 25.25 | 0.330 | 26.26 | 0.329 | 28.16 |
| | $j=3$ | 3 | 0.500 | 0.500 | 21.12 | 0.505 | 21.36 | 0.507 | 22.03 |
| $i=2$ | $j=4$ | 1 | 0.167 | 0.167 | 29.17 | 0.158 | 32.45 | 0.149 | 31.19 |
| | $j=5$ | 2 | 0.333 | 0.333 | 13.73 | 0.329 | 14.38 | 0.326 | 15.49 |
| | $j=6$ | 2 | 0.333 | 0.333 | 32.82 | 0.329 | 34.22 | 0.326 | 36.91 |
| $i=3$ | $j=7$ | 2 | 0.333 | 0.333 | 16.84 | 0.329 | 17.52 | 0.325 | 18.85 |
| | $j=8$ | 4 | 0.667 | 0.667 | 18.68 | 0.672 | 18.70 | 0.678 | 18.89 |
| | $j=9$ | 5 | 0.833 | 0.833 | 8.06 | 0.844 | 7.81 | 0.854 | 7.67 |
Table 4.3 Standard deviations of the site sample sizes $n_{ij}$; the left block is for $x$ with a low variation, the right block for $x$ with a medium variation

| Sector | Site | $x$ (low) | CM | R 5% | R 50% | $x$ (medium) | CM | R 5% | R 50% |
|---|---|---|---|---|---|---|---|---|---|
| $i=1$ | $j=1$ | 3 | 0.000 | 0.894 | 1.371 | 3 | 0.000 | 0.891 | 1.371 |
| | $j=2$ | 2 | 0.000 | 0.854 | 1.295 | 2 | 0.000 | 0.831 | 1.294 |
| | $j=3$ | 3 | 0.000 | 0.896 | 1.377 | 3 | 0.000 | 0.891 | 1.374 |
| $i=2$ | $j=4$ | 2 | 0.130 | 0.828 | 1.293 | 1 | 0.144 | 0.654 | 1.013 |
| | $j=5$ | 2 | 0.195 | 0.832 | 1.298 | 2 | 0.170 | 0.831 | 1.290 |
| | $j=6$ | 2 | 0.179 | 0.826 | 1.296 | 2 | 0.141 | 0.830 | 1.297 |
| $i=3$ | $j=7$ | 3 | 0.339 | 0.859 | 1.366 | 2 | 0.342 | 0.835 | 1.294 |
| | $j=8$ | 3 | 0.381 | 0.859 | 1.367 | 4 | 0.350 | 0.807 | 1.294 |
| | $j=9$ | 4 | 0.319 | 0.822 | 1.288 | 5 | 0.248 | 0.655 | 1.010 |
In Tables 4.1 and 4.2, the cube method maintains the selection probabilities and yields a total estimator with the smallest standard deviations. Taking ${\gamma}^{2}$ equal to the ${50}^{\text{th}}$ percentile of the ${\chi}_{8}^{2}$ distribution for the rejective algorithm yields the poorest results, both in terms of selection probabilities and of the standard deviations of ${\overline{y}}_{sij}.$ The largest biases in the selection probabilities occur at the extreme $x$ values in Table 4.2. The selection probability for site $j = 4$ is underestimated by 11% with the rejective method based on the ${50}^{\text{th}}$ percentile and by 5% with the ${5}^{\text{th}}$ percentile. The probability is overestimated at the sites with the largest values of $x.$
In Tables 4.1 and 4.2, the standard deviation of ${\widehat{Y}}_{ij}$ is, in most cases, smallest for the cube method and largest for the rejective algorithm based on the ${50}^{\text{th}}$ percentile. The standard deviations for the rejective algorithms are up to 10% larger than those for the cube method. In Table 4.2, the largest gain in efficiency of the cube method with respect to the R 5% rejective algorithm (equal to the ratio of the squared standard deviations) is 23%; it occurs when $j = 4$ and $x = 1.$ These standard deviations are driven by the variability of the sample sizes ${n}_{ij}.$ Table 4.3 gives the standard deviations of the sample sizes. Since the expected numbers of visits to sector 1 and to sites 1, 2 and 3 are integers, the cube method can make these sample sizes equal to their expectations, and the corresponding standard deviations are 0. This is not possible in sectors 2 and 3, as the expected sample sizes for these sectors are not integer valued. In general, the rejective algorithms give sample sizes that are much more variable than those of the cube method. This makes the rejective total estimators more variable than those obtained with the cube method.
The conditional variance estimator of the fishing effort estimator ${\widehat{Y}}_{ij}$ at site $\left(i, j\right)$ proposed in Section 3.2 is
$$v\left({\widehat{Y}}_{ij}\right)\mathrm{=}\frac{{H}^{2}\left(1-{n}_{ij}/H\right)}{{n}_{ij}}{\displaystyle \sum _{h\in {s}_{ij}}\frac{{\left({y}_{hij}-{\overline{y}}_{sij}\right)}^{2}}{{n}_{ij}-1}}\mathrm{.}$$
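A direct transcription of this estimator, with a small conditional check (hypothetical values: $H = 12,$ $n_{ij} = 5,$ Poisson daily efforts), is sketched below; conditionally on $n_{ij}$ the sampled days form a simple random sample, under which $v({\widehat{Y}}_{ij})$ is unbiased for $\text{Var}({\widehat{Y}}_{ij}\,|\,n_{ij})$:

```python
import numpy as np

def v_hat(y_s, H):
    # Conditional variance estimator for Y_hat = H * mean(y_s), where y_s
    # holds the n_ij daily efforts observed at one site.
    n = len(y_s)
    return H**2 * (1.0 - n / H) / n * np.var(y_s, ddof=1)

rng = np.random.default_rng(2)
H, n = 12, 5
y = rng.poisson(30.0, size=H).astype(float)   # one site's daily efforts

# Compare E{v(Y_hat) | n} with Var(Y_hat | n) over repeated SRS of days.
draws = [rng.choice(y, size=n, replace=False) for _ in range(20_000)]
var_true = np.var([H * d.mean() for d in draws])
ev_hat = np.mean([v_hat(d, H) for d in draws])
print(round(ev_hat / var_true - 1.0, 3))   # conditional relative bias, near 0
```

The finite population correction $1 - n_{ij}/H$ makes the estimator vanish when all $H$ days are sampled, as it should.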
The conditional sampling properties, given ${n}_{ij},$ of this variance estimator were investigated in a Monte Carlo study with $B = \text{10,000}$ balanced samples for the three sample designs. For each site and each sample size ${n}_{ij},$ the conditional variance $\text{Var}\left({\widehat{Y}}_{ij}\,|\,{n}_{ij}\right)$ and the conditional expectation of the variance estimator $\text{E}\left\{v\left({\widehat{Y}}_{ij}\right)|\,{n}_{ij}\right\}$ were evaluated using the Monte Carlo samples for which the sample size at site $\left(i, j\right)$ was ${n}_{ij}.$ The conditional relative bias of the variance estimator, $\text{E}\left\{v\left({\widehat{Y}}_{ij}\right)|\,{n}_{ij}\right\}/\text{Var}\left({\widehat{Y}}_{ij}\,|\,{n}_{ij}\right)-1,$ was then calculated. The conditional relative biases were aggregated by weighting each sample size ${n}_{ij}$ by its frequency in the 10,000 Monte Carlo samples; the results are in Table 5.1.
In Table 5.1, the aggregated relative biases are less than 3% in absolute value for the three selection algorithms. This validates the conditional variance estimator proposed in Section 3.2 for a single cell of the cross-classified table. The conditional variance of a sum such as ${\widehat{Y}}_{ij}+{\widehat{Y}}_{i{j}^{\prime}}$ is more complicated, as it involves joint selection probabilities; the estimation of such variances is not considered here. See Breidt and Chauvet (2011) for a discussion of variance estimation with the cube method.
Table 5.1 Aggregated conditional relative biases (in %) of the variance estimator; the left block is for $x$ with a low variation, the right block for $x$ with a medium variation

| Sector | Site | $x$ (low) | CM | R 5% | R 50% | $x$ (medium) | CM | R 5% | R 50% |
|---|---|---|---|---|---|---|---|---|---|
| $i=1$ | $j=1$ | 3 | 1 | -3 | 3 | 3 | -1 | 1 | 1 |
| | $j=2$ | 2 | 2 | -1 | -2 | 2 | 3 | 1 | -2 |
| | $j=3$ | 3 | -1 | 0 | 1 | 3 | 0 | -1 | 0 |
| $i=2$ | $j=4$ | 2 | -2 | 2 | 0 | 1 | 1 | -1 | -2 |
| | $j=5$ | 2 | 1 | -1 | -1 | 2 | 2 | 2 | 3 |
| | $j=6$ | 2 | 0 | 3 | -2 | 2 | 0 | 0 | -3 |
| $i=3$ | $j=7$ | 3 | 1 | -3 | 2 | 2 | 0 | -3 | -1 |
| | $j=8$ | 3 | 2 | 1 | 1 | 4 | 0 | 0 | 0 |
| | $j=9$ | 4 | -1 | 1 | -2 | 5 | -2 | -1 | 1 |
The conclusion of this Monte Carlo investigation is that the rejective algorithm changes the selection probabilities: sites with a small importance are underrepresented in the rejective samples, while the cube method is very good at preserving the selection probabilities. Under both algorithms the calibrated estimator of the total of $y$ in a domain is unbiased. Smaller variances are, however, obtained with the cube algorithm, as it yields domain sample sizes that are less variable than those of the rejective algorithm.