# Using balanced sampling in creel surveys Section 4. Comparison of the cube method and the rejective algorithm

Chauvet et al. (2015) studied the cube method and the rejective algorithm by examining different aspects of these balancing techniques. They balanced on continuous auxiliary variables and documented how the balancing algorithm affected the selection probabilities and the sampling properties of estimators of population totals. The goal of this section is to compare the two sampling algorithms in a resource inventory where the balancing equations only involve indicator variables. This comparison is carried out in the context of a simplified creel survey with a stratified two-stage design. The days represent the strata $h=1,\dots ,H,$ the sectors are the primary units $i=1,\,2,\,3,$ and the sites, indexed by $j,$ are the secondary units. This sampling plan is similar to the design presented in Section 3.1, except that periods and subperiods do not enter into the sampling design.

On each day, two of the three sectors are selected and, within each selected sector, two sites are sampled; thus four units are selected each day. The site importance variable $x_{ij}$ determines the inclusion probabilities $\pi_{hij}=\left(2x_{i\bullet}/x_{\bullet\bullet}\right)\times\left(2x_{ij}/x_{i\bullet}\right)=\pi_{hi}\times\pi_{hj\,|\,i}$ for the two stages. As two out of three units are selected at each level, the joint selection probabilities are completely determined by $\left\{\left(\pi_{hi},\,\pi_{hj\,|\,i}\right):i,\,j=1,\,2,\,3\right\}$; see the Appendix. If $Z_{hij}$ denotes the indicator variable taking the value 1 if site $\left(i,\,j\right)$ is sampled on day $h$ and 0 otherwise, then the entries of the $9\times 9$ variance-covariance matrix of $\left\{Z_{hij}:i,\,j=1,\,2,\,3\right\}$ are given by

$\text{Cov}\left(Z_{hij},\,Z_{hi'j'}\right)=\left\{\begin{array}{ll}\pi_{hij}-\pi_{hij}^{2} & \text{if } i=i' \text{ and } j=j'\\ \pi_{hi}\,\pi_{hjj'\,|\,i}-\pi_{hij}\,\pi_{hij'} & \text{if } i=i' \text{ and } j\ne j'\\ \pi_{hii'}\,\pi_{hj\,|\,i}\,\pi_{hj'\,|\,i'}-\pi_{hij}\,\pi_{hi'j'} & \text{if } i\ne i'\end{array}\right.\qquad \left(4.1\right)$

where $\pi_{hii'}$ is the joint selection probability of sectors $i$ and $i'$ on a single day, $\pi_{hj\,|\,i}$ is the probability of selecting site $j$ in sector $i$ at stage 2, and $\pi_{hjj'\,|\,i}$ is the joint selection probability of sites $j$ and $j'$ in sector $i.$ All these probabilities are evaluated using the size measure $x.$ Details are available in the Appendix; see also Ousmane Ida (2016). The corresponding matrix $\text{Var}\left(\tilde{n}\right)$ in (2.3) is singular because one of the 9 constraints is redundant. A generalized inverse of the covariance matrix was therefore used in (2.3), and $\gamma^{2}$ in (2.3) was set equal to 2.73 and 7.34, the $5^{\text{th}}$ and the $50^{\text{th}}$ percentiles of the $\chi_{8}^{2}$ distribution.
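The two thresholds quoted for $\gamma^{2}$ can be checked numerically. The sketch below is only an illustration: it relies on the closed form of the chi-square distribution function for an even number of degrees of freedom and inverts it by bisection. The function names are hypothetical and are not taken from the paper.

```python
import math

def chi2_cdf_even_df(x, df):
    """CDF of a chi-square variable with an even number of degrees of freedom.

    For df = 2m, P(X <= x) = 1 - exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even_df(p, df, lo=0.0, hi=200.0, tol=1e-10):
    """Invert the CDF by bisection; the CDF is increasing in x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

gamma2_5 = chi2_quantile_even_df(0.05, 8)    # 5th percentile, 8 degrees of freedom
gamma2_50 = chi2_quantile_even_df(0.50, 8)   # 50th percentile (median)
print(round(gamma2_5, 2), round(gamma2_50, 2))  # prints 2.73 7.34
```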

## 4.1  Simulations on the comparison of the cube method and of the rejective algorithm

To investigate the impact of the algorithm on the sampling properties of survey estimators, we simulated a fishing effort $y_{hij}$ for each site $\left(i,\,j\right)$ on each day $h,$ using independent Poisson random variables with mean $15\times x_{ij}.$ The total fishing effort for site $\left(i,\,j\right)$ is then

${Y}_{Uij}=\sum _{h=1}^{H}{y}_{hij}.$

A calibrated estimator, as defined in Section 3.2, for the fishing effort in site $\left(i,\text{\hspace{0.17em}}j\right)$ is ${\stackrel{^}{Y}}_{ij}=H\text{ }{\overline{y}}_{sij},$ the average fishing effort for the ${n}_{ij}$ units sampled at site $\left(i,\text{\hspace{0.17em}}j\right)$ times $H.$
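The simulated population and the estimator $H\,\overline{y}_{sij}$ can be sketched as follows for a single site. This is a minimal illustration under stated assumptions: the Poisson draws use Knuth's multiplication method so that only the standard library is needed, the seed and the list of sampled days are arbitrary, and the helper names are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the moderate means used here (15 * x with x between 1 and 5).
    """
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def calibrated_total(y_sampled, H):
    """H times the average effort over the days on which the site was sampled."""
    return H * sum(y_sampled) / len(y_sampled)

H = 12                      # number of days (strata)
x_ij = 3                    # site importance, one of the values used in the tables
rng = random.Random(2024)   # arbitrary seed, for reproducibility only

y = [poisson(15 * x_ij, rng) for _ in range(H)]   # y_hij for h = 1, ..., H
Y_U = sum(y)                                      # total fishing effort for the site

sampled_days = [0, 3, 4, 7, 9, 11]                # hypothetical sample, n_ij = 6
Y_hat = calibrated_total([y[h] for h in sampled_days], H)
print(Y_U, Y_hat)
```

When every day is sampled, `calibrated_total` returns the population total exactly, which is a quick sanity check on the estimator.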

To compare the balancing algorithms, we used designs with $H=12$ strata and two importance variables $x,$ one with a small variation between sites and one with a medium variation. Under each scenario we generated $B=100{,}000$ random replications of a balanced sample, using the cube method on the one hand and two rejective algorithms on the other. The inclusion probability for site $\left(i,\,j\right)$ was estimated by

$\hat{\pi}_{ij}=\frac{1}{B\times H}\sum_{b=1}^{B}n_{ij}^{\left(b\right)}.$

This estimator assumes that the inclusion probabilities ${\pi }_{hij}$ are constant in $h.$ This holds true because the sample design is invariant to a relabelling of the days, see Section 3.1.
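As an illustration, the design inclusion probabilities and their Monte Carlo estimates can be computed as sketched below. The grouping of the nine $x$ values into three sectors of three sites, in the order in which they are listed in the tables, is an assumption made for this example, and the function names are hypothetical. The first-order probabilities do not depend on that grouping, since the product of the two stage probabilities reduces to $4x_{ij}/x_{\bullet\bullet}.$

```python
def design_inclusion_probs(x):
    """x[i][j] is the importance of site j in sector i; returns pi[i][j]."""
    x_total = sum(sum(row) for row in x)
    pi = []
    for row in x:
        x_sector = sum(row)
        pi_sector = 2 * x_sector / x_total            # stage 1: two sectors out of three
        pi.append([pi_sector * 2 * x_ij / x_sector    # stage 2: two sites out of three
                   for x_ij in row])
    return pi

def estimated_inclusion_prob(n_counts, H):
    """Average of the B replicated sample sizes n_ij^(b), divided by H."""
    return sum(n_counts) / (len(n_counts) * H)

# x values with a low variation, grouped by sector (assumed grouping)
x_low = [[3, 2, 3], [2, 2, 2], [3, 3, 4]]
pi_low = design_inclusion_probs(x_low)
print([[round(p, 3) for p in row] for row in pi_low])

# Four hypothetical replications of n_ij for one site, with H = 12 days
print(estimated_inclusion_prob([6, 6, 5, 7], 12))
```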

As argued in Section 3.2, the calibrated estimator ${\stackrel{^}{Y}}_{ij}$ is design unbiased under the two selection algorithms. We compare their standard deviations,

$\text{Sd}_{\hat{Y}_{ij}}=\left\{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{Y}_{ij}^{\left(b\right)}-\overline{\hat{Y}}_{ij}\right)^{2}\right\}^{1/2},$

where ${\overline{\stackrel{^}{Y}}}_{ij}$ is the average of the $B$ simulated values. The sample size standard deviations were also calculated using (3.2). Observe that ${\stackrel{^}{\pi }}_{ij}={\overline{n}}_{ij}/H.$ The simulation results are presented in Tables 4.1, 4.2 and 4.3.
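The Monte Carlo standard deviation above uses the divisor $B-1.$ A minimal sketch of that computation, with a hypothetical function name and made-up replicate values, is:

```python
import math

def monte_carlo_sd(values):
    """Standard deviation of B simulated estimates, with divisor B - 1."""
    B = len(values)
    if B < 2:
        raise ValueError("at least two replications are needed")
    mean = sum(values) / B
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (B - 1))

# Eight made-up replicates of an estimated total, for illustration only
replicates = [520.0, 540.0, 540.0, 540.0, 550.0, 550.0, 570.0, 590.0]
sd = monte_carlo_sd(replicates)
print(round(sd, 2))
```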

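Table 4.1 ($x$ has a low variation)

| Sector | Site | $x_{ij}$ | $\pi_{ij}$ | CM: $\overline{\hat{\pi}}_{ij}$ | CM: Sd | $R\,5\%$: $\overline{\hat{\pi}}_{ij}$ | $R\,5\%$: Sd | $R\,50\%$: $\overline{\hat{\pi}}_{ij}$ | $R\,50\%$: Sd |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 0.500 | 0.500 | 16.56 | 0.503 | 16.86 | 0.505 | 17.40 |
| 1 | 2 | 2 | 0.333 | 0.333 | 22.20 | 0.329 | 23.35 | 0.328 | 25.07 |
| 1 | 3 | 3 | 0.500 | 0.500 | 23.99 | 0.503 | 24.47 | 0.505 | 25.15 |
| 2 | 4 | 2 | 0.333 | 0.333 | 25.80 | 0.329 | 26.93 | 0.326 | 29.11 |
| 2 | 5 | 2 | 0.333 | 0.333 | 33.97 | 0.329 | 35.54 | 0.326 | 38.28 |
| 2 | 6 | 2 | 0.333 | 0.333 | 27.65 | 0.329 | 28.87 | 0.326 | 31.10 |
| 3 | 7 | 3 | 0.500 | 0.500 | 22.50 | 0.502 | 22.88 | 0.502 | 23.66 |
| 3 | 8 | 3 | 0.500 | 0.500 | 20.02 | 0.502 | 20.20 | 0.502 | 20.94 |
| 3 | 9 | 4 | 0.667 | 0.667 | 22.01 | 0.674 | 21.98 | 0.679 | 22.25 |

Table 4.2 ($x$ has a medium variation)

| Sector | Site | $x_{ij}$ | $\pi_{ij}$ | CM: $\overline{\hat{\pi}}_{ij}$ | CM: Sd | $R\,5\%$: $\overline{\hat{\pi}}_{ij}$ | $R\,5\%$: Sd | $R\,50\%$: $\overline{\hat{\pi}}_{ij}$ | $R\,50\%$: Sd |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 0.500 | 0.500 | 25.52 | 0.505 | 25.78 | 0.507 | 26.60 |
| 1 | 2 | 2 | 0.333 | 0.333 | 25.25 | 0.330 | 26.26 | 0.329 | 28.16 |
| 1 | 3 | 3 | 0.500 | 0.500 | 21.12 | 0.505 | 21.36 | 0.507 | 22.03 |
| 2 | 4 | 1 | 0.167 | 0.167 | 29.17 | 0.158 | 32.45 | 0.149 | 31.19 |
| 2 | 5 | 2 | 0.333 | 0.333 | 13.73 | 0.329 | 14.38 | 0.326 | 15.49 |
| 2 | 6 | 2 | 0.333 | 0.333 | 32.82 | 0.329 | 34.22 | 0.326 | 36.91 |
| 3 | 7 | 2 | 0.333 | 0.333 | 16.84 | 0.329 | 17.52 | 0.325 | 18.85 |
| 3 | 8 | 4 | 0.667 | 0.667 | 18.68 | 0.672 | 18.70 | 0.678 | 18.89 |
| 3 | 9 | 5 | 0.833 | 0.833 | 8.06 | 0.844 | 7.81 | 0.854 | 7.67 |

Table 4.3 (standard deviations of the sample sizes)

| Sector | Site | Low variation: $x$ | Low variation: CM | Low variation: $R\,5\%$ | Low variation: $R\,50\%$ | Medium variation: $x$ | Medium variation: CM | Medium variation: $R\,5\%$ | Medium variation: $R\,50\%$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 0.000 | 0.894 | 1.371 | 3 | 0.000 | 0.891 | 1.371 |
| 1 | 2 | 2 | 0.000 | 0.854 | 1.295 | 2 | 0.000 | 0.831 | 1.294 |
| 1 | 3 | 3 | 0.000 | 0.896 | 1.377 | 3 | 0.000 | 0.891 | 1.374 |
| 2 | 4 | 2 | 0.130 | 0.828 | 1.293 | 1 | 0.144 | 0.654 | 1.013 |
| 2 | 5 | 2 | 0.195 | 0.832 | 1.298 | 2 | 0.170 | 0.831 | 1.290 |
| 2 | 6 | 2 | 0.179 | 0.826 | 1.296 | 2 | 0.141 | 0.830 | 1.297 |
| 3 | 7 | 3 | 0.339 | 0.859 | 1.366 | 2 | 0.342 | 0.835 | 1.294 |
| 3 | 8 | 3 | 0.381 | 0.859 | 1.367 | 4 | 0.350 | 0.807 | 1.294 |
| 3 | 9 | 4 | 0.319 | 0.822 | 1.288 | 5 | 0.248 | 0.655 | 1.010 |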

In Tables 4.1 and 4.2, the cube method maintains the selection probabilities and yields a total estimator with the smallest standard deviations. Taking $\gamma^{2}$ equal to the $50^{\text{th}}$ percentile of the $\chi_{8}^{2}$ distribution for the rejective algorithm yields the poorest results, both in terms of the selection probabilities and of the standard deviations of $\overline{y}_{sij}.$ The largest biases for the selection probabilities occur at the extreme $x$ values in Table 4.2. The selection probability for site $j=4$ is underestimated by 11% with the rejective method based on the $50^{\text{th}}$ percentile and by 5% with the $5^{\text{th}}$ percentile. The probability is overestimated in the sites with the largest values of $x.$
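The quoted biases can be recovered from the rounded entries reported for the site with $x=1$ under medium variation, where the design probability is $4\times 1/24=1/6.$ The short check below is a sketch using those rounded values; the function name is hypothetical.

```python
def relative_bias(estimate, target):
    """Relative deviation of an estimated probability from its design value."""
    return (estimate - target) / target

pi_design = 4 * 1 / 24   # x = 1 and the x values sum to 24

rb_r5 = relative_bias(0.158, pi_design)    # rejective algorithm, 5th percentile
rb_r50 = relative_bias(0.149, pi_design)   # rejective algorithm, 50th percentile
print(f"{rb_r5:.1%} {rb_r50:.1%}")  # prints -5.2% -10.6%
```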

In Tables 4.1 and 4.2, the standard deviation of $\hat{Y}_{ij}$ is, in most cases, smallest for the cube method and largest for the rejective algorithm based on the $50^{\text{th}}$ percentile. The standard deviations for the rejective algorithm are up to 10% larger than the ones for the cube method. In Table 4.2, the largest gain in efficiency of the cube method with respect to the $R\,5\%$ rejective algorithm (equal to the ratio of standard deviations squared) is 23%; it occurs when $j=4$ and $x=1.$ These standard deviations are driven by the variability in the sample sizes $n_{ij}.$ Table 4.3 gives the standard deviations of the sample sizes. Since the expected numbers of visits to sector 1 and to sites 1, 2, and 3 are integers, the cube method is able to produce sample sizes equal to their expectations for this sector, and the sample size standard deviations are 0. This is not possible in sectors 2 and 3, as the expected sample sizes for these sectors are not integer valued. In general, the rejective algorithms give sample sizes whose standard deviations are much larger than those for the cube method. This makes the total estimators obtained with the rejective algorithms more variable than those obtained with the cube method.

The conditional variance estimator for fishing effort ${\stackrel{^}{Y}}_{ij}$ in site $\left(i,\text{\hspace{0.17em}}j\right)$ proposed in Section 3.2 is

$v\left(\hat{Y}_{ij}\right)=\frac{H^{2}\left(1-n_{ij}/H\right)}{n_{ij}}\sum_{h\in s_{ij}}\frac{\left(y_{hij}-\overline{y}_{sij}\right)^{2}}{n_{ij}-1}.$
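A direct transcription of this variance estimator is sketched below. The function name and the example values are hypothetical; the code only evaluates the formula for the days on which a given site was sampled.

```python
def conditional_variance(y_sampled, H):
    """Evaluate v(Y_hat_ij) from the efforts observed on the n_ij sampled days."""
    n = len(y_sampled)
    if n < 2:
        raise ValueError("the estimator needs at least two sampled days")
    y_bar = sum(y_sampled) / n
    s2 = sum((y - y_bar) ** 2 for y in y_sampled) / (n - 1)   # sample variance
    return H ** 2 * (1 - n / H) / n * s2

# Made-up efforts for n_ij = 4 sampled days out of H = 12
v = conditional_variance([40, 50, 45, 55], 12)
print(v)
```

The factor $1-n_{ij}/H$ acts as a finite population correction: the estimator is 0 when the site is sampled on all $H$ days.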

The conditional sampling properties, given ${n}_{ij},$ of this variance estimator were investigated in the Monte Carlo study with $B=\text{10,000}$ balanced samples for the three sample designs. For each site and for each sample size ${n}_{ij}$ the conditional variance $\text{Var}\left({\stackrel{^}{Y}}_{ij}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{n}_{ij}\right)$ and the conditional expectation of the variance estimator $\text{E}\left\{v\left({\stackrel{^}{Y}}_{ij}\right)\right\}$ were evaluated using the Monte Carlo samples for which the sample size for site $\left(i,\text{\hspace{0.17em}}j\right)$ was ${n}_{ij}.$ The conditional relative bias of the variance estimator, $\text{E}\left\{v\left({\stackrel{^}{Y}}_{ij}\right)\right\}/\text{Var}\left({\stackrel{^}{Y}}_{ij}\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{n}_{ij}\right)-1,$ was then calculated. The conditional relative biases were then aggregated by weighting each sample size ${n}_{ij}$ using its frequency in the 10,000 Monte Carlo samples; the results are in Table 5.1.
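The aggregation step can be sketched as follows. This is an illustration under assumptions rather than the authors' code: each Monte Carlo sample is summarized by a hypothetical tuple $(n_{ij},\,\hat{Y}_{ij},\,v)$, the conditional variance is computed with divisor "group size minus one", and sample sizes observed fewer than twice (or with zero conditional variance) are skipped.

```python
from collections import defaultdict
import statistics

def aggregated_relative_bias(replications):
    """replications: list of (n_ij, Y_hat, v) tuples, one per Monte Carlo sample."""
    groups = defaultdict(list)
    for n, y_hat, v in replications:
        groups[n].append((y_hat, v))

    weighted_sum, total_weight = 0.0, 0
    for items in groups.values():
        if len(items) < 2:
            continue   # assumption: skip sample sizes seen only once
        cond_var = statistics.variance([y_hat for y_hat, _ in items])
        if cond_var == 0:
            continue   # assumption: skip degenerate groups
        mean_v = statistics.fmean([v for _, v in items])
        # conditional relative bias, weighted by the frequency of this n_ij
        weighted_sum += len(items) * (mean_v / cond_var - 1)
        total_weight += len(items)
    if total_weight == 0:
        raise ValueError("no sample size was observed at least twice")
    return weighted_sum / total_weight

# Tiny made-up example with two observed sample sizes
reps = [(4, 10.0, 8.0), (4, 14.0, 8.0),
        (5, 20.0, 5.0), (5, 22.0, 5.0), (5, 24.0, 5.0)]
print(aggregated_relative_bias(reps))
```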

In Table 5.1, the aggregated relative biases are less than 3% in absolute value for the three selection algorithms. This validates the conditional variance estimator proposed in Section 3.2 for a single cell of the cross-classified table. The conditional variance of a sum such as $\hat{Y}_{ij}+\hat{Y}_{ij'}$ is more complicated, as it involves joint selection probabilities; the estimation of these variances is not considered here. See Breidt and Chauvet (2011) for a discussion of variance estimation with the cube method.

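Table 5.1 (aggregated conditional relative biases of the variance estimator, in %)

| Sector | Site | Low variation: $x$ | Low variation: CM | Low variation: $R\,5\%$ | Low variation: $R\,50\%$ | Medium variation: $x$ | Medium variation: CM | Medium variation: $R\,5\%$ | Medium variation: $R\,50\%$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 1 | -3 | 3 | 3 | -1 | 1 | 1 |
| 1 | 2 | 2 | 2 | -1 | -2 | 2 | 3 | 1 | -2 |
| 1 | 3 | 3 | -1 | 0 | 1 | 3 | 0 | -1 | 0 |
| 2 | 4 | 2 | -2 | 2 | 0 | 1 | 1 | -1 | -2 |
| 2 | 5 | 2 | 1 | -1 | -1 | 2 | 2 | 2 | 3 |
| 2 | 6 | 2 | 0 | 3 | -2 | 2 | 0 | 0 | -3 |
| 3 | 7 | 3 | 1 | -3 | 2 | 2 | 0 | -3 | -1 |
| 3 | 8 | 3 | 2 | 1 | 1 | 4 | 0 | 0 | 0 |
| 3 | 9 | 4 | -1 | 1 | -2 | 5 | -2 | -1 | 1 |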

The conclusion of this Monte Carlo investigation is that the rejective algorithm changes the selection probabilities: sites with small importance are underrepresented in the rejective samples, while the cube method is very good at preserving the selection probabilities. Under both algorithms, the calibrated estimator for the total of $y$ in a domain is unbiased. Smaller variances are, however, obtained with the cube algorithm, as it gives domain sample sizes that are less variable than those of the rejective algorithm.

