# Using balanced sampling in creel surveys: Section 3. A creel survey for striped bass in the Gaspé Peninsula

The Gaspé Peninsula is on the Canadian East Coast in the Province of Québec. In 2015 a creel survey for striped bass was conducted in this peninsula as recreational striped bass fishing had just been reintroduced after a long moratorium.

The study area, presented in Figure 3.1, stretches over more than 250 km of the Gaspé Peninsula coast. The survey is carried out by a single wildlife agent, who cannot visit two distant sites on the same day. For that reason, neighboring sites are grouped into three sectors, as shown in Figure 3.1. We consider the survey for the 33 holidays. The survey variable is the fishing effort, measured in hours of fishing. As some sites attract more fishermen than others, the number of visits to site $l$ of sector $i$ has to be proportional to its importance ${x}_{il},$ given in Table 3.1. In addition, for the purpose of the survey, a day is divided into three periods (AM, PM and EV, where EV stands for evening) and six subperiods (AM1, AM2, PM1, PM2, EV1 and EV2). For instance, AM1 goes from 8:00 to 10:00 while AM2 goes from 10:00 to 12:00. A working day contains two periods and four subperiods; if the agent works AM and PM, for instance, then he has a free evening. Thus during a working day he is able to visit four sites, two per working period.

The survey population on a day consists of 54 quadruplets, $\left(\text{sector}\times \text{period}\times \text{subperiod}\times \text{site}\right),$ 4 of which are sampled. To denote population units the following indices are useful:

1. $h=1,\dots ,\text{\hspace{0.17em}}H=33$ represents the days;
2. $i=1,2,3$ stands for the sectors in Figure 3.1;
3. $j=1,2,3$ denotes a period within a day;
4. $k=1,2$ represents the subperiods within a period;
5. $l=1,2,3$ represents the sites, see Figure 3.1, within a sector.

The goal is to estimate the fishing effort for each combination of subperiod (6 levels) and site (9 levels). We want to plan a survey with a predetermined sample size for each of the 54 cells of the cross-classified table. The basic selection probabilities are

${\pi }_{hijkl}=\frac{2{x}_{il}}{3{x}_{••}},\qquad \left(3.1\right)$

where replacing $i$ or $l$ by $•$ means that a summation is taken on the corresponding index. Observe that the sum of ${\pi }_{hijkl}$ over the indices $\left(i,j,k,l\right)$ is equal to 4, the number of units visited by the wildlife technician on a single day.
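As a quick numerical check of (3.1), the minimal sketch below computes the basic selection probabilities from the importance values ${x}_{il}$ of Table 3.1 and sums them over the 54 units of a day. The dictionary layout and the names `X`, `basic_probability`, `daily_total` and `expected_site_visits` are illustrative assumptions, not part of the original study.

```python
from fractions import Fraction

# Importance x_il of each site (values from Table 3.1), keyed by (sector i, site l).
X = {
    (1, 1): 2, (1, 2): 1, (1, 3): 2,   # East
    (2, 4): 1, (2, 5): 1, (2, 6): 1,   # Centre
    (3, 7): 2, (3, 8): 1, (3, 9): 2,   # West
}
X_TOTAL = sum(X.values())  # x_.. = 13
H = 33                     # number of survey days


def basic_probability(i, l):
    """Selection probability (3.1); it is the same for every day, period and subperiod."""
    return Fraction(2 * X[(i, l)], 3 * X_TOTAL)


def daily_total():
    """Sum of (3.1) over sites, the 3 periods and the 2 subperiods per period."""
    return sum(
        basic_probability(i, l)
        for (i, l) in X
        for _period in range(3)
        for _subperiod in range(2)
    )


def expected_site_visits(i, l):
    """Expected number of visits to a site over the H days and the 6 subperiods."""
    return H * 6 * basic_probability(i, l)


print(daily_total())                         # number of units visited per day
print(float(expected_site_visits(1, 1)))     # compare with E(n_il) in Table 3.1
```

Exact fractions are used so that the daily total can be compared with 4 without rounding error.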

At first glance, the sample could be drawn in a single stage using selection probabilities (3.1) by balancing on the 54 site by subperiod indicator variables. This is not feasible because of two operational constraints. First, on a single day the technician visits sites from the same sector, to limit traveling between sites. Second, on a working day the technician is off duty for the two subperiods of the same period. To meet these constraints we propose, in the next section, a design with three levels of sampling: sectors are selected at level 1, periods at level 2 and sites at level 3.

Description for Figure 3.1

Geographical map showing the nine sites to be surveyed for striped bass. The map is divided into three sectors: East, Centre and West. There are three surveyed sites in each sector. In the East sector, there are Boom Défense, E. St-Jean and Barachois. In the Centre sector, there are Ste-T. de Gaspé, Chandler and Malbaie. In the West sector, there are Bonaventure, P. Henderson and C. Carleton.

Table 3.1. Average and expected number of visits to each site

| Sector | Site | ${x}_{il}$ | $E\left({n}_{il}\right)$ | ${\overline{n}}_{il}$ | ${\text{Sd}}_{{n}_{il}}$ |
|---|---|---|---|---|---|
| East $\left(i=1\right)$ | Boom Défense $\left(l=1\right)$ | 2 | 20.308 | 20.286 | 0.850 |
| | E. St-Jean $\left(l=2\right)$ | 1 | 10.154 | 10.153 | 0.621 |
| | Barachois $\left(l=3\right)$ | 2 | 20.308 | 20.296 | 0.881 |
| Centre $\left(i=2\right)$ | Ste-T. de Gaspé $\left(l=4\right)$ | 1 | 10.154 | 10.176 | 0.865 |
| | Malbaie $\left(l=5\right)$ | 1 | 10.154 | 10.155 | 0.880 |
| | Chandler $\left(l=6\right)$ | 1 | 10.154 | 10.162 | 0.881 |
| West $\left(i=3\right)$ | Bonaventure $\left(l=7\right)$ | 2 | 20.308 | 20.311 | 1.004 |
| | P. Henderson $\left(l=8\right)$ | 1 | 10.154 | 10.153 | 0.681 |
| | C. Carleton $\left(l=9\right)$ | 2 | 20.308 | 20.309 | 1.016 |

## 3.1  A balanced multi-stage design for creel survey

This section describes the three stages of the survey that ensure that the operational constraints presented in the previous section are met. It also gives, for each stage, the balancing variables.

The first stage is stratified by day; for each day a single sector is drawn with selection probabilities ${x}_{i•}/{x}_{••}.$ At level two, for each sector selected at level 1, two periods are selected out of 3 using simple random sampling (i.e., with selection probabilities 2/3). At level three, each sector × period combination is stratified by subperiod and one site is selected for each subperiod with selection probabilities ${x}_{il}/{x}_{i•}.$ In summary, the selection probabilities at the three levels are

${\pi }_{hi}^{\left(1\right)}=\frac{{x}_{i•}}{{x}_{••}},\quad {\pi }_{j\,|\,i}^{\left(2\right)}=\frac{2}{3},\quad {\pi }_{l\,|\,ijk}^{\left(3\right)}=\frac{{x}_{il}}{{x}_{i•}}.$

As expected the product ${\pi }_{hi}^{\left(1\right)}×{\pi }_{j\text{ }|\text{ }i}^{\left(2\right)}×{\pi }_{l\text{ }|\text{ }ijk}^{\left(3\right)}$ is equal to (3.1), the target selection probability.
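The three levels can be illustrated for a single day with the sketch below. It draws the sector, the periods and the sites independently with Python's `random` module and ignores the balancing constraints entirely, so it only shows how the operational constraints are built into the design. The function name `draw_day` and the data layout are assumptions made for illustration.

```python
import random

# Importance x_il (Table 3.1), grouped by sector.
X = {
    "East": {"Boom Défense": 2, "E. St-Jean": 1, "Barachois": 2},
    "Centre": {"Ste-T. de Gaspé": 1, "Malbaie": 1, "Chandler": 1},
    "West": {"Bonaventure": 2, "P. Henderson": 1, "C. Carleton": 2},
}
PERIODS = {"AM": ("AM1", "AM2"), "PM": ("PM1", "PM2"), "EV": ("EV1", "EV2")}


def draw_day(rng):
    """Draw one day's four visits as (sector, period, subperiod, site) tuples."""
    # Level 1: one sector, with probability x_i. / x_..
    sectors = list(X)
    sector_weights = [sum(X[s].values()) for s in sectors]
    sector = rng.choices(sectors, weights=sector_weights)[0]

    # Level 2: two periods out of three by simple random sampling,
    # put back in chronological order.
    period_order = list(PERIODS)
    periods = sorted(rng.sample(period_order, 2), key=period_order.index)

    # Level 3: one site per subperiod, with probability x_il / x_i.
    sites = list(X[sector])
    site_weights = [X[sector][s] for s in sites]
    visits = []
    for period in periods:
        for subperiod in PERIODS[period]:
            site = rng.choices(sites, weights=site_weights)[0]
            visits.append((sector, period, subperiod, site))
    return visits


for visit in draw_day(random.Random(2015)):
    print(visit)
```

Whatever the random draw, the four visits fall in a single sector and in two working periods, which is what the two operational constraints require. The predetermined cell sizes, in contrast, are not controlled by this unbalanced draw.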

The goal is still to get a sample with predetermined sample sizes for the 54 site by subperiod combinations. Thus balanced sampling needs to be implemented at each stage. At level 1 we need to balance on the indicator variables for the three sectors while at level 2 balancing on the 9 indicator variables for the sector by period combinations is needed. Balancing at level 3 is slightly more complicated as it involves several strata.

At level 2, $33×2=66$ sector × period combinations have been selected. Each one is stratified by subperiod, so there are 132 strata at level 3 and one site is selected from each. Balancing is needed with respect to the 54 site by subperiod indicator functions. This is a complex problem, and the balancing constraints (2.3) involve the inverse of a large variance-covariance matrix. Thus, to implement a rejective algorithm in this context, one would need an alternative to criterion (2.3) for accepting a sample. For now we discuss the implementation of balanced sampling for this design with the cube method. Comparisons between the cube method and rejective sampling in the context of a simplified creel survey are presented in Section 4.

Among the 132 third-stage strata, the number of strata for one subperiod, say AM2, in sector $i$ is an integer close to $22{x}_{i•}/{x}_{••}$ that depends on the stage 2 sample. This integer plays the role of ${\sum }_{i=1}^{N}{I}_{i}\left(\omega \right)$ in equation (2.2) for balancing the sites of sector $i$ at stage 3 while, for the ${l}^{\text{th}}$ site, the probability in (2.2) is ${\pi }_{\omega }={x}_{il}/{x}_{i•}.$ The stage 3 calibration equations for the 54 site by subperiod indicator functions can be described in a similar way. Clearly, the 54 balancing equations cannot all be met exactly, and the cube method gives a sample that is only approximately balanced.
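The arithmetic behind this remark can be sketched as follows, using the sector totals ${x}_{i•}$ implied by Table 3.1. The names `target_strata` and `target_cell_size` are hypothetical helpers introduced for illustration only.

```python
from fractions import Fraction

X_SECTOR = {"East": 5, "Centre": 3, "West": 5}  # x_i. from Table 3.1
X_TOTAL = sum(X_SECTOR.values())                 # x_.. = 13
VISITS_PER_SUBPERIOD = 22                        # 132 strata / 6 subperiods


def target_strata(sector):
    """Target number of stage 3 strata for one subperiod in a sector."""
    return Fraction(VISITS_PER_SUBPERIOD * X_SECTOR[sector], X_TOTAL)


def target_cell_size(x_il):
    """Target number of visits, in one subperiod, to a site of importance x_il."""
    return Fraction(VISITS_PER_SUBPERIOD * x_il, X_TOTAL)


for sector in X_SECTOR:
    print(sector, target_strata(sector))   # 110/13 or 66/13
print(target_cell_size(1), target_cell_size(2))
```

The targets are about 8.46 strata for the East and West sectors and 5.08 for the Centre sector, and about 1.69 or 3.38 visits per cell. None of these is an integer, so the realized counts can only approximate them.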

The approximation occurs at the landing phase of the algorithm where balancing constraints are dropped in order to complete the selection of the sample, as introduced in Deville and Tillé (2004). As the stage 3 sample is highly stratified, we use the implementation of the landing phase in the function balancedstratification2 developed in Hasler and Tillé (2014), with a small correction that prevents it from stopping when the sample is already balanced at the start of the landing phase. In the matrix of balancing constraints, the site constraints were given more importance than those which make visits to each site equally distributed among subperiods at level 3. They were the last ones to be dropped at the landing phase of the cube method.

To investigate how a failure to meet all balancing equations impacted the sample design, we generated $B=\text{10,000}$ random replications of the balanced sample. The number of visits ${n}_{il}$ to site $\left(i,\text{\hspace{0.17em}}l\right)$ was noted. Table 3.1 compares the average ${\overline{n}}_{il}$ of ${n}_{il}$ over the Monte Carlo replications,

${\overline{n}}_{il}=\frac{1}{B}\sum _{b=1}^{B}{n}_{il}^{\left(b\right)},$

to its expectation, $E\left({n}_{il}\right).$ For all practical purposes, the two are equal and a failure to meet some balancing equations has no impact on the site selection probabilities. Table 3.1 also reports the standard deviations

${\text{Sd}}_{{n}_{il}}={\left\{\frac{1}{B-1}\sum _{b=1}^{B}{\left({n}_{il}^{\left(b\right)}-{\overline{n}}_{il}\right)}^{2}\right\}}^{1/2}.\qquad \left(3.2\right)$

Most of the standard deviations are less than 1 in Table 3.1. Thus the absolute differences between target and realized sample sizes are less than or equal to 2 for most Monte Carlo samples.
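For readers who want to reproduce this kind of summary, the sketch below computes the Monte Carlo average and the standard deviation (3.2) for one site. The replicate counts are hypothetical values chosen for illustration, not output from the study, and `monte_carlo_summary` is an assumed helper name.

```python
import math


def monte_carlo_summary(counts):
    """Return the average and the standard deviation (3.2) of replicated counts."""
    b = len(counts)
    mean = sum(counts) / b
    # Divisor B - 1, as in (3.2).
    variance = sum((n - mean) ** 2 for n in counts) / (b - 1)
    return mean, math.sqrt(variance)


# Hypothetical visit counts n_il^(b) for one site over B = 5 replications.
replicates = [20, 21, 20, 19, 20]
mean, sd = monte_carlo_summary(replicates)
print(mean, round(sd, 3))
```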

Table 3.2 gives the expected and average numbers of visits in the 6 subperiods; the averages all equal 22 to two decimal places, with standard deviations less than 0.2. Thus the period and subperiod constraints are met. Table 3.3 gives a realized sample for the first five days of the creel survey. It shows a harmonious permutation of sectors at level 1, periods at level 2, and sites at level 3 through the days because of the way in which the sample design was constructed. Given a balanced sample produced by the cube algorithm, an arbitrary permutation of the days gives an alternative balanced sample, since the sampling design is invariant to a relabeling of the days. For instance, with the sample of Table 3.3 the technician has to travel from the western to the eastern sector between days 4 and 5. To avoid this long trip one could interchange days 1 and 5: the first two days would then be spent in the eastern sector, and between days 4 and 5 the technician would travel from the western to the central sector. The alternative and the original samples have the same estimated totals for the calibration variables.
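The interchange of days 1 and 5 can be sketched on the sector sequence of Table 3.3. The helper names `swap_days` and `sector_changes` are assumptions made for illustration; only the sector of each day is tracked here, whereas in practice the whole set of visits for a day moves with its label.

```python
from collections import Counter

# Sector visited on each of the first five days (Table 3.3).
sectors_by_day = {1: "Centre", 2: "East", 3: "Centre", 4: "West", 5: "East"}


def swap_days(schedule, a, b):
    """Return a copy of the schedule with the labels of days a and b interchanged."""
    swapped = dict(schedule)
    swapped[a], swapped[b] = schedule[b], schedule[a]
    return swapped


def sector_changes(schedule):
    """Count how many times the sector changes between consecutive days."""
    days = sorted(schedule)
    return sum(schedule[d1] != schedule[d2] for d1, d2 in zip(days, days[1:]))


alternative = swap_days(sectors_by_day, 1, 5)
print([alternative[d] for d in sorted(alternative)])
print(sector_changes(sectors_by_day), sector_changes(alternative))

# The number of days spent in each sector is unchanged by the relabeling.
print(Counter(sectors_by_day.values()) == Counter(alternative.values()))
```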

Table 3.2. Average and expected number of visits at each subperiod

| Period | Subperiod | $E\left({n}_{jk}\right)$ | ${\overline{n}}_{jk}$ | ${\text{Sd}}_{{n}_{jk}}$ |
|---|---|---|---|---|
| Morning $\left(j=1\right)$ | 8h00-10h00 $\left(k=1\right)$ | 22 | 22.000 | 0.000 |
| | 10h00-12h00 $\left(k=2\right)$ | 22 | 22.000 | 0.000 |
| Afternoon $\left(j=2\right)$ | 12h00-15h00 $\left(k=3\right)$ | 22 | 21.999 | 0.184 |
| | 15h00-18h00 $\left(k=4\right)$ | 22 | 21.999 | 0.184 |
| Evening $\left(j=3\right)$ | 18h00-20h30 $\left(k=5\right)$ | 22 | 22.001 | 0.184 |
| | 20h30-23h00 $\left(k=6\right)$ | 22 | 22.001 | 0.184 |

Table 3.3. Units selected in a balanced sample for the first five days

| Day $h$ | Sector | Period | Subperiod | Site |
|---|---|---|---|---|
| 1 | Centre $\left(i=2\right)$ | Afternoon $\left(j=2\right)$ | 12h-15h $\left(k=3\right)$ | Chandler $\left(l=6\right)$ |
| | | | 15h-18h $\left(k=4\right)$ | Malbaie $\left(l=5\right)$ |
| | | Evening $\left(j=3\right)$ | 18h-20h30 $\left(k=5\right)$ | Chandler $\left(l=6\right)$ |
| | | | 20h30-23h $\left(k=6\right)$ | Ste-T. de Gaspé $\left(l=4\right)$ |
| 2 | East $\left(i=1\right)$ | Morning $\left(j=1\right)$ | 8h-10h $\left(k=1\right)$ | E. St-Jean $\left(l=2\right)$ |
| | | | 10h-12h $\left(k=2\right)$ | Boom Défense $\left(l=1\right)$ |
| | | Evening $\left(j=3\right)$ | 18h-20h30 $\left(k=5\right)$ | Barachois $\left(l=3\right)$ |
| | | | 20h30-23h $\left(k=6\right)$ | E. St-Jean $\left(l=2\right)$ |
| 3 | Centre $\left(i=2\right)$ | Morning $\left(j=1\right)$ | 8h-10h $\left(k=1\right)$ | Malbaie $\left(l=5\right)$ |
| | | | 10h-12h $\left(k=2\right)$ | Ste-T. de Gaspé $\left(l=4\right)$ |
| | | Afternoon $\left(j=2\right)$ | 12h-15h $\left(k=3\right)$ | Malbaie $\left(l=5\right)$ |
| | | | 15h-18h $\left(k=4\right)$ | Chandler $\left(l=6\right)$ |
| 4 | West $\left(i=3\right)$ | Morning $\left(j=1\right)$ | 8h-10h $\left(k=1\right)$ | P. Henderson $\left(l=8\right)$ |
| | | | 10h-12h $\left(k=2\right)$ | Bonaventure $\left(l=7\right)$ |
| | | Afternoon $\left(j=2\right)$ | 12h-15h $\left(k=3\right)$ | C. Carleton $\left(l=9\right)$ |
| | | | 15h-18h $\left(k=4\right)$ | C. Carleton $\left(l=9\right)$ |
| 5 | East $\left(i=1\right)$ | Afternoon $\left(j=2\right)$ | 12h-15h $\left(k=3\right)$ | Boom Défense $\left(l=1\right)$ |
| | | | 15h-18h $\left(k=4\right)$ | Barachois $\left(l=3\right)$ |
| | | Evening $\left(j=3\right)$ | 18h-20h30 $\left(k=5\right)$ | Boom Défense $\left(l=1\right)$ |
| | | | 20h30-23h $\left(k=6\right)$ | Barachois $\left(l=3\right)$ |

## 3.2  Estimation of the fishing effort and of its variance

Once the survey is completed, the sample is a set of site $×$ subperiod units $\left\{\left(h,i,j,k,l\right)\right\}$ with sampling weights equal to the inverse of the selection probabilities given in (3.1). As the balancing equations for the 54 cells of the site by subperiod cross-classified table are not met exactly, we propose, following Deville and Tillé (2004), calibrating the survey weights on the total, $H,$ of the indicator variables for these 54 cells. All the sampled units in cell $\left(i,j,k,l\right)$ have the same weight, namely $1/{\pi }_{ijkl}$ where ${\pi }_{ijkl}={\pi }_{hijkl},$ defined in (3.1), does not depend on $h.$ The calibrated weight for a sampled unit in cell $\left(i,j,k,l\right)$ is

${w}_{ijkl}^{\left(c\right)}=\frac{1}{{\pi }_{ijkl}}×\frac{H}{{n}_{ijkl}/{\pi }_{ijkl}}=\frac{H}{{n}_{ijkl}},$

where ${n}_{ijkl}$ is the sample size for cell $\left(i,j,k,l\right);$ it is the number of days on which site $l$ of sector $i$ has been visited during subperiod $k$ of period $j.$ In general ${n}_{ijkl}$ is a random variable. When the samples are perfectly balanced, (2.2) implies that ${n}_{ijkl}=H{\pi }_{ijkl};$ the calibrated and basic weights are then equal. Now if ${y}_{hijkl}$ represents the fishing effort for population unit $\left(h,i,j,k,l\right),$ the fishing effort in cell $\left(i,j,k,l\right)$ is ${Y}_{Uijkl}={\sum }_{h}{y}_{hijkl}.$ Its calibrated estimator is ${\hat{Y}}_{ijkl}=H\,{\overline{y}}_{sijkl}$ where ${\overline{y}}_{sijkl}$ is the average fishing effort for the ${n}_{ijkl}$ units sampled for that cell of the cross-classified table. An estimator for the total fishing effort is obtained by summing the cells' estimated totals.
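A minimal sketch of this calibrated estimator is given below. The observations are hypothetical, the cells are identified by an assumed (subperiod, site) label, and the function name `calibrated_estimates` is introduced for illustration only.

```python
from collections import defaultdict

H = 33  # number of survey days


def calibrated_estimates(sample):
    """Estimate the cell totals and their sum.

    `sample` is a list of (cell, y) pairs, where `cell` identifies a site by
    subperiod combination and `y` is the fishing effort observed on one day.
    """
    efforts = defaultdict(list)
    for cell, y in sample:
        efforts[cell].append(y)
    # Each sampled unit in a cell carries the calibrated weight H / n_cell,
    # so the estimated cell total is H times the cell sample mean.
    cell_totals = {cell: H * sum(ys) / len(ys) for cell, ys in efforts.items()}
    return cell_totals, sum(cell_totals.values())


# Hypothetical observations (hours of fishing) for two cells.
sample = [
    (("AM1", "Chandler"), 4.0),
    (("AM1", "Chandler"), 6.0),
    (("EV2", "Barachois"), 10.0),
]
cell_totals, total = calibrated_estimates(sample)
print(cell_totals, total)
```

Note that a cell with no sampled unit has no estimate in this sketch and therefore contributes nothing to the sum; how such cells should be handled is not covered here.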

The evaluation of a design-based variance estimator for the calibrated estimator of the total fishing effort is complex. A simple variance estimator is, however, available for the estimated total of a single cell of the cross-classified table. Neglecting the balancing constraints, the sample of days selected for cell $\left(i,j,k,l\right)$ is a Bernoulli sample with selection probabilities ${\pi }_{ijkl}.$ Thus, conditional on the sample size ${n}_{ijkl},$ ${\hat{Y}}_{ijkl}$ is $H$ times the sample mean of a simple random sample. It is a design-unbiased estimator whose variance can be estimated using the formula for the variance of an estimated total in a simple random sampling design. We claim that these results remain valid when the balancing constraints are taken into account, since the balanced sample design is invariant to a relabelling of the days. The estimated fishing efforts for the 54 cells of the cross-classified table are, however, dependent, and it seems difficult to come up with a conditionally unbiased design-based variance estimator for their total. A model-based estimator seems to be the only approach available for this total.
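For a single cell, the conditional variance estimator could be sketched as follows. The code assumes the usual textbook formula for the estimated variance of a total under simple random sampling without replacement, ${H}^{2}\left(1-n/H\right){s}^{2}/n,$ which the text refers to but does not write out; the efforts are hypothetical and `cell_variance_estimate` is an assumed name.

```python
import statistics

H = 33  # number of survey days (population size for one cell)


def cell_variance_estimate(efforts):
    """Estimated variance of H * mean(efforts), conditional on n = len(efforts)."""
    n = len(efforts)
    if n < 2:
        raise ValueError("at least two sampled days are needed in the cell")
    s2 = statistics.variance(efforts)  # sample variance with divisor n - 1
    # Assumed SRS formula with finite population correction (1 - n / H).
    return H ** 2 * (1 - n / H) * s2 / n


# Hypothetical fishing efforts (hours) observed on three days for one cell.
variance = cell_variance_estimate([4.0, 6.0, 8.0])
print(variance)
```

The estimator is undefined for cells visited only once, and it says nothing about the variance of the sum over cells, for which the text indicates that a model-based approach is needed.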

For the survey actually conducted in 2015, the methods used to estimate fishing effort and total catch are among those proposed in Pollock et al. (1994). It was a roving survey, and the fishing effort at a sampled site was calculated as the average number of anglers on the site during the subperiod times the length, in hours, of the subperiod. Fishing efforts were estimated using calibrated weights; additional results are available in Daigle, Crépeau, Bujold and Legault (2015).
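The effort calculation for one sampled site and subperiod amounts to the short computation below. The angler counts are hypothetical and `roving_effort` is an illustrative name; how and when the counts were taken within a subperiod in the 2015 survey is not specified here.

```python
def roving_effort(angler_counts, subperiod_hours):
    """Fishing effort in angler-hours: average angler count times subperiod length."""
    if not angler_counts:
        raise ValueError("at least one angler count is required")
    return sum(angler_counts) / len(angler_counts) * subperiod_hours


# Hypothetical counts taken during the 8:00 to 10:00 subperiod (2 hours).
effort = roving_effort([3, 5, 4], 2.0)
print(effort)
```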

