Cost optimal sampling for the integrated observation of different populations
Section 4. Informative contexts and optimization problem

Optimization problems such as (3.1) are largely theoretical, since one needs to know the values of the variables of interest in both populations $U^{A}$ and $U^{B},$ as well as the actual links among the units of the two populations. We now present three more concrete contexts involving varying amounts of information. In the first two contexts the information is very rich, whereas in the third it is very poor. The latter context is the most common, although the growing availability of administrative registers and statistical software tools for data integration increases the plausibility of the first two contexts.

Context 1. The sampling frames for $U^{A}$ and $U^{B}$ are available. All the values $L_{j}^{A},$ $L_{j, i}^{B}$ and $L_{i}^{B}$ are known and the values of $y_{j, v},$ $y_{i, r}$ are unknown but can be predicted by suitable superpopulation models.

This context may be realistic in countries, such as the Nordic ones, that have well-established register-based systems (Wallgren and Wallgren, 2014) in which the units of a given statistical register have good-quality unique identifiers, allowing the same unit to be identified across the whole system of registers. For the agricultural example, this means that one can link each farm to one or more rural households, and each rural household to one or more farms.

The working models that we study can be expressed in the following forms:

$\begin{array}{ll} \text{Unit level} & \text{Cluster level} \\ y_{j, v} = {\tilde{y}}_{j, v} + u_{j, v} = f_{v} (x_{j}; φ_{v}) + u_{j, v} & y_{i, r} = {\tilde{y}}_{i, r} + u_{i, r} = f_{r} (x_{i}; φ_{r}) + u_{i, r} \\ E_{M_{v}} (u_{j, v}) = 0, \; E_{M_{v}} (u_{j, v}^{2}) = σ_{j, v}^{2}, \; \forall j & E_{M_{r}} (u_{i, r}) = 0, \; E_{M_{r}} (u_{i, r}^{2}) = σ_{i, r}^{2}, \; \forall i \\ E_{M_{v}} (u_{j, v}, u_{l, v}) = 0, \; \forall j \neq l & E_{M_{r}} (u_{i, r}, u_{i^{'}, r}) = 0, \; \forall i \neq i^{'} \end{array} \qquad (4.1)$

where, omitting the subscripts for the sake of brevity, $x$ are vectors of predictors (available in the two sampling frames), $φ$ are the vectors of regression coefficients, $f (x; φ)$ are known functions, $u$ are the error terms, $\tilde{y}$ are the predicted values and $E_{M} (\cdot)$ denotes expectation under the models. The predictors $x$ in the unit and cluster level models can be different. We assume that the parameters of the models are known, although in practice they are usually estimated.

Even if the model $f_{r} (\cdot)$ is not known, the model expectations at cluster level for the population $U^{B}$ can be derived from a model defined at elementary unit level, denoted by $f_{r e} (\cdot) .$ The elementary unit level model can be stated as $y_{i k, r} = {\tilde{y}}_{i k, r} + u_{i k, r} = f_{r e} (x_{i k}; φ_{r}) + u_{i k, r};$ $E_{M_{r e}} (u_{i k, r}) = 0;$ $E_{M_{r e}} (u_{i k, r}^{2}) = σ_{r}^{2};$ $E_{M_{r e}} (u_{i k, r}, u_{i k^{'}, r}) = σ_{r}^{2} ρ_{r}$ $\forall k \neq k^{'};$ $E_{M_{r e}} (u_{i k, r}, u_{i^{'} k^{'}, r}) = 0$ $\forall i \neq i^{'};$ where $ρ_{r}$ is the intra-cluster correlation.

The model expectations at cluster level on the right-hand side of (4.1) can be easily derived as:

${\tilde{y}}_{i, r} = \sum_{k = 1}^{M_{i}^{B}} {\tilde{y}}_{i k, r};$ $σ_{i, r}^{2} = M_{i}^{B} σ_{r}^{2} [1 + (M_{i}^{B} - 1) ρ_{r}];$ $E_{M_{r}} (u_{i, r}, u_{i^{'}, r}) = 0$ for $i \neq i^{'} .$
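The aggregation from the elementary unit level to the cluster level can be illustrated with a short Python sketch. It is only a numerical illustration of the two formulas above; the function name `cluster_moments` and the example values are hypothetical and do not come from the paper.

```python
def cluster_moments(unit_predictions, sigma2_r, rho_r):
    """Aggregate an elementary-unit model to cluster level.

    unit_predictions: predicted values y~_{ik,r} for the M_i^B units of cluster i
    sigma2_r: common unit-level model variance sigma_r^2
    rho_r: intra-cluster correlation rho_r
    Returns the pair (y~_{i,r}, sigma_{i,r}^2).
    """
    m_i = len(unit_predictions)
    y_tilde_i = sum(unit_predictions)
    sigma2_i = m_i * sigma2_r * (1.0 + (m_i - 1) * rho_r)
    return y_tilde_i, sigma2_i


# Hypothetical cluster with three elementary units.
y_tilde, sigma2 = cluster_moments([2.0, 3.0, 5.0], sigma2_r=4.0, rho_r=0.5)
print(y_tilde, sigma2)
```

With $ρ_{r} = 0$ the cluster variance reduces to $M_{i}^{B} σ_{r}^{2},$ and with $ρ_{r} = 1$ it reaches ${(M_{i}^{B})}^{2} σ_{r}^{2},$ which is a quick way to check the inputs.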

Note that the working models (4.1) are variable specific. They are introduced as useful tools for developing the sampling design, but they do not necessarily represent exactly the real models generating the data.

According to (4.1), the model predictions and the variances of the $z$ variables are given by

$E_{M_{r}} (z_{j, r}) = {\tilde{z}}_{j, r} = \sum_{i = 1}^{N^{B}} {\tilde{L}}_{j, i}^{B} {\tilde{y}}_{i, r}$ and $V_{M_{r}} (z_{j, r}) = σ_{j, z r}^{2} = \sum_{i = 1}^{N^{B}} {({\tilde{L}}_{j, i}^{B})}^{2} σ_{i, r}^{2} . (4.2)$
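As a minimal sketch of (4.2), the following Python function computes the prediction and the model variance of $z_{j, r}$ for a single unit $j.$ It assumes that the weights ${\tilde{L}}_{j, i}^{B}$ have already been computed (for instance as standardized links, which is an assumption here since they are defined outside this section); the function name and the numbers are hypothetical.

```python
def z_moments(link_weights_j, y_tilde, sigma2):
    """Prediction and model variance of z_{j,r} following (4.2).

    link_weights_j: weights L~_{j,i}^B of unit j, one per cluster i
    y_tilde: cluster-level predictions y~_{i,r}
    sigma2: cluster-level model variances sigma_{i,r}^2
    """
    z_tilde = sum(w * y for w, y in zip(link_weights_j, y_tilde))
    sigma2_z = sum(w * w * s for w, s in zip(link_weights_j, sigma2))
    return z_tilde, sigma2_z


# Hypothetical unit linked to the first two of three clusters.
z_tilde_j, sigma2_zj = z_moments([0.5, 0.25, 0.0], [10.0, 20.0, 30.0], [4.0, 8.0, 12.0])
print(z_tilde_j, sigma2_zj)
```

The variance uses squared weights because the cluster-level errors are uncorrelated under (4.1).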

Thus, in the optimization problem (3.1), the variance terms, $V ({\hat{Y}}_{v}^{A} | m^{A})$ and $V ({\hat{Y}}_{r}^{B} | m^{A}),$ are replaced by the anticipated variances. Denoting by $E (\cdot)$ the expectation under the sampling design, the anticipated variance (AV) of ${\hat{Y}}_{v}^{A}$ may be reformulated as follows:

$AV ({\hat{Y}}_{v}^{A}) = E_{M_{v}} E {({\hat{Y}}_{v}^{A} - Y_{v}^{A})}^{2} = E_{M_{v}} V ({\hat{Y}}_{v}^{A} - Y_{v}^{A}) + V_{M_{v}} E ({\hat{Y}}_{v}^{A} - Y_{v}^{A}) .$

We have

$E ({\hat{Y}}_{v}^{A} - Y_{v}^{A}) = 0,$

and

$V ({\hat{Y}}_{v}^{A} - Y_{v}^{A}) = V ({\hat{Y}}_{v}^{A} | m^{A}) ≅ \sum_{j \in U^{A}} (\frac{1}{π_{j}^{A}} - 1) η_{j, v}^{2} .$

The same result may be derived for the estimate ${\hat{Y}}_{r}^{B} .$ Thus, we obtain the following expressions:

$AV ({\hat{Y}}_{v}^{A}) = E_{M_{v}} V ({\hat{Y}}_{v}^{A} | m^{A}) ≅ \sum_{j \in U^{A}} (\frac{1}{π_{j}^{A}} - 1) E_{M_{v}} (η_{j, v}^{2}) (4.3)$

$AV ({\hat{Y}}_{r}^{B}) = E_{M_{r}} V ({\hat{Y}}_{r}^{B} | m^{A}) ≅ \sum_{j \in U^{A}} (\frac{1}{π_{j}^{A}} - 1) E_{M_{r}} (η_{j, r}^{2}) (4.4)$

where $E_{M_{v}} (η_{j, v}^{2})$ and $E_{M_{r}} (η_{j, r}^{2})$ are given by expressions (A.2) and (B.2) of Appendices A and B.

The problem (3.1) for searching the optimal $π^{A}$ vector is then reformulated as follows:

${\begin{array}{l} min \sum_{j \in U^{A}} c_{j} π_{j}^{A} \\ E_{M_{v}} V ({\hat{Y}}_{v}^{A} | m^{A}) \leq V_{v}^{*} \forall v = 1, \dots, V \\ E_{M_{r}} V ({\hat{Y}}_{r}^{B} | m^{A}) \leq V_{r}^{*} \forall r = 1, \dots, R \\ 0 < π_{j}^{A} \leq 1 \forall j = 1, \dots, M^{A} . \end{array} (4.5)$

Remark 4.1. The anticipated variances in (4.5) have cumbersome formulae. A conservative simplified expression of $E_{M_{v}} V ({\hat{Y}}_{v}^{A} | m^{A})$ is given in Remark 4.1 of Falorsi and Righi (2015). Even simpler conservative approximations of both $E_{M_{v}} V ({\hat{Y}}_{v}^{A} | m^{A})$ and $E_{M_{r}} V ({\hat{Y}}_{r}^{B} | m^{A})$ are obtained by approximating the sampling design variance with the Poisson sampling variance. We then have

$E_{M_{v}} V ({\hat{Y}}_{v}^{A} | m^{A}) \leq \sum_{j \in U^{A}} (\frac{1}{π_{j}^{A}} - 1) E_{M_{v}} (y_{j, v}^{2}), E_{M_{r}} V ({\hat{Y}}_{r}^{B} | m^{A}) \leq \sum_{j \in U^{A}} (\frac{1}{π_{j}^{A}} - 1) E_{M_{r}} (z_{j, r}^{2}),$

replacing $η_{j, v}$ and $η_{j, r}$ by $y_{j, v}$ and $z_{j, r},$ respectively, where $E_{M_{v}} (y_{j, v}^{2}) = {\tilde{y}}_{j, v}^{2} + σ_{j, v}^{2}$ and $E_{M_{r}} (z_{j, r}^{2}) = {\tilde{z}}_{j, r}^{2} + σ_{j, z r}^{2}$ (see Appendix B). Conservative approximations are a safe choice in this setting, since they eliminate the risk of defining an insufficient sample size for the expected accuracies.
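To make the Poisson-type bound concrete, the Python sketch below evaluates it and solves a deliberately simplified version of problem (4.5) with a single variance constraint. This is not the authors' algorithm: it relies on the textbook Lagrangian solution $π_{j} \propto \sqrt{a_{j} / c_{j}},$ with $a_{j}$ the model second moment, and fixes at 1 any unit whose solution exceeds 1 before re-solving for the rest. The function names, the strictly positive inputs and the example values are assumptions; the general problem with several constraints requires a numerical solver.

```python
import math


def poisson_av_bound(pi, second_moments):
    """Conservative bound sum_j (1/pi_j - 1) * E_M(y_j^2) of Remark 4.1."""
    return sum((1.0 / p - 1.0) * a for p, a in zip(pi, second_moments))


def min_cost_probabilities(costs, second_moments, v_star):
    """Minimise sum_j c_j * pi_j subject to the bound <= v_star, 0 < pi_j <= 1.

    Illustrative single-constraint solver (an assumption, not the paper's method).
    Costs, second moments and v_star are assumed strictly positive.
    """
    pi = [1.0] * len(costs)
    free = set(range(len(costs)))
    while free:
        total = sum(second_moments[j] for j in free)
        weight = sum(math.sqrt(second_moments[j] * costs[j]) for j in free)
        scale = weight / (v_star + total)
        trial = {j: scale * math.sqrt(second_moments[j] / costs[j]) for j in free}
        capped = {j for j, p in trial.items() if p > 1.0}
        if not capped:
            for j, p in trial.items():
                pi[j] = p
            break
        free -= capped  # these units keep pi_j = 1 and add nothing to the bound
    return pi


# Hypothetical predictions, model variances and unit costs for three units.
y_tilde = [8.0, 16.0, 12.0]
sigma2 = [36.0, 144.0, 256.0]
moments = [y * y + s for y, s in zip(y_tilde, sigma2)]  # E_M(y_j^2)
costs = [1.0, 1.0, 4.0]

pi = min_cost_probabilities(costs, moments, v_star=900.0)
print(pi, poisson_av_bound(pi, moments))
```

Lowering `v_star` pushes some inclusion probabilities to 1, so those units become take-all units; the bound is still met with equality by the remaining units.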

Remark 4.2. Lavallée and Labelle-Blanchet (2013) deal with the problem of indirect sampling applied to skewed populations by suggesting eight alternative methods for modifying the links, $l_{j, i k},$ to reduce the variance of the estimates while keeping estimation unbiased. Using methods 2 and 3 proposed by these authors, the algorithm can be run by simply replacing the links $l_{j, i k}$ by weighted links, $θ_{j, i k},$ in $E_{M_{r}} V ({\hat{Y}}_{r}^{B} | m^{A}) .$

Context 2. The links $l_{j, i k}$ are not known with certainty but the probabilities of links existing, $\Pr (l_{j, i k} = 1) = λ_{j, i k},$ are available.

To include the linkage uncertainty in the optimization, we assume the links follow a Bernoulli model $M_{l},$ $l_{j, i k} \sim B (λ_{j, i k}),$ where $E_{M_{l}} (l_{j, i k}) = λ_{j, i k}$ and $V_{M_{l}} (l_{j, i k}) = λ_{j, i k} (1 - λ_{j, i k}) .$ We assume the parameters $λ_{j, i k}$ to be known, although in practice they are usually estimated with probabilistic record linkage procedures (Lavallée and Caron, 2001). For the agricultural example, such a situation would occur when, for instance, the population of farms is linked to the population of rural households using probabilistic record linkage because no common identifier exists. In this framework, the anticipated variance must take into account both models $M_{l}$ and $M_{r} .$ Since

$E_{M_{l}} E_{M_{r}} E {({\hat{Y}}_{r}^{B} - Y_{r}^{B})}^{2} = E_{M_{l}} E_{M_{r}} V ({\hat{Y}}_{r}^{B} - Y_{r}^{B}) + E_{M_{l}} V_{M_{r}} E ({\hat{Y}}_{r}^{B} - Y_{r}^{B}) + V_{M_{l}} E_{M_{r}} E ({\hat{Y}}_{r}^{B} - Y_{r}^{B})$

and $E ({\hat{Y}}_{r}^{B} - Y_{r}^{B}) = 0,$ the problem (4.5) can be reformulated as follows:

${\begin{array}{l} min \sum_{j \in U^{A}} E_{M_{l}} (c_{j}) π_{j}^{A} \\ E_{M_{v}} V ({\hat{Y}}_{v}^{A} | m^{A}) \leq V_{v}^{*} \forall v = 1, \dots, V \\ E_{M_{l}} E_{M_{r}} V ({\hat{Y}}_{r}^{B} | m^{A}) \leq V_{r}^{*} \forall r = 1, \dots, R \\ 0 < π_{j}^{A} \leq 1 \forall j = 1, \dots, M^{A} \end{array} (4.6)$

where

$E_{M_{l}} E_{M_{r}} V ({\hat{Y}}_{r}^{B} | m^{A}) ≅ \sum_{j \in U^{A}} (\frac{1}{π_{j}^{A}} - 1) E_{M_{l}} E_{M_{r}} (η_{j, r}^{2}), (4.7)$

$E_{M_{l}} (c_{j}) = f_{c} (Λ_{j}^{A}; C^{B}),$

with $Λ_{j}^{A} = \sum_{i = 1}^{N^{B}} Λ_{j, i}^{B}$ and $Λ_{j, i}^{B} = \sum_{k = 1}^{M_{i}^{B}} λ_{j, i k} .$

The main results for the derivation of the expression of $E_{M_{l}} E_{M_{r}} V ({\hat{Y}}_{r}^{B} | m^{A})$ are given in Appendix C. These are derived using a Taylor series approximation and postulating independence between the process that generates the links $l_{j, i k}$ and the one that generates the variables of interest $y_{i, r} .$ Under these approximations, the predicted values ${\tilde{z}}_{j, r}$ are obtained as

${\tilde{z}}_{j, r} ≅ \sum_{i = 1}^{N^{B}} {\tilde{Λ}}_{j, i}^{B} {\tilde{y}}_{i, r} (4.8)$

where

${\tilde{Λ}}_{j, i}^{B} = \frac{Λ_{j, i}^{B}}{Λ_{i}^{B}}$

with

$Λ_{i}^{B} = \sum_{j = 1}^{M^{A}} Λ_{j, i}^{B} . (4.9)$
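The quantities $Λ_{j, i}^{B},$ $Λ_{i}^{B},$ $Λ_{j}^{A}$ and the predictions (4.8) can be computed directly from the link probabilities. The Python sketch below is a minimal illustration under assumed inputs: the nested-list layout `lam[j][i]`, the function name and the example probabilities are all hypothetical. Clusters whose expected number of links is zero are skipped, which is a choice made here and not something stated in the text.

```python
def expected_link_quantities(lam, y_tilde):
    """Expected links and z predictions under the Bernoulli link model.

    lam[j][i]: list of probabilities lambda_{j,ik} over the units k of cluster i
    y_tilde: cluster-level predictions y~_{i,r}
    Returns (Lambda_j^A for each j, z~_{j,r} for each j).
    """
    n_a, n_b = len(lam), len(y_tilde)
    # Lambda_{j,i}^B = sum over k of lambda_{j,ik}
    big_lambda = [[sum(lam[j][i]) for i in range(n_b)] for j in range(n_a)]
    # Lambda_i^B = sum over j of Lambda_{j,i}^B, as in (4.9)
    col_totals = [sum(big_lambda[j][i] for j in range(n_a)) for i in range(n_b)]
    # Lambda_j^A = sum over i of Lambda_{j,i}^B
    links_per_unit = [sum(row) for row in big_lambda]
    z_tilde = []
    for j in range(n_a):
        z_j = 0.0
        for i in range(n_b):
            if col_totals[i] > 0.0:  # skip clusters with no expected link
                z_j += big_lambda[j][i] / col_totals[i] * y_tilde[i]
        z_tilde.append(z_j)
    return links_per_unit, z_tilde


# Two units of U^A, two clusters of U^B (with 2 and 1 elementary units).
lam = [[[0.9, 0.1], [0.0]],
       [[0.1, 0.4], [0.5]]]
links, z = expected_link_quantities(lam, y_tilde=[30.0, 10.0])
print(links, z)
```

Because the standardized weights sum to one over $j$ for every linked cluster, the predictions ${\tilde{z}}_{j, r}$ add up to the total of the ${\tilde{y}}_{i, r}$ in this example.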

The uncertainty on total survey costs, which depends both on the selected sample and the model uncertainty on costs, obliges us to consider the expected costs $E_{M_{l}} (c_{j})$ in the optimization problem. Steel and Clark (2014) show how the uncertainty on the expected costs can affect the accuracy of the sample design.

Context 3. Data integration is not possible because the record linkage process does not provide good linkages, or simply because the frame of population $U^{B}$ does not exist.

This is the most common context in developing countries. It may also characterize specific survey contexts in developed countries, for instance in the case of hard-to-reach populations. Returning to the agricultural example, this would mean that one might have a list of farms, but not a list of rural households. In this case, the problem of optimal integrated sampling can be solved by using all the available information, even if of poor quality. In the following, three options for dealing with the optimization problem are illustrated, ordered from the option requiring the least information to those requiring more information, which may be expensive to obtain.

Option 3.1. Building the predictions of the $z$ variables and decreasing the variance thresholds $V_{r}^{*}$ by a scale factor. Suppose that from the frame of population $U^{A},$ it is possible to know the values of a size variable $γ$ related to the total links $L_{j}^{A}$ of the units $j .$ For instance, if the population $U^{A}$ is a population of farms and the population $U^{B}$ is a population of households, then the number of workers in the farm (variable $γ_{j}$) can represent a good approximation of the total number of links, $L_{j}^{A},$ of the farm. Suppose further that the totals or the estimated totals, ${\tilde{Y}}_{r (q)}^{B},$ are available at a certain domain level, $U_{(q)}^{B}$ $(q = 1, \dots, Q),$ defined at geographic level, with $U^{B} = \cup_{q = 1}^{Q} U_{(q)}^{B}$ and $U_{(q)}^{B} \cap U_{(q^{'})}^{B} = \emptyset$ for $q \neq q^{'} .$ Then the predicted $z$ variables can be defined as:

${\tilde{z}}_{j, r} = \frac{γ_{j}}{\sum_{l \in U_{(q)}^{A}} γ_{l}} {\tilde{Y}}_{r (q)}^{B}$ for $j \in U_{(q)}^{A}, (4.10)$

where $U_{(q)}^{A}$ denotes the geographic domain $q$ for the population $U^{A} .$ In practice, the ratio approach in (4.10) assumes that unit $j$ can be given a share of the total ${\tilde{Y}}_{r (q)}^{B}$ proportional to the size of the unit itself. Other examples of building the predictions of the $z$ values are illustrated in Section 5.3.2 of Guidelines on Integrated Survey Framework (FAO, 2015).
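The ratio approach in (4.10) can be sketched in a few lines of Python. The function name, the domain labels and the numbers are hypothetical; the only inputs assumed are the size variable $γ_{j},$ the domain of each unit and the domain totals ${\tilde{Y}}_{r (q)}^{B}.$

```python
def ratio_predictions(gamma, domain, domain_totals):
    """Predictions z~_{j,r} of (4.10): share of the domain total proportional to gamma_j.

    gamma: size variable gamma_j for each unit of U^A
    domain: domain label q of each unit
    domain_totals: dict mapping q to the (estimated) total Y~_{r(q)}^B
    """
    size_by_domain = {}
    for g, q in zip(gamma, domain):
        size_by_domain[q] = size_by_domain.get(q, 0.0) + g
    return [g / size_by_domain[q] * domain_totals[q] for g, q in zip(gamma, domain)]


# Four hypothetical farms in two geographic domains.
gamma = [2.0, 6.0, 5.0, 5.0]
domain = ["north", "north", "south", "south"]
z_tilde = ratio_predictions(gamma, domain, {"north": 80.0, "south": 50.0})
print(z_tilde)
```

Within each domain the predictions add up to the domain total, so the design is calibrated on the only information available about $U^{B}.$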

Having determined the predictions, ${\tilde{z}}_{j, r},$ it may be reasonable to assume that the following relationship holds:

$E_{M_{z r}} (z_{j, r}^{2}) = {\tilde{z}}_{j, r}^{2} + σ_{j, z r}^{2} ≅ k_{r} {\tilde{z}}_{j, r}^{2}, (4.11)$

where $k_{r} > 1.$ Under (4.11), it is straightforward to show that

$E_{M_{z r}} V ({\hat{Y}}_{r}^{B} | m^{A}) ≅ k_{r} V ({\hat{\tilde{Y}}}_{r}^{B} | m^{A}),$

where ${\hat{\tilde{Y}}}_{r}^{B} = \sum_{j \in s^{A}} w_{j}^{A} {\tilde{z}}_{j, r} .$ The sampling variance $V ({\hat{\tilde{Y}}}_{r}^{B} | m^{A})$ may be computed using expressions (2.2), (2.3), (2.4) and (2.5) by substituting the prediction ${\tilde{z}}_{j, r}$ for the variable $y_{j, v} .$ The optimization problem for finding the optimal $π^{A}$ vector can then be reformulated as:

${\begin{array}{l} min \sum_{j \in U^{A}} E_{M_{Λ}} (c_{j}) π_{j}^{A} \\ E_{M_{v}} V ({\hat{Y}}_{v}^{A} | m^{A}) \leq V_{v}^{*} \forall v = 1, \dots, V \\ V ({\hat{\tilde{Y}}}_{r}^{B} | m^{A}) \leq V_{r}^{*} / k_{r} \forall r = 1, \dots, R \\ 0 < π_{j}^{A} \leq 1 \forall j = 1, \dots, M^{A} . \end{array} (4.12)$

The sample designer may find the solution by running the optimization problem (4.12) with alternative reasonable choices of the $k_{r}$ value (e.g., $k_{r} = 2,$ 3 or 4), and studying the sensitivity of the different solutions. Note that $k_{r} ≅ 1 + {[CV (z_{j, r})]}^{2},$ where ${[CV (z_{j, r})]}^{2} = σ_{j, z r}^{2} / {\tilde{z}}_{j, r}^{2} .$ Therefore (4.11) holds if the ${[CV (z_{j, r})]}^{2}$ values are approximately constant.
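The relationship between $k_{r},$ the coefficient of variation and the tightened thresholds $V_{r}^{*} / k_{r}$ of (4.12) can be tabulated with a small Python sketch. The function names and the threshold value are hypothetical and serve only to illustrate the sensitivity analysis described above.

```python
def scale_factor(z_tilde_j, sigma2_zj):
    """k_r implied by one unit: 1 + CV^2, with CV^2 = sigma_{j,zr}^2 / z~_{j,r}^2."""
    return 1.0 + sigma2_zj / (z_tilde_j * z_tilde_j)


def tightened_threshold(v_star, k_r):
    """Right-hand side V_r^* / k_r of the constraint on the B estimates in (4.12)."""
    return v_star / k_r


# A unit with prediction 10 and model variance 100 has CV = 1, hence k_r = 2.
k_example = scale_factor(10.0, 100.0)

# Sensitivity of the constraint to the candidate values of k_r.
thresholds = {k: tightened_threshold(1200.0, k) for k in (2, 3, 4)}
print(k_example, thresholds)
```

A larger $k_{r}$ shrinks the admissible variance and therefore leads to larger inclusion probabilities and a higher cost, which is why comparing several values is informative.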

Option 3.2. Extremal case of Context 2, with uniformity of links in specific domains. If the numbers (or estimated numbers) of clusters and of elementary units, $N_{(q)}^{B}$ and $M_{(q)}^{B},$ of the domains $U_{(q)}^{B}$ $(q = 1, \dots, Q)$ are available, then in the absence of information on the links $l_{j, i k},$ it might be reasonable to assume that these are homogeneous over the domains; that is, $l_{j, i k} \sim B (λ_{j, i k}),$ where $λ_{j, i k} = γ_{j} / M_{(q)}^{B} .$

Furthermore, suppose that, in this context, the predictions ${\tilde{y}}_{i, r}$ and the sampling variances $σ_{i, r}^{2}$ could be assumed to be homogeneous within the domains $U_{(q)}^{B},$ i.e., ${\tilde{y}}_{i, r} = {\tilde{y}}_{r (q)}$ and $σ_{i, r}^{2} = σ_{r (q)}^{2}$ for $i \in U_{(q)}^{B} .$ Then, the optimization problem may be dealt with as an extremal case of Context 2, with uniformity of links in specific domains.

Remark 4.3. Note that with this option, the predictions ${\tilde{z}}_{j, r}$ are equivalent to those expressed in (4.10). Indeed, it is reasonable to consider that, in the absence of information, the size in terms of elementary unit of the cluster $U_{i}^{B}$ can be set as equal to its mean defined at the domain level: $M_{i}^{B} ≅ {\bar{M}}_{(q)}^{B} = M_{(q)}^{B} / N_{(q)}^{B}$ for $U_{i}^{B} \in U_{(q)}^{B} .$ Then, the following approximations hold

$Λ_{j, i}^{B} = \sum_{k \in U_{i}^{B}} λ_{j, i k} ≅ {\bar{M}}_{(q)}^{B} \frac{γ_{j}}{M_{(q)}^{B}} = \frac{γ_{j}}{N_{(q)}^{B}};$ $Λ_{i}^{B} = \sum_{j \in U_{(q)}^{A}} Λ_{j, i}^{B} ≅ \frac{1}{N_{(q)}^{B}} \sum_{j \in U_{(q)}^{A}} γ_{j} .$

Therefore, setting ${\tilde{Y}}_{r (q)}^{B} = {\tilde{y}}_{r (q)} N_{(q)}^{B}$ and postulating independence between the process that generates the links $l_{j, i k}$ and the one that generates the variables of interest $y_{i, r},$ we obtain

${\tilde{z}}_{j, r} ≅ \sum_{i \in U_{(q)}^{B}} \frac{Λ_{j, i}^{B}}{Λ_{i}^{B}} {\tilde{y}}_{r (q)} = \sum_{i \in U_{(q)}^{B}} \frac{γ_{j} / N_{(q)}^{B}}{\sum_{l \in U_{(q)}^{A}} γ_{l} / N_{(q)}^{B}} {\tilde{y}}_{r (q)} = \frac{γ_{j}}{\sum_{l \in U_{(q)}^{A}} γ_{l}} {\tilde{y}}_{r (q)} N_{(q)}^{B} = \frac{γ_{j}}{\sum_{l \in U_{(q)}^{A}} γ_{l}} {\tilde{Y}}_{r (q)}^{B}$ for $j \in U_{(q)}^{A},$ which coincides with (4.10).

Option 3.3. Modeling the $z_{j, r}$ values. Another alternative is to model directly the $z$-values and the total number of links $L_{j}^{A}$ with models of the type:

$\begin{array}{ll} z_{j, r} = {\tilde{z}}_{j, r} + u_{j, z r} = f_{z r} (x_{j}; φ_{r}) + u_{j, z r} & L_{j}^{A} = Λ_{j}^{A} + u_{j, Λ} = f_{Λ} (θ_{j}; φ_{Λ}) + u_{j, Λ} \\ E_{M_{z r}} (u_{j, z r}) = 0, \; E_{M_{z r}} (u_{j, z r}^{2}) = σ_{j, z r}^{2}, \; \forall j & E_{M_{Λ}} (u_{j, Λ}) = 0, \; E_{M_{Λ}} (u_{j, Λ}^{2}) = σ_{j, Λ}^{2}, \; \forall j \\ E_{M_{z r}} (u_{j, z r}, u_{j^{'}, z r}) = 0, \; \forall j \neq j^{'} & E_{M_{Λ}} (u_{j, Λ}, u_{j^{'}, Λ}) = 0, \; \forall j \neq j^{'} \end{array} \qquad (4.13)$

where $x_{j}$ and $θ_{j}$ are vectors of auxiliary variables. The predictions $Λ_{j}^{A}$ need to be positive. A useful model is the log-linear one (Xu and Lavallée, 2009): $\log (Λ_{j}^{A}) = θ_{j}^{'} φ_{Λ} .$ The model on the right-hand side of (4.13) allows the prediction of the total number of links $Λ_{j}^{A}$ of the unit $j,$ thus defining the expected survey cost attached to it. The optimization problem could be carried out using the variances of the predictions of the models (4.13).
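The log-linear model guarantees positive predictions because $Λ_{j}^{A} = \exp (θ_{j}^{'} φ_{Λ}) .$ The Python sketch below shows this, together with a purely hypothetical linear cost function standing in for $f_{c} (\cdot),$ whose actual form is not given in this section. All names and coefficient values are assumptions made for illustration.

```python
import math


def predicted_links(theta_j, phi_lambda):
    """Lambda_j^A = exp(theta_j' phi_Lambda) under the log-linear model."""
    linear = sum(t * p for t, p in zip(theta_j, phi_lambda))
    return math.exp(linear)


def expected_cost(theta_j, phi_lambda, cost_a, cost_b):
    """Hypothetical linear cost: fixed cost for unit j plus a cost per predicted link.

    This stands in for f_c, which is not specified in this section.
    """
    return cost_a + cost_b * predicted_links(theta_j, phi_lambda)


# Assumed coefficients; the linear predictor is 0.5 - 0.5 = 0, so one link is predicted.
links_j = predicted_links([1.0, 2.0], [0.5, -0.25])
cost_j = expected_cost([1.0, 2.0], [0.5, -0.25], cost_a=10.0, cost_b=4.0)
print(links_j, cost_j)
```

Even a strongly negative linear predictor yields a small but strictly positive number of predicted links, which is the property required of $Λ_{j}^{A}.$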

Remark 4.4. Option 3.1 requires the minimum of information for the construction of the predictions ${\tilde{z}}_{j, r}$ and requires defining plausible values for the constants $k_{r} .$ Option 3.2 involves the same information as Option 3.1 for the construction of the predictions ${\tilde{z}}_{j, r}$ (see Remark 4.3) but requires an estimate of the parameters $σ_{r (q)}^{2} .$ These estimates can be obtained from either pilot or previous surveys conducted directly on the population $U^{B} .$ Option 3.3 is the most complex and expensive, since it involves carrying out indirect pilot surveys on the population $U^{A}$ for building plausible predictions of the parameters ${\tilde{z}}_{j, r},$ $Λ_{j}^{A},$ $σ_{j, z r}^{2}$ and $σ_{j, Λ}^{2} .$

Remark 4.5. A good strategy that should be robust against model failure is to select a balanced sample with respect to the auxiliary variables $x_{j} .$ In this case, the auxiliary variables $d_{j}$ of the balancing equations are replaced by the augmented variables $d_{j}^{*} = {(d_{j}^{'}, x_{j}^{'} / π_{j}^{A})}^{'} .$ For the calculation of the variances, the residuals $η_{j, v}$ are substituted by the modified residuals $η_{j, v}^{*} = y_{j, v} - π_{j}^{A} {(d_{j}^{*})}^{'} β_{v}^{*},$ where $β_{v}^{*} = {(Δ^{*})}^{- 1} \sum_{j \in U^{A}} π_{j}^{A} (\frac{1}{π_{j}^{A}} - 1) d_{j}^{*} y_{j, v}$ with $Δ^{*} = \sum_{j \in U^{A}} d_{j}^{*} {(d_{j}^{*})}^{'} π_{j}^{A} (1 - π_{j}^{A}) .$ For the modified residuals $η_{j, r}^{*},$ similar expressions are used.

Remark 4.6. A proportional-to-population-size allocation may be a reasonable strategy for stratified sampling designs in which the total sample size $m^{A}$ is fixed. In this case the stratum sample size, $m_{h}^{A},$ may be defined as $m_{h}^{A} = m^{A} (\sum_{j \in U_{h}^{A}} x_{j} / \sum_{j \in U^{A}} x_{j}),$ where $x_{j}$ is the measure of the size.
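The allocation rule of Remark 4.6 can be sketched as follows in Python. The function name, the stratum labels and the sizes are hypothetical; the result is real-valued, so some rounding rule (not discussed here) would be needed in practice.

```python
def proportional_allocation(m_total, sizes, strata):
    """Stratum sample sizes m_h^A = m^A * (sum of x_j in stratum h) / (sum of all x_j).

    sizes: size measure x_j for each unit of U^A
    strata: stratum label h of each unit
    """
    total = sum(sizes)
    by_stratum = {}
    for x, h in zip(sizes, strata):
        by_stratum[h] = by_stratum.get(h, 0.0) + x
    return {h: m_total * s / total for h, s in by_stratum.items()}


# Four hypothetical units in two strata, total sample size m^A = 50.
allocation = proportional_allocation(50, [10.0, 30.0, 20.0, 40.0], ["a", "a", "b", "b"])
print(allocation)
```

The stratum sizes always add up to the fixed total $m^{A},$ since the shares sum to one.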

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2019-12-17
