Are probability surveys bound to disappear for the production of official statistics?
Section 4. Model-based approaches
Model-based
approaches can eliminate the selection bias of the non-probability source and
enable valid statistical inferences, provided that their underlying assumptions
hold. The objective of the methods in Sections 4.1, 4.2 and 4.3 is to
reduce respondent burden and costs by eliminating data collection for some
variables of interest in a probability sample. The greater the number of
variables of interest for which the values are not collected, the greater the
reduction in data collection costs and respondent burden. However, these
methods assume that the variables of interest are measured without error in the
non-probability sample.

Let $t_y = \sum_{i \in U} y_i$ denote the total of a variable of interest $y$ over the population $U$. From the non-probability sample $S_{NP}$, we can obtain the naive estimator of the total $t_y$,

$\hat{t}_y^{\,\mathrm{naive}} = \frac{N}{n_{NP}} \sum_{i \in S_{NP}} y_i,$

where $n_{NP}$ is the number of units in $S_{NP}$ and $N$ is the size of the population $U$. It is well known that the selection bias of the naive estimator may be significant (see, for example, Bethlehem, 2016). The objective of the methods in Sections 4.1, 4.2 and 4.3 is to reduce the bias of the naive estimator by using a vector of auxiliary variables, $\mathbf{x}$. We use $\mathbf{X}$ to denote the matrix that contains the values of the vector $\mathbf{x}_i$ for the $N$ units of the population $U$. We assume that $\mathbf{x}$ is measured without error in both samples, $S_{NP}$ and the probability sample $S_P$.

Section 4.4 briefly discusses small area estimation and the area-level model of Fay and Herriot (1979). Small area estimation methods are generally used to improve the precision of estimates for population sub-groups (domains) that have a small probability sample size. They require collecting the variable of interest $y$ in the probability sample, but not in the non-probability sample. Therefore, they do not require the condition that $y$ be measured without error in the non-probability sample. Ideally, the non-probability sample contains variables correlated to $y$.
4.1 Calibration of the non-probability sample
The most natural
approach to correcting the selection bias of a non-probability source is to
model the relationship between the variable of interest $y$ and the auxiliary variables $\mathbf{x}$, and then predict the total $t_y$ by predicting the variable $y$ for each unit outside the non-probability sample. This prediction approach is described in Royall (1970) and generalized in Royall (1976); see also Elliott and Valliant (2017). Readers are referred to Valliant, Dorfman and Royall (2000) for more details. With this approach, inferences are conditional on $\mathbf{X}$ and $\boldsymbol{\delta}$, where $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_N)'$ and $\delta_i = 1$ if unit $i$ is in $S_{NP}$ and $\delta_i = 0$ otherwise. As a result, the total $t_y$ is considered random, as is any predictor of $t_y$ (unless $S_{NP} = U$). If a probability sample is used, its random selection is an additional source of randomness. It is usually assumed that the non-probability sample selection mechanism is not informative:

Assumption 3: $\boldsymbol{\delta}$ and $\mathbf{y}_U$ are independent after conditioning on $\mathbf{X}$, where $\mathbf{y}_U$ is the vector of the $N$ population values of the variable $y$.

Assumption 3 is the key to eliminating the selection bias. The more access we have to auxiliary variables that are strongly related to both $y$ and $\boldsymbol{\delta}$, the more plausible assumption 3 becomes. In other words, the richer $\mathbf{x}$ is, the more realistic the assumption of conditional independence between $\boldsymbol{\delta}$ and $\mathbf{y}_U$ becomes. This assumption, called the exchangeability assumption, is discussed in Mercer, Kreuter, Keeter and Stuart (2017). Schonlau and Couper (2017) also discuss the selection of auxiliary variables and emphasize their key role in reducing selection bias.
Often, a linear model is considered where it is assumed that the observations $y_i$, $i \in U$, are mutually independent with $E_m(y_i \mid \mathbf{x}_i) = \mathbf{x}_i' \boldsymbol{\beta}$ and $V_m(y_i \mid \mathbf{x}_i) = \sigma^2 \nu_i$, where $\boldsymbol{\beta}$ is a vector of unknown model parameters and $\nu_i$ is a known function of $\mathbf{x}_i$. The best linear unbiased predictor of $t_y$ (see, for example, Valliant, Dorfman and Royall, 2000) is given by

$\hat{t}_y^{\,\mathrm{cal}} = \sum_{i \in S_{NP}} y_i + \Bigl(\mathbf{t}_{\mathbf{x}} - \sum_{i \in S_{NP}} \mathbf{x}_i\Bigr)' \hat{\boldsymbol{\beta}}, \qquad (4.1)$

where $\mathbf{t}_{\mathbf{x}} = \sum_{i \in U} \mathbf{x}_i$ and $\hat{\boldsymbol{\beta}} = \bigl(\sum_{i \in S_{NP}} \mathbf{x}_i \mathbf{x}_i' / \nu_i\bigr)^{-1} \sum_{i \in S_{NP}} \mathbf{x}_i y_i / \nu_i$. The predictor $\hat{t}_y^{\,\mathrm{cal}}$ can also be re-written in the weighted form

$\hat{t}_y^{\,\mathrm{cal}} = \sum_{i \in S_{NP}} w_i\, y_i, \qquad (4.2)$

where $w_i = 1 + \bigl(\mathbf{t}_{\mathbf{x}} - \sum_{j \in S_{NP}} \mathbf{x}_j\bigr)' \bigl(\sum_{j \in S_{NP}} \mathbf{x}_j \mathbf{x}_j' / \nu_j\bigr)^{-1} \mathbf{x}_i / \nu_i$. It can easily be shown that $w_i$ is a calibrated weight that satisfies the calibration equation $\sum_{i \in S_{NP}} w_i \mathbf{x}_i = \mathbf{t}_{\mathbf{x}}$. Therefore, the prediction approach is equivalent to calibration when a linear model is used to describe the relationship between $y$ and $\mathbf{x}$. The calibration equation satisfies what Mercer et al. (2017) call the composition assumption. This approach requires knowing the vector of control totals $\mathbf{t}_{\mathbf{x}}$. If it is unknown, an alternative is to replace it in (4.1) or (4.2) with an estimate, $\hat{\mathbf{t}}_{\mathbf{x}} = \sum_{i \in S_P} d_i \mathbf{x}_i$, from a probability survey (Elliott and Valliant, 2017), where $d_i$ is the design weight of unit $i$ in $S_P$. If assumptions 1 to 3 are satisfied, it can be shown that the predictor $\hat{t}_y^{\,\mathrm{cal}}$ is unbiased, i.e., $E(\hat{t}_y^{\,\mathrm{cal}} - t_y) = 0$, whether $\mathbf{t}_{\mathbf{x}}$ or $\hat{\mathbf{t}}_{\mathbf{x}}$ is used, provided that the latter is design-unbiased, i.e., $E_p(\hat{\mathbf{t}}_{\mathbf{x}}) = \mathbf{t}_{\mathbf{x}}$. Of course, the unbiasedness property of the predictor $\hat{t}_y^{\,\mathrm{cal}}$ requires the linear model to be valid.
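To illustrate the weighted form (4.2), the following sketch computes calibrated weights for a simulated non-probability sample with an intercept and a single auxiliary variable, assuming $\nu_i = 1$; the data-generating choices, function name and variable names are illustrative only.

```python
import numpy as np

def calibration_weights(X_np, t_x, nu=None):
    """Weights of (4.2): w_i = 1 + (t_x - sum_j x_j)' (sum_j x_j x_j'/nu_j)^{-1} x_i/nu_i."""
    nu = np.ones(X_np.shape[0]) if nu is None else np.asarray(nu, dtype=float)
    T = (X_np / nu[:, None]).T @ X_np          # sum_j x_j x_j' / nu_j
    gap = t_x - X_np.sum(axis=0)               # t_x - sum_{j in S_NP} x_j
    return 1.0 + (X_np / nu[:, None]) @ np.linalg.solve(T, gap)

# toy population: selection favours large x, so the naive estimator is biased
rng = np.random.default_rng(1)
N = 10_000
x_U = rng.gamma(2.0, 2.0, N)
y_U = 3.0 + 1.5 * x_U + rng.normal(0.0, 1.0, N)
take = rng.random(N) < 1.0 / (1.0 + np.exp(2.5 - 0.4 * x_U))   # S_NP membership
X_np = np.column_stack([np.ones(take.sum()), x_U[take]])
t_x = np.array([N, x_U.sum()])                 # known control totals

w = calibration_weights(X_np, t_x)
print(np.allclose(X_np.T @ w, t_x))            # calibration equation holds
print(w @ y_U[take], y_U.sum())                # predictor (4.2) versus the true total
```

In this simulation, the calibrated predictor removes most of the selection bias of the naive estimator because the auxiliary variable drives both the participation mechanism and the variable of interest, which is exactly what assumption 3 requires.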
Remark: In practice, auxiliary variables for which the population total is known are usually few in number and not sufficiently predictive of the variable of interest $y$ to eliminate the selection bias. They may be supplemented with other auxiliary variables for which the total can be estimated using an existing probability survey. Therefore, the vector of population totals may be a blend of known and estimated totals. If the probability survey itself is calibrated to known population totals, then the estimated totals $\hat{\mathbf{t}}_{\mathbf{x}}$ from the probability survey alone can be used, since they reproduce the known totals exactly for the calibration variables.
A linear model is not always appropriate. This is the case when the variable of interest $y$ is categorical. Another typical example occurs when it is desired to estimate the total of a quantitative variable in a domain of interest. The variable $y$ is then defined as the product of that quantitative variable and a binary variable indicating domain membership. To model such a variable, it is natural to consider a mixture of a degenerate distribution at 0 and a continuous distribution. When the relationship between $y$ and $\mathbf{x}$ is not linear, the model-assisted calibration of Wu and Sitter (2001) can be used to preserve the weighted form of the predictor (4.2) while taking into account the non-linearity of the relationship. Suppose that we replace the above linear model with a non-linear (or non-parametric) model such that $E_m(y_i \mid \mathbf{x}_i) = m(\mathbf{x}_i, \boldsymbol{\beta})$, where $m(\cdot)$ is some function. The Wu and Sitter (2001) calibration first involves predicting $y_i$ by $\hat{y}_i = m(\mathbf{x}_i, \hat{\boldsymbol{\beta}})$, where $\hat{\boldsymbol{\beta}}$ is a model-based estimate of $\boldsymbol{\beta}$. Then, the total $\sum_{i \in U} \hat{y}_i$ is calculated, and weights, $w_i$, are found that satisfy the calibration equation:

$\sum_{i \in S_{NP}} w_i = N \quad \text{and} \quad \sum_{i \in S_{NP}} w_i\, \hat{y}_i = \sum_{i \in U} \hat{y}_i.$

In other words, the equation (4.2) can be used, where the vector $\mathbf{x}_i$ is replaced with $(1, \hat{y}_i)'$. This method requires knowing the population size $N$ as well as the vector $\mathbf{x}_i$ for all units in the population $U$. If $N$ and $\sum_{i \in U} \hat{y}_i$ are unknown, they can be replaced with estimates from a probability survey. For example, we can replace $N$ with $\hat{N} = \sum_{i \in S_P} d_i$ and $\sum_{i \in U} \hat{y}_i$ with $\sum_{i \in S_P} d_i\, \hat{y}_i$. The approach can also be extended to the case of multiple variables of interest.
We mentioned that the selection bias may be considerably reduced if $\mathbf{x}$ is rich and contains variables that are related to both $y$ and $\boldsymbol{\delta}$, which makes assumption 3 more realistic. It can therefore be useful in practice to
consider a large number of potential auxiliary variables and select the most
relevant ones using a variable selection technique. Chen, Valliant and Elliott
(2018) suggest the LASSO technique for selecting auxiliary variables and show
its good properties.
It should be noted that the predictor (4.1) reduces to the naive estimator, $\hat{t}_y^{\,\mathrm{naive}}$, in the simplest case possible where only one constant auxiliary variable is used: $\mathbf{x}_i = 1$ for all $i \in U$. The naive estimator is usually highly biased. Its bias can be significantly reduced if the population $U$ can be subdivided into $H$ disjoint and exhaustive post-strata, $U_h$, $h = 1, \ldots, H$, of size $N_h$. The post-stratification model, $E_m(y_i \mid i \in U_h) = \beta_h$, is then postulated, which is an important special case of the above linear model. Assuming that the variance $V_m(y_i \mid i \in U_h)$ is constant for $i \in U_h$, the predictor (4.1) is written:

$\hat{t}_y^{\,\mathrm{PS}} = \sum_{h=1}^{H} N_h\, \bar{y}_{S_{NP,h}}, \quad \text{with } \bar{y}_{S_{NP,h}} = \frac{1}{n_{NP,h}} \sum_{i \in S_{NP,h}} y_i,$

where $S_{NP,h}$ is the set of units in $U_h$ that are part of the sample $S_{NP}$ and $n_{NP,h}$ is the size of $S_{NP,h}$. If the population sizes $N_h$ are unknown, they can be replaced with estimates, $\hat{N}_h = \sum_{i \in S_{P,h}} d_i$, from a probability survey, where $S_{P,h}$ is the set of units in $U_h$ that are part of the sample $S_P$. Regression trees could prove to be an interesting approach for forming post-strata, especially when the auxiliary variables are categorical.
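As an illustration, the post-stratified predictor with estimated post-stratum sizes $\hat{N}_h$ could be computed as in the following sketch; the function name and the inputs (post-stratum labels for both samples and the design weights of the probability sample) are assumptions made for the example.

```python
import numpy as np

def post_stratified_total(y_np, strata_np, d_p, strata_p):
    """Post-stratified predictor: sum over h of N_h-hat * mean(y in S_NP,h),
    where N_h-hat is the sum of the design weights d_i in stratum h of S_P."""
    total = 0.0
    for h in np.unique(strata_p):
        in_np = strata_np == h
        if not in_np.any():
            raise ValueError(f"post-stratum {h} is empty in the non-probability sample")
        N_h_hat = d_p[strata_p == h].sum()      # estimated post-stratum size
        total += N_h_hat * y_np[in_np].mean()   # N_h-hat * y-bar_{S_NP,h}
    return total
```

If the post-stratum sizes $N_h$ are known, the estimated size in the loop can simply be replaced with the known count.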
If multiple categorical auxiliary variables are available, it can be useful to form a large number of post-strata to reduce the selection bias. If many auxiliary variables are crossed, the sample sizes $n_{NP,h}$ could become very small, thereby making the estimators $\bar{y}_{S_{NP,h}}$ very unstable. Gelman and Little (1997) suggest using a multi-level regression model to obtain estimators $\hat{\mu}_h$ of the post-stratum means that are more stable than $\bar{y}_{S_{NP,h}}$. They then consider the post-stratified predictor:

$\hat{t}_y^{\,\mathrm{MRP}} = \sum_{h=1}^{H} N_h\, \hat{\mu}_h.$

Nowadays, this method is known as Mr. P or MRP (Multilevel Regression and Poststratification); see, for example, Mercer et al. (2017). A similar approach would use small area estimation methods (Rao and Molina, 2015) to stabilize the estimators $\bar{y}_{S_{NP,h}}$. Although such methods are likely to produce much more precise estimates of the average of the variable $y$ over the post-strata $U_h$, it remains to be determined whether such methods can produce significant efficiency gains for estimating the overall total $t_y$ compared to the simple post-stratified predictor $\hat{t}_y^{\,\mathrm{PS}}$. It seems that regression trees provide another way to control the instability of the estimators $\bar{y}_{S_{NP,h}}$, since a criterion is generally used to prevent an overly narrow subdivision of the population. These various methods warrant further investigation in future research. Precise estimation of the population sizes $N_h$, if not known, is also a problem not to be overlooked when the population is divided into a large number of post-strata.
4.2 Statistical matching
Statistical matching, or data fusion, is an approach developed for combining data from two different sources that contain both source-specific variables and common variables. Readers are referred to D’Orazio, Di Zio and Scanu (2006) or Rässler (2012) for a review of statistical matching methods. In the context of this article, statistical matching involves modelling the relationship between $y$ and the auxiliary variables $\mathbf{x}$, which are common to both sources, using data from the non-probability sample. As with calibration, the non-probability sample selection mechanism is assumed to be non-informative, and the auxiliary variables must be chosen carefully in order to make assumption 3 as plausible as possible. Once a model has been determined, it is used to predict the $y$ values in a probability sample. Statistical matching can be viewed as an imputation problem with an imputation rate of 100%. The predictor of $t_y$ obtained from the probability sample takes the form:

$\hat{t}_y^{\,\mathrm{SM}} = \sum_{i \in S_P} d_i\, \hat{y}_i,$

where $\hat{y}_i$ is the imputed value for the unit $i \in S_P$. As in calibration, inferences are conditional on $\mathbf{X}$ and $\boldsymbol{\delta}$. Assumption 3, in a statistical matching context, can be viewed as analogous to the Population Missing At Random (PMAR) assumption introduced by Berg, Kim and Skinner (2016) in a non-response context.
If the linear regression model $E_m(y_i \mid \mathbf{x}_i) = \mathbf{x}_i' \boldsymbol{\beta}$ is used, the imputed value for the unit $i \in S_P$ is $\hat{y}_i = \mathbf{x}_i' \hat{\boldsymbol{\beta}}$, and the resulting predictor is given by $\hat{t}_y^{\,\mathrm{SM}} = \hat{\mathbf{t}}_{\mathbf{x}}' \hat{\boldsymbol{\beta}}$. If assumptions 1 to 3 are satisfied and $\hat{\mathbf{t}}_{\mathbf{x}}$ is design-unbiased for $\mathbf{t}_{\mathbf{x}}$, statistical matching produces an unbiased predictor, i.e., $E(\hat{t}_y^{\,\mathrm{SM}} - t_y) = 0$. Also, if $\nu_i = \boldsymbol{\lambda}' \mathbf{x}_i$ for a certain known vector $\boldsymbol{\lambda}$, it can be shown that $\sum_{i \in S_{NP}} (y_i - \mathbf{x}_i' \hat{\boldsymbol{\beta}}) = 0$, and the predictor $\hat{t}_y^{\,\mathrm{SM}}$ is equivalent to the predictor (4.1) if we replace $\mathbf{t}_{\mathbf{x}}$ in (4.1) with $\hat{\mathbf{t}}_{\mathbf{x}}$. It can also be shown that, for a post-stratification model where we impute $\hat{y}_i$, for a unit $i \in S_P$ belonging to post-stratum $U_h$, with $\bar{y}_{S_{NP,h}}$, the predictor $\hat{t}_y^{\,\mathrm{SM}}$ reduces to $\sum_{h=1}^{H} \hat{N}_h\, \bar{y}_{S_{NP,h}}$. Therefore, statistical matching and calibration produce similar predictors, even identical in some cases, when a linear model is postulated and the totals $\mathbf{t}_{\mathbf{x}}$ are estimated.
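A minimal sketch of this mass-imputation predictor under the linear working model is given below; the function and argument names are illustrative. It returns $\sum_{i \in S_P} d_i \hat{y}_i$, which equals $\hat{\mathbf{t}}_{\mathbf{x}}' \hat{\boldsymbol{\beta}}$ with $\hat{\boldsymbol{\beta}}$ fitted on the non-probability sample.

```python
import numpy as np

def matching_predictor(y_np, X_np, X_p, d_p):
    """Statistical matching (imputation rate of 100%): fit the linear working
    model on the non-probability sample, impute y for every unit of the
    probability sample and weight the imputations by the design weights."""
    beta_hat, *_ = np.linalg.lstsq(X_np, y_np, rcond=None)   # beta-hat from S_NP
    y_imp = X_p @ beta_hat                                   # imputed values for S_P
    return float(np.sum(d_p * y_imp))                        # sum_{i in S_P} d_i * y-hat_i
```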
Choosing between statistical matching and calibration can depend on the user’s perspective. For
example, if it is the content of the non-probability source, in terms of
variables of interest, that is relevant to the user, then it seems natural to
weight the non-probability sample in the hopes of reducing the selection bias
for all variables of interest. The calibration technique or the methods in
Section 4.3 are obvious choices for such weighting. Conversely, if instead it
is the content of the probability survey that is relevant, then statistical
matching is the appropriate choice. This method enriches the probability survey
by imputing the missing variables of interest.
Statistical matching is easily generalized to non-linear or non-parametric models such that $E_m(y_i \mid \mathbf{x}_i) = m(\mathbf{x}_i, \boldsymbol{\beta})$. The imputed values $\hat{y}_i = m(\mathbf{x}_i, \hat{\boldsymbol{\beta}})$ are simply obtained by predicting the missing values $y_i$, $i \in S_P$, using the chosen model. The predictor $\hat{t}_y^{\,\mathrm{SM}}$ remains unbiased if assumptions 1 to 3 are satisfied and if $E_m(\hat{y}_i \mid \mathbf{x}_i) = m(\mathbf{x}_i, \boldsymbol{\beta})$. Donor or nearest neighbour imputation is a non-parametric imputation method commonly used for handling non-response (see, for example, Beaumont and Bocci, 2009) that does not require a linear relationship between $y$ and $\mathbf{x}$. In the context of matching non-probability and probability samples, donor imputation was popularized by Rivers (2007). For a given unit $i \in S_P$, the method involves finding the nearest donor, with respect to the auxiliary variables $\mathbf{x}$, among the units of the non-probability sample and replacing the missing value $y_i$ with the value $y_j$ from this donor. For donor imputation, the condition $E_m(\hat{y}_i \mid \mathbf{x}_i) = m(\mathbf{x}_i, \boldsymbol{\beta})$ is satisfied if, for each recipient $i \in S_P$, the donor has exactly the same values of $\mathbf{x}$ as the recipient. When one or more auxiliary variables are continuous, this condition is satisfied only asymptotically in general. A very large non-probability sample provides a large pool of donors, which should help to approximately satisfy this condition.
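The following brute-force sketch illustrates donor imputation with a single nearest neighbour on standardized auxiliary variables; the Euclidean distance and the standardization are illustrative choices rather than requirements of the method.

```python
import numpy as np

def donor_imputation_total(y_np, X_np, X_p, d_p):
    """Nearest-neighbour (donor) imputation: each probability-sample unit
    receives the y value of its closest non-probability unit in the space
    of the standardized auxiliary variables."""
    scale = X_np.std(axis=0)
    scale[scale == 0] = 1.0                       # guard against constant columns
    Z_np, Z_p = X_np / scale, X_p / scale
    # index of the nearest donor for every recipient (O(n_P * n_NP) search)
    donor = np.array([np.argmin(((Z_np - z) ** 2).sum(axis=1)) for z in Z_p])
    y_imp = y_np[donor]
    return float(np.sum(d_p * y_imp))
```

A k-d tree or another indexing structure would be preferable for large samples; the brute-force search is kept here only to make the logic explicit.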
Remark: In some applications, a very large non-probability panel of volunteers, $S_{NP}$, is available, which contains a few auxiliary variables for matching, but no variable of interest. Ideally, the variables of interest would be collected for all units of the panel $S_{NP}$, but that is impossible due to the cost and the burden on the panel members. Therefore, in practice, a sub-sample $S_{NP}^{\,\mathrm{sub}}$ of $S_{NP}$ is selected using random or non-random sampling methods. Quota sampling (e.g., Deville, 1991) is often considered in this context. In addition to collecting the variables of interest for all units of $S_{NP}^{\,\mathrm{sub}}$, there may also be interest in collecting other auxiliary variables for matching in order to enhance the vector $\mathbf{x}$. The matching can then be done to the probability sample, often much smaller in size, as long as the latter contains the same auxiliary variables as those of the non-probability sub-sample $S_{NP}^{\,\mathrm{sub}}$. By carefully choosing the auxiliary variables for the matching, the potential for bias reduction is increased (Schonlau and Couper, 2017). The implementation proposed by Rivers (2007) is slightly different. Rivers (2007) suggests conducting the matching between the probability sample and the panel $S_{NP}$ using the auxiliary variables available in both sources. The variables of interest are collected only for the set of donors in $S_{NP}$ who have been matched to a unit in the probability sample, which allows for a significant reduction of data collection costs and burden. The implicit assumption is that the panel members, initially volunteers, are more likely to respond than individuals chosen at random in the population. Obviously, non-response is unavoidable, and this problem must be dealt with, potentially through imputation. The advantage of this method is that the matching is carried out using the panel $S_{NP}$ rather than a sub-sample of this panel; the pool of donors is larger. However, the matching cannot be done using the enhanced vector of auxiliary variables, because it is not available for the units of the panel $S_{NP}$, which limits the potential for bias reduction.
Lavallée and Brisbane (2016) point out the connection between statistical matching and indirect sampling (Lavallée, 2007; Deville and Lavallée, 2006). They propose an estimator obtained by imputing each missing value $y_i$, $i \in S_P$, by a weighted average of the $y$ values of nearest donors. In reality, their estimator can also be obtained equivalently by imputing the missing values using fractional donor imputation (for example, Kim and Fuller, 2004). The use of more than one donor to impute the missing values yields a typically modest variance reduction.
Several imputation methods used in practice can be considered linear (Beaumont and Bissonnette, 2011). This is the case for linear regression imputation, donor imputation and fractional donor imputation. An imputation method is said to be linear if the imputed value $\hat{y}_i$ can be written as

$\hat{y}_i = \sum_{j \in S_{NP}} \phi_{ij}\, y_j,$

where $\phi_{ij}$ is a function of $\mathbf{X}$ or $\boldsymbol{\delta}$, but not of $\mathbf{y}_U$. For example, for donor or nearest-neighbour imputation, $\phi_{ij} = 1$ if the unit $j \in S_{NP}$ is the donor for the recipient $i \in S_P$, and $\phi_{ij} = 0$ otherwise. For a linear imputation method, the estimator $\hat{t}_y^{\,\mathrm{SM}}$ can be rewritten as a weighted sum over the non-probability sample:

$\hat{t}_y^{\,\mathrm{SM}} = \sum_{j \in S_{NP}} w_j\, y_j,$

where $w_j = \sum_{i \in S_P} d_i\, \phi_{ij}$. Therefore, for linear imputation methods, statistical matching is an alternative to calibration and to the methods in Section 4.3 if the objective is to properly weight the non-probability sample.
So far, we have considered only the estimation of the total $t_y$. However, the probability sample contains other variables, and there may be interest in the relationship between two or more variables, some from the probability survey and others imputed from the non-probability sample. As an example, suppose that the estimation of the total $t_{yz} = \sum_{i \in U} z_i\, y_i$ is of interest, where $z$ is a variable collected in the probability survey, but not available in the non-probability sample. It could, for example, define membership in a domain of interest. Statistical matching can be used to estimate this parameter by

$\hat{t}_{yz}^{\,\mathrm{SM}} = \sum_{i \in S_P} d_i\, z_i\, \hat{y}_i.$

We use $\mathbf{z}_U$ to denote the vector that contains the values of the variable $z$ for the $N$ units of the population $U$. It can be shown that $\hat{t}_{yz}^{\,\mathrm{SM}}$ is unbiased, i.e., $E(\hat{t}_{yz}^{\,\mathrm{SM}} - t_{yz}) = 0$, if assumptions 1 to 3 are satisfied in addition to the following assumption:

Assumption 4: $\boldsymbol{\delta}$ and $\mathbf{z}_U$ are independent after conditioning on $\mathbf{X}$ and $\mathbf{y}_U$.

Assumption 4 is known as the conditional independence assumption in the statistical matching literature.
4.3 Inverse propensity score weighting
Instead of modelling the relationship between $y$ and $\mathbf{x}$, the relationship between $\boldsymbol{\delta}$ and $\mathbf{x}$ could be modelled. The main advantage of this approach is to simplify the modelling effort when there are multiple variables of interest since there is always only one variable $\delta$. With this approach, inferences are conditional on $\mathbf{X}$ and $\mathbf{y}_U$. Also, it is usually assumed that assumption 3 is valid and thus $p_i \equiv P(\delta_i = 1 \mid \mathbf{X}, \mathbf{y}_U) = P(\delta_i = 1 \mid \mathbf{X})$. The probability of participation $p_i$ is then estimated by $\hat{p}_i$ and the estimate

$\hat{t}_y^{\,\mathrm{IPW}} = \sum_{i \in S_{NP}} w_i\, y_i$

is calculated, where $w_i = 1 / \hat{p}_i$. The assumption that $p_i > 0$ for all $i \in U$ must be made. It is called the positivity assumption by Mercer et al. (2017). It may also be required in the calibration and statistical matching approaches. For example, empty post-strata may occur if it is not satisfied. To fix this issue, these empty post-strata are usually collapsed with other non-empty post-strata. This collapsing may jeopardize the validity of assumption 3 if the collapsed post-strata are different.
The estimation of $p_i$ can be achieved by postulating a parametric model $p_i = f(\mathbf{x}_i' \boldsymbol{\alpha})$, where $f(\cdot)$ is some function, normally bounded by 0 and 1, and $\boldsymbol{\alpha}$ is a vector of unknown model parameters. To lighten the notation, we write $p_i(\boldsymbol{\alpha}) = f(\mathbf{x}_i' \boldsymbol{\alpha})$. The logistic function, $f(\mathbf{x}_i' \boldsymbol{\alpha}) = \exp(\mathbf{x}_i' \boldsymbol{\alpha}) / \{1 + \exp(\mathbf{x}_i' \boldsymbol{\alpha})\}$, predominates in the applications (see Kott, 2019, for a recent application). The estimator of $\boldsymbol{\alpha}$ is denoted by $\hat{\boldsymbol{\alpha}}$ and the estimated probability by $\hat{p}_i = f(\mathbf{x}_i' \hat{\boldsymbol{\alpha}})$. Ideally, $\boldsymbol{\alpha}$ would be estimated using $\mathbf{x}_i$ for all the units in the population $U$, similar to what would be done in a non-response context. For example, assuming the logistic function is used, $\boldsymbol{\alpha}$ could be estimated by solving the maximum likelihood equation:

$\sum_{i \in U} \{\delta_i - p_i(\boldsymbol{\alpha})\}\, \mathbf{x}_i = \mathbf{0}. \qquad (4.3)$

This is impossible when $\mathbf{x}_i$ is not known for all units $i \in U$, which is almost always the case in practice. Iannacchione, Milne and Folsom (1991) proposed another unbiased estimating equation for $\boldsymbol{\alpha}$ (see also Deville and Dupont, 1993):

$\sum_{i \in S_{NP}} \frac{\mathbf{x}_i}{p_i(\boldsymbol{\alpha})} - \sum_{i \in U} \mathbf{x}_i = \mathbf{0}. \qquad (4.4)$

The main advantage of equation (4.4) is that it does not require knowing $\mathbf{x}_i$ for each unit $i \in U$. However, it is necessary to have access to the vector of totals $\mathbf{t}_{\mathbf{x}}$ from an external source. An interesting property of equation (4.4) is that the resulting weights $w_i = 1/\hat{p}_i$ satisfy the calibration equation $\sum_{i \in S_{NP}} w_i \mathbf{x}_i = \mathbf{t}_{\mathbf{x}}$, just like the weights $w_i$ given in (4.2). Indeed, it can be shown that solving (4.4) yields linear calibration weights of the form $w_i = \mathbf{x}_i' \hat{\boldsymbol{\alpha}}$ if the model $p_i(\boldsymbol{\alpha}) = 1 / (\mathbf{x}_i' \boldsymbol{\alpha})$ is used. However, this is a less natural model than the above logistic model for modelling a probability.
To get around the problem of the missing values $\mathbf{x}_i$ for the units outside the non-probability sample, Chen et al. (2019) suggest estimating the population term $\sum_{i \in U} p_i(\boldsymbol{\alpha})\, \mathbf{x}_i$ in (4.3) using a probability survey. The equation to be solved becomes:

$\sum_{i \in S_{NP}} \mathbf{x}_i - \sum_{i \in S_P} d_i\, p_i(\boldsymbol{\alpha})\, \mathbf{x}_i = \mathbf{0}. \qquad (4.5)$

Equation (4.5) is unbiased conditionally on $\mathbf{X}$ and $\mathbf{y}_U$, provided that the probability survey allows for unbiased estimation, conditionally on $\mathbf{X}$ and $\mathbf{y}_U$, of any population total that is not a function of $\boldsymbol{\delta}$, such as $\sum_{i \in U} p_i(\boldsymbol{\alpha})\, \mathbf{x}_i$. Assumptions 1 and 3 are required, but not assumption 2. Using the idea of Iannacchione et al. (1991), an alternative to (4.5) is obtained by solving:

$\sum_{i \in S_{NP}} \frac{\mathbf{x}_i}{p_i(\boldsymbol{\alpha})} - \sum_{i \in S_P} d_i\, \mathbf{x}_i = \mathbf{0}. \qquad (4.6)$

Equation (4.6) produces weights $w_i = 1/\hat{p}_i$ that satisfy the calibration equation $\sum_{i \in S_{NP}} w_i \mathbf{x}_i = \hat{\mathbf{t}}_{\mathbf{x}}$ (see also Lesage, 2017; Rao, 2020). The estimators of $\boldsymbol{\alpha}$ obtained using (4.5) or (4.6) are likely less efficient than those obtained using (4.3) or (4.4). If $\mathbf{x}_i$ is known for all units in $U$ or the vector $\mathbf{t}_{\mathbf{x}}$ is known, then using (4.3) or (4.4) is preferable. Otherwise, the estimating equations (4.5) or (4.6) can be used provided that $\mathbf{x}$ is collected in a probability survey. Note that the indicators $\delta_i$ do not need to be observed in the probability sample.
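For illustration, equation (4.5) with a logistic participation model can be solved by Newton-Raphson as in the sketch below; the function and variable names are assumptions made for the example.

```python
import numpy as np

def propensity_alpha(X_np, X_p, d_p, n_iter=50, tol=1e-10):
    """Solve estimating equation (4.5) for a logistic participation model by
    Newton-Raphson: sum_{S_NP} x_i - sum_{S_P} d_i p_i(alpha) x_i = 0."""
    alpha = np.zeros(X_np.shape[1])
    t_np = X_np.sum(axis=0)                        # sum of x over the non-probability sample
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X_p @ alpha))     # p_i(alpha) on the probability sample
        U = t_np - X_p.T @ (d_p * p)               # left-hand side of (4.5)
        J = (X_p * (d_p * p * (1.0 - p))[:, None]).T @ X_p
        step = np.linalg.solve(J, U)               # Newton step
        alpha += step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

# illustration of the resulting inverse propensity score weighted estimator:
# alpha_hat = propensity_alpha(X_np, X_p, d_p)
# p_np = 1.0 / (1.0 + np.exp(-X_np @ alpha_hat))
# t_hat_ipw = np.sum(y_np / p_np)
```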
Equations (4.5) and (4.6) may be more difficult to solve than equations (4.3) and (4.4) and may not have a solution. Consider, for example, the case where there is only one auxiliary variable: $x_i = 1$ for all $i \in U$. Using (4.5) or (4.6), it can be seen that the estimated probability reduces to:

$\hat{p}_i = \frac{n_{NP}}{\hat{N}},$

where $\hat{N} = \sum_{i \in S_P} d_i$. If the size of the probability sample is sufficiently large, it is expected that $n_{NP} < \hat{N}$. For small sample sizes, it may happen that $n_{NP} > \hat{N}$ due to the variability of $\hat{N}$. In that case, equations (4.5) and (4.6) would not have a solution if the logistic function is used since it requires that $\hat{p}_i < 1$. To avoid this issue, it may be helpful to consider other functions not bounded by 1, such as the exponential function $f(\mathbf{x}_i' \boldsymbol{\alpha}) = \exp(\mathbf{x}_i' \boldsymbol{\alpha})$.
Kim and Wang (2019) suggest using the probability sample to estimate the participation probability. Assuming the logistic function is used, the equation to be solved is:

$\sum_{i \in S_P} d_i\, \{\delta_i - p_i(\boldsymbol{\alpha})\}\, \mathbf{x}_i = \mathbf{0}.$

The method requires knowing the indicators $\delta_i$ in the probability sample and the validity of assumptions 1, 2 and 3 to ensure the estimating equation is unbiased. Also, the probability sample size is usually small relative to the non-probability sample size, and it can be numerically difficult to estimate $\boldsymbol{\alpha}$, especially when $\mathbf{x}$ contains a large number of variables and the overlap between the two samples is small.
Lee (2006), see also Rivers (2007), Valliant and Dever (2011) and Elliott and Valliant (2017), proposes to combine the two samples and then estimate $p_i$ using logistic regression. It seems that the author implicitly assumes that the two samples do not overlap, i.e., that $\delta_i = 0$ for all units in $S_P$. Using again the logistic function, the resulting estimating equation is:

$\sum_{i \in S_{NP}} \tilde{d}_i\, \{1 - p_i(\boldsymbol{\alpha})\}\, \mathbf{x}_i - \sum_{i \in S_P} d_i\, p_i(\boldsymbol{\alpha})\, \mathbf{x}_i = \mathbf{0}, \qquad (4.7)$

where $\tilde{d}_i$ is a certain weight for the units in the non-probability sample. The method is somewhat similar to the one proposed by Chen et al. (2019), but the estimating equation (4.7) is not unbiased, conditionally on $\mathbf{X}$ and $\mathbf{y}_U$, unlike equations (4.5) and (4.6). However, if we assume $\tilde{d}_i = 1$ and if $p_i(\boldsymbol{\alpha})$ is small, equation (4.7) becomes approximately equivalent to equation (4.5). Yet Lee (2006) does not directly use the estimated probabilities resulting from (4.7). The author uses them only to order the union of the two samples and then create homogeneous classes. Using homogeneous classes brings some robustness to model misspecification and can help prevent very small estimated probabilities and thus very large weights. In the context of non-response, forming homogeneous imputation or reweighting classes was studied by Little (1986), Eltinge and Yansaneh (1997), and Haziza and Beaumont (2007), among others. Haziza and Lesage (2016) illustrate the robustness of the method when the function $f(\cdot)$ is misspecified. The method is used regularly in Statistics Canada surveys for dealing with non-response.
Rather than using (4.7), homogeneous classes could be formed by starting with the unbiased equations (4.5) or (4.6). These initial estimated probabilities are denoted by $\hat{p}_i^{\,0}$. The combined sample $S_{NP} \cup S_P$ can then be sorted by $\hat{p}_i^{\,0}$ and divided into $C$ homogeneous classes of equal or unequal sizes. The set of units in $S_{NP}$ that are part of class $c$ is denoted by $S_{NP,c}$, whereas the set of units in $S_P$ that are part of class $c$ is denoted by $S_{P,c}$. The weight $w_i$ for a unit $i \in S_{NP,c}$ is equal to the inverse of the estimated participation rate in class $c$ and is given by

$w_i = \frac{\hat{N}_c}{n_{NP,c}},$

where $\hat{N}_c = \sum_{i \in S_{P,c}} d_i$ and $n_{NP,c}$ is the number of units in $S_{NP,c}$. This weight ensures the calibration property: $\sum_{i \in S_{NP,c}} w_i = \hat{N}_c$. The number of classes must be large enough to capture a high percentage of the variability of the initial probabilities $\hat{p}_i^{\,0}$, thereby reducing the bias. On the other hand, it must not be too large to prevent the occurrence of empty classes since the weights $w_i$ cannot be calculated if $n_{NP,c} = 0$. Regression trees can prove to be an effective alternative for forming classes. In a non-response context, they have been studied by Phipps and Toth (2012). The estimator $\sum_{i \in S_{NP}} w_i y_i$ obtained after forming homogeneous classes has exactly the same form as the post-stratified estimator described in the calibration approach in Section 4.1; the only difference is that the classes are built by modelling $\boldsymbol{\delta}$ rather than $y$.
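The sketch below illustrates one way of forming homogeneous classes from the initial probabilities $\hat{p}_i^{\,0}$ (classes of roughly equal size based on quantiles) and of computing the weights $w_i = \hat{N}_c / n_{NP,c}$; the number of classes and the quantile rule are illustrative choices.

```python
import numpy as np

def class_weights(p0_np, p0_p, d_p, n_classes=20):
    """Form homogeneous classes from the initial estimated probabilities and
    return the weight w_i = N_c-hat / n_NP,c for every non-probability unit.
    Class boundaries are quantiles of the combined initial probabilities."""
    p0_np, p0_p, d_p = map(np.asarray, (p0_np, p0_p, d_p))
    cuts = np.quantile(np.concatenate([p0_np, p0_p]),
                       np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    c_np = np.searchsorted(cuts, p0_np)            # class label of each S_NP unit
    c_p = np.searchsorted(cuts, p0_p)              # class label of each S_P unit
    w = np.empty(len(p0_np))
    for c in range(n_classes):
        n_np_c = np.sum(c_np == c)
        if n_np_c == 0:
            raise ValueError(f"class {c} has no non-probability units; use fewer classes")
        w[c_np == c] = d_p[c_p == c].sum() / n_np_c   # N_c-hat / n_NP,c
    return w
```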
Assumption 3 may not be realistic in some contexts so that $P(\delta_i = 1 \mid \mathbf{X}, \mathbf{y}_U) \ne P(\delta_i = 1 \mid \mathbf{X})$. In this case, the participation probability $p_i = P(\delta_i = 1 \mid \mathbf{X}, \mathbf{y}_U)$ might be modelled using a vector of explanatory variables $\mathbf{g}_i$ defined using the variable of interest $y_i$ (or the variables of interest if there are several) and potentially other auxiliary variables $\mathbf{x}_i$. A parametric model, $p_i = f(\mathbf{g}_i' \boldsymbol{\alpha})$, can be considered for modelling the participation probability. Equations (4.5) and (4.6) cannot be used to estimate $\boldsymbol{\alpha}$ because $y_i$ (and therefore $\mathbf{g}_i$) is not available in the probability sample. However, an equation similar to (4.6) can be used:

$\sum_{i \in S_{NP}} \frac{\mathbf{z}_i}{p_i(\boldsymbol{\alpha})} - \sum_{i \in S_P} d_i\, \mathbf{z}_i = \mathbf{0}, \qquad (4.8)$

where $p_i(\boldsymbol{\alpha}) = f(\mathbf{g}_i' \boldsymbol{\alpha})$. The vector $\mathbf{z}_i$, of the same size as $\mathbf{g}_i$, contains calibration variables, also called instrumental variables in the econometric literature. We use $\mathbf{Z}$ and $\mathbf{G}$ to denote the matrices that contain the values of the vectors $\mathbf{z}_i$ and $\mathbf{g}_i$, respectively. Equation (4.8) requires knowing the calibration variables $\mathbf{z}_i$ for both samples. However, the explanatory variables $\mathbf{g}_i$ can be observed only for the units in the non-probability sample. Equation (4.8) produces weights $w_i = 1/\hat{p}_i$ that satisfy the calibration equation $\sum_{i \in S_{NP}} w_i \mathbf{z}_i = \sum_{i \in S_P} d_i \mathbf{z}_i$. An equation similar to (4.8) was originally proposed by Deville (1998) to deal with non-response (see also Kott, 2006; Haziza and Beaumont, 2017). Equation (4.8) is unbiased, conditionally on $\mathbf{X}$ and $\mathbf{y}_U$, if the instrumental variables $\mathbf{z}$ can be selected such that the following assumption is satisfied:
Assumption 5: $\boldsymbol{\delta}$ and $\mathbf{Z}$ are independent after conditioning on $\mathbf{G}$ and $\mathbf{y}_U$.
Assumption 3 is no longer required, but is replaced with another assumption. The choice of instrumental variables $\mathbf{z}$ that satisfy assumption 5 is not always obvious in practice. They must not be predictive of $\boldsymbol{\delta}$ after conditioning on $\mathbf{g}$. Ideally, for efficiency reasons, the instrumental variables are selected so as to be predictive of $\mathbf{g}$ without compromising assumption 5. Unlike equations (4.5) and (4.6), equation (4.8) cannot be used to form homogeneous classes because the participation probabilities $\hat{p}_i$ cannot be calculated for the units in the probability sample. As such, the property of robustness that comes with homogeneous classes is lost. Because of these drawbacks, equation (4.8) should be considered only when there are strong reasons to believe that assumption 3 is not appropriate.
Once weights $w_i = 1/\hat{p}_i$ have been calculated using one of the methods in this section, they can still be adjusted through calibration. The objective of this calibration is to improve the precision of the estimator $\hat{t}_y^{\,\mathrm{IPW}}$ and also to obtain a double robustness property (see Chen et al., 2019).
In general, the variable of interest $y$ is observed for the entire non-probability sample, and the inverse propensity-score weighted estimator, $\hat{t}_y^{\,\mathrm{IPW}}$, or a weighted estimator obtained by calibration or statistical matching can be used. Sometimes, the non-probability sample is too large and the variable $y$ can only be collected for a sub-sample of $S_{NP}$. Quota sampling (e.g., Deville, 1991) is a commonly used method for drawing the sub-sample if auxiliary variables are available for the units of $S_{NP}$. An alternative to quota sampling is to calculate the weights $w_i$ for the entire non-probability sample and use them to select a random sub-sample with probabilities proportional to the weights. The variable $y$ is then collected only for the sub-sample, and the estimates are obtained as if the sub-sample was drawn from the population using an equal probability design. This approach is called inverse sampling in the literature on probability surveys (see, for example, Hinkins, Oh and Scheuren, 1997; or Rao, Scott and Benhin, 2003) and was proposed by Kim and Wang (2019) for non-probability samples.
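A simple sketch of this sub-sampling strategy is given below; the sequential weighted draws are only an approximation to a strict probability-proportional-to-weight design without replacement, and the `y_collect` callable stands in for the actual data collection step.

```python
import numpy as np

def weighted_subsample_total(y_collect, w, m, rng=None):
    """Select a sub-sample of size m with probabilities proportional to the
    weights w (sequential weighted draws without replacement), then estimate
    the total as if the sub-sample were an equal-probability sample:
    (sum of w / m) * sum of the collected y values."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(w), size=m, replace=False, p=w / w.sum())
    y_sub = np.asarray([y_collect(i) for i in idx])   # collect y only for the sub-sample
    return w.sum() / m * y_sub.sum()
```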
4.4 Small area estimation
In most surveys, it is desired to estimate the total of the variable of interest $y$ not just for the entire population $U$, but also for different subgroups of the population, called domains. Probability surveys
conducted by national statistical agencies generally produce reliable estimates
for domains with a sufficient number of sample units. Their bias is controlled
through the various sampling and data collection procedures, and their variance
is typically small enough to draw accurate conclusions. When the domain of
interest contains few sample units, the survey estimates may become unstable to
the point of being unusable even when their bias stays under control. To remedy
a lack of data in a domain of interest, small area estimation methods may be
considered. These methods offset the lack of observed data in a domain through
model assumptions that link auxiliary data to survey data. Two types of models
are commonly used: unit-level models and area-level models. The area-level
model of Fay and Herriot (1979) is undoubtedly the most popular. It requires
auxiliary data to be available at the domain level only, unlike unit-level
models, which require auxiliary variables for each unit of the population $U$. Readers
are referred to Rao and Molina (2015) for an excellent coverage of the various
approaches. Below, we focus on the Fay-Herriot model.
Suppose it is desired to estimate $D$ totals, $t_{y,j} = \sum_{i \in U_j} y_i$, $j = 1, \ldots, D$, where $U_1, \ldots, U_D$ are disjoint subsets of the population. Using a probability survey, $t_{y,j}$ can be estimated by $\hat{t}_{y,j} = \sum_{i \in S_{P,j}} d_i\, y_i$, where $S_{P,j}$ is the set of sample units that fall within domain $j$. The estimator $\hat{t}_{y,j}$ is called the direct estimator of $t_{y,j}$ because it only uses $y$ values of units belonging to domain $j$. Small area estimation techniques generally lead to indirect estimators that combine the sample $y$ values of domain $j$ with $y$ values of units outside domain $j$. We assume that a vector of auxiliary variables is available at the area level, and these variables come from sources independent of the probability sample. This vector for domain $j$ is denoted by $\mathbf{x}_j$. For example, the vector $\mathbf{x}_j = (N_j, \bar{y}_{NP,j})'$ could be considered, where $N_j$ is the population size in domain $j$, $\bar{y}_{NP,j} = n_{NP,j}^{-1} \sum_{i \in S_{NP,j}} y_i$ is the average of the variable $y$ in the non-probability sample, $S_{NP,j}$ is the set of units in the non-probability sample that are in domain $j$, and $n_{NP,j}$ is the size of the non-probability sample in domain $j$. If the population size $N_j$ is unknown, it can be replaced with an estimate independent of the probability survey. We use $\mathbf{X}$ to denote the matrix that contains the values of the vector $\mathbf{x}_j$, $j = 1, \ldots, D$. Note that the vector of $y$ values observed in the non-probability sample is hidden in the matrix $\mathbf{X}$ in this section.
The Fay-Herriot model has two components: the sampling model and the linking model. The sampling model is based on the assumption that, conditionally on $t_{y,j}$, the direct estimators $\hat{t}_{y,j}$ are independent and unbiased, i.e., $E_p(\hat{t}_{y,j} \mid t_{y,j}) = t_{y,j}$. Their design variance is denoted by $\psi_j = V_p(\hat{t}_{y,j} \mid t_{y,j})$. The sampling model is usually written in the form:

$\hat{t}_{y,j} = t_{y,j} + e_j, \qquad (4.9)$

where $e_j$ is the sampling error such that $E_p(e_j \mid t_{y,j}) = 0$ and $V_p(e_j \mid t_{y,j}) = \psi_j$. The independence assumption of the estimators $\hat{t}_{y,j}$ (and therefore of the sampling errors $e_j$) can be questioned when the strata do not coincide with the domains of interest. Section 8.2 of Rao and Molina (2015) discusses methods that take into account correlated sampling errors. In practice, it is often assumed that these correlations are weak, and they are ignored.
The linking model assumes that, conditionally on $\mathbf{X}$, the totals $t_{y,j}$ are independent, $E_m(t_{y,j} \mid \mathbf{x}_j) = \mathbf{x}_j' \boldsymbol{\beta}$ and $V_m(t_{y,j} \mid \mathbf{x}_j) = b_j\, \sigma_v^2$, where $b_j$ are known constants used for controlling heteroscedasticity and $\boldsymbol{\beta}$ and $\sigma_v^2$ are unknown model parameters. The linking model is usually written in the form:

$t_{y,j} = \mathbf{x}_j' \boldsymbol{\beta} + v_j, \qquad (4.10)$

where $v_j$ is the model error such that $E_m(v_j \mid \mathbf{x}_j) = 0$ and $V_m(v_j \mid \mathbf{x}_j) = b_j\, \sigma_v^2$. When the parameters of interest, $t_{y,j}$, are totals, it is often appropriate to let $b_j = N_j^2$. From (4.9) and (4.10), we obtain the combined model:

$\hat{t}_{y,j} = \mathbf{x}_j' \boldsymbol{\beta} + v_j + e_j, \qquad (4.11)$

where $v_j + e_j$ is the combined error. When using the Fay-Herriot model (4.11), inferences are usually made conditionally on $\mathbf{X}$. It can easily be shown that $E(\hat{t}_{y,j} \mid \mathbf{x}_j) = \mathbf{x}_j' \boldsymbol{\beta}$ and $V(\hat{t}_{y,j} \mid \mathbf{x}_j) = b_j\, \sigma_v^2 + \tilde{\psi}_j$, where $\tilde{\psi}_j = E_m(\psi_j \mid \mathbf{x}_j)$ is called the smooth design variance (Beaumont and Bocci, 2016; and Hidiroglou, Beaumont and Yung, 2019).
Now suppose that it is desired to predict the total $t_{y,j}$ using a linear predictor $\hat{t}_{y,j}^{\,\mathrm{LIN}} = \sum_{l=1}^{D} a_{jl}\, \hat{t}_{y,l}$, where the $a_{jl}$ are constants to be determined. A linear predictor uses all the data from the probability sample for predicting $t_{y,j}$, not just the data from domain $j$. This explains how it derives its efficiency. However, not all linear predictors are appropriate for predicting $t_{y,j}$. A strategy often used for determining the constants $a_{jl}$ is to minimize the variance of the prediction error, $V(\hat{t}_{y,j}^{\,\mathrm{LIN}} - t_{y,j} \mid \mathbf{X})$, subject to the constraint that the predictor must be unbiased, $E(\hat{t}_{y,j}^{\,\mathrm{LIN}} - t_{y,j} \mid \mathbf{X}) = 0$. The resulting predictor, called the Best Linear Unbiased Predictor (BLUP), is denoted by $\hat{t}_{y,j}^{\,\mathrm{BLUP}}$ and can be written in the form (see, for example, Rao and Molina, 2015):

$\hat{t}_{y,j}^{\,\mathrm{BLUP}} = \gamma_j\, \hat{t}_{y,j} + (1 - \gamma_j)\, \mathbf{x}_j' \tilde{\boldsymbol{\beta}}, \qquad (4.12)$

where $\gamma_j = b_j \sigma_v^2 / (b_j \sigma_v^2 + \tilde{\psi}_j)$ is bounded by 0 and 1, and $\tilde{\boldsymbol{\beta}} = \bigl\{\sum_{l=1}^{D} \mathbf{x}_l \mathbf{x}_l' / (b_l \sigma_v^2 + \tilde{\psi}_l)\bigr\}^{-1} \sum_{l=1}^{D} \mathbf{x}_l\, \hat{t}_{y,l} / (b_l \sigma_v^2 + \tilde{\psi}_l)$.
The predictor (4.12) is a weighted average of the direct estimator $\hat{t}_{y,j}$ and a prediction, $\mathbf{x}_j' \tilde{\boldsymbol{\beta}}$, often called the synthetic estimator. More weight is given to the direct estimator when the smooth design variance, $\tilde{\psi}_j$, is small relative to the variance of the linking model, $b_j \sigma_v^2$. The predictor $\hat{t}_{y,j}^{\,\mathrm{BLUP}}$ is then similar to the direct estimator. This situation normally occurs when the sample size in the domain is large. Conversely, if the direct estimator is unstable and has a large smooth design variance, more weight is given to the synthetic estimator. If the number of domains is large, the prediction variance of $\hat{t}_{y,j}^{\,\mathrm{BLUP}}$ is approximately equal to $\gamma_j\, \tilde{\psi}_j$. Since $V(\hat{t}_{y,j} - t_{y,j} \mid \mathbf{X}) = \tilde{\psi}_j$, the constant $\gamma_j$ can be interpreted as being a variance reduction factor resulting from using $\hat{t}_{y,j}^{\,\mathrm{BLUP}}$ instead of $\hat{t}_{y,j}$. Therefore, the variance reduction is greater when $\gamma_j$ is small, i.e., when the direct estimator is not precise. On the other hand, if the linking model is not properly specified, there is a greater risk of significant bias when $\gamma_j$ is small. To better understand this point, suppose that the real linking model is such that $E_m(t_{y,j} \mid \mathbf{x}_j) = \mu(\mathbf{x}_j)$ for some function $\mu(\cdot)$. Under this model, it can be shown that the bias of the predictor $\hat{t}_{y,j}^{\,\mathrm{BLUP}}$ is given by

$E(\hat{t}_{y,j}^{\,\mathrm{BLUP}} - t_{y,j} \mid \mathbf{X}) = -(1 - \gamma_j)\, \bigl\{\mu(\mathbf{x}_j) - \mathbf{x}_j' \boldsymbol{\beta}^{*}\bigr\}, \qquad (4.13)$

where $\boldsymbol{\beta}^{*} = \bigl\{\sum_{l=1}^{D} \mathbf{x}_l \mathbf{x}_l' / (b_l \sigma_v^2 + \tilde{\psi}_l)\bigr\}^{-1} \sum_{l=1}^{D} \mathbf{x}_l\, \mu(\mathbf{x}_l) / (b_l \sigma_v^2 + \tilde{\psi}_l)$ is the value towards which $\tilde{\boldsymbol{\beta}}$ converges. If the linear model $\mu(\mathbf{x}_j) = \mathbf{x}_j' \boldsymbol{\beta}$ is valid, the bias disappears. Otherwise, the bias is not zero and increases as $\gamma_j$ decreases or as the specification error of the linking model, $\mu(\mathbf{x}_j) - \mathbf{x}_j' \boldsymbol{\beta}^{*}$, increases. When $\gamma_j$ is close to 1, the bias is usually negligible, but so is the variance reduction.
Remark: Note that the predictor $\hat{t}_{y,j}^{\,\mathrm{BLUP}}$ and the bias (4.13) depend on the variance $\sigma_v^2$. If the linear model (4.10) is not valid, the parameters $\boldsymbol{\beta}$ and $\sigma_v^2$ no longer exist. Yet, the linking model (4.10) can still be postulated and its parameters can be estimated from the observed data as if the model were valid. The model variance $\sigma_v^2$, which enters in the calculation of the predictor $\hat{t}_{y,j}^{\,\mathrm{BLUP}}$ and the bias (4.13), can be viewed as being the value towards which an estimator of $\sigma_v^2$ converges.
The predictor (4.12) cannot be calculated because it depends on the unknown variances $\sigma_v^2$ and $\tilde{\psi}_j$. When $\sigma_v^2$ and $\tilde{\psi}_j$ in (4.12) are replaced with estimators, $\hat{\sigma}_v^2$ and $\hat{\tilde{\psi}}_j$, the BLUP (4.12) becomes the empirical best linear unbiased predictor, denoted as $\hat{t}_{y,j}^{\,\mathrm{EBLUP}}$. There are a number of methods for estimating $\sigma_v^2$ (see Rao and Molina, 2015). One of the most commonly used methods is restricted maximum likelihood. To estimate $\tilde{\psi}_j$, we assume that a design-unbiased estimator of $\psi_j$ is available, denoted by $\hat{\psi}_j$. This assumption is formally written: $E_p(\hat{\psi}_j \mid t_{y,j}) = \psi_j$. It follows that $E(\hat{\psi}_j \mid \mathbf{x}_j) = E_m(\psi_j \mid \mathbf{x}_j) = \tilde{\psi}_j$. Therefore, the estimator $\hat{\psi}_j$ is unbiased for $\tilde{\psi}_j$, but can be very unstable when the domain sample size is small. A more efficient approach for estimating $\tilde{\psi}_j$ involves modelling $\hat{\psi}_j$ given the auxiliary variables $\mathbf{x}_j$. In practice, a linear model is often used for $\log \hat{\psi}_j$, and it is assumed that the model errors follow a normal distribution (for example, Rivest and Belmonte, 2000). Beaumont and Bocci (2016), see also Hidiroglou et al. (2019), provide a method of moments for estimating $\tilde{\psi}_j$ that does not require the normality assumption.
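For illustration, the sketch below computes the EBLUP (4.12) from direct estimates, domain-level auxiliary data and smoothed design variances, estimating $\sigma_v^2$ with the Fay-Herriot moment equation; this is only one of several possible estimation methods and the implementation details (bracketing, truncation at zero, default $b_j = 1$) are illustrative.

```python
import numpy as np

def fay_herriot_eblup(t_dir, X, psi, b=None, tol=1e-8, max_iter=200):
    """EBLUP under the Fay-Herriot model (4.11)-(4.12), assuming the smoothed
    design variances psi are supplied. sigma_v^2 is estimated by solving the
    Fay-Herriot moment equation with bisection."""
    D, p = X.shape
    b = np.ones(D) if b is None else np.asarray(b, dtype=float)

    def gls_beta(sigma2):
        V = b * sigma2 + psi                       # total variance per domain
        XtVinv = X.T / V
        return np.linalg.solve(XtVinv @ X, XtVinv @ t_dir)

    def moment(sigma2):
        # sum_j (t_dir_j - x_j'beta)^2 / (b_j*sigma2 + psi_j) - (D - p)
        r = t_dir - X @ gls_beta(sigma2)
        return np.sum(r**2 / (b * sigma2 + psi)) - (D - p)

    lo, hi = 0.0, max(1.0, float(np.var(t_dir)))
    while moment(hi) > 0 and hi < 1e12:            # expand until the root is bracketed
        hi *= 10.0
    if moment(lo) <= 0:
        sigma2 = 0.0                               # truncate the estimate at zero
    else:
        for _ in range(max_iter):
            mid = 0.5 * (lo + hi)
            if moment(mid) > 0:
                lo = mid
            else:
                hi = mid
            if hi - lo < tol * (1.0 + hi):
                break
        sigma2 = 0.5 * (lo + hi)

    beta = gls_beta(sigma2)
    gamma = b * sigma2 / (b * sigma2 + psi)        # shrinkage factor of (4.12)
    return gamma * t_dir + (1 - gamma) * (X @ beta), sigma2
```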
The Fay-Herriot
model requires the availability of auxiliary data only at the domain level. The
variable of interest $y$ must be
measured without error in the probability survey, but it is not essential for
the auxiliary source to be perfect. This leaves the door open to all kinds of
files external to the probability survey such as big data files. Kim, Wang, Zhu
and Cruze (2018) is a recent example where an extension of the Fay-Herriot
model was used with auxiliary data from satellite images. Small area estimation
methods often achieve significant and sometimes impressive variance reductions
(see, for example, Hidiroglou et al., 2019). The trade-off for obtaining
these gains is the introduction of model assumptions and the risk that these
assumptions do not hold. Therefore, model validation is a critical step in
producing small area estimates, as in any model-based approach.
Small area
estimation methods are generally used to improve the efficiency of estimators
for domains with a small sample size. They could also be used to reduce the
data collection costs and respondent burden by reducing the overall sample size
of a probability survey for a few, if not all, survey variables. The estimates
obtained from the reduced sample and the Fay-Herriot model, for example, could
thus have a precision similar to the direct estimates from the probability
survey obtained from the full sample. In this context, small area estimation
methods would not be used to improve the precision for domains containing few
units, but instead to reduce the overall data collection effort while
preserving the quality of the estimates.