Model-assisted calibration of non-probability sample survey data using adaptive LASSO
Section 2. Calibration

2.1 Traditional calibration

For an analytical sample $s_{A}$ (the sample whose weights require calibration) of size $n$ drawn under sample design $A$, with design weights $\underset{n \times 1}{d}$ and diagonal matrix of design weights $D$, the calibrated weights $\underset{n \times 1}{w}$ minimize a distance measure

$E_{A} \left[ \sum_{i \in s_{A}} g (w_{i}, d_{i}) / q_{i} \right] \qquad (2.1)$

under the constraint:

$\sum_{i \in s_{A}} w_{i} x_{i}^{T} = T^{X} \qquad (2.2)$

where $E_{A}$ is expectation with respect to the analytic (probability) design; $g (w_{i}, d_{i})$ is differentiable with respect to $w_{i}$, strictly convex on an interval containing $d_{i}$, and satisfies $g (d_{i}, d_{i}) = 0$; and $T^{X}$ is a row vector of known population totals of the sample calibration variables $X$ (Deville and Särndal, 1992). The constant $q_{i}$ is independent of the design weight $d_{i}$. The commonly used generalized regression (GREG) estimator uses the chi-square distance $g (w_{i}, d_{i}) = {(w_{i} - d_{i})}^{2} / d_{i}$ with $q_{i} = 1$. Under this distance measure:

$w^{GREG} = d + D X {(X^{T} D X)}^{- 1} {(T^{X} - d^{T} X)}^{T} . \qquad (2.3)$
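
A brief sketch of where (2.3) comes from may help. It assumes the sample sum inside the expectation in (2.1) is minimized directly, with $q_{i} = 1$, using a $p \times 1$ vector of Lagrange multipliers $λ$ (a symbol introduced here for illustration only, not part of the original notation):

$L (w, λ) = \sum_{i \in s_{A}} {(w_{i} - d_{i})}^{2} / d_{i} - 2 λ^{T} \left( \sum_{i \in s_{A}} w_{i} x_{i} - {(T^{X})}^{T} \right)$

Setting $\partial L / \partial w_{i} = 0$ gives $w_{i} = d_{i} (1 + x_{i}^{T} λ)$. Substituting this into constraint (2.2) gives $X^{T} d + X^{T} D X λ = {(T^{X})}^{T}$, so that

$λ = {(X^{T} D X)}^{- 1} {(T^{X} - d^{T} X)}^{T}$

and stacking $w_{i} = d_{i} + d_{i} x_{i}^{T} λ$ over the sample yields $w = d + D X λ$, which is the expression for $w^{GREG}$ in (2.3).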

The estimate of population total of outcome $y$ is based on calibrated weights:

$\begin{array}{ll} {\hat{T}}_{y}^{GREG} & = {(w^{GREG})}^{T} y \\ & = d^{T} y + (T^{X} - d^{T} X) {(X^{T} D X)}^{- 1} X^{T} D y \\ & = {\hat{T}}_{y}^{HT} + (T^{X} - d^{T} X) \hat{β} \end{array} \qquad (2.4)$

where ${\hat{T}}_{y}^{HT} = \sum_{i \in s_{A}} d_{i} y_{i}$ is the standard (weighted) design-based estimator and $\hat{β} = {(X^{T} D X)}^{- 1} X^{T} D y$ is the weighted least squares estimate for the linear regression $E_{ξ} [y_{i} | x_{i}, β] = x_{i}^{T} β$, given weights $D$. (This corresponds to the poststratified estimator when $X$ consists entirely of cell totals for categorical variables.) The calibrated weights defined in equation (2.3) do not depend on any outcome variable, so the same set of weights can be applied to all variables in the survey. Note that GREG assumes a linear working model. Although ${\hat{T}}_{y}^{GREG}$ is asymptotically design-unbiased for $T_{y}$, when the relationship between $y$ and $X$ is non-linear, as when $y$ is binary, the design variance of ${\hat{T}}_{y}^{GREG}$ can be larger than the design variance of ${\hat{T}}_{y}^{HT}$.
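
Equations (2.3) and (2.4) can be checked numerically. The following is a minimal sketch, assuming NumPy and a simulated population; the function name `greg_weights`, the simulated data, and the simple random sample with $d_{i} = N / n$ are all illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np

def greg_weights(X, d, T_X):
    """Chi-square-distance calibrated weights, as in equation (2.3).

    X   : (n, p) matrix of calibration variables for the sample
    d   : (n,) design weights
    T_X : (p,) known population totals of the columns of X
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    XtDX = X.T @ (d[:, None] * X)                    # X^T D X
    gap = np.asarray(T_X, dtype=float) - d @ X       # T^X - d^T X
    lam = np.linalg.solve(XtDX, gap)                 # (X^T D X)^{-1} (T^X - d^T X)^T
    return d + d * (X @ lam)                         # d + D X lam

# Hypothetical population, for illustration only
rng = np.random.default_rng(0)
N, n = 5000, 200
X_pop = np.column_stack([
    np.ones(N),                        # intercept column, so the weights sum to N
    rng.normal(size=N),                # a continuous covariate
    rng.binomial(1, 0.4, size=N),      # a binary covariate
])
y_pop = 2.0 + 1.5 * X_pop[:, 1] - 1.0 * X_pop[:, 2] + rng.normal(size=N)

idx = rng.choice(N, size=n, replace=False)           # simple random sample
X, y = X_pop[idx], y_pop[idx]
d = np.full(n, N / n)                                # design weights N / n
T_X = X_pop.sum(axis=0)                              # known population totals

w = greg_weights(X, d, T_X)

# First and last lines of (2.4): weighted total vs. HT plus regression adjustment
T_greg = w @ y
T_ht = d @ y
beta_hat = np.linalg.solve(X.T @ (d[:, None] * X), X.T @ (d * y))
T_greg_alt = T_ht + (T_X - d @ X) @ beta_hat
```

In this sketch `w @ X` reproduces `T_X`, which is constraint (2.2), and `T_greg` agrees with `T_greg_alt` up to floating-point error. Because `w` is computed without reference to `y`, the same weights could be reused for any other outcome.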

2.2 Model-assisted calibration

Model-assisted calibration estimators can have a significant advantage over ${\hat{T}}_{y}^{GREG}$ because they allow non-linear models to assist in the construction of the calibrated weights. In model-assisted calibration, we assume a relationship between an outcome $y$ and $X$ through the first two moments (Wu and Sitter, 2001):

$E_{ξ} (y_{i} | x_{i}) = μ (x_{i}, β), \quad V_{ξ} (y_{i} | x_{i}) = ν_{i}^{2} σ^{2} \qquad (2.5)$

where $β = {(β_{1}, \dots, β_{p})}^{T}$ and $σ$ are unknown superpopulation parameters, $μ (x_{i}, β)$ is a known function of $x_{i}$ and $β$, and $ν_{i}$ is a known function of $x_{i}$ or $μ (x_{i}, β)$. $E_{ξ}$ and $V_{ξ}$ are expectation and variance with respect to the model $ξ$. Let $B$ be the finite population (or census) estimate of $β$ (i.e., the quasilikelihood estimator of $β$ based on the entire finite population), and let ${\hat{μ}}_{i} = μ (x_{i}, \hat{B})$, where $\hat{B}$ is the sample estimate of $B$. The model-assisted calibrated weights $w$ then minimize a distance measure $E_{A} [\sum_{i \in s_{A}} g (w_{i}, d_{i}) / q_{i}]$ under the constraints $\sum_{i \in s_{A}} w_{i} = N$ and $\sum_{i \in s_{A}} w_{i} {\hat{μ}}_{i} = \sum_{i = 1}^{N} {\hat{μ}}_{i}$. The main conceptual difference between traditional and model-assisted calibration lies in the constraints. In model-assisted calibration, the constraints are based on two quantities: (1) the population size, and (2) the population total of the predicted values ${\hat{μ}}_{i}$. In traditional calibration, the constraint is a vector of population totals of $X$ (see equation (2.2)). Under the chi-square distance measure with $q_{i} = 1$, the model-assisted calibrated weights are:

$w^{MC} = d + D M {(M^{T} D M)}^{- 1} {(T^{M} - d^{T} M)}^{T} \qquad (2.6)$

where $T^{M} = [N, \sum_{i = 1}^{N} {\hat{μ}}_{i}]$ and $M = [1, {({\hat{μ}}_{i})}_{i \in s_{A}}]$, with $1$ an $n \times 1$ column of ones so that the first calibration constraint is $\sum_{i \in s_{A}} w_{i} = N$. (In the non-probability setting the vector of design weights $d$ can be replaced with $(N / n) 1$.) The estimate of the population total based on the model-assisted calibrated weights is then:

$\begin{array}{ll} {\hat{T}}_{y}^{MC} & = {(w^{MC})}^{T} y \\ & = d^{T} y + (T^{M} - d^{T} M) {(M^{T} D M)}^{- 1} M^{T} D y \\ & = {\hat{T}}_{y}^{HT} + \left( \sum_{i = 1}^{N} {\hat{μ}}_{i} - \sum_{i \in s_{A}} d_{i} {\hat{μ}}_{i} \right) {\hat{B}}^{MC} \end{array} \qquad (2.7)$

where ${\hat{B}}^{MC}$ is the calibration slope that satisfies the calibration constraints (different from the model parameter estimates $\hat{B}$):

${\hat{B}}^{MC} = \frac{\sum_{i \in s_{A}} d_{i} ({\hat{μ}}_{i} - \hat{\bar{μ}}) (y_{i} - \bar{y})}{\sum_{i \in s_{A}} d_{i} {({\hat{μ}}_{i} - \hat{\bar{μ}})}^{2}}, \hat{\bar{μ}} = \sum_{i \in s_{A}} d_{i} {\hat{μ}}_{i} / \sum_{i \in s_{A}} d_{i} , \bar{y} = \sum_{i \in s_{A}} d_{i} y_{i} / \sum_{i \in s_{A}} d_{i} .$

Both the unbiasedness and the small variance of ${\hat{T}}_{y}^{MC}$ rely on how well ${\hat{μ}}_{i}$ approximates the true expected value of $y_{i}$.
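
The construction in (2.6) and (2.7) can be illustrated with a binary outcome. The following is a minimal sketch, assuming NumPy, a simulated population whose covariates are known for every unit, a hypothetical selection mechanism that depends on the covariate, and a logistic working model fitted by Newton-Raphson. None of these choices come from the paper (in particular, this is not an adaptive LASSO fit), and the names `fit_logistic` and `mc_weights` are illustrative. The first column of $M$ is taken to be a column of ones, as the constraint $\sum_{i \in s_{A}} w_{i} = N$ requires, and $d$ is set to $(N / n) 1$.

```python
import numpy as np

def fit_logistic(X, y, w, n_iter=25):
    """Weighted logistic regression by Newton-Raphson (a simple working model)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = X.T @ ((w * p * (1.0 - p))[:, None] * X)  # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def mc_weights(mu_s, d, N, mu_pop_total):
    """Model-assisted calibrated weights under chi-square distance, as in (2.6)."""
    M = np.column_stack([np.ones_like(mu_s), mu_s])      # M = [1, mu_hat]
    T_M = np.array([N, mu_pop_total])                    # T^M = [N, sum of mu_hat]
    MtDM = M.T @ (d[:, None] * M)
    lam = np.linalg.solve(MtDM, T_M - d @ M)
    return d + d * (M @ lam)

# Hypothetical population with a binary outcome, for illustration only
rng = np.random.default_rng(1)
N, n = 5000, 300
x_pop = rng.normal(size=N)
X_pop = np.column_stack([np.ones(N), x_pop])
p_pop = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x_pop)))
y_pop = rng.binomial(1, p_pop).astype(float)

# Hypothetical non-probability selection: inclusion depends on x
p_sel = np.exp(0.5 * x_pop)
p_sel /= p_sel.sum()
idx = rng.choice(N, size=n, replace=False, p=p_sel)
X_s, y = X_pop[idx], y_pop[idx]
d = np.full(n, N / n)                                    # d replaced by (N / n) 1

# Fit the working model on the sample, predict for sample and population
B_hat = fit_logistic(X_s, y, d)
mu_s = 1.0 / (1.0 + np.exp(-(X_s @ B_hat)))
mu_pop = 1.0 / (1.0 + np.exp(-(X_pop @ B_hat)))

w = mc_weights(mu_s, d, N, mu_pop.sum())
T_mc = w @ y                                             # first line of (2.7)

# Last line of (2.7), using the calibration slope B^MC
mu_bar = (d @ mu_s) / d.sum()
y_bar = (d @ y) / d.sum()
B_mc = (d @ ((mu_s - mu_bar) * (y - y_bar))) / (d @ ((mu_s - mu_bar) ** 2))
T_mc_alt = d @ y + (mu_pop.sum() - d @ mu_s) * B_mc
```

Here `w.sum()` equals `N` and `w @ mu_s` equals `mu_pop.sum()`, which are the two calibration constraints. The two forms of the estimator, `T_mc` and `T_mc_alt`, agree up to floating-point error because $\sum_{i \in s_{A}} d_{i} = N$ in this setup. Unlike the GREG weights, these weights depend on $y$ through ${\hat{μ}}_{i}$, so they are specific to the outcome being estimated.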

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2018-06-21
