How to decompose the non-response variance: A total survey error approach
Section 2. Inference framework
Assume a sample $s$ of size $n$ is drawn from a population $U$ of size $N$. Define the population total by
$$t_{yd} = \sum_{k \in U} d_k y_k$$
for a variable, $y$, and a domain indicator, $d$, which takes the value $d_k = 1$ if unit $k$ belongs to the domain and $d_k = 0$ otherwise. In the context of full response, $t_{yd}$ is estimated by
$$\hat{t}_{yd} = \sum_{k \in s} w_k d_k y_k, \qquad (2.1)$$
where $w_k$ could be the sampling weight or a calibrated weight if calibration is performed. Because surveys are generally subject to non-response, both unit and item, a sample unit is classified as either a responding or a nonresponding unit with regard to the variable $y$ at any given point during data collection. The subset $s_r$ contains item-responding units whereas $s_m$ contains item-nonresponding units. Note that $s_r$ and $s_m$, respectively of size $n_r$ and $n_m$, form a partition of the sample $s$, with $s = s_r \cup s_m$ and $n = n_r + n_m$. The approach proposed in this paper assumes that imputation is used in case of non-response, which is the common approach in business surveys. Moreover, this approach can be considered for both item and unit non-response as long as imputation is used. However, since only one variable of interest, $y$, is considered here for simplicity, no distinction is made as to whether the variable is imputed because of item or unit non-response. Also, the sets $s_r$ and $s_m$ are not indexed by an item number, for simplicity and without loss of generality. However, the action following the calculation of a unit score might be different depending on whether the unit is responding or not.
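As a minimal sketch of the full-response estimator and of the split of the sample into responding and nonresponding units, the following Python fragment may help. The five units, their weights, domain indicators and values are invented for illustration, and the function name `domain_total` is hypothetical.

```python
def domain_total(weights, in_domain, values):
    """Weighted domain total: sum of weight * domain indicator * value."""
    return sum(w * d * y for w, d, y in zip(weights, in_domain, values))

# Hypothetical sample of five units (all numbers are made up).
weights = [10.0, 10.0, 20.0, 20.0, 5.0]   # sampling or calibrated weights
in_domain = [1, 0, 1, 1, 0]                # 1 if the unit is in the domain
values = [3.0, 8.0, 2.0, None, 4.0]        # None marks a value not reported

# Partition of the sample into responding and nonresponding units.
responding = [k for k, y in enumerate(values) if y is not None]
nonresponding = [k for k, y in enumerate(values) if y is None]

# Under full response the fourth value would be observed; assume it is 6.0.
full_values = [3.0, 8.0, 2.0, 6.0, 4.0]
estimate_full = domain_total(weights, in_domain, full_values)
print(estimate_full)
```

Units outside the domain contribute nothing to the total, whatever their weight, because their indicator is 0.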
2.1 Estimation under imputation
The framework requires linear imputation methods. In other words, the imputed value, $y_k^*$, can be written as a linear combination of the values reported by the other units. This linear combination is given by
$$y_k^* = \varphi_{0k} + \sum_{l \in s_r} \varphi_{lk} y_l$$
for a nonresponding unit $k$, where the sum is taken over the responding units. The quantities $\varphi_{0k}$ and $\varphi_{lk}$ do not depend on the values of the variable of interest, $y$, but they may depend on $s_r$ and on auxiliary data from the nonrespondents available on the frame, in registers or elsewhere. Linear imputation methods cover most methods used in practice, such as auxiliary value imputation (Beaumont, Haziza and Bocci, 2011) and linear regression imputation, as well as donor imputation, which is often used to impute categorical variables.
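To illustrate what "linear" means here, the sketch below uses weighted ratio imputation, a simple special case chosen only for illustration and not taken from the paper. It checks that the imputed value obtained directly from the ratio equals a linear combination of the reported values, with an intercept of zero and coefficients that depend only on weights and auxiliary values. All data and the function name are hypothetical.

```python
import math

def ratio_imputation_coefficients(x_missing, resp_weights, resp_x):
    """Intercept and coefficients of the linear form under ratio imputation.

    The coefficients use only weights and auxiliary values, never the
    reported values of the variable of interest.
    """
    denom = sum(w * x for w, x in zip(resp_weights, resp_x))
    intercept = 0.0
    coefficients = [x_missing * w / denom for w in resp_weights]
    return intercept, coefficients

# Hypothetical respondents: weights, auxiliary values, reported values.
resp_weights = [10.0, 10.0, 20.0]
resp_x = [2.0, 4.0, 1.0]
resp_y = [3.0, 8.0, 2.0]
x_missing = 3.0  # auxiliary value of the nonrespondent

# Imputed value written as a linear combination of reported values.
intercept, coefficients = ratio_imputation_coefficients(
    x_missing, resp_weights, resp_x
)
imputed_linear = intercept + sum(c * y for c, y in zip(coefficients, resp_y))

# Imputed value computed the usual way: auxiliary value times the ratio.
ratio = sum(w * y for w, y in zip(resp_weights, resp_y)) / sum(
    w * x for w, x in zip(resp_weights, resp_x)
)
imputed_direct = x_missing * ratio

print(math.isclose(imputed_linear, imputed_direct))
```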
It is common practice to apply several imputation methods sequentially to the same variable, which is referred to as composite imputation. More than one linear imputation method can thus be used to impute nonresponding units. Section 2 of Beaumont and Bissonnette (2011) defines composite imputation in detail. Briefly, suppose that the set of nonrespondents is broken down into two or more groups and that a different imputation method is used within each group. For example, let $\mathbf{x}_k$ be the complete vector of auxiliary variables for unit $k$ and suppose regression imputation is used to impute the variable of interest. If, for some cases, $\mathbf{x}_k$ were incomplete, another imputation method, based on the available subset of $\mathbf{x}_k$, would be used. The approach presented in our paper can be generalized to include composite imputation as long as linear imputation methods are used. For simplicity of notation, the case of a single linear imputation method is presented.
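The estimator $\hat{t}_{yd}^{I}$ of the domain total after imputation is given by
$$\hat{t}_{yd}^{I} = \sum_{k \in s_r} w_k d_k y_k + \sum_{k \in s_m} w_k d_k y_k^*, \qquad (2.2)$$
where $w_k$ is the sampling weight or a calibrated weight, $y_k$ is the value reported by a responding unit and $y_k^*$ is the value imputed for a nonresponding unit. The estimator presented in equation (2.2) can be rewritten as
$$\hat{t}_{yd}^{I} = W_{0d} + \sum_{k \in s_r} (w_k d_k + W_{dk}) y_k.$$
The quantities $W_{0d}$ and $W_{dk}$ denote the compensatory weights (or adjustment weights) defined as
$$W_{0d} = \sum_{l \in s_m} w_l d_l \varphi_{0l} \quad \text{and} \quad W_{dk} = \sum_{l \in s_m} w_l d_l \varphi_{kl},$$
where $\varphi_{0l}$ and $\varphi_{kl}$ are the coefficients of the linear imputation of nonresponding unit $l$. They represent the effect of the non-response in the domain, $d$, carried by the respondent unit, $k$, with a reported value, $y_k$.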
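The equivalence between the imputed estimator and its rewritten form with compensatory weights can be checked numerically. The sketch below assumes weighted ratio imputation (an illustrative choice, with zero intercept) and invented data; the helper `phi` and the variable names are hypothetical.

```python
import math

# Hypothetical data: respondents are (weight, domain, x, y),
# nonrespondents are (weight, domain, x). All numbers are made up.
respondents = [(10.0, 1, 2.0, 3.0), (10.0, 0, 4.0, 8.0), (20.0, 1, 1.0, 2.0)]
nonrespondents = [(20.0, 1, 3.0), (5.0, 0, 2.0)]

denom = sum(w * x for w, _, x, _ in respondents)

def phi(l, k):
    """Coefficient of respondent l in the imputed value of nonrespondent k
    (assumed ratio imputation, so the intercept is zero)."""
    return nonrespondents[k][2] * respondents[l][0] / denom

# Direct form: reported values for respondents, imputed values otherwise.
imputed = [
    sum(phi(l, k) * respondents[l][3] for l in range(len(respondents)))
    for k in range(len(nonrespondents))
]
direct = sum(w * d * y for w, d, _, y in respondents) + sum(
    w * d * y_star for (w, d, _), y_star in zip(nonrespondents, imputed)
)

# Rewritten form: each respondent carries an extra compensatory weight.
W0 = 0.0  # the intercept terms are all zero under ratio imputation
W = [
    sum(w * d * phi(l, k) for k, (w, d, _) in enumerate(nonrespondents))
    for l in range(len(respondents))
]
rewritten = W0 + sum(
    (w * d + W_l) * y for (w, d, _, y), W_l in zip(respondents, W)
)

print(math.isclose(direct, rewritten))
```

In this toy example the second respondent lies outside the domain, yet it still carries a positive compensatory weight because its reported value is used to impute a nonrespondent inside the domain. Only nonrespondents in the domain contribute to the compensatory weights.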
2.2 Variance estimation
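Consider an imputation model, $m$, describing the relationship between variable $y$ and the vector of observed auxiliary variables $\mathbf{x}$. Let $E_m$, $V_m$ and $\mathrm{Cov}_m$ denote respectively the expectation, the variance, and the covariance with respect to the imputation model $m$. The imputation model is
$$E_m(y_k \mid \mathbf{X}) = \mu_k, \quad V_m(y_k \mid \mathbf{X}) = \sigma_k^2 \quad \text{and} \quad \mathrm{Cov}_m(y_k, y_l \mid \mathbf{X}) = 0 \ \text{for } k \neq l.$$
The matrix $\mathbf{X}$ contains all observed vectors $\mathbf{x}_k$. The quantities $\mu_k$ and $\sigma_k^2$ can be estimated by $\hat{\mu}_k$ and $\hat{\sigma}_k^2$, respectively. We assume that these estimators are unbiased with respect to the imputation model $m$. These estimators will be useful later for estimating the total variance components and the unit decompositions of those components.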
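The total error of the estimator (2.2) can be expressed as
$$\hat{t}_{yd}^{I} - t_{yd} = (\hat{t}_{yd} - t_{yd}) + (\hat{t}_{yd}^{I} - \hat{t}_{yd}), \qquad (2.3)$$
where $t_{yd}$ is the population domain total, $\hat{t}_{yd}^{I}$ is the estimator after imputation and $\hat{t}_{yd}$ is the estimator under complete response given by (2.1). The first term on the right-hand side of (2.3) is usually referred to as the sampling error and the second term is called the non-response error. As proposed in Särndal (1992) and in Beaumont and Bissonnette (2011), the mean square error of $\hat{t}_{yd}^{I}$ can be decomposed, using (2.3), into three components and is given by
$$E_{mpq}\left[(\hat{t}_{yd}^{I} - t_{yd})^2\right] = E_{mpq}\left[(\hat{t}_{yd} - t_{yd})^2\right] + E_{mpq}\left[(\hat{t}_{yd}^{I} - \hat{t}_{yd})^2\right] + 2 E_{mpq}\left[(\hat{t}_{yd} - t_{yd})(\hat{t}_{yd}^{I} - \hat{t}_{yd})\right]. \qquad (2.4)$$
The mean square error, taken under the imputation model, $m$, the sampling design, $p$, and the response mechanism, $q$, is approximately equivalent to the variance $V_{TOT}$, assuming that the overall bias is negligible. Thus, equation (2.4) is equivalent to
$$V_{TOT} = V_{SAM} + V_{NR} + V_{MIX},$$
where:
- $V_{SAM}$ is the sampling variance;
- $V_{NR}$ is the non-response variance;
- $V_{MIX}$ is the covariance between sampling and non-response error terms, also called the mixed variance component.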
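The three-component decomposition can be illustrated with a small Monte Carlo sketch. Everything below is an assumption made for illustration: a normal model for the variable of interest, simple random sampling, a uniform response probability of 0.7, respondent mean imputation (a linear method) and a domain equal to the whole population. The simulation records the sampling error and the non-response error in each replicate, then averages their squares and cross-product.

```python
import random

random.seed(12345)
N, n, reps = 200, 40, 1000
weight = N / n  # simple random sampling weight

sampling_err, nonresponse_err = [], []
for _ in range(reps):
    # Model: independent normal values (assumed, for illustration only).
    y = [random.gauss(50.0, 10.0) for _ in range(N)]
    total = sum(y)
    # Design: simple random sample without replacement.
    sample = random.sample(range(N), n)
    # Response mechanism: each unit responds with probability 0.7.
    resp = [k for k in sample if random.random() < 0.7]
    if not resp:
        continue  # skip the (very unlikely) case with no respondent
    mean_resp = sum(y[k] for k in resp) / len(resp)
    t_full = weight * sum(y[k] for k in sample)
    # Nonrespondents receive the respondent mean as imputed value.
    t_imp = weight * (sum(y[k] for k in resp) + (n - len(resp)) * mean_resp)
    sampling_err.append(t_full - total)
    nonresponse_err.append(t_imp - t_full)

def mean(values):
    return sum(values) / len(values)

pairs = list(zip(sampling_err, nonresponse_err))
v_sam = mean([a * a for a, _ in pairs])      # sampling component
v_nr = mean([b * b for _, b in pairs])       # non-response component
v_mix = 2 * mean([a * b for a, b in pairs])  # mixed component
mse = mean([(a + b) ** 2 for a, b in pairs])

print(v_sam, v_nr, v_mix, mse)
```

The simulated mean square error equals the sum of the three simulated components up to rounding, since the decomposition is an algebraic identity applied to the two error terms. The relative sizes of the components depend entirely on the assumed model, design and response mechanism.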
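Beaumont and Bissonnette (2011) proposed the following estimators for $V_{SAM}$, $V_{NR}$ and $V_{MIX}$:
- $\hat{V}_{TOT} = \hat{V}_{ORD} + \hat{V}_{DIF} + \hat{V}_{NR} + \hat{V}_{MIX}$,
where:
- $\hat{V}_{ORD}$ is the naive sampling variance estimator using the imputed values as though they were reported values;
- $\hat{V}_{DIF}$ is a correction to $\hat{V}_{ORD}$ in order to reduce the bias of $\hat{V}_{SAM} = \hat{V}_{ORD} + \hat{V}_{DIF}$, as proposed by Beaumont and Bocci (2009), since the variance component $\hat{V}_{ORD}$ relies on the use of imputed values, which are usually more homogeneous than the reported values;
- $\hat{V}_{NR}$ is the estimator of the non-response component of variance;
- $\hat{V}_{MIX}$ is the estimator of the mixed variance component.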
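Under complete response, the compensatory weights are $W_{0d} = 0$ and $W_{dk} = 0$, and the non-response and mixed variance components, $V_{NR}$ and $V_{MIX}$, are also equal to 0, leaving the total variance as $V_{TOT} = V_{SAM}$. Under a census, the sampling and mixed variance components, $V_{SAM}$ and $V_{MIX}$, are equal to 0, leaving the total variance as $V_{TOT} = V_{NR}$.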
2.3 Non-response bias
The reduction of non-response bias is always a desirable
goal. It can be achieved through an adaptive design and/or through an appropriate
method of dealing with missing values. Our framework assumes that the non-response
bias is removed through imputation methods that use relevant auxiliary
information. In practice, it is likely that imputation will only reduce non-response
bias, not eliminate it. We may then wonder whether adaptive designs could be
used to further reduce the bias. In the context of non-response weighting,
Beaumont, Bocci and Haziza (2014) argued that auxiliary information used in an
adaptive design to reduce non-response bias can also be used in non-response
weighting to reduce the same amount of bias. Their argument can also be made in
the context of imputation. This justifies our focus on variance reduction
rather than bias reduction. We acknowledge that some bias may remain after
imputation but ignore this bias because it may not be possible to reduce it
further through an adaptive design without the availability of additional
auxiliary information. However, it is possible to reduce the variance through
an adaptive design.
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, 2018
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa