3. Estimating response probabilities
Alina Matei and M. Giovanna Ranalli
3.1 Using logistic regression to estimate $p_k$
Different methods to estimate the response probability $p_k$ of unit $k$ are proposed in the literature. All of these methods rely on auxiliary information known at the population or sample level. In the case of non-ignorable nonresponse, the variable of interest $y$ is itself the cause (or one of the causes) of the response behavior, and a covariance between the former and the response probability is produced through a direct causal relation (see Groves 2006). In such a case, the response probability $p_k$ could be modeled for each unit $k$ using logistic regression as follows

$$p_k = \frac{1}{1 + \exp\{-(a_0 + a_1 y_k)\}} \tag{3.1}$$

or as follows

$$p_k = \frac{1}{1 + \exp\{-(a_0 + a_1 y_k + \mathbf{z}_k^{\top}\boldsymbol{\alpha})\}} \tag{3.2}$$

where $\mathbf{z}_k$ is a vector with the values taken by $q \geq 1$ covariates on unit $k$, and $a_0$, $a_1$ and $\boldsymbol{\alpha}$ are parameters.
Nonresponse bias in the unadjusted respondent total of the variable of interest $y$ depends on the covariance between the values $y_k$ and the response probabilities $p_k$ (see Bethlehem 1988). An example of a covariate that reduces the covariance between $y_k$ and $p_k$ is the interest in the survey topic, such as knowledge, attitudes, and behaviors related to the survey topic (see Groves, Couper, Presser, Singer, Tourangeau, Acosta and Nelson 2006). The set of covariates $\mathbf{z}_k$ could also be related to the variable of interest $y$ in order to reduce sampling variance (Little and Vartivarian 2005).
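The link between bias and covariance can be checked numerically. The sketch below is an illustration only: it assumes a hypothetical population with equal inclusion probabilities, works with the respondent mean rather than the total, and approximates the expected respondent mean by weighting each unit by its response probability. Under these assumptions the bias equals the population covariance between the variable of interest and the response probability, divided by the mean response probability. All variable names and numerical values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000

# Hypothetical population: y is the variable of interest,
# p is a response probability that increases with y (non-ignorable nonresponse).
y = rng.normal(50.0, 10.0, size=N)
p = 1.0 / (1.0 + np.exp(-(-2.5 + 0.05 * y)))

# Approximate expected respondent mean: each unit is weighted by its
# probability of ending up among the respondents.
resp_mean = np.sum(p * y) / np.sum(p)
bias = resp_mean - y.mean()

# Population covariance between y and p, divided by the mean response probability.
cov_py = np.mean((y - y.mean()) * (p - p.mean()))
approx = cov_py / p.mean()

print(f"bias of respondent mean: {bias:.4f}")
print(f"cov(y, p) / mean(p):     {approx:.4f}")
```

The two printed quantities agree, so any covariate that explains the dependence between the variable of interest and the response probability shrinks the covariance and, with it, the bias.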
Since $y_k$ is only observed on respondents, Models (3.1) and (3.2) cannot be estimated. Therefore, usually, the values of covariates $\mathbf{z}_k$ that are known for both respondents and nonrespondents and are related to the $y_k$ by a ‘hopefully strong regression’ (Cassel, Särndal and Wretman 1983) are used in the following model

$$p_k = \frac{1}{1 + \exp\{-(a_0 + \mathbf{z}_k^{\top}\boldsymbol{\alpha})\}} \tag{3.3}$$

Then, maximum likelihood can be used to fit Model (3.3) using the data $(r_k, \mathbf{z}_k)$ for $k \in s$, where $r_k$ indicates whether sampled unit $k$ responded. This yields the estimates $\hat{a}_0$ and $\hat{\boldsymbol{\alpha}}$ and the estimated response probabilities $\hat{p}_k$ to be used in (2.1). This procedure provides some protection against nonresponse bias if $\mathbf{z}_k$ is a powerful predictor of the response probability and/or of the variable of interest (Kim and Kim 2007).
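The maximum likelihood fit of a logistic response model on covariates known for respondents and nonrespondents can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain Newton-Raphson iteration, ignores the sampling weights, and uses simulated data. The function name `fit_logistic_ml`, the array names, and the parameter values are hypothetical.

```python
import numpy as np

def fit_logistic_ml(Z, r, n_iter=25, tol=1e-10):
    """Fit P(r = 1 | z) = 1 / (1 + exp(-(a0 + z' alpha))) by Newton-Raphson."""
    Z = np.asarray(Z, dtype=float)
    r = np.asarray(r, dtype=float)
    X = np.column_stack([np.ones(len(r)), Z])       # prepend intercept column
    beta = np.zeros(X.shape[1])                     # (a0, alpha) stacked
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (r - p)                       # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None]) # observed information matrix
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))         # estimated response probabilities
    return beta, p_hat

# Simulated sample (assumption): one covariate known for every sampled unit.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 1))
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * z[:, 0])))
r = rng.binomial(1, true_p)                         # 1 = respondent, 0 = nonrespondent

beta_hat, p_hat = fit_logistic_ml(z, r)
print("estimated (a0, alpha):", beta_hat)
```

The resulting `p_hat` values are defined for every sampled unit, but only those of the respondents enter a reweighted estimator.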
In what follows, we propose a reweighting adjustment system based on an auxiliary variable that measures the propensity of each unit to participate in the survey. To this end, further assumptions on the response model are introduced so that the response probabilities $p_k$ depend on a single latent auxiliary variable, which is connected to the propensity scores of Rosenbaum and Rubin (1983). The proposed approach can be used when no other auxiliary information is available.
3.2 Latent variables as auxiliary information
To obtain a measure of response propensities, we consider the case in which item nonresponse on the variables of interest is also present. Chambers and Skinner (2003, page 278) note that ‘from a theoretical perspective the difference between unit and item nonresponse is unnecessary. Unit nonresponse is just an extreme form of item nonresponse’. Accordingly, we assume that item response on the variables of interest is driven, among respondents, by the same attitude and factors that drive unit response. Latent variable models can be used to estimate such factors, which can then be used as covariates in a logistic response model.
As already mentioned, we assume that item nonresponse affects $m$ survey variables of particular interest. A second response indicator is introduced for each item. For each item $\ell = 1, \dots, m$ and each unit $k$, a binary variable $x_{k\ell}$ is defined that takes value 1 if unit $k$ answers item $\ell$ and 0 otherwise. Let $\mathbf{x}_k = (x_{k1}, \dots, x_{km})^{\top}$ denote the vector of response indicators for unit $k$ to the $m$ items and let $\mathbf{y}_k = (y_{k1}, \dots, y_{km})^{\top}$ be the study variable vector for unit $k$. Thus $y_{k\ell}$ is the response value of unit $k$ to item $\ell$ and $x_{k\ell}$ is its response indicator.
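The item response indicators can be derived directly from the collected answers. The sketch below assumes that missing answers are coded as NaN in a units-by-items array. The data are invented, and treating a unit with no answered item as a unit nonrespondent is an assumption made for the illustration.

```python
import numpy as np

# Hypothetical answers of 4 sampled units to 3 items; NaN marks a missing answer.
Y = np.array([
    [3.0,    1.0,    2.0],
    [np.nan, 4.0,    np.nan],
    [2.0,    np.nan, 5.0],
    [np.nan, np.nan, np.nan],   # no item answered at all
])

# Item response indicators: 1 if the unit answered the item, 0 otherwise.
X = (~np.isnan(Y)).astype(int)

# Assumed unit response indicator: 1 if at least one item was answered.
unit_resp = X.any(axis=1).astype(int)

print(X)
print(unit_resp)
```

Each row of `X` plays the role of the vector of response indicators for one unit, and a row of zeros corresponds to the extreme case in which item nonresponse becomes unit nonresponse.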
Suppose the item response indicators $x_{k\ell}$ are related to an assumed underlying latent continuous scale; they are the indicators of a latent variable denoted by $\theta_k$. De Menezes and Bartholomew (1996) call the variable $\theta_k$ the ‘tendency to respond’ to the survey. We call it here the ‘will to respond to the survey’ of unit $k$. A latent trait model with a single latent variable is used to compute $\theta_k$ for each unit $k$ (see Section 4.4 for details). Assume for the moment that $\theta_k$ is known on all sample units and, as with usual auxiliary information, can be used as a covariate. In the absence of other covariates, Model (3.3) is rewritten as

$$p_k = \frac{1}{1 + \exp\{-(a_0 + \alpha\,\theta_k)\}} \tag{3.4}$$
The latent covariate $\theta_k$ can be viewed as a variable explaining the behavior related to the survey topic, and thus as having good properties for reducing the covariance between the variable of interest $y_k$ and the response probability $p_k$ and, therefore, nonresponse bias. If other suitable auxiliary information is available, it can be inserted in the model as supplementary covariates. To estimate the parameters of Model (3.4), the value of $\theta_k$ has to be available for all units in the sample. The following sections provide details on how to obtain estimated values of $\theta_k$ for both respondents and nonrespondents.