The anchoring method: Estimation of interviewer effects in the absence of interpenetrated sample assignment
Section 3. The anchoring method
As noted in Section 2.3, existing methods adjust
for possible interviewer effects introduced at the recruitment and measurement
stages of data collection by including respondent- and area- or
interviewer-level covariates in multilevel models (Hox, 1994), but such
adjustment may be erroneous if part of the interviewer variance is simply
arising from non-interpenetrated sampling. As noted by Elliott and West
(2015), if subjects with similar values on a variable of interest are assigned
to interviewers in a non-random fashion (for example, if a telephone interviewer
working day shifts tends to interview older respondents, where age may be
correlated with the main variables of interest), these variables will be correlated with specific
interviewers.
interviewers. However, we are just re-ordering the random sample, not
introducing measurement error in the manner described in Section 1, e.g.,
West and Blom (2017). Thus the actual data are not being altered, and there are
no true interviewer effects: we term the resulting within-interviewer
correlation “spurious” from a variance inflation perspective. Thus estimating
interviewer effects while failing to account for differential sample assignment
can lead to conservative inferences, resulting in misleadingly large estimates
of interviewer variance, -values and confidence intervals that are too
wide, and incorrect operational decisions based on predicted effects for
individual interviewers.
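To see how non-interpenetrated assignment alone can inflate an estimated interviewer variance component, consider a minimal simulation (our own illustration, with made-up parameter values, not an example from this paper): outcomes depend on age, there is no true interviewer effect, and we compare a random (interpenetrated) assignment of respondents to interviewers against an age-sorted assignment, using a simple one-way ANOVA method-of-moments estimator of the between-interviewer variance:

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 50, 40                      # interviewers, respondents per interviewer

# A random sample: outcome depends on age, with NO true interviewer effect
age = rng.normal(50, 15, m * n)
y = 0.5 * age + rng.normal(0, 5, m * n)

def anova_interviewer_var(y, groups):
    """Method-of-moments (one-way ANOVA) estimate of the
    between-interviewer variance component."""
    means = np.array([y[groups == g].mean() for g in range(m)])
    msb = n * means.var(ddof=1)                                   # between mean square
    msw = np.mean([y[groups == g].var(ddof=1) for g in range(m)]) # within mean square
    return max(0.0, (msb - msw) / n)

# Interpenetrated (random) assignment: no spurious clustering
random_groups = rng.permutation(np.repeat(np.arange(m), n))
var_random = anova_interviewer_var(y, random_groups)

# Non-interpenetrated assignment: each interviewer gets an age-sorted block
sorted_groups = np.empty(m * n, dtype=int)
sorted_groups[np.argsort(age)] = np.repeat(np.arange(m), n)
var_sorted = anova_interviewer_var(y, sorted_groups)

print(var_random, var_sorted)   # var_sorted is far larger, purely from sorting
```

The same respondents and the same outcome values produce a near-zero variance component under random assignment and a large, entirely spurious one under sorted assignment.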
To address this important gap in the literature, we
describe an “anchoring” method that analysts can use to estimate the unique
components of variance due to interviewer effects on selection and measurement.
The method aims to leverage correlations between variables where interviewer measurement error is of concern and variables known ‒ or reasonably believed ‒ to be free of measurement error to remove the fraction of the within-interviewer correlation that is due to non-interpenetrated sample assignment. In the simplest case, we have two variables:
one treated as measurement error-free (the
“anchor”, $y_{ij1}$) and one treated as possibly having interviewer-induced
measurement error ($y_{ij2}$). If our objective is to estimate the mean of $y_{ij2}$, we fit a multilevel model to the observed data
for the two variables that includes a random interviewer effect only for the
variable subject to measurement error:

$y_{ijk} = \mu_k + b_i I(k = 2) + \epsilon_{ijk}. \qquad (3.1)$

In (3.1), $i$ indexes interviewers, $j$ indexes respondents within interviewers, $k$ indexes the variable (1 = anchor, 2 = variable
of interest), $b_i \sim N(0, \sigma_b^2)$ is the interviewer effect, and
$(\epsilon_{ij1}, \epsilon_{ij2})^{\top} \sim N(0, \Sigma)$.
Our focus of inference in this manuscript is $\mu_2$, although $\sigma_b^2$ or the predicted $b_i$ may also be of interest if the focus is on
interviewer variance or on determining which individual interviewers are
contributing to that variance.
To provide a heuristic explanation of why this proposed
“anchoring” approach works, assume that $\epsilon_{ij1}$ and $\epsilon_{ij2}$, net of $b_i$, are almost perfectly correlated. Since $y_{ij1}$ lacks measurement error, it can serve as a
proxy for the non-measurement-error component of $y_{ij2}$, absorbing the artificial error in $b_i$ induced by the ordering of the data. Lack of
interpenetration means that estimating a linear mixed model using $y_{ij2}$ only will yield an upwardly biased estimate of $\sigma_b^2$.
If the residual correlation is non-zero, information will be available to reduce the
bias in $\hat{\sigma}_b^2$, with large samples and high correlations
between $\epsilon_{ij1}$ and $\epsilon_{ij2}$ yielding increasingly accurate estimates of $\sigma_b^2$, and thus of the true impact of the
interviewer-induced measurement error on the variance of the estimated mean of $y_{ij2}$.
This approach easily extends to the setting where $p$ “anchoring” variables free from measurement
error are available:

$y_{ijk} = \mu_k + b_i I(k = p + 1) + \epsilon_{ijk}. \qquad (3.2)$

In this case, the first $p$ variables are assumed to be free of
interviewer measurement error, the $(p+1)$st variable is the variable of interest, and $(\epsilon_{ij1}, \ldots, \epsilon_{ij,p+1})^{\top} \sim N(0, \Sigma)$, where $\Sigma$ is an unstructured covariance matrix. Alternatively, instead of
using (3.2) directly, we can reduce (3.2) back to the bivariate setting in
(3.1) by replacing the anchor with the best linear predictor of $y_{ij,p+1}$ obtained from an ordinary least squares regression on the $p$ anchoring variables.
3.1 Estimation remarks
One can use standard linear mixed model software (e.g.,
SAS PROC MIXED) to fit the models in (3.1) or (3.2) and obtain a restricted
maximum likelihood (REML) point estimate of $\mu_2$ together with an associated variance estimate.
We have provided an annotated example of such code in the supplemental
materials. Weights used to account for unequal probabilities of selection,
non-response adjustment, and calibration to known population values can be
incorporated using pseudo-maximum likelihood estimation (PML; Pfeffermann,
Skinner, Holmes, Goldstein and Rasbash, 1998; Rabe-Hesketh and Skrondal, 2006)
when fitting the models in (3.1) or (3.2). We would generally recommend that
interviewers be assigned a weight of 1 when fitting weighted multilevel models
of these forms, to mimic the notion of simple random sampling of interviewers
from a hypothetical population of interviewers. The weights for respondents
should be rescaled to sum to the final respondent count for each interviewer
(Carle, 2009), and extensions of the PML method outlined by Veiga, Smith and
Brown (2014) and Heeringa, West and Berglund (2017, Chapter 11) can be
used to incorporate the rescaled weights in estimation of the residual
covariance structure in (3.1) or (3.2). In multistage samples where
interviewers cross geographic areas, cross-classified random effects models
(Rasbash and Goldstein, 1994) can also be utilized.
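The respondent-weight rescaling described above (weights summing to the realized respondent count within each interviewer, following Carle, 2009) is straightforward to implement; a minimal numpy sketch with hypothetical weights:

```python
import numpy as np

def rescale_weights(weights, interviewer_ids):
    """Rescale respondent weights so that, within each interviewer,
    they sum to that interviewer's respondent count (Carle, 2009)."""
    weights = np.asarray(weights, dtype=float)
    interviewer_ids = np.asarray(interviewer_ids)
    out = np.empty_like(weights)
    for g in np.unique(interviewer_ids):
        mask = interviewer_ids == g
        # scale factor = respondent count / current weight total
        out[mask] = weights[mask] * mask.sum() / weights[mask].sum()
    return out

# Hypothetical weights for two interviewers with 3 and 2 respondents
w = rescale_weights([2.0, 4.0, 6.0, 1.0, 3.0], [1, 1, 1, 2, 2])
print(w)  # within-interviewer sums are now 3 and 2, respectively
```

Relative weight ratios within each interviewer are preserved; only the within-interviewer totals change.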
3.2 The Bayesian anchoring method
In the presence of prior information on the parameters
of interest in this model (e.g., in a repeated cross-sectional survey using
interviewer administration), the models in (3.1) or (3.2) can also be fitted
using a Bayesian approach to incorporate the prior information. In repeated
surveys that carefully monitor interviewer performance, good predictions of
individual interviewer effects based on the estimated variance component are
important. Given historical data from a survey with the same essential design
conditions, one can estimate the parameters of interest in (3.1) using this
historical data, and then define informative prior distributions for these
parameters. (Examples of these types of surveys would include high-quality
government sponsored surveys with repeated cross-sectional data collections,
such as the National Health Interview Survey or, for the example considered in
this paper, the Behavioral Risk Factor Surveillance System.) Specifically, we
consider a prior distribution on the interviewer effect standard deviation $\sigma_b$ that follows a half-$t$ distribution (Gelman, 2006) with $\nu$ degrees of
freedom and scale parameter $s$.
Following Gelman, we assume a small value for $\nu$, and we estimate $s$ based on prior estimates of interviewer
effects. We consider standard weak priors for the fixed effect means $\mu_k$ and for the residual variance parameters in $\Sigma$.
This approach offers advantages relative to likelihood
ratio testing approaches that rely on asymptotic theory, particularly for
smaller samples. By using prior information to constrain the resulting
posterior distribution for the interviewer variance component, the approach generally
prevents extremely large draws of the variance component while not constraining
the means or residual variances. It also constrains posterior draws of variance
components to be greater than zero, enabling inference based on small
components of variance, while frequentist model-fitting procedures generally
fix such estimates of variance components to be exactly zero (which equates to
a rather unreasonable assumption that each interviewer produces exactly the
same survey estimate; West and Elliott, 2014). In such cases, the effects of
interviewers (even if they are small) would be ignored completely; the Bayesian
approach would still enable small effects to be integrated into the inference.
The Bayesian approach also yields credible intervals for the interviewer
variance components based on posterior draws.
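A minimal illustration (our own sketch, not this paper's implementation) of why the Bayesian approach keeps small variance components in play: with the mean fixed at zero and the residual variance treated as known for simplicity, and with hypothetical hyperparameters (df = 3, scale = 1) for the half-$t$ prior, a random-walk Metropolis sampler for $\sigma_b$ produces strictly positive posterior draws even when the true interviewer effect is tiny and a frequentist estimate might be truncated to zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m, n = 30, 10
sigma_e, sigma_b = 3.0, 0.2        # tiny true interviewer effect

# Interviewer means: variance sigma_b^2 + sigma_e^2 / n under the model
ybar = rng.normal(0, np.sqrt(sigma_b**2 + sigma_e**2 / n), m)

def log_post(s):
    """Log posterior for sigma_b, with mu = 0 and sigma_e known.
    Half-t(df=3, scale=1) prior (Gelman, 2006); the factor of 2 in the
    half-t density is an additive constant and cancels in Metropolis."""
    if s <= 0:
        return -np.inf
    lik = stats.norm.logpdf(ybar, 0, np.sqrt(s**2 + sigma_e**2 / n)).sum()
    prior = stats.t.logpdf(s, df=3, scale=1.0)
    return lik + prior

# Random-walk Metropolis on sigma_b; negative proposals are rejected
draws, s = [], 0.5
for _ in range(4000):
    prop = s + rng.normal(0, 0.3)
    if np.log(rng.uniform()) < log_post(prop) - log_post(s):
        s = prop
    draws.append(s)
draws = np.array(draws[1000:])     # discard burn-in

print(draws.min() > 0, draws.mean())   # every retained draw is positive
```

Because the prior density is supported only on $\sigma_b > 0$, every posterior draw is strictly positive, so even a very small interviewer effect contributes to the resulting inference rather than being fixed at zero.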
3.3 Choosing anchoring variables
A key assumption underlying both the standard
regression-based approach and the “anchoring” method is that selected variables
are free from interviewer-induced error. As with the “missing at random”
assumption in the missing data literature, we do not expect that there will
often be cases where we can be certain of this; however, approximations may be
available based on simple demographic measures (e.g., age) or other factual
questions with simple response options (e.g., current employment) that leave little
room for the introduction of interviewer error. The identification of
error-free covariates in advance of data collection is an important substantive
and methodological component of this approach, and prior methodological
literature on the variables most prone to interviewer effects (West and Blom,
2017) can be consulted for this component of the approach.
As we note above, if we have multiple error-free
covariates measured on the respondents, we can preserve their predictive power
(and thus the correlation of the anchor’s residuals with the residuals of the
variable of interest) by computing a linear predictor of the variable of
interest from a linear model that includes fixed effects of all of the
error-free covariates. We consider such an approach in our simulation studies
and applications, and compare it with the “standard” approach of simply
adjusting for these covariates in a multilevel model in an effort to improve
the estimate of the interviewer variance component (Hox, 1994).
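Collapsing several error-free covariates into a single composite anchor via the best linear predictor amounts to an ordinary least squares fit; a hedged numpy sketch with simulated, made-up covariates (standing in for measures such as age or employment status):

```python
import numpy as np

rng = np.random.default_rng(11)
N, p = 2000, 3

# Hypothetical error-free covariates measured on the respondents
X = rng.normal(size=(N, p))
# Variable of interest, linearly related to the covariates plus noise
y = X @ np.array([1.5, -0.7, 0.4]) + rng.normal(0, 1, N)

# Best linear predictor of y from the p anchors (OLS with intercept)
Z = np.column_stack([np.ones(N), X])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
anchor = Z @ beta        # single composite anchor for use in model (3.1)

print(np.corrcoef(anchor, y)[0, 1])
```

The composite anchor carries the combined predictive power of all the error-free covariates, so the residual correlation that the anchoring method relies on is preserved in the reduced bivariate model.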
Finally, the anchoring approach employs mixed-effects models that should
yield correct estimates with a sufficient amount of data. However, these models
may be more difficult to fit, especially for smaller samples, and we therefore
also consider alternative Bayesian approaches when evaluating the anchoring
approach.