The anchoring method: Estimation of interviewer effects in the absence of interpenetrated sample assignment
Section 3. The anchoring method
As noted in Section 2.3, existing methods adjust
for possible interviewer effects introduced at the recruitment and measurement
stages of data collection by including respondent- and area- or
interviewer-level covariates in multilevel models (Hox, 1994), but such
adjustment may be erroneous if part of the interviewer variance is simply
arising from non-interpenetrated sampling. As noted by Elliott and West
(2015), if subjects with similar values on a variable of interest are assigned
to interviewers in a non-random fashion (for example, if a telephone interviewer
working day shifts tends to interview older respondents, where age may be
correlated with the main variables of interest), these variables will be correlated with specific
interviewers.
interviewers. However, we are just re-ordering the random sample, not
introducing measurement error in the manner described in Section 1, e.g.,
West and Blom (2017). Thus the actual data are not being altered, and there are
no true interviewer effects: we term the resulting within-interviewer
correlation “spurious” from a variance inflation perspective. Thus estimating
interviewer effects while failing to account for differential sample assignment
can lead to conservative inferences, resulting in misleadingly large estimates
of interviewer variance, -values and confidence intervals that are too
wide, and incorrect operational decisions based on predicted effects for
individual interviewers.
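To see how non-interpenetrated assignment alone can inflate an estimated interviewer variance component, consider a minimal simulation (our own illustration, with made-up parameter values, not an example from this paper): outcomes depend on age, there is no true interviewer effect, and we compare a random (interpenetrated) assignment of respondents to interviewers against an age-sorted assignment, using a simple one-way ANOVA method-of-moments estimator of the between-interviewer variance:

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 50, 40                      # interviewers, respondents per interviewer

# A random sample: outcome depends on age, with NO true interviewer effect
age = rng.normal(50, 15, m * n)
y = 0.5 * age + rng.normal(0, 5, m * n)

def anova_interviewer_var(y, groups):
    """Method-of-moments (one-way ANOVA) estimate of the
    between-interviewer variance component."""
    means = np.array([y[groups == g].mean() for g in range(m)])
    msb = n * means.var(ddof=1)                                   # between mean square
    msw = np.mean([y[groups == g].var(ddof=1) for g in range(m)]) # within mean square
    return max(0.0, (msb - msw) / n)

# Interpenetrated (random) assignment: no spurious clustering
random_groups = rng.permutation(np.repeat(np.arange(m), n))
var_random = anova_interviewer_var(y, random_groups)

# Non-interpenetrated assignment: each interviewer gets an age-sorted block
sorted_groups = np.empty(m * n, dtype=int)
sorted_groups[np.argsort(age)] = np.repeat(np.arange(m), n)
var_sorted = anova_interviewer_var(y, sorted_groups)

print(var_random, var_sorted)   # var_sorted is far larger, purely from sorting
```

The same respondents and the same outcome values produce a near-zero variance component under random assignment and a large, entirely spurious one under sorted assignment.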
To address this important gap in the literature, we
describe an “anchoring” method that analysts can use to estimate the unique
components of variance due to interviewer effects on selection and measurement.
The method aims to leverage correlations between variables where interviewer measurement error is of concern and variables known ‒ or reasonably believed ‒ to be free of measurement error to remove the fraction of the within-interviewer correlation that is due to non-interpenetrated sample assignment. In the simplest case, we have two variables:
one treated as measurement error-free (the
“anchor”, $y_{ij1}$) and one treated as possibly having interviewer-induced
measurement error ($y_{ij2}$). If our objective is to estimate the mean of $y_{ij2}$, we fit a multilevel model to the observed data
for the two variables that includes a random interviewer effect only for the
variable subject to measurement error:

$y_{ijk} = \mu_k + b_i I(k = 2) + \epsilon_{ijk}. \qquad (3.1)$

In (3.1), $i$ indexes interviewers, $j$ indexes respondents within interviewers, $k$ indexes the variable (1 = anchor, 2 = variable
of interest), $b_i \sim N(0, \sigma_b^2)$ is the interviewer effect, and
$(\epsilon_{ij1}, \epsilon_{ij2})^{\top} \sim N(0, \Sigma)$.
Our focus of inference in this manuscript is $\mu_2$, although $\sigma_b^2$ or the predicted $b_i$ may also be of interest if the focus is on
interviewer variance or on determining which individual interviewers are
contributing to that variance.
To provide a heuristic explanation of why this proposed
“anchoring” approach works, assume that $\epsilon_{ij1}$ and $\epsilon_{ij2}$, net of $b_i$, are almost perfectly correlated. Since $y_{ij1}$ lacks measurement error, it can serve as a
proxy for the non-measurement-error component of $y_{ij2}$, absorbing the artificial error in $b_i$ induced by the ordering of the data. Lack of
interpenetration means that estimating a linear mixed model using $y_{ij2}$ only will yield an upwardly biased estimate of $\sigma_b^2$.
If the residual correlation is non-zero, information will be available to reduce the
bias in $\hat{\sigma}_b^2$, with large samples and high correlations
between $\epsilon_{ij1}$ and $\epsilon_{ij2}$ yielding increasingly accurate estimates of $\sigma_b^2$, and thus of the true impact of the
interviewer-induced measurement error on the variance of the estimated mean of $y_{ij2}$.
This approach easily extends to the setting where $p$ “anchoring” variables free from measurement
error are available:

$y_{ijk} = \mu_k + b_i I(k = p + 1) + \epsilon_{ijk}. \qquad (3.2)$

In this case, the first $p$ variables are assumed to be free of
interviewer measurement error, the $(p+1)$st variable is the variable of interest, and $(\epsilon_{ij1}, \ldots, \epsilon_{ij,p+1})^{\top} \sim N(0, \Sigma)$, where $\Sigma$ is an unstructured covariance matrix. Alternatively, instead of
using (3.2) directly, we can reduce (3.2) back to the bivariate setting in
(3.1) by replacing the anchor with the best linear predictor of $y_{ij,p+1}$ obtained from an ordinary least squares regression on the $p$ anchoring variables.
3.1 Estimation remarks
One can use standard linear mixed model software (e.g.,
SAS PROC MIXED) to fit the models in (3.1) or (3.2) and obtain a restricted
maximum likelihood (REML) point estimate of $\mu_2$ together with an associated variance estimate.
We have provided an annotated example of such code in the supplemental
materials. Weights used to account for unequal probabilities of selection,
non-response adjustment, and calibration to known population values can be
incorporated using pseudo-maximum likelihood estimation (PML; Pfeffermann,
Skinner, Holmes, Goldstein and Rasbash, 1998; Rabe-Hesketh and Skrondal, 2006)
when fitting the models in (3.1) or (3.2). We would generally recommend that
interviewers be assigned a weight of 1 when fitting weighted multilevel models
of these forms, to mimic the notion of simple random sampling of interviewers
from a hypothetical population of interviewers. The weights for respondents
should be rescaled to sum to the final respondent count for each interviewer
(Carle, 2009), and extensions of the PML method outlined by Veiga, Smith and
Brown (2014) and Heeringa, West and Berglund (2017, Chapter 11) can be
used to incorporate the rescaled weights in estimation of the residual
covariance structure in (3.1) or (3.2). In multistage samples where
interviewers cross geographic areas, cross-classified random effects models
(Rasbash and Goldstein, 1994) can also be utilized.
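The respondent-weight rescaling described above (weights summing to the realized respondent count within each interviewer, following Carle, 2009) is straightforward to implement; a minimal numpy sketch with hypothetical weights:

```python
import numpy as np

def rescale_weights(weights, interviewer_ids):
    """Rescale respondent weights so that, within each interviewer,
    they sum to that interviewer's respondent count (Carle, 2009)."""
    weights = np.asarray(weights, dtype=float)
    interviewer_ids = np.asarray(interviewer_ids)
    out = np.empty_like(weights)
    for g in np.unique(interviewer_ids):
        mask = interviewer_ids == g
        # scale factor = respondent count / current weight total
        out[mask] = weights[mask] * mask.sum() / weights[mask].sum()
    return out

# Hypothetical weights for two interviewers with 3 and 2 respondents
w = rescale_weights([2.0, 4.0, 6.0, 1.0, 3.0], [1, 1, 1, 2, 2])
print(w)  # within-interviewer sums are now 3 and 2, respectively
```

Relative weight ratios within each interviewer are preserved; only the within-interviewer totals change.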
3.2 The Bayesian anchoring method
In the presence of prior information on the parameters
of interest in this model (e.g., in a repeated cross-sectional survey using
interviewer administration), the models in (3.1) or (3.2) can also be fitted
using a Bayesian approach to incorporate the prior information. In repeated
surveys that carefully monitor interviewer performance, good predictions of
individual interviewer effects based on the estimated variance component are
important. Given historical data from a survey with the same essential design
conditions, one can estimate the parameters of interest in (3.1) using this
historical data, and then define informative prior distributions for these
parameters. (Examples of these types of surveys would include high-quality
government sponsored surveys with repeated cross-sectional data collections,
such as the National Health Interview Survey or, for the example considered in
this paper, the Behavioral Risk Factor Surveillance System.) Specifically, we
consider a prior distribution on the interviewer effect standard deviation $\sigma_b$ that follows a half-$t$ distribution (Gelman, 2006) with $\nu$ degrees of
freedom and scale parameter $s$.
Following Gelman, we assume a small value for $\nu$, and we estimate $s$ based on prior estimates of interviewer
effects. We consider standard weak priors for the fixed effect means $\mu_k$ and for the residual variance parameters in $\Sigma$.
This approach offers advantages relative to likelihood
ratio testing approaches that rely on asymptotic theory, particularly for
smaller samples. By using prior information to constrain the resulting
posterior distribution for the interviewer variance component, the approach generally
prevents extremely large draws of the variance component while not constraining
the means or residual variances. It also constrains posterior draws of variance
components to be greater than zero, enabling inference based on small
components of variance, while frequentist model-fitting procedures generally
fix such estimates of variance components to be exactly zero (which equates to
a rather unreasonable assumption that each interviewer produces exactly the
same survey estimate; West and Elliott, 2014). In such cases, the effects of
interviewers (even if they are small) would be ignored completely; the Bayesian
approach would still enable small effects to be integrated into the inference.
The Bayesian approach also yields credible intervals for the interviewer
variance components based on posterior draws.
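A minimal illustration (our own sketch, not this paper's implementation) of why the Bayesian approach keeps small variance components in play: with the mean fixed at zero and the residual variance treated as known for simplicity, and with hypothetical hyperparameters (df = 3, scale = 1) for the half-$t$ prior, a random-walk Metropolis sampler for $\sigma_b$ produces strictly positive posterior draws even when the true interviewer effect is tiny and a frequentist estimate might be truncated to zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m, n = 30, 10
sigma_e, sigma_b = 3.0, 0.2        # tiny true interviewer effect

# Interviewer means: variance sigma_b^2 + sigma_e^2 / n under the model
ybar = rng.normal(0, np.sqrt(sigma_b**2 + sigma_e**2 / n), m)

def log_post(s):
    """Log posterior for sigma_b, with mu = 0 and sigma_e known.
    Half-t(df=3, scale=1) prior (Gelman, 2006); the factor of 2 in the
    half-t density is an additive constant and cancels in Metropolis."""
    if s <= 0:
        return -np.inf
    lik = stats.norm.logpdf(ybar, 0, np.sqrt(s**2 + sigma_e**2 / n)).sum()
    prior = stats.t.logpdf(s, df=3, scale=1.0)
    return lik + prior

# Random-walk Metropolis on sigma_b; negative proposals are rejected
draws, s = [], 0.5
for _ in range(4000):
    prop = s + rng.normal(0, 0.3)
    if np.log(rng.uniform()) < log_post(prop) - log_post(s):
        s = prop
    draws.append(s)
draws = np.array(draws[1000:])     # discard burn-in

print(draws.min() > 0, draws.mean())   # every retained draw is positive
```

Because the prior density is supported only on $\sigma_b > 0$, every posterior draw is strictly positive, so even a very small interviewer effect contributes to the resulting inference rather than being fixed at zero.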
3.3 Choosing anchoring variables
A key assumption underlying both the standard
regression-based approach and the “anchoring” method is that selected variables
are free from interviewer-induced error. As with the “missing at random”
assumption in the missing data literature, we do not expect that there will
often be cases where we can be certain of this; however, approximations may be
available based on simple demographic measures (e.g., age) or other factual
questions with simple response options (e.g., current employment) that leave little
room for the introduction of interviewer error. The identification of
error-free covariates in advance of data collection is an important substantive
and methodological component of this approach, and prior methodological
literature on the variables most prone to interviewer effects (West and Blom,
2017) can be consulted for this component of the approach.
As we note above, if we have multiple error-free
covariates measured on the respondents, we can preserve their predictive power
(and thus the correlation of the anchor’s residuals with the residuals of the
variable of interest) by computing a linear predictor of the variable of
interest from a linear model that includes fixed effects of all of the
error-free covariates. We consider such an approach in our simulation studies
and applications, and compare it with the “standard” approach of simply
adjusting for these covariates in a multilevel model in an effort to improve
the estimate of the interviewer variance component (Hox, 1994).
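Collapsing several error-free covariates into a single composite anchor via the best linear predictor amounts to an ordinary least squares fit; a hedged numpy sketch with simulated, made-up covariates (standing in for measures such as age or employment status):

```python
import numpy as np

rng = np.random.default_rng(11)
N, p = 2000, 3

# Hypothetical error-free covariates measured on the respondents
X = rng.normal(size=(N, p))
# Variable of interest, linearly related to the covariates plus noise
y = X @ np.array([1.5, -0.7, 0.4]) + rng.normal(0, 1, N)

# Best linear predictor of y from the p anchors (OLS with intercept)
Z = np.column_stack([np.ones(N), X])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
anchor = Z @ beta        # single composite anchor for use in model (3.1)

print(np.corrcoef(anchor, y)[0, 1])
```

The composite anchor carries the combined predictive power of all the error-free covariates, so the residual correlation that the anchoring method relies on is preserved in the reduced bivariate model.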
Finally, the anchoring approach employs mixed-effects models that should
yield correct estimates with a sufficient amount of data. However, these models
may be more difficult to fit, especially for smaller samples, and we therefore
also consider alternative Bayesian approaches when evaluating the anchoring
approach.