Considering interviewer and design effects when planning sample sizes
Section 4. Conclusions
Using a design
effect to select a sample size is a commonly used method to account for the
loss of efficiency that a complex sampling design might entail. However, the
design effect can be inflated by an interviewer effect in face-to-face surveys.
This inflation can lead to erroneous conclusions about the effect that complex
sampling has on the efficiency of a sampling strategy and, consequently, to a
misallocation of resources: a planned sample size based on an overestimated
design effect will be too high.
Therefore, we propose to consider both the design and the interviewer effect
simultaneously when planning a sample size. The survey effect, which we develop
in Section 2, accounts for both interviewer and PSU variance to assess the
efficiency of a survey design. Based on the survey effect, we introduce a
corrected design effect, which uses as its reference design a simple random
sample with an interviewer effect. As a result, the corrected design effect is
no longer conflated with the interviewer effect, so the decision on the sample
size can be based more directly on the effect the sampling design has on the
precision of survey estimates.
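As an illustration of this factorization, the sketch below uses Kish-type inflation factors; the group sizes, intraclass correlations, and the multiplicative form are illustrative stand-ins, not the exact quantities derived in Section 2.

```python
def kish_factor(avg_group_size, rho):
    """Kish-type inflation factor 1 + (b - 1) * rho for groups of
    average size b with intraclass correlation rho."""
    return 1.0 + (avg_group_size - 1.0) * rho

def survey_effect(b_psu, rho_psu, m_int, rho_int):
    """Sketch of a combined 'survey effect': the PSU clustering factor
    times an analogous interviewer factor (illustrative only)."""
    return kish_factor(b_psu, rho_psu) * kish_factor(m_int, rho_int)

def corrected_deff(b_psu, rho_psu, m_int, rho_int):
    """Corrected design effect: survey effect relative to an SRS
    reference that retains the interviewer effect, so the interviewer
    factor cancels and only the sampling-design part remains."""
    return survey_effect(b_psu, rho_psu, m_int, rho_int) / kish_factor(m_int, rho_int)

# 10 respondents per PSU (rho_psu = 0.02), workloads of 20 (rho_int = 0.05):
se = survey_effect(10, 0.02, 20, 0.05)   # 1.18 * 1.95 = 2.301
cd = corrected_deff(10, 0.02, 20, 0.05)  # 1.18: the clustering part alone
n_eff = 1000 / se                        # effective size of a sample of 1,000
```

Under this sketch, basing the sample size on the uncorrected survey effect (2.301) rather than the corrected design effect (1.18) would attribute most of the variance inflation to the sampling design when it is in fact driven by the interviewers.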
For ESS6, our
empirical findings in Section 3.2 show that high design effects are
related to high interviewer effects. The average corrected design effects that
we observe suggest that, for many countries in the ESS6, the sampling design
influences the variance of an estimator to a lesser degree than the
interviewers do.
The ability to estimate the corrected design effect, e.g., from historical data
as a guide for the survey planner, depends mainly on the PSU-interviewer
structure and the allocation of interviewer workloads and cluster sizes. We
find that a partially interpenetrated survey design, i.e., interpenetration at
a regional level, can be sufficient to disentangle PSU and interviewer
variance. In our simulation study, an average number of 1.5 PSUs per
interviewer, or interviewers per PSU, was enough to estimate the variance
components of the measurement model.
For actual survey data that are categorical,
this level of interpenetration might not be high enough, but a high number of
PSUs and interviewers and a large sample size might offset a low
interpenetration. For practical applications, we recommend testing via
simulation if the assumed measurement model can be estimated with the given
PSU-interviewer structure, as we did in Section 3.1.
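A lightweight pre-check of the PSU-interviewer structure can precede such a simulation. The sketch below (the function name is our own) computes the average numbers of PSUs per interviewer and interviewers per PSU from respondent-level assignments, and flags a fully nested design, in which PSU and interviewer variance cannot be disentangled; the full recommendation remains to simulate from the assumed measurement model and refit it, e.g., with lme4 in R.

```python
from collections import defaultdict

def interpenetration_summary(assignments):
    """assignments: iterable of (psu_id, interviewer_id) pairs, one per
    respondent.  Returns (avg. PSUs per interviewer, avg. interviewers
    per PSU, fully_nested), where fully_nested means every interviewer
    worked in exactly one PSU, so variance components are confounded."""
    psus_of_int = defaultdict(set)
    ints_of_psu = defaultdict(set)
    for psu, interviewer in assignments:
        psus_of_int[interviewer].add(psu)
        ints_of_psu[psu].add(interviewer)
    avg_psus = sum(map(len, psus_of_int.values())) / len(psus_of_int)
    avg_ints = sum(map(len, ints_of_psu.values())) / len(ints_of_psu)
    fully_nested = all(len(s) == 1 for s in psus_of_int.values())
    return avg_psus, avg_ints, fully_nested

# Three PSUs, three interviewers, partial interpenetration:
design = [(1, "a"), (1, "b"), (2, "b"), (2, "c"), (3, "c")]
avg_psus, avg_ints, nested = interpenetration_summary(design)  # 5/3, 5/3, False
```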
When using the survey effect and the corrected design effect for the
planning of a sample size, it can be helpful to work with the upper and lower
bounds of these statistics. In Section 2, we derive such bounds, but under
somewhat unrealistic assumptions regarding the distribution of survey weights,
interviewer workloads and PSU sizes. However, if realistic assumptions about
the concentration of survey weights, interviewer workloads and PSU sizes can be
made, then we propose to use a linear optimization, as shown in the Appendix,
to derive bounds that are of much higher practical relevance and can serve as
valuable guidance for survey planners. Generally, we recommend distributions of
interviewer workloads and PSU cluster sizes with low concentration in order to
increase the precision of survey estimates. Thus, interviewer workloads and PSU
cluster sizes should be as equal as possible for any given number of
interviewers and PSUs.
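The effect of such concentration can be illustrated with the effective cluster size b* = Σb_i²/Σb_i, known from the design-effect literature the paper draws on (e.g., Gabler, Häder and Lahiri, 1999; Lynn and Gabler, 2004); the intraclass correlation used below is illustrative.

```python
def effective_cluster_size(sizes):
    """b* = sum(b_i^2) / sum(b_i); equals the common size for equal
    clusters and grows as the sizes become more concentrated."""
    return sum(b * b for b in sizes) / sum(sizes)

def clustering_factor(sizes, rho):
    """Kish-type factor 1 + (b* - 1) * rho for unequal cluster sizes."""
    return 1.0 + (effective_cluster_size(sizes) - 1.0) * rho

# Same total workload (40 interviews, 4 interviewers), different spread:
equal  = clustering_factor([10, 10, 10, 10], 0.05)  # b* = 10   -> 1.45
skewed = clustering_factor([25, 5, 5, 5], 0.05)     # b* = 17.5 -> 1.825
```

With the same total sample size and number of clusters, the skewed allocation inflates the variance factor from 1.45 to 1.825, which is why equal workloads and cluster sizes are preferable.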
The measurement models we introduce in Section 2 are deliberately
simple, which makes them applicable to most survey designs. The only
information, besides the survey data, used to compute the estimates in Table 3.3
was the set of PSU and interviewer indicators. However, there are certain aspects of
survey measurements that could be incorporated into a practical measurement
model, such as stratification, which, in general, increases the efficiency of
an estimation strategy (Särndal et al., 1992, Section 3.7). This was
neglected in our analysis, despite the fact that many ESS6 countries used a
stratified design for their PSU sample. Gabler, Häder and Lynn (2006) develop a
design effect for estimation strategies that combine different sampling designs
for sampling domains. This approach could possibly be adapted to add a
stratification effect to the PSU variance. Furthermore, it might be plausible
to assume that interviewers differ with regard to the degree of homogeneity
that they add to their measurements. This interviewer heterogeneity could be
incorporated into a measurement model by allowing groups of interviewers to
have different interviewer variance components
(West and Elliott, 2014). However, a procedure
to classify interviewers would be needed, preferably one that relies mainly
on the survey data and not so much on information available about the
interviewers, which might differ from survey to survey.
A future application for the presented framework of the survey effect
would be to find an optimal budget allocation with respect to the number of
PSUs and interviewers, for a given effective sample size. Such an optimization
requires a cost model for the deployment of interviewers to a possible set of
PSUs. Fieldwork institutes could possibly provide the information necessary to
specify such a cost model for a particular country. Such a method could help
survey planners to conduct face-to-face surveys more effectively, which is of
increasing importance as surveys based on probability samples are under
pressure from the comparably cheap alternative of recruiting respondents from
online-access panels.
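To make the proposed optimization concrete, the sketch below grid-searches over numbers of PSUs and interviewers under a purely hypothetical linear cost model; all unit costs, intraclass correlations, and the Kish-type survey-effect approximation are our own illustrative assumptions, not estimates from any fieldwork institute.

```python
def min_sample_for_target(a, k, target, rho_psu, rho_int, n_max=100_000):
    """Smallest n (in steps of 10) with n / deff(n) >= target, where
    deff(n) = (1 + (n/a - 1) rho_psu) * (1 + (n/k - 1) rho_int);
    returns None if the target effective sample size is unreachable
    with a PSUs and k interviewers."""
    for n in range(int(target), n_max, 10):
        deff = (1 + (n / a - 1) * rho_psu) * (1 + (n / k - 1) * rho_int)
        if n / deff >= target:
            return n
    return None

def cheapest_design(target, rho_psu, rho_int,
                    c_psu=200.0, c_int=500.0, c_resp=30.0):
    """Grid search minimizing cost = c_psu * a + c_int * k + c_resp * n
    (hypothetical unit costs) subject to the effective-sample-size target."""
    best = None
    for a in range(20, 201, 20):        # number of PSUs
        for k in range(20, 201, 20):    # number of interviewers
            n = min_sample_for_target(a, k, target, rho_psu, rho_int)
            if n is None:
                continue  # target unreachable with this design
            cost = c_psu * a + c_int * k + c_resp * n
            if best is None or cost < best[0]:
                best = (cost, a, k, n)
    return best

cost, a, k, n = cheapest_design(500, 0.02, 0.05)
```

A realistic version would replace the linear cost function with one calibrated to actual fieldwork costs and the approximation with the survey effect of Section 2.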
Further research could also focus on the development of survey effects
for estimators other than the weighted sample mean. For estimators that can be
described as functions of estimated totals, which includes the ordinary least
squares estimator for regression coefficients (Särndal et al., 1992,
Section 5.10), it should be possible to derive survey effects, under the
framework shown in Section 2, that allow for a similar factorization as
the survey effect presented in this work.
Appendix
For the Appendix we introduce a shorthand notation for multiple sums,
where, for example,
will be shorthand for
Result 1
Proof: We need to show
that
and
hold, if
and
for all
and
As shown in Gabler et al. (1999), if
for all
using the Cauchy-Schwarz inequality, we know
that
If we have
for all
then it follows that
The proof for inequality (A.2) is analogous to the one above, which
completes the proof of Result 1.
Upper bounds for
and
For given
and
and
with
for all
and
we can construct an upper bound for
and
We
know that
Now we need to find a sufficiently high value for
For this we define
and
Thus we have to solve the following problem:
where
where
means rounded down to the nearest integer (the floor function).
The problem formulated in equation (A.4) can be solved using a solver for
linear programs, e.g., with the solveLP function from the R package
linprog (Henningsen, 2012). Function
gives a maximum of
given the upper and lower bounds of the
weights
and
and the fact that the weights are scaled to
i.e.,
The sum of squares is maximized by giving as
many weights their highest possible value
under the condition that each weight must have
at least a value of
and that
The problem can then be solved using a simplex
algorithm. An upper bound for
can be determined in the same fashion.
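The vertex structure described above can also be written down directly. The sketch below (our own illustration in Python, rather than the solveLP route used in the Appendix) constructs the maximizing weight vector for given bounds and a fixed weight total: as many weights as possible at the upper bound, one weight absorbing the remainder, and the rest at the lower bound.

```python
import math

def max_sum_of_squares(n_weights, total, w_lo, w_hi):
    """Maximize sum(w_i^2) subject to sum(w_i) = total and
    w_lo <= w_i <= w_hi: set k = floor((total - n*w_lo) / (w_hi - w_lo))
    weights to w_hi, one weight to the remainder, the rest to w_lo."""
    assert n_weights * w_lo <= total <= n_weights * w_hi, "infeasible bounds"
    k = min(math.floor((total - n_weights * w_lo) / (w_hi - w_lo)), n_weights)
    n_low = max(n_weights - k - 1, 0)
    weights = [w_hi] * k + [w_lo] * n_low
    if k < n_weights:
        # the one remaining weight absorbs the rest of the total
        weights.insert(k, total - k * w_hi - n_low * w_lo)
    return weights, sum(w * w for w in weights)

# Five weights scaled to sum to 5, each bounded between 0.5 and 2:
ws, ss = max_sum_of_squares(5, 5.0, 0.5, 2.0)  # [2.0, 1.5, 0.5, 0.5, 0.5]
```

Since the objective is convex, its maximum over the box-and-sum polytope is attained at such a vertex, which is the same solution a simplex-based solver would return.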
By changing the problem to a minimization, a lower bound for
can be found. However, it is not guaranteed
that separate optimization of
and
will yield values of
that allow for a value of
that jointly maximizes (or minimizes)
and
However, if
and
are the vectors that optimize
and
respectively, it should be possible to find a value for
e.g., using iterative proportional fitting.
For
we have under the same assumptions as made
above
Result 2
Proof: The upper bound
in Result 2 can be shown by using the Cauchy-Schwarz inequality, which
gives us
With some algebra we can formulate the upper bound of
To prove the lower bound in Result 2 we solve the following
problem:
A solution to the problem formulated in (A.7) can be found by
considering that if we have
and
it follows that
Thus for
we can increase
if we reduce any
by one and add one to
Hence, if
for all
and
then
is at its maximum, with
Result 4
Proof: Given Result 2,
to prove the right-hand side of Result 4 we need to show that
To prove inequality (A.8) we only need to show that
The rest follows from the proofs of inequalities (A.1) and (A.2). Thus
it is sufficient to show that
if
for all
which also follows from the Cauchy-Schwarz inequality.
Inequality (A.8) then follows if
for
and
The left-hand side of Result 4 follows from the proof of Result 6
in Gabler and Lahiri (2009) and Result 2.
ESS6 variables used for empirical evaluation
Table A.1
ESS6 variables used for empirical evaluation

pplfair   trstprt   stfdem    imueclt   iorgact
pplhlp    trstep    stfedu    imwbcnt   agea
polintr   trstun    stfhlth   happy     gndr
trstprl   lrscale   gincdif   aesfdrk
trstlgl   stflife   freehms   health
trstplc   stfeco    euftf     rlgdgr
trstplt   stfgov    imbgeco   wkdcorga
The definition of these variables including question text can be found
in ESS (2013).
References
Bates,
D.M., Mächler, M., Bolker, B.M. and Walker, S.C. (2015). Fitting linear
mixed-effects models using lme4. Journal
of Statistical Software, 67, 1, 1-48. https://doi.org/10.18637/jss.v067.i01.
Bates, D.M.,
Mächler, M., Bolker, B.M. and Walker, S.C. (2019). lme4: Linear Mixed-Effects Models Using ‘Eigen’ and S4. https://CRAN.R-project.org/package=lme4.
Beullens,
K., and Loosveldt, G. (2016). Interviewer effects in the European social survey. Survey Research Methods, 10, 2, 103-118.
Biemer, P.P.
(2010). Total survey error: Design, implementation, and evaluation. Public Opinion Quarterly, Oxford
University Press, 74, 5, 817-848.
Chambers,
R.L., and Skinner, C.J. (2003). Analysis
of Survey Data. New York: John Wiley & Sons, Inc.
Chaudhuri,
A., and Stenger, H. (2005). Survey
Sampling: Theory and Methods. CRC Press.
Davis,
P., and Scott, A. (1995). The effect of interviewer variance on domain comparisons. Survey Methodology, 21, 2, 99-106. Paper
available at https://www150.statcan.gc.ca/n1/pub/12-001-x/1995002/article/14405-eng.pdf.
Ellis,
P.D. (2010). The Essential Guide to
Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of
Research Results. Cambridge University Press.
European
Social Survey (ESS) (2013). ESS6 Data Protocol. Edition 1.4.
London: ESS ERIC. http://www.europeansocialsurvey.org/data/download.html?r=6.
European
Social Survey (ESS) (2014a). European
Social Survey Round 6 Interviewer Questionnaire. Dataset edition: 2.1.
London: ESS ERIC.
European
Social Survey (ESS) (2014b). Weighting
European Social Survey Data. London: ESS ERIC. www.europeansocialsurvey.org/docs/methodology/ESS_weighting_data_1.pdf.
European Social Survey (ESS) (2014c). ESS6
- 2012 Documentation
Report. Edition: 2.3. London: ESS ERIC. http://www.europeansocialsurvey.org/docs/round6/survey/ESS6_data_documentation_report_e02_3.pdf.
European
Social Survey (ESS) (2016). European
Social Survey Round 6 Data. Dataset edition: 2.2. London: ESS ERIC.
European
Social Survey (ESS) (2018a). Countries by
Round (Year). London: ESS ERIC. http://www.europeansocialsurvey.org/data/country_index.html.
European
Social Survey (ESS) (2018b). Data and
Documentation by Round European Social Survey (ESS). London: ESS ERIC. http://www.europeansocialsurvey.org/data/download.html?r=6.
Fahrmeir, L., Heumann, C., Künstler, R., Pigeot, I. and
Tutz, G. (1997). Statistik: Der Weg Zur
Datenanalyse. 1st ed. Berlin: Springer-Verlag.
Fischer, M., West, B.T., Elliott, M.R. and
Kreuter, F. (2018). The impact of interviewer effects
on regression coefficients. Journal of
Survey Statistics and Methodology, May. https://doi.org/10.1093/jssam/smy007.
Gabler,
S., Häder, S. and Lahiri, P. (1999). A model based justification of Kish’s formula
for design effects for weighting and clustering. Survey Methodology, 25, 1, 105-106. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/1999001/article/4718-eng.pdf.
Gabler,
S., Häder, S. and Lynn, P. (2006). Design effects for multiple design samples. Survey Methodology, 32, 1, 115-120. Paper
available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2006001/article/9256-eng.pdf.
Gabler,
S., and Lahiri, P. (2009). On the definition and interpretation of interviewer variability
for a complex sampling design. Survey
Methodology, 35, 1, 85-99. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2009001/article/10886-eng.pdf.
Ganninger,
M. (2010). Design Effects: Model-Based
Versus Design-Based Approach. Edited by
GESIS - Leibniz-Institut für Sozialwissenschaften, 3.
Genz, A.,
Bretz, F., Miwa, T., Mi, X. and Hothorn, T. (2019). mvtnorm: Multivariate Normal and t Distributions. https://CRAN.R-project.org/package=mvtnorm.
Groves,
R.M. (2009). Survey Methodology. 2nd ed. Wiley Series in Survey Methodology. Hoboken, New Jersey: John Wiley &
Sons, Inc.
Groves, R.M.,
and Lyberg, L. (2010). Total survey error: Past, present, and future. Public Opinion Quarterly, Oxford
University Press, 74, 5, 849-879.
Henningsen, A. (2012). linprog: Linear Programming/Optimization.
https://CRAN.R-project.org/package=linprog.
Kish,
L. (1962). Studies of interviewer variance for attitudinal variables. Journal of the American Statistical
Association, 57, 297, 92-115. https://doi.org/10.1080/01621459.1962.10482153.
Kish,
L. (1965). Survey Sampling. New York: John
Wiley & Sons, Inc.
Lohr,
S.L. (2014). Design effects for a regression slope in a cluster sample. Journal of Survey Statistics and Methodology,
2, 2, 97-125. https://doi.org/10.1093/jssam/smu003.
Lynn,
P., and Gabler, S. (2004). Approximations
to B* in the Prediction of Design Effects Due to Clustering. ISER Working
Paper Series.
Lynn,
P., Häder, S., Gabler, S. and Laaksonen, S. (2007). Methods for achieving equivalence
of samples in cross-national surveys: The European social survey experience. Journal of Official Statistics, 23, 1, 107.
O’Muircheartaigh,
C., and Campanelli, P. (1998). The relative impact of interviewer effects and sample
design effects on survey precision. Journal
of the Royal Statistical Society: Series A (Statistics in Society), 161, 1,
63-77.
Raudenbush,
S.W. (1993). A crossed random effects model for unbalanced data with applications
in cross-sectional and longitudinal research. Journal of Educational Statistics, 18, 4, 321-349. https://doi.org/10.2307/1165158.
R Core Team (2019). R: A Language and Environment for Statistical Computing. Vienna,
Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Särndal,
C.-E., Swensson, B. and Wretman, J. (1992). Model
Assisted Survey Sampling. New York: Springer-Verlag.
Scheipl, F., Greven, S. and Kuechenhoff,
H. (2008). Size and power of tests for a zero random effect variance or polynomial
regression in additive and linear mixed models. Computational Statistics & Data Analysis, 52, 7, 3283-3299.
Schnell,
R., and Kreuter, F. (2005). Separating interviewer and sampling-point effects. Journal of Official Statistics, 21, 3, 389-410.
The
ESS Sampling Expert Panel (2016). Sampling
Guidelines: Principles and Implementation for the European Social Survey.
London: ESS ERIC Headquarters. http://www.europeansocialsurvey.org/docs/round8/methods/ESS8_sampling_guidelines.pdf.
Vassallo,
R., Durrant, G. and Smith, P. (2017). Separating interviewer and area effects by
using a cross-classified multilevel logistic model: Simulation findings and implications
for survey designs. Journal of the Royal Statistical
Society: Series A (Statistics in Society), 180, 2, 531-550.
Von
Sanden, N.D. (2004). Interviewer Effects
in Household Surveys: Estimation and Design. Ph.D. Thesis, Wollongong:
University of Wollongong. http://ro.uow.edu.au/theses/312.
West,
B.T., and Blom, A.G. (2017). Explaining interviewer effects: A research synthesis. Journal of Survey Statistics and
Methodology, 5, 2, 175-211. https://doi.org/10.1093/jssam/smw024.
West,
B.T., and Elliott, M.R. (2014). Frequentist and Bayesian approaches for comparing
interviewer variance components in two groups of survey interviewers. Survey Methodology, 40, 2, 163-188. Paper available at https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2014002/article/14092-eng.pdf.