Considering interviewer and design effects when planning sample sizes
Section 1. Introduction

Determining the sample size of a survey can be very demanding. The complexity of the task is often exacerbated by a lack of information and data on which to plan the survey. That is why survey planners seek to reduce the complexity of the problem using simplifications and statistical models. One such approach is to use the so-called design effect to select a sample size. The design effect is defined as the ratio of the variance of an estimator under the sampling design of the planned survey to the variance of the same estimator under a simple random sample design. As such, the design effect is a property of an estimation strategy, i.e., a sampling design and an estimator (Chaudhuri and Stenger, 2005, page 4), not of the survey. The weighted sample mean of a single variable is usually used as the reference estimator. However, for simplicity, whenever we speak of the design effect of a sampling design in the following, we do so with respect to the sampling variance of a weighted sample mean.

To plan the sample size, an effective sample size target can be set, meaning that the planned sample size divided by the planned design effect should exceed a certain value. The effective sample size of a sampling design is the simple random sample equivalent of its sample size in terms of efficiency: if a sampling design has an effective sample size of 1,000, then its sampling variance equals that of a simple random sample of size 1,000.
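In symbols, writing n for the planned sample size, deff for the planned design effect, and n_eff for the effective sample size (notation introduced here for convenience), this relationship is

n_eff = n / deff.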

Ideally, a survey planner designs a survey with a specific analysis or hypothesis test in mind and forms an opinion about tolerable sampling error levels or type II error probabilities. This opinion should be based on two things: first, some level of experience with the substantive research question, and second, assumptions about the target population parameters necessary for sampling error planning and power calculations. Assumptions about target population parameters can stem from previous rounds of a survey or be based on data collected during the field test for the survey. Power calculations and sampling error planning are much less complex and require less information about the target population if done under the assumption of a simple random sampling design. That is why most methods for sample size planning found in textbooks are suited for determining an effective sample size. The effect of complex sampling is then factored in by multiplying the planned effective sample size by a planned design effect. Determining a design effect can thus be separated from selecting an effective sample size. For example, suppose a simple random sample of size 1,000 ensures that the sampling error of an estimator does not exceed a given value with a probability of 95%, or that the power of a statistical test is 80%, i.e., that the probability of rejecting the null hypothesis when the alternative is true is 80% (Ellis, 2010, Chapter 3). Then multiplying 1,000 by the assumed design effect of the study gives the survey planner the required net sample size to achieve the set precision targets.

The decision on an effective sample size also has to reflect a certain trade-off between the cost of the survey and the precision of survey estimates. Regarding this trade-off, the survey planner should, for example, consider what the consequences are if a type II error is committed, i.e., if a null hypothesis is not rejected even though the alternative hypothesis is true.

For surveys that are primarily intended for secondary analysis, i.e., that provide data to the research community with no single application in mind, like the European Social Survey (ESS) or the European Values Study (EVS), the effective sample size cannot be planned for a single research question or hypothesis test. For that reason, the ESS uses an average effective sample size. This means that ESS sample designs are planned such that the average design effect for a set of items from the ESS core questionnaire should have a certain value. The planned average design effect is multiplied by the required average effective sample size to calculate the planned net sample size. The net sample size is the sample size after unit-nonresponse, i.e., the number of completed interviews. To plan the gross sample size, that is, the sample size before unit-nonresponse, the net sample size is divided by the product of the assumed response rate and eligibility rate. The eligibility rate is the fraction of sampled persons that belong to the target population, which can be lower than 100% because of sampling frame imperfections.
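To make this planning chain concrete, the following minimal sketch (in Python) computes the net and gross sample sizes from an effective sample size target. The function name and all input values are illustrative assumptions, not ESS figures.

```python
# A minimal sketch of the planning chain described above; all inputs
# are illustrative assumptions.

def planned_sample_sizes(n_effective, deff, response_rate, eligibility_rate):
    """Return the planned net and gross sample sizes."""
    n_net = n_effective * deff                            # completed interviews needed
    n_gross = n_net / (response_rate * eligibility_rate)  # persons to be sampled
    return n_net, n_gross

# Example: effective sample size 1,500, planned average design effect 1.2,
# assumed response rate 60%, assumed eligibility rate 95%.
net, gross = planned_sample_sizes(1500, 1.2, 0.60, 0.95)
print(round(net), round(gross))  # 1800 3158
```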

However, design effects can still be difficult to quantify, given the complexity of the sampling design. Hence, to reduce complexity, statistical models for survey data are used to approximate the design effect. Such models commonly try to incorporate the effect of cluster sampling, which can have a large effect on the sampling variance of estimates. Clusters can be spatial areas like settlements, organizational units like municipalities, or institutions such as hospitals and schools. They are either used as so-called Primary Sampling Units (PSUs), which are selected first and within which additional sampling then takes place, or they are surveyed in their entirety. For example, the German ESS round 6 (ESS6) sampling design has two sampling stages: the PSUs are municipalities, and the secondary sampling units are persons registered within the municipalities. Variables of interest can often not be considered identically distributed across all clusters in the population. In fact, it can be assumed that respondents within the same cluster are usually more similar to one another than those belonging to different clusters. Kish (1965), page 162, gives the following formula for a design effect due to clustering:

deff = 1 + (b - 1)ρ.     (1.1)

This design effect deff consists of two parameters: b, typically an average cluster size in terms of realized respondents, and ρ, the intra-cluster correlation coefficient, which is a measure of the homogeneity of the measurements of a variable within the same cluster. ρ can be defined using variance decomposition as the between-cluster variance divided by the sum of the within-cluster and between-cluster variances. The higher the variance between the clusters, the higher ρ will be.
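For illustration, a minimal sketch of equation (1.1) in Python; the parameter values are assumptions chosen for the example, not ESS estimates.

```python
# A minimal sketch of equation (1.1).

def kish_deff(b, rho):
    """Design effect due to clustering: deff = 1 + (b - 1) * rho."""
    return 1.0 + (b - 1.0) * rho

# With an average of 15 respondents per cluster and rho = 0.05, clustering
# inflates the sampling variance by 70%; a simple random sample would need
# only 1 / 1.7 of the sample size for the same precision.
print(kish_deff(15, 0.05))  # 1.7
```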

To use deff when selecting a sample size, assumptions have to be made about the unknown parameter ρ. The cluster size b does not depend on the measured variable and can be influenced by the survey planner. For ρ, data from previous surveys can be used to formulate the necessary assumption. Especially for repeated cross-sectional surveys, the accumulated data are of great help in planning the sampling design for the next implementation of the survey.
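To make the use of previous-survey data concrete, the sketch below shows one common way to estimate ρ: the one-way ANOVA (moment) estimator, here under the simplifying assumption of equal cluster sizes. This is not the multilevel estimation approach used later in the paper; the data and all values are simulated for illustration.

```python
import numpy as np

# A minimal sketch of the one-way ANOVA estimator of the intra-cluster
# correlation, assuming equal cluster sizes; all values are hypothetical.

def intra_cluster_correlation(data):
    """ANOVA estimator of rho for a (n_clusters, cluster_size) array."""
    m, k = data.shape
    cluster_means = data.mean(axis=1)
    msb = k * np.sum((cluster_means - data.mean()) ** 2) / (m - 1)      # between mean square
    msw = np.sum((data - cluster_means[:, None]) ** 2) / (m * (k - 1))  # within mean square
    sigma2_between = max((msb - msw) / k, 0.0)  # truncate negative estimates at zero
    return sigma2_between / (sigma2_between + msw)

# Simulated check: 200 clusters of size 10 with true rho = 0.05.
rng = np.random.default_rng(1)
u = rng.normal(0.0, np.sqrt(0.05), size=(200, 1))   # cluster effects
e = rng.normal(0.0, np.sqrt(0.95), size=(200, 10))  # residual variation
print(intra_cluster_correlation(u + e))             # should be near 0.05
```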

Lynn, Häder, Gabler and Laaksonen (2007) describe how predicted design effects are used by the ESS to plan sample sizes that achieve a certain average effective sample size under a given sampling design. For recent rounds of the ESS, the prediction of the design effect and its components was informed by estimates of these statistics based on data from the preceding ESS rounds (The ESS Sampling Expert Panel, 2016).

An important factor that can also introduce homogeneity into measurements in face-to-face surveys is the interviewer. Within the Total Survey Error (TSE) framework (Groves, 2009), different mechanisms have been described for how an interviewer can influence survey measurements. Similar to cluster sampling, interviewers have long been identified as a source of dependent measurements (Kish, 1962; Kish, 1965, page 522), with interviewers introducing homogeneity through measurement errors and selection effects, rather than through the homogeneity of clusters that is intrinsic to the population. West and Blom (2017) give an overview of the research on interviewer effects. They detail how interviewer tasks like generating and/or applying sampling frames, making contact, and gaining cooperation and consent can have a selection effect on the recruitment of respondents. West and Blom (2017) also outline evidence that interviewers conducting measurements, making observations, and finally recording the gathered information can introduce measurement and processing errors into the data used for analysis. For an overview of other sources of variance in surveys, we refer to the TSE framework as described, e.g., by Groves and Lyberg (2010) and Biemer (2010).

Analyses of interviewer effects using ESS data from different countries and years showed that this effect can be considerable (Beullens and Loosveldt, 2016). Such findings raise a question: To what extent is ρ in equation (1.1) driven by intra-cluster correlation rather than by intra-interviewer correlation? Schnell and Kreuter (2005) show that the interviewer effect can be higher than the cluster effect, even for variables for which a strong spatial correlation can be assumed. Consequently, the estimated design effect for face-to-face surveys is typically conflated with the interviewer effect, and hence the design effect is systematically over-estimated in face-to-face surveys. This might pose a problem for surveys that predict design effects using historical data to plan sample sizes, as there is a risk of misallocating funds. A survey planner could try to offset an increase in the predicted design effect by increasing the sample size to hold the effective sample size constant. If the driving factor inflating the predicted design effect is the interviewer effect, funds could be allocated more effectively by hiring additional interviewers and/or training them better to improve measurement accuracy and reduce selection effects.

The novelty of the presented approach is that the proposed method allows for estimating a corrected design effect that is not conflated with interviewer effects. With this corrected design effect, the survey planner is able to make evidence-based decisions on changes to the sampling design, such as the sample size and the number of PSUs, and/or on the deployment of interviewers.

The article is structured as follows: Section 2 introduces the framework for describing the effects of the sampling design and the interviewer. The framework follows the model-based justification of the design effect as outlined by Gabler, Häder and Lahiri (1999) and the introduction of an interviewer effect into this framework by Gabler and Lahiri (2009). The measurement models used to describe the observed data follow a multilevel structure. The influence of multi-stage or cluster sampling, and that of interviewers, on the observed data is modeled with the help of random effects that imply a certain variance-covariance structure, as illustrated in the sketch below. This approach allows for a factorization of the overall effect into separate sampling and interviewer effects. This separation is essential when addressing the effects separately in order to control for them.
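To illustrate the type of data-generating model meant here, the following sketch simulates observations with additive PSU and interviewer random effects. The variance values and the deliberately crude interviewer-to-PSU assignment are illustrative assumptions, not the specification of Section 2.

```python
import numpy as np

# A minimal sketch of data with additive PSU and interviewer random effects.
rng = np.random.default_rng(7)
n_psu, n_int, per_psu = 200, 50, 10
var_psu, var_int, var_res = 0.05, 0.10, 0.85   # assumed variance components

u = rng.normal(0.0, np.sqrt(var_psu), n_psu)   # PSU random effects
v = rng.normal(0.0, np.sqrt(var_int), n_int)   # interviewer random effects

psu = np.repeat(np.arange(n_psu), per_psu)     # 10 respondents per PSU
interviewer = psu * n_int // n_psu             # each interviewer covers 4 PSUs
y = u[psu] + v[interviewer] + rng.normal(0.0, np.sqrt(var_res), n_psu * per_psu)

# Under the model, two observations covary by var_psu + var_int when they
# share PSU and interviewer, and by var_int when they share only the
# interviewer; the total variance is var_psu + var_int + var_res = 1.
print(y.var())  # roughly 1
```

Note that in this crude assignment a shared PSU always implies a shared interviewer; it is precisely this overlap between PSU and interviewer structures that makes the two variance components hard to disentangle, which motivates the simulation study in Section 3.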

In Section 3, the sampling and interviewer effects described in Section 2 are estimated for ESS6 data with the help of multilevel models. First, we present the results from a simulation study conducted to assess the possibility of disentangling cluster and interviewer variances for the observed PSU-interviewer structure in the ESS6 data. Afterwards, we evaluate the applicability of the different measurement models for a selected set of ESS variables. The selected models are used to estimate the variances of different random effects in multilevel models, which are in turn used for estimating the intra-PSU and intra-interviewer correlation.

In Section 4, we present our conclusions and give recommendations for survey planners based on both our theoretical work in Section 2 and the empirical findings in Section 3. We then point to possible future research to adapt our relatively simplistic measurement models to better reflect complex sampling designs and the heterogeneity of interviewers.

