2 Analysis of embedded K x L factorial experiments
Jan A. van den Brakel
2.1 Experimental designs embedded in probability samples
In a $K \times L$ factorial design, the effects of two factors are tested simultaneously. The first factor, denoted $A$, contains $k = 1, \ldots, K$ levels. The second factor, denoted $B$, contains $l = 1, \ldots, L$ levels. The purpose of the experiment is to test the main effects of the two factors and the interactions between both factors on the main parameter estimates of the ongoing survey. To this end a probability sample $s$ of size $n$ is drawn from a finite target population $U$ of size $N$ according to the sample design of the regular survey. This sample design can be generally complex, and is described by its first order inclusion probabilities $\pi_i$ for unit $i$ and second order inclusion probabilities $\pi_{ii'}$ for units $i$ and $i'$.
Subsequently, this sample is randomly divided into $KL$ subsamples according to a randomized experiment. In the case of a completely randomized design (CRD), the sample $s$ of size $n$ is randomly divided into $KL$ subsamples $s_{kl}$, each with a size of $n_{kl}$ sampling units. The sampling units of each subsample are assigned to one of the $KL$ treatment combinations. Under a CRD, $n_{++} = \sum_{k=1}^{K}\sum_{l=1}^{L} n_{kl}$ denotes the total number of sampling units in the sample $s$. The probability that sampling unit $i$ is assigned to subsample $s_{kl}$, conditionally on the realization of $s$, equals $n_{kl}/n_{++}$. The unconditional probability that sampling unit $i$ is selected in subsample $s_{kl}$ equals $\pi_i^{*} = (n_{kl}/n_{++})\,\pi_i$.
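The random split under a CRD can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular survey: the function names `assign_crd` and `subsample_inclusion_prob`, the subsample sizes and the inclusion probability 0.01 are hypothetical. It assumes that, conditionally on the sample, a unit lands in subsample (k, l) with probability n_kl / n_++, so the unconditional probability is that fraction times the first order inclusion probability of the regular design.

```python
import random


def assign_crd(sample_ids, sizes, seed=0):
    """Randomly split a sample into subsamples of fixed sizes (CRD).

    sizes maps a treatment combination (k, l) to its subsample size n_kl.
    """
    units = list(sample_ids)
    if sum(sizes.values()) != len(units):
        raise ValueError("subsample sizes must add up to the sample size")
    rng = random.Random(seed)
    rng.shuffle(units)  # every ordering of the sample is equally likely
    assignment, start = {}, 0
    for cell, n_kl in sizes.items():
        for unit in units[start:start + n_kl]:
            assignment[unit] = cell
        start += n_kl
    return assignment


def subsample_inclusion_prob(pi_i, n_kl, n_total):
    """Unconditional probability pi*_i = (n_kl / n_++) * pi_i."""
    return n_kl / n_total * pi_i


# Hypothetical 2 x 2 design with a larger control group, n_++ = 12.
sizes = {(1, 1): 6, (1, 2): 2, (2, 1): 2, (2, 2): 2}
assignment = assign_crd(range(12), sizes)
pi_star = subsample_inclusion_prob(pi_i=0.01, n_kl=sizes[(1, 2)], n_total=12)
```

Fixing the seed makes the allocation reproducible, which can be convenient when the assignment has to be documented for the field staff.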
The power of an experiment might be improved by using sampling structures such as strata, clusters or interviewers as block variables in a randomized block design (RBD), since restricted randomization removes the variance between the blocks from the analysis of the experiment (Fienberg and Tanur (1987, 1988)). In the case of an RBD, the sampling units are deterministically grouped in $B$ more or less homogeneous blocks $s_b$, $b = 1, \ldots, B$. Within each block, the sampling units are randomly assigned to one of the $KL$ treatment combinations. Let $n_{bkl}$ denote the number of sampling units in block $b$ assigned to treatment combination $kl$, and $n_{b++} = \sum_{k=1}^{K}\sum_{l=1}^{L} n_{bkl}$ the number of sampling units in block $b$. The probability that sampling unit $i$ is assigned to subsample $s_{kl}$, conditionally on the realization of $s$ and on $i \in s_b$, equals $n_{bkl}/n_{b++}$. The unconditional probability that sampling unit $i$ is selected in subsample $s_{kl}$ equals $\pi_i^{*} = (n_{bkl}/n_{b++})\,\pi_i$.
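Restricted randomization within blocks can be sketched in the same spirit. The snippet below is an illustration only: `assign_rbd`, the block labels and the unit identifiers are hypothetical, and it assumes the simplest allocation rule, in which the shuffled units of a block are dealt out over the treatment combinations in turn so that each combination receives an equal share of every block.

```python
import random
from collections import Counter


def assign_rbd(blocks, cells, seed=0):
    """Randomize units over treatment combinations separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_id, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        # Deal the shuffled units out over the cells in turn (equal allocation).
        for position, unit in enumerate(shuffled):
            assignment[unit] = (block_id, cells[position % len(cells)])
    return assignment


cells = [(1, 1), (1, 2), (2, 1), (2, 2)]
blocks = {"north": range(0, 8), "south": range(100, 112)}  # hypothetical blocks
assignment = assign_rbd(blocks, cells)

n_bkl = Counter(assignment.values())                    # units per block and cell
n_b = Counter(block for block, _ in assignment.values())  # units per block
# Conditional assignment probability n_bkl / n_b++ for every block and cell.
cond_prob = {key: n / n_b[key[0]] for key, n in n_bkl.items()}
```

Multiplying an entry of `cond_prob` by the unit's first order inclusion probability gives the unconditional probability of selection in that subsample.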
In many
practical applications one of the subsamples is assigned to the
regular survey and serves, besides being used to produce estimates for the
regular publication, as the control group in the experiment. In such
situations, this subsample will be substantially larger than the
other subsamples.
Many issues arise in the planning and design stage of embedded experiments. The field staff, for example, requires special attention, since an embedded experiment can have a large impact on the daily data collection routine to which they are accustomed. See van den Brakel and Renssen (1998) and van den Brakel (2008) for more details about such design issues.
Although
factorial designs are efficient from a statistical point of view, there might
be strong practical arguments against a factorial set-up. The number of
treatment combinations increases rapidly with the number of factors in full
factorial designs, which might be difficult to implement in the data collection
of a survey process. A general solution, known from standard experimental
design theory, is to confound higher order interactions with blocks or to apply
fractional factorial designs (Hinkelmann and Kempthorne (2005); Montgomery
(2001)). These balanced designs, however, are generally hard to combine with the
fieldwork restrictions encountered in the daily practice of survey sampling. In
many applications the factors that changed in a survey redesign are therefore
combined into one treatment. The total effect of these modifications is tested
against the standard alternative in a two-treatment experiment. This implies
that the effects of all factors in the experiment are confounded and cannot be
separately estimated.
2.2 Testing hypotheses about finite population parameters
The purpose
of embedded experiments is to test whether alternative survey implementations
result in significantly different estimates for finite population parameters.
Such differences are the result of non-sampling errors, like measurement errors
and response bias. A measurement error model is required to link systematic differences between finite population parameters to the different survey implementations or treatments. Therefore the measurement error model for single-factor experiments proposed by van den Brakel and Renssen (2005) and van den Brakel (2008) is extended to factorial designs.
Let $y_{iqkl}$ denote the observation obtained from the $i$-th individual observed under the $kl$-th treatment combination and the $q$-th interviewer. It is assumed that the observations are a realization of the measurement error model
$$y_{iqkl} = u_i + \beta_{kl} + \gamma_q + \varepsilon_{ikl}. \tag{2.1}$$
Here $u_i$ is the true intrinsic value of the $i$-th individual, $\beta_{kl}$ the effect of the $kl$-th treatment combination and $\varepsilon_{ikl}$ an error component. The model also allows for interviewer effects, i.e. $\gamma_q = \psi + \xi_q$, where $\psi$ denotes a systematic interviewer bias and $\xi_q$ the random effect of the $q$-th interviewer. Let $E_m$ and $\mathrm{Cov}_m$ denote the expectation and the covariance with respect to the measurement error model. It is assumed that $E_m(\varepsilon_{ikl}) = 0$, $\mathrm{Var}_m(\varepsilon_{ikl}) = \sigma^2_{ikl}$, and that measurement errors between sampling units are independent. Furthermore it is assumed that $E_m(\xi_q) = 0$, $\mathrm{Var}_m(\xi_q) = \tau^2_q$, and that random interviewer effects between interviewers are independent. As a result the model allows for correlated response between sampling units that are interviewed by the same interviewer. The measurement error model allows for separate variances for measurement errors under different treatment combinations and separate variances for interviewers.
The treatment effects $\beta_{kl}$ can be interpreted as the bias in the estimated population parameter if the true intrinsic population value of $u_i$ is measured by means of the $kl$-th survey implementation. The treatment effect can be decomposed in the traditional way of an analysis of variance for a two-way layout:
$$\beta_{kl} = \beta + A_k + B_l + AB_{kl}, \tag{2.2}$$
with $\beta$ the overall effect, $A_k$ and $B_l$ the main effects of treatment factors $A$ and $B$ and $AB_{kl}$ the interactions between treatment factors $A$ and $B$. If the treatment effects $\beta_{kl}$ are defined as fixed deviations from the individuals' intrinsic value $u_i$, then the overall mean $\beta$ equals zero. In that case $A_k$ corresponds with the bias associated with the $k$-th level of factor $A$ averaged over all levels of factor $B$, $B_l$ with the bias associated with the $l$-th level of factor $B$ averaged over all levels of factor $A$, and $AB_{kl}$ with the additional bias associated with the combination of the $k$-th level of factor $A$ and the $l$-th level of factor $B$ on top of $A_k$ and $B_l$.
The following restrictions are required to identify model (2.2):
$$\sum_{k=1}^{K} A_k = 0, \qquad \sum_{l=1}^{L} B_l = 0, \tag{2.3}$$
and
$$\sum_{k=1}^{K} AB_{kl} = 0 \ \text{for all } l, \qquad \sum_{l=1}^{L} AB_{kl} = 0 \ \text{for all } k. \tag{2.4}$$
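The two-way decomposition and its sum-to-zero restrictions can be made concrete with a small numerical sketch. The function name `decompose` and the table of treatment effects are hypothetical; the code simply applies the usual analysis of variance identities (overall mean, row and column deviations, remainder) to a K x L table.

```python
def decompose(beta):
    """Split a K x L table of treatment effects into an overall effect,
    main effects of A (rows) and B (columns), and interactions."""
    K, L = len(beta), len(beta[0])
    overall = sum(sum(row) for row in beta) / (K * L)
    row_means = [sum(row) / L for row in beta]
    col_means = [sum(beta[k][l] for k in range(K)) / K for l in range(L)]
    A = [m - overall for m in row_means]
    B = [m - overall for m in col_means]
    AB = [[beta[k][l] - overall - A[k] - B[l] for l in range(L)]
          for k in range(K)]
    return overall, A, B, AB


# Hypothetical treatment effects for a 2 x 3 design.
beta = [[0.0, 1.0, 2.0],
        [2.0, 5.0, 2.0]]
overall, A, B, AB = decompose(beta)
# By construction A and B each sum to zero, and every row and every
# column of AB sums to zero, which are the identifying restrictions.
```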
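For each sampling unit, a potential response variable is defined under each of the $KL$ treatment combinations. Therefore the measurement error model can be expressed in matrix notation as:
$$\mathbf{y}_{iq} = \mathbf{j}\,u_i + \boldsymbol{\beta} + \mathbf{j}\,\gamma_q + \boldsymbol{\varepsilon}_i, \tag{2.5}$$
where $\mathbf{y}_{iq} = (y_{iq11}, \ldots, y_{iqKL})^t$, $\boldsymbol{\beta} = (\beta_{11}, \ldots, \beta_{KL})^t$, $\mathbf{j}$ a vector of order $KL$ with each element equal to one and $\boldsymbol{\varepsilon}_i = (\varepsilon_{i11}, \ldots, \varepsilon_{iKL})^t$. The sampling units are assigned to one of the $KL$ treatment combinations only, so only one of the responses of $\mathbf{y}_{iq}$ is actually observed. The model assumptions specified above are stated as:
$$E_m(\boldsymbol{\varepsilon}_i) = \mathbf{0}, \qquad \mathrm{Cov}_m(\boldsymbol{\varepsilon}_i, \boldsymbol{\varepsilon}_{i'}^t) = \begin{cases} \boldsymbol{\Sigma}_i & \text{if } i = i' \\ \mathbf{O} & \text{if } i \neq i' \end{cases}, \qquad E_m(\xi_q) = 0, \qquad \mathrm{Cov}_m(\xi_q, \xi_{q'}) = \begin{cases} \tau^2_q & \text{if } q = q' \\ 0 & \text{if } q \neq q' \end{cases},$$
where $\mathbf{0}$ is a vector of order $KL$ with each element zero, $\boldsymbol{\Sigma}_i$ a matrix of order $KL \times KL$ containing the variances of the measurement errors $\varepsilon_{ikl}$, and $\mathbf{O}$ a matrix of order $KL \times KL$ with each element zero.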
Let $\bar{\mathbf{Y}} = (\bar{Y}_{11}, \ldots, \bar{Y}_{KL})^t$ denote the $KL$ dimensional vector of population means of $\mathbf{y}_{iq}$ defined by (2.5). These are the values obtained under a complete enumeration of the finite population under each of the $KL$ treatment combinations and are defined as:
$$\bar{\mathbf{Y}} = \frac{1}{N} \sum_{q=1}^{Q} \sum_{i=1}^{N_q} \mathbf{y}_{iq},$$
where $Q$ denotes the total number of interviewers available for the data collection and $N_q$ the number of units assigned to the $q$-th interviewer in the case of a complete enumeration.
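Only systematic differences between the population parameters that are reflected by the treatment effects $\boldsymbol{\beta}$ should lead to a rejection of the null hypotheses of no treatment effects. This is accomplished by formulating hypotheses about $\bar{\mathbf{Y}}$ in expectation over the measurement error model, i.e.
$$E_m \bar{\mathbf{Y}} = \mathbf{j}\,(\bar{u} + \psi) + \boldsymbol{\beta}, \tag{2.12}$$
with $\bar{u}$ the population mean of the intrinsic values $u_i$.
Consequently, hypotheses about main effects and interactions are formulated as
$$H_0: \mathbf{C}\,E_m \bar{\mathbf{Y}} = \mathbf{0}, \qquad H_1: \mathbf{C}\,E_m \bar{\mathbf{Y}} \neq \mathbf{0}, \tag{2.13}$$
where $\mathbf{C}$ denotes an appropriate contrast matrix, and $\mathbf{0}$ a vector with elements equal to zero and a dimension that is equal to the number of contrasts (rows) defined by $\mathbf{C}$. The contrast matrix for the hypothesis about the main effects of factor $A$ is defined as
$$\mathbf{C}_A = \left(\mathbf{j}_{(K-1)} \mid -\mathbf{I}_{(K-1)}\right) \otimes \frac{1}{L}\,\mathbf{j}_{(L)}^t, \tag{2.14}$$
with $\mathbf{j}_{(p)}$ a vector of order $p$ with each element equal to one and $\mathbf{I}_{(K-1)}$ the identity matrix of order $(K-1)$. Matrix $\mathbf{C}_A$ defines the $(K-1)$ contrasts between the $K$ levels of factor $A$, averaged over the $L$ levels of factor $B$. From (2.12) and due to restrictions (2.3) and (2.4) it follows that the contrasts between the population parameters exactly correspond to the contrasts between the main effects of the first factor:
$$\mathbf{C}_A E_m \bar{\mathbf{Y}} = \mathbf{C}_A \boldsymbol{\beta} = (A_1 - A_2, \ldots, A_1 - A_K)^t.$$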
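The contrast matrix for the hypothesis about the main effects of factor $B$ is defined as
$$\mathbf{C}_B = \frac{1}{K}\,\mathbf{j}_{(K)}^t \otimes \left(\mathbf{j}_{(L-1)} \mid -\mathbf{I}_{(L-1)}\right). \tag{2.15}$$
This matrix defines the $(L-1)$ contrasts between the $L$ levels of factor $B$, averaged over the $K$ levels of factor $A$. From (2.12) and due to restrictions (2.3) and (2.4) it follows that the contrasts between the population parameters exactly correspond to the contrasts between the main effects of the second factor:
$$\mathbf{C}_B E_m \bar{\mathbf{Y}} = \mathbf{C}_B \boldsymbol{\beta} = (B_1 - B_2, \ldots, B_1 - B_L)^t.$$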
The contrast matrices for the main effects use the first level of factors $A$ and $B$ as the reference category. This implies that treatment combination $k = 1$, $l = 1$ is considered as the control group in the experiment.
Interactions between the two treatment factors are defined as the contrasts of factor $A$ between the contrasts of factor $B$ or, equivalently, as the contrasts of factor $B$ between the contrasts of factor $A$; see Hinkelmann and Kempthorne (1994, chapter 11). Therefore the contrast matrix for the hypothesis about the interactions between factor $A$ and $B$ can be defined as
$$\mathbf{C}_{AB} = \left(\mathbf{j}_{(K-1)} \mid -\mathbf{I}_{(K-1)}\right) \otimes \left(\mathbf{j}_{(L-1)} \mid -\mathbf{I}_{(L-1)}\right). \tag{2.16}$$
This matrix contains the $(K-1)(L-1)$ contrasts that define the interactions between factor $A$ and $B$. The contrasts between the population parameters exactly correspond to the interactions between the first and the second factor, since
$$\mathbf{C}_{AB} E_m \bar{\mathbf{Y}} = \mathbf{C}_{AB} \boldsymbol{\beta} = (AB_{11} - AB_{12} - AB_{21} + AB_{22}, \ldots, AB_{11} - AB_{1L} - AB_{K1} + AB_{KL})^t.$$
Each element of this vector defines one of the $(K-1)(L-1)$ interactions, which corresponds to the contrasts between the interaction effects defined by (2.2). The first element, for example, can be interpreted as the deviation of the treatment effect of the particular combination of factor $A$ at level 2 and factor $B$ at level 2 from the two main effects of these factors.
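The three contrast matrices can be assembled mechanically with Kronecker products. The sketch below is illustrative: the helper names are hypothetical, and it assumes that the treatment combinations are ordered with the level of the second factor running fastest, and that the first level of each factor is the reference category with contrasts written as "level 1 minus level m".

```python
def kron(P, Q):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


def reference_contrasts(m):
    """(j | -I): level 1 minus each of the other m - 1 levels."""
    return [[1.0] + [-1.0 if c == r else 0.0 for c in range(m - 1)]
            for r in range(m - 1)]


def averaging_row(m):
    """(1/m) j^t: average over m levels."""
    return [[1.0 / m] * m]


def contrast_matrices(K, L):
    C_A = kron(reference_contrasts(K), averaging_row(L))
    C_B = kron(averaging_row(K), reference_contrasts(L))
    C_AB = kron(reference_contrasts(K), reference_contrasts(L))
    return C_A, C_B, C_AB


def contrast_values(C, v):
    """Matrix-vector product C v."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]


# Hypothetical treatment effects for a 2 x 3 design, ordered
# (1,1), (1,2), (1,3), (2,1), (2,2), (2,3).
beta = [0.0, 1.0, 2.0,
        2.0, 5.0, 2.0]
C_A, C_B, C_AB = contrast_matrices(K=2, L=3)
main_A = contrast_values(C_A, beta)   # one contrast between the levels of A
main_B = contrast_values(C_B, beta)   # two contrasts between the levels of B
inter = contrast_values(C_AB, beta)   # two interaction contrasts
```

The number of rows of each matrix is the number of degrees of freedom of the corresponding test: K - 1, L - 1 and (K - 1)(L - 1).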
2.3 Wald test
The hypotheses specified in section 2.2 can be tested with a Wald test (Wald 1943), which is frequently applied in design-based testing procedures; see for example Skinner, Holt and Smith (1989) or Chambers and Skinner (2003). If $\hat{\bar{\mathbf{Y}}}$ denotes a design-unbiased estimator for $\bar{\mathbf{Y}}$, $\mathbf{C}$ the contrast matrix $\mathbf{C}_A$, $\mathbf{C}_B$, or $\mathbf{C}_{AB}$ defined in (2.14), (2.15) and (2.16), and $\mathbf{V}$ the covariance matrix of the contrasts between $\hat{\bar{\mathbf{Y}}}$, then hypotheses can be tested with the Wald statistic $W = \hat{\bar{\mathbf{Y}}}^t \mathbf{C}^t \mathbf{V}^{-1} \mathbf{C} \hat{\bar{\mathbf{Y}}}$. The generalized regression (GREG) estimators proposed by van den Brakel and Renssen (2005) and van den Brakel (2008) for single-factor experiments are extended to embedded factorial designs in this section. For notational convenience, the subscript $q$ will be omitted in $y_{iqkl}$, since there is no need to sum explicitly over the interviewer subscript in most of the formulas developed in the rest of this paper.
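To apply the model-assisted mode of inference to the analysis of embedded experiments, it is assumed for each unit in the population that the intrinsic value $u_i$ in measurement error model (2.5) is an independent realization of the following linear regression model:
$$u_i = \mathbf{b}^t \mathbf{x}_i + e_i, \tag{2.17}$$
where $\mathbf{x}_i$ denotes an $H$-vector with auxiliary information, $\mathbf{b}$ an $H$-vector with the regression coefficients and $e_i$ the residuals, which are independent random variables with variance $\omega_i^2$. It is required that all $\omega_i^2$ are known up to a common scale factor, that is $\omega_i^2 = \omega^2 v_i$, with $v_i$ known. The GREG estimator for $\bar{Y}_{kl}$, based on the $n_{kl}$ observations of subsample $s_{kl}$, is defined as (Särndal et al., 1992)
$$\hat{\bar{Y}}_{kl} = \hat{\bar{y}}_{kl} + \hat{\mathbf{b}}_{kl}^t \left(\bar{\mathbf{X}} - \hat{\bar{\mathbf{x}}}_{kl}\right), \tag{2.18}$$
where
$$\hat{\bar{\mathbf{x}}}_{kl} = \frac{1}{N} \sum_{i=1}^{n_{kl}} \frac{\mathbf{x}_i}{\pi_i^{*}} \tag{2.19}$$
denotes the Horvitz-Thompson (HT) estimator for $\bar{\mathbf{X}}$, the finite population means of the auxiliary variables $\mathbf{x}_i$, and $\hat{\bar{y}}_{kl}$ the HT estimator for $\bar{Y}_{kl}$ based on the $n_{kl}$ sample units of subsample $s_{kl}$. Furthermore,
$$\hat{\mathbf{b}}_{kl} = \left(\sum_{i=1}^{n_{kl}} \frac{\mathbf{x}_i \mathbf{x}_i^t}{\omega_i^2 \pi_i^{*}}\right)^{-1} \sum_{i=1}^{n_{kl}} \frac{\mathbf{x}_i y_{ikl}}{\omega_i^2 \pi_i^{*}} \tag{2.20}$$
denotes the HT-type estimator for the regression coefficients in (2.17) based on the $n_{kl}$ sampling units in subsample $s_{kl}$. In (2.19) and (2.20), $\pi_i^{*}$ are the first order inclusion probabilities for the sampling units in the $KL$ different subsamples, derived in subsection 2.1. Now $\hat{\bar{Y}}_{kl}$ is an approximately design-unbiased estimator for $\bar{Y}_{kl}$ and also for $E_m \bar{Y}_{kl}$ by definition.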
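Under the null hypotheses that there are no treatment effects and no interactions, it follows that $\mathbf{b}_{11} = \cdots = \mathbf{b}_{KL} = \mathbf{b}$. In that case, it might be efficient to substitute for $\hat{\mathbf{b}}_{kl}$ in the GREG estimator (2.18) the pooled estimator
$$\hat{\mathbf{b}} = \left(\sum_{i \in s} \frac{\mathbf{x}_i \mathbf{x}_i^t}{\omega_i^2 \pi_i}\right)^{-1} \sum_{i \in s} \frac{\mathbf{x}_i y_{ikl}}{\omega_i^2 \pi_i}. \tag{2.21}$$
Since $H$ instead of $KL \times H$ regression coefficients have to be estimated, the pooled estimates of the regression coefficients will be more precise, particularly in the case of small subsamples. Note, however, that many commonly used weighting schemes meet the condition that a constant vector $\boldsymbol{\lambda}$ exists such that $\boldsymbol{\lambda}^t \mathbf{x}_i = \omega_i^2$ for all $i \in U$. In this situation the GREG estimator reduces to the simplified form $\hat{\bar{Y}}_{kl} = \hat{\mathbf{b}}_{kl}^t \bar{\mathbf{X}}$ (Särndal et al. 1992, section 6.5). Under this simplified form, the treatment effects are completely included in the regression coefficients. In case of the pooled estimator (2.21), the GREG estimators are exactly equal by definition, since $\hat{\bar{Y}}_{kl} = \hat{\mathbf{b}}^t \bar{\mathbf{X}}$ for all $k$ and $l$, so that all contrasts between them vanish and the pooled estimator cannot be used to test for treatment effects.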
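An expression for the covariance matrix of the contrasts between the elements of $\hat{\bar{\mathbf{Y}}}$, where the covariance is taken over the sampling design, the experimental design and the measurement error model, is given by
$$\mathbf{V} = E_m E_s \left(\mathbf{C} \mathbf{D} \mathbf{C}^t\right), \tag{2.22}$$
where $E_s$ denotes the expectation with respect to the sampling design, and $\mathbf{D}$ a $KL \times KL$ diagonal matrix with diagonal elements
$$d_{kl} = \frac{1}{n_{kl}} \frac{1}{(n_{++} - 1)} \sum_{i=1}^{n_{++}} \left(\frac{n_{++}\,(y_{ikl} - \mathbf{b}_{kl}^t \mathbf{x}_i)}{N \pi_i} - \frac{1}{n_{++}} \sum_{i'=1}^{n_{++}} \frac{n_{++}\,(y_{i'kl} - \mathbf{b}_{kl}^t \mathbf{x}_{i'})}{N \pi_{i'}}\right)^2 \tag{2.23}$$
in the case of a CRD and
$$d_{kl} = \sum_{b=1}^{B} \frac{1}{n_{bkl}} \frac{1}{(n_{b++} - 1)} \sum_{i=1}^{n_{b++}} \left(\frac{n_{b++}\,(y_{ikl} - \mathbf{b}_{kl}^t \mathbf{x}_i)}{N \pi_i} - \frac{1}{n_{b++}} \sum_{i'=1}^{n_{b++}} \frac{n_{b++}\,(y_{i'kl} - \mathbf{b}_{kl}^t \mathbf{x}_{i'})}{N \pi_{i'}}\right)^2 \tag{2.24}$$
in the case of an RBD, with $\mathbf{b}_{kl}$ the finite population regression coefficients under treatment combination $kl$. An estimator for $\mathbf{D}$ can be derived from the experimental design, conditionally on the measurement error model and the sampling design. Therefore the covariance matrix (2.22) is conveniently stated implicitly as the expectation over the measurement error model and the sampling design. A design-based estimator for this covariance matrix is given by
$$\hat{\mathbf{V}} = \mathbf{C} \hat{\mathbf{D}} \mathbf{C}^t, \tag{2.25}$$
with $\hat{\mathbf{D}}$ a diagonal matrix with elements
$$\hat{d}_{kl} = \frac{1}{n_{kl}(n_{kl} - 1)} \sum_{i=1}^{n_{kl}} \left(\frac{n_{++}\,\hat{e}_{ikl}}{N \pi_i} - \frac{1}{n_{kl}} \sum_{i'=1}^{n_{kl}} \frac{n_{++}\,\hat{e}_{i'kl}}{N \pi_{i'}}\right)^2 \tag{2.26}$$
in the case of a CRD and
$$\hat{d}_{kl} = \sum_{b=1}^{B} \frac{1}{n_{bkl}(n_{bkl} - 1)} \sum_{i=1}^{n_{bkl}} \left(\frac{n_{b++}\,\hat{e}_{ikl}}{N \pi_i} - \frac{1}{n_{bkl}} \sum_{i'=1}^{n_{bkl}} \frac{n_{b++}\,\hat{e}_{i'kl}}{N \pi_{i'}}\right)^2 \tag{2.27}$$
in the case of an RBD, where $\hat{e}_{ikl} = y_{ikl} - \hat{\mathbf{b}}_{kl}^t \mathbf{x}_i$ denote the GREG residuals. Proofs for (2.22) and (2.25) are given by van den Brakel (2010) and resemble the derivation of the covariance matrix for single factor experiments, given by van den Brakel and Renssen (2005) and van den Brakel (2008).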
The results for (2.22) and (2.25) are obtained under the condition that a constant $H$-vector $\mathbf{a}$ exists such that $\mathbf{a}^t \mathbf{x}_i = 1$ for all $i \in U$. This is a rather weak condition, since it implies that a weighting model is used that at least uses the size of the finite population as a priori information. See van den Brakel and Renssen (2005) or van den Brakel (2008) for a more detailed discussion.
Since the $KL$ subsamples are drawn without replacement from a finite population, there is a nonzero design covariance between elements of $\hat{\bar{\mathbf{Y}}}$. From that point of view, it is remarkable that (2.25) has a structure as if the $KL$ subsamples are drawn independently through sampling with replacement using unequal selection probabilities. This gives rise to an attractive variance estimation procedure for embedded experiments, since no design covariances between the subsample estimates appear in (2.25) and no second order inclusion probabilities are required in the variance estimators (2.26) and (2.27). This result is obtained since the covariance matrix of the contrasts between $\hat{\bar{\mathbf{Y}}}$ is derived instead of the covariance matrix of $\hat{\bar{\mathbf{Y}}}$ itself. A detailed interpretation of this result is given by van den Brakel and Renssen (2005) or van den Brakel (2008). See van den Brakel and Binder (2000) and Hidiroglou and Lavallée (2005) for approximations of the covariance matrix of $\hat{\bar{\mathbf{Y}}}$.
The design-based estimators $\hat{\bar{\mathbf{Y}}}$ and $\hat{\mathbf{D}}$ can be used to construct a design-based Wald statistic to test the hypotheses described in section 2.2:
$$W = \hat{\bar{\mathbf{Y}}}^t \mathbf{C}^t \left(\mathbf{C} \hat{\mathbf{D}} \mathbf{C}^t\right)^{-1} \mathbf{C} \hat{\bar{\mathbf{Y}}}. \tag{2.28}$$
Design-based inferences are generally based on normal large-sample approximations to construct confidence intervals for point estimates or p-values and critical regions for test statistics. Under this approach it follows under the null hypothesis that the Wald statistic is asymptotically distributed as a central chi-squared random variable, where the number of degrees of freedom equals the number of contrasts specified in the hypothesis.
The Wald statistics for the hypotheses about the main effects and interactions are given by (2.28) using the contrast matrix $\mathbf{C}_A$, $\mathbf{C}_B$, or $\mathbf{C}_{AB}$. Under the null hypothesis, it follows that $W \rightarrow \chi^2_{[K-1]}$ for the test about the main effects of factor $A$, $W \rightarrow \chi^2_{[L-1]}$ for the test about the main effects of factor $B$ and $W \rightarrow \chi^2_{[(K-1)(L-1)]}$ for the test about interactions, where $\chi^2_{[p]}$ denotes a central chi-squared distributed random variable with $p$ degrees of freedom.
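A Wald test of this kind can be sketched with the standard library alone. The code below is an illustration under stated assumptions rather than a reference implementation: the function names, the subsample estimates and the variances are hypothetical, the covariance matrix of the contrasts is assumed to have the form C D C^t with a diagonal D, and the chi-squared tail probability is computed for integer degrees of freedom through the recurrence of the regularized incomplete gamma function.

```python
import math


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def wald(C, y_hat, d_hat):
    """W = (C y)^t (C D C^t)^{-1} (C y) with D = diag(d_hat)."""
    Cy = [sum(c * y for c, y in zip(row, y_hat)) for row in C]
    CDC = [[sum(a * d * b for a, d, b in zip(r1, d_hat, r2)) for r2 in C]
           for r1 in C]
    return sum(v * w for v, w in zip(Cy, solve(CDC, Cy)))


def chi2_sf(x, df):
    """P(chi-squared with integer df exceeds x)."""
    z = x / 2.0
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(z)) if df % 2 else math.exp(-z)
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical 2 x 2 experiment: estimates ordered (1,1), (1,2), (2,1), (2,2).
y_hat = [10.0, 12.0, 11.0, 15.0]
d_hat = [0.25, 0.5, 0.5, 0.5]
C_AB = [[1.0, -1.0, -1.0, 1.0]]          # single interaction contrast
W = wald(C_AB, y_hat, d_hat)
p_value = chi2_sf(W, df=len(C_AB))       # degrees of freedom = number of contrasts
```

The same `wald` function serves the main effects by passing the corresponding contrast matrix; only the degrees of freedom change.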
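The Wald test for the main effects can be further simplified. Expressions are developed for the Wald test for the main effects for factor $A$. Similar expressions can be derived for the main effects of factor $B$. Denote
$$\hat{\bar{Y}}_{k.} = \frac{1}{L} \sum_{l=1}^{L} \hat{\bar{Y}}_{kl}, \qquad \hat{d}_{k.} = \frac{1}{L^2} \sum_{l=1}^{L} \hat{d}_{kl}, \qquad \hat{\bar{\mathbf{Y}}}_A = (\hat{\bar{Y}}_{1.}, \ldots, \hat{\bar{Y}}_{K.})^t, \qquad \hat{\mathbf{D}}_A = \mathrm{diag}(\hat{d}_{1.}, \ldots, \hat{d}_{K.}), \qquad \tilde{\mathbf{C}} = \left(\mathbf{j}_{(K-1)} \mid -\mathbf{I}_{(K-1)}\right). \tag{2.29}$$
It follows that $\mathbf{C}_A \hat{\bar{\mathbf{Y}}} = \tilde{\mathbf{C}} \hat{\bar{\mathbf{Y}}}_A$ and $\mathbf{C}_A \hat{\mathbf{D}} \mathbf{C}_A^t = \tilde{\mathbf{C}} \hat{\mathbf{D}}_A \tilde{\mathbf{C}}^t$. With the matrix inversion lemma, the Wald statistic for the main effects of factor $A$ can be simplified to:
$$W_A = \sum_{k=1}^{K} \frac{\hat{\bar{Y}}_{k.}^2}{\hat{d}_{k.}} - \left(\sum_{k=1}^{K} \frac{1}{\hat{d}_{k.}}\right)^{-1} \left(\sum_{k=1}^{K} \frac{\hat{\bar{Y}}_{k.}}{\hat{d}_{k.}}\right)^2. \tag{2.30}$$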
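The simplification avoids any matrix inversion, as the following sketch suggests. The function name and the numbers are hypothetical. The code assumes that the simplified statistic takes the weighted sum-of-squares form that the matrix inversion lemma yields for reference-category contrasts and a diagonal covariance structure, applied to the estimates averaged over the levels of the second factor.

```python
def wald_main_effect_A(y_hat, d_hat):
    """Simplified Wald statistic for the main effects of factor A.

    y_hat[k][l] and d_hat[k][l] hold the subsample estimates and the
    diagonal elements of the estimated D matrix of a K x L design.
    """
    L = len(y_hat[0])
    y_k = [sum(row) / L for row in y_hat]        # estimates averaged over factor B
    d_k = [sum(row) / L ** 2 for row in d_hat]   # variances of those averages
    s0 = sum(1.0 / d for d in d_k)
    s1 = sum(y / d for y, d in zip(y_k, d_k))
    s2 = sum(y * y / d for y, d in zip(y_k, d_k))
    return s2 - s1 * s1 / s0


# Hypothetical 2 x 3 design with equal variances in all six cells.
y_hat = [[0.0, 1.0, 2.0], [2.0, 5.0, 2.0]]
d_hat = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
W_A = wald_main_effect_A(y_hat, d_hat)

# Cross-check for K = 2: squared contrast divided by its variance.
direct = (1.0 - 3.0) ** 2 / (1.5 / 9 + 1.5 / 9)
```

For K = 2 the statistic is just the squared difference of the two averaged estimates divided by the sum of their variances, which is what the cross-check computes.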
Finally note that the HT estimator (2.19) does not meet the condition that a constant $H$-vector $\mathbf{a}$ exists such that $\mathbf{a}^t \mathbf{x}_i = 1$ for all $i \in U$. The minimum use of auxiliary information in the GREG estimator is obtained with a weighting scheme that only uses the size of the finite population as a priori knowledge, i.e. $x_i = 1$ and $\omega_i^2 = \omega^2$ for all $i \in U$ (Särndal et al. 1992, section 7.4). Under this weighting scheme it follows that $\hat{\mathbf{b}}_{kl}$ reduces to a scalar and
$$\hat{\bar{Y}}_{kl} = \hat{b}_{kl} = \left(\sum_{i=1}^{n_{kl}} \frac{1}{\pi_i^{*}}\right)^{-1} \sum_{i=1}^{n_{kl}} \frac{y_{ikl}}{\pi_i^{*}}. \tag{2.31}$$
Expression (2.31) can be recognized as Hájek's ratio estimator for a population mean (Hájek 1971). This weighting scheme satisfies the condition that a constant $H$-vector $\mathbf{a}$ exists such that $\mathbf{a}^t \mathbf{x}_i = 1$ for all $i \in U$. Therefore an approximately design-unbiased estimator for the covariance matrix of the contrasts between the subsample estimates is given by (2.26) and (2.27) for a CRD and an RBD respectively, where $\hat{e}_{ikl} = y_{ikl} - \hat{\bar{Y}}_{kl}$. Estimator (2.31) is preferable to the HT estimator (2.19), since (2.31) is more stable and the covariance matrix of the contrasts between (2.31) always has the relatively simple form of (2.25).
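For a single subsample, the ratio estimator and its variance term can be sketched as follows. This is an illustration with hypothetical names and data. It assumes a CRD, residuals taken around the ratio estimate, and a variance estimator of the with-replacement type, that is, the sample variance of the rescaled residuals n_++ e_i / (N pi_i) divided by the subsample size; the exact form used in a given application should be checked against the source formulas.

```python
def hajek_mean(y, pi_star):
    """Hajek ratio estimator: weighted mean with weights 1 / pi*_i."""
    weights = [1.0 / p for p in pi_star]
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)


def crd_variance(y, pi, n_total, N):
    """Variance term for one subsample under a CRD (assumed form, see lead-in).

    y: observations of the subsample, pi: first order inclusion
    probabilities of the regular design, n_total: n_++, N: population size.
    """
    n_kl = len(y)
    pi_star = [n_kl / n_total * p for p in pi]
    y_hat = hajek_mean(y, pi_star)
    z = [n_total * (v - y_hat) / (N * p) for v, p in zip(y, pi)]
    z_bar = sum(z) / n_kl
    return sum((v - z_bar) ** 2 for v in z) / (n_kl * (n_kl - 1))


# Hypothetical self-weighted design: N = 1000, n_++ = 20, pi_i = 0.02.
y = [1.0, 2.0, 3.0, 4.0]
pi = [0.02] * 4
estimate = hajek_mean(y, [4 / 20 * p for p in pi])
d_hat = crd_variance(y, pi, n_total=20, N=1000)
```

In this self-weighted example the ratio estimator is the plain subsample mean and the variance term is the usual sample variance divided by the subsample size.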
2.4 Special cases
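It will be shown for two special cases that the design-based Wald statistic corresponds to the F-test of a standard analysis of variance. For this purpose, an ANOVA-type pooled variance estimator for the diagonal elements of $\hat{\mathbf{D}}$ is considered as an alternative for (2.26) or (2.27). Such a pooled variance estimator for a CRD is given by
$$\hat{d}^{\,p}_{kl} = \frac{1}{n_{kl}} \frac{1}{(n_{++} - KL)} \sum_{k'=1}^{K} \sum_{l'=1}^{L} \sum_{i=1}^{n_{k'l'}} \left(\frac{n_{++}\,\hat{e}_{ik'l'}}{N \pi_i} - \frac{1}{n_{k'l'}} \sum_{i'=1}^{n_{k'l'}} \frac{n_{++}\,\hat{e}_{i'k'l'}}{N \pi_{i'}}\right)^2 \tag{2.32}$$
and for an RBD by
$$\hat{d}^{\,p}_{kl} = \sum_{b=1}^{B} \frac{1}{n_{bkl}} \frac{1}{(n_{b++} - KL)} \sum_{k'=1}^{K} \sum_{l'=1}^{L} \sum_{i=1}^{n_{bk'l'}} \left(\frac{n_{b++}\,\hat{e}_{ik'l'}}{N \pi_i} - \frac{1}{n_{bk'l'}} \sum_{i'=1}^{n_{bk'l'}} \frac{n_{b++}\,\hat{e}_{i'k'l'}}{N \pi_{i'}}\right)^2, \tag{2.33}$$
where $\hat{e}_{ikl}$ denote the residuals of the GREG estimator for treatment combination $kl$.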
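Now consider a CRD that is embedded in a self-weighted sample, i.e. $\pi_i = n_{++}/N$, with equally sized subsamples, i.e. $n_{kl} = n_{++}/(KL)$. The inclusion probabilities for all units in the subsamples are given by $\pi_i^{*} = n_{kl}/N$. Let $\bar{y}_{kl} = \frac{1}{n_{kl}} \sum_{i=1}^{n_{kl}} y_{ikl}$. Under Hájek's ratio estimator (2.31) and the pooled variance estimator (2.32) it follows that $\hat{\bar{Y}}_{kl} = \bar{y}_{kl}$, $\hat{e}_{ikl} = y_{ikl} - \bar{y}_{kl}$, and
$$\hat{d}^{\,p}_{kl} = \frac{KL}{n_{++}} \frac{1}{(n_{++} - KL)} \sum_{k=1}^{K} \sum_{l=1}^{L} \sum_{i=1}^{n_{kl}} (y_{ikl} - \bar{y}_{kl})^2 \equiv \frac{KL}{n_{++}}\,\hat{S}_p^2.$$
The parameter estimates of the $K$ levels of factor $A$ averaged over the $L$ levels of factor $B$ are denoted as
$$\hat{\bar{Y}}_{k.} = \frac{1}{L} \sum_{l=1}^{L} \bar{y}_{kl} \equiv \bar{y}_{k.}, \tag{2.34}$$
with $k = 1, \ldots, K$. The diagonal elements of $\hat{\mathbf{D}}_A$ are now given by
$$\hat{d}_{k.} = \frac{1}{L^2} \sum_{l=1}^{L} \hat{d}^{\,p}_{kl} = \frac{K}{n_{++}}\,\hat{S}_p^2. \tag{2.35}$$
Let $\bar{y}_{..} = \frac{1}{K} \sum_{k=1}^{K} \bar{y}_{k.}$. Inserting (2.34) and (2.35) into (2.30) gives rise to the following expression for the Wald statistic of the main effects of factor $A$:
$$W_A = \frac{n_{++}}{K} \frac{\sum_{k=1}^{K} (\bar{y}_{k.} - \bar{y}_{..})^2}{\hat{S}_p^2} \equiv (K-1)\,F_A. \tag{2.36}$$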
Note that $F_A = W_A/(K-1)$ in (2.36) corresponds with the F-statistic for the main effects of an analysis of variance for the two-way layout with interactions (Scheffé 1959, chapter 4). Under the null hypothesis and the assumption of normally and independently distributed errors, the F-statistic in the two-way layout follows an F-distribution with $(K-1)$ and $(n_{++} - KL)$ degrees of freedom, which is denoted as $F^{[K-1]}_{[n_{++}-KL]}$. If $n_{++} \rightarrow \infty$, then $F^{[K-1]}_{[n_{++}-KL]} \rightarrow \chi^2_{[K-1]}/(K-1)$. Consequently the F-statistic and the Wald statistic have the same limit distribution.
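The correspondence between the two statistics can be checked numerically on a small balanced data set. The sketch below uses hypothetical data and names. It assumes a self-weighted CRD with equal cell sizes, so that the cell means play the role of the ratio estimates and a single pooled within-cell variance is used, and it compares the resulting Wald statistic for factor A with K - 1 times the two-way ANOVA F-statistic.

```python
def anova_vs_wald(cells):
    """cells[k][l] is the list of observations for treatment combination (k, l).

    Returns the Wald statistic for the main effects of factor A (pooled
    variance, balanced self-weighted CRD) and the two-way ANOVA F-statistic.
    """
    K, L = len(cells), len(cells[0])
    n_cell = len(cells[0][0])
    n_total = K * L * n_cell
    means = [[sum(obs) / n_cell for obs in row] for row in cells]
    within_ss = sum((y - means[k][l]) ** 2
                    for k in range(K) for l in range(L) for y in cells[k][l])
    s2_pooled = within_ss / (n_total - K * L)
    y_k = [sum(row) / L for row in means]       # level means of factor A
    d_k = K * s2_pooled / n_total               # common variance of those means
    # Simplified Wald statistic when every level has the same variance d_k.
    wald = (sum(y * y for y in y_k) - sum(y_k) ** 2 / K) / d_k
    grand = sum(y_k) / K
    ss_A = L * n_cell * sum((y - grand) ** 2 for y in y_k)
    f_stat = (ss_A / (K - 1)) / s2_pooled
    return wald, f_stat


# Hypothetical 3 x 2 design with two observations per cell.
cells = [[[1.0, 3.0], [2.0, 4.0]],
         [[4.0, 6.0], [5.0, 9.0]],
         [[0.0, 2.0], [3.0, 3.0]]]
wald, f_stat = anova_vs_wald(cells)
# In this balanced case wald equals (K - 1) * f_stat up to rounding.
```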
Now
consider an RBD that is embedded in a self-weighted sampling design with equal
subsample sizes, thus and , with . Let . Furthermore, it is assumed that the fraction
of sampling units assigned to each treatment combination within each block is
equal, i.e. , and that the block sizes are sufficiently
large to assume that . Under Hájek's ratio estimator (2.31) and the
pooled variance estimator (2.33) it follows that , , and
The
parameter estimates of the levels
of factor averaged
over the levels
of factor and the
blocks are denoted as
where . The diagonal elements of are
given by
Let . If these results are inserted into (2.30),
then the expression for the Wald statistic of the main effects of factor can be
simplified to
It can be recognized that $W_A/(K-1)$ in (2.39) corresponds with the F-statistic for the main effects of an analysis of variance for the three-way layout with interactions (Scheffé 1959, chapter 4). As in the case of a CRD, the Wald and F-statistics have the same limit distribution.