4 The Third Controversy. "Sampling Inference: Model-assisted or Model-based?"
Ken Brewer
It came as a considerable shock to the finite population sampling
establishment when Royall (1970) issued his highly readable call to arms for
the reinstatement of purposive sampling and prediction-based inference. To read
this paper was to read Neyman (1934) being stood on its head. The identical
issues were being considered but the opposite conclusions were being drawn.
By 1973, however, Royall had withdrawn the most extreme of his
recommendations. This was that the best sample to select would be the one that
was optimal in terms of a model $\xi$ represented by the following equations:

$$y_i = \beta x_i + U_i$$

and

$$E_\xi(U_i) = 0, \quad E_\xi(U_i^2) = \sigma_i^2, \quad E_\xi(U_i U_j) = 0 \ \text{for all } j \neq i. \qquad (4.1)$$

Such a sample would typically have consisted of the largest units in the
population as measured by their realized $x_i$ values, asking for trouble if
the parameter $\beta$ had not been close to constant over the entire range of
the sizes of the population units.
In later articles (Royall and Herson 1973a, Royall and Herson 1973b,
Cumberland and Royall 1981), Royall suggested that the chosen sample be
"balanced," in other words, that the sample moments of the $x_i$ should be as
close as possible to the corresponding moments of the whole population. This
formalized the much earlier notion that samples should be chosen purposively to
resemble the population in miniature. The samples of Gini and Galvani had been
chosen in something of the same way (meaning here "something of the same way in
intention"), but certainly not with anything like the same success in
execution.
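The idea of balance can be made concrete by comparing sample and population moments of a size measure. The sketch below is illustrative only: the function name `moment_gaps`, the population of sizes 1 to 20 and the two samples are hypothetical, and relative moment gaps are just one convenient way to quantify how far a sample is from balance.

```python
import numpy as np

def moment_gaps(x_population, sample_idx, max_order=2):
    """Relative gap between sample and population moments of x.

    Returns a dict {k: (sample k-th moment - population k-th moment)
    / population k-th moment} for k = 1..max_order.
    """
    x_pop = np.asarray(x_population, dtype=float)
    x_s = x_pop[np.asarray(sample_idx)]
    gaps = {}
    for k in range(1, max_order + 1):
        pop_moment = np.mean(x_pop ** k)
        gaps[k] = (np.mean(x_s ** k) - pop_moment) / pop_moment
    return gaps

# Hypothetical population of 20 units with sizes 1, 2, ..., 20.
x = np.arange(1.0, 21.0)

# A purposive sample of the four largest units (sizes 17..20).
largest = np.array([16, 17, 18, 19])

# A sample spread over the size range (sizes 3, 8, 13, 18), chosen so
# that its mean size equals the population mean of 10.5.
spread = np.array([2, 7, 12, 17])

gaps_largest = moment_gaps(x, largest)
gaps_spread = moment_gaps(x, spread)
print("largest units:", gaps_largest)
print("spread sample:", gaps_spread)
```

For these made-up numbers the spread sample matches the population mean exactly and its second moment is close to the population value, whereas the largest-units sample is far from balance on both.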
For the most part,
Royall's original stand remained unshaken. The business of a sampling
statistician was to make a realistic model of the relevant population, design a
sample to estimate its parameters, and make all inferences regarding that
population in terms of those parameter estimates. The randomization-based
concept of defining the variance of an estimator in terms of the variability of
its estimates over all possible samples was to be discarded in favour of the
prediction-based variance, which was sample-specific and based on averaging
over all possible realizations of the chosen prediction model.
Regardless of what sample was drawn, Royall's estimator for a population total
had this prediction form:

$$\hat{Y} = \sum_{i \in s} y_i + \hat{\beta} \sum_{i \notin s} x_i,$$

where $\hat{\beta}$ was the best linear unbiased estimator for $\beta$ based on
the sample $s$ under model $\xi$ in equation (4.1). This is in prediction form
since the y-values of the units not in $s$ are predicted by the model.
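A minimal sketch of an estimator in prediction form may help fix ideas. It assumes a straight-line model through the origin with uncorrelated errors, and it assumes by default that the error variance is proportional to the size measure, in which case the best linear unbiased slope reduces to the familiar ratio of sample totals. The function names `blu_slope` and `prediction_total`, the `var_fn` argument and the data are all hypothetical.

```python
import numpy as np

def blu_slope(x_s, y_s, var_s):
    """Weighted least-squares slope through the origin.

    Under y_i = beta * x_i + U_i with uncorrelated errors and
    Var(U_i) proportional to var_s[i], this is the BLUE of beta.
    """
    x_s = np.asarray(x_s, dtype=float)
    y_s = np.asarray(y_s, dtype=float)
    var_s = np.asarray(var_s, dtype=float)
    return np.sum(x_s * y_s / var_s) / np.sum(x_s ** 2 / var_s)

def prediction_total(x_population, sample_idx, y_sample, var_fn=lambda x: x):
    """Observed sample y-values plus model predictions for unsampled units.

    var_fn gives the assumed error variance (up to a constant) as a
    function of x; the default, variance proportional to x, is an
    assumption made for this illustration.
    """
    x_pop = np.asarray(x_population, dtype=float)
    sample_idx = np.asarray(sample_idx)
    y_s = np.asarray(y_sample, dtype=float)
    x_s = x_pop[sample_idx]

    beta_hat = blu_slope(x_s, y_s, var_fn(x_s))

    # Boolean mask separating sampled from unsampled units.
    in_sample = np.zeros(x_pop.size, dtype=bool)
    in_sample[sample_idx] = True

    return y_s.sum() + beta_hat * x_pop[~in_sample].sum()

# Hypothetical population sizes; units 1 and 4 are sampled.
x_pop = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
sample_idx = np.array([1, 4])
y_sample = np.array([4.0, 20.0])

total = prediction_total(x_pop, sample_idx, y_sample)
print(total)  # 40.0 for these made-up numbers
```

Here the estimated slope is 24 / 12 = 2, the unsampled sizes sum to 8, and the estimate is the observed 24 plus the predicted 16.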
Sampling
statisticians had at no stage been slow to take sides in this debate. Now the
battle-lines were drawn. The heat of the argument appears to have been
exacerbated by language-blocks; for instance, the words "expectation" and
"variance" carried one set of connotations for randomization-based inference
and quite a different set for prediction-based inference. So assertions made on
one side appeared to those on the other side to be unintelligible nonsense.
A major
establishment counter-attack was launched with an article by Hansen, Madow and
Tepping (1983). A small (and by most standards undetectable) divergence from
Royall's model was shown nevertheless to be capable of distorting the sample
inferences substantially. The obvious counter would have been "But this
distortion would not have occurred if the sample had been drawn in a balanced
fashion."
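The point at issue can be illustrated with a deliberately simplified, noise-free calculation. It is not a reproduction of the Hansen, Madow and Tepping study: the intercept `alpha`, the slope `beta`, the population of sizes 1 to 20 and both samples are hypothetical. The working model has no intercept, the population has a small one, and the ratio form of the prediction estimator is applied first to a largest-units sample and then to a sample whose mean size matches the population mean.

```python
import numpy as np

def ratio_total(x_population, sample_idx, y_sample):
    """Ratio-form estimate of the population total of y.

    Equivalent to the prediction estimator when the working model is a
    line through the origin with error variance proportional to x.
    """
    x_pop = np.asarray(x_population, dtype=float)
    x_s = x_pop[np.asarray(sample_idx)]
    return np.sum(y_sample) / np.sum(x_s) * x_pop.sum()

# Hypothetical population: sizes 1..20, with a small intercept that the
# working model (a line through the origin) ignores.
x = np.arange(1.0, 21.0)
alpha, beta = 1.0, 2.0
y = alpha + beta * x
true_total = y.sum()

largest = np.array([16, 17, 18, 19])   # sizes 17..20
balanced = np.array([2, 7, 12, 17])    # sizes 3, 8, 13, 18; mean 10.5

err_largest = ratio_total(x, largest, y[largest]) - true_total
err_balanced = ratio_total(x, balanced, y[balanced]) - true_total
print("error, largest units:", err_largest)
print("error, balanced on the mean:", err_balanced)
```

In this toy setting the error works out to alpha times the population total of x times (1 / sample mean of x minus 1 / population mean of x), so it is negative for the largest-units sample and vanishes when the sample mean of x equals the population mean.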