4 The Third Controversy. "Sampling Inference: Model-assisted or Model-based?"

Ken Brewer

It came as a considerable shock to the finite population sampling establishment when Royall (1970) issued his highly readable call to arms for the reinstatement of purposive sampling and prediction-based inference. To read this paper was to read Neyman (1934) being stood on its head. The identical issues were being considered but the opposite conclusions were being drawn.

By 1973, however, Royall had withdrawn the most extreme of his recommendations. This was that the best sample to select would be the one that was optimal in terms of a model represented by the following equations:

$Y_{i} = β X_{i} + U_{i}, \qquad (4.1)$

$E (U_{i}) = 0, \qquad (4.2)$

$E (U_{i}^{2}) = σ^{2} X_{i} \qquad (4.3)$

and

$E (U_{i} U_{j}) = 0, \quad i \neq j. \qquad (4.4)$

Such a sample would typically have consisted of the $n$ largest units in the population as measured by their realized $x_{i}$ values, asking for trouble if the parameter $β$ had not been close to constant over the entire range of the sizes of the population units.
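The reason the largest units are favoured can be sketched with a short derivation. It assumes the total is estimated by the ratio-type predictor that this model leads to, $t_{y} = \sum_{s} y_{i} + \hat{β} \sum_{U - s} x_{i}$ with $\hat{β} = \sum_{s} y_{i} / \sum_{s} x_{i}$, treats the $x_{i}$ as fixed and positive, and takes all expectations over the model (4.1) to (4.4) for a given sample $s$. It is a sketch under those assumptions rather than a reproduction of Royall's own argument.

```latex
% Note: U_i denotes the model error for unit i, while U (unsubscripted)
% denotes the whole population and U - s the non-sample units.
\begin{align*}
t_y - T_y
  &= (\hat{\beta} - \beta) \sum_{U-s} x_i \;-\; \sum_{U-s} U_i , \\[4pt]
\operatorname{Var}(\hat{\beta})
  &= \frac{\sigma^2 \sum_{s} x_i}{\bigl(\sum_{s} x_i\bigr)^2}
   = \frac{\sigma^2}{\sum_{s} x_i} , \\[4pt]
E\,(t_y - T_y)^2
  &= \frac{\sigma^2 \bigl(\sum_{U-s} x_i\bigr)^2}{\sum_{s} x_i}
     + \sigma^2 \sum_{U-s} x_i
   = \sigma^2 \, \frac{\bigl(\sum_{U-s} x_i\bigr)\bigl(\sum_{U} x_i\bigr)}{\sum_{s} x_i} .
\end{align*}
% The two terms in the first line are uncorrelated by (4.4), because
% \hat{\beta} depends only on sample errors and the second sum only on
% non-sample errors.
```

Since $\sum_{U} x_{i}$ is fixed and $\sum_{U - s} x_{i} = \sum_{U} x_{i} - \sum_{s} x_{i}$, this error variance falls as $\sum_{s} x_{i}$ grows, so under the model it is smallest when the sample consists of the $n$ units with the largest $x_{i}$.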

In later articles (Royall and Herson 1973a, Royall and Herson 1973b, Cumberland and Royall 1981), Royall suggested that the chosen sample be "balanced," in other words, that the moments of the sample $x_{i}$ should be as close as possible to the corresponding moments of the whole population. This formalized the much earlier notion that samples should be chosen purposively to resemble the population in miniature. The samples of Gini and Galvani had been chosen in something of the same way, meaning here "something of the same way in intention", but certainly not with anything like the same success in execution.
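One way to make the idea of balance concrete is to compare sample and population moments of $x$ directly. The sketch below does this for a toy population; the function name `moment_gaps`, the choice of three moments, and the data are all assumptions made for illustration, not taken from the articles cited.

```python
import numpy as np

def moment_gaps(x_pop, sample_idx, max_order=3):
    """Relative gap between sample and population moments of x.

    Returns a dict {k: (sample k-th moment - population k-th moment)
    / population k-th moment} for k = 1..max_order. A gap of zero for
    every k would mean the sample is balanced up to that order.
    """
    x_pop = np.asarray(x_pop, dtype=float)
    x_s = x_pop[np.asarray(sample_idx)]
    gaps = {}
    for k in range(1, max_order + 1):
        pop_moment = np.mean(x_pop ** k)
        gaps[k] = (np.mean(x_s ** k) - pop_moment) / pop_moment
    return gaps

# Hypothetical population of nine units with sizes 1..9.
x = np.arange(1.0, 10.0)

largest = [6, 7, 8]  # indices of the three largest units (x = 7, 8, 9)
spread = [1, 4, 7]   # indices of units with x = 2, 5, 8

gaps_largest = moment_gaps(x, largest)
gaps_spread = moment_gaps(x, spread)
```

In this toy case the spread sample matches the population mean exactly and nearly matches the second moment, whereas the sample of largest units overshoots every moment.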

For the most part, Royall's original stand remained unshaken. The business of a sampling statistician was to make a realistic model of the relevant population, design a sample to estimate its parameters, and make all inferences regarding that population in terms of those parameter estimates. The randomization-based concept of defining the variance of an estimator in terms of the variability of its estimates over all possible samples was to be discarded in favour of the prediction-based variance, which was sample-specific, and based on averaging all possible realizations of the chosen prediction model.

Regardless of what sample was drawn, Royall's estimator for a population total $T_{y} = \sum_{U} y_{i}$ had this prediction form:

$t_{y} = \sum_{s} y_{i} + \sum_{U - s} x_{i} {\hat{β}}_{BLUE},$

where ${\hat{β}}_{BLUE} = \sum_{s} y_{i} / \sum_{s} x_{i}$ was the best linear unbiased estimator for $β$ based on the sample under the model in equations (4.1) to (4.4). This is in prediction form since the $y$-values of the units in $U - s$ are predicted by the model.
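The prediction form can be illustrated numerically. The following is a minimal sketch with made-up data; the function name `prediction_total` and the values are hypothetical. It also checks that the prediction form coincides algebraically with the familiar ratio form $\hat{β} \sum_{U} x_{i}$.

```python
import numpy as np

def prediction_total(y_s, x_s, x_rest):
    """Prediction-form estimate of a population total.

    y_s, x_s : observed y and x values for the sampled units
    x_rest   : x values for the non-sampled units (y unobserved)
    """
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    x_rest = np.asarray(x_rest, dtype=float)
    beta_hat = y_s.sum() / x_s.sum()          # ratio-type estimate of beta
    # Observed sample total plus model-predicted non-sample total.
    return y_s.sum() + beta_hat * x_rest.sum()

# Hypothetical toy data: three sampled units, four non-sampled units.
x_s = [2.0, 4.0, 6.0]
y_s = [5.0, 9.0, 16.0]
x_rest = [1.0, 3.0, 5.0, 7.0]

t_y = prediction_total(y_s, x_s, x_rest)

# Equivalent ratio form: beta_hat times the population total of x.
ratio_form = (sum(y_s) / sum(x_s)) * (sum(x_s) + sum(x_rest))
```

Here $\hat{β} = 30 / 12 = 2.5$, so the estimate is $30 + 2.5 \times 16 = 70$, the same as $2.5 \times 28$ from the ratio form.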

Sampling statisticians had at no stage been slow to take sides in this debate. Now the battle-lines were drawn. The heat of the argument appears to have been exacerbated by language-blocks; for instance the words "expectation" and "variance" carried one set of connotations for randomization-based inference and quite a different set for prediction-based inference. So assertions made on one side appeared to those on the other side to be unintelligible nonsense.

A major establishment counter-attack was launched with an article by Hansen, Madow and Tepping (1983). A small (and by most standards undetectable) divergence from Royall's model was shown nevertheless to be capable of distorting the sample inferences substantially. The obvious counter would have been "But this distortion would not have occurred if the sample had been drawn in a balanced fashion."
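Both the distortion and the counter can be seen in a small calculation. The sketch below assumes one particular kind of divergence, a small intercept $α$ so that $E(Y_{i}) = α + β x_{i}$, chosen here purely for illustration and not claimed to be the specific departure studied by Hansen, Madow and Tepping. Under that assumption the model bias of the prediction estimator $t_{y} = \sum_{s} y_{i} + \hat{β} \sum_{U - s} x_{i}$ is $α \, ( n \sum_{U - s} x_{i} / \sum_{s} x_{i} - (N - n) )$, which does not depend on $β$. The function name `model_bias` and the data are hypothetical.

```python
import numpy as np

def model_bias(x_pop, sample_idx, alpha):
    """Model expectation of t_y - T_y when E(Y_i) = alpha + beta * x_i.

    E(beta_hat) = beta + alpha * n / sum_s(x), so the beta terms cancel
    and only the intercept alpha contributes to the bias.
    """
    x_pop = np.asarray(x_pop, dtype=float)
    in_sample = np.zeros(x_pop.size, dtype=bool)
    in_sample[np.asarray(sample_idx)] = True
    x_s, x_rest = x_pop[in_sample], x_pop[~in_sample]
    n, n_rest = x_s.size, x_rest.size
    return alpha * (n * x_rest.sum() / x_s.sum() - n_rest)

# Hypothetical population of nine units with sizes 1..9 and a small intercept.
x = np.arange(1.0, 10.0)
alpha = 0.5

bias_largest = model_bias(x, [6, 7, 8], alpha)   # three largest units
bias_balanced = model_bias(x, [1, 4, 7], alpha)  # sample mean of x equals population mean
```

With the three largest units the bias is $-1.6875$, while the sample whose mean of $x$ equals the population mean has zero bias: balance on the first moment is enough to protect against an intercept of any size.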


Date modified: 2017-09-20
