5 A third alternative, "Use them both together"

Ken Brewer

Eventually, a third position was also offered, the one held by the present author, namely that, since there were merits in both the design-based (or randomization-based) and the model-based (or prediction-based) approaches and since it was possible to combine them, the two should be used together. I had actually foreshadowed this possibility in Brewer (1963), a paper that provoked little interest at the time, but was later spotted and accorded recognition by J.N.K. Rao, at least to the extent that he invited me to visit him in Ottawa for six weeks in 1974.

To combine these two approaches was relatively simple. In each of them there was a variable $y$ of central interest and a related, or auxiliary, variable $x$ about which something additional was known that could assist in estimating the value of that $y$ variable. That "something additional" was typically the known population total of all the $x$ values, denoted by $T_{x}.$ Consequently, the relationship of central interest was that which linked the crucial parameter $β$ in equation (4.1) to its cosmetic estimator ${\hat{β}}_{COS},$ namely

${\hat{β}}_{COS} = \frac{\sum_{s} (π_{i}^{-1} - 1) y_{i}}{\sum_{s} (π_{i}^{-1} - 1) x_{i}}, \qquad (5.1)$

where $π_{i}$ is the probability that unit $i$ is selected in the sample, or in the notation used by Särndal (2011),

${\hat{β}}_{COS} = \frac{\sum_{s} (d_{k} - 1) y_{k}}{\sum_{s} (d_{k} - 1) x_{k}}, \qquad (5.2)$

where his $d_{k}$ is identical to my $π_{i}^{- 1} .$ The resulting estimator of the total $Y = \sum_{U} y_{k}$ is

${\hat{Y}}_{COS} = \sum_{s} d_{k} y_{k} + \left(\sum_{U} x_{k} - \sum_{s} d_{k} x_{k}\right) \frac{\sum_{s} (d_{k} - 1) y_{k}}{\sum_{s} (d_{k} - 1) x_{k}}. \qquad (5.3)$

Särndal (2011) shows that these $x$ and $y$ values can be related to each other in several different ways, but that a common theme runs through all of them: $y$ increases linearly as $x$ increases, and the rate of that increase is measured by the parameter $β$ in equation (4.1). Importantly, however, when ${\hat{β}}_{COS}$ replaces ${\hat{β}}_{BLUE}$ in Royall's prediction estimator, the estimator can be shown to be nearly unbiased under the design regardless of the validity of the assumed model.
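Equations (5.1) to (5.3) translate almost directly into code. The following is a minimal sketch only: the function name `cosmetic_total` and the three-unit sample are hypothetical, invented for illustration, and the known total $T_{x}$ is simply passed in as `total_x`.

```python
def cosmetic_total(y, x, pi, total_x):
    """Cosmetic estimate of the population total of y, as in (5.1)-(5.3).

    y, x    : sample values of the study and auxiliary variables
    pi      : inclusion probabilities of the sampled units
    total_x : known population total of x (T_x in the text)
    """
    d = [1.0 / p for p in pi]  # design weights d_k = 1 / pi_k

    # Cosmetic slope estimator, equations (5.1) and (5.2).
    numerator = sum((d_k - 1.0) * y_k for d_k, y_k in zip(d, y))
    denominator = sum((d_k - 1.0) * x_k for d_k, x_k in zip(d, x))
    beta_cos = numerator / denominator

    # Expansion estimates of the y and x totals.
    ht_y = sum(d_k * y_k for d_k, y_k in zip(d, y))
    ht_x = sum(d_k * x_k for d_k, x_k in zip(d, x))

    # Equation (5.3): expansion estimate plus a regression adjustment.
    y_hat = ht_y + (total_x - ht_x) * beta_cos
    return y_hat, beta_cos


# Hypothetical sample in which y is exactly twice x.
y_sample = [10.0, 20.0, 40.0]
x_sample = [5.0, 10.0, 20.0]
pi_sample = [0.1, 0.2, 0.5]

y_hat, beta_cos = cosmetic_total(y_sample, x_sample, pi_sample, total_x=200.0)
print(beta_cos, y_hat)
```

Because $y = 2x$ exactly in this made-up sample, the slope estimate is 2 and the estimated total is $2 T_{x} = 400$, whatever the inclusion probabilities happen to be.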

Equation (5.2) can also be found explicitly on page 569 of Brewer (2011), immediately following its more general formula in matrix notation, namely

${\hat{β}}_{COS} = {[X_{s}^{'} Z_{s}^{-1} (Π_{s}^{-1} - I_{n}) X_{s}]}^{-1} X_{s}^{'} Z_{s}^{-1} (Π_{s}^{-1} - I_{n}) y_{s}. \qquad (5.4)$
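As a consistency check, (5.4) can be reduced to (5.1) in the simplest case. The sketch below rests on assumptions that are not stated in this section: that there is a single auxiliary variable, so that $X_{s}$ is the $n \times 1$ vector of the sample $x_{i}$; that $Π_{s}$ is the diagonal matrix of the sample $π_{i}$; and that $Z_{s}$ is the diagonal matrix of the sample $x_{i}$. Under those assumptions,

$X_{s}^{'} Z_{s}^{-1} (Π_{s}^{-1} - I_{n}) X_{s} = \sum_{s} x_{i} x_{i}^{-1} (π_{i}^{-1} - 1) x_{i} = \sum_{s} (π_{i}^{-1} - 1) x_{i},$

$X_{s}^{'} Z_{s}^{-1} (Π_{s}^{-1} - I_{n}) y_{s} = \sum_{s} x_{i} x_{i}^{-1} (π_{i}^{-1} - 1) y_{i} = \sum_{s} (π_{i}^{-1} - 1) y_{i},$

and the ratio of the second expression to the first is the right-hand side of (5.1).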

When the question arises as to how many explanatory variables should be used in the relevant model, Särndal (2011) makes an apparently disparaging distinction between "explanatory rich" and "explanatory poor" countries. He certainly treats those "explanatory poor" countries as being at a substantial disadvantage as a result of having relatively few "explanators".

There is at least one "explanatory rich" country (Australia) that appears to have made a deliberate decision to ignore whatever advantages might be available to those that are "explanatory rich". The current Australian procedure (the one used primarily to produce seasonally adjusted series) is to use only a single auxiliary variable, namely the latest available Census total, as the single "explanator".

Earlier, Brewer (1999a) had also presented a case that it might be preferable to use a cosmetic regression estimator to compensate for any lack of balance, rather than go to the trouble of selecting balanced samples. However, those who prefer to use balanced sampling directly can now select randomly from among many balanced or nearly balanced samples using the "cube method" (Deville and Tillé 2004). That paper also contains several references to earlier methods of selecting balanced samples, but regardless of how the relevant balanced sample is arrived at, the ways in which it needs to be used are identical.

In Brewer and Gregoire (2009) all three of the relevant approaches to estimation (randomization alone, prediction alone, and the two together) are examined. At this point, it is convenient to quote from yet another paper of mine (Brewer 2005, pages 390-391) which sets out the reasons why I was, and still am, concerned to use both methods simultaneously, and how readily it can be done.

"Each approach has its merits, and there are advantages in using both together. Consider how each of these inferences works.

First, design-based inference. Consider the general case where the inclusion probabilities $π_{i}$ are known but may differ from unit to unit. In that case we can imagine the sampling statistician constructing a model of the population by looking at each of the sample units in turn and saying, 'Oh yes, you (the first unit) were included with one chance in 10, so my model of the population includes you and nine other non-sample units with the same $Y_{k}$ value as you. But you (the second unit) you were included with only one chance in two, so my model includes you and only one other unit like you.'"

The consequence of using this procedure here was therefore that the model of the population in the sampler's mind would consist of two real sample units (one from each sample stratum), plus ten imaginary units (nine from the stratum with a sample fraction of one in ten, plus one from the stratum with a sample fraction of one in two), and finally all the units from the completely enumerated stratum.
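The sampler's mental model described above can be written out literally. This is a sketch under stated assumptions: the $Y$ values are made up, the completely enumerated stratum is given just two units, and every $1/π_{i}$ is assumed to be a whole number so that each unit can be replicated an exact number of times.

```python
def imagined_population(sample_y, pi):
    """Replicate each observed value 1/pi times: the real unit itself
    plus (1/pi - 1) imaginary non-sample units with the same value."""
    population = []
    for y_i, p_i in zip(sample_y, pi):
        copies = round(1.0 / p_i)  # assumes 1/pi is a whole number
        population.extend([y_i] * copies)
    return population


# Hypothetical data: one unit sampled at 1 in 10, one at 1 in 2,
# and two units from a completely enumerated stratum (pi = 1).
sample_y = [7.0, 3.0, 4.0, 6.0]
pi = [0.1, 0.5, 1.0, 1.0]

model = imagined_population(sample_y, pi)
n_real = len(sample_y)
n_imaginary = len(model) - n_real
model_total = sum(model)
print(n_real, n_imaginary, model_total)
```

With these values the model contains the four real units and ten imaginary ones (nine plus one), which matches the count given in the text.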

Brewer (2005, page 391) continues as follows: "So even design-based estimation can be thought of as being based on a model, but on a model quite different from the prediction models… that are favoured by the so-called model-based school. More accurately that school should be described as prediction-based and the design-based school should be described as randomization-based. Each school uses a model, but one uses a prediction model and the other a randomization model."

The randomization-based approach described above is the one that was used for the selection of two sample units (one from each sampled stratum) plus all the units in the completely enumerated stratum. It also gave rise to the well-known Horvitz-Thompson estimator, which may be written

${\hat{T}}_{HT} = \sum_{i \in s} \frac{Y_{i}}{π_{i}} = \sum_{i = 1}^{N} δ_{i} \frac{Y_{i}}{π_{i}} \qquad (5.5)$

where $δ_{i}$ is an inclusion indicator taking the value "one" if the $i^{th}$ unit is either in the sample or in the completely enumerated sector, and the value "zero" otherwise. In this particular case it is defined over both the two sampled units and also all the units in the completely enumerated sector. [This last sentence corrects the error mentioned above.]
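The two forms of (5.5) can be checked against each other numerically. The population below is hypothetical (ten units sampled at one in ten, two at one in two, and two completely enumerated), and the variable names are chosen here for illustration only.

```python
# Hypothetical population of N = 14 units in three strata.
Y = [7.0, 8.0, 6.0, 9.0, 7.0, 5.0, 8.0, 6.0, 7.0, 9.0,  # sampled at 1 in 10
     3.0, 2.0,                                          # sampled at 1 in 2
     4.0, 6.0]                                          # completely enumerated
pi = [0.1] * 10 + [0.5] * 2 + [1.0] * 2

# Inclusion indicator: 1 for the two sampled units and for every unit
# in the completely enumerated sector, 0 otherwise.
delta = [1] + [0] * 9 + [1, 0] + [1, 1]

# First form of (5.5): sum over the included units only.
included = [i for i, d_i in enumerate(delta) if d_i == 1]
t_ht_sample = sum(Y[i] / pi[i] for i in included)

# Second form of (5.5): sum over all N units, weighted by delta_i.
t_ht_population = sum(d_i * y_i / p_i for d_i, y_i, p_i in zip(delta, Y, pi))

print(t_ht_sample, t_ht_population)
```

Both sums pick up exactly the same non-zero terms, so they agree. The unobserved $Y$ values enter the second form only through terms multiplied by zero.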

Statisticians of the prediction-based school ridicule the use of randomization-based inference because the inclusion probabilities are chosen arbitrarily by the sample designer, and are therefore unable (they say) to tell us anything meaningful about the population! They prefer instead to use the Best Linear Unbiased Estimator (BLUE) of the regression parameter $β$ as a step towards arriving at the Best Linear Unbiased Predictor (BLUP) of $T .$ It is a predictor, because $T$ is a random variable under the model, not a parameter.
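For comparison with the Horvitz-Thompson estimator, here is a minimal sketch of a BLUP in one simple special case. The working model is an assumption made here, since equation (4.1) is not reproduced in this section: $y_{i} = β x_{i} + ε_{i}$ with uncorrelated errors whose variances are proportional to $x_{i}$. Under that model the BLUE of $β$ is the ratio of the sample sums, and the BLUP of $T$ is the familiar ratio estimator. The function name and data are hypothetical.

```python
def blup_total_ratio_model(y_s, x_s, total_x):
    """BLUP of the population total of y under the assumed working model
    y_i = beta * x_i + e_i, with Var(e_i) proportional to x_i."""
    # BLUE of beta under this variance structure: ratio of sample sums.
    beta_blue = sum(y_s) / sum(x_s)

    # Observed part of the total plus the predicted non-sample part.
    x_non_sample = total_x - sum(x_s)
    t_blup = sum(y_s) + beta_blue * x_non_sample
    return t_blup, beta_blue


# Hypothetical sample and known population total of x.
y_s = [12.0, 18.0, 45.0]
x_s = [5.0, 10.0, 20.0]

t_blup, beta_blue = blup_total_ratio_model(y_s, x_s, total_x=200.0)
print(beta_blue, t_blup)
```

Note that the inclusion probabilities do not appear anywhere in this calculation, which is exactly the contrast with randomization-based inference that the prediction-based school has in mind.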

Which is then the better estimator of $T,$ the HT or the BLUP? The BLUP is the better if the prediction model holds exactly, and is much the better if both the sample and the population are small. However, unless the model holds exactly, there will always be some sample size beyond which the HT is the more efficient estimator.

Date modified: 2017-09-20
