4 The Third Controversy. "Sampling Inference: Model-assisted or Model-based?"

Ken Brewer


It came as a considerable shock to the finite population sampling establishment when Royall (1970) issued his highly readable call to arms for the reinstatement of purposive sampling and prediction-based inference. To read this paper was to read Neyman (1934) being stood on its head. The identical issues were being considered but the opposite conclusions were being drawn.

By 1973, however, Royall had withdrawn the most extreme of his recommendations. This was that the best sample to select would be the one that was optimal under the model represented by the following equations:

$$Y_i = \beta X_i + U_i \qquad (4.1)$$

$$E(U_i) = 0 \qquad (4.2)$$

$$E(U_i^2) = \sigma^2 X_i \qquad (4.3)$$

and

$$E(U_i U_j) = 0. \qquad (4.4)$$

Such a sample would typically have consisted of the $n$ largest units in the population as measured by their realized $x_i$ values, asking for trouble if the parameter $\beta$ had not been close to constant over the entire range of the sizes of the population units.

In later articles (Royall and Herson 1973a, Royall and Herson 1973b, Cumberland and Royall 1981), Royall suggested that the chosen sample be "balanced," in other words, that the moments of the sample $x_i$ should be as close as possible to the corresponding moments of the whole population. This formalized the much earlier notion that samples should be chosen purposively to resemble the population in miniature. The samples of Gini and Galvani had been chosen in something of the same way (meaning here "something of the same way in intention", but certainly not with anything like the same success in execution).
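One way to make the balance criterion concrete is to compare the low-order moments of the sample $x_i$ with those of the whole population. The following is a minimal Python sketch, not Royall's own procedure: the function names `moment` and `balance_gaps`, the toy population, and the use of relative gaps as the measure of imbalance are all assumptions made for illustration.

```python
def moment(values, k):
    """k-th raw moment: the average of v**k over the given values."""
    return sum(v ** k for v in values) / len(values)


def balance_gaps(x_pop, sample_idx, max_order=2):
    """Relative gap between sample and population moments of x.

    Returns one number per moment order 1..max_order. A value of 0 means
    the sample matches the population on that moment. Assumes the
    population moments are non-zero (true for positive size measures).
    """
    x_s = [x_pop[i] for i in sample_idx]
    return [moment(x_s, k) / moment(x_pop, k) - 1.0
            for k in range(1, max_order + 1)]


# Hypothetical population of size measures (illustrative values only).
x_pop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

spread_sample = [1, 4, 7]    # indices of units spread across the size range
largest_sample = [6, 7, 8]   # indices of the three largest units

print("spread sample gaps: ", balance_gaps(x_pop, spread_sample))
print("largest-units gaps: ", balance_gaps(x_pop, largest_sample))
```

In this toy population the spread sample matches the population mean of $x$ exactly, whereas the sample of the three largest units overshoots it. A sample can be balanced on the first moment and still miss higher moments, which is why the criterion is stated in terms of moments in the plural.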

For the most part, Royall's original stand remained unshaken. The business of a sampling statistician was to make a realistic model of the relevant population, design a sample to estimate its parameters, and make all inferences regarding that population in terms of those parameter estimates. The randomization-based concept of defining the variance of an estimator in terms of the variability of its estimates over all possible samples was to be discarded in favour of the prediction-based variance, which was sample-specific, and based on averaging all possible realizations of the chosen prediction model.

Regardless of what sample was drawn, Royall's estimator for a population total $T_y = \sum_U y_i$ had this prediction form:

$$t_y = \sum_s y_i + \sum_{U-s} x_i \, \hat{\beta}_{\text{BLUE}},$$

where $\hat{\beta}_{\text{BLUE}} = \sum_s y_i \big/ \sum_s x_i$ was the best linear unbiased estimator for $\beta$ based on the sample under the model in equation (4.1). This is in prediction form since the $y$-values of the units in $U - s$ are predicted by the model.
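The prediction form can be sketched in a few lines of Python. The block below is illustrative only: the function names and the toy data are assumptions, and the variance function relies on the standard consequence of equations (4.1) to (4.4), with (4.4) read as applying to distinct units $i \neq j$, that the model variance of $t_y - T_y$ equals $\sigma^2 \sum_{U-s} x_i \sum_U x_i \big/ \sum_s x_i$.

```python
def ratio_prediction_total(y_sample, x_sample, x_nonsample):
    """Prediction estimator: observed sample total plus predicted remainder."""
    beta_hat = sum(y_sample) / sum(x_sample)       # ratio-form estimate of beta
    return sum(y_sample) + beta_hat * sum(x_nonsample)


def prediction_variance(x_sample, x_nonsample, sigma2):
    """Model variance of t_y - T_y for the given sample.

    Depends only on the x-values in and out of the chosen sample,
    not on the set of all samples that might have been drawn.
    """
    x_s, x_r = sum(x_sample), sum(x_nonsample)
    return sigma2 * x_r * (x_s + x_r) / x_s


# Hypothetical population (illustrative values only).
x_pop = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0]
y_pop = [2.1, 3.8, 6.3, 8.1, 19.5, 40.6]


def split(values, sample_idx):
    """Split a list into sampled and non-sampled parts by index."""
    chosen = set(sample_idx)
    sampled = [values[i] for i in sample_idx]
    rest = [v for i, v in enumerate(values) if i not in chosen]
    return sampled, rest


for label, idx in [("two largest units", [4, 5]), ("two other units", [1, 3])]:
    x_s, x_r = split(x_pop, idx)
    y_s, _ = split(y_pop, idx)
    t_y = ratio_prediction_total(y_s, x_s, x_r)
    v = prediction_variance(x_s, x_r, sigma2=1.0)
    print(label, "estimate:", t_y, "prediction variance:", v)
```

Because the variance expression decreases as $\sum_s x_i$ grows, the sample made up of the largest units has the smallest prediction variance whenever the model holds, which is the sense in which such a sample was optimal.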

Sampling statisticians had at no stage been slow to take sides in this debate. Now the battle-lines were drawn. The heat of the argument appears to have been exacerbated by language-blocks; for instance, the words "expectation" and "variance" carried one set of connotations for randomization-based inference and quite a different set for prediction-based inference. So assertions made on one side appeared to those on the other side to be unintelligible nonsense.

A major establishment counter-attack was launched with an article by Hansen, Madow and Tepping (1983). A small (and by most standards undetectable) divergence from Royall's model was shown nevertheless to be capable of distorting the sample inferences substantially. The obvious counter would have been "But this distortion would not have occurred if the sample had been drawn in a balanced fashion."
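The interplay between model failure and balance can be illustrated with a small deterministic calculation. This is a sketch under stated assumptions, not a reproduction of the Hansen, Madow and Tepping study: the divergence is taken to be a small intercept `alpha` added to the model in equation (4.1), the noise term is dropped so that only the bias is visible, and the population and function name are hypothetical.

```python
def prediction_error(x_pop, sample_idx, alpha, beta):
    """Error t_y - T_y of the ratio prediction estimator when the truth is
    y_i = alpha + beta * x_i (no noise) but the estimator assumes alpha = 0.
    """
    chosen = set(sample_idx)
    y_pop = [alpha + beta * x for x in x_pop]          # assumed "true" values
    y_s = [y_pop[i] for i in sample_idx]
    x_s = [x_pop[i] for i in sample_idx]
    x_r = [x for i, x in enumerate(x_pop) if i not in chosen]
    beta_hat = sum(y_s) / sum(x_s)                     # ratio-form estimate
    t_y = sum(y_s) + beta_hat * sum(x_r)               # prediction estimator
    return t_y - sum(y_pop)


# Hypothetical population of size measures (illustrative values only).
x_pop = [float(v) for v in range(1, 10)]

largest_units = [6, 7, 8]    # the three largest x-values
balanced_units = [1, 4, 7]   # sample mean of x equals the population mean

print("largest units :", prediction_error(x_pop, largest_units, alpha=0.5, beta=2.0))
print("balanced units:", prediction_error(x_pop, balanced_units, alpha=0.5, beta=2.0))
```

Under these assumptions the error works out algebraically to $\alpha N (\bar{x}_U / \bar{x}_s - 1)$, where $\bar{x}_U$ and $\bar{x}_s$ are the population and sample means of $x$. It therefore vanishes when the sample mean of $x$ matches the population mean, and is non-zero for the sample of largest units.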

