5 A third alternative, "Use them both together"

Ken Brewer

Eventually, a third position was also offered, the one held by the present author: since there were merits in both the design-based (or randomization-based) and the model-based (or prediction-based) approaches, and since it was possible to combine them, the two should be used together. I had actually foreshadowed this possibility in Brewer (1963), a paper that provoked little interest at the time but was later spotted and accorded recognition by J.N.K. Rao, at least to the extent that he invited me to visit him in Ottawa for six weeks in 1974.

Combining the two approaches was relatively simple. In each of them there was a variable $y$ of central interest and a related or auxiliary variable $x$, about which something additional was known that could help in estimating the value of the $y$ variable. That "something additional" was typically the known population total of all the $x$ values, denoted by $T_x$. Consequently, the relationship of central interest was the one that linked the crucial parameter $\beta$ in equation (4.1) to its cosmetic estimator $\hat{\beta}_{\mathrm{COS}}$, namely

$$\hat{\beta}_{\mathrm{COS}} = \frac{\sum_s \left(\pi_i^{-1} - 1\right) y_i}{\sum_s \left(\pi_i^{-1} - 1\right) x_i}, \qquad (5.1)$$

where $\pi_i$ is the probability that unit $i$ is selected in the sample, or, in the notation used by Särndal (2011),

$$\hat{\beta}_{\mathrm{COS}} = \frac{\sum_s \left(d_k - 1\right) y_k}{\sum_s \left(d_k - 1\right) x_k}, \qquad (5.2)$$

where his $d_k$ is identical to my $\pi_i^{-1}$. The resulting estimator of the total $Y = \sum_U y_k$ is

$$\hat{Y}_{\mathrm{COS}} = \sum_s d_k y_k + \left(\sum_U x_k - \sum_s d_k x_k\right) \frac{\sum_s \left(d_k - 1\right) y_k}{\sum_s \left(d_k - 1\right) x_k}. \qquad (5.3)$$
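A minimal Python sketch of (5.1) and (5.3) may help fix ideas. The function names and toy data are hypothetical, chosen only for illustration. It takes $d_k = \pi_k^{-1}$ as in (5.2), and `t_x` stands for the known population total $T_x = \sum_U x_k$.

```python
def beta_cos(y, x, pi):
    """Cosmetic slope estimator (5.1)/(5.2), with weights d_k - 1 = 1/pi_k - 1."""
    num = sum((1.0 / p - 1.0) * yk for yk, p in zip(y, pi))
    den = sum((1.0 / p - 1.0) * xk for xk, p in zip(x, pi))
    # Units selected with certainty (pi_k = 1) get zero weight here, so at
    # least one sampled unit must have pi_k < 1, otherwise den is zero.
    return num / den


def total_cos(y, x, pi, t_x):
    """Cosmetic estimator (5.3) of the total Y, given t_x = sum over U of x_k."""
    d = [1.0 / p for p in pi]
    weighted_y = sum(dk * yk for dk, yk in zip(d, y))  # sum_s d_k y_k
    weighted_x = sum(dk * xk for dk, xk in zip(d, x))  # sum_s d_k x_k
    return weighted_y + (t_x - weighted_x) * beta_cos(y, x, pi)


# Hypothetical sample: one unit drawn with pi = 1/10, one with pi = 1/2,
# and one completely enumerated unit (pi = 1).
y = [12.0, 30.0, 55.0]
x = [10.0, 25.0, 50.0]
pi = [0.1, 0.5, 1.0]
print(beta_cos(y, x, pi))          # 1.2  (= 138/115)
print(total_cos(y, x, pi, 210.0))  # 247.0  (= 235 + (210 - 200) * 1.2)
```

The completely enumerated unit contributes to the weighted totals in (5.3), but it has no influence on the slope. It represents no unsampled units, so its weight $d_k - 1$ is zero.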

Särndal (2011) also shows that these $x$ and $y$ values can be related to each other in several different ways, and that a common theme runs through all of them: $y$ increases linearly as $x$ increases, and the slope of that linear relationship is the parameter $\beta$ in equation (4.1).
Importantly, however, when $\hat{\beta}_{\mathrm{COS}}$ replaces $\hat{\beta}_{\mathrm{BLUE}}$ in Royall's prediction estimator, the estimator can be shown to be nearly unbiased under the design regardless of the validity of the assumed model.
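A short derivation shows how the substitution connects to (5.3). For illustration, assume that Royall's prediction estimator for a single auxiliary variable takes the ratio-model form $\sum_s y_k + \hat{\beta} \sum_{U \setminus s} x_k$, that is, the observed sample total plus a predicted non-sample total. Substituting $\hat{\beta}_{\mathrm{COS}}$ from (5.2) for $\hat{\beta}$ then reproduces (5.3) exactly. The key step is that the definition of $\hat{\beta}_{\mathrm{COS}}$ gives $\sum_s (d_k - 1) y_k = \hat{\beta}_{\mathrm{COS}} \sum_s (d_k - 1) x_k$:

$$
\begin{aligned}
\hat{Y}_{\mathrm{COS}}
  &= \sum_s d_k y_k + \Big(\sum_U x_k - \sum_s d_k x_k\Big)\hat{\beta}_{\mathrm{COS}} \\
  &= \sum_s y_k + \sum_s (d_k - 1)\, y_k + \Big(\sum_U x_k - \sum_s d_k x_k\Big)\hat{\beta}_{\mathrm{COS}} \\
  &= \sum_s y_k + \hat{\beta}_{\mathrm{COS}} \Big(\sum_s (d_k - 1)\, x_k + \sum_U x_k - \sum_s d_k x_k\Big) \\
  &= \sum_s y_k + \hat{\beta}_{\mathrm{COS}} \Big(\sum_U x_k - \sum_s x_k\Big)
   = \sum_s y_k + \hat{\beta}_{\mathrm{COS}} \sum_{U \setminus s} x_k .
\end{aligned}
$$

Read one way, (5.3) is a design-weighted estimator with a regression adjustment. Read the other way, it is a prediction estimator with a particular choice of slope.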

Equation (5.2) can also be found explicitly on page 569 of Brewer (2011), immediately following its more general formula in matrix notation, namely

$$\hat{\beta}_{\mathrm{COS}} = \left[X_s' Z_s^{-1} \left(\Pi_s^{-1} - I_n\right) X_s\right]^{-1} X_s' Z_s^{-1} \left(\Pi_s^{-1} - I_n\right) y_s. \qquad (5.4)$$
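The matrix form (5.4) can be sketched with NumPy. This section does not define $\Pi_s$ or $Z_s$, so the sketch makes two assumptions: $\Pi_s = \mathrm{diag}(\pi_k)$, and $Z_s$ is a diagonal matrix of model variance factors $z_k$. The function name is hypothetical. Under those assumptions, take a single auxiliary variable and set $Z_s = \mathrm{diag}(x_k)$. Then $X_s' Z_s^{-1}$ is a row of ones, and (5.4) collapses to the scalar ratio (5.2).

```python
import numpy as np


def beta_cos_matrix(X_s, y_s, pi_s, z_s):
    """Matrix form (5.4) of the cosmetic estimator.

    Assumes Pi_s = diag(pi_s) and Z_s = diag(z_s), so both inverses are
    elementwise. X_s: (n, p) sample auxiliary values; y_s: (n,) survey values.
    """
    X_s = np.asarray(X_s, dtype=float)
    y_s = np.asarray(y_s, dtype=float)
    # Diagonal of Z_s^{-1} (Pi_s^{-1} - I_n)
    w = (1.0 / np.asarray(pi_s, dtype=float) - 1.0) / np.asarray(z_s, dtype=float)
    A = X_s.T * w  # X_s' Z_s^{-1} (Pi_s^{-1} - I_n), shape (p, n)
    # Solve [A X_s] beta = A y_s rather than forming the inverse explicitly.
    return np.linalg.solve(A @ X_s, A @ y_s)


# Hypothetical toy sample; one auxiliary variable with Z_s = diag(x_k).
x = np.array([10.0, 25.0, 50.0])
y = np.array([12.0, 30.0, 55.0])
pi = np.array([0.1, 0.5, 1.0])
beta = beta_cos_matrix(x.reshape(-1, 1), y, pi, z_s=x)
print(beta)  # should agree with (5.2): 138/115 = 1.2
```

If every sampled unit is selected with certainty, `A @ X_s` is singular and `np.linalg.solve` raises `LinAlgError`. This mirrors the zero denominator in (5.2).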

When the question arises as to how many explanatory variables should be used in the relevant model, Särndal (2011) makes an apparently disparaging distinction between "explanatory rich" and "explanatory poor" countries. He certainly treats the "explanatory poor" countries as being at a substantial disadvantage because they have relatively few "explanators".

There is at least one "explanatory rich" country (Australia) that appears to have made a deliberate decision to ignore whatever advantages its "explanatory rich" status might offer. The current Australian procedure (the one used primarily to produce seasonally adjusted series) is to use only one auxiliary variable, namely the latest available Census total, as the sole "explanator".

Earlier, Brewer (1999a) had also presented a case that it might be preferable to use a cosmetic regression estimator to compensate for any lack of balance, rather than go to the trouble of selecting balanced samples. However, those who prefer to use balanced sampling directly can now select randomly from among many balanced or nearly balanced samples using the "cube method" (Deville and Tillé 2004). That paper also contains several references to earlier methods of selecting balanced samples, but regardless of how the relevant balanced sample is arrived at, the ways in which it needs to be used are identical.

In Brewer and Gregoire (2009) all three of the relevant approaches to estimation (randomization alone, prediction alone, and the two together) are examined. At this point, it is convenient to quote from yet another paper of mine (Brewer 2005, pages 390-391) which sets out the reasons why I was, and still am, concerned to use both methods simultaneously, and how readily it can be done.

"Each approach has its merits, and there are advantages in using both together. Consider how each of these inferences works.

First, design-based inference. Consider the general case where the inclusion probabilities $\pi_i$ are known but may differ from unit to unit. In that case we can imagine the sampling statistician constructing a model of the population by looking at each of the sample units in turn and saying, 'Oh yes, you (the first unit) were included with one chance in 10, so my model of the population includes you and nine other non-sample units with the same $Y_k$ value as you. But you (the second unit) were included with only one chance in two, so my model includes you and only one other unit like you.'"

The consequence of using this procedure here was therefore that the model of the population in the sampler's mind would consist of two real sample units (one from each sampled stratum) plus ten imaginary units (nine from the stratum with a sampling fraction of one in ten, plus one from the stratum with a sampling fraction of one in two), and finally all the units from the completely enumerated stratum.
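The population imagined in the paragraph above can be sketched in Python. The function name, the $y$ values, and the choice of three completely enumerated units are hypothetical. The sketch assumes that each $\pi_i^{-1}$ is a whole number, as it is in the example.

```python
def imaginary_population(units):
    """Design-based 'model' of the population described in the text.

    units: list of (y, d) pairs, where d = 1/pi is assumed to be a whole
    number. Each real unit is kept once and joined by d - 1 imaginary copies
    carrying the same y value; completely enumerated units have d = 1.
    """
    population = []
    for y, d in units:
        population.append(("real", y))
        population.extend(("imaginary", y) for _ in range(d - 1))
    return population


# Hypothetical values: one sample unit from the 1-in-10 stratum, one from the
# 1-in-2 stratum, and three units from the completely enumerated stratum.
units = [(12.0, 10), (30.0, 2), (55.0, 1), (40.0, 1), (70.0, 1)]
pop = imaginary_population(units)
n_imaginary = sum(1 for kind, _ in pop if kind == "imaginary")
print(len(pop), n_imaginary)   # 15 10
print(sum(y for _, y in pop))  # 345.0
```

Each observed value is counted $\pi_i^{-1}$ times, so the total over this imagined population coincides with the Horvitz-Thompson estimate (5.5) for the same sample.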

Brewer (2005, page 391) continues as follows: "So even design-based estimation can be thought of as being based on a model, but on a model quite different from the prediction models… that are favoured by the so-called model-based school. More accurately that school should be described as prediction-based and the design-based school should be described as randomization-based. Each school uses a model, but one uses a prediction model and the other a randomization model."

The randomization-based approach described above is the one that was used for the selection of two sample units (one from each sampled stratum) plus all the units in the completely enumerated stratum. It also gave rise to the well-known Horvitz-Thompson estimator, which may be written

$$\hat{T}_{\mathrm{HT}} = \sum_{i \in s} \frac{Y_i}{\pi_i} = \sum_{i=1}^{N} \delta_i \frac{Y_i}{\pi_i} \qquad (5.5)$$

where $\delta_i$ is an inclusion indicator taking the value "one" if the $i$th unit is either in the sample or in the completely enumerated sector, and the value "zero" otherwise. In this particular case it is defined over both the two sampled units and all the units in the completely enumerated sector. [This last sentence corrects the error mentioned above.]
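The two forms of (5.5) can be checked against each other in a short Python sketch. The function names and the population below (ten units with $\pi_i = 1/10$, two with $\pi_i = 1/2$, three completely enumerated) are hypothetical. Unobserved $Y_i$ are represented by `None` and are never used, because $\delta_i = 0$ for exactly those units.

```python
def ht_sample_form(y_s, pi_s):
    """First form of (5.5): sum over included units of Y_i / pi_i."""
    return sum(y / p for y, p in zip(y_s, pi_s))


def ht_population_form(y_u, pi_u, delta):
    """Second form of (5.5): sum over all N units of delta_i * Y_i / pi_i.

    Units with delta_i = 0 are skipped, so their (unknown) Y_i never enter.
    """
    return sum(y / p for y, p, d in zip(y_u, pi_u, delta) if d == 1)


# Hypothetical population of N = 15 units.
pi_u = [0.1] * 10 + [0.5] * 2 + [1.0] * 3
delta = [1] + [0] * 9 + [1, 0] + [1, 1, 1]
y_u = [12.0] + [None] * 9 + [30.0, None] + [55.0, 40.0, 70.0]

a = ht_sample_form([12.0, 30.0, 55.0, 40.0, 70.0], [0.1, 0.5, 1.0, 1.0, 1.0])
b = ht_population_form(y_u, pi_u, delta)
print(a, b)  # both forms give the same value, 345 up to rounding
```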

Statisticians of the prediction-based school ridicule the use of randomization-based inference because the inclusion probabilities are chosen arbitrarily by the sample designer, and are therefore unable (they say) to tell us anything meaningful about the population! They prefer instead to use the Best Linear Unbiased Estimator (BLUE) of the regression parameter $\beta$ as a step towards arriving at the Best Linear Unbiased Predictor (BLUP) of $T$. It is a predictor because $T$ is a random variable under the model, not a parameter.
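The BLUE-to-BLUP step can be illustrated under the simplest prediction model. That model is assumed here and is not taken from the paper's equation (4.1): $y_k = \beta x_k + \varepsilon_k$ with $\mathrm{Var}(\varepsilon_k)$ proportional to $x_k$. The BLUE of $\beta$ is then the ratio $\sum_s y_k / \sum_s x_k$. The BLUP of $T$ adds $\hat{\beta}_{\mathrm{BLUE}}$ times the non-sample $x$ total to the observed sample total. The function names are hypothetical. The inclusion probabilities do not enter at all, which is exactly the prediction-based school's position.

```python
def beta_blue_ratio(y_s, x_s):
    """BLUE of beta under y_k = beta * x_k + e_k, Var(e_k) proportional to x_k (assumed model)."""
    return sum(y_s) / sum(x_s)


def blup_total_ratio(y_s, x_s, t_x):
    """BLUP of T: observed sample total plus predicted total of the non-sample units."""
    beta_hat = beta_blue_ratio(y_s, x_s)
    non_sample_x = t_x - sum(x_s)  # sum of x_k over U \ s, from the known total T_x
    return sum(y_s) + beta_hat * non_sample_x


# Hypothetical toy sample and known x total.
y_s = [12.0, 30.0, 55.0]
x_s = [10.0, 25.0, 50.0]
print(blup_total_ratio(y_s, x_s, 210.0))
```

Under this particular model the BLUP simplifies algebraically to the classical ratio estimator $(\sum_s y_k / \sum_s x_k)\, T_x$. With the toy values above that is $97/85 \times 210 \approx 239.65$.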

Which, then, is the better estimator of $T$: the HT or the BLUP? The BLUP is the better if the prediction model holds exactly, and is much the better if both the sample and the population are small. However, unless the model holds exactly, there will always be some sample size beyond which the HT is the more efficient estimator.
