3. Sampling

Piero Demetrio Falorsi and Paolo Righi


Let $\mathbf{z}_k$ be a vector of auxiliary variables available for all $k \in U$. A sampling design $p(s)$ is said to be balanced on the auxiliary variables if and only if it satisfies the following balancing equations

$$\sum_{k \in s} \frac{\mathbf{z}_k}{\pi_k} = \sum_{k \in U} \mathbf{z}_k \qquad (3.1)$$

for each sample $s$ such that $p(s) > 0$ (Deville and Tillé 2004). Depending on the auxiliary variables and the inclusion probabilities, equations (3.1) may be satisfied exactly or only approximately in each possible sample; therefore, a balanced sampling design does not always exist. By specifying
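As an illustration of definition (3.1), the sketch below computes, for one given sample, the difference between the Horvitz-Thompson estimates of the auxiliary totals and the true totals; the design is balanced when this difference is zero for every sample with $p(s) > 0$. It is a minimal sketch in Python with NumPy; the function names `balancing_gap` and `is_balanced`, the tolerance, and the SRSWOR example data are hypothetical choices made only for illustration.

```python
import numpy as np


def balancing_gap(sample, pi, Z):
    """Left side minus right side of (3.1) for one sample s (illustrative helper).

    sample : integer indices of the units in s
    pi     : array (N,) of inclusion probabilities pi_k
    Z      : array (N, p) whose k-th row is the auxiliary vector z_k
    """
    pi = np.asarray(pi, dtype=float)
    Z = np.asarray(Z, dtype=float)
    sample = np.asarray(sample)
    # HT estimates of the auxiliary totals: sum over s of z_k / pi_k
    ht_totals = (Z[sample] / pi[sample][:, None]).sum(axis=0)
    return ht_totals - Z.sum(axis=0)


def is_balanced(sample, pi, Z, tol=1e-9):
    """True if (3.1) holds for this sample, up to floating-point rounding."""
    return bool(np.all(np.abs(balancing_gap(sample, pi, Z)) < tol))


# Hypothetical example: SRSWOR of n = 4 units from N = 10, so pi_k = 0.4.
rng = np.random.default_rng(0)
pi = np.full(10, 0.4)
x = np.arange(1.0, 11.0)                    # an illustrative auxiliary variable
s = rng.choice(10, size=4, replace=False)

print(is_balanced(s, pi, pi[:, None]))      # z_k = pi_k: balanced for any fixed-size sample
print(balancing_gap(s, pi, x[:, None]))     # z_k = x_k: usually not balanced
```

Taking $z_k = \pi_k$ gives $\sum_{k\in s} 1 = n = \sum_{k\in U}\pi_k$, so any fixed-size design is balanced on it, whereas balance on an arbitrary variable $x_k$ holds only for particular samples.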

$$\mathbf{z}_k = \pi_k \boldsymbol{\delta}_k, \qquad (3.2)$$

equations (3.1) become

$$\sum_{k \in s} \boldsymbol{\delta}_k = \sum_{k \in U} \pi_k \boldsymbol{\delta}_k. \qquad (3.3)$$

In this case, the balancing equations state that the sample size achieved in each subpopulation $U_h$ is equal to its expected size. In different contexts, Ernst (1989) and Deville and Tillé (2004, page 905, Section 7.3) have proved that a balanced sampling design always exists if (i) specification (3.2) holds and (ii) the vector of expected sample sizes, given by $\mathbf{n} = \sum_{k \in U} \pi_k \boldsymbol{\delta}_k$, contains only integers. Specification (3.2) defines the sampling designs that guarantee equation (2.4), on which we focus.
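To make condition (ii) and the reading of (3.3) concrete, the sketch below computes the expected sizes $\mathbf{n} = \sum_{k\in U}\pi_k\boldsymbol{\delta}_k$, checks whether they are all integers, and counts the achieved sizes $\sum_{k\in s}\boldsymbol{\delta}_k$ for a given sample. The population, the construction of $\boldsymbol{\delta}_k$ from two hypothetical stratifying variables (an incomplete stratification in which each $U_h$ is one category of one variable), and the helper names are all assumptions for illustration.

```python
import numpy as np


def expected_sizes(pi, delta):
    """n = sum_{k in U} pi_k * delta_k: expected sample size in each U_h."""
    return np.asarray(delta, dtype=float).T @ np.asarray(pi, dtype=float)


def achieved_sizes(sample, delta):
    """Left side of (3.3): sum_{k in s} delta_k, the achieved sizes."""
    return np.asarray(delta, dtype=float)[sample].sum(axis=0)


def all_integer(n, tol=1e-9):
    """Condition (ii): every component of n is an integer, up to rounding."""
    n = np.asarray(n, dtype=float)
    return bool(np.all(np.abs(n - np.round(n)) < tol))


# Hypothetical incomplete stratification: 8 units classified by two variables
# (A and B, 2 categories each); delta_k stacks the four category indicators.
A = np.array([0, 0, 0, 0, 1, 1, 1, 1])
B = np.array([0, 1, 0, 1, 0, 1, 0, 1])
delta = np.hstack([np.eye(2)[A], np.eye(2)[B]])

print(expected_sizes(np.full(8, 0.5), delta))               # [2. 2. 2. 2.]
print(all_integer(expected_sizes(np.full(8, 0.3), delta)))  # False
print(achieved_sizes([0, 3, 5, 6], delta))                  # [2. 2. 2. 2.]
```

With $\pi_k = 0.5$ every margin has an integer expected size, so by the result quoted above a balanced design exists; with $\pi_k = 0.3$ each margin expects 1.2 units and condition (ii) fails.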
Deville and Tillé (2004, pages 895 and 905), Deville and Tillé (2005, page 577) and Tillé (2006, page 168) have shown that several customary sampling designs may be considered special cases of balanced sampling, obtained by properly defining the vectors $\boldsymbol{\pi}$ and $\boldsymbol{\delta}_k$ of equation (3.2). These issues are illustrated in Remark 4.2 and in Section 6. Balanced samples may be drawn by means of the Cube method (Deville and Tillé 2004). This greatly facilitates the sample selection for incomplete stratified sampling designs and overcomes the computational drawbacks of methods based on linear programming algorithms (Lu and Sitter 2002). The Cube method satisfies (3.1) exactly when (3.2) holds and $\mathbf{n}$ is a vector of integers. In the cases of SRSWOR and SSRSWOR, the standard sample selection methods can be used as well as the Cube method.

Deville and Tillé (2005) propose the following approximation of the variance of the HT estimator under balanced sampling:

$$E_p\left(\hat{t}_{(dr)} - t_{(dr)}\right)^2 \cong \left[N/(N-H)\right]\left[\sum_{k \in U}\left(1/\pi_k - 1\right)\eta_{(dr)k}^2\right] \qquad (3.4)$$

where $E_p$ denotes the sampling expectation and

$$\eta_{(dr)k} = y_{rk}\gamma_{dk} - \pi_k \boldsymbol{\delta}_k' \left[\mathbf{A}(\boldsymbol{\pi})\right]^{-1} \sum_{j \in U} \pi_j \left(1/\pi_j - 1\right) \boldsymbol{\delta}_j \, y_{rj}\gamma_{dj} \qquad (3.5)$$

with

$$\mathbf{A}(\boldsymbol{\pi}) = \sum_{j \in U} \boldsymbol{\delta}_j \boldsymbol{\delta}_j' \, \pi_j (1 - \pi_j). \qquad (3.6)$$
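Formulas (3.4) to (3.6) can be evaluated directly on a population frame. The sketch below does so with NumPy under stated assumptions: `y` holds the values $y_{rk}$, `gamma` the domain indicators $\gamma_{dk}$, and $H$ is taken to be the length of $\boldsymbol{\delta}_k$ (the number of balancing variables, as in the Deville and Tillé correction factor). The pseudo-inverse is our own choice, used in place of $[\mathbf{A}(\boldsymbol{\pi})]^{-1}$ so that the sketch still runs when the columns of $\boldsymbol{\delta}_k$ are linearly dependent; the function name is hypothetical.

```python
import numpy as np


def approx_balanced_variance(pi, delta, y, gamma):
    """Evaluate the variance approximation (3.4)-(3.6) (illustrative sketch).

    pi    : array (N,) of inclusion probabilities, assumed strictly positive
    delta : array (N, H); row k is delta_k (H assumed = number of columns)
    y     : array (N,) of values y_rk
    gamma : array (N,) of domain indicators gamma_dk
    """
    pi = np.asarray(pi, dtype=float)
    delta = np.asarray(delta, dtype=float)
    v = np.asarray(y, dtype=float) * np.asarray(gamma, dtype=float)  # y_rk * gamma_dk
    N, H = delta.shape
    # (3.6): A(pi) = sum_j delta_j delta_j' pi_j (1 - pi_j)
    A = (delta * (pi * (1.0 - pi))[:, None]).T @ delta
    # sum_j pi_j (1/pi_j - 1) delta_j y_rj gamma_dj
    b = delta.T @ (pi * (1.0 / pi - 1.0) * v)
    # pseudo-inverse instead of the inverse (assumption, see lead-in)
    beta = np.linalg.pinv(A) @ b
    # (3.5): residuals eta_(dr)k
    eta = v - pi * (delta @ beta)
    # (3.4)
    return N / (N - H) * np.sum((1.0 / pi - 1.0) * eta**2)


# SRSWOR as a special case: delta_k = 1, pi_k = n/N with N = 6, n = 2.
print(approx_balanced_variance(np.full(6, 1 / 3), np.ones((6, 1)),
                               [1, 2, 3, 4, 5, 6], np.ones(6)))  # approximately 42
```

For SRSWOR ($\boldsymbol{\delta}_k = 1$, $H = 1$, $\pi_k = n/N$) the residuals reduce to $y_k - \bar{y}$ and (3.4) becomes $N^2(1 - n/N)S_y^2/n$, the exact SRSWOR variance of the HT total; with $y = 1, \dots, 6$ and $n = 2$ this equals 42.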

Recently, the simulation results in Breidt and Chauvet (2011) have confirmed that equation (3.4) provides a good approximation of the sampling variance when the balancing equations are satisfied exactly. Variance estimation is studied in Deville and Tillé (2005).
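The Cube method cited in this section selects a sample by randomly moving the vector of inclusion probabilities inside the hypercube $[0,1]^N$ while keeping (3.1) satisfied, until every unit is either selected or rejected. The sketch below is a simplified, hypothetical Python/NumPy rendering of its flight phase only: the null-space direction is taken from an SVD, the tolerances are arbitrary, and the landing phase, which may be needed when the flight phase stops with undecided units, is omitted. It is not the reference implementation of Deville and Tillé (2004).

```python
import numpy as np


def cube_flight_phase(pi, Z, rng, eps=1e-9):
    """Simplified flight phase of the Cube method (sketch; no landing phase).

    pi  : array (N,) of inclusion probabilities
    Z   : array (N, p) whose k-th row is the auxiliary vector z_k
    rng : numpy.random.Generator driving the random moves
    Returns the final probability vector (0/1 for decided units) and the
    indices of units still undecided when the flight phase stops.
    """
    pi0 = np.asarray(pi, dtype=float)
    X = np.asarray(Z, dtype=float) / pi0[:, None]  # rows z_k / pi_k, kept fixed
    prob = pi0.copy()
    while True:
        free = np.flatnonzero((prob > eps) & (prob < 1 - eps))
        if free.size == 0:
            break
        B = X[free].T                     # balancing constraints on undecided units
        _, sv, vt = np.linalg.svd(B)
        rank = int(np.sum(sv > 1e-10 * sv.max()))
        if rank >= free.size:
            break                         # no move preserves (3.1): landing phase needed
        u = vt[-1].copy()                 # a direction in the null space of B
        u[np.abs(u) < 1e-12] = 0.0
        q = prob[free]
        pos, neg = u > 0, u < 0
        # largest steps along +u and -u that keep every probability in [0, 1]
        lam1 = min(np.min((1 - q[pos]) / u[pos], initial=np.inf),
                   np.min(q[neg] / -u[neg], initial=np.inf))
        lam2 = min(np.min(q[pos] / u[pos], initial=np.inf),
                   np.min((1 - q[neg]) / -u[neg], initial=np.inf))
        # martingale step: expected move is zero and B @ move = 0
        if rng.random() < lam2 / (lam1 + lam2):
            prob[free] = q + lam1 * u
        else:
            prob[free] = q - lam2 * u
    remaining = np.flatnonzero((prob > eps) & (prob < 1 - eps))
    return np.clip(prob, 0.0, 1.0), remaining


# Hypothetical stratified example under (3.2): two strata of 6 units, n_h = 2 and 3.
rng = np.random.default_rng(1)
strata = np.repeat([0, 1], 6)
delta = np.eye(2)[strata]
pi = np.where(strata == 0, 2 / 6, 3 / 6)
prob, rem = cube_flight_phase(pi, pi[:, None] * delta, rng)
print(rem.size, delta[prob > 0.5].sum(axis=0))
```

Each step fixes at least one unit, so the loop ends after at most $N$ moves. For disjoint strata with integer $n_h$ the flight phase alone decides every unit, since a single undecided unit in a stratum would make that stratum's total non-integer; in less structured cases a landing phase may still be required.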

