A note on the concept of invariance in two-phase sampling designs

Section 1. Introduction

Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. Such a design consists of first selecting a large sample from the population (typically using a rudimentary sampling design) in order to collect data on variables that are inexpensive to obtain and that are related to the characteristics of interest. The idea behind two-phase sampling is to create a pseudo-sampling frame that is richer in auxiliary information than the original sampling frame. Then, using the variables observed in the first phase, an efficient sampling procedure can be used to select a (typically small) subsample from the first-phase sample in order to collect the characteristics of interest. Two-phase sampling may also be helpful in the context of nonresponse, as the set of respondents is often viewed as a second-phase sample.

We adopt the following notation. Consider a population $U$ of size $N$. A vector $\mathbf{I}_1$ is generated according to the sampling design $F(\mathbf{I}_1)$, where $\mathbf{I}_1 = (I_{11}, \ldots, I_{1N})^\top$ denotes a vector of indicators such that $I_{1i}$ is equal to either 0 or 1. The first-phase sample, denoted by $s_1$, is the set of population units for which $I_{1i} = 1$, and $n_1 = \sum_{i \in U} I_{1i}$ is the size of $s_1$. Then, a vector $\mathbf{I}_2$ is generated according to the sampling design $F(\mathbf{I}_2 \mid \mathbf{I}_1)$, where $\mathbf{I}_2 = (I_{21}, \ldots, I_{2N})^\top$ denotes the vector of indicators such that $I_{2i}$ is equal to either 0 or 1. The second-phase sample, denoted by $s_2$, is the set of population units for which both $I_{1i} = 1$ and $I_{2i} = 1$, and $n_2 = \sum_{i \in U} I_{1i} I_{2i}$ is the size of $s_2$. In practice, the indicators $I_{2i}$ are not generated for the population units belonging to the set $U - s_1$. However, at least conceptually, nothing precludes defining these indicators for the units outside the first-phase sample.
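To make the notation concrete, the following is a minimal Python sketch of one draw from a two-phase design. The specific designs are assumptions chosen for illustration and are not taken from the paper. The first phase is simple random sampling without replacement (SRSWOR) of $n_1$ units. The second phase is Poisson sampling within $s_1$, with probabilities proportional to a positive auxiliary variable `x` observed at the first phase and capped at 1. The function name `two_phase_sample` and its parameters are hypothetical.

```python
import random


def two_phase_sample(N, n1, x, n2_target, seed=None):
    """Draw one two-phase sample from the population U = {0, ..., N-1}.

    Assumed designs (illustrative only):
      phase 1: SRSWOR of n1 units from U;
      phase 2: Poisson sampling within s1 with probabilities proportional
               to the positive auxiliary variable x, capped at 1.
    """
    rng = random.Random(seed)

    # Phase 1: build the indicator vector I1 = (I_11, ..., I_1N).
    I1 = [0] * N
    for i in rng.sample(range(N), n1):
        I1[i] = 1
    s1 = [i for i in range(N) if I1[i] == 1]

    # Phase 2 probabilities pi_2i(I1): they use the x-total over the realized
    # s1, so they change from one first-phase sample to another.
    x_total_s1 = sum(x[i] for i in s1)
    pi2 = {i: min(1.0, n2_target * x[i] / x_total_s1) for i in s1}

    # Phase 2: I_2i is only drawn for units in s1. Units in U - s1 get the
    # placeholder 0; its value never matters because I_1i = 0 for them.
    I2 = [0] * N
    for i in s1:
        I2[i] = 1 if rng.random() < pi2[i] else 0

    s2 = [i for i in range(N) if I1[i] == 1 and I2[i] == 1]
    n1_realized = sum(I1)                               # n_1 = sum_{i in U} I_1i
    n2_realized = sum(I1[i] * I2[i] for i in range(N))  # n_2 = sum_{i in U} I_1i I_2i
    return I1, I2, s1, s2, n1_realized, n2_realized
```

For example, `two_phase_sample(20, 8, [1 + i for i in range(20)], 3, seed=1)` yields a first-phase sample of exactly 8 units. It also yields a second-phase sample whose size is random, because Poisson sampling does not fix $n_2$. Its expected size is 3 when no probability has to be capped at 1.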

Let $\pi_{1i} = P(I_{1i} = 1)$ and $\pi_{1ij} = P(I_{1i} = 1, I_{1j} = 1)$ be the first-order and second-order selection probabilities at the first phase. Similarly, let $\pi_{2i}(\mathbf{I}_1) = P(I_{2i} = 1 \mid \mathbf{I}_1)$ and $\pi_{2ij}(\mathbf{I}_1) = P(I_{2i} = 1, I_{2j} = 1 \mid \mathbf{I}_1)$ be the first-order and second-order selection probabilities at the second phase. Note that the (first-order and second-order) selection probabilities at the second phase may depend on the realized sample $s_1$.
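These selection probabilities can be checked by brute force in small cases. The sketch below assumes SRSWOR at both phases, which the paper does not prescribe. Under SRSWOR of $n$ units from $N$, $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/\{N(N-1)\}$. Suppose the first phase has a random size $n_1$ (as with Bernoulli sampling) and the second phase selects $m$ units from $s_1$ by SRSWOR. Then $\pi_{2i}(\mathbf{I}_1) = m/n_1$ for $i \in s_1$, which shows how second-phase probabilities can depend on the realized $s_1$. The function names are hypothetical.

```python
import itertools
from fractions import Fraction


def srswor_pi(N, n):
    """Exact first- and second-order inclusion probabilities, SRSWOR of n from N (N >= 2)."""
    pi_i = Fraction(n, N)
    pi_ij = Fraction(n * (n - 1), N * (N - 1))
    return pi_i, pi_ij


def enumerate_pi(N, n, i, j):
    """Brute-force check: enumerate all equally likely SRSWOR samples of size n."""
    samples = list(itertools.combinations(range(N), n))
    k = len(samples)
    pi_i = Fraction(sum(i in s for s in samples), k)
    pi_ij = Fraction(sum(i in s and j in s for s in samples), k)
    return pi_i, pi_ij


def second_phase_pi(I1, m):
    """Assumed phase 2: SRSWOR of m units from the realized s1.

    Returns (pi_2i(I1), pi_2ij(I1)) for units i, j in s1. Both depend on I1
    through n1 = sum(I1), i.e., on the realized first-phase sample.
    """
    n1 = sum(I1)
    if n1 < 2 or m > n1:
        raise ValueError("requires n1 >= 2 and m <= n1")
    return srswor_pi(n1, m)
```

With `m = 2`, a first-phase sample of 3 units gives $\pi_{2i}(\mathbf{I}_1) = 2/3$, whereas a first-phase sample of 5 units gives $\pi_{2i}(\mathbf{I}_1) = 2/5$.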

The paper is organized as follows. In Section 2, we define the concepts of weak and strong invariance and provide some examples. In Section 3, we discuss the implications of weak and strong invariance from an inferential point of view. In particular, we discuss the reverse decomposition of the variance in the case of a strongly invariant two-phase sampling design.
