Comments on “Statistical inference with non-probability survey samples” – Miniaturizing data defect correlation: A versatile strategy for handling non-probability samples
Section 2. A finite-population deterministic identity for actual error

To demonstrate the fruitfulness of the finite-population framework, consider the estimation of the population mean, denoted by $\bar{G}$, of $\{G_i = G(X_i): i \in \mathcal{N}\}$, where $\mathcal{N} = \{1, \ldots, N\}$ indexes a finite population, and the $X_i$'s are data collected on individual $i$. For each $i$, let $R_i = 1$ if $G_i$ (or rather $X_i$) is recorded in our sample, and $R_i = 0$ otherwise; hence the sample size is $n_R = \sum_{i=1}^{N} R_i$. We stress that this is an all-encompassing indicator, which can (and should) be decomposed into $R_i = r_i^{(1)}, \ldots, r_i^{(J)}$, when the data collection consists of $J$ stages (e.g., $r_i^{(1)}$ indicates whether or not the $i^{\text{th}}$ individual is sampled, and $r_i^{(2)}$ whether the individual responded or not once sampled).

Let $\{W_i, i \in S\}$ be a set of weights to be determined, where the index set $S = \{i: R_i = 1\}$, such that $\sum_{i \in S} W_i > 0$. Let $\bar{G}_W$ be the weighted sample average, expressible in three ways:

$$\bar{G}_W = \frac{\sum_{i \in S} W_i G_i}{\sum_{i \in S} W_i} = \frac{\sum_{i=1}^{N} R_i W_i G_i}{\sum_{i=1}^{N} R_i W_i} = \frac{\mathrm{E}_I(\tilde{R}_I G_I)}{\mathrm{E}_I(\tilde{R}_I)}, \qquad (2.1)$$

where $\tilde{R}_I = R_I W_I$, and $\mathrm{E}_I$ is taken with respect to the uniform distribution over the index set $\mathcal{N}$. The first expression in (2.1) simply defines a weighted sample average. With the help of $R_i$, the second expression turns the sample averages into finite-population averages. This trivial re-expression is fundamental because it explicates the role of $R_i$ in influencing the behavior of $\bar{G}_W$ as an estimator of $\bar{G}$. The third expression reveals a divine probability through $I$, the finite-population index (FPI) variable, by utilizing the fact that averaging is the same as taking expectation over a uniformly distributed random index $I$. All finite-population moments then can be expressed via $\mathrm{E}_I$.

In particular, we can express the actual error of $\bar{G}_W$ via the following identity, where the first expression can be traced back to Hartley and Ross (1954), who used it to express biases in ratio estimators. The second expression was given in Meng (2018) in a slightly different (but equivalent) form:

$$\bar{G}_W - \bar{G} = \frac{\mathrm{Cov}_I(\tilde{R}_I, G_I)}{\mathrm{E}_I[\tilde{R}_I]} = \rho_{\tilde{R}, G} \times \sqrt{\frac{N - n_W}{n_W}} \times \sigma_G. \qquad (2.2)$$

Here $\rho_{\tilde{R}, G} = \mathrm{Corr}_I(\tilde{R}_I, G_I)$ is the finite-population correlation between $\tilde{R}_I$ and $G_I$, $\sigma_G^2$ is the finite-population variance of $G_I$, and $n_W$ is the effective sample size due to using weights (Kish, 1965),

$$n_W = \frac{n_R}{1 + \mathrm{CV}_W^2}, \qquad (2.3)$$

with $\mathrm{CV}_W$ being the coefficient of variation (i.e., standard deviation/mean) of $\{W_i, i \in S\}$.
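For readers who want to see why (2.2) and (2.3) fit together, the following is a short sketch of the algebra rather than the authors' own derivation. It introduces three symbols that do not appear in the text and are assumptions made for illustration: $\sigma_{\tilde{R}}$ for the finite-population standard deviation of $\tilde{R}_I$, and $\bar{W}$ and $s_W^2$ for the mean and variance of the weights over $S$, with the variance taken with divisor $n_R$ so that $\mathrm{CV}_W^2 = s_W^2 / \bar{W}^2$. The second display uses $R_i^2 = R_i$.

```latex
\begin{align*}
\bar{G}_W - \bar{G}
  &= \frac{\mathrm{E}_I(\tilde{R}_I G_I)}{\mathrm{E}_I(\tilde{R}_I)} - \mathrm{E}_I(G_I)
   = \frac{\mathrm{E}_I(\tilde{R}_I G_I) - \mathrm{E}_I(\tilde{R}_I)\,\mathrm{E}_I(G_I)}
          {\mathrm{E}_I(\tilde{R}_I)}
   = \frac{\mathrm{Cov}_I(\tilde{R}_I, G_I)}{\mathrm{E}_I(\tilde{R}_I)} \\
  &= \rho_{\tilde{R},G} \times \frac{\sigma_{\tilde{R}}}{\mathrm{E}_I(\tilde{R}_I)} \times \sigma_G, \\[6pt]
\frac{\sigma_{\tilde{R}}^2}{[\mathrm{E}_I(\tilde{R}_I)]^2}
  &= \frac{\mathrm{E}_I(\tilde{R}_I^2)}{[\mathrm{E}_I(\tilde{R}_I)]^2} - 1
   = \frac{N^{-1}\, n_R\, \bar{W}^2\, (1 + \mathrm{CV}_W^2)}{(N^{-1}\, n_R\, \bar{W})^2} - 1
   = \frac{N}{n_W} - 1
   = \frac{N - n_W}{n_W}.
\end{align*}
```

Because $\mathrm{E}_I(\tilde{R}_I) > 0$ by the requirement $\sum_{i \in S} W_i > 0$, taking the positive square root of the last line gives the middle factor in (2.2).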

The expression (2.2) is an algebraic identity because it holds for any instances of $\{(G_i, R_i W_i), i \in \mathcal{N}\}$. Hence no model assumptions are imposed, not even the assumption that $R$ (or any quantity) is random, echoing the comment by Mary Thompson, as quoted in Wu (2022), that “the sample inclusion indicator $R$ is a random variable is itself an assumption”. The only requirement is that the recorded $G_i$ is unchanged from the $G_i$'s in the target population. (But note this requirement has two components: (1) there is no over-coverage, that is, everyone in the sample belongs to the target population, e.g., no non-eligible voters are surveyed when the target population is eligible voters, and (2) there is no measurement error; extensions to the cases with measurement errors are available, but not pursued in this article.) When we use equal weights, the three factors on the right-hand side of (2.2) reflect respectively (from left to right) data defect, data sparsity, and problem difficulty, as detailed in Meng (2018) and further illustrated in Bradley, Kuriwaki, Isakov, Sejdinovic, Meng and Flaxman (2021) in the context of COVID-19 vaccination surveys.
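Since (2.2) is claimed to hold for any instance of the population values, it can be checked numerically on arbitrary numbers. The sketch below is illustrative only: the function name `identity_sides`, the synthetic population, and the recording rule are all assumptions made for this example, and the coefficient of variation is computed with divisor $n_R$, which is the reading of (2.3) under which the identity is exact.

```python
import math
import random


def mean(values):
    """Finite-population average, i.e. E_I under a uniform index I."""
    return sum(values) / len(values)


def identity_sides(G, R, W):
    """Return the three sides of identity (2.2) for arbitrary G, R, W.

    G: attribute values of all N units; R: 0/1 recording indicators;
    W: weights (only those with R == 1 matter).
    """
    N = len(G)
    R_tilde = [r * w for r, w in zip(R, W)]
    G_bar = mean(G)
    R_tilde_bar = mean(R_tilde)

    # Left-hand side: weighted sample average minus population mean.
    G_bar_W = sum(rt * g for rt, g in zip(R_tilde, G)) / sum(R_tilde)
    actual_error = G_bar_W - G_bar

    # First expression: Cov_I(R~, G) / E_I(R~).
    cov = mean([rt * g for rt, g in zip(R_tilde, G)]) - R_tilde_bar * G_bar
    ratio_form = cov / R_tilde_bar

    # Second expression: correlation x sqrt((N - n_W) / n_W) x sigma_G.
    sigma_G = math.sqrt(mean([(g - G_bar) ** 2 for g in G]))
    sigma_R = math.sqrt(mean([(rt - R_tilde_bar) ** 2 for rt in R_tilde]))
    rho = cov / (sigma_R * sigma_G)
    sample_w = [w for r, w in zip(R, W) if r == 1]
    n_R = len(sample_w)
    w_bar = mean(sample_w)
    cv2 = mean([(w - w_bar) ** 2 for w in sample_w]) / w_bar ** 2
    n_W = n_R / (1 + cv2)  # effective sample size, as in (2.3)
    product_form = rho * math.sqrt((N - n_W) / n_W) * sigma_G
    return actual_error, ratio_form, product_form


rng = random.Random(2022)
N = 1000
G = [rng.gauss(0, 1) ** 3 for _ in range(N)]             # skewed attribute
R = [1 if g + rng.gauss(0, 1) > 0.5 else 0 for g in G]   # recording depends on G
W = [rng.uniform(0.5, 3.0) for _ in range(N)]            # arbitrary positive weights

actual_error, ratio_form, product_form = identity_sides(G, R, W)
print(actual_error, ratio_form, product_form)
```

The three printed numbers should agree up to floating-point rounding, even though the recording mechanism here is deliberately related to $G$ and the weights are arbitrary.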

In particular, when all weights are equal, $\rho_{\tilde{R}, G}$ is termed the data defect correlation (ddc) in Meng (2018) because it measures the lack of representativeness of the sample by capturing the dependence of the inclusion/recording indicator on the attributes: the higher the dependence, the more biased the sample average becomes for estimating population averages. With the basic strategies of probabilistic sampling or inverse probability weighting, ddc will be zero on average because $\mathrm{E}(W_i R_i) = 1$, and it is of order $O_p(N^{-1/2})$ because it is essentially an average of $N$ independent terms (Meng, 2018). Our general goal here therefore is to bring down ddc to $O_p(N^{-1/2})$ for non-probability samples, which we shall refer to as “miniaturizing ddc” because $N^{-1/2}$ is typically a minuscule number in practice.
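To give a sense of scale for "miniaturizing", the following simulation sketch compares the equal-weight ddc under a simple probability design with the ddc under a hypothetical non-probability mechanism. Everything specific here is an assumption made for illustration: the uniform attribute, the Bernoulli recording with probability 0.1, and the selective mechanism in which units with larger $G$ are three times as likely to be recorded.

```python
import math
import random


def ddc(G, R):
    """Finite-population correlation between a 0/1 indicator R and G."""
    N = len(G)
    g_bar = sum(G) / N
    r_bar = sum(R) / N
    cov = sum((r - r_bar) * (g - g_bar) for r, g in zip(R, G)) / N
    var_g = sum((g - g_bar) ** 2 for g in G) / N
    var_r = r_bar * (1 - r_bar)  # variance of a 0/1 variable
    return cov / math.sqrt(var_g * var_r)


rng = random.Random(1)
N = 20_000
G = [rng.random() for _ in range(N)]

# Probability sample: each unit is recorded independently with probability 0.1.
R_prob = [1 if rng.random() < 0.1 else 0 for _ in range(N)]

# Hypothetical non-probability sample: recording probability depends on G
# (0.15 when G > 0.5, 0.05 otherwise).
R_nonprob = [1 if rng.random() < (0.15 if g > 0.5 else 0.05) else 0 for g in G]

ddc_prob = ddc(G, R_prob)
ddc_nonprob = ddc(G, R_nonprob)
benchmark = 1 / math.sqrt(N)  # the N^{-1/2} scale

print(f"N^(-1/2)            = {benchmark:.4f}")
print(f"ddc, probability    = {ddc_prob:+.4f}")
print(f"ddc, non-probability = {ddc_nonprob:+.4f}")
```

Under these assumptions the probability-sample ddc should be of the same order as $N^{-1/2}$, while the selective mechanism should produce a ddc many times larger, even though both samples record roughly a tenth of the population.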

When we use weights, the first term $\rho_{\tilde{R}, G}$ captures the data defect that still exists after the weighting adjustment, since no weights are perfect in practice. Identity (2.2) shows the impact of the weights on both data quality and data quantity. The impact on the nominal effective sample size $n_W$ is never positive because $n_W \le n_R$ as seen in (2.3). Incidentally, the exactness of (2.3) reveals that this well-known expression is in fact not an approximation (which is often attributed to Kish (1965)), but an exact formula for the reduction of the sample size due to weighting if the weighting had no impact on ddc. However, weighting can have a major positive impact on reducing the overall error by judiciously choosing weights to significantly decrease ddc, though apparently at the price of $n_W < n_R$. Of course, this is exactly the aim of the quasi-randomization framework, as discussed below. Most importantly, however, (2.2) leads to a unified insight about the variety of methods reviewed in Wu (2022), including an intuitive explanation of the doubly robust property, which has been receiving increased attention for integrating data sources including both probability and non-probability samples (e.g., Yang, Kim and Song, 2020).
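The trade-off between a smaller ddc and a smaller $n_W$ can be illustrated with a small simulation. This is a sketch under strong assumptions, not a recommended procedure: the recording probabilities are taken as known exactly (which is rarely the case for non-probability samples), the weights are their inverses, and the helper name `error_and_effective_size` is hypothetical.

```python
import random


def error_and_effective_size(G, R, W):
    """Return (actual error of the weighted mean, n_W from (2.3), n_R)."""
    N = len(G)
    sample = [(w, g) for g, r, w in zip(G, R, W) if r == 1]
    n_R = len(sample)
    sum_w = sum(w for w, _ in sample)
    G_bar_W = sum(w * g for w, g in sample) / sum_w
    w_bar = sum_w / n_R
    # Squared coefficient of variation of the weights, divisor n_R.
    cv2 = sum((w - w_bar) ** 2 for w, _ in sample) / n_R / w_bar ** 2
    return G_bar_W - sum(G) / N, n_R / (1 + cv2), n_R


rng = random.Random(7)
N = 20_000
G = [rng.random() for _ in range(N)]

# Hypothetical selective recording: larger G is recorded more often.
prob = [0.15 if g > 0.5 else 0.05 for g in G]
R = [1 if rng.random() < p else 0 for p in prob]

# Equal weights versus inverse probability weights (true probabilities
# assumed known, purely for illustration).
err_equal, n_W_equal, n_R = error_and_effective_size(G, R, [1.0] * N)
err_ipw, n_W_ipw, _ = error_and_effective_size(G, R, [1 / p for p in prob])

print(f"n_R = {n_R}")
print(f"equal weights: error = {err_equal:+.4f}, n_W = {n_W_equal:.1f}")
print(f"IPW weights:   error = {err_ipw:+.4f}, n_W = {n_W_ipw:.1f}")
```

In this setup the inverse probability weights should shrink the actual error substantially while lowering the effective sample size below $n_R$, which is the pattern (2.2) and (2.3) describe; with misspecified weights the gain in ddc need not materialize.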

Indeed, Zhang (2019, Section 3.1) used the first expression in (2.2) to define a unified non-parametric asymptotic (NPA) non-informativeness assumption, which requires that the numerator $\mathrm{Cov}_I(\tilde{R}_I, G_I)$ goes to zero, while keeping the denominator $\mathrm{E}_I[\tilde{R}_I]$ positive, as $N \to \infty$. This unification permits Zhang (2019) to evaluate the quasi-randomization approach and regression modeling via a common criterion. The ddc framework echoes this unification, as discussed in Section 3 below, with Section 4 stressing the same broad message as emphasized by Zhang (2019). Section 5 harvests another low-hanging fruit of the ddc formulation, since it provides an immediate explanation of the celebrated double robustness. Section 6 then ventures into a much harder area of engineering a more representative sub-sample out of a large non-representative sample, a worthwhile trade-off because data quality is far more important than data quantity (Meng, 2018), as briefly reviewed below.

