Statistical inference with non-probability survey samples
Section 3. Model-based prediction approach

Model-based prediction methods for finite population parameters depend on two critical ingredients: the amount of auxiliary information that is available at the estimation stage and the reliability of the assumed model for inference. In the absence of any auxiliary information, the common mean model $E_\xi(y_i) = \mu_0$, $V_\xi(y_i) = \sigma^2$, $i = 1, \ldots, N$ may be viewed as reasonable, but the model-based prediction estimator $\hat{\mu}_y = \bar{y}_A = n_A^{-1} \sum_{i \in S_A} y_i$, although unbiased under the model since $E_\xi(\bar{y}_A - \mu_y) = 0$, is generally not an acceptable estimator of $\mu_y$. The variance $\sigma^2$ for the common mean model is typically large, and it leaves the estimator $\hat{\mu}_y = \bar{y}_A$ with a prediction variance that is too large to be practically useful.

3.1  Semiparametric outcome regression models

Without loss of generality, we assume that $\mathbf{x}$ contains 1 as its first component, corresponding to the intercept of a regression model. Under the setting described in Section 2, we consider the following semiparametric model for the finite population, denoted as $\xi$:

$$E_\xi(y_i \mid \mathbf{x}_i) = m(\mathbf{x}_i, \boldsymbol{\beta}), \quad \text{and} \quad V_\xi(y_i \mid \mathbf{x}_i) = v(\mathbf{x}_i)\, \sigma^2, \quad i = 1, 2, \ldots, N, \qquad (3.1)$$

where the mean function $m(\cdot, \cdot)$ and the variance function $v(\cdot)$ have known forms, and the $y_i$'s are also assumed to be conditionally independent given the $\mathbf{x}_i$'s. Let $\boldsymbol{\beta}_0$ and $\sigma_0^2$ be the true values of the model parameters $\boldsymbol{\beta}$ and $\sigma^2$ under the assumed model.
The first major implication of assumption A1 is that $E_\xi(y_i \mid \mathbf{x}_i, R_i = 1) = E_\xi(y_i \mid \mathbf{x}_i)$ and $V_\xi(y_i \mid \mathbf{x}_i, R_i = 1) = V_\xi(y_i \mid \mathbf{x}_i)$. The model (3.1), which is assumed for the finite population, also holds for the units in the non-probability survey sample $S_A$. The quasi maximum likelihood estimator $\hat{\boldsymbol{\beta}}$ of $\boldsymbol{\beta}_0$ is obtained using the dataset $\{(y_i, \mathbf{x}_i), i \in S_A\}$ from the non-probability survey sample as the solution to the quasi score equations (McCullagh and Nelder, 1989) given by

$$\mathbf{S}(\boldsymbol{\beta}) = \sum_{i \in S_A} \frac{\partial m(\mathbf{x}_i, \boldsymbol{\beta})}{\partial \boldsymbol{\beta}} \{v(\mathbf{x}_i)\}^{-1} \{y_i - m(\mathbf{x}_i, \boldsymbol{\beta})\} = \mathbf{0}. \qquad (3.2)$$

The semiparametric model (3.1) can be extended to replace $v(\mathbf{x}_i)$ by a general variance function $v(\mu_i)$, where $\mu_i = m(\mathbf{x}_i, \boldsymbol{\beta})$. The quasi maximum likelihood estimation theory covers linear or nonlinear regression models with the weighted least squares estimators, the logistic regression model and other generalized linear models.
Let $m_i = m(\mathbf{x}_i, \boldsymbol{\beta}_0)$ and $\hat{m}_i = m(\mathbf{x}_i, \hat{\boldsymbol{\beta}})$, $i = 1, 2, \ldots, N$.
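
The quasi score equations (3.2) generally have to be solved numerically. The following is a minimal sketch of one way to do so, assuming Gauss-Newton (Fisher scoring) iterations, which the text does not prescribe. The function name `solve_quasi_score`, the simulated data, and the choice $v(\mathbf{x}) = x$ are all hypothetical and serve only as illustration. For a linear mean function the solution coincides with the weighted least squares estimator.

```python
import numpy as np

def solve_quasi_score(y, X, mean_fn, grad_fn, v, beta_init, tol=1e-10, max_iter=100):
    """Solve S(beta) = sum_i (dm_i/dbeta) v_i^{-1} (y_i - m_i) = 0 by Gauss-Newton steps.

    y, X     : responses and covariates for the units in S_A
    mean_fn  : returns the vector of m(x_i, beta)
    grad_fn  : returns the (n x p) matrix of derivatives dm(x_i, beta)/dbeta
    v        : vector of known variance function values v(x_i)
    """
    beta = np.asarray(beta_init, dtype=float)
    for _ in range(max_iter):
        m = mean_fn(X, beta)
        D = grad_fn(X, beta)
        score = D.T @ ((y - m) / v)        # quasi score S(beta)
        info = D.T @ (D / v[:, None])      # expected information (up to sigma^2)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Illustrative data: linear mean x'beta with assumed variance function v(x) = x.
rng = np.random.default_rng(0)
n_A = 200
x1 = rng.uniform(1.0, 3.0, size=n_A)
X_A = np.column_stack([np.ones(n_A), x1])   # first component is 1 (intercept)
v_A = x1
y_A = X_A @ np.array([1.0, 2.0]) + rng.normal(size=n_A) * np.sqrt(v_A)

beta_hat = solve_quasi_score(
    y_A, X_A,
    mean_fn=lambda X, b: X @ b,
    grad_fn=lambda X, b: X,
    v=v_A,
    beta_init=np.zeros(2),
)

# The quasi score evaluated at the solution should be numerically zero.
score_at_hat = X_A.T @ ((y_A - X_A @ beta_hat) / v_A)
```

Passing a different `mean_fn` and `grad_fn` (for example a logistic mean) would reuse the same iteration, although the step would then need the variance values to be recomputed if $v(\mu_i)$ depends on $\boldsymbol{\beta}$.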

3.2  Two general forms of prediction estimators

There are two commonly used model-based prediction estimators for $\mu_y$ in the presence of complete auxiliary information $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$; see Chapter 5 of Wu and Thompson (2020). Note that $E_\xi(\mu_y) = N^{-1} \sum_{i=1}^{N} m_i$. The two prediction estimators are constructed as

$$\hat{\mu}_{y1} = \frac{1}{N} \sum_{i=1}^{N} \hat{m}_i \quad \text{and} \quad \hat{\mu}_{y2} = \frac{1}{N} \Big\{ \sum_{i \in S_A} y_i - \sum_{i \in S_A} \hat{m}_i + \sum_{i=1}^{N} \hat{m}_i \Big\}. \qquad (3.3)$$

The estimator $\hat{\mu}_{y2}$ is built based on $\mu_y = N^{-1} \{ \sum_{i \in S_A} y_i + \sum_{i \notin S_A} y_i \}$ and uses $\sum_{i \notin S_A} \hat{m}_i = \sum_{i=1}^{N} \hat{m}_i - \sum_{i \in S_A} \hat{m}_i$ to predict the unobserved term $\sum_{i \notin S_A} y_i$. Under a linear regression model where $m(\mathbf{x}, \boldsymbol{\beta}) = \mathbf{x}' \boldsymbol{\beta}$, the two estimators given in (3.3) reduce to

$$\hat{\mu}_{y1} = \boldsymbol{\mu}_{\mathbf{x}}' \hat{\boldsymbol{\beta}} \quad \text{and} \quad \hat{\mu}_{y2} = \frac{n_A}{N} \big( \bar{y}_A - \bar{\mathbf{x}}_A' \hat{\boldsymbol{\beta}} \big) + \boldsymbol{\mu}_{\mathbf{x}}' \hat{\boldsymbol{\beta}}, \qquad (3.4)$$

where $\boldsymbol{\mu}_{\mathbf{x}} = N^{-1} \sum_{i=1}^{N} \mathbf{x}_i$ is the vector of the population means of the $\mathbf{x}$ variables and $\bar{\mathbf{x}}_A = n_A^{-1} \sum_{i \in S_A} \mathbf{x}_i$ is the vector of the simple sample means of $\mathbf{x}$ from the non-probability sample $S_A$. If the linear regression model contains an intercept and $\hat{\boldsymbol{\beta}}$ is the ordinary least squares estimator, we have $\hat{\mu}_{y2} = \hat{\mu}_{y1} = \boldsymbol{\mu}_{\mathbf{x}}' \hat{\boldsymbol{\beta}}$ since $\bar{y}_A - \bar{\mathbf{x}}_A' \hat{\boldsymbol{\beta}} = 0$ due to the zero sum of fitted residuals.
The prediction estimators in (3.4) under a linear model only require the population means $\boldsymbol{\mu}_{\mathbf{x}}$ in addition to the non-probability sample $S_A$. Under the setting described in Section 2 with auxiliary information on $\mathbf{x}$ provided through a reference probability sample $S_B$, we simply replace $\sum_{i=1}^{N} \hat{m}_i$ by $\sum_{i \in S_B} d_i^B \hat{m}_i$ for the estimators in (3.3) and replace $\boldsymbol{\mu}_{\mathbf{x}}$ by $\hat{\boldsymbol{\mu}}_{\mathbf{x}} = \hat{N}_B^{-1} \sum_{i \in S_B} d_i^B \mathbf{x}_i$ for the estimators in (3.4), where $\hat{N}_B = \sum_{i \in S_B} d_i^B$. The population size $N$ appearing in (3.3) or (3.4) should also be replaced by $\hat{N}_B$ even if it is known.
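
The two prediction estimators, and their versions based on a reference probability sample, can be computed in a few lines. The sketch below uses a simulated population, which is entirely hypothetical: selection into $S_A$ depends on $\mathbf{x}$ only (chosen so that the implication of A1 stated in Section 3.1 holds), the working model is linear with an intercept and fitted by ordinary least squares, and $S_B$ is a simple random sample with design weights $d_i^B = N / n_B$. None of these choices come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical finite population with x = (1, x1)' and a linear outcome model.
N = 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([2.0, 1.5]) + rng.normal(size=N)

# Non-probability sample S_A: inclusion probability depends on x only.
p_A = 1.0 / (1.0 + np.exp(-(-2.0 + 0.8 * X[:, 1])))
in_A = rng.random(N) < p_A
X_A, y_A = X[in_A], y[in_A]
n_A = int(in_A.sum())

# Ordinary least squares fit on S_A, then fitted values for every population unit.
beta_hat = np.linalg.lstsq(X_A, y_A, rcond=None)[0]
m_hat = X @ beta_hat

# Estimators (3.3) with complete auxiliary information {x_1, ..., x_N}.
mu_y1 = m_hat.sum() / N
mu_y2 = (y_A.sum() - m_hat[in_A].sum() + m_hat.sum()) / N

# Reference probability sample S_B (simple random sampling, assumed for illustration).
n_B = 500
idx_B = rng.choice(N, size=n_B, replace=False)
d_B = np.full(n_B, N / n_B)            # design weights d_i^B
N_hat_B = d_B.sum()                    # estimated population size
total_m_B = (d_B * (X[idx_B] @ beta_hat)).sum()

# Versions of (3.3) with sum_{i=1}^N m_hat_i and N replaced by their S_B estimates.
mu_y1_B = total_m_B / N_hat_B
mu_y2_B = (y_A.sum() - m_hat[in_A].sum() + total_m_B) / N_hat_B

print(f"true mean {y.mean():.3f}, naive mean of S_A {y_A.mean():.3f}")
print(f"mu_y1 {mu_y1:.3f}, mu_y2 {mu_y2:.3f}, mu_y1_B {mu_y1_B:.3f}, mu_y2_B {mu_y2_B:.3f}")
```

Because the fitted model has an intercept and is estimated by ordinary least squares, the residuals over $S_A$ sum to zero, so the two estimators agree in both the complete-information and the reference-sample versions of this sketch.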

3.3  Mass imputation

Model-based prediction estimators of $\mu_y$ using a non-probability survey sample on $(y, \mathbf{x})$ and a reference probability survey sample on $\mathbf{x}$ have traditionally been presented as the mass imputation estimator. The study variable $y$ is not observed for any units in the reference survey sample $S_B$ and hence can be viewed as missing for all $i \in S_B$. Let $y_i^*$ be an imputed value for $y_i$, $i \in S_B$. The mass imputation estimator of $\mu_y$ is then constructed as

$$\hat{\mu}_{y\,\mathrm{MI}} = \frac{1}{\hat{N}_B} \sum_{i \in S_B} d_i^B \, y_i^*, \qquad (3.5)$$

where $\hat{N}_B$ is defined as before and the subscript “MI” indicates “Mass Imputation” (not “Multiple Imputation”). Under the deterministic regression imputation where $y_i^* = \mathbf{x}_i'\hat{\boldsymbol{\beta}}$, the estimator $\hat{\mu}_{y\,\mathrm{MI}}$ reduces to the model-based prediction estimator $\hat{\mu}_{\mathbf{x}}'\hat{\boldsymbol{\beta}}$ as discussed in Section 3.2.
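To spell out the reduction, a short derivation may help. It assumes, as a reading of Section 3.2 rather than a quotation from it, that $\hat{\mu}_{\mathbf{x}}$ denotes the design-weighted mean $\hat{N}_B^{-1} \sum_{i \in S_B} d_i^B \mathbf{x}_i$ of the auxiliary variables in the reference sample:

$$\hat{\mu}_{y\,\mathrm{MI}} = \frac{1}{\hat{N}_B} \sum_{i \in S_B} d_i^B \, \mathbf{x}_i'\hat{\boldsymbol{\beta}} = \Big( \frac{1}{\hat{N}_B} \sum_{i \in S_B} d_i^B \, \mathbf{x}_i \Big)' \hat{\boldsymbol{\beta}} = \hat{\mu}_{\mathbf{x}}'\hat{\boldsymbol{\beta}}.$$

The second equality uses only the fact that $\hat{\boldsymbol{\beta}}$ does not depend on $i$.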

The mass imputation approach to analyzing non-probability survey samples has the same spirit as model-based prediction methods but it opens the door for using more flexible models and imputation techniques that have been developed in the existing literature on missing data problems. The approach was first examined by Rivers (2007) through the so-called sample matching method. For each $i \in S_B$, the “missing” $y_i$ is imputed as $y_i^* = y_j$ for some $j \in S_A$, where $j$ is a matching donor from $S_A$ selected through the nearest neighbor method as measured by the distance between $\mathbf{x}_i$ and $\mathbf{x}_j$. The underlying model $\xi$ for the nearest neighbor imputation method is nonparametric, i.e., $E_\xi(y_i \mid \mathbf{x}_i) = m(\mathbf{x}_i)$ for some unknown function $m(\cdot)$. The matching value $y_j$ can be viewed as the predicted value of the missing $y_i$ under the model. Theoretical properties of estimators based on nearest neighbor imputation were discussed by Chen and Shao (2000, 2001) for missing survey data problems.
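The sample matching idea can be illustrated with a minimal sketch. It is not the implementation of Rivers (2007): the function name, the scalar covariate, the absolute-difference distance and the toy data are all assumptions made for illustration. Each unit in $S_B$ receives the $y$ value of its nearest donor in $S_A$, and the imputed values are then combined as in (3.5).

```python
def nn_mass_imputation(x_a, y_a, x_b, d_b):
    """Nearest-neighbor mass imputation estimate of the mean of y.

    x_a, y_a: covariate and study variable in the non-probability sample S_A.
    x_b, d_b: covariate and design weights in the reference sample S_B.
    Assumption: the covariate is scalar, so distance is |x_i - x_j|.
    """
    y_star = []
    for xi in x_b:
        # Donor j in S_A closest to unit i in S_B (ties go to the first donor).
        j = min(range(len(x_a)), key=lambda k: abs(x_a[k] - xi))
        y_star.append(y_a[j])
    n_hat_b = sum(d_b)  # estimated population size from the design weights
    return sum(d * y for d, y in zip(d_b, y_star)) / n_hat_b


# Hypothetical toy data.
x_a = [1.0, 2.0, 4.0]
y_a = [10.0, 20.0, 40.0]
x_b = [1.1, 3.9, 2.2, 0.5]
d_b = [100.0, 50.0, 25.0, 25.0]

mu_hat = nn_mass_imputation(x_a, y_a, x_b, d_b)
print(mu_hat)  # 18.75
```

With vector-valued covariates the absolute difference would be replaced by a suitable norm, and the choice of norm can change which donor is selected.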

The semiparametric model (3.1) can be used for deterministic regression mass imputation. Under assumption A1, a consistent estimator $\hat{\boldsymbol{\beta}}$ of $\boldsymbol{\beta}$ is first obtained from the non-probability sample dataset $\{(y_i, \mathbf{x}_i), i \in S_A\}$, and $\hat{\boldsymbol{\beta}}$ is then used to compute the imputed values $y_i^* = m(\mathbf{x}_i, \hat{\boldsymbol{\beta}})$ for $i \in S_B$. In other words, assumption A1 implies the so-called model transportability of Kim, Park, Chen and Wu (2021): the model built from the non-probability sample can be used for prediction with the reference probability sample. The resulting mass imputation estimator $\hat{\mu}_{y\,\mathrm{MI}}$ is identical to one of the model-based prediction estimators presented in Section 3.2. Asymptotic properties and variance estimation for the estimator $\hat{\mu}_{y\,\mathrm{MI}}$ using the semiparametric model (3.1) were discussed by Kim et al. (2021).
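The two-step procedure (fit on $S_A$, predict on $S_B$) might be sketched as follows. The sketch assumes a linear mean function $m(x, \boldsymbol{\beta}) = \beta_0 + \beta_1 x$ with a scalar covariate and uses unweighted least squares as one possible estimator of $\boldsymbol{\beta}$. Both choices, along with the function names and data, are illustrative assumptions and are not taken from Kim et al. (2021).

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a scalar covariate."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def regression_mass_imputation(x_a, y_a, x_b, d_b):
    """Deterministic regression mass imputation estimate of the mean of y."""
    # Step 1: estimate beta from the non-probability sample S_A only.
    b0, b1 = fit_simple_ols(x_a, y_a)
    # Step 2: impute y* = m(x_i, beta_hat) for every unit in S_B.
    y_star = [b0 + b1 * xi for xi in x_b]
    # Step 3: design-weighted mean of the imputed values, as in (3.5).
    return sum(d * y for d, y in zip(d_b, y_star)) / sum(d_b)


# Hypothetical toy data; y_a lies exactly on the line y = 1 + 2x.
x_a = [1.0, 2.0, 3.0, 4.0]
y_a = [3.0, 5.0, 7.0, 9.0]
x_b = [0.0, 2.0, 5.0]
d_b = [10.0, 20.0, 10.0]

mu_hat = regression_mass_imputation(x_a, y_a, x_b, d_b)
print(mu_hat)
```

In this example the weighted mean of `x_b` is 2.25, so the result agrees with the prediction form $1 + 2 \times 2.25 = 5.5$ up to floating-point rounding.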

Under the mass imputation approach, the only role played by the observed $y_i$ for $i \in S_A$ is to estimate the model parameters $\boldsymbol{\beta}$. The estimator $\hat{\mu}_{y\,\mathrm{MI}}$ is constructed using the fitted model and auxiliary information from the reference probability sample $S_B$. This construction does not appear to use the information on the observed $y_i$ fully, even though $\mu_y$ is the main parameter of interest. This led to the research question described in Chapter 17 of Wu and Thompson (2020) on “reverse sample matching”. The proposed estimator is constructed as $\hat{\mu}_{yA} = (\hat{N}^*)^{-1} \sum_{i \in S_A} d_i^* y_i$ using all the observed $y_i$ in the non-probability sample, where $\hat{N}^* = \sum_{i \in S_A} d_i^*$. Here $d_i^*$ is a matched survey weight from $S_B$ such that $d_i^* = d_j^B$, with $j \in S_B$ being the nearest neighbor of $i \in S_A$ as measured by $\| \mathbf{x}_i - \mathbf{x}_j \|$. Theoretical properties of the reverse matched estimator $\hat{\mu}_{yA}$ using the nearest neighbor $j \in S_B$ to match $d_i^*$ with $d_j^B$ have not been formally investigated in the existing literature.
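A minimal sketch of the reverse matched estimator follows, under the same illustrative assumptions as before: a scalar covariate, absolute-difference distance, and hypothetical names and data. The matching now runs in the opposite direction, so that each unit in $S_A$ borrows the design weight of its nearest neighbor in $S_B$.

```python
def reverse_matched_mean(x_a, y_a, x_b, d_b):
    """Reverse sample matching estimate of the mean of y.

    Every observed y_i in S_A is kept; its weight d_i* is the design weight
    d_j^B of the nearest unit j in the reference sample S_B.
    """
    d_star = []
    for xi in x_a:
        # Nearest neighbor j in S_B of unit i in S_A (ties go to the first).
        j = min(range(len(x_b)), key=lambda k: abs(x_b[k] - xi))
        d_star.append(d_b[j])
    n_hat_star = sum(d_star)  # N-hat* = sum of the matched weights over S_A
    return sum(d * y for d, y in zip(d_star, y_a)) / n_hat_star


# Hypothetical toy data.
x_a = [1.0, 2.0, 4.0]
y_a = [10.0, 20.0, 40.0]
x_b = [1.1, 3.9, 2.2, 0.5]
d_b = [100.0, 50.0, 25.0, 25.0]

mu_hat_ya = reverse_matched_mean(x_a, y_a, x_b, d_b)
print(mu_hat_ya)  # 20.0
```

In this toy example the unit in $S_B$ with `x = 0.5` is never matched, so its weight does not enter the estimate and $\hat{N}^* = 175$ differs from $\hat{N}_B = 200$.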

Wang, Graubard, Katki and Li (2020) proposed a kernel weighting approach to reverse sample matching using $d_i^* \propto \sum_{j \in S_B} K_{ij} d_j$, where $K_{ij}$ is a kernel distance between $\hat{p}_i$ and $\hat{p}_j$; see the adjusted logistic propensity (ALP) weighting method discussed at the end of Section 4.1.1 on the calculation of $\hat{p}_i$. They showed that the estimator $\hat{\mu}_{yA}$ is consistent under certain regularity conditions. In a recent working paper posted on arXiv by Liu and Valliant (2021), the authors discussed issues with the bias and the variance of the reverse matched estimator under different randomization frameworks involving one, two or all three of the sources $(p, q, \xi)$. The authors also proposed a calibration step over the matched weights, which seems to be a promising idea. Further research on this topic is needed.
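The proportionality $d_i^* \propto \sum_{j \in S_B} K_{ij} d_j$ can be illustrated with a rough sketch. It is not the procedure of Wang et al. (2020): the Gaussian kernel, the bandwidth `h`, the rescaling of the weights to sum to $\hat{N}_B$, and the propensity values taken as given inputs are all assumptions made here. Since $\hat{\mu}_{yA}$ is a ratio, the choice of proportionality constant does not affect it.

```python
import math


def kernel_matched_weights(p_a, p_b, d_b, h):
    """Weights d_i* proportional to sum_j K_ij d_j.

    p_a, p_b: estimated propensity scores for units in S_A and S_B.
    d_b: design weights for S_B.  h: kernel bandwidth (assumed, h > 0).
    """
    raw = []
    for pi in p_a:
        # Gaussian kernel of the scaled score difference (an assumed choice).
        s = sum(math.exp(-0.5 * ((pi - pj) / h) ** 2) * dj
                for pj, dj in zip(p_b, d_b))
        raw.append(s)
    # Assumed normalization: rescale so the weights sum to N-hat_B.
    scale = sum(d_b) / sum(raw)
    return [scale * s for s in raw]


def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Hypothetical inputs, symmetric about 0.5 by construction.
p_a = [0.2, 0.5, 0.8]
p_b = [0.1, 0.5, 0.9]
d_b = [30.0, 40.0, 30.0]
y_a = [1.0, 2.0, 3.0]

d_star = kernel_matched_weights(p_a, p_b, d_b, h=0.2)
mu_hat_ya = weighted_mean(y_a, d_star)
print(d_star, mu_hat_ya)
```

Unlike nearest neighbor matching, every unit in $S_B$ contributes to every $d_i^*$, with the bandwidth controlling how quickly that contribution decays as the scores move apart.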

The mass imputation approach to analyzing non-probability survey samples leads to an interesting research question that is currently under investigation by a doctoral student at the University of Waterloo: Is it theoretically feasible and practically useful to create a mass-imputed dataset $\{(y_i^*, \mathbf{x}_i, d_i^B), i \in S_B\}$ based on the reference probability survey sample that can be used for general statistical inferences? The answer clearly depends on the types of inferential problems to be conducted over the imputed dataset. A minimum requirement is that the conditional distribution of the study variable $y$ given the covariates $\mathbf{x}$ is preserved for the mass-imputed dataset. The nearest neighbor imputation method and the random regression imputation method can be useful for this purpose. Fractional imputation is another possibility, especially for binary or ordinal study variables. Multiple imputation is also potentially useful in this direction to create multiple mass-imputed datasets. The subscript “MI” in this case might need to be changed to “MI2”, meaning “Mass Imputation with Multiple Imputation”.
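As an illustration of how a mass-imputed dataset might retain variability in $y$ given $\mathbf{x}$, the sketch below applies random regression imputation: a fitted value plus a residual drawn at random from the residuals observed in $S_A$. The linear model, the scalar covariate, the resampling of empirical residuals (which implicitly treats the errors as homoscedastic) and all names and data are assumptions for illustration only.

```python
import random


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a scalar covariate."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def random_regression_imputation(x_a, y_a, x_b, rng):
    """Impute y* = fitted value + randomly drawn S_A residual for each unit in S_B."""
    b0, b1 = fit_simple_ols(x_a, y_a)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x_a, y_a)]
    fitted_b = [b0 + b1 * xi for xi in x_b]
    # Draw residuals with replacement so the imputed values keep their spread.
    y_star = [f + rng.choice(residuals) for f in fitted_b]
    return y_star, fitted_b, residuals


# Hypothetical toy data; the seed only makes the draw reproducible.
rng = random.Random(2024)
x_a = [1.0, 2.0, 3.0, 4.0, 5.0]
y_a = [2.1, 3.9, 6.2, 7.8, 10.1]
x_b = [1.5, 2.5, 4.5]
d_b = [40.0, 30.0, 30.0]

y_star, fitted_b, residuals = random_regression_imputation(x_a, y_a, x_b, rng)
# The mass-imputed dataset {(y_i*, x_i, d_i^B), i in S_B}.
imputed_dataset = list(zip(y_star, x_b, d_b))
print(imputed_dataset)
```

Repeating the draw with different seeds would give several mass-imputed datasets, which is the starting point for the multiple imputation variant mentioned above.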

