3 Pseudo-likelihood-based selection with BIC

Chen Xu, Jiahua Chen and Harold Mantel

3.1 BIC in surveys

Under the model settings described in Section 2, if the measurement $(y_i, \mathbf{x}_i)$ is observed for every unit in the population $\mathcal{D}$, the randomness introduced by the probability sampling design disappears entirely. In this situation, the selection of influential variables is based on the entire population, and the classical (purely model-based) selection criteria developed in non-survey settings remain valid for model-design-based inference.
In particular, let $s \subseteq \{1, \ldots, p\}$ be an arbitrary set of $\tau(s)$ covariates, which corresponds to a candidate model of the form (2.1). The "census-based" BIC (Schwarz 1978) selects the model (covariates) that minimizes

$$\mathrm{BIC}_N(s) = -2\, l_N(\breve{\boldsymbol{\beta}}_s) + \tau(s) \log N, \qquad (3.1)$$

where $l_N(\boldsymbol{\beta}) = \sum_{i=1}^{N} \log f(y_i; \mathbf{x}_i \boldsymbol{\beta})$ is the census log-likelihood function and $\breve{\boldsymbol{\beta}}_s$ is the maximizer of $l_N(\boldsymbol{\beta})$ based on $s$. The BIC (3.1) is a decreasing function of the maximized log-likelihood and an increasing function of the number of variables included in the model.
Hence, a lower BIC implies either a simpler model (fewer explanatory variables), a better fit (higher maximized likelihood), or both. A model that balances complexity and goodness of fit is preferred.
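The trade-off in (3.1) can be made concrete with a few lines of code. The sketch below is illustrative only: the function name `census_bic` and all numerical values are hypothetical and do not come from the paper.

```python
import math

def census_bic(max_loglik: float, n_covariates: int, pop_size: int) -> float:
    """Census BIC as in (3.1): -2 * maximized log-likelihood + tau(s) * log N."""
    return -2.0 * max_loglik + n_covariates * math.log(pop_size)

# Hypothetical numbers, chosen only to illustrate the trade-off.
N = 10_000
small = census_bic(max_loglik=-5230.0, n_covariates=2, pop_size=N)
large = census_bic(max_loglik=-5228.0, n_covariates=5, pop_size=N)

# The larger model fits slightly better, but three extra covariates cost
# 3 * log(N), which outweighs the gain of 4 in -2 * log-likelihood.
print(small < large)
```

In this made-up comparison the smaller model is preferred, because the improvement in fit is too small to pay for the additional covariates.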

We note that the census BIC (3.1) is conceptual, because observing $(y_i, \mathbf{x}_i)$ for all units in $\mathcal{D}$ is usually not feasible in applications. Instead, a representative sample $d = \{i_1, \ldots, i_n\} \subset \{1, \ldots, N\}$ with $n$ units is often drawn from $\mathcal{D}$, and the measurements are observed only for the sampled units. Because of the intrinsic dependence structure among the sampled units, a full likelihood on $d$ is generally prohibitive to compute. As an alternative for model-design-based inference, a pseudo-log-likelihood function is frequently used, which takes the form

$$l_n(\boldsymbol{\beta}) = \sum_{i \in d} w_i \log f(y_i; \boldsymbol{\beta}) \qquad (3.2)$$

with $w_i = k / P(i \in d)$ denoting the survey weight for the $i$th unit. The scaling parameter $k$ in $w_i$ has no analytical impact on pseudo-likelihood-based inference.
For simplicity of presentation, we choose $k = n/N$ so that $n^{-1} l_n(\boldsymbol{\beta})$ is design-unbiased for $N^{-1} l_N(\boldsymbol{\beta})$. Maximizing $l_n(\boldsymbol{\beta})$ over $\boldsymbol{\beta}$ leads to a maximum pseudo-likelihood estimator (MPLE) $\hat{\boldsymbol{\beta}}$ for $\boldsymbol{\beta}$, i.e.,

$$\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} l_n(\boldsymbol{\beta}).$$
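As a minimal sketch of how the pseudo-log-likelihood (3.2) and the MPLE might be computed, the code below assumes a normal working model with unit variance, $y_i \sim N(\mathbf{x}_i \boldsymbol{\beta}, 1)$, for which the maximizer reduces to weighted least squares. The working model, the function names, and the simulated inclusion probabilities are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def survey_weights(incl_prob, pop_size):
    """w_i = k / P(i in d) with k = n / N, the scaling chosen in the text."""
    incl_prob = np.asarray(incl_prob, dtype=float)
    k = incl_prob.size / pop_size
    return k / incl_prob

def pseudo_loglik(beta, X, y, w):
    """l_n(beta) in (3.2) for the assumed N(x_i beta, 1) working model."""
    resid = y - X @ beta
    return float(np.sum(w * (-0.5 * np.log(2.0 * np.pi) - 0.5 * resid**2)))

def mple(X, y, w):
    """Maximizer of l_n under the assumed model: weighted least squares."""
    sw = np.sqrt(w)
    beta_hat, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta_hat

# Simulated sample; the inclusion probabilities are hypothetical.
rng = np.random.default_rng(0)
N, n = 5000, 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=n)
pi = rng.uniform(0.02, 0.06, size=n)
w = survey_weights(pi, N)
beta_hat = mple(X, y, w)
print(beta_hat)
```

For other working models (for example logistic regression) the maximization would generally need an iterative solver, but the weighting of each unit's log-density by $w_i$ stays the same.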

Under appropriate sampling designs, $\hat{\boldsymbol{\beta}}$ is often $n^{-1/2}$-consistent for $\boldsymbol{\beta}$ under the joint randomization framework. The idea of using pseudo-likelihood for inference on model parameters has been widely adopted in the literature (see, e.g., Binder 1983; Godambe and Thompson 1986; Molina and Skinner 1992).
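The design-unbiasedness that motivates the choice $k = n/N$ can be checked in one line. The following is a sketch under assumptions not spelled out in the text: the sample size $n$ is fixed, every unit has $P(i \in d) > 0$, $\delta_i$ denotes the indicator that unit $i$ is sampled, $\ell_i(\boldsymbol{\beta})$ is shorthand for the log-density contribution of unit $i$, and $E_d$ is the expectation over the sampling design with the finite population held fixed.

$$\begin{aligned}
E_d\{ n^{-1} l_n(\boldsymbol{\beta}) \}
&= n^{-1} E_d\Big\{ \sum_{i=1}^{N} \delta_i \, \frac{n/N}{P(i \in d)} \, \ell_i(\boldsymbol{\beta}) \Big\} \\
&= N^{-1} \sum_{i=1}^{N} \frac{E_d(\delta_i)}{P(i \in d)} \, \ell_i(\boldsymbol{\beta})
 = N^{-1} \sum_{i=1}^{N} \ell_i(\boldsymbol{\beta})
 = N^{-1} l_N(\boldsymbol{\beta}).
\end{aligned}$$

Any other positive constant $k$ only rescales $l_n(\boldsymbol{\beta})$ and therefore leaves the maximizer $\hat{\boldsymbol{\beta}}$ unchanged.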

In this paper, we aim to develop an analogue of the BIC based on the pseudo-likelihood. Following the super-population formulation described in Section 2, let $\boldsymbol{\beta}_s$ be the $\tau(s)$-dimensional coefficient of model $s$ and let $\nu_s$ be the prior density of $\boldsymbol{\beta}_s$. Then a pseudo-marginal density function of the data is given by

$$P_n(\mathbf{y} \mid s) = \int L_n(\mathbf{y}; \boldsymbol{\beta}_s)\, \nu_s(\boldsymbol{\beta}_s)\, d\boldsymbol{\beta}_s,$$

with $L_n(\mathbf{y}; \boldsymbol{\beta}_s) = \exp\{ l_n(\mathbf{y}; \boldsymbol{\beta}_s) \}$. Consequently, we may regard the following expression as the pseudo-posterior probability of the model $s$:

$$P_n(s \mid \mathbf{y}) = \frac{P_n(\mathbf{y} \mid s)\, P(s)}{\sum_{s \in S} P(s)\, P_n(\mathbf{y} \mid s)}, \qquad (3.3)$$

where $S$ denotes the collection of all candidate models. In the spirit of Bayesian analysis, the model with the highest $P_n(s \mid \mathbf{y})$ is considered to be the one that receives the most support from the data. Since $\sum_{s \in S} P(s)\, P_n(\mathbf{y} \mid s)$ does not depend on any specific model, the highest $P_n(s \mid \mathbf{y})$ is achieved by the model that maximizes $P_n(\mathbf{y} \mid s)\, P(s)$. When the uniform prior $P(s) = \zeta$ is used and the weight scaling is chosen as $k = n/N$, we obtain a Laplace approximation under some regularity conditions (see Xu and Chen 2012):

$$-2 \log\{ P_n(\mathbf{y} \mid s) \} = -2\, l_n(\hat{\boldsymbol{\beta}}_s) + \tau(s) \log n + O_p(1).$$
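The origin of the $\tau(s) \log n$ term can be seen from the usual Laplace argument. What follows is a heuristic sketch, not the proof in Xu and Chen (2012). The matrix $H_n$ is notation introduced here, and the argument assumes that $l_n$ is smooth with an interior maximizer $\hat{\boldsymbol{\beta}}_s$, that $H_n$ stays bounded and positive definite, and that $\nu_s$ is positive and continuous near $\hat{\boldsymbol{\beta}}_s$.

$$P_n(\mathbf{y} \mid s) \approx \exp\{ l_n(\hat{\boldsymbol{\beta}}_s) \}\, \nu_s(\hat{\boldsymbol{\beta}}_s)\, (2\pi)^{\tau(s)/2}\, | n H_n |^{-1/2},
\qquad
H_n = -\frac{1}{n} \left. \frac{\partial^2 l_n(\boldsymbol{\beta}_s)}{\partial \boldsymbol{\beta}_s\, \partial \boldsymbol{\beta}_s^{\top}} \right|_{\boldsymbol{\beta}_s = \hat{\boldsymbol{\beta}}_s}.$$

Because $| n H_n | = n^{\tau(s)} | H_n |$, taking $-2 \log$ of both sides gives

$$-2 \log P_n(\mathbf{y} \mid s) \approx -2\, l_n(\hat{\boldsymbol{\beta}}_s) + \tau(s) \log n + \big\{ \log | H_n | - \tau(s) \log(2\pi) - 2 \log \nu_s(\hat{\boldsymbol{\beta}}_s) \big\}.$$

Under the stated assumptions the terms in braces do not grow with $n$, so they are absorbed into the $O_p(1)$ remainder. The scaling $k = n/N$ matters at this step: it makes the design expectation of $\sum_{i \in d} w_i$ equal to $n$, so that $l_n$ is of order $n$ and $H_n$ is of order one.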

Accordingly, we choose the model $s$ that minimizes

$$\mathrm{BIC}_n(s) = -2\, l_n(\hat{\boldsymbol{\beta}}_s) + \tau(s) \log n. \qquad (3.4)$$

Compared with the census BIC (3.1), the first term in BIC (3.4) is based on the maximized survey-weighted pseudo-log-likelihood, which can help avoid sampling-induced bias in inferences about the target population. We refer to (3.4) as a pseudo-likelihood-based version of BIC in the context of surveys. In the joint randomization framework, we establish the selection consistency of BIC (3.4) through an implementation procedure based on penalized pseudo-likelihood (PPL), as will be seen in Section 4.
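To illustrate how (3.4) could be evaluated over a small model space, the sketch below enumerates all subsets of a few covariates and keeps the one with the lowest $\mathrm{BIC}_n$. It again assumes a unit-variance normal working model, so that each maximized pseudo-log-likelihood comes from weighted least squares; the function names and the simulated data are hypothetical.

```python
import itertools
import numpy as np

def max_pseudo_loglik(X, y, w):
    """Maximized l_n for the assumed N(x beta, 1) working model."""
    resid = y
    if X.shape[1] > 0:
        sw = np.sqrt(w)
        beta_hat, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid = y - X @ beta_hat
    return float(np.sum(w * (-0.5 * np.log(2.0 * np.pi) - 0.5 * resid**2)))

def bic_n(X, y, w, subset):
    """BIC_n(s) = -2 l_n(beta_hat_s) + tau(s) log n, as in (3.4)."""
    cols = np.array(subset, dtype=int)
    return -2.0 * max_pseudo_loglik(X[:, cols], y, w) + len(subset) * np.log(y.size)

def best_subset(X, y, w):
    """Exhaustive search over all 2^p subsets; only feasible for small p."""
    p = X.shape[1]
    candidates = (s for r in range(p + 1)
                  for s in itertools.combinations(range(p), r))
    return min(candidates, key=lambda s: bic_n(X, y, w, s))

# Simulated sample: covariates 0 and 2 are active, 1 and 3 are noise.
rng = np.random.default_rng(1)
N, n, p = 20000, 400, 4
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)
pi = rng.uniform(0.01, 0.03, size=n)   # hypothetical inclusion probabilities
w = (n / N) / pi
print(best_subset(X, y, w))
```

The enumeration visits $2^p$ models, which is the computational burden that motivates the penalized approach of Section 3.2.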

3.2 Implementing BIC via penalized pseudo-likelihood

In applications, a straightforward way to implement BIC is best-subset selection, where BIC is evaluated and compared for each candidate model. However, this procedure can be computationally impractical when the number of covariates is large. Alternatively, penalized likelihood methods have recently been used as computationally efficient procedures for implementing a selection criterion. These methods exclude variables from the model by estimating their coefficients to be zero, and shrink the other coefficients accordingly. By varying the penalty on the likelihood, we can obtain a series of models with differing sparsity. To avoid an exhaustive search of the entire model space, the selection criterion is used to pick an optimal one among these sparse models. The effectiveness of this implementation strategy has been illustrated in the non-survey context for BIC (Wang, Li and Tsai 2007; Liu, Wang and Liang 2011) and GCV (Fan and Li 2001; Xie, Pan and Shen 2008) among others.

In the same spirit, we propose a penalized pseudo-likelihood (PPL) procedure for implementing BIC (3.4) with survey data. Specifically, following the pseudo-likelihood (3.2) with $k = n/N$, we define the survey-weighted penalized estimator $\hat{\boldsymbol{\beta}}_\lambda$ as the maximizer of the penalized pseudo-likelihood function

$$Q_n(\boldsymbol{\beta}) = l_n(\boldsymbol{\beta}) - n \sum_{j=1}^{p} \phi_\lambda(|\beta_j|), \qquad (3.5)$$

where $\phi_\lambda(\cdot)$ is a penalty function indexed by a tuning parameter $\lambda$ that controls the size of the penalty. With an appropriate choice of $\phi_\lambda(\cdot)$, $\hat{\boldsymbol{\beta}}_\lambda$ contains zero estimates for some coefficients and thus automatically produces a sparse model.
The desired sparsity of $\hat{\boldsymbol{\beta}}_\lambda$ typically requires $\phi_\lambda(\cdot)$ to be singular at the origin. Popular choices of $\phi_\lambda(\cdot)$ include the $L_\gamma$ penalty (Frank and Friedman 1993; Tibshirani 1996), i.e., $\phi_\lambda(|\beta|) = \lambda |\beta|^{\gamma}$ with $\gamma \in (0, 1]$, and the SCAD penalty (Fan and Li 2001), which is defined through its derivative:

$$\phi'_\lambda(|\beta|) = \lambda \left\{ I(|\beta| \le \lambda) + \frac{(a\lambda - |\beta|)_+}{(a - 1)\lambda}\, I(|\beta| > \lambda) \right\} \qquad (3.6)$$

with $a = 3.7$ being a common choice.
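As an illustration only, the two penalties can be evaluated numerically as in the sketch below. The function names (`l_gamma_penalty`, `scad_derivative`, `scad_penalty`) are hypothetical and not part of the paper. The closed form in `scad_penalty` is an added step, obtained by integrating (3.6) from 0 to $|\beta|$ and assuming $a > 2$ and $\lambda > 0$.

```python
import numpy as np


def l_gamma_penalty(beta, lam, gamma=1.0):
    """L_gamma penalty: lam * |beta|**gamma, with gamma in (0, 1]."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return lam * np.abs(beta) ** gamma


def scad_derivative(beta, lam, a=3.7):
    """SCAD derivative (3.6), evaluated at |beta|."""
    b = np.abs(np.asarray(beta, dtype=float))
    # Second term of (3.6): (a*lam - |beta|)_+ / ((a - 1) * lam), used when |beta| > lam.
    tail = np.maximum(a * lam - b, 0.0) / ((a - 1.0) * lam)
    return lam * np.where(b <= lam, 1.0, tail)


def scad_penalty(beta, lam, a=3.7):
    """Closed form obtained by integrating (3.6) from 0 to |beta| (added here)."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                      # |beta| <= lam
    quadratic = (2.0 * a * lam * b - b**2 - lam**2) / (2.0 * (a - 1.0))
    constant = (a + 1.0) * lam**2 / 2.0                   # |beta| > a * lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))
```

The derivative equals $\lambda$ near the origin, which is the singularity that produces sparsity, and it vanishes for $|\beta| > a\lambda$, so large coefficients are not shrunk further.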

With different values of $\lambda$ for a properly specified $\phi_{\lambda}(\cdot)$, $\hat{\boldsymbol{\beta}}_{\lambda}$ leads to models of differing sparsity. These sparse models (with respect to $\lambda$) naturally form a collection of candidate models. BIC (3.4) can then be used to select an optimal model within this collection.
To be more specific, let $\Omega$ be the range of $\lambda$ and let $s_{\lambda}$ denote the model produced by $\hat{\boldsymbol{\beta}}_{\lambda}$. We treat $S_{\Omega} = \{ s_{\lambda} : \lambda \in \Omega \}$ as the collection of candidate models under consideration, and select the model $s^{*} \in S_{\Omega}$ such that $\text{BIC}_{n}(s^{*}) = \min_{\lambda \in \Omega} \text{BIC}(s_{\lambda})$. We refer to this selection procedure as the penalized pseudo-likelihood-based BIC method (PPL-BIC).
Compared with traditional best-subset selection, which may have to examine up to $2^{p}$ candidate models, the PPL-BIC procedure considers only the models produced by the survey-weighted penalized estimators over $\lambda \in \Omega$, and it can therefore be much less computationally expensive.
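The selection loop described above can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: a linear working model without intercept, the $L_{1}$ penalty ($\gamma = 1$) in place of SCAD, a simple coordinate-descent solver, and a stand-in criterion of the form $n \log(\text{weighted MSE}) + \tau(s) \log n$ computed after refitting on the selected covariates, which may differ from BIC (3.4). All function names, the simulated data and the weights are hypothetical.

```python
import numpy as np


def weighted_lasso(X, y, w, lam, n_iter=200):
    """Coordinate descent for 0.5 * sum_i w_i (y_i - x_i'b)^2 + lam * sum_j |b_j|."""
    p = X.shape[1]
    beta = np.zeros(p)
    resid = y.astype(float)
    col_norm = (w[:, None] * X**2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]           # drop coordinate j from the fit
            rho = np.sum(w * X[:, j] * resid)    # weighted partial correlation
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_norm[j]
            resid -= X[:, j] * beta[j]           # add the updated coordinate back
    return beta


def weighted_bic(X, y, w, support):
    """Assumed stand-in for BIC (3.4): n * log(weighted MSE) + tau(s) * log(n)."""
    n = len(y)
    resid = y
    if support:
        Xs = X[:, support]
        sw = np.sqrt(w)
        # Survey-weighted least squares refit on the selected covariates.
        coef, *_ = np.linalg.lstsq(Xs * sw[:, None], y * sw, rcond=None)
        resid = y - Xs @ coef
    mse = np.sum(w * resid**2) / np.sum(w)
    return n * np.log(mse) + len(support) * np.log(n)


def ppl_bic(X, y, w, lambdas):
    """Return (score, lambda, support) minimizing the criterion over the lambda grid."""
    best = None
    for lam in lambdas:
        beta = weighted_lasso(X, y, w, lam)
        support = tuple(int(j) for j in np.flatnonzero(beta))
        score = weighted_bic(X, y, w, list(support))
        if best is None or score < best[0]:
            best = (score, lam, support)
    return best


# Simulated example: only covariates 0 and 3 carry signal.
rng = np.random.default_rng(0)
n, p = 300, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)
w = rng.uniform(1.0, 5.0, size=n)   # made-up survey weights
w *= n / w.sum()                    # normalized to sum to n
score, lam, support = ppl_bic(X, y, w, np.geomspace(1000.0, 1.0, 25))
print("selected lambda:", lam, "selected covariates:", support)
```

With a grid of 25 values of $\lambda$, at most 25 candidate models are scored, whereas best-subset selection over $p = 8$ covariates would score $2^{8} = 256$.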
