3 Pseudo-likelihood-based selection with BIC
Chen Xu, Jiahua Chen and Harold Mantel
3.1 BIC in surveys
With the model settings described in Section 2, it is
clear that, if the measurement $(y_i, \mathbf{x}_i)$ is observed for every unit in the population $\mathcal{U}$, the randomness in the data introduced by the
probability sampling design is completely gone. In this situation, the
selection of the influential variables is based on the entire population and
the classical selection criteria developed in non-survey settings (purely
model-based) remain valid for model-design-based inference. In particular, let $s$ be an arbitrary set of covariates, which corresponds to a candidate
model in the form of (2.1). The "census-based" BIC (Schwarz 1978) selects the model
(covariates) that minimizes
$$\mathrm{BIC}_N(s) = -2\,\ell_N(\hat{\boldsymbol{\beta}}_{N,s}) + |s| \log N, \qquad (3.1)$$
where $\ell_N(\boldsymbol{\beta}) = \sum_{i \in \mathcal{U}} \log f(y_i; \mathbf{x}_i, \boldsymbol{\beta})$ is the census log-likelihood function, with $f$ the model density in (2.1), $\hat{\boldsymbol{\beta}}_{N,s}$ is the maximizer of $\ell_N(\boldsymbol{\beta})$ based on $s$, $|s|$ is the number of covariates in $s$, and $N$ is the number of units in $\mathcal{U}$. It can be seen that the BIC (3.1) is a
decreasing function of the maximized log-likelihood and an increasing function
of the number of variables included in the model. Hence, a lower BIC implies
either a simpler model (fewer explanatory variables), a better fit (higher
maximized likelihood), or both. A model with balanced complexity and goodness
of fit is preferred.
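The trade-off in (3.1) can be made concrete with a minimal sketch. The function name `bic` and the log-likelihood values below are hypothetical and serve only to illustrate how the two terms interact; they are not taken from any data set in this paper.

```python
import math

def bic(max_loglik: float, n_covariates: int, n_units: int) -> float:
    """Schwarz-type criterion: -2 * maximized log-likelihood + model size * log(n_units)."""
    return -2.0 * max_loglik + n_covariates * math.log(n_units)

# Hypothetical maximized log-likelihoods for two candidate models on 1000 units.
small = bic(max_loglik=-520.0, n_covariates=2, n_units=1000)
large = bic(max_loglik=-518.5, n_covariates=5, n_units=1000)

# The larger model fits slightly better, but not by enough to offset
# the penalty for its three extra covariates.
preferred = "small" if small < large else "large"
```

In this illustration the gain of 1.5 in log-likelihood is outweighed by the additional penalty of $3 \log 1000$, so the smaller model is preferred.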
We note that the census BIC (3.1) is conceptual,
because observing $(y_i, \mathbf{x}_i)$ for all units in $\mathcal{U}$ is usually not feasible in applications.
Instead, a representative sample $d$ with $n$ units is often drawn from $\mathcal{U}$, and the measurements $(y_i, \mathbf{x}_i)$ are observed only for the
sampled units. Due to the intrinsic dependence structure among the sampled
units, a full likelihood on $d$ is prohibitive to compute in general.
Alternatively, for model-design-based inference, a pseudo-log-likelihood
function is frequently used, which takes the form
$$\ell_n(\boldsymbol{\beta}) = \tau \sum_{i \in d} w_i \log f(y_i; \mathbf{x}_i, \boldsymbol{\beta}), \qquad (3.2)$$
with $w_i$ denoting the survey weight for the $i$th unit. The scaling parameter $\tau$ in $\ell_n(\boldsymbol{\beta})$ does not have analytical impacts on the
pseudo-likelihood-based inference. For simplicity of presentation, we
choose $\tau$ such that $\ell_n(\boldsymbol{\beta})$ is design-unbiased for the census log-likelihood. Maximizing $\ell_n(\boldsymbol{\beta})$ over $\boldsymbol{\beta}$ leads to a maximum pseudo-likelihood estimator
(MPLE) for $\boldsymbol{\beta}$, i.e.,
$$\hat{\boldsymbol{\beta}}_n = \arg\max_{\boldsymbol{\beta}} \ell_n(\boldsymbol{\beta}).$$
Under appropriate sampling designs, $\hat{\boldsymbol{\beta}}_n$ is often consistent for $\boldsymbol{\beta}$ under the joint randomization framework. The
idea of using pseudo-likelihood for inference on model parameters has been
widely adopted in the literature (see, e.g.,
Binder 1983; Godambe and Thompson 1986; Molina and Skinner 1992).
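As a minimal sketch of the pseudo-likelihood idea, consider the simplest possible working model, a Bernoulli response with a single success probability and no covariates. This model, the function names and the toy data are assumptions made purely for illustration. In this case the MPLE has a closed form, the survey-weighted mean, and it does not depend on the scaling of the weights.

```python
import math

def pseudo_loglik(p, y, w, tau=1.0):
    """Survey-weighted Bernoulli log-likelihood, scaled by tau."""
    return tau * sum(
        wi * (yi * math.log(p) + (1 - yi) * math.log(1 - p))
        for yi, wi in zip(y, w)
    )

def mple_bernoulli(y, w):
    """Closed-form maximizer of pseudo_loglik: the weighted mean of y."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)

# Toy sample: units with y = 1 were over-sampled, so they carry small weights.
y = [1, 0, 0, 1, 1]
w = [10.0, 40.0, 40.0, 5.0, 5.0]

p_hat = mple_bernoulli(y, w)   # weighted estimate
p_naive = sum(y) / len(y)      # unweighted sample mean, ignores the design
```

Here the unweighted mean is 0.6 while the weighted estimate is 0.2, which shows how ignoring the weights can distort inference when inclusion probabilities are related to the response. Multiplying the pseudo-log-likelihood by any positive `tau` leaves the maximizer unchanged.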
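In this paper, we aim to develop an analogue of the BIC
based on the pseudo-likelihood. Following the super-population
formulation described in Section 2, let $\boldsymbol{\beta}_s$ be the $|s|$-dimensional coefficient of model $s$ and let $\pi_s(\boldsymbol{\beta}_s)$ be the prior density of $\boldsymbol{\beta}_s$. Then a pseudo-marginal density function of the
data is given by
$$m(d \mid s) = \int L_n(\boldsymbol{\beta}_s)\, \pi_s(\boldsymbol{\beta}_s)\, d\boldsymbol{\beta}_s,$$
with $L_n(\boldsymbol{\beta}_s) = \exp\{\ell_n(\boldsymbol{\beta}_s)\}$. Consequently, we may regard the following
expression as the pseudo-posterior probability of the model $s$:
$$P(s \mid d) = \frac{P(s)\, m(d \mid s)}{\sum_{s' \in \mathcal{S}} P(s')\, m(d \mid s')},$$
where $P(s)$ is the prior probability of model $s$ and $\mathcal{S}$ denotes the collection of all candidate
models. In the spirit of Bayesian analysis, the model with the highest $P(s \mid d)$ is then considered to be the one that receives
the most support from the data. Since the denominator $\sum_{s' \in \mathcal{S}} P(s')\, m(d \mid s')$ does not depend on any specific model, the
highest $P(s \mid d)$ is achieved by the model that maximizes the
corresponding $P(s)\, m(d \mid s)$. When the uniform prior is used and the weight scaling is chosen as $\tau = n/N$, we obtain a Laplace approximation under some
regularity conditions (see Xu and Chen 2012):
$$-2 \log m(d \mid s) \approx -2\,\ell_n(\hat{\boldsymbol{\beta}}_{n,s}) + |s| \log n,$$
where $\hat{\boldsymbol{\beta}}_{n,s}$ is the MPLE of $\boldsymbol{\beta}_s$ under model $s$. Accordingly, we choose the model that minimizes
$$\mathrm{BIC}_n(s) = -2\,\ell_n(\hat{\boldsymbol{\beta}}_{n,s}) + |s| \log n. \qquad (3.4)$$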
Compared with the census BIC (3.1), the first term
in BIC (3.4) is built from the maximized survey-weighted pseudo-log-likelihood, which is
potentially helpful for avoiding sampling errors that might lead to biased
inferences for the target population. We refer to (3.4) as a pseudo-likelihood-based
version of BIC in the context of surveys. In the joint randomization framework,
we establish the selection consistency of BIC (3.4) through a penalized pseudo-likelihood (PPL)-based
implementation procedure, as will be seen in Section 4.
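The following minimal sketch evaluates a criterion of the form (3.4) for two candidate models on toy data. Several simplifying assumptions are made for illustration only: a Gaussian working density with unit variance, a single covariate, weights rescaled by the sample size over the population size, and a model size that counts the covariate but not the intercept. The function names and the data are hypothetical.

```python
import math

def weighted_fit(x, y, w, use_x):
    """Survey-weighted least squares for y = b0 (+ b1 * x); returns fitted means."""
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    if not use_x:
        return [ybar] * len(y)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return [ybar + b1 * (xi - xbar) for xi in x]

def pseudo_bic(x, y, w, use_x, pop_size):
    """-2 * maximized pseudo-log-likelihood + (number of covariates) * log(n)."""
    n = len(y)
    tau = n / pop_size  # rescales the weights to the sample-size scale
    mu = weighted_fit(x, y, w, use_x)
    # Gaussian working density with unit variance (an assumption of this sketch).
    loglik = tau * sum(
        wi * (-0.5 * math.log(2 * math.pi) - 0.5 * (yi - mi) ** 2)
        for wi, yi, mi in zip(w, y, mu)
    )
    k = 1 if use_x else 0
    return -2.0 * loglik + k * math.log(n)

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 1.2, 1.9, 3.2, 3.9, 5.1]
w = [10.0, 10.0, 10.0, 30.0, 30.0, 30.0]
pop_size = sum(w)

bic_without_x = pseudo_bic(x, y, w, use_x=False, pop_size=pop_size)
bic_with_x = pseudo_bic(x, y, w, use_x=True, pop_size=pop_size)
```

Because the response is almost linear in the covariate, the improvement in the weighted fit far exceeds the penalty of $\log n$, and the model that includes the covariate attains the smaller criterion value.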
3.2 Implementing BIC via penalized pseudo-likelihood
In applications, a straightforward way to implement BIC
is best-subset selection, where BIC is evaluated and compared for each
candidate model. However, this procedure can be computationally impractical
when the number of covariates is large. Alternatively, penalized likelihood
methods have recently been used as computationally efficient procedures for
implementing a selection criterion. These methods exclude variables from the
model by estimating their coefficients to be zero, and shrink the other
coefficients accordingly. By varying the penalty on the likelihood, we can
obtain a series of models with differing sparsity. To avoid an exhaustive
search of the entire model space, the selection criterion is used to pick an
optimal one among these sparse models. The effectiveness of this implementation
strategy has been illustrated in the non-survey context for BIC (Wang, Li and
Tsai 2007; Liu, Wang and Liang 2011) and GCV (Fan and Li 2001; Xie, Pan and
Shen 2008) among others.
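To see why best-subset selection scales poorly, note that $p$ covariates generate $2^p$ candidate models. The short sketch below enumerates them for a small, hypothetical covariate list and counts them for a moderately larger one.

```python
from itertools import combinations

def all_subsets(covariates):
    """Yield every subset of the covariate list, including the empty model."""
    for size in range(len(covariates) + 1):
        yield from combinations(covariates, size)

# Four covariates already give 16 candidate models.
candidates = list(all_subsets(["x1", "x2", "x3", "x4"]))

# With 30 covariates the count exceeds one billion, so evaluating a
# criterion on every model quickly becomes impractical.
n_models_30 = 2 ** 30
```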
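Sharing the same spirit, we propose a penalized
pseudo-likelihood (PPL) procedure for the implementation of BIC (3.4) for
survey data. Specifically, following pseudo-likelihood (3.2) with $\tau = n/N$, we define the survey-weighted penalized
estimator $\hat{\boldsymbol{\beta}}_\lambda$ that maximizes the penalized pseudo-likelihood
function
$$\tilde{\ell}_n(\boldsymbol{\beta}) = \ell_n(\boldsymbol{\beta}) - n \sum_{j} p_\lambda(|\beta_j|),$$
where $p_\lambda(\cdot)$ is a penalty function indexed by a tuning
parameter $\lambda \ge 0$ controlling the size of the penalty. With an
appropriate choice of $\lambda$, $\hat{\boldsymbol{\beta}}_\lambda$ contains zero estimates for some coefficients
and thus automatically produces a sparse model. The desirable sparsity of $\hat{\boldsymbol{\beta}}_\lambda$ typically requires the singularity of the
corresponding $p_\lambda(\cdot)$ at the origin. Some popular choices of $p_\lambda(\cdot)$ include the $L_\gamma$ penalty (Frank and Friedman 1993; Tibshirani
1996), i.e., $p_\lambda(|\beta|) = \lambda |\beta|^\gamma$ with $0 < \gamma \le 1$, and the SCAD penalty (Fan and Li 2001), which
is defined by the following derivative:
$$p'_\lambda(\theta) = \lambda \left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a - 1)\lambda}\, I(\theta > \lambda) \right\}, \qquad \theta > 0,\ a > 2,$$
with $a = 3.7$ being a common choice.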
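For reference, the SCAD derivative of Fan and Li (2001) can be coded in a few lines. The function name below is hypothetical. The sketch shows the three regimes of the penalty: a constant rate for small coefficients, a linearly decaying rate for intermediate ones, and no penalization of large ones.

```python
def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the SCAD penalty at |theta|, for lam > 0 and a > 2."""
    theta = abs(theta)
    if theta <= lam:
        # Same constant rate as the L1 penalty near the origin.
        return lam
    # Rate decays linearly and reaches zero once |theta| >= a * lam.
    return max(a * lam - theta, 0.0) / (a - 1.0)

small_coef = scad_derivative(0.5, lam=1.0)   # constant-rate region
mid_coef = scad_derivative(2.35, lam=1.0)    # decaying region
large_coef = scad_derivative(5.0, lam=1.0)   # unpenalized region
```

The vanishing derivative for large coefficients is what allows SCAD to select variables without over-shrinking strong effects, in contrast to a penalty whose rate stays constant.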
With different values of $\lambda$, $\hat{\boldsymbol{\beta}}_\lambda$ for a properly specified $p_\lambda(\cdot)$ leads to models of differing sparsity. These
sparse models (with respect to $\lambda$) naturally form a collection of candidate
models. BIC (3.4) can then be used to select an optimal model within this
collection. To be more specific, let $\Lambda$ be the range of $\lambda$ and let $s_\lambda$ denote the model produced by $\hat{\boldsymbol{\beta}}_\lambda$. We treat $\{s_\lambda : \lambda \in \Lambda\}$ as the collection of candidate models under
consideration, and select the model $s_{\lambda^*}$ such that $\lambda^* = \arg\min_{\lambda \in \Lambda} \mathrm{BIC}_n(s_\lambda)$. We refer to this selection procedure as the
penalized pseudo-likelihood-based BIC method (PPL-BIC). Compared with
traditional best-subset selection, the PPL-BIC procedure focuses on the models
that are produced by the survey-weighted penalized estimators, and therefore it
can be much less computationally expensive.
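The selection procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this paper: it assumes a Gaussian working model with unit variance and no intercept, uses the $L_1$ penalty in place of SCAD, fits by plain coordinate descent, and rescales the weights by $n/N$. All function names and the simulated data are hypothetical.

```python
import math
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_l1(X, y, w, lam, active=None, sweeps=100):
    """Coordinate descent minimizing
    0.5 * sum_i w_i * (y_i - x_i'b)^2 + n * lam * sum_j |b_j|.
    `active` restricts which coefficients may be nonzero (used for refitting)."""
    n, p = len(y), len(X[0])
    active = list(range(p)) if active is None else active
    beta = [0.0] * p
    resid = list(y)  # residuals for the starting value beta = 0
    for _ in range(sweeps):
        for j in active:
            # Weighted inner product of column j with the partial residual.
            z = sum(w[i] * X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            denom = sum(w[i] * X[i][j] ** 2 for i in range(n))
            new = soft_threshold(z, n * lam) / denom
            if new != beta[j]:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - beta[j])
                beta[j] = new
    return beta

def pseudo_bic(X, y, w, support):
    """-2 * pseudo-log-likelihood at the unpenalized refit + |support| * log(n);
    additive constants shared by all models are dropped."""
    n = len(y)
    beta = fit_l1(X, y, w, lam=0.0, active=support)
    wrss = sum(w[i] * (y[i] - sum(X[i][j] * beta[j] for j in support)) ** 2
               for i in range(n))
    return wrss + len(support) * math.log(n)

def ppl_bic(X, y, design_w, pop_size, lambdas):
    """Return (criterion value, lambda, support) minimizing the criterion on the path."""
    n = len(y)
    w = [n / pop_size * wi for wi in design_w]  # weights rescaled to sum to about n
    best = None
    for lam in lambdas:
        beta = fit_l1(X, y, w, lam)
        support = [j for j, b in enumerate(beta) if b != 0.0]
        score = pseudo_bic(X, y, w, support)
        if best is None or score < best[0]:
            best = (score, lam, support)
    return best

# Simulated toy data: covariates 0 and 2 are influential, 1 and 3 are not.
rng = random.Random(0)
n, p = 80, 4
true_beta = [2.0, 0.0, -1.5, 0.0]
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(b * v for b, v in zip(true_beta, row)) + rng.gauss(0, 0.3) for row in X]
design_w = [rng.uniform(5, 50) for _ in range(n)]
lambdas = [3.0 * 0.8 ** k for k in range(30)]

score, lam, support = ppl_bic(X, y, design_w, sum(design_w), lambdas)
```

Only the 30 models on the penalization path are evaluated, rather than all $2^4$ subsets. The saving becomes substantial as the number of covariates grows. Refitting each selected support without penalty before evaluating the criterion keeps the comparison in line with the definition of BIC (3.4), which is based on the MPLE of each candidate model.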