Bayesian inference for a variance component model using pairwise composite likelihood with survey data
Section 1. Introduction

Multi-stage survey designs are used in many population-based surveys. Increasingly, multilevel models have been used to make inferences when data are obtained from a multi-stage survey.

Desiring to improve such inferences Rao, Verret and Hidiroglou (2013) (RVH) proposed using a weighted log pairwise composite likelihood approach. There is an extensive literature on composite likelihoods: see review papers by Varin (2008), Varin, Reid and Firth (2011) and Yi (2017), and many applications. In their Section 4 RVH describe a unified approach applicable to both linear and generalized linear models. Important aspects of their work include (a) obtaining design-consistent point estimates of mean and regression parameters and variance components, and (b) using only first-order inclusion probabilities and second-order probabilities within clusters. In particular, RVH work in (a) is important because of design inconsistency when the number of clusters (first-stage units) grows while the cluster sample sizes remain small (Pfeffermann, Skinner, Holmes, Goldstein and Rasbash, 1998). Unlike the pseudo-likelihood approach in common use (Rabe-Hesketh and Skrondal, 2006) their method ensures that (a) holds for outcomes from generalized linear models. The research in RVH has been extended by Yi, Rao and Li (2016) (YRL), who provide a more general framework, additional theory and extensive simulations.

Two related developments have led to our research. First, there is increasing interest in the use of Bayesian methods for inferences from survey data. Section 5 has a general reference together with an introduction to papers describing extensive use of Bayesian methods at the National Agricultural Statistical Service of the US Department of Agriculture. Second, there is (Bayesian) literature demonstrating the possibility of overstated precision by using unadjusted composite likelihoods, e.g., Ribatet, Cooley and Davison (2012) (RCD) and Stoehr and Friel (2018).

Our approach is to start with a posterior distribution taken proportional to the product of a composite likelihood and prior distribution. Comparing this approximate posterior distribution with one using the full likelihood, we show that inferences based on the approximate posterior exhibit overstated precision. Making adjustments to the posterior distribution based on the composite likelihood as in RCD, we then use simulations to compare the three ways of formulating a posterior distribution, i.e., those based on the full, composite and adjusted composite likelihoods. This is done by visual evaluation of the graphs of the posterior densities and coverages (over repeated simulations) of 95% credible intervals for the model parameters.

The methodology is described in Section 2.3. The adjustments to the approximate posterior distribution based on a composite likelihood are derived from a transformation of the logarithm of the composite likelihood at its mode, designed so that the negative of the inverse of the curvature matrix of the approximate posterior density at its mode will match the corresponding posterior variance-covariance matrix of the parameters. This is similar to the property in frequentist inference that the inverse of the observed Fisher information matrix (negative of the Hessian of the log likelihood at its mode) estimates the variance-covariance matrix of the maximum likelihood estimates.

To focus on the main issue we use a “noninformative” prior distribution for the parameters of our model, described below. Then the corresponding posterior density is close to the normalized likelihood, and advances shown in a Bayesian context would also be seen in a frequentist model-based approach.

To simplify the initial investigation a standard linear random effects (intercept only) superpopulation model is assumed. Consider a survey population drawn from that superpopulation and composed of a large number N MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGobaaaa@3280@  of clusters, each of a common size, say, m. MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGTbGaaiOlaaaa@3351@  Let Y ij MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGzbWaaSbaaSqaaiaadMgacaWGQb aabeaaaaa@3494@  denote the continuous response variable for elementary unit j MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGQbaaaa@329C@  in cluster i MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGPbaaaa@329B@  with i=1,,N MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGPbGaaGjbVlaai2dacaaMe8UaaG ymaiaaiYcacaaMe8UaeSOjGSKaaGilaiaaysW7caWGobaaaa@3DB2@  and j=1,,m. MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGQbGaaGjbVlaai2dacaaMe8UaaG ymaiaaiYcacaaMe8UaeSOjGSKaaGilaiaaysW7caWGTbGaaiOlaaaa @3E84@  Then we write

Y ij =θ+ u i + e ij (1.1) MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGzbWaaSbaaSqaaiaadMgacaWGQb aabeaakiaaysW7caaMe8UaaGypaiaaysW7caaMe8UaeqiUdeNaaGjb VlabgUcaRiaaysW7caWG1bWaaSbaaSqaaiaadMgaaeqaaOGaaGjbVl abgUcaRiaaysW7caWGLbWaaSbaaSqaaiaadMgacaWGQbaabeaakiaa ywW7caaMf8UaaGzbVlaaywW7caaMf8UaaiikaiaaigdacaGGUaGaaG ymaiaacMcaaaa@55A8@

where u i ~N( 0, σ u 2 ), e ij ~N( 0, σ e 2 ), MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWG1bWaaSbaaSqaaiaadMgaaeqaaO GaaGjbVJqaaiaa=5hacaaMe8UaamOtaiaaykW7daqadeqaaiaaicda caaISaGaaGjbVlabeo8aZnaaDaaaleaacaWG1baabaGaaGOmaaaaaO GaayjkaiaawMcaaiaaiYcacaaMe8UaamyzamaaBaaaleaacaWGPbGa amOAaaqabaGccaaMe8Uaa8NFaiaaysW7caWGobGaaGPaVpaabmqaba GaaGimaiaaiYcacaaMe8Uaeq4Wdm3aa0baaSqaaiaadwgaaeaacaaI YaaaaaGccaGLOaGaayzkaaGaaiilaaaa@5710@  all u i MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWG1bWaaSbaaSqaaiaadMgaaeqaaa aa@33C1@  and e ij MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGLbWaaSbaaSqaaiaadMgacaWGQb aabeaaaaa@34A0@  are independent, and θ, MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacqaH4oqCcaGGSaaaaa@3413@   σ u MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacqaHdpWCdaWgaaWcbaGaamyDaaqaba aaaa@3496@ , and σ e MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacqaHdpWCdaWgaaWcbaGaamyzaaqaba aaaa@3486@  are parameters.

We also begin by assuming that the survey sampling design is a simple random sample of n MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGUbaaaa@32A0@  clusters, where n MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGUbaaaa@32A0@  is a positive integer. This has the advantage that the model (1.1) holds not only for the superpopulation and the finite population but also (replacing N MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGobaaaa@3280@  by n) MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacaWGUbGaaiykaaaa@334D@  for the sample itself, arising from generation of the population followed by the selection of the sample using the sampling design. It ensures that the likelihood function to be used in Bayesian inference is well defined. As well, it can be shown that Bayesian inference from the sample for the parameters of model (1.1) would be interpretable also in terms of the frequentist theory for analytic uses of survey data (Skinner, Holt and Smith, 1989).

Our work is valuable because we show the perils of using an unadjusted pairwise composite likelihood to form an approximate posterior distribution for inference even in this very simple and straightforward case. Extensions to unequal probability sampling designs are discussed in Section 4.

The proposed adjustment leads to excellent frequentist properties for inference about the mean θ. MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacqaH4oqCcaGGUaaaaa@3415@  The posterior mean of θ MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacqaH4oqCaaa@3363@  has low bias in the frequency sense, and the frequentist coverage of credible intervals aligns with the nominal levels. For σ u MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaebbnrfifHhDYfgasaacH8rrps0l bbf9q8WrFfeuY=Hhbbf9v8qqaqFr0xc9pk0xbba9q8WqFfea0=yr0R Yxir=Jbba9q8aq0=yq=He9q8qqQ8frFve9Fve9Ff0dmeaabaqaciGa caGaaeqabaGaaiaadaaakeaacqaHdpWCdaWgaaWcbaGaamyDaaqaba aaaa@3496@  it provides a significant improvement over using the unadjusted composite likelihood. However, the coverage falls short of the nominal level, leading to the need for additional research about how to modify the adjustment.

The rest of the paper is structured as follows. Section 2 provides the definitions of the full, composite and adjusted composite likelihoods and the prior distributions. There follows a description of the curvature adjustment and the reasons for its use. The simulation studies are described in Section 3 including the model, prior distributions, sample sizes and their settings, number of replications, etc. This section also describes how the results are displayed together with a summary of our findings. Extensions to unequal probability sampling cases are discussed in Section 4. Conclusions are in Section 5.


Date modified: