Bayesian inference for a variance component model using pairwise composite likelihood with survey data
Section 4. Extension to unequal probability sampling designs
An important extension of our setting is to a complex
sampling framework, where frequentist parameter estimation through estimation
of a population-level pairwise composite likelihood is now in fairly common
use. RVH and YRL have shown that an approach based on applying a frequentist
pairwise composite likelihood works well for estimating multilevel model
variance components in the case of certain unequal probability sampling
designs, and avoids the issue of inconsistency when the second stage sample
sizes are small. The uncertainty estimation in this approach uses estimating
function theory and may not require the adjustments we consider in this paper.
However, it would be desirable to formulate a Bayesian counterpart of this
method. If a Bayesian formulation were agreed upon, the results of our paper
would predict a need for adjustment of the
pseudo-log-pairwise-composite-likelihood to align it with an appropriate log
full likelihood function.
Suppose that the purpose is still analytic, that the
model for the outcome variable is (1.1), and that the objects of inference are the
mean and the variance component or its square root. The survey population consists of first-stage units, each with its own size, and the first-stage sample consists of a specified number of these, selected with an unequal probability
sampling design. At the second stage, elementary units are selected by simple random
sampling from each first-stage unit that was
sampled at the first stage. If the sizes and the sampling design probabilities (defined over the two-stage subsets of the
population that satisfy the sample size specifications) do not depend on the values of the random variables in model (1.1), the likelihood function can be taken
to be of the form of (2.3), with the appropriate substitution, and the extension of our work is
straightforward in principle. However, if the sizes or sampling design
probabilities do depend on those values, they will be informative about the parameters
of interest. The sample-level likelihood function from the combination of multilevel
model and sampling design may be ill-defined or intractable. From a Bayesian
perspective we then need to consider what can reasonably substitute for the
true likelihood, and how closely that substitute can be approximated by an
adjusted pairwise composite likelihood. The answers may depend upon the
preferred method of using the sampling design probabilities in inference, and
there are several possibilities. Pursuing these possibilities would be a
fruitful avenue for future research.
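As a minimal sketch of the kind of two-stage design described above, the following Python code draws clusters with probability proportional to size by systematic sampling and then takes a simple random sample within each selected cluster. The names `N`, `n`, `m`, `M`, `w_cluster` and `w_unit`, the cluster sizes, and the choice of systematic sampling are all assumptions made for illustration; they are not taken from the paper. In this sketch the sizes are generated independently of any outcome, so the design is not informative.

```python
import numpy as np

def systematic_pps(pi, rng):
    """Systematic unequal probability sampling: unit i is drawn with probability pi[i].

    Assumes every pi[i] <= 1 and that sum(pi) equals the integer sample size.
    """
    cum = np.cumsum(pi)
    n = int(round(cum[-1]))
    points = rng.uniform() + np.arange(n)        # one random start, then unit steps
    idx = np.searchsorted(cum, points, side="right")
    return np.minimum(idx, len(pi) - 1)          # guard against rounding at the top end

rng = np.random.default_rng(2022)
N, n, m = 50, 8, 4                       # hypothetical: 50 clusters, 8 sampled, 4 units each
M = rng.integers(20, 200, size=N)        # hypothetical cluster sizes

pi = n * M / M.sum()                     # first-stage inclusion probabilities
if pi.max() > 1:
    raise ValueError("a cluster is too large for probability proportional to size")

sample = {}
for i in systematic_pps(pi, rng):
    # Second stage: simple random sample of m elementary units from cluster i.
    units = rng.choice(int(M[i]), size=m, replace=False)
    sample[int(i)] = {
        "units": units,
        "w_cluster": 1.0 / pi[i],        # first-stage weight
        "w_unit": M[i] / m,              # second-stage weight under simple random sampling
    }
```

Because the first-stage probabilities are proportional to size here, the weighted count of elementary units reproduces the population size exactly, which is a convenient check on the weights.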
One method, with limited applicability, would be based
on the approach of Léon-Novelo and Savitsky (2019). Assuming single-stage
Bernoulli sampling (so that the sampling probabilities are fully determined by
the inclusion probabilities), they model the joint distribution of the outcome
variable and the inclusion probability, using the population model that generates the outcome and a model that generates the inclusion probability conditional on the outcome. To make computations feasible there are
restrictions on the form of this model; see their Theorem 1 and,
especially, the special case in their Section 2.1.
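The idea behind such a joint model can be written out with Bayes' rule. The notation here is assumed for illustration and is not taken from the paper: $y_i$ is the outcome, $\pi_i$ the inclusion probability, $\delta_i$ the sample inclusion indicator, $\theta$ the parameters of the population model, and $\kappa$ the parameters of the model for the inclusion probability. Any further conditioning variables are suppressed. Under Bernoulli sampling with $P(\delta_i = 1 \mid y_i, \pi_i) = \pi_i$, the joint density for a sampled unit is

```latex
\[
p(y_i, \pi_i \mid \delta_i = 1, \theta, \kappa)
  = \frac{\pi_i \, p(\pi_i \mid y_i, \kappa) \, p(y_i \mid \theta)}
         {\displaystyle \iint \pi \, p(\pi \mid y, \kappa) \, p(y \mid \theta) \, d\pi \, dy } .
\]
```

The denominator is the marginal probability of inclusion, and it is this normalizing integral that appears to drive the restrictions on the form of the model: it must be available in closed form, or be cheap to evaluate, for posterior computation to be feasible.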
We can extend the model in Section 2.1 of
Léon-Novelo and Savitsky (2019) to two-stage cluster sampling. A further
extension, i.e., replacing the sampling density of the outcome with a pairwise composite likelihood analogous
to the likelihood part of (2.6), can be made. Thus, subject to the limitations
in Theorem 1 of Léon-Novelo and Savitsky (2019), there are counterparts to
the posterior densities, (2.5) and (2.6), that include the inclusion
probabilities.
Another method, not fully Bayesian, but perhaps the most
widely applicable extension of our approach, is to consider the population
(census) log likelihood function ((2.5) and (2.6) of RVH) to be correct, and
formulate a corresponding census log pairwise composite likelihood function as
in our Section 2. We would then try to estimate the latter from the sample
using sampling weights ((4.2) of RVH), and make adjustments such as appropriate
weight normalization, or “scaling” as in Pfeffermann, Skinner, Holmes,
Goldstein and Rasbash (1998), and curvature adjustments to the resulting
estimated log pairwise composite likelihood function. This would produce a log
pseudo-pairwise-likelihood function that could be used as an approximate log
likelihood function in Bayesian inference. It would yield a Bayesian
counterpart to the frequentist method put forward by RVH and YRL, and would
extend the method of this paper to the unequal probability sampling situation.
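To make the second approach more concrete, here is a minimal Python sketch of a weighted log pairwise composite likelihood. It rests on assumptions that are not taken from the paper: a normal one-way random effects model in which each observation has mean `mu`, two observations in the same cluster have covariance `s2u`, and each observation has variance `s2u + s2e`; a cluster weight `w_cluster` and a within-cluster pair weight `w_pair` for each sampled cluster; and one hypothetical form of scaling in which the pair weights in a cluster are rescaled to sum to the number of sampled pairs. This is an illustration of the general idea, not the estimator in (4.2) of RVH.

```python
import numpy as np

def pair_logpdf(y1, y2, mu, s2u, s2e):
    """Log density of a within-cluster pair under the assumed bivariate normal model."""
    v, c = s2u + s2e, s2u                    # common variance and within-cluster covariance
    det = v * v - c * c
    d1, d2 = y1 - mu, y2 - mu
    quad = (v * d1 * d1 - 2.0 * c * d1 * d2 + v * d2 * d2) / det
    return -np.log(2.0 * np.pi) - 0.5 * np.log(det) - 0.5 * quad

def weighted_log_pcl(clusters, mu, s2u, s2e, scale=False):
    """Sum over sampled clusters of w_cluster * sum over pairs of w_pair * log f(pair)."""
    total = 0.0
    for cl in clusters:
        y = np.asarray(cl["y"], dtype=float)
        j, k = np.triu_indices(len(y), k=1)  # all pairs j < k within the cluster
        if len(j) == 0:
            continue                         # a single sampled unit contributes no pairs
        w_pair = np.broadcast_to(np.asarray(cl["w_pair"], dtype=float), j.shape)
        if scale:
            # Hypothetical scaling: pair weights sum to the number of sampled pairs.
            w_pair = w_pair * len(j) / w_pair.sum()
        total += cl["w_cluster"] * np.sum(w_pair * pair_logpdf(y[j], y[k], mu, s2u, s2e))
    return total

# Hypothetical data: two sampled clusters, simple random sampling within cluster,
# so each pair weight is M(M-1) / (m(m-1)) for cluster size M and sample size m.
clusters = [
    {"y": [1.0, 2.0, 0.5], "w_cluster": 4.0, "w_pair": 10 * 9 / (3 * 2)},
    {"y": [3.1, 2.4], "w_cluster": 2.5, "w_pair": 6 * 5 / (2 * 1)},
]
value = weighted_log_pcl(clusters, mu=1.5, s2u=1.0, s2e=0.5, scale=True)
```

With equal pair weights inside each cluster, as under simple random sampling, this particular scaling reduces the within-cluster weights to one, so only the cluster weights remain. Other scalings would give a different function and, in general, a different curvature.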
We have obtained some preliminary details for this
second approach. With one quantity in the model treated as known, analytic expressions for the full
likelihood and pairwise composite likelihood are available at the census level. For the partial likelihood
we alter (2.8) by taking that quantity as fixed and adding the weights as in (4.2) of RVH, with a locally uniform
prior; after some algebra this gives a closed-form expression. Similarly, we
alter (2.7) by taking the same quantity as fixed
and adding the weights, again with a locally uniform prior and again reaching a closed form after some algebra.
[The displayed equations for both cases did not survive in this copy of the text.]
Choice of the scaling of the weights will be important.
To quantify the overstated precision in the log pairwise composite posterior, a
numerical evaluation may be required.
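For intuition about the size of the overstatement, the simplest unweighted case can be worked out directly. The sketch below makes assumptions that are not taken from the paper: a normal one-way random effects model with known variances `s2u` (between clusters) and `s2e` (within clusters), `n` sampled clusters each with `m` sampled units, no weights, and a locally uniform prior for the mean. Under these assumptions the posterior precision of the mean equals the curvature of the log likelihood, and the ratio of the pairwise curvature to the full curvature shows how much the pairwise posterior overstates precision.

```python
import numpy as np

def info_full(n, m, s2u, s2e):
    """Curvature in the mean of the full normal log likelihood: n * 1' V^{-1} 1."""
    return n * m / (m * s2u + s2e)

def info_pairwise(n, m, s2u, s2e):
    """Curvature in the mean of the log pairwise composite likelihood.

    Each of the m(m-1)/2 pairs in a cluster contributes 2 / (2*s2u + s2e).
    """
    return n * m * (m - 1) / (2.0 * s2u + s2e)

def precision_ratio(m, s2u, s2e):
    """Pairwise curvature divided by full curvature (does not depend on n)."""
    return (m - 1) * (m * s2u + s2e) / (2.0 * s2u + s2e)

print(precision_ratio(2, 1.0, 1.0))   # 1.0: with two units per cluster the two agree
print(precision_ratio(5, 1.0, 1.0))   # 8.0: the pairwise posterior is far too concentrated
```

The ratio grows with the cluster sample size, which is consistent with the need for a curvature adjustment. With unequal weights and unequal cluster sample sizes no such simple closed form should be expected, and the corresponding ratio would need to be evaluated numerically.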
An advantage of
pursuing extensions of this Bayesian approach further in future research would
be that it is focused on inference for the model parameters rather than on
finite population quantities, and thus it would not be necessary to bring
third- or fourth-order inclusion probabilities into uncertainty estimation for the mean or the variance component.
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa