Bayesian small area demography
Section 1. Introduction

Demography has traditionally been a big-data and big-area discipline. Demographers have used censuses, registration data, and surveys to obtain national-level estimates and forecasts. Big sample sizes for national populations have meant that, in contrast to most of applied statistics, sampling variation is small. Demographers have instead concentrated on other problems, such as measurement errors, and developed their own techniques and terminology distinct from mainstream statistics. Classic demographic methods combine simple deterministic models with complex expert judgements. The models are simple enough to be implemented on computer spreadsheets, but require practitioners to intervene and correct for problems caused by violations of the underlying assumptions. These methods have had many successes. They have, for instance, been used to document the dramatic fall in mortality and fertility in developed countries, and have alerted policy makers to future population ageing.

Traditional demographic methods are, however, coming under strain. The reason is the rising demand for disaggregation. Policy makers, social scientists, market researchers, and other users of demographic estimates and forecasts require ever more disaggregated numbers. The United Nations 2030 Agenda for Sustainable Development, for instance, calls for increasing significantly “the availability of high-quality, timely and reliable data disaggregated by income, gender, age, race, ethnicity, migratory status, disability, geographic location and other characteristics relevant in national contexts” (United Nations General Assembly, 2015, Goal 17.18). Disaggregation is challenging for traditional demography because, even when the overall population is large, the number of people in each subpopulation can be small. With these small numbers, random variation in data collection, or in underlying demographic processes such as fertility, mortality, and migration, becomes prominent, and deterministic methods break down.

To address these problems, demographers have been turning to mainstream statistics for new ways of handling random variation. Statisticians, in turn, have been showing increasing interest in demographic applications. The result has been a boom in statistical demography (Alho and Spencer, 2006).

Demographic phenomena are often highly regular. Mortality, fertility, and migration rates, for instance, have characteristic age-sex profiles that are stable over time or that change in consistent ways. These regularities reflect common events over individuals’ life courses. Migration rates typically peak in the late teenage years, for instance, because these are the years when people reach adulthood and begin to leave home. The ability to model units that are similar but not identical is a particular strength of Bayesian methods. Bayesians build models with multiple layers that can capture multiple, overlapping types of variability. Bayesian models pool information from across similar units, to improve accuracy and precision.
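The pooling idea can be illustrated with a deliberately simplified sketch. It is not one of the models used in this paper: it assumes a conjugate gamma-Poisson model in which deaths in each area are Poisson with mean rate × exposure, and all area-level rates share a gamma prior with fixed, hypothetical hyperparameters (in a full hierarchical model these would themselves be estimated from the data). The area names, counts, and the function `pooled_rate` are invented for illustration.

```python
def pooled_rate(deaths, exposure, prior_shape, prior_rate):
    """Posterior mean of a Poisson rate under a Gamma(prior_shape, prior_rate) prior.

    With deaths ~ Poisson(rate * exposure), the posterior is
    Gamma(prior_shape + deaths, prior_rate + exposure).
    """
    return (prior_shape + deaths) / (prior_rate + exposure)


# Hypothetical hyperparameters: prior mean rate = 5 / 1000 = 0.005
prior_shape, prior_rate = 5.0, 1000.0
prior_mean = prior_shape / prior_rate

# Hypothetical data: (deaths, person-years of exposure) for two areas
areas = {"large": (600, 100000.0), "small": (1, 50.0)}

direct = {}   # direct estimates: deaths / exposure
pooled = {}   # estimates shrunk towards the shared prior mean
for name, (deaths, exposure) in areas.items():
    direct[name] = deaths / exposure
    pooled[name] = pooled_rate(deaths, exposure, prior_shape, prior_rate)

for name in areas:
    print(f"{name}: direct={direct[name]:.5f} pooled={pooled[name]:.5f}")
```

Under these assumptions, the estimate for the large area barely moves, because its own data dominate, while the estimate for the small area is pulled strongly towards the shared mean. This is the sense in which similar units lend each other information.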

Bayesian methods have other advantages for demographic modelling. They can coherently combine uncertainty from many sources, including random variation, missing data, and uncertainty about future trends. Bayesian methods also make it easy to construct inferences about derived quantities. Life expectancy, for instance, is a complicated nonlinear deterministic function of age-specific mortality rates, but within a Bayesian framework, deriving inferences about life expectancy from inference about age-specific mortality rates is straightforward.
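The point about derived quantities can be sketched as follows. The code assumes that posterior draws of age-specific mortality rates are already available; here they are replaced by simulated gamma draws around hypothetical rates, purely as a stand-in. The life table calculation is a standard abridged one that assumes deaths occur, on average, midway through each closed age interval, and the function name `life_expectancy` and all numbers are illustrative rather than taken from the paper.

```python
import random
import statistics


def life_expectancy(rates, widths):
    """Life expectancy at birth from age-specific mortality rates.

    rates:  deaths per person-year in each age group; the last group is open-ended.
    widths: widths in years of the closed age groups (len(rates) - 1 values).
    Assumes deaths fall, on average, midway through each closed interval.
    """
    survivors = 1.0      # proportion still alive at the start of the interval
    person_years = 0.0   # person-years lived, per person born
    for m, n in zip(rates[:-1], widths):
        q = n * m / (1.0 + 0.5 * n * m)   # probability of dying in the interval
        deaths = survivors * q
        person_years += n * (survivors - 0.5 * deaths)
        survivors -= deaths
    person_years += survivors / rates[-1]  # open-ended final interval
    return person_years


random.seed(0)
# Hypothetical rates for ages 0, 1-14, 15-44, 45-64, 65-84, 85+
point_rates = [0.005, 0.0005, 0.002, 0.01, 0.04, 0.15]
widths = [1, 14, 30, 20, 20]

# Stand-in for posterior draws: gamma draws with mean equal to each rate
n_draws = 2000
draws = [[random.gammavariate(50, r / 50) for r in point_rates]
         for _ in range(n_draws)]

# Apply the deterministic function to every draw, then summarise
e0_draws = sorted(life_expectancy(d, widths) for d in draws)
median = statistics.median(e0_draws)
lower = e0_draws[int(0.025 * n_draws)]
upper = e0_draws[int(0.975 * n_draws)]
print(f"life expectancy: median {median:.1f}, 95% interval ({lower:.1f}, {upper:.1f})")
```

Because the nonlinear function is applied draw by draw, the resulting interval reflects the uncertainty in all the age-specific rates jointly, with no need for a separate approximation of the variance of life expectancy.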

Because of advantages such as these, within the field of statistical demography, there has been particularly fast growth in Bayesian statistical demography (Bijak and Bryant, 2016). The most prominent example has been the adoption, by the United Nations, of Bayesian methods for population forecasting (Gerland, Raftery, Ševčíková, Li, Gu, Spoorenberg, Alkema, Fosdick, Chunn, Lalic, Bay, Buettner, Heilig and Wilmoth, 2014).

In this paper, we illustrate how Bayesian methods, and particularly Bayesian hierarchical models, can be used to obtain disaggregated demographic estimates and forecasts. The examples are drawn from a long-term project to develop Bayesian demographic methods for use in official statistics, including the development of open source software implementing the methods. In the statistical literature, the problem of obtaining estimates for domains with small sample sizes has been referred to as small area estimation (Pfeffermann, 2013; Rao and Molina, 2015). The models that we consider are all “area-level” models, in that they use counts and rates for disaggregated cells, rather than individual-level data. With area-level models, we can use datasets in the form of confidentialized tables, which individual-level models cannot use. Area-level models also match the demand for disaggregated estimates and forecasts, which concerns groups rather than individuals.

In Section 2, we present mortality estimates for Māori, the indigenous people of New Zealand. The main inferential challenge is to capture the complex relationship between mortality and age, despite small numbers and considerable random variation. In Section 3, we interpolate and forecast obesity rates in New Zealand by age, based on survey data. The main problem here is carrying out a time series analysis with data from only five years. We conclude, in Section 4, by addressing two traditional objections to the use of Bayesian methods in statistical agencies.

