Multilevel time series modelling of antenatal care coverage in Bangladesh at disaggregated administrative levels
Section 1. Introduction

Demographic and Health Surveys have been widely used in over 90 countries for estimating national and sub-national level indicators on fertility, family planning, child mortality, child health, maternal health, and nutrition of children and women (DHS, 2021). In the sampling design of the Bangladesh Demographic and Health Survey (BDHS), administrative units lower than the sub-national level (7 divisions), such as 64 districts and more than 450 sub-districts (second and third administrative hierarchies respectively), are not accounted for. Consequently sample sizes are too small to estimate any indicator under division level with standard design-based estimators. Over the time period 1994-2014 seven surveys have been conducted, providing time series of direct estimates at the national level and division level on aforementioned indicators to monitor progress in declining maternal and neonatal mortality in Bangladesh. However, for optimal allocation of resources and policy making, reliable statistical information at the more detailed regional level of districts is required. For these regions, small area estimation models are developed in this paper. Small area estimation refers to a class of model based estimation procedures that improve upon the accuracy of direct domain estimates by increasing the effective sample size in each separate domain with sample information observed in other domains or preceding reference period. This is often referred to as borrowing strength over space or time, respectively (Rao and Molina, 2015).

The BDHS is conducted repeatedly with varying time lags of 3 or 4 years between two consecutive surveys. Seven editions for the period of 1994 until 2014 are included in this study. In this paper multivariate multilevel time series (MTS) models are developed to produce reliable trend estimates of antenatal care (ANC) coverage at district level as well as division and national levels. These models are developed in an hierarchical Bayesian framework and fitted using Markov Chain Monte Carlo simulations. The advantage of a multivariate time series approach is that it takes advantage of all available information by modelling cross-sectional and temporal correlations among districts and reference periods. The models are defined at an annual frequency and therefore properly account for the varying time lags of 3 or 4 years between the subsequent survey occasions. On top of that the MTS models provide predictions for the years without survey data.

Two related response variables are considered in this paper: whether no or at least four antenatal consults have been received, abbreviated as ANC0 and ANC4. Direct estimates along with variance estimates are calculated from the cross-sectional data of the seven BDHS surveys at the most detailed regional level of districts and are used as input for the MTS model. A drawback of this approach is that additional auxiliary information, available from two censuses, cannot be included in the MTS models. The censuses are conducted with intervals of ten years. This implies that the same values of auxiliary information, available from a particular census are used in two or even three subsequent editions of the BDHS conducted after this census but preceding the next census. This creates shocks in the MTS predictions during periods in which information from a new census becomes available. This problem is circumvented by developing the following two-step approach. As a first step, cross-sectional Fay-Herriot (FH) models (Fay and Herriot, 1979) are applied to each survey occasion using the direct estimates at the district level and their smoothed standard errors. The census auxiliary information is used to improve these cross-sectional FH models. In a second step, these cross-sectional FH estimates are used as input in the MTS model. Note that the cross-sectional FH predictions for a particular survey year are correlated. The MTS models account for this correlation by using the full variance-covariance matrices of the cross-sectional FH predictions as input for the MTS model. The advantage of this two-step approach is that it removes large sampling errors from the direct estimates and stabilizes the input series for the MTS models. This relies, however, on the assumption that the input series for the MTS models are not biased due to miss-specification of the cross-sectional FH models. To avoid this, a careful model selection and evaluation process for the cross-sectional FH models in the first step is required.

The MTS models borrow strength over time and space in several ways. Cross-sectional relations are modelled using fixed effects as well as district-level random intercepts and slopes, either independent or correlated. Spatial correlations among districts are also considered. Smooth trends and local level trends at district, division and national level are used to model temporal and cross-sectional correlations. Instead of defining a full correlation matrix between the trend disturbance terms at the district level, trends are defined at the division level, which are shared by all underlying districts. Deviations from this overall trend are modelled with trends at the district level. This is a parsimonious way of modelling cross-sectional relations between districts (Boonstra and van den Brakel, 2019). Trend estimates at the district level are obtained directly from the model outputs, while trends at division and national levels are obtained by aggregation of the district level predictions. The advantage of producing estimates for higher aggregation levels by aggregating predictions from the most detailed regional level is that all publication tables are numerically consistent by definition. Estimates for districts for the non-surveyed years and districts not covered in the surveyed years are also predicted based on the estimated time series models.

The MTS models developed in this paper are extensions of the FH model. Rao and Yu (1994) extended FH model to borrow strength over time by assuming area-specific random effects follow a first-order autoregressive AR(1) model over time independently across areas. Datta, Lahiri, Maiti and Lu (1999), and You and Rao (2000) generalized a time-series extension of the FH model following Rao and Yu (1994) in hierarchical Bayes framework by considering area-specific error terms follow first-order random walk model over time instead of AR(1) process. Datta, Lahiri and Maiti (2002) also used a random walk model for the time component to estimate median income of four-person families by state using time series and cross-sectional data using empirical Bayes estimation method. Marhuenda, Molina and Morales (2013) extended Rao and Yu (1994) model to a spatio-temporal version of FH model by incorporating additional assumption that area-specific random effects follow a first order simultaneously autoregressive process (Pratesi and Salvati, 2008) to include spatial correlation among data from neighboring areas. These extensions are very specific to only area-level random effects component. In the spatio-temporal FH model considered in this study, random effects can be specified at various disaggregation levels beside the target detailed level domains to incorporate spatial, temporal and spatio-temporal correlations among the data. In this regard, the considered hierarchical Bayesian model is more flexible than the other extension of the FH model. Other relevant accounts of multilevel time-series models and state space models extending the FH model to borrow strength over both time and space, include You, Rao and Gambino (2003); You (2008); Pfeffermann and Burck (1990); Pfeffermann and Tiller (2006); Bollineni-Balabay, van den Brakel, Palm and Boonstra (2017); Boonstra and van den Brakel (2022, 2019) and Boonstra, van den Brakel and Das (2021).

The remainder of this article is organized as follows. In Section 2 the need for reliable low regional statistical information to evaluate Sustainable Development Goals related to maternal and neonatal mortality in Bangladesh is described. Section 3 briefly describes the data sources and the computation of direct estimates and variance estimates from the BDHS survey data, along with transformations of direct estimates and the Generalized Variance Function (GVF) approach for smoothing the variance estimates, which both improve model fitting. Section 4 describes the hierarchical Bayesian time series multilevel modelling framework. The models selected for ANC0 and ANC4 are presented in Section 5, along with a brief discussion of the model building process. Section 6 provides a discussion on the trend estimates based on the developed models, and some model evaluation results are illustrated in Section 7. The paper concludes with a discussion in Section 8.


Date modified: