Multilevel time series modelling of antenatal care coverage in Bangladesh at disaggregated administrative levels
Section 4. Time series multilevel modelling


In this study, direct estimates and their standard errors are available for the survey years 1994, 1997, 2000, 2004, 2007, 2011 and 2014. To account for the varying time lags of three or four years between subsequent survey years, the MTS models are defined at an annual frequency (i.e., values refer to a reference period of one year) at the most detailed regional level of the 64 districts. With a time span of 21 years (1994 to 2014), there are $64 \times 21 = 1{,}344$ domain-year combinations. With seven available survey years, the model is fitted to the $64 \times 7 = 448$ observed domain-year combinations. The years between two subsequent surveys are treated as missing in the model. In this way the period-to-period evolution of the trend is specified correctly, and the model provides predictions for the missing domain-year combinations.
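The bookkeeping behind these counts can be checked with a short Python/NumPy sketch. It builds the grid of district-year cells, marks the survey years as observed and the intermediate years as missing, and reproduces the 1,344 and 448 figures. The array layout (districts in rows, years in columns) is an illustrative choice, not taken from the authors' code.

```python
import numpy as np

# Survey years with direct estimates (from the text) and the annual grid 1994-2014.
survey_years = [1994, 1997, 2000, 2004, 2007, 2011, 2014]
years = np.arange(1994, 2015)           # T = 21 annual time points
n_districts = 64                        # M_d = 64 domains

# Gaps between consecutive surveys: 3 or 4 years.
gaps = np.diff(survey_years)

# Observation indicator for each domain-year cell (rows: districts, columns: years).
# Years without a survey are treated as missing; the model still predicts them.
observed = np.zeros((n_districts, years.size), dtype=bool)
observed[:, np.isin(years, survey_years)] = True

n_cells = observed.size                 # all domain-year combinations
n_obs = int(observed.sum())             # cells the model is fitted to
print(gaps.tolist(), n_cells, n_obs)
```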

For convenience, let ${\hat{Y}}_{i t}$ denote the input series for the time series models for either ANC0 or ANC4 in year $t$ and domain $i$. This can be the untransformed direct estimates, the square root transformed direct estimates or the model predictions obtained with the cross-sectional FH models. Here the domain index $i$ runs from 1 to $M_{d} = 64$ and the time index $t$ from 1 to $T = 21$. We further combine these estimates into a vector $\hat{Y} = ({\hat{Y}}_{11}, \dots, {\hat{Y}}_{M_{d} 1}, \dots, {\hat{Y}}_{1 T}, \dots, {\hat{Y}}_{M_{d} T})^{'}$ of dimension $M = M_{d} T$.
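The stacking order of $\hat{Y}$, with the domain index running fastest within each year, corresponds to column-major flattening of a domain-by-year array. A minimal NumPy sketch, assuming a hypothetical 64 × 21 array of input values and 0-based indices (so domain $i$ in year $t$ of the text sits at position $(t-1) M_{d} + (i-1)$):

```python
import numpy as np

M_d, T = 64, 21
# Hypothetical input values Y[i, t] (0-based indices), e.g. coverage proportions.
rng = np.random.default_rng(1)
Y = rng.uniform(0.2, 0.9, size=(M_d, T))

# Stack as (Y_11, ..., Y_{M_d 1}, ..., Y_{1T}, ..., Y_{M_d T})':
# column-major ("F") flattening of a domain x year array gives exactly this order.
y_vec = Y.flatten(order="F")

def position(i, t):
    """0-based position of domain i, year t in the stacked vector."""
    return t * M_d + i

print(y_vec.size)   # M = M_d * T
```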

4.1 Model structure

The multilevel models considered take the general linear additive form

$\hat{Y} = X β + \sum_{α} Z^{(α)} v^{(α)} + e,$ (4.1)

where $X$ is an $M \times p$ design matrix for a $p$-vector of fixed effects $β$, and the $Z^{(α)}$ are $M \times q^{(α)}$ design matrices for $q^{(α)}$-dimensional random effect vectors $v^{(α)}$. Here the sum over $α$ runs over several possible random effect terms at different levels, such as local level and smooth trends at district and division levels, or white noise at the most detailed level of the $M$ domain-year combinations. These terms are described in more detail below. In (4.1), $e = (e_{11}, \dots, e_{M_{d} 1}, \dots, e_{M_{d} T})^{'}$ denotes, depending on the input series, either the sampling errors of the direct estimates or the prediction errors of the cross-sectional FH model. The errors are assumed to be normally distributed, $e \sim N (0, Σ)$, where $Σ = \oplus_{t = 1}^{T} Σ_{t}$ is block diagonal. If the input series are the untransformed direct estimates, then $Σ_{t}$ is the covariance matrix of the untransformed direct estimates observed in year $t$. If the input series are transformed, then $Σ_{t}$ is the covariance matrix of the transformed direct estimates, as described in Subsection 3.5. If the input series are the predictions based on the cross-sectional FH models, then $Σ_{t}$ contains the estimated mean squared errors of the FH predictions. Under MTS-II, $Σ_{t}$ is diagonal and ignores the correlations between the domain predictions. Under MTS-III, $Σ_{t}$ is a full covariance matrix that also accommodates the correlations between domain predictions.
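The block-diagonal structure of $Σ$, and the difference between the MTS-II and MTS-III variants, can be illustrated with a small NumPy sketch. The sizes (4 domains, 3 years) are illustrative rather than the study's 64 × 21, and the per-year matrices are random positive definite stand-ins for the FH mean squared error matrices, not real estimates.

```python
import numpy as np

def block_diag(blocks):
    """Direct sum of square matrices: place each block on the diagonal."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    k = 0
    for b in blocks:
        m = b.shape[0]
        out[k:k + m, k:k + m] = b
        k += m
    return out

rng = np.random.default_rng(0)
M_d, n_years = 4, 3        # small illustrative sizes

# Hypothetical full per-year covariance matrices Sigma_t (MTS-III style),
# built as symmetric positive definite matrices from random factors.
full_blocks = []
for _ in range(n_years):
    F = rng.normal(scale=0.05, size=(M_d, M_d))
    full_blocks.append(F @ F.T + 0.01 * np.eye(M_d))

# MTS-II style: keep only the MSEs on the diagonal, dropping between-domain correlations.
diag_blocks = [np.diag(np.diag(S)) for S in full_blocks]

Sigma_full = block_diag(full_blocks)   # Sigma = Sigma_1 (+) ... (+) Sigma_T
Sigma_diag = block_diag(diag_blocks)
```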

Based on the distribution of the errors $e$ in (4.1), the likelihood function conditional on the fixed and random effects can be written as

$p (\hat{Y} | η, Σ) = N (\hat{Y} | η, Σ),$ (4.2)

where $η = X β + \sum_{α} Z^{(α)} v^{(α)}$ is the linear predictor. Following West (1984), a Student-t distribution can be considered for the errors $e$ instead of the normal distribution, to give smaller weight to outlying observations.
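Evaluating the normal likelihood (4.2) for given $η$ and $Σ$ is typically done through a Cholesky factor of $Σ$ rather than an explicit inverse. The following is a minimal NumPy sketch of that computation; it is not the mcmcsae implementation, and the Student-t alternative is not shown.

```python
import numpy as np

def mvn_loglik(y, eta, Sigma):
    """log N(y | eta, Sigma) computed through a Cholesky factor of Sigma."""
    L = np.linalg.cholesky(Sigma)                 # Sigma = L L'
    z = np.linalg.solve(L, y - eta)               # whitened residuals
    log_det = 2.0 * np.sum(np.log(np.diag(L)))    # log |Sigma|
    n = y.size
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + z @ z)
```

With a diagonal $Σ$ (as under MTS-II) this reduces to a sum of univariate normal log-densities, which is a convenient check.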

The fixed effect part of $η$ can contain components such as an intercept, a linear trend, main effects for division and district, and possibly the second-order interactions between the linear trend and division or district. The vector $β$ of fixed effects is assigned a normal prior $p (β) = N (0, 100 I_{p})$, with $I_{x}$ the identity matrix of dimension $x \times x$. This prior is only very weakly informative, as a prior standard deviation of 10 is very large relative to the scales of the (transformed) direct estimates and the covariates used.
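A fixed-effects design matrix $X$ with an intercept, a linear trend, division main effects and division-by-trend interactions can be sketched in NumPy as follows. The district-to-division mapping is hypothetical (the real mapping is not given in this section), the trend is centred as an illustrative choice, and district main effects are omitted but would be built the same way with 63 indicator columns.

```python
import numpy as np

M_d, T = 64, 21
# Hypothetical district -> division mapping (divisions 0..6, 64 districts in total).
division_of = np.repeat(np.arange(7), [10, 9, 9, 9, 9, 9, 9])

# Long format in the stacking order of Y-hat: domain fastest, then year.
district = np.tile(np.arange(M_d), T)
year = np.repeat(np.arange(T), M_d)
division = division_of[district]

def dummies(codes, n_levels):
    """Treatment-coded indicators, dropping the first level as reference."""
    return (codes[:, None] == np.arange(1, n_levels)[None, :]).astype(float)

time = year - year.mean()     # centred linear trend (centring is an illustrative choice)
D_div = dummies(division, 7)

X = np.column_stack([
    np.ones(M_d * T),         # intercept
    time,                     # linear trend
    D_div,                    # division main effects
    D_div * time[:, None],    # division-specific linear trends (interaction)
])
p = X.shape[1]

# Weakly informative prior beta ~ N(0, 100 I_p): prior standard deviation 10.
prior_sd = np.sqrt(100.0)
```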

The second term on the right hand side of (4.1) consists of a sum of contributions to the linear predictor by random effects or varying coefficient terms. The random effect vectors $v^{(α)}$ for different $α$ are assumed to be independent, but the components within a vector $v^{(α)}$ are possibly correlated to accommodate temporal or cross-sectional correlation. To describe the general model for each vector $v^{(α)}$ of random effects, we suppress superscript $α$ in what follows for notational convenience.

Each random effects vector $v$ is assumed to be distributed as

$v \sim N (0, A \otimes V),$ (4.3)

where $V$ and $A$ are $d \times d$ and $l \times l$ covariance matrices, respectively, and $A \otimes V$ denotes the Kronecker product of $A$ with $V$. The total length of $v$ is $q = d l$, and these coefficients may be thought of as corresponding to $d$ effects allowed to vary over $l$ levels of a factor variable. For example, if $V$ corresponds to division, then $V$ defines $d = 7$ random effects, one for each of the 7 divisions. If, in addition, $A$ corresponds to time, then $l = 21$ years, and each of the 7 division effects can vary over the 21 years. Each random effect generated for a division $\times$ year combination is shared by all districts belonging to that division in that particular year.
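The division-by-year example can be made concrete with a NumPy sketch of (4.3) and the corresponding design matrix $Z$. It assumes, for illustration only, $A = I_{21}$ (independence over years), $V = σ^{2} I_{7}$ with a hypothetical $σ^{2} = 0.02$, and a hypothetical district-to-division mapping. With the ordering $A \otimes V$, the division index runs fastest within each year.

```python
import numpy as np

d, l = 7, 21                     # 7 divisions, 21 years
sigma2 = 0.02                    # hypothetical common variance, V = sigma2 * I_d
V = sigma2 * np.eye(d)
A = np.eye(l)                    # simplest case: independent over years

Cov = np.kron(A, V)              # covariance of v, size q x q with q = d * l
q = Cov.shape[0]

rng = np.random.default_rng(2)
v = rng.multivariate_normal(np.zeros(q), Cov)

# Element k of v belongs to division k % d and year k // d.
# Hypothetical district -> division mapping for 64 districts.
division_of = np.repeat(np.arange(d), [10, 9, 9, 9, 9, 9, 9])
M_d = division_of.size

# Design matrix Z (M x q): the row for (district i, year t) picks effect (division_of[i], t).
M = M_d * l
Z = np.zeros((M, q))
for t in range(l):
    for i in range(M_d):
        Z[t * M_d + i, t * d + division_of[i]] = 1.0

contribution = Z @ v             # districts share their division's effect in each year
```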

The covariance matrix $A$ describes the covariance structure among the levels of the factor variable and is assumed to be known. For computational efficiency, precision matrices $Q_{A} = A^{-1}$ are used instead of covariance matrices (Rue and Held, 2005). The covariance matrix $V$ for the $d$ varying effects can be parameterized in one of three ways: (i) a fully parameterized covariance matrix, (ii) a diagonal matrix with unequal diagonal elements, or (iii) a diagonal matrix with equal diagonal elements. When a full covariance matrix is assumed, the scaled inverse-Wishart prior is used, as proposed in O’Malley and Zaslavsky (2008) and recommended by Gelman and Hill (2007); when the covariance matrix is assumed diagonal, with equal or unequal elements, half-Cauchy priors are used for the standard deviations. For diagonal variances, half-Cauchy priors are better default priors than the more common inverse gamma priors (Gelman, 2006).
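The three parameterizations of $V$ and their priors can be illustrated by drawing from each prior once. This NumPy sketch assumes $d = 2$ (e.g. random intercepts and slopes), a hypothetical half-Cauchy scale of 1, $ν = d + 1$ degrees of freedom for the inverse-Wishart part, and a lognormal prior on the scale factors $ξ$; the latter two are illustrative assumptions, not the settings used in the paper or in mcmcsae.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 2                      # e.g. random intercepts and slopes
scale = 1.0                # hypothetical half-Cauchy scale

def half_cauchy(scale, size=None):
    """Draw from a half-Cauchy(0, scale): absolute value of a Cauchy draw."""
    return scale * np.abs(rng.standard_cauchy(size))

# (iii) diagonal, equal variances: one standard deviation shared by all d effects.
sd_equal = half_cauchy(scale)
V_equal = sd_equal**2 * np.eye(d)

# (ii) diagonal, unequal variances: independent half-Cauchy standard deviations.
sd_unequal = half_cauchy(scale, size=d)
V_unequal = np.diag(sd_unequal**2)

# (i) full matrix, scaled inverse-Wishart: V = diag(xi) Q diag(xi), Q ~ inv-Wishart(nu, I).
# An integer-df Wishart(nu, I) draw is G'G with G an nu x d matrix of N(0, 1) entries;
# its inverse is an inverse-Wishart(nu, I) draw. The lognormal prior on xi is an assumption.
nu = d + 1
G = rng.standard_normal((nu, d))
Q = np.linalg.inv(G.T @ G)
xi = np.exp(rng.normal(0.0, 1.0, size=d))
V_full = np.diag(xi) @ Q @ np.diag(xi)
```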

The following random effect structures are considered in the model selection procedure:

Random intercepts for the $M_{d}$ domains. In this case $A = I_{M_{d}}$ and $V$ is a scalar variance parameter. This implies $v_{i t} = ν_{i}$ for all $t$, with $ν_{i} \sim N (0, σ_{I}^{2})$.
First or second order random walks at different aggregation levels. A first order random walk, or local level trend, at district level is defined as $v_{i t} = L_{i t}$ with $L_{i t} = L_{i, t - 1} + η_{i t}$ and $η_{i t} \sim N (0, σ_{R 1, i}^{2})$. A second order random walk, or smooth trend model, at district level is defined as $v_{i t} = L_{i t}$ with $L_{i t} = L_{i, t - 1} + R_{i, t - 1}$, $R_{i t} = R_{i, t - 1} + η_{i t}$ and $η_{i t} \sim N (0, σ_{R 2, i}^{2})$. Both kinds of trends can be defined similarly at the division or national level. See Rue and Held (2005) for the specification of the precision matrix $Q_{A}$ for first and second order random walks. A full covariance matrix for the trend innovations can be used to allow for cross-sectional as well as temporal correlations, or a diagonal matrix with different or equal variance parameters to allow for temporal correlations only. In the case of equal variances, $σ_{R 1, i}^{2} = σ_{R 1}^{2}$ and $σ_{R 2, i}^{2} = σ_{R 2}^{2}$ for all $i$. First and second order random walk components at district level are denoted below by $\mathrm{RW1\_District}$ and $\mathrm{RW2\_District}$, respectively. At division level they are denoted by $\mathrm{RW1\_Division}$ and $\mathrm{RW2\_Division}$.
The first order random walks as used in our models cannot capture an overall level, because the corresponding random effects are constrained to sum to zero over time. Similarly, the second order random walks capture neither a level nor a linear trend. Level and linear trend must therefore be accommodated by other model terms, either as fixed or as random effects. District-level random intercepts have already been discussed in the first item of this list. To also include linear trends by district, this component can be extended to random intercepts and slopes that are linear in time. In that case $V$ can be either a $2 \times 2$ general covariance matrix

$V = \begin{pmatrix} σ_{I}^{2} & ρ_{I S} σ_{I} σ_{S} \\ ρ_{I S} σ_{I} σ_{S} & σ_{S}^{2} \end{pmatrix},$

accounting for correlations between intercepts and slopes, or a diagonal matrix with diagonal elements $σ_{I}^{2}$ and $σ_{S}^{2}$, the variances of the random intercepts and slopes, respectively. This model component is referred to below as $\mathrm{RIS\_District}$.

Spatial random effects: random intercepts varying over the spatial locations of the districts according to an intrinsic conditional autoregressive (ICAR) model (Besag and Kooperberg, 1995), defined for each spatial effect conditional on the others as $v_{i} | v_{-i} \sim N \big( \sum_{i' \in nb(i)} v_{i'} / a_{i}, \; σ_{Sp}^{2} / a_{i} \big)$. Here $nb(i)$ is the set of domains neighbouring domain $i$ and $a_{i}$ is the number of such neighbours. See Rue and Held (2005) for the specification of the precision matrix $Q_{A}$. This spatial component is referred to below as $\mathrm{Spatial\_District}$.
White noise: to allow for unexplained random variation, white noise at the most detailed domain-by-year level can be included. In this case $A = I_{M}$ and $V$ is a scalar variance parameter. This implies $v_{i t} \sim N (0, σ_{W}^{2})$.
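The precision matrices $Q_{A}$ for the random walk and ICAR components can be sketched in NumPy. This is a minimal illustration, not the mcmcsae implementation: it builds the structure matrices $D'D$ of first and second order random walks from difference matrices, and the ICAR structure matrix (number of neighbours on the diagonal minus the adjacency matrix) for a hypothetical four-area map. These matrices are singular: constants lie in the null space of the RW1 and ICAR matrices, and constants and linear trends in that of the RW2 matrix. Those directions are unpenalized, which is why level (and, for RW2, linear trend) has to be handled by constraints and other model terms. Multiplying by $1/σ^{2}$ gives the actual precision; a full innovation covariance $V$ corresponds to the precision $Q_{A} \otimes V^{-1}$.

```python
import numpy as np

def rw_precision(T, order):
    """Structure matrix D'D of a first- or second-order random walk of length T."""
    D = np.diff(np.eye(T), n=order, axis=0)   # (T - order) x T difference matrix
    return D.T @ D

def icar_precision(adjacency):
    """ICAR structure matrix: diag(number of neighbours) minus the adjacency matrix."""
    W = np.asarray(adjacency, dtype=float)
    return np.diag(W.sum(axis=1)) - W

T = 21
Q_rw1 = rw_precision(T, 1)
Q_rw2 = rw_precision(T, 2)

# Hypothetical 4-area map: pairs 0-1, 0-2, 1-2 and 2-3 are neighbours.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
Q_icar = icar_precision(W)

# Conditional mean of area i given the others, from the precision matrix:
# E[v_i | v_-i] = -sum_{j != i} Q_ij v_j / Q_ii, i.e. the average over the neighbours.
v = np.array([0.3, -0.1, 0.2, 0.5])
i = 2
cond_mean = -(Q_icar[i] @ v - Q_icar[i, i] * v[i]) / Q_icar[i, i]
```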

We also investigated generalisations of (4.3) to non-normal distributions of random effects by implementing Student-t, horseshoe (Carvalho, Polson and Scott, 2010) and Laplace (Tibshirani, 1996; Park and Casella, 2008) priors. These alternative distributions have fatter tails, allowing for occasional large effects. However, they did not improve results for the considered target variables, either in terms of model information criteria or in terms of the underlying trend predictions. Therefore the normal distribution is used for all random effect components. The exact layout of the final MTS models for ANC0 and ANC4 is specified in Subsections 5.1 and 5.2, respectively.
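The "fatter tails" argument can be quantified for the Laplace case with a few lines of standard-library Python, comparing the probability of an effect more than four standard deviations from zero under a normal and a Laplace distribution with the same (unit) variance. The threshold of 4 is an arbitrary illustrative choice.

```python
import math

x = 4.0   # threshold, in units of the standard deviation

# Normal(0, 1): P(|X| > x) = erfc(x / sqrt(2)).
p_normal = math.erfc(x / math.sqrt(2.0))

# Laplace with unit variance has scale b = 1 / sqrt(2): P(|X| > x) = exp(-x / b).
b = 1.0 / math.sqrt(2.0)
p_laplace = math.exp(-x / b)

print(f"normal: {p_normal:.2e}, laplace: {p_laplace:.2e}")
```

At this threshold the Laplace tail probability is more than 50 times the normal one, which is what lets such priors accommodate occasional large random effects.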

4.2 Model estimation

The models are fitted using Markov chain Monte Carlo (MCMC) sampling, in particular the Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990). See Boonstra and van den Brakel (2022) for a specification of the full conditional distributions. The models specified in Subsection 4.1 are fitted in R (R Core Team, 2015) using the package mcmcsae (Boonstra, 2021). The Gibbs sampler is run in parallel for three independent chains with randomly generated starting values. In the model-building stage, 1,000 iterations are used after a burn-in period of 100 iterations. This was sufficient to obtain reasonably stable Monte Carlo estimates of the model parameters and trend predictions. For the selected model, a longer run is used: 1,000 burn-in iterations followed by 5,000 iterations, of which every fifth draw is stored. This leaves $3 \times 1{,}000 = 3{,}000$ draws to compute estimates and standard errors. Convergence of the MCMC simulation is assessed using trace and autocorrelation plots as well as the Gelman-Rubin potential scale reduction factor (Gelman and Rubin, 1992), which diagnoses the mixing of the chains. For the longer simulation of the selected model, all model parameters and model predictions have potential scale reduction factors below 1.01 and sufficient effective numbers of independent draws.
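The burn-in, thinning and convergence check can be sketched in NumPy. Since the actual Gibbs output is not available here, three autocorrelated AR(1) chains with random starting values stand in for the sampler (an assumption), and the classic Gelman-Rubin factor is computed on the thinned draws. mcmcsae and other packages may use refined variants (for example split chains), so this is only the basic formula.

```python
import numpy as np

def gelman_rubin(draws):
    """Potential scale reduction factor for draws of shape (chains, iterations)."""
    m, n = draws.shape
    chain_means = draws.mean(axis=1)
    W = draws.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = n * chain_means.var(ddof=1)               # between-chain variance
    var_plus = (n - 1) / n * W + B / n            # pooled posterior variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(4)
n_chains, burn_in, n_iter, thin = 3, 1000, 5000, 5

# Stand-in for Gibbs output: AR(1) chains with random starting values (assumption).
phi = 0.5
chains = np.empty((n_chains, burn_in + n_iter))
chains[:, 0] = rng.normal(0.0, 5.0, size=n_chains)
for s in range(1, burn_in + n_iter):
    chains[:, s] = phi * chains[:, s - 1] + rng.normal(size=n_chains)

kept = chains[:, burn_in::thin]      # discard burn-in, keep every fifth draw
total_draws = kept.size              # 3 x 1,000 = 3,000
rhat = gelman_rubin(kept)
```

Shifting one chain away from the others, as would happen for chains stuck in different regions, pushes the factor well above the 1.01 threshold.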

Many models of the form (4.1) have been fitted to the data. To compare models that use the same input data, we use the Widely Applicable Information Criterion, also known as the Watanabe-Akaike Information Criterion (WAIC) (Watanabe, 2010, 2013), and the Deviance Information Criterion (DIC) (Spiegelhalter, Best, Carlin and van der Linde, 2002). We also compare the models graphically by their model fits and trend predictions at three aggregation levels.
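Both criteria can be computed from MCMC output. The NumPy sketch below uses the standard definitions: WAIC $= -2(\text{lppd} - p_{\text{WAIC}})$ from a matrix of pointwise log-likelihoods, and DIC $= D(\bar{θ}) + 2 p_{D}$ with $p_{D}$ the mean deviance minus the deviance at the posterior mean. The toy model $y_{i} \sim N(μ, 1)$ with a flat prior, whose posterior draws can be simulated exactly, is an assumption chosen so that both effective-parameter counts should come out close to 1.

```python
import numpy as np

def waic(loglik):
    """WAIC and p_WAIC from a (draws x observations) matrix of pointwise log-likelihoods."""
    S = loglik.shape[0]
    mx = loglik.max(axis=0)   # stabilises the log of the posterior mean likelihood
    lppd = np.sum(mx + np.log(np.exp(loglik - mx).sum(axis=0) / S))
    p_waic = np.sum(loglik.var(axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), p_waic

rng = np.random.default_rng(5)
n, S = 200, 4000
y = rng.normal(0.0, 1.0, size=n)

# Toy model (assumption): y_i ~ N(mu, 1) with a flat prior, so mu | y ~ N(mean(y), 1/n).
mu = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=S)

def pointwise_loglik(mu_values):
    """Matrix of log N(y_i | mu_s, 1) with one row per value of mu."""
    mu_col = np.atleast_1d(mu_values)[:, None]
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (y[None, :] - mu_col) ** 2

ll = pointwise_loglik(mu)                      # S x n
waic_value, p_waic = waic(ll)

# DIC = D(posterior mean) + 2 p_D with deviance D = -2 log p(y | mu).
deviance = -2.0 * ll.sum(axis=1)
d_at_mean = -2.0 * pointwise_loglik(mu.mean()).sum()
p_d = deviance.mean() - d_at_mean
dic_value = d_at_mean + 2.0 * p_d
```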

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word, to the Editor (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2022-12-15
