Multilevel time series modelling of antenatal care coverage in Bangladesh at disaggregated administrative levels
Section 3. Data sources and input estimates

3.1 Data sources

Since 1993-94 the BDHS has been conducted under the authority of the National Institute of Population Research and Training (NIPORT) of the Ministry of Health and Family Welfare (MOHFW) to evaluate existing health and social programs and to design new strategies for improving the health status of the country’s women and children. Up to 2018, eight BDHS surveys have been conducted: in 1993-94, 1996-97, 2000, 2004, 2007, 2011, 2014 and 2017-18. This study uses the survey data for the period 1994-2014, because the district-level location of the surveyed clusters is not disclosed in the most recent BDHS 2017-18. Three Population and Housing Censuses are relevant to this period, conducted in 1991, 2001 and 2011. The full census data are not available; only 10% of the Census 1991 data, 10% of the Census 2001 data and 5% of the Census 2011 data are publicly available from IPUMS-International (https://international.ipums.org). From these census data, a number of district-level contextual variables have been generated and used in the development of cross-sectional FH models to produce input estimates for the MTS models.

3.2 Direct estimates

The variables analysed in this paper are ANC0 and ANC4. Bangladesh is divided into 7 sub-national regions, called divisions. These divisions are further divided into 64 districts, which is the most detailed regional level considered in this study. As a first step, estimates and variance estimates of the two target variables at the district level are obtained from each survey year’s unit-level data using the standard design-based direct survey estimator (hereafter denoted by DIR), where the survey weights are used to account for the sampling design and for non-response.

In this study, the target population consists of ever-married women of reproductive age who have given birth within the three years preceding a survey year. Since such pregnancy-related information is not available in the census data, the area-specific population size is estimated by the number of ever-married women of reproductive age in the three Censuses. This means that even though the area-specific population sizes are based on a census, there is some uncertainty about them, which is ignored in the SAE models. See Das, van den Brakel, Boonstra and Haslett (2021) for more details about division and district specific population sizes.

The BDHS uses a two-stage stratified sample of households. The strata are formed from divisions and sub-divisions according to their urban-rural characterization. The primary sampling units (PSUs) are the enumeration areas of the Population and Housing Census, created to have an average of about 120 households (this varies slightly between censuses). In the first stage, PSUs are selected with probabilities proportional to PSU size, i.e., the number of households. In the second stage, a complete household listing is carried out in all selected PSUs and then about 30 households are selected from each PSU using systematic sampling. The response rates among eligible women have been over 95% in all BDHS years. Although the sample of ever-married women exceeds 10,000 in all the surveys, this study considers only the ever-married women who gave birth in the three years preceding the survey year, and the sample sizes are therefore smaller. At the district level, mean sample sizes vary between 60 and 114, with some districts having fewer than 10 or even no observed women.
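The selection probabilities implied by this two-stage design can be sketched as follows. This is a minimal illustration, not the BDHS weighting procedure: the function names, the stratum totals and the simple division by a response rate are all assumptions made for the example.

```python
import random


def design_weight(n_psu_sampled, psu_size, stratum_size, n_hh_sampled, n_hh_listed):
    """Inverse of the overall household inclusion probability (hypothetical helper)."""
    # Stage 1: PSU selected with probability proportional to its number of households.
    p_stage1 = n_psu_sampled * psu_size / stratum_size
    # Stage 2: equal-probability systematic selection from the household listing.
    p_stage2 = n_hh_sampled / n_hh_listed
    return 1.0 / (p_stage1 * p_stage2)


def systematic_sample(n_listed, n_select, rng):
    """Indices of a systematic sample with a random start and a fixed interval."""
    step = n_listed / n_select
    start = rng.random() * step
    return [min(int(start + k * step), n_listed - 1) for k in range(n_select)]


# Illustrative numbers only: 20 PSUs drawn from a stratum of 60,000 households,
# a PSU with 120 census households, 125 households found at listing, 30 selected.
w_design = design_weight(20, 120, 60_000, 30, 125)

# Assumed simple non-response adjustment: divide by the response rate.
w_adjusted = w_design / 0.95

selected = systematic_sample(n_listed=125, n_select=30, rng=random.Random(1))
print(round(w_design, 2), round(w_adjusted, 2), selected[:5])
```

Because the first-stage probability is proportional to PSU size and the second-stage sample is of roughly fixed size, households within a stratum end up with similar design weights, which is the usual motivation for this kind of design.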

Sampling weights are calculated based on selection probabilities. These weights are then adjusted for household and individual non-response. The direct estimate for the population proportion in a certain domain $i$ for survey year $t$ is computed as the weighted sample mean

${\hat{Y}}_{i t} = \frac{\sum_{j \in s_{i t}} w_{i j t} y_{i j t}}{\sum_{j \in s_{i t}} w_{i j t}}, (3.1)$

where $y$ is the response variable of interest, $s_{i t}$ is the set of ever-married women in domain $i$ for which $y$ is observed in year $t$, and $w_{i j t}$ is the survey weight for person $j$ living in area $i$ in year $t$. Note that the weights $w_{i j t}$ are scaled such that the sum over the weights in the sample is equal to the net sample size. The corresponding variance estimates are approximated as

$var ({\hat{Y}}_{i t}) = \frac{1}{n_{i t} (n_{i t} - 1)} \sum_{j \in s_{i t}} w_{i j t} {(y_{i j t} - {\hat{Y}}_{i t})}^{2}, (3.2)$

where $n_{i t}$ is the number of ever-married women observed in domain $i$ in survey year $t$. Initially, the variance was approximated by calculating the variance among the estimated PSU totals as if they were selected by using stratified sampling with replacement, known as the ultimate sampling unit variance approximation. This resulted in zero variance estimates for a few domains. Variance approximation (3.2) avoids these zero variance estimates, and otherwise results in variance estimates comparable with the initial approximation where PSUs were assumed to be selected with replacement. In the first MTS model, denoted by MTS-I, these direct estimates are used as the input series.
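As an illustration, the estimator (3.1) and the variance approximation (3.2) can be sketched for a single domain and year as follows. The function name and the toy data are hypothetical, and the weights are assumed to have already been scaled as described above.

```python
import numpy as np


def direct_estimate(y, w):
    """Weighted mean (3.1) and variance approximation (3.2) for one domain-year.

    y : 0/1 responses of the observed women in the domain
    w : their survey weights, assumed already scaled
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = y.size
    y_hat = np.sum(w * y) / np.sum(w)                       # equation (3.1)
    if n < 2:
        return y_hat, np.nan                                # (3.2) is undefined for n < 2
    var_hat = np.sum(w * (y - y_hat) ** 2) / (n * (n - 1))  # equation (3.2)
    return y_hat, var_hat


# Toy domain with four observed women (illustrative values only).
y_hat, var_hat = direct_estimate(y=[1, 0, 1, 1], w=[1.2, 0.8, 1.0, 1.0])
print(y_hat, var_hat, np.sqrt(var_hat))
```

In this sketch (3.2) still returns zero when every observed response in the domain is identical.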

3.3 Cross-sectional Fay-Herriot estimates

An issue with the MTS-I model is the use of census data as auxiliary variables in the MTS model. Because the time gap between two subsequent censuses is 10 years whereas the BDHS is conducted every 3 or 4 years, the census covariates remain the same until new census data become available. Including these census data as covariates in the MTS-I models will bias estimates of trends and period-to-period changes. One way to take advantage of the census information is to model the direct estimates at the district level in separate cross-sectional FH models using relevant contextual variables extracted from the census data. Using the census auxiliary variables available at the time of each survey in repeated cross-sectional FH models is expected to affect the regression coefficients and the accuracy of the model predictions of the dependent variable, but not the predictions themselves. Compared to the direct estimates used in MTS-I, these cross-sectional FH models also provide better estimates because they already borrow some strength over districts.
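To make "borrowing strength over districts" concrete, the sketch below applies the standard area-level FH shrinkage formula to one survey year. It is a simplified frequentist illustration under loud assumptions: the between-district variance `sigma2_v` is treated as known, the covariate is made up, and the function name is hypothetical. It is not the estimation procedure used for the models in this paper.

```python
import numpy as np


def fh_predict(y_dir, psi, X, sigma2_v):
    """Area-level Fay-Herriot predictions with a known random-effect variance.

    y_dir    : direct estimates per district
    psi      : sampling variances per district (treated as fixed)
    X        : design matrix of district-level covariates (including intercept)
    sigma2_v : between-district variance (assumed known in this sketch)
    """
    y_dir, psi, X = (np.asarray(a, dtype=float) for a in (y_dir, psi, X))
    v = sigma2_v + psi                         # marginal variance of each direct estimate
    xt_vinv = X.T / v                          # X' V^{-1} for diagonal V
    beta = np.linalg.solve(xt_vinv @ X, xt_vinv @ y_dir)  # generalized least squares
    gamma = sigma2_v / v                       # weight given to the direct estimate
    theta = gamma * y_dir + (1.0 - gamma) * (X @ beta)
    return theta, gamma, beta


# Four illustrative districts; the last one has a very noisy direct estimate.
y_dir = np.array([0.30, 0.55, 0.42, 0.70])
psi = np.array([0.002, 0.010, 0.004, 0.050])
X = np.column_stack([np.ones(4), [0.2, 0.5, 0.4, 0.9]])

theta, gamma, beta = fh_predict(y_dir, psi, X, sigma2_v=0.005)
print(np.round(gamma, 3))
print(np.round(theta, 3))
```

Districts with large sampling variance get a small `gamma`, so their prediction leans on the regression on census covariates rather than on the noisy direct estimate.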

The cross-sectional FH estimates and their standard errors are used as input for a second model, denoted by MTS-II. The cross-sectional FH estimates are correlated due to their common fixed effect components, which is ignored in MTS-II. Therefore a third MTS model, denoted by MTS-III, is developed using cross-sectional FH estimates and their full covariance matrix as input.

The fixed and random effect components for the survey-specific cross-sectional FH models are shown in Appendix Tables A.2 and A.3. For all the models, random effects are assumed to follow a normal distribution. Non-normal models have been considered for the random effects (Laplace and horseshoe) and the sampling error (t-distribution) as alternatives for the normal distribution. This, however, did not improve the model fit.

3.4 Generalized variance functions

In the FH and MTS models, the variance estimates of the direct estimates are largely treated as fixed, known quantities. Since these variance estimates can be very noisy, they are smoothed using a GVF before they are used in the FH and MTS models. A district without sample information is treated as missing and is therefore not used in model development. The cross-sectional FH model can produce estimates and standard errors for these out-of-sample domains. These synthetic estimates are, however, not used in the development of the MTS-II and MTS-III models, to allow for a better comparison with the MTS-I model.

The GVFs are regression models that relate the variance estimates to predictors such as sample size, survey design variables, and point estimates (Wolter (2007), Chapter 7). For both ANC0 and ANC4, the following GVF is used:

$\log se ({\hat{Y}}_{i t}) = α + β \log {\tilde{Y}}_{i t} + γ \log (m_{i t} + 1) + δ Division + \epsilon_{i t}, (3.3)$

where $se ({\hat{Y}}_{i t})$ is the standard error of ${\hat{Y}}_{i t}$ in (3.1), $m_{i t}$ is the number of sampling units contributing to district $i$ in year $t$, and $Division$ is a categorical variable with 7 levels. Since we cannot trust the direct estimates for very small $m_{i t}$, the ${\tilde{Y}}_{i t}$ on the right hand side of (3.3) are simple smoothed estimates

$\begin{array}{rl} {\tilde{Y}}_{i t} & = λ_{i t} {\hat{Y}}_{i t} + (1 - λ_{i t}) {\bar{Y}}_{d [i] t}, \\ λ_{i t} & = \frac{m_{i t}}{m_{i t} + 1}, \end{array} (3.4)$

where ${\bar{Y}}_{d [i] t}$ denotes the mean for division $d$ ($d = 1, \ldots, 7$) to which district $i$ belongs, in year $t$. As mentioned by a referee, a composite regression estimator can be used as an alternative for (3.4).

The regression errors $\epsilon_{i t}$ are assumed to be independent and normally distributed with a common variance parameter $σ^{2}$. The GVFs are fitted only to districts with non-zero standard errors of the direct estimates. The predicted (smoothed) standard errors based on the fitted models are

${se}_{pred} ({\hat{Y}}_{i t}) = \exp (\hat{α} + \hat{β} \log {\tilde{Y}}_{i t} + \hat{γ} \log (m_{i t} + 1) + \hat{δ} Division + {\hat{σ}}^{2} / 2), (3.5)$

where $\hat{σ}$ is 0.03 for ANC0 and 0.003 for ANC4. The R-squared values for both models are quite high: 0.79 for ANC0 and 0.99 for ANC4. Note that the exponential back-transformation in (3.5) includes a bias correction, ${\hat{σ}}^{2} / 2$, which in this case has only a small effect. This approach is used to obtain smoothed standard errors for the cross-sectional FH models and the MTS-I model.
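The steps in (3.3) to (3.5) can be sketched with ordinary least squares on synthetic data. Everything below is an assumption made for illustration: the helper names, the three divisions (the paper has seven), the simulated coefficients, and the use of an unweighted mean of district estimates as the division mean in (3.4).

```python
import numpy as np


def smooth_estimates(y_hat, m, y_div):
    """Equation (3.4): shrink district estimates towards the division mean."""
    lam = m / (m + 1.0)
    return lam * y_hat + (1.0 - lam) * y_div


def gvf_design(y_tilde, m, division, n_div):
    """Design matrix of (3.3); the first division is the reference level."""
    dummies = np.eye(n_div)[division][:, 1:]
    return np.column_stack([np.ones(len(m)), np.log(y_tilde), np.log(m + 1.0), dummies])


def fit_gvf(se, X):
    """OLS fit of log(se) on X, using only districts with non-zero se."""
    keep = se > 0
    Xk, z = X[keep], np.log(se[keep])
    coef, *_ = np.linalg.lstsq(Xk, z, rcond=None)
    resid = z - Xk @ coef
    sigma2 = resid @ resid / (Xk.shape[0] - Xk.shape[1])
    return coef, sigma2


def predict_se(X, coef, sigma2):
    """Equation (3.5): back-transform with the sigma^2 / 2 bias correction."""
    return np.exp(X @ coef + sigma2 / 2.0)


# Synthetic district-year data (illustrative only).
rng = np.random.default_rng(0)
n, n_div = 60, 3
division = np.arange(n) % n_div
m = rng.integers(2, 13, size=n).astype(float)
y_hat = rng.uniform(0.1, 0.9, size=n)
y_div = np.array([y_hat[division == d].mean() for d in range(n_div)])[division]
y_tilde = smooth_estimates(y_hat, m, y_div)

# Simulate standard errors from a known log-linear relationship plus noise.
div_effect = np.array([0.0, 0.1, -0.1])[division]
log_se = -1.0 + 0.5 * np.log(y_tilde) - 0.5 * np.log(m + 1.0) + div_effect
se = np.exp(log_se + rng.normal(0.0, 0.05, size=n))
se[0] = 0.0  # a district with a zero direct standard error is excluded from the fit

X = gvf_design(y_tilde, m, division, n_div)
coef, sigma2 = fit_gvf(se, X)
se_pred = predict_se(X, coef, sigma2)
print(np.round(coef, 3), round(float(sigma2), 5), round(float(se_pred[0]), 4))
```

Because the prediction uses the full design matrix, the district whose direct standard error was zero still receives a positive smoothed standard error.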

3.5 Transformations of input series

Square root, log and log-ratio transformations are considered as variance stabilizing transformations, see Sakia (1992). The square root transformation is applied to the ANC4 data (in both the MTS models and the cross-sectional FH models) since this transformation reduces the correlation between the point estimates and standard errors of the input series, reduces heterogeneity, improves the convergence of the MCMC simulation, and reduces the skewness of proportion data that take values close to the lower boundary of zero. For ANC0, the square root transformation is used only for the year-specific cross-sectional FH models in 2011 and 2014. In the other years, no transformation is applied. In all three MTS models, no transformation is applied for ANC0 since the square root transformation of the input series increases the dependency between the direct estimates and their standard errors.

Let ${\hat{Y}}_{i t}^{*} = \sqrt{{\hat{Y}}_{i t} + ε}$ denote the square root transformed direct estimates, where $ε$ is a small number (0.005), necessary because for some districts the direct estimates equal zero. Using a first order Taylor approximation it can be shown that $se ({\hat{Y}}_{i t}^{*}) \approx se ({\hat{Y}}_{i t}) / (2 \sqrt{{\hat{Y}}_{i t} + ε})$.

If the GVF (3.3) is applied to the standard errors of the untransformed direct estimates, then the standard errors for domains with a very small number of sampling units can become unreasonably large due to the linearisation approximation. This issue is avoided by applying the GVF to the standard errors of the transformed estimates, i.e., $se ({\hat{Y}}_{i t}^{*})$.
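A small numerical sketch of the square root transformation and its first order Taylor standard error is given below. The function name and the input values are made up for illustration; only the offset 0.005 is taken from the text.

```python
import numpy as np

EPS = 0.005  # offset from the text, needed because some direct estimates are zero


def sqrt_transform(y_hat, se, eps=EPS):
    """Square root transform with a first order Taylor (delta method) standard error."""
    y_hat = np.asarray(y_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    y_star = np.sqrt(y_hat + eps)
    se_star = se / (2.0 * y_star)   # derivative of sqrt(y + eps) is 1 / (2 sqrt(y + eps))
    return y_star, se_star


# Illustrative direct estimates, including a district with estimate zero.
y_hat = np.array([0.0, 0.25, 0.64])
se = np.array([0.02, 0.05, 0.04])

y_star, se_star = sqrt_transform(y_hat, se)
print(np.round(y_star, 4))
print(np.round(se_star, 4))
```

For the district with a direct estimate of zero, the divisor $2 \sqrt{ε}$ is about 0.14, so a standard error supplied on the untransformed scale is multiplied by roughly seven. This shows how the linearisation can inflate standard errors when the estimate is close to zero.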

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2022-12-15
