Multilevel time series modelling of antenatal care coverage in Bangladesh at disaggregated administrative levels
Section 3. Data sources and input estimates
3.1 Data sources
Since 1993-94 the BDHS has been conducted under the
authority of the National Institute of Population Research and Training
(NIPORT) of the Ministry of Health and Family Welfare (MOHFW) to evaluate
existing health and social programs and to design new strategies for improving
the health status of the country’s women and children. By 2018, eight BDHS
surveys had been conducted: in 1993-94, 1996-97, 2000, 2004, 2007, 2011, 2014
and 2017-18. This study uses the survey data for the period 1994-2014 because
the district-level location of the surveyed clusters is not disclosed in the
most recent BDHS 2017-18. Three Population and Housing Censuses, conducted in
1991, 2001 and 2011, are relevant to this period. Full census data are not
publicly available; only a 10% sample of Census 1991, a 10% sample of Census
2001 and a 5% sample of Census 2011 can be obtained from
IPUMS-International (https://international.ipums.org). A number of
district-level contextual variables have been generated and used in the
development of cross-sectional FH models to produce input estimates for the MTS
models.
3.2 Direct estimates
The variables analysed in this paper are ANC0 and ANC4.
Bangladesh is divided into 7 sub-national regions, called divisions. These
divisions are further divided into 64 districts, which form the most detailed
regional level considered in this study. As a first step, estimates and
variance estimates of the two target variables at the district level are
obtained from each survey year’s unit-level data using the standard design-based
direct survey estimator (hereafter denoted by DIR), where the survey weights
are used to account for the sampling design and for non-response.
In this study, the target population consists of ever-married women of
reproductive age who gave birth within the three years preceding a survey
year. Since such pregnancy-related information is not available in the census
data, the area-specific population size is estimated by the number of
ever-married women of reproductive age in the three Censuses. This means that
even though the area-specific population sizes are based on a census, there is
some uncertainty about them, which is ignored in the SAE models. See Das, van
den Brakel, Boonstra and Haslett (2021) for more details about
division and district specific population sizes.
The BDHS uses a two-stage stratified sample of
households. The strata are formed from divisions and sub-divisions according to
their urban-rural characterization. The primary sampling units (PSUs) are the
enumeration areas of the Population and Housing Census, created to have an
average of about 120 households (varying slightly between censuses). In the first
stage, PSUs are selected with probabilities proportional to PSU size, i.e., the
number of households. In the second stage, a complete household listing is
carried out in all selected PSUs and then about 30 households are selected from
each PSU using systematic sampling. The response rates among eligible women
have been over 95% in all BDHS years. Though the sample size of the
ever-married women is greater than 10,000 in all the surveys, in this study
only the ever-married women who had a child birth in the three years preceding
the survey year are considered, and therefore sample sizes are smaller. At the
district level, mean sample sizes vary between 60 and 114, with some districts
having fewer than 10 observed women, or none at all.
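The two-stage selection described above can be illustrated with a minimal Python sketch. It is not the BDHS sampling program: the text does not say which PPS scheme is used in the first stage, so systematic PPS selection is assumed here, and the function names, the number of PSUs and the PSU sizes are all hypothetical.

```python
import random


def pps_systematic(sizes, n_select, rng):
    """First stage (assumed scheme): systematic selection with probability
    proportional to size along the cumulated PSU sizes."""
    step = sum(sizes) / n_select
    start = rng.uniform(0, step)
    selected, cum, idx = [], 0.0, 0
    for k in range(n_select):
        point = start + k * step
        # Move to the PSU whose cumulated-size interval contains the point.
        while idx < len(sizes) - 1 and cum + sizes[idx] <= point:
            cum += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


def systematic_sample(n_units, n_select, rng):
    """Second stage: equal-probability systematic sample of list positions."""
    step = n_units / n_select
    start = rng.uniform(0, step)
    return [min(int(start + k * step), n_units - 1) for k in range(n_select)]


rng = random.Random(1)
# Hypothetical frame: 200 PSUs with roughly 120 listed households each.
psu_sizes = [rng.randint(90, 150) for _ in range(200)]
psus = pps_systematic(psu_sizes, 20, rng)
# About 30 households are drawn from the listing of each selected PSU.
households = {p: systematic_sample(psu_sizes[p], 30, rng) for p in psus}
```

Because every PSU in this toy frame is much smaller than the sampling interval, no PSU can be selected twice; with very large PSUs a systematic PPS scheme would need extra handling.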
Sampling weights are calculated based on selection
probabilities. These weights are then adjusted for household and individual
non-response. The direct estimate for the population proportion in a certain
domain $i$ for survey year $t$ is computed as the weighted sample mean
$$\hat{Y}_{it} = \frac{\sum_{j \in s_{it}} w_{ijt}\, y_{ijt}}{\sum_{j \in s_{it}} w_{ijt}}, \qquad (3.1)$$
where $y_{ijt}$ is the response variable of interest, $s_{it}$ is the set of ever-married women in domain $i$ for which $y_{ijt}$ is observed in year $t$, and $w_{ijt}$ is the survey weight for person $j$ living in area $i$ in year $t$. Note that the weights are scaled such that the sum over the weights
in the sample is equal to the net sample size. The corresponding variance
estimates are approximated by (3.2), where $n_{it}$ is the number of
ever-married women observed in domain $i$ at the survey year $t$.
Initially, the variance was approximated by
calculating the variance among the estimated PSU totals as if they were
selected by using stratified sampling with replacement, known as the ultimate
sampling unit variance approximation. This resulted in zero variance estimates
for a few domains. Variance approximation (3.2) avoids these zero variance
estimates, and otherwise results in variance estimates comparable with the
initial approximation where PSUs were assumed to be selected with replacement.
In the first MTS model, denoted by MTS-I, these direct estimates are used as
the input series.
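As a hedged illustration of the weighted sample mean described above, the following Python sketch computes domain-by-year direct estimates from unit-level records. The record layout, the function names and the toy data are assumptions made for illustration and do not come from the BDHS files. The variance approximation is not implemented, since its exact form is not restated here.

```python
from collections import defaultdict


def direct_estimates(records):
    """records: iterable of (domain, year, weight, y) with y equal to 0 or 1.
    Returns {(domain, year): weighted sample mean of y}."""
    num = defaultdict(float)
    den = defaultdict(float)
    for domain, year, w, y in records:
        num[(domain, year)] += w * y
        den[(domain, year)] += w
    return {key: num[key] / den[key] for key in den}


def scale_weights(weights):
    """Rescale weights so that they sum to the net sample size."""
    n, total = len(weights), sum(weights)
    return [w * n / total for w in weights]


# Toy data (hypothetical): two districts observed in one survey year.
records = [
    ("A", 2014, 2.0, 1), ("A", 2014, 1.0, 0), ("A", 2014, 1.0, 1),
    ("B", 2014, 1.0, 0), ("B", 2014, 1.0, 1),
]
est = direct_estimates(records)

# The estimator is a ratio, so rescaling the weights leaves it unchanged.
scaled = scale_weights([r[2] for r in records])
est_scaled = direct_estimates(
    [(d, t, w, y) for (d, t, _, y), w in zip(records, scaled)]
)
```

A district without any observed women simply has no key in the result, which matches the treatment of out-of-sample districts as missing.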
3.3 Cross-sectional Fay-Herriot estimates
An issue with the MTS-I model is the use of census data
as auxiliary variables in the MTS model. Because the time gap between two
subsequent censuses is 10 years whereas the BDHS is conducted every 3 or 4
years, the census covariates remain the same until the new census data are
available. Including these census data as covariates in the MTS-I models will
bias estimates of trends and period-to-period changes. One way to take
advantage of the census information is to model the direct estimates at the
district level in separate cross-sectional FH models using relevant contextual
variables extracted from the census data. Using the census auxiliary variables
available at the time of each survey in repeated cross-sectional FH models is
expected to affect the regression coefficients and the accuracy of the model
predictions of the dependent variable, but not the predictions of the dependent
variable themselves. Compared to the direct estimates used in MTS-I, these cross-sectional
FH models also provide better estimates by already borrowing some strength over
districts.
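How a cross-sectional FH model borrows strength over districts can be sketched in a few lines of Python. This is a simplified plug-in version, not the fitting procedure used in the paper: it assumes a single covariate with an intercept, sampling variances that are known, and a random-effect variance `sigma2_v` that is supplied rather than estimated. All names and numbers are hypothetical.

```python
def fay_herriot(y, psi, x, sigma2_v):
    """Composite area-level predictions for one covariate plus intercept.

    y: direct estimates; psi: their sampling variances (treated as known);
    x: district-level covariate; sigma2_v: random-effect variance (assumed).
    """
    # Weighted least squares with weights 1 / (sigma2_v + psi_i).
    w = [1.0 / (sigma2_v + p) for p in psi]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    preds = []
    for yi, pi, xi in zip(y, psi, x):
        gamma = sigma2_v / (sigma2_v + pi)  # weight given to the direct estimate
        synthetic = b0 + b1 * xi            # regression (synthetic) prediction
        preds.append(gamma * yi + (1.0 - gamma) * synthetic)
    return preds, (b0, b1)


# Hypothetical inputs for four districts in one survey year.
y = [0.30, 0.55, 0.42, 0.70]
psi = [0.0004, 0.02, 0.005, 0.0001]
x = [0.2, 0.5, 0.4, 0.8]
preds, (b0, b1) = fay_herriot(y, psi, x, sigma2_v=0.003)
```

Districts with a large sampling variance are pulled towards the regression prediction, while precisely estimated districts stay close to their direct estimate. For a district without sample data, only the synthetic part `b0 + b1 * x` would be available.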
The cross-sectional FH estimates and their standard
errors are used as input for a second model, denoted by MTS-II. The
cross-sectional FH estimates are correlated due to their common fixed effect
components, which is ignored in MTS-II. Therefore a third MTS model, denoted by
MTS-III, is developed using cross-sectional FH estimates and their full
covariance matrix as input.
The fixed and random effect components for the
survey-specific cross-sectional FH models are shown in Appendix Tables A.2
and A.3. For all the models, random effects are assumed to follow a normal
distribution. Non-normal distributions were also considered for the random
effects (Laplace and horseshoe) and for the sampling errors (t-distribution) as
alternatives to the normal distribution. This, however, did not improve the model fit.
3.4 Generalized variance functions
In the FH and MTS models, the variance estimates of the
direct estimates are largely treated as fixed given quantities. Since these
variance estimates can be very noisy, they are smoothed using a GVF before
using them in the FH and MTS models. It is understood that a district without
sample information is considered as missing and is therefore not considered in
the model development approach. The cross-sectional FH model can produce
estimates and standard errors for these out-of-sample domains. These synthetic
estimates are, however, not used in the development of the MTS-II and MTS-III
models to allow for a better comparison with the MTS-I model.
The GVFs are regression models that relate the variance
estimates to predictors such as sample size, survey design variables, and point
estimates (Wolter, 2007, Chapter 7). For both ANC0 and ANC4, the GVF (3.3) is a
regression model on the log scale for $\text{se}(\hat{Y}_{it})$, the standard
error of $\hat{Y}_{it}$ in (3.1). Its predictors include $m_{it}$, the number
of sampling units contributing to district $i$ in year $t$, and a categorical
variable with 7 levels. Since we cannot trust the direct estimates for very
small $m_{it}$, the point estimates $\tilde{Y}_{it}$ on the right-hand side of
(3.3) are simple smoothed estimates (3.4), which involve $\bar{Y}_{dt}$, the
mean for division $d$ ($d = 1$ to 7) to which district $i$ belongs, in year
$t$. As mentioned by a referee, a composite regression estimator can be used as
an alternative to (3.4). The regression errors $\epsilon_{it}$ are assumed to
be independent and normally distributed with a common variance parameter
$\sigma^2$. The GVFs are fitted only to districts with non-zero standard errors
of the direct estimates. The predicted (smoothed) standard errors (3.5) are
obtained from the fitted models by an exponential back-transformation, in which
$\hat{\sigma}^2$ is 0.03 for ANC0 and 0.003 for ANC4. The R-squared values for
both models are quite high: 0.79 for ANC0 and 0.99 for ANC4. Note that the
exponential back-transformation in (3.5) includes a bias correction, which in
this case has only a small effect. This approach is used to obtain smoothed
standard errors for the cross-sectional FH models and the MTS-I model.
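The smoothing step can be illustrated with a deliberately simplified GVF in Python. The sketch below regresses the log standard error on the log number of sampling units only, whereas the GVF in this section uses more predictors. It also assumes the usual lognormal bias correction, a factor $\exp(\hat{\sigma}^2/2)$, in the back-transformation. Function names and data are hypothetical.

```python
import math


def fit_gvf(se, m):
    """OLS fit of log(se) = a + b * log(m) + error, using only se > 0."""
    pairs = [(math.log(mi), math.log(si)) for si, mi in zip(se, m) if si > 0]
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    vbar = sum(v for _, v in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    b = sum((x - xbar) * (v - vbar) for x, v in pairs) / sxx
    a = vbar - b * xbar
    rss = sum((v - a - b * x) ** 2 for x, v in pairs)
    sigma2 = rss / (n - 2)  # residual variance estimate
    return a, b, sigma2


def smoothed_se(m, a, b, sigma2):
    """Back-transform fitted values; sigma2 / 2 is the assumed bias correction."""
    return [math.exp(a + b * math.log(mi) + sigma2 / 2.0) for mi in m]


# Hypothetical districts; the second one has a zero estimated standard error.
se = [0.12, 0.0, 0.08, 0.05, 0.045, 0.09]
m = [15, 4, 35, 90, 110, 25]
a, b, sigma2 = fit_gvf(se, m)
smooth = smoothed_se(m, a, b, sigma2)
```

The district with a zero estimated standard error is left out of the fit but still receives a positive smoothed value. Under the assumed form of the correction, residual variances of 0.03 and 0.003 would inflate the back-transformed values by about 1.5% and 0.15%, which is in line with the remark that the correction has only a small effect.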
3.5 Transformations of input series
Square root, log and log-ratio transformations were
considered as variance-stabilizing transformations; see Sakia (1992). The square root
transformation is applied to the ANC4 data (in the MTS models and the
cross-sectional FH models) since this transformation reduces the correlation
between the point estimates and the standard errors of the input series,
reduces heterogeneity, improves the convergence of the MCMC simulation, and
reduces the skewness of proportion data that take values close to the lower
boundary of zero. For ANC0, the square root transformation is used only for the
year-specific cross-sectional FH models in 2011 and 2014. In the other years, no
transformation is applied. In all three MTS models, no transformation is
applied for ANC0 since the square root transformation of the input series
increases the dependency between direct estimates and standard errors.
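Let $\hat{Z}_{it} = \sqrt{\hat{Y}_{it} + \delta}$ denote the square root
transformed direct estimates, where $\delta$ is a small number (0.005),
necessary because for some districts direct estimates equal zero. Using a first
order Taylor approximation it can be shown that
$$\text{se}(\hat{Z}_{it}) \approx \frac{\text{se}(\hat{Y}_{it})}{2\sqrt{\hat{Y}_{it} + \delta}}.$$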
If the GVF (3.3) is applied to the standard errors of
the untransformed direct estimates, then the standard errors for domains with a
very small number of sampling units can become unreasonably large due to the
linearisation approximation. This issue is avoided by applying the GVF to the
standard errors of the transformed estimates, i.e., by using these standard
errors as the response in (3.3).
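The first-order Taylor (delta method) step for the square root transformation can be checked numerically. The Python sketch below assumes the transformed estimate is the square root of the direct estimate plus the small constant 0.005 mentioned above, so that its linearised standard error is the original standard error divided by twice the transformed estimate. The function name and the example values are hypothetical.

```python
import math

DELTA = 0.005  # small constant added before taking the square root


def sqrt_transform(y_hat, se, delta=DELTA):
    """Return the square root transformed estimate and its linearised
    standard error, using d/dy sqrt(y + delta) = 1 / (2 * sqrt(y + delta))."""
    z = math.sqrt(y_hat + delta)
    se_z = se / (2.0 * z)
    return z, se_z


# Same standard error on the original scale, two different point estimates.
z_zero, se_zero = sqrt_transform(0.0, 0.02)   # direct estimate equal to zero
z_mid, se_mid = sqrt_transform(0.25, 0.02)    # direct estimate well above zero
```

With a direct estimate of zero the linearised standard error is about seven times larger than for an estimate of 0.25, although both have the same standard error on the original scale. This sensitivity near zero illustrates how the linearisation can inflate standard errors when it is applied to noisy direct estimates.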
ISSN : 1492-0921
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© His Majesty the King in Right of Canada as represented by the Minister of Industry, 2022
Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue No. 12-001-X
Frequency: Semi-annual
Ottawa