# Variance estimation under monotone non-response for a panel survey

Section 6. Illustration

In this section, we illustrate our results on a real data set from the ELFE survey. The population of inference consists of infants born in one of the 544 French maternity units during 2011, except very premature infants. Our illustration is meant to mimic as closely as possible the methodology of the ELFE survey. In particular, attrition at each time is modeled using only variables available at baseline as explanatory variables. As pointed out by the Associate Editor, under the MAR assumption, the variables of interest measured at any time $\delta < t$ could also have been used to model attrition between times $t-1$ and $t$.

An original sample $s_{0}$ of about 35,600 infants was selected when the babies were just a few days old and still at the maternity unit. The sample was selected using a cross-classified sampling design (Skinner, 2015; Juillard, Chauvet and Ruiz-Gazen, 2016). A sample of days and a sample of maternity units were independently selected, and both sample selections may be approximated by stratified simple random sampling (STSI). The sample consisted of all the infants born on one of the 25 selected days in one of the 320 selected maternity units.
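The cross-classified selection described above can be sketched as follows. This is only a minimal illustration under simplifying assumptions: stratification is ignored (each selection is a single simple random sample without replacement), the frame of 365 days is hypothetical, and the function names are ours rather than part of the ELFE methodology.

```python
import random

def cross_classified_sample(days, units, n_days, n_units, seed=0):
    """Select days and units independently (SRS without replacement)
    and return every selected (day, unit) cell."""
    rng = random.Random(seed)
    sampled_days = rng.sample(days, n_days)
    sampled_units = rng.sample(units, n_units)
    # The sample is the cross-product of the two independent selections.
    return [(d, u) for d in sampled_days for u in sampled_units]

def cell_inclusion_probability(n_days, N_days, n_units, N_units):
    """Independence of the two selections makes the probabilities multiply."""
    return (n_days / N_days) * (n_units / N_units)

days = list(range(1, 366))   # hypothetical frame: 365 days (assumption)
units = list(range(1, 545))  # 544 maternity units, as stated in the text
cells = cross_classified_sample(days, units, n_days=25, n_units=320)
pi_cell = cell_inclusion_probability(25, len(days), 320, len(units))
```

Every infant born in a selected cell enters the sample, so under these assumptions an infant's inclusion probability is that of its (day, unit) cell.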

Among the 35,600 infants originally selected, a total of 18,329 face-to-face interviews were completed with their families, which represents a response rate of 51%. This led to the subsample $s_{1}$ after accounting for non-response. The weights at time $t = 1$ were computed on the basis of the original sampling weights, adjusted in two steps. First, response probabilities were estimated by means of a model of Response Homogeneity Groups (RHGs), with 20 RHGs defined by using a logistic regression model with explanatory variables *Age of the mother*, *Gemellary identity* and *Season of birth*. Then, a calibration by means of the raking ratio method was performed on the binary variables *Born within marriage*, *Immigrant mother* and *Gemellary identity*.
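The first adjustment step (estimating response probabilities within RHGs) might look like the sketch below. It assumes that the groups are formed from quantiles of response probabilities already predicted by a logistic regression, and that the response probability within a group is estimated by its weighted response rate; the source states neither choice, and the function name `rhg_adjust` is hypothetical.

```python
def rhg_adjust(weights, responded, p_hat, n_groups):
    """Adjust weights for non-response within Response Homogeneity Groups.

    weights   : weights before adjustment
    responded : booleans, True for respondents
    p_hat     : predicted response probabilities (e.g. from a logistic model)
    Returns adjusted weights for respondents and None for non-respondents.
    Assumes every group contains at least one respondent.
    """
    n = len(weights)
    # Assumption: groups are equal-sized quantile classes of p_hat.
    order = sorted(range(n), key=lambda i: p_hat[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * n_groups // n

    total = [0.0] * n_groups       # weighted size of each group
    resp_total = [0.0] * n_groups  # weighted number of respondents
    for i in range(n):
        total[group[i]] += weights[i]
        if responded[i]:
            resp_total[group[i]] += weights[i]

    # Dividing by the weighted response rate of the group.
    return [
        weights[i] * total[group[i]] / resp_total[group[i]] if responded[i] else None
        for i in range(n)
    ]

# Toy data (not ELFE values): four units, two groups.
adjusted = rhg_adjust(
    weights=[1.0, 1.0, 1.0, 1.0],
    responded=[True, False, True, True],
    p_hat=[0.2, 0.3, 0.8, 0.9],
    n_groups=2,
)
```

In the toy data, the respondent in the low-propensity group also carries the weight of the non-respondent in its group, so the adjusted weights of the respondents still sum to the original total.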

When the children reached the age of two months, the parents had the first phone interview, with a response rate of 87%. This led to the subsample $s_{2}$. The weights at time $t = 2$ were computed on the basis of the weights obtained at time $t = 1$, with a two-step adjustment. First, response probabilities were estimated by means of 20 RHGs, defined by using a logistic regression with explanatory variables *Age of the mother*, *Mother nationality* and *Father present at childbirth*. Then, a calibration by the raking ratio method was performed on the same calibration variables as at time $t = 1$.
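The calibration step can be illustrated with a simple iterative proportional fitting loop, a common way of computing raking ratio weights. The sketch assumes that, for each binary variable, the population totals of both categories are known (equivalently, the total of the variable and the population size); the source does not say which totals were used, and the function name and toy numbers are hypothetical.

```python
def rake(weights, variables, targets, max_iter=200, tol=1e-9):
    """Rake weights onto the margins of several binary variables.

    variables : list of 0/1 lists, one per calibration variable
    targets   : list of (total for value 1, total for value 0), one per variable
    Assumes each category of each variable is observed in the sample.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_gap = 0.0
        for x, (t1, t0) in zip(variables, targets):
            s1 = sum(wi for wi, xi in zip(w, x) if xi == 1)
            s0 = sum(wi for wi, xi in zip(w, x) if xi == 0)
            max_gap = max(max_gap, abs(s1 - t1), abs(s0 - t0))
            # Scale each category so that its weighted total hits the target.
            w = [wi * (t1 / s1 if xi == 1 else t0 / s0) for wi, xi in zip(w, x)]
        if max_gap < tol:
            break  # all margins already matched during this pass
    return w

# Toy data (not ELFE values): two binary variables on four units.
x1 = [1, 1, 0, 0]
x2 = [1, 0, 1, 0]
raked = rake([1.0, 1.0, 1.0, 1.0], [x1, x2], targets=[(6.0, 4.0), (5.0, 5.0)])
```

After raking, the weighted totals of `x1` and `x2` match their targets while the weights stay as close as the multiplicative adjustment allows to the input weights.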

When the children were one year old, the parents were contacted by phone, with a response rate of 77%. This led to the subsample $s_{3}$. The weights at time $t = 3$ were computed on the basis of the weights obtained at time $t = 2$, with a two-step adjustment similar to that performed at time $t = 2$.

We considered three variables of interest: *Breastfeeding exclusivity* at childbirth, at two months and at one year. For each of these variables, we computed the estimator $\widehat{R}_{t}$ and the calibrated estimator $\widehat{R}_{wt}$ for the percentage $R(t)$ of breastfeeding among all the children at time $t$, and the associated variance estimators. We also computed the estimated coefficient of variation (in percent), defined as
also computed the estimated coefficient of variation (in percent), defined as

$$\widehat{\text{CV}}_{t}\left(\widehat{Y}_{t}\right) = 100 \times \frac{\sqrt{\widehat{V}_{t}\left(\widehat{Y}_{t}\right)}}{\widehat{Y}_{t}}. \qquad (6.1)$$
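As a quick check of (6.1), the coefficient of variation can be recomputed from the uncalibrated estimates and variance estimates reported in Table 6.1. The sketch assumes that the reported variances are on the proportion scale (so that 59.0% enters as 0.590); this is our reading of the table, inferred from the fact that it reproduces the published values.

```python
import math

def cv_percent(estimate, variance):
    """Estimated coefficient of variation in percent, as in (6.1)."""
    return 100.0 * math.sqrt(variance) / estimate

# Uncalibrated values from Table 6.1, with estimates expressed as proportions.
cv_t1 = cv_percent(0.590, 1.34e-05)  # t = 1, maternity
cv_t2 = cv_percent(0.306, 1.50e-05)  # t = 2, two months
```

Rounded to one decimal place, these give 0.6 and 1.3, which agree with the coefficients of variation reported in the table.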

For each component ${\widehat{V}}_{ta}$ in the estimated variance ${\widehat{V}}_{t},$ we computed its contribution (in percent) defined as

$$\text{CONTR}\left(\widehat{V}_{ta}\right) = 100 \times \frac{\widehat{V}_{ta}}{\widehat{V}_{t}}. \qquad (6.2)$$
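The contribution of a component is read here as its share of the total estimated variance, which is how the contributions reported in Table 6.1 add up to 100 at each time. A minimal sketch, assuming that $\widehat{V}_{t}$ is the sum of its components and using made-up component values (the components themselves are not reported in the source):

```python
def contributions_percent(components):
    """Share (in percent) of each variance component in their total."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}

# Hypothetical component values, for illustration only (not ELFE results).
contr = contributions_percent({"p": 2.0, "nr1": 5.0, "nr2": 3.0})
```

With these toy values the sampling component contributes 20% and the two non-response components contribute 50% and 30%.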

We also computed the simplified variance estimator for non-response $\widehat{V}_{t,\,\text{simp}}^{\text{nr}}$, and its relative difference (in percent) with respect to the approximately unbiased variance estimator $\widehat{V}_{t}^{\text{nr}}$, defined as

$$\text{RD}\left(\widehat{V}_{t,\,\text{simp}}^{\text{nr}}\right) = 100 \times \frac{\widehat{V}_{t,\,\text{simp}}^{\text{nr}} - \widehat{V}_{t}^{\text{nr}}}{\widehat{V}_{t}^{\text{nr}}}. \qquad (6.3)$$
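The relative difference in (6.3) is a one-line computation. The sketch below uses made-up values, since the two non-response variance estimates themselves are not reported in the source; the function name is hypothetical.

```python
def relative_difference_percent(v_simplified, v_reference):
    """Relative difference (in percent) of the simplified estimator, as in (6.3)."""
    return 100.0 * (v_simplified - v_reference) / v_reference

# Hypothetical values, for illustration only (not ELFE results).
rd = relative_difference_percent(v_simplified=1.02, v_reference=1.00)
```

A positive value means that the simplified estimator is larger than the approximately unbiased one; here it exceeds it by about 2%.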

The results are given in Table 6.1. As observed in the simulation study, the RD of the simplified variance estimator for non-response is small in all cases (at most 2%).

| | Without calibration | | | With calibration | | |
|---|---|---|---|---|---|---|
| Breastfeeding exclusivity | $t = 1$ (maternity) | $t = 2$ (2 months) | $t = 3$ (1 year) | $t = 1$ (maternity) | $t = 2$ (2 months) | $t = 3$ (1 year) |
| $\widehat{R}_{t}$ (%) | 59.0 | 30.6 | 3.3 | 59.4 | 31.0 | 3.4 |
| $\widehat{V}\left(\widehat{R}_{t}\right)$ | 1.34E-05 | 1.50E-05 | 2.58E-06 | 1.28E-05 | 1.48E-05 | 2.60E-06 |
| $\widehat{\text{CV}}\left(\widehat{Y}_{t}\right)$ (%) | 0.6 | 1.3 | 4.8 | 0.6 | 1.2 | 4.7 |
| $\text{CONTR}\left(\widehat{V}_{t}^{p}\right)$ (%) | 31 | 34 | 24 | 28 | 34 | 25 |
| $\text{CONTR}\left(\widehat{V}_{t}^{\text{nr}1}\right)$ (%) | 69 | 51 | 42 | 72 | 51 | 41 |
| $\text{CONTR}\left(\widehat{V}_{t}^{\text{nr}2}\right)$ (%) | - | 15 | 13 | - | 15 | 13 |
| $\text{CONTR}\left(\widehat{V}_{t}^{\text{nr}3}\right)$ (%) | - | - | 21 | - | - | 21 |
| $\text{RD}\left(\widehat{V}_{t,\,\text{simp}}^{\text{nr}}\right)$ (%) | 2 | 2 | 0 | 1 | 2 | 0 |
