Variance estimation under monotone non-response for a panel survey
Section 6. Illustration

In this section, we illustrate our results on a real data set from the ELFE survey. The population of inference consists of infants born in one of the 544 French maternity units during 2011, except very premature infants. Our illustration is meant to mimic the methodology of the ELFE survey as closely as possible. In particular, attrition at each time is modeled using only variables available at baseline as explanatory variables. As pointed out by the Associate Editor, under the MAR assumption, the variables of interest measured at any time $\delta<t$ could also have been used to model attrition between times $t-1$ and $t.$

A sample ${s}_{0}$ of about 35,600 infants was originally selected when the babies were just a few days old and still at the maternity unit. The sample was selected using a cross-classified sampling design (Skinner, 2015; Juillard, Chauvet and Ruiz-Gazen, 2016). A sample of days and a sample of maternity units were independently selected, and both sample selections may be approximated by stratified simple random sampling (STSI). The sample consisted of all the infants born during one of the 25 selected days in one of the 320 selected maternity units.
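The cross-classified selection can be sketched as follows. This is a minimal illustration only: it assumes unstratified simple random sampling in both dimensions (the survey used designs that are approximately stratified), and the 365-day frame and the function names are assumptions made for the example. Because the two samples are drawn independently, an infant is included exactly when both its day of birth and its maternity unit are selected, so its inclusion probability is the product of the two selection probabilities.

```python
import random

def cross_classified_sample(n_days_frame, n_days, n_units_frame, n_units, seed=0):
    """Draw days and units independently by SRS (a simplification of STSI)."""
    rng = random.Random(seed)
    days = set(rng.sample(range(n_days_frame), n_days))
    units = set(rng.sample(range(n_units_frame), n_units))
    # Independence of the two selections: inclusion probability is a product.
    inclusion_prob = (n_days / n_days_frame) * (n_units / n_units_frame)
    return days, units, 1.0 / inclusion_prob

def in_sample(day, unit, days, units):
    """An infant is sampled only if both its day and its unit were selected."""
    return day in days and unit in units

# Hypothetical frame of 365 days; 25 days and 320 of 544 units as in the text.
days, units, design_weight = cross_classified_sample(365, 25, 544, 320)
print(len(days), len(units), round(design_weight, 2))
```

Under stratification, the same product rule would apply with stratum-specific sampling fractions in place of the overall ones.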

Among the 35,600 infants originally selected, a total of 18,329 face-to-face interviews were completed with their families, which represents a response rate of 51%. This led to the subsample ${s}_{1}$ after accounting for non-response. The weights at time $t=1$ were computed on the basis of the original sampling weights, adjusted in two steps. First, response probabilities were estimated by means of a model of Response Homogeneity Groups (RHGs), with 20 RHGs defined by using a logistic regression model with explanatory variables Age of the mother, Gemellary identity and Season of birth. Then, a calibration by means of the raking ratio method was performed on the binary variables Born within marriage, Immigrant mother and Gemellary identity.
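The first adjustment step can be sketched as below. This is not the ELFE production code: it assumes that response propensities have already been fitted by a logistic regression elsewhere, that the RHGs are equal-sized classes of the sorted propensities, and that the response probability in each group is estimated by the weighted response rate. The function name and the toy data are hypothetical.

```python
def rhg_adjust(weights, propensity, responded, n_groups=20):
    """Adjust respondent weights by the estimated response rate of their RHG."""
    n = len(weights)
    # Rank units by fitted propensity and cut the ranking into n_groups classes.
    order = sorted(range(n), key=lambda i: propensity[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * n_groups // n
    # Weighted totals of all units and of respondents within each group.
    total = [0.0] * n_groups
    resp = [0.0] * n_groups
    for i in range(n):
        total[group[i]] += weights[i]
        if responded[i]:
            resp[group[i]] += weights[i]
    # Respondents carry their previous weight divided by the group response rate.
    adjusted = {}
    for i in range(n):
        if responded[i]:
            p_hat = resp[group[i]] / total[group[i]]
            adjusted[i] = weights[i] / p_hat
    return adjusted

# Toy data: four units, two groups, one non-respondent in the low-propensity group.
adjusted = rhg_adjust([2.0, 2.0, 2.0, 2.0], [0.1, 0.2, 0.8, 0.9],
                      [True, False, True, True], n_groups=2)
print(adjusted)
```

When every group contains at least one respondent, the adjusted weights of the respondents sum to the total weight of the full sample, which is the property the adjustment is designed to deliver.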

When the children reached the age of two months, the parents had the first phone interview, with a response rate of 87%. This led to the subsample ${s}_{2}.$ The weights at time $t=2$ were computed on the basis of the weights obtained at time $t=1,$ with a two-step adjustment. First, response probabilities were estimated by means of 20 RHGs, defined by using a logistic regression with explanatory variables Age of the mother, Mother nationality and Father present at childbirth. Then, a calibration by the raking ratio method was performed on the same calibration variables as at time $t=1.$
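For categorical calibration variables, raking ratio calibration can be carried out by iterative proportional fitting: the weights are rescaled for each variable in turn until all weighted category totals match their known population totals. The sketch below assumes that formulation; the function name, the toy margins and the stopping rule are illustrative choices, not taken from the survey.

```python
def rake(weights, categories, targets, n_iter=50, tol=1e-10):
    """Iterative proportional fitting on categorical margins.

    categories[k][i] is the category of unit i for calibration variable k;
    targets[k][c] is the known population total of category c for variable k.
    """
    w = list(weights)
    for _ in range(n_iter):
        max_change = 0.0
        for cats, tgt in zip(categories, targets):
            # Current weighted total of each category for this variable.
            current = {c: 0.0 for c in tgt}
            for wi, c in zip(w, cats):
                current[c] += wi
            # Rescale so that this variable's margins are met exactly.
            for i, c in enumerate(cats):
                factor = tgt[c] / current[c]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return w

# Toy example with two binary variables and hypothetical population margins.
raked = rake([1.0, 1.0, 1.0, 1.0],
             [[0, 0, 1, 1], [0, 1, 0, 1]],
             [{0: 30.0, 1: 70.0}, {0: 40.0, 1: 60.0}])
print([round(x, 6) for x in raked])
```

The margins of the different variables must add up to the same population total, otherwise the iterations cannot satisfy all of them at once.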

When the children were one year old, the parents were contacted by phone, with a response rate of 77%. This led to the subsample ${s}_{3}.$ The weights at time $t=3$ were computed on the basis of the weights obtained at time $t=2,$ with a two-step adjustment similar to that carried out at time $t=2.$

We considered three variables of interest: Breastfeeding exclusivity at childbirth, at two months and at one year. For each of these variables, we computed the estimator $\hat{R}_{t}$ and the calibrated estimator $\hat{R}_{wt}$ for the percentage $R\left(t\right)$ of breastfeeding among all the children at time $t,$ and the associated variance estimators. We also computed the estimated coefficient of variation (in percent), defined as

$\widehat{\text{CV}}_{t}\left(\hat{Y}_{t}\right)=100\times\frac{\sqrt{\hat{V}_{t}\left(\hat{Y}_{t}\right)}}{\hat{Y}_{t}}.\qquad\left(6.1\right)$
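As a worked check of (6.1), the coefficient of variation can be recomputed from the published estimates without calibration in Table 6.1. The sketch assumes that the variance estimates in the table are expressed on the proportion scale, so the ratio estimates are converted from percent to proportions first; the function name is hypothetical. Because the inputs are rounded, the recomputed values can differ from the published ones in the last digit.

```python
import math

def cv_percent(variance, estimate):
    """Estimated coefficient of variation in percent, as in equation (6.1)."""
    return 100.0 * math.sqrt(variance) / estimate

# Published values without calibration (Table 6.1), for t = 1, 2, 3.
r_hat = [0.590, 0.306, 0.033]            # ratio estimates as proportions
v_hat = [1.34e-05, 1.50e-05, 2.58e-06]   # variance estimates (assumed same scale)

cvs = [cv_percent(v, r) for v, r in zip(v_hat, r_hat)]
print([round(cv, 2) for cv in cvs])
```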

For each component $\hat{V}_{ta}$ in the estimated variance $\hat{V}_{t},$ we computed its contribution (in percent) defined as

$\text{CONTR}\left(\hat{V}_{ta}\right)=100\times\frac{\hat{V}_{ta}}{\hat{V}_{t}}.\qquad\left(6.2\right)$
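The contributions reported in Table 6.1 add up to 100 in each column, which is what one expects when each contribution is the share of the total estimated variance carried by one component. The sketch below computes such shares, assuming the total is the sum of its components; the component values are made up for illustration, and only the percentages in the second part come from the table.

```python
def contributions(components):
    """Percent share of each variance component in the total variance."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}

# Hypothetical component values (sampling, then non-response at times 1 to 3).
shares = contributions({"p": 2.0, "nr1": 5.0, "nr2": 1.0, "nr3": 2.0})
print(shares)

# Published contributions from Table 6.1: each column should sum to 100.
table_columns = {
    "t=1, without calibration": [31, 69],
    "t=2, without calibration": [34, 51, 15],
    "t=3, without calibration": [24, 42, 13, 21],
    "t=1, with calibration": [28, 72],
    "t=2, with calibration": [34, 51, 15],
    "t=3, with calibration": [25, 41, 13, 21],
}
column_sums = {label: sum(values) for label, values in table_columns.items()}
print(column_sums)
```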

We also computed the simplified variance estimator for non-response $\hat{V}_{t,\,\text{simp}}^{\text{nr}},$ and its relative difference (in percent) with the approximately unbiased variance estimator $\hat{V}_{t}^{\text{nr}},$ defined as

$\text{RD}\left(\hat{V}_{t,\,\text{simp}}^{\text{nr}}\right)=100\times\frac{\hat{V}_{t,\,\text{simp}}^{\text{nr}}-\hat{V}_{t}^{\text{nr}}}{\hat{V}_{t}^{\text{nr}}}.\qquad\left(6.3\right)$

The results are given in Table 6.1. As observed in the simulation study, the RD of the simplified variance estimator for non-response is negligible in all cases: it never exceeds 2%.

Table 6.1
Estimates for a ratio, variance estimates, coefficient of variation, relative contributions of variance components and relative difference of a simplified variance estimator for a variable in the ELFE survey

| Breastfeeding exclusivity | Without calibration, $t=1$ (maternity) | Without calibration, $t=2$ (2 months) | Without calibration, $t=3$ (1 year) | With calibration, $t=1$ (maternity) | With calibration, $t=2$ (2 months) | With calibration, $t=3$ (1 year) |
|---|---|---|---|---|---|---|
| $\hat{R}_{t}$ (%) | 59.0 | 30.6 | 3.3 | 59.4 | 31.0 | 3.4 |
| $\hat{V}\left(\hat{R}_{t}\right)$ | 1.34E-05 | 1.50E-05 | 2.58E-06 | 1.28E-05 | 1.48E-05 | 2.60E-06 |
| $\widehat{\text{CV}}\left(\hat{Y}_{t}\right)$ (%) | 0.6 | 1.3 | 4.8 | 0.6 | 1.2 | 4.7 |
| $\text{CONTR}\left(\hat{V}_{t}^{p}\right)$ (%) | 31 | 34 | 24 | 28 | 34 | 25 |
| $\text{CONTR}\left(\hat{V}_{t}^{\text{nr}1}\right)$ (%) | 69 | 51 | 42 | 72 | 51 | 41 |
| $\text{CONTR}\left(\hat{V}_{t}^{\text{nr}2}\right)$ (%) | - | 15 | 13 | - | 15 | 13 |
| $\text{CONTR}\left(\hat{V}_{t}^{\text{nr}3}\right)$ (%) | - | - | 21 | - | - | 21 |
| $\text{RD}\left(\hat{V}_{t,\,\text{simp}}^{\text{nr}}\right)$ (%) | 2 | 2 | 0 | 1 | 2 | 0 |