Variance estimation under monotone non-response for a panel survey
Section 1. Introduction

Surveys are not only used to produce estimators at one point in time (cross-sectional estimation), but also to measure the evolution of parameters over time (longitudinal estimation), and are therefore repeated over time. In this paper, we are interested in estimation and variance estimation for panel surveys, in which measurements are repeated over time on the units of the same sample (Kalton, 2009). Among panel surveys (also known as longitudinal surveys, see Lynn, 2009), cohort surveys are a particular case in which the units in the sample are linked by a common originating event, such as being born in the same year for the children in the ELFE survey (Enquête longitudinale française depuis l’enfance), which is the motivating example for this work.

ELFE is the first longitudinal study of its kind in France, tracking children from birth to adulthood (Pirus, Bois, Dufourg, Lanoë, Vandentorren, Leridon and the ELFE team, 2010). Covering the whole of metropolitan France, it was launched in 2011 and comprises more than 18,000 children whose parents consented to their inclusion. It will examine every aspect of these children’s lives from the perspectives of health, social sciences and environmental health. The ELFE survey suffers from unit non-response, which needs to be accounted for by using the available auxiliary information, so as to limit the bias of the estimators. Though the ELFE survey is used for illustration in this paper, non-response occurs in virtually any panel survey, so that the proposed methods are of general interest; see for example Laurie, Smith and Scott (1999) for the treatment of non-response in the British Household Panel Survey, or Vandecasteele and Debels (2007) for the European Community Household Panel.

Non-response is commonly handled by modeling the response probabilities (Kim and Kim, 2007) and by reweighting the respondents with the inverse of these estimated probabilities, which leads to the so-called propensity score adjusted estimator. A panel sample may suffer from three types of unit non-response (Hawkes and Plewis, 2009): initial non-response occurs when selected units fail to respond at the first wave; wave non-response occurs when units in the panel temporarily fail to respond at some waves; attrition occurs when units in the panel permanently stop responding from some wave onwards. Wave non-response was fairly uncommon in the first waves of the ELFE survey that were at our disposal. We therefore simplify the set-up by assuming monotone non-response, in which only initial non-response and attrition occur.
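To fix ideas, here is a minimal sketch with illustrative notation (the precise definitions and notation are given in Section 2). A sample $s$ is selected with inclusion probabilities $\pi_k$, and $r_t \subset s$ denotes the set of units still responding at time $t$; under monotone non-response, $r_1 \supset r_2 \supset \cdots$. Writing $\hat{p}_{kt}$ for the estimated probability that unit $k$ responds up to time $t$, the propensity score adjusted estimator of the total of a variable $y_t$ is
\[
\hat{Y}_{t} \;=\; \sum_{k \in r_t} \frac{y_{kt}}{\pi_k \, \hat{p}_{kt}},
\]
where, under monotone non-response, $\hat{p}_{kt}$ may be obtained as the product of the estimated wave-to-wave response probabilities.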

There is a vast literature on the treatment of unit non-response for surveys over time, see Ekholm and Laaksonen (1991), Fuller, Loughin and Baker (1994), Rizzo, Kalton and Brick (1996), Clarke and Tate (2002), Laaksonen and Chambers (2006), Laaksonen (2007), Hawkes and Plewis (2009), Rendtel and Harms (2009), Slud and Bailey (2010), Zhou and Kim (2012). Variance estimation for longitudinal estimators is considered in Tam (1984), Laniel (1988), Nordberg (2000), Berger (2004), Skinner and Vieira (2005), Qualité and Tillé (2008) and Chauvet and Goga (2018), but with a focus on the sampling variance only. Variance estimation in the case of non-response weighting adjustments for cross-sectional surveys is considered in Kim and Kim (2007). To the best of our knowledge, and despite its interest for applications, variance estimation accounting for non-response in panel surveys has not been treated in the literature, with the exception of Zhou and Kim (2012).

Zhou and Kim (2012) consider the estimation of a mean for a panel survey in the case of monotone non-response. Instead of using the propensity score adjusted estimator, they define an optimal propensity score estimator. It is obtained by noting that, for any variable of interest observed before time t, the estimator produced at time t differs from the estimator obtained at the date when the variable was observed, which is based on a larger sample. Adjusting on these differences by means of some form of calibration leads to the estimator proposed by Zhou and Kim (2012). It makes full use of the information collected at previous times, and is therefore expected to be more efficient than the propensity score adjusted estimator. However, a panel survey may include a large number of variables of interest observed at several times, and calibrating on too many variables may degrade the performance of the estimators (Silva and Skinner, 1997). A careful modeling exercise therefore seems necessary before applying the optimal estimator of Zhou and Kim (2012). In this work, we rather focus on the propensity score adjusted estimator, which is popular in practice.
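The idea may be sketched as follows, again with illustrative notation; see Zhou and Kim (2012) for the precise construction. For a variable $x$ observed at a time $s < t$, let $\hat{X}_{s}$ denote its estimator based on the respondents at time $s$, and let $\hat{X}_{t}$ denote the estimator of the same total recomputed from the smaller set of respondents at time $t$. Since both estimate the same quantity, the difference $\hat{X}_{t} - \hat{X}_{s}$ estimates zero, and a regression-type adjustment on these differences,
\[
\hat{Y}_{t,\mathrm{adj}} \;=\; \hat{Y}_{t} \;-\; \hat{B}^{\top} \left( \hat{X}_{t} - \hat{X}_{s} \right),
\]
with a suitably estimated coefficient $\hat{B}$, is what yields the efficiency gain mentioned above.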

Zhou and Kim (2012) also consider variance estimation for their optimal estimator, under the so-called reverse framework of Fay (1992). The sample of respondents at time t is viewed as the result of a two-phase process, the first phase being associated with the original sampling design and the second phase with the successive non-response steps; under the reverse framework, it is assumed that these two phases may be reversed. This requires the two-phase process to be strongly invariant, as defined by Beaumont and Haziza (2016). In this paper, we propose a general variance estimator for the propensity score adjusted estimator, for which the strong invariance assumption is not needed. We also extend this variance estimator to account for the estimation of complex parameters, possibly with calibrated weights, and to cover longitudinal estimators. In each case, a simplified conservative variance estimator, which may be easier to compute for secondary users, is also proposed.
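As a heuristic illustration of the two decompositions at stake (with notation not used elsewhere in the paper), write $p$ for the sampling design and $q$ for the response mechanism. The total variance of an estimator $\hat{\theta}$ may be decomposed by conditioning on the sample $s$ first,
\[
V(\hat{\theta}) \;=\; V_p\!\left\{ E_q(\hat{\theta} \mid s) \right\} + E_p\!\left\{ V_q(\hat{\theta} \mid s) \right\},
\]
or, under the reverse framework, by conditioning on the vector $R$ of response indicators first,
\[
V(\hat{\theta}) \;=\; V_q\!\left\{ E_p(\hat{\theta} \mid R) \right\} + E_q\!\left\{ V_p(\hat{\theta} \mid R) \right\}.
\]
The second decomposition implicitly requires the response indicators to be defined for all population units, independently of the selected sample, which is related to the strong invariance condition mentioned above.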

The paper is organized as follows. In Section 2, we first define the notation. A parametric model is then postulated for the response probabilities, leading to estimated response probabilities and to a reweighted estimator. A variance estimator is derived by following the approach in Kim and Kim (2007), and a simplified version is also proposed; both are illustrated in the particular case of the logistic regression model. The proposed variance estimator is extended to cover calibrated estimators and complex parameters in Section 3. Longitudinal estimation is discussed in Section 4, and the proposed variance estimator is adapted to cover this case. The variance estimators are compared in Section 5 through a simulation study, and an illustration on the ELFE data is given in Section 6. We draw some conclusions in Section 7.
