# Variance estimation under monotone non-response for a panel survey

## Section 4. Longitudinal estimators

We may be interested in the change in a parameter over time, such as

$$\Delta(u\to t) = Y(t) - Y(u), \tag{4.1}$$

the difference between the totals of a variable of interest measured at two times $u<t$. Since non-response is monotone, the sub-samples are nested, $s_t\subseteq s_{t-1}\subseteq\dots\subseteq s_u$, so the variable $y_{iu}$ is observed on every sub-sample $s_{u'}$ for $u'=u,\dots,t$. Hence several estimators of $\Delta(u\to t)$ are possible. For $u'=u,\dots,t$, we denote by

$$\widehat{\Delta}_{u't}(u\to t) = \sum_{i\in s_t}\frac{y_{it}}{\pi_i\,\widehat{p}_i^{1\to t}} - \sum_{i\in s_{u'}}\frac{y_{iu}}{\pi_i\,\widehat{p}_i^{1\to u'}} \tag{4.2}$$

the estimator which makes use of ${s}_{t}$ for the estimation of $Y\left(t\right),$ and of ${s}_{{u}^{\prime}}$ for the estimation of $Y\left(u\right).$ The case ${u}^{\prime}\mathrm{=}u$ corresponds to the estimation of $Y\left(u\right)$ on the largest available sub-sample, ${s}_{u}.$ The case ${u}^{\prime}\mathrm{=}t$ corresponds to the estimation of $Y\left(u\right)$ and $Y\left(t\right)$ on the common sub-sample ${s}_{t}.$
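The family of estimators (4.2) can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in this section: the cumulative probability $\widehat{p}_i^{1\to d}$ is taken to be the product of the wave-wise estimated response probabilities $\widehat{p}_i^{1},\dots,\widehat{p}_i^{d}$, and membership in $s_d$ is encoded by a hypothetical array `last_wave` giving the last wave at which each unit responded. All function and argument names are illustrative.

```python
import numpy as np


def cumulative_response_prob(p_hat, first, last):
    """Hypothetical helper (assumption): p_hat^{first->last} is the product of the
    wave-wise probabilities p_hat^d for d = first..last (waves are 1-indexed)."""
    return np.prod(p_hat[:, first - 1:last], axis=1)


def delta_hat(y, pi, p_hat, last_wave, u, u_prime, t):
    """Sketch of (4.2): estimate Y(t) on s_t and Y(u) on s_{u'}, with u <= u' <= t.

    y         : (n, T) values, y[i, d-1] = y_{id} (NaN when unobserved)
    pi        : (n,) first-order inclusion probabilities
    p_hat     : (n, T) estimated wave-wise response probabilities
    last_wave : (n,) last responding wave; unit i is in s_d iff last_wave[i] >= d
    """
    assert u <= u_prime <= t
    in_s_t = last_wave >= t
    in_s_up = last_wave >= u_prime
    # Weights 1 / (pi_i p_i^{1->t}) and 1 / (pi_i p_i^{1->u'})
    w_t = 1.0 / (pi * cumulative_response_prob(p_hat, 1, t))
    w_up = 1.0 / (pi * cumulative_response_prob(p_hat, 1, u_prime))
    y_t_hat = np.sum(w_t[in_s_t] * y[in_s_t, t - 1])
    y_u_hat = np.sum(w_up[in_s_up] * y[in_s_up, u - 1])
    return y_t_hat - y_u_hat
```

Setting `u_prime = t` gives the common-sample estimator $\widehat{\Delta}_{tt}(u\to t)$, while `u_prime = u` estimates $Y(u)$ on the largest available sub-sample $s_u$.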

In the context of full response, several authors have recommended the estimator $\widehat{\Delta}_{tt}(u\to t)$, which uses the common sample only, when the variables $y_{iu}$ and $y_{it}$ are strongly positively correlated; see Caron and Ravalet (2000), Qualité and Tillé (2008), Goga, Deville and Ruiz-Gazen (2009), Chauvet and Goga (2018). In our context, this choice may be heuristically justified as follows. For $u'<t$, conditioning on the sub-sample $s_{u'}$ yields

$$V\left\{\widehat{\Delta}_{u't}(u\to t)\right\} \simeq V\left\{\sum_{i\in s_{u'}}\frac{y_{it}-y_{iu}}{\pi_i\,\widehat{p}_i^{1\to u'}}\right\} + E\,V\left\{\sum_{i\in s_t}\frac{y_{it}}{\pi_i\,\widehat{p}_i^{1\to t}}\;\middle|\;s_{u'}\right\}, \tag{4.3}$$

$$V\left\{\widehat{\Delta}_{tt}(u\to t)\right\} \simeq V\left\{\sum_{i\in s_{u'}}\frac{y_{it}-y_{iu}}{\pi_i\,\widehat{p}_i^{1\to u'}}\right\} + E\,V\left\{\sum_{i\in s_t}\frac{y_{it}-y_{iu}}{\pi_i\,\widehat{p}_i^{1\to t}}\;\middle|\;s_{u'}\right\}. \tag{4.4}$$

In (4.3) and (4.4), the first term on the right-hand side is the same. Since the variables $y_{iu}$ and $y_{it}$ are expected to be positively correlated, the difference $y_{it}-y_{iu}$ is expected to be smaller than $y_{it}$ in absolute value, so the second term in (4.4) is expected to be smaller than the second term in (4.3). The estimator $\widehat{\Delta}_{tt}(u\to t)$ based on the common sample is therefore expected to have a smaller variance. The results of a small simulation study in Section 5.2 support this heuristic reasoning. In this section, we therefore focus only on the estimator $\widehat{\Delta}_{tt}(u\to t)$ for the estimation of $\Delta(u\to t)$. As pointed out by a Referee, and following the approach in Zhou and Kim (2012), a gain in efficiency may be obtained by using the full information on $s_u$, namely by calibrating the weights $(\pi_i\widehat{p}_i^{1\to t})^{-1}$ on the estimator $\widehat{Y}_u$.
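The heuristic argument can be checked informally with a small Monte Carlo sketch. This is not the simulation study of Section 5.2: it assumes a hypothetical population in which $y_{it}$ equals $y_{iu}$ plus a small shift and noise, Poisson sampling with a common inclusion probability, and monotone response with a constant, known per-wave response probability (so the true value replaces $\widehat{p}_i$). Here $u=1$ and $t=2$, and all names are illustrative.

```python
import numpy as np


def simulate_change_variances(n_pop=2000, n_reps=500, pi=0.1, p_wave=0.8,
                              noise_sd=2.0, seed=1):
    """Monte Carlo variances of Delta_hat_{ut} and Delta_hat_{tt}, with u = 1, t = 2."""
    rng = np.random.default_rng(seed)
    # Hypothetical population: y at time t is y at time u plus a shift and small noise
    y_u = rng.normal(100.0, 20.0, n_pop)
    y_t = y_u + rng.normal(5.0, noise_sd, n_pop)
    w_1 = 1.0 / (pi * p_wave)       # weight on s_1: 1 / (pi_i p^{1->1})
    w_2 = 1.0 / (pi * p_wave ** 2)  # weight on s_2: 1 / (pi_i p^{1->2})
    est_ut, est_tt = [], []
    for _ in range(n_reps):
        in_s = rng.random(n_pop) < pi                  # Poisson sample s
        in_s1 = in_s & (rng.random(n_pop) < p_wave)    # respondents at wave 1
        in_s2 = in_s1 & (rng.random(n_pop) < p_wave)   # monotone: s_2 is a subset of s_1
        y_t_hat = w_2 * y_t[in_s2].sum()
        est_ut.append(y_t_hat - w_1 * y_u[in_s1].sum())  # u' = u: Y(u) estimated on s_1
        est_tt.append(y_t_hat - w_2 * y_u[in_s2].sum())  # u' = t: common sub-sample s_2
    return np.var(est_ut), np.var(est_tt)


v_ut, v_tt = simulate_change_variances()
```

In this strongly correlated setup `v_tt` is expected to be much smaller than `v_ut`; increasing `noise_sd` weakens the correlation between $y_{iu}$ and $y_{it}$ and narrows the gap.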

Replacing $y_{it}$ with $y_{it}-y_{iu}$ in (2.11) yields the following estimator of the variance due to the sampling design:

$$\widehat{V}_t^{p}\left\{\widehat{\Delta}_{tt}(u\to t)\right\} = \sum_{i,j\in s_t}\frac{\Delta_{ij}}{\pi_{ij}}\,\frac{1}{\widehat{p}_{ij}^{1\to t}}\,\frac{y_{it}-y_{iu}}{\pi_i}\,\frac{y_{jt}-y_{ju}}{\pi_j}. \tag{4.5}$$
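Estimator (4.5) is a double sum over pairs of units in $s_t$ and can be written with NumPy outer products. This sketch assumes $\Delta_{ij}=\pi_{ij}-\pi_i\pi_j$ with $\pi_{ii}=\pi_i$, and the hypothetical helper `joint_response_prob` assumes that units respond independently, so that $\widehat{p}_{ij}^{1\to t}=\widehat{p}_i^{1\to t}\widehat{p}_j^{1\to t}$ for $i\neq j$ and $\widehat{p}_{ii}^{1\to t}=\widehat{p}_i^{1\to t}$. These definitions should be checked against those used with (2.11).

```python
import numpy as np


def joint_response_prob(p):
    """Hypothetical helper (assumes independent response): p_ij = p_i p_j, p_ii = p_i."""
    pj = np.outer(p, p)
    np.fill_diagonal(pj, p)
    return pj


def design_variance_change(d, pi, pi_joint, p_joint):
    """Sketch of (4.5) on the common sub-sample s_t.

    d        : (n,) differences y_it - y_iu
    pi       : (n,) first-order inclusion probabilities
    pi_joint : (n, n) second-order inclusion probabilities, with pi_ii = pi_i
    p_joint  : (n, n) estimated joint response probabilities p_ij^{1->t}
    """
    delta = pi_joint - np.outer(pi, pi)  # assumed: Delta_ij = pi_ij - pi_i pi_j
    check = d / pi                       # expanded differences (y_it - y_iu) / pi_i
    return float(np.sum(delta / pi_joint / p_joint * np.outer(check, check)))
```

Under Poisson sampling, $\pi_{ij}=\pi_i\pi_j$ for $i\neq j$, so only the diagonal terms remain and the sum reduces to $\sum_{i\in s_t}(1-\pi_i)(y_{it}-y_{iu})^2/(\pi_i^2\,\widehat{p}_i^{1\to t})$.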

Similarly, replacing $y_{it}$ with $y_{it}-y_{iu}$ in (2.12) yields the following estimator of the variance due to non-response:

$$\widehat{V}_t^{\text{nr}}\left\{\widehat{\Delta}_{tt}(u\to t)\right\} = \sum_{\delta=1}^{t}\sum_{i\in s_t}\frac{\widehat{p}_i^{\delta}(1-\widehat{p}_i^{\delta})}{\widehat{p}_i^{\delta\to t}}\left(\frac{y_{it}-y_{iu}}{\pi_i\,\widehat{p}_i^{1\to\delta}} - k_i^{\delta}\,(\widehat{h}_i^{\delta})^{\top}\widehat{\gamma}_{t\Delta}^{\delta}\right)^2 \tag{4.6}$$

with

$$\widehat{\gamma}_{t\Delta}^{\delta} = \left\{\sum_{i\in s_t}k_i^{\delta}\,\frac{\widehat{p}_i^{\delta}(1-\widehat{p}_i^{\delta})}{\widehat{p}_i^{\delta\to t}}\,\widehat{h}_i^{\delta}(\widehat{h}_i^{\delta})^{\top}\right\}^{-1}\sum_{i\in s_t}\frac{1-\widehat{p}_i^{\delta}}{\widehat{p}_i^{1\to t}}\,\widehat{h}_i^{\delta}\,\frac{y_{it}-y_{iu}}{\pi_i}. \tag{4.7}$$
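A direct transcription of (4.6) and (4.7) is sketched below. The quantities $k_i^{\delta}$ and $\widehat{h}_i^{\delta}$ are defined earlier in the paper, with (2.12), and are not reproduced in this section, so the sketch takes them as given arrays whose shapes are assumptions. It also assumes $\widehat{p}_i^{a\to b}=\prod_{d=a}^{b}\widehat{p}_i^{d}$, consistent with monotone non-response. Function and argument names are illustrative.

```python
import numpy as np


def nonresponse_variance_change(d, pi, p_hat, k, h):
    """Sketch of (4.6)-(4.7), computed on the common sub-sample s_t.

    d     : (n,) differences y_it - y_iu for the n units of s_t
    pi    : (n,) first-order inclusion probabilities
    p_hat : (n, t) estimated wave-wise response probabilities p_hat_i^delta
    k     : (n, t) the scalars k_i^delta (taken as given)
    h     : (t, n, q) the vectors h_i^delta (taken as given)
    """
    _, t = p_hat.shape
    p_1_to_t = np.prod(p_hat, axis=1)  # assumed: p^{1->t} = product over waves 1..t
    total = 0.0
    for delta in range(1, t + 1):
        p_d = p_hat[:, delta - 1]
        p_delta_to_t = np.prod(p_hat[:, delta - 1:], axis=1)  # p^{delta->t}
        p_1_to_delta = np.prod(p_hat[:, :delta], axis=1)      # p^{1->delta}
        k_d = k[:, delta - 1]
        h_d = h[delta - 1]                                     # (n, q)
        # Common factor p^delta (1 - p^delta) / p^{delta->t} in (4.6) and (4.7)
        a = p_d * (1.0 - p_d) / p_delta_to_t
        # (4.7): {sum k_i a_i h_i h_i^T}^{-1} sum (1 - p_i^delta) / p_i^{1->t} h_i d_i / pi_i
        lhs = (h_d * (k_d * a)[:, None]).T @ h_d
        rhs = h_d.T @ ((1.0 - p_d) / p_1_to_t * d / pi)
        gamma = np.linalg.solve(lhs, rhs)
        # (4.6): weighted sum of squared residuals for wave delta
        resid = d / (pi * p_1_to_delta) - k_d * (h_d @ gamma)
        total += np.sum(a * resid ** 2)
    return float(total)
```

The matrix inverted in (4.7) must be non-singular, so `np.linalg.solve` raises an error if, for example, every $\widehat{p}_i^{\delta}$ equals 1 at some wave.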

The overall variance estimator for $\widehat{\Delta}_{tt}(u\to t)$ is

$$\widehat{V}_t\left\{\widehat{\Delta}_{tt}(u\to t)\right\} = \widehat{V}_t^{p}\left\{\widehat{\Delta}_{tt}(u\to t)\right\} + \widehat{V}_t^{\text{nr}}\left\{\widehat{\Delta}_{tt}(u\to t)\right\}. \tag{4.8}$$

Variance estimation for measures of change has also been considered by Berger (2004), Qualité and Tillé (2008), Goga et al. (2009) and Chauvet and Goga (2018), among others.

The simplified estimator of the variance due to non-response is

$$\widehat{V}_{t,\text{simp}}^{\text{nr}}\left\{\widehat{\Delta}_{tt}(u\to t)\right\} = \sum_{i\in s_t}\frac{1-\widehat{p}_i^{1\to t}}{\left(\widehat{p}_i^{1\to t}\right)^2}\left(\frac{y_{it}-y_{iu}}{\pi_i}\right)^2. \tag{4.9}$$

If the variables ${y}_{it}$ and ${y}_{iu}$ are strongly positively correlated, the bias of the simplified variance estimator is expected to be small.
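Unlike (4.6), the simplified estimator (4.9) only requires the cumulative response probabilities $\widehat{p}_i^{1\to t}$, not $k_i^{\delta}$ or $\widehat{h}_i^{\delta}$. A minimal sketch, with illustrative names:

```python
import numpy as np


def simplified_nonresponse_variance_change(d, pi, p_1_to_t):
    """Sketch of (4.9) on the common sub-sample s_t.

    d        : (n,) differences y_it - y_iu
    pi       : (n,) first-order inclusion probabilities
    p_1_to_t : (n,) estimated cumulative response probabilities p_i^{1->t}
    """
    # (1 - p_i^{1->t}) / (p_i^{1->t})^2 * ((y_it - y_iu) / pi_i)^2, summed over s_t
    return float(np.sum((1.0 - p_1_to_t) / p_1_to_t ** 2 * (d / pi) ** 2))
```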
