# Variance estimation under monotone non-response for a panel survey

## 5 A simulation study

In this section, several artificial populations are generated according to the model described in Section 5.1. In Section 5.2, we compare several estimators of a change between totals, which illustrates the heuristic reasoning of Section 4. A Monte Carlo experiment is presented in Section 5.3, where several variance estimators for a total, a ratio or a change in totals are compared. The results in Tables 5.1 and 5.2 can be reproduced using the R code provided in the Supplementary Material.

### 5.1 Simulation set-up

We consider seven populations of size 10,000, each containing three variables of interest $y_{i1}$, $y_{i2}$ and $y_{i3}$ observed at times $t = 1, 2$ and 3, respectively. The variables of interest are generated according to the superpopulation model

$$y_{i1} = \alpha^{0} + \alpha^{a} x_{ai} + \alpha^{b} x_{bi} + \sigma u_{i1}, \qquad (5.1)$$

$$y_{i2} = \rho y_{i1} + \sigma u_{i2}, \qquad (5.2)$$

$$y_{i3} = \rho y_{i2} + \sigma u_{i3}. \qquad (5.3)$$

The auxiliary variables $x_{ai}$ and $x_{bi}$ are independently generated from a Gamma distribution with shape parameter 2 and scale parameter 1. Two auxiliary variables $x_{ci}$ and $x_{di}$, not related to the variables of interest, are generated similarly. The variables $u_{i1}$, $u_{i2}$ and $u_{i3}$ are independently generated according to a standard normal distribution. We use $\alpha^{0} = 10$, $\alpha^{a} = \alpha^{b} = 5$ and $\sigma = 10$, which leads to a coefficient of determination $\left(R^{2}\right)$ in model (5.1) approximately equal to 0.50. The parameter $\rho$ is set to 0, 0.2, 0.4, 0.6, 0.8, 1.0 and 1.2 for populations 1 to 7, respectively.
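The generation step can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the R code from the Supplementary Material; the function names `generate_population` and `r_squared_y1`, the dictionary layout and the seeds are assumptions made for the example.

```python
import random


def generate_population(rho, size=10_000, alpha0=10.0, alpha_a=5.0,
                        alpha_b=5.0, sigma=10.0, seed=0):
    """Generate one artificial population from models (5.1)-(5.3)."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        # Four independent Gamma(shape=2, scale=1) auxiliary variables;
        # x_c and x_d do not enter the models for y.
        x_a, x_b, x_c, x_d = (rng.gammavariate(2.0, 1.0) for _ in range(4))
        # Independent standard normal errors.
        u1, u2, u3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        y1 = alpha0 + alpha_a * x_a + alpha_b * x_b + sigma * u1  # (5.1)
        y2 = rho * y1 + sigma * u2                                # (5.2)
        y3 = rho * y2 + sigma * u3                                # (5.3)
        population.append({"x_a": x_a, "x_b": x_b, "x_c": x_c, "x_d": x_d,
                           "y1": y1, "y2": y2, "y3": y3})
    return population


def r_squared_y1(population, alpha0=10.0, alpha_a=5.0, alpha_b=5.0):
    """Share of the variance of y1 explained by the true linear predictor."""
    y = [unit["y1"] for unit in population]
    fitted = [alpha0 + alpha_a * unit["x_a"] + alpha_b * unit["x_b"]
              for unit in population]
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_resid / ss_total


# Populations 1 to 7 correspond to rho = 0, 0.2, ..., 1.2.
rhos = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
populations = [generate_population(rho, seed=k) for k, rho in enumerate(rhos)]
```

With these parameters the explained and residual variances in (5.1) are both equal to 100 (each Gamma variable has variance 2, so $5^{2} \times 2 + 5^{2} \times 2 = 100 = \sigma^{2}$), which is where the value of $R^{2}$ close to 0.50 comes from.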

For each population, a simple random sample $s_{0}$ of size $n = 1{,}000$ is selected. Three non-response phases are then successively simulated. At each phase $\delta = 1, 2, 3$, the sub-sample of respondents $s_{\delta}$ is obtained by Poisson sampling with a response probability $p_{i}^{\delta}$ for unit $i$, defined as

$$\text{logit}\left(p_{i}^{\delta}\right) = \beta^{\delta 0} + \beta^{\delta a} x_{ai} + \beta^{\delta b} x_{bi}. \qquad (5.4)$$

We use $\beta^{\delta 0} = -1$ at each phase $\delta = 1, 2, 3$. For $\delta = 1$, we use $\beta^{1a} = \beta^{1b} = 0.60$, which corresponds to an average response rate of 0.75. For $\delta = 2, 3$, we use $\beta^{\delta a} = \beta^{\delta b} = 0.75$, which corresponds to an average response rate of 0.81. Inside each sub-sample $s_{\delta}$, the estimated response probabilities $\widehat{p}_{i}^{\delta}$ are obtained by means of an unweighted logistic regression.
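The successive non-response phases can be sketched as below. This is an illustrative Python fragment under stated assumptions: units are represented as hypothetical `(x_a, x_b)` tuples, the names `BETAS`, `response_probability` and `simulate_response_phases` are invented for the example, and the estimation of the response probabilities by logistic regression is not covered.

```python
import math
import random

# (beta^{delta 0}, beta^{delta a}, beta^{delta b}) for phases 1, 2 and 3.
BETAS = [(-1.0, 0.60, 0.60), (-1.0, 0.75, 0.75), (-1.0, 0.75, 0.75)]


def response_probability(x_a, x_b, beta):
    """Response probability from the logistic model (5.4)."""
    beta0, beta_a, beta_b = beta
    eta = beta0 + beta_a * x_a + beta_b * x_b
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_response_phases(sample, rng, betas=BETAS):
    """Return [s_1, s_2, s_3], each drawn by Poisson sampling in the previous set."""
    phases = []
    current = sample
    for beta in betas:
        # Poisson sampling: each unit responds independently with its own
        # probability, so the number of respondents is random.
        current = [unit for unit in current
                   if rng.random() < response_probability(unit[0], unit[1], beta)]
        phases.append(current)
    return phases


rng = random.Random(2024)
# Stand-in for the sample s_0: 1,000 units with Gamma(2, 1) auxiliary variables.
s0 = [(rng.gammavariate(2.0, 1.0), rng.gammavariate(2.0, 1.0))
      for _ in range(1000)]
s1, s2, s3 = simulate_response_phases(s0, rng)
```

Because each phase is drawn inside the previous set of respondents, the sub-samples are nested, which is the monotone non-response pattern considered in the paper.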

### 5.2 Comparison of estimators for a difference of totals

In this section, we compare the accuracy of two estimators for a difference of totals $\Delta(u \to t)$ for $u = 1$ and $t = 2$, for $u = 1$ and $t = 3$, and for $u = 2$ and $t = 3$. We consider the estimator $\widehat{\Delta}_{ut}(u \to t)$, which makes use of the whole appropriate sub-samples for variables $y_{iu}$ and $y_{it}$, and the estimator $\widehat{\Delta}_{tt}(u \to t)$, which makes use of the common sub-sample only. These two estimators are compared through the relative difference (RD) of their variances, which is defined as follows:

$$\text{RD}(u \to t) = 100 \times \frac{V\left\{\widehat{\Delta}_{ut}(u \to t)\right\} - V\left\{\widehat{\Delta}_{tt}(u \to t)\right\}}{V\left\{\widehat{\Delta}_{tt}(u \to t)\right\}}. \qquad (5.5)$$

The true variances are replaced by their Monte Carlo approximations, obtained by repeating the sample selection and the non-response phases $B = 100{,}000$ times.
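As an illustration of (5.5), the relative difference can be computed from the replicated estimates as in the following sketch. The function name `relative_difference` and the toy replicate values are assumptions for the example; in the study each list would hold the 100,000 simulated values of one estimator.

```python
from statistics import pvariance


def relative_difference(delta_ut_reps, delta_tt_reps):
    """Relative difference (5.5), in percent, from Monte Carlo replicates."""
    v_ut = pvariance(delta_ut_reps)  # Monte Carlo variance of the ut estimator
    v_tt = pvariance(delta_tt_reps)  # Monte Carlo variance of the tt estimator
    return 100.0 * (v_ut - v_tt) / v_tt


# Toy replicates: the first estimator has variance 1, the second 0.25.
rd = relative_difference([0.0, 2.0, 0.0, 2.0], [0.0, 1.0, 0.0, 1.0])
```

In this toy case `rd` is positive, meaning that the estimator based on the common sub-sample has the smaller variance.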

The results are presented in Table 5.1. A positive RD indicates that the use of the common sub-sample only leads to a more accurate estimator. As could be expected, the RD increases in all cases with $\rho$, that is, when the correlation between $y_{it}$ and $y_{iu}$ increases. For $u = 1$ and $t = 2$, and for $u = 2$ and $t = 3$, the estimator $\widehat{\Delta}_{tt}(u \to t)$ is more accurate for $\rho$ greater than 0.6. For $u = 1$ and $t = 3$, $\widehat{\Delta}_{tt}(u \to t)$ is more accurate for $\rho$ greater than 0.8.

Table 5.1: Relative difference (in percent) between the variances of $\widehat{\Delta}_{ut}(u \to t)$ and $\widehat{\Delta}_{tt}(u \to t)$

| $\rho$ | $\text{RD}(1 \to 2)$ | $\text{RD}(1 \to 3)$ | $\text{RD}(2 \to 3)$ |
|---|---|---|---|
| 0.0 | -12 | -27 | -13 |
| 0.2 | -9 | -25 | -11 |
| 0.4 | -4 | -20 | -3 |
| 0.6 | -5 | -9 | 11 |
| 0.8 | 17 | 11 | 39 |
| 1.0 | 30 | 33 | 83 |
| 1.2 | 40 | 46 | 127 |

### 5.3 Performances of the variance estimators

In this section, we consider artificial population 5 $(\rho = 0.8)$, generated as described in Section 5.1. The sample selection by simple random sampling of size $n = 1{,}000$ and the three non-response phases are repeated $B = 5{,}000$ times. We evaluate the proposed variance estimators and the simplified variance estimators when estimating a total, a ratio or a change in totals.

For the total $Y(t)$, we consider three estimators at each time $t = 1, 2, 3$. The estimator $\widehat{Y}_{t}$ makes use of the weights $d_{ti} = \pi_{i}^{-1}\left(\widehat{p}_{i}^{1 \to t}\right)^{-1}$. The estimator $\widehat{Y}_{wt}$ makes use of the weights $w_{i}$, obtained by calibrating the weights $d_{ti}$ on the population size and on the totals of the auxiliary variables $x_{ai}$ and $x_{bi}$. The estimator $\widehat{Y}_{\tilde{w}t}$ makes use of the weights $\tilde{w}_{i}$, obtained by calibrating the weights $d_{ti}$ on the population size and on the totals of the auxiliary variables $x_{ci}$ and $x_{di}$. The working model is therefore well specified for $\widehat{Y}_{wt}$, but not for $\widehat{Y}_{\tilde{w}t}$. The proposed variance estimator for $\widehat{Y}_{t}$ is obtained from equation (2.16), and the simplified variance estimator is obtained by plugging into (2.16) the simplified variance estimator for non-response given in (2.17). The proposed variance estimators for $\widehat{Y}_{wt}$ and $\widehat{Y}_{\tilde{w}t}$ are obtained from equation (3.8), and the simplified variance estimators are obtained by plugging into (3.8) the simplified variance estimator for non-response given in (3.9).
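The non-calibrated weights $d_{ti}$ and the corresponding estimator of the total can be sketched as follows. The sketch assumes that $\widehat{p}_{i}^{1 \to t}$ denotes the product of the estimated response probabilities over phases 1 to $t$; that definition belongs to an earlier section and is not restated here. The names `expansion_weight` and `estimate_total`, and the dictionary keys, are hypothetical.

```python
import math


def expansion_weight(pi, p_hat_by_phase, t):
    """Weight d_ti = 1 / (pi_i * p_hat_i^{1->t}).

    Assumption: p_hat_i^{1->t} is the product of the estimated response
    probabilities of unit i over phases 1, ..., t.
    """
    p_cumulative = math.prod(p_hat_by_phase[:t])
    return 1.0 / (pi * p_cumulative)


def estimate_total(respondents, t):
    """Weighted total over the respondents s_t, using the weights d_ti."""
    return sum(expansion_weight(unit["pi"], unit["p_hat"], t) * unit["y"]
               for unit in respondents)


# Toy respondent set at t = 2: inclusion probability 0.1 (n = 1,000 out of
# N = 10,000 under simple random sampling), two phase probabilities per unit.
s2 = [
    {"pi": 0.1, "p_hat": [0.5, 0.8], "y": 3.0},
    {"pi": 0.1, "p_hat": [0.8, 0.5], "y": 1.0},
]
y_hat_2 = estimate_total(s2, t=2)
```

Each toy unit has a cumulative response probability of 0.4 and therefore a weight of 25, so the estimate is $25 \times (3 + 1) = 100$ up to floating-point rounding. The calibrated weights $w_{i}$ and $\tilde{w}_{i}$ would require an additional calibration step that is not sketched here.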

We are also interested in estimating the ratio $R(t) = Y(t)/Y(1)$ for $t = 2, 3$. At each time $t$, we consider three estimators. The estimator $\widehat{R}_{t}$ makes use of the weights $d_{i}$. The proposed variance estimator is obtained from equation (3.14), by using the estimated linearized variable $u_{it} = \left(\widehat{Y}_{1}\right)^{-1}\left(y_{it} - \widehat{R}_{t} y_{i1}\right)$. The simplified variance estimator is obtained by plugging into (3.14) the simplified variance estimator for non-response given in (3.15). The estimators $\widehat{R}_{wt}$ and $\widehat{R}_{\tilde{w}t}$ make use of the calibrated weights $w_{i}$ and $\tilde{w}_{i}$. The proposed variance estimators are obtained from equation (3.21). The simplified variance estimators are obtained by plugging into (3.21) the simplified variance estimator for non-response given in (3.22).
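The estimated linearized variable of the ratio can be illustrated with a short sketch. It assumes that both estimated totals are computed over the same respondents with the same weights; the function name `ratio_linearization` and the toy values are invented for the example.

```python
def ratio_linearization(y_t, y_1, weights):
    """Return the estimated ratio and the linearized values u_it.

    Assumption: both totals are estimated on the same units with the same
    weights, so that u_it = (y_it - R_hat * y_i1) / Y_hat_1.
    """
    y_hat_t = sum(w * y for w, y in zip(weights, y_t))
    y_hat_1 = sum(w * y for w, y in zip(weights, y_1))
    r_hat = y_hat_t / y_hat_1
    u = [(yt - r_hat * y1) / y_hat_1 for yt, y1 in zip(y_t, y_1)]
    return r_hat, u


weights = [2.0, 3.0, 5.0]
y_1 = [1.0, 2.0, 3.0]
y_t = [2.0, 2.0, 6.0]
r_hat, u = ratio_linearization(y_t, y_1, weights)
# Under the assumption above, the weighted sum of the u_it is zero.
weighted_sum_u = sum(w * ui for w, ui in zip(weights, u))
```

The weighted sum of the linearized values vanishes by construction, since $\sum_{i} d_{i}\left(y_{it} - \widehat{R}_{t} y_{i1}\right) = \widehat{Y}_{t} - \widehat{R}_{t}\widehat{Y}_{1} = 0$ when both totals use the same weights; this gives a quick sanity check on an implementation.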

Finally, we are interested in estimating the change in totals $\Delta(1 \to t)$ for $t = 2, 3$. At each time $t$, we consider three estimators. The estimator $\widehat{\Delta}_{tt}(1 \to t)$ makes use of the weights $d_{i}$. The proposed variance estimator is obtained from equation (4.8), and the simplified variance estimator is obtained by plugging into (4.8) the simplified variance estimator for non-response given in (4.9). The estimators $\widehat{\Delta}_{tt, w}(1 \to t)$ and $\widehat{\Delta}_{tt, \tilde{w}}(1 \to t)$ make use of the calibrated weights $w_{i}$ and $\tilde{w}_{i}$. The proposed variance estimators are obtained from equation (4.8), by replacing $y_{it} - y_{iu}$ by the estimated residual for the weighted regression of $y_{it} - y_{iu}$ on the calibration variables. The simplified variance estimators are obtained by plugging into (4.8) the simplified variance estimator for non-response given in (4.9).

For a proposed variance estimator $\widehat{V},$ we computed the Monte Carlo Percent Relative Bias

$$\text{RB}_{\text{mc}}\left(\widehat{V}\right) = 100 \times \frac{B^{-1} \sum_{b=1}^{B} \widehat{V}^{(b)} - V}{V},$$

where the global variance $V$ was approximated through an independent set of 100,000 simulations. To evaluate the contribution of a component $\widehat{V}_{a}$ to the variance estimator $\widehat{V}$, we computed the contribution (in percent)

$$\text{CONTR}_{\text{mc}}\left(\widehat{V}_{a}\right) = 100 \times \frac{B^{-1} \sum_{b=1}^{B} \widehat{V}_{a}^{(b)}}{B^{-1} \sum_{b=1}^{B} \widehat{V}^{(b)}}.$$
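Both Monte Carlo measures reduce to simple averages over the replicates, as the following sketch shows. The function names and the toy numbers are assumptions for illustration; in the study the replicate lists would have length $B = 5{,}000$ and the reference variance would come from the independent set of 100,000 simulations.

```python
from statistics import fmean


def percent_relative_bias(v_hat_reps, v_reference):
    """Monte Carlo percent relative bias of a variance estimator."""
    return 100.0 * (fmean(v_hat_reps) - v_reference) / v_reference


def percent_contribution(component_reps, v_hat_reps):
    """Monte Carlo contribution (in percent) of one component to V_hat."""
    return 100.0 * fmean(component_reps) / fmean(v_hat_reps)


# Toy replicates of a variance estimator and of one of its components.
v_hat_reps = [9.0, 11.0, 10.0, 10.0]
component_reps = [4.0, 6.0, 5.0, 5.0]
rb = percent_relative_bias(v_hat_reps, v_reference=8.0)
contr = percent_contribution(component_reps, v_hat_reps)
```

Here the average of the replicated variance estimates is 10 against a reference variance of 8, so the relative bias is 25 percent, and the component accounts for half of the estimated variance on average.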

To evaluate the simplified variance estimator for the non-response $\widehat{V}_{\text{simp}}^{\text{nr}}$, we computed the Monte Carlo Percent Relative Bias

$$\text{RB}_{\text{mc}}\left(\widehat{V}_{\text{simp}}^{\text{nr}}\right) = 100 \times \frac{B^{-1} \sum_{b=1}^{B} \widehat{V}_{\text{simp}}^{\text{nr}(b)} - V^{\text{nr}}}{V^{\text{nr}}},$$

where the variance $V^{\text{nr}}$ due to non-response was approximated through an independent set of 100,000 simulations.

The simulation results are presented in Table 5.2. The proposed variance estimator is almost unbiased in all cases. As could be expected, the contribution of the variance due to the sampling design decreases with time, as the number of respondents decreases and as the variance due to non-response becomes larger. The simplified variance estimator is highly biased for the variance due to non-response for $\widehat{Y}_{t}$. The bias decreases quickly with time, but remains large at time $t = 3$. The simplified variance estimator is almost unbiased for a calibrated estimator when the working model is adequately specified, but is severely biased otherwise. This is consistent with our reasoning in Section 3.1. The simplified variance estimator is almost unbiased for the three estimators of the ratio, and for the calibrated estimators of the change in totals. For the non-calibrated estimator of the change in totals, the bias can be as high as 30%.

Table 5.2: Monte Carlo percent relative bias of the proposed variance estimator, contributions (in percent) of its components, and percent relative bias of the simplified variance estimator for non-response

| Estimator | $t$ | $\text{RB}_{\text{mc}}\left(\widehat{V}\right)$ | $\text{CONTR}_{\text{mc}}\left(\widehat{V}_{t}^{p}\right)$ | $\text{CONTR}_{\text{mc}}\left(\widehat{V}_{t}^{\text{nr}1}\right)$ | $\text{CONTR}_{\text{mc}}\left(\widehat{V}_{t}^{\text{nr}2}\right)$ | $\text{CONTR}_{\text{mc}}\left(\widehat{V}_{t}^{\text{nr}3}\right)$ | $\text{RB}_{\text{mc}}\left(\widehat{V}_{t, \text{simp}}^{\text{nr}}\right)$ |
|---|---|---|---|---|---|---|---|
| $\widehat{Y}_{t}$ | 1 | 0 | 81 | 19 | - | - | 559 |
| $\widehat{Y}_{t}$ | 2 | -1 | 57 | 19 | 25 | - | 188 |
| $\widehat{Y}_{t}$ | 3 | -2 | 35 | 13 | 18 | 34 | 80 |
| $\widehat{Y}_{wt}$ | 1 | -1 | 69 | 31 | - | - | 0 |
| $\widehat{Y}_{wt}$ | 2 | -1 | 49 | 22 | 28 | - | -1 |
| $\widehat{Y}_{wt}$ | 3 | -2 | 32 | 15 | 19 | 34 | -2 |
| $\widehat{Y}_{\tilde{w}t}$ | 1 | -1 | 80 | 20 | - | - | 83 |
| $\widehat{Y}_{\tilde{w}t}$ | 2 | -1 | 56 | 18 | 25 | - | 34 |
| $\widehat{Y}_{\tilde{w}t}$ | 3 | -3 | 35 | 13 | 17 | 34 | 15 |
| $\widehat{R}_{t}$ | 1 | - | - | - | - | - | - |
| $\widehat{R}_{t}$ | 2 | 0 | 49 | 22 | 28 | - | 0 |
| $\widehat{R}_{t}$ | 3 | -2 | 32 | 15 | 19 | 34 | 0 |
| $\widehat{R}_{wt}$ | 1 | - | - | - | - | - | - |
| $\widehat{R}_{wt}$ | 2 | -1 | 49 | 22 | 28 | - | -1 |
| $\widehat{R}_{wt}$ | 3 | -2 | 32 | 15 | 19 | 34 | -2 |
| $\widehat{R}_{\tilde{w}t}$ | 1 | - | - | - | - | - | - |
| $\widehat{R}_{\tilde{w}t}$ | 2 | -1 | 50 | 22 | 28 | - | -1 |
| $\widehat{R}_{\tilde{w}t}$ | 3 | -2 | 33 | 15 | 19 | 34 | -1 |
| $\widehat{\Delta}_{tt}(1 \to t)$ | 1 | - | - | - | - | - | - |
| $\widehat{\Delta}_{tt}(1 \to t)$ | 2 | 0 | 50 | 22 | 28 | - | 19 |
| $\widehat{\Delta}_{tt}(1 \to t)$ | 3 | -2 | 33 | 14 | 18 | 34 | 30 |
| $\widehat{\Delta}_{tt, w}(1 \to t)$ | 1 | - | - | - | - | - | - |
| $\widehat{\Delta}_{tt, w}(1 \to t)$ | 2 | 0 | 49 | 22 | 28 | - | -1 |
| $\widehat{\Delta}_{tt, w}(1 \to t)$ | 3 | -2 | 32 | 15 | 19 | 34 | -2 |
| $\widehat{\Delta}_{tt, \tilde{w}}(1 \to t)$ | 1 | - | - | - | - | - | - |
| $\widehat{\Delta}_{tt, \tilde{w}}(1 \to t)$ | 2 | -1 | 50 | 22 | 28 | - | 3 |
| $\widehat{\Delta}_{tt, \tilde{w}}(1 \to t)$ | 3 | -3 | 33 | 14 | 18 | 34 | 5 |
