# Variance estimation under monotone non-response for a panel survey

## Section 7. Conclusion

In this paper, we considered variance estimation accounting for weighting adjustments in panel surveys. We proposed both an approximately unbiased variance estimator and a simplified variance estimator for estimators of totals, complex parameters and measures of change, which covers most cases that may be encountered in practice. Our simulation results indicate that the proposed variance estimator performs well in all cases considered. The simplified variance estimator tends to overestimate the variance of the expansion estimator for totals, and to overestimate the variance for calibrated estimators of totals when the calibration variables lack explanatory power for the variable of interest. However, the simplified variance estimator performs well for the estimation of ratios and change in totals with calibrated weights, even if the calibration model is not appropriate for the study variable.

The assumption of independent response behaviour is usually not tenable for multi-stage surveys, since the response behaviour of units within the same cluster tends to be correlated. Estimation of response probabilities by conditional logistic regression under correlated responses has been studied by Skinner and D’Arrigo (2011); see also Kim, Kwon and Park (2016). Extending the present work to correlated response behaviour is a challenging problem for further research.

## Acknowledgements

We thank the Editors, an Associate Editor and the referees for useful comments and suggestions which led to an improvement of the paper.

## Appendix

### Estimation of the variance due to non-response for Response Homogeneity Groups

We consider the model of Response Homogeneity Groups introduced in Section 2.5. Recall that this model may be summarized as follows: at each time $\delta =1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}t,$ the sub-sample ${s}_{\delta -1}$ is partitioned into $C\left(\delta -1\right)$ groups ${s}_{\delta -1}^{c},\text{\hspace{0.17em}}c=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}C\left(\delta -1\right).$ The response probabilities are assumed to be constant within the groups.

This model is equivalent to the logistic regression model in (2.18), with

$z_{i}^{\delta}=\left[1\left\{i\in s_{\delta-1}^{1}\right\},\,\dots,\,1\left\{i\in s_{\delta-1}^{C(\delta-1)}\right\}\right]^{\top}.\qquad(\text{A}.1)$

The equation (2.2) leads to the estimated response probabilities

$\hat{p}_{i}^{\delta}=\frac{\sum_{j\in s_{\delta-1}^{c}}k_{j}^{\delta}r_{j}^{\delta}}{\sum_{j\in s_{\delta-1}^{c}}k_{j}^{\delta}}\quad\text{for}\quad i\in s_{\delta-1}^{c}.\qquad(\text{A}.2)$
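As an illustration, the weighted response rates in (A.2) can be sketched in Python. This is a minimal sketch under our own assumptions, not part of the original derivation: the function name `rhg_response_probabilities` is hypothetical, the groups are coded as integers $0,\,\dots,\,C-1,$ the toy data are invented, and every group is assumed to contain at least one unit.

```python
import numpy as np

def rhg_response_probabilities(group, k, r):
    """Weighted response rate within each RHG, following (A.2).

    group : integer labels 0..C-1 for the units of s_{delta-1} (assumed coding)
    k     : weights k_i^delta
    r     : response indicators r_i^delta (1 = respondent, 0 = non-respondent)
    """
    group = np.asarray(group)
    k = np.asarray(k, dtype=float)
    r = np.asarray(r, dtype=float)
    n_groups = int(group.max()) + 1
    # Numerator of (A.2): weighted count of respondents in each group.
    responded = np.bincount(group, weights=k * r, minlength=n_groups)
    # Denominator of (A.2): weighted count of all units in each group.
    total = np.bincount(group, weights=k, minlength=n_groups)
    return responded / total

# Invented toy data: five units of s_{delta-1} split into two groups.
group = np.array([0, 0, 0, 1, 1])
k = np.array([1.0, 1.0, 2.0, 1.0, 3.0])
r = np.array([1, 0, 1, 1, 0])
p_hat = rhg_response_probabilities(group, k, r)
```

With these toy values the estimated probabilities are 0.75 in the first group and 0.25 in the second; each unit inherits the value of its group.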

We first consider the case when the reweighted estimator is computed at time $t=1.$ In the estimator of the variance due to non-response given in (2.21), the vector ${\stackrel{^}{\gamma }}_{1}^{1}$ simplifies as

${\stackrel{^}{\gamma }}_{1}^{1}={\left(\frac{{\sum }_{i\in {s}_{1}\cap {s}_{0}^{1}}\frac{{y}_{i1}}{{\pi }_{i}}}{{\stackrel{^}{p}}_{1}^{1}{\sum }_{i\in {s}_{1}\cap {s}_{0}^{1}}{k}_{i}^{1}},\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}\frac{{\sum }_{i\in {s}_{1}\cap {s}_{0}^{C\left(0\right)}}\frac{{y}_{i1}}{{\pi }_{i}}}{{\stackrel{^}{p}}_{C\left(0\right)}^{1}{\sum }_{i\in {s}_{1}\cap {s}_{0}^{C\left(0\right)}}{k}_{i}^{1}}\right)}^{\top }.\text{ }\text{ }\text{ }\text{ }\text{ }\left(\text{A}.3\right)$

After some algebra, the variance estimator in (2.21) may be rewritten as

${\stackrel{^}{V}}_{1}^{\text{nr}}\left({\stackrel{^}{Y}}_{1}\right)=\sum _{c=1}^{C\left(0\right)}\frac{\left(1-{\stackrel{^}{p}}_{c}^{1}\right)}{{\left({\stackrel{^}{p}}_{c}^{1}\right)}^{2}}\sum _{i\in {s}_{1}\cap {s}_{0}^{c}}{\left(\frac{{y}_{i1}}{{\pi }_{i}}-{k}_{i}^{1}\frac{{\sum }_{j\in {s}_{1}\cap {s}_{0}^{c}}\frac{{y}_{j1}}{{\pi }_{j}}}{{\sum }_{j\in {s}_{1}\cap {s}_{0}^{c}}{k}_{j}^{1}}\right)}^{2}.\text{ }\text{ }\text{ }\text{ }\text{ }\left(\text{A}.4\right)$
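The computation in (A.4) may be sketched as follows. This is a minimal Python sketch under our own assumptions: the function name `vnr_time1` is hypothetical, the arrays describe the respondents in $s_1$ only, groups are coded $0,\,\dots,\,C(0)-1,$ each group contains at least one respondent, and the toy data are invented.

```python
import numpy as np

def vnr_time1(group, y, pi, k, p_hat):
    """Non-response variance estimator (A.4) at time t = 1.

    group : RHG label (0..C-1) of each respondent in s_1 (assumed coding)
    y, pi : study variable y_i1 and inclusion probability pi_i
    k     : weights k_i^1
    p_hat : estimated response probability of each group, length C
    """
    group = np.asarray(group)
    k = np.asarray(k, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    a = np.asarray(y, dtype=float) / np.asarray(pi, dtype=float)  # y_i1 / pi_i
    n_groups = len(p_hat)
    # Within-group ratio: sum_j y_j1/pi_j divided by sum_j k_j^1.
    ratio = (np.bincount(group, weights=a, minlength=n_groups)
             / np.bincount(group, weights=k, minlength=n_groups))
    # Residual of each respondent with respect to its own group.
    resid = a - k * ratio[group]
    sum_sq = np.bincount(group, weights=resid ** 2, minlength=n_groups)
    return float(np.sum((1.0 - p_hat) / p_hat ** 2 * sum_sq))

# Invented toy data: four respondents in two groups.
group = np.array([0, 0, 1, 1])
y = np.array([10.0, 20.0, 30.0, 50.0])
pi = np.array([0.5, 0.5, 0.25, 0.25])
k = np.ones(4)
v_nr = vnr_time1(group, y, pi, k, p_hat=[0.5, 0.8])
```

In this toy example the within-group sums of squared residuals are 200 and 3,200, the group factors $(1-\hat{p}_c^1)/(\hat{p}_c^1)^2$ are 2 and 0.3125, and the estimate is 1,400.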

We now consider the case when the reweighted estimator is computed at time $t=2.$ We focus on the simpler case when the same system of RHGs is kept over time. In the estimator of the variance due to non-response given in (2.22), the vectors ${\stackrel{^}{\gamma }}_{2}^{1}$ and ${\stackrel{^}{\gamma }}_{2}^{2}$ simplify as

$\hat{\gamma}_{2}^{1}=\left(\frac{\sum_{i\in s_{2}\cap s_{1}^{1}}\frac{y_{i2}}{\pi_{i}}}{\hat{p}_{1}^{1}\sum_{i\in s_{2}\cap s_{1}^{1}}k_{i}^{1}},\,\dots,\,\frac{\sum_{i\in s_{2}\cap s_{1}^{C(0)}}\frac{y_{i2}}{\pi_{i}}}{\hat{p}_{C(0)}^{1}\sum_{i\in s_{2}\cap s_{1}^{C(0)}}k_{i}^{1}}\right)^{\top},\qquad(\text{A}.5)$

${\stackrel{^}{\gamma }}_{2}^{2}={\left(\frac{{\sum }_{i\in {s}_{2}\cap {s}_{1}^{1}}\frac{{y}_{i2}}{{\pi }_{i}}}{{\stackrel{^}{p}}_{1}^{1}{\stackrel{^}{p}}_{1}^{2}{\sum }_{i\in {s}_{2}\cap {s}_{1}^{1}}{k}_{i}^{2}},\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}\frac{{\sum }_{i\in {s}_{2}\cap {s}_{1}^{C\left(0\right)}}\frac{{y}_{i2}}{{\pi }_{i}}}{{\stackrel{^}{p}}_{C\left(0\right)}^{1}{\stackrel{^}{p}}_{C\left(0\right)}^{2}{\sum }_{i\in {s}_{2}\cap {s}_{1}^{C\left(0\right)}}{k}_{i}^{2}}\right)}^{\top }.\text{ }\text{ }\text{ }\text{ }\text{ }\left(\text{A}.6\right)$

After some algebra, the variance estimator in (2.22) may be rewritten as

$\begin{array}{ll}\hat{V}_{2}^{\text{nr}}\left(\hat{Y}_{2}\right) & =\sum_{c=1}^{C(0)}\frac{\left(1-\hat{p}_{c}^{1}\right)}{\hat{p}_{c}^{2}}\sum_{i\in s_{2}\cap s_{1}^{c}}\left(\frac{y_{i2}}{\pi_{i}\hat{p}_{c}^{1}}-k_{i}^{1}\frac{\sum_{j\in s_{2}\cap s_{1}^{c}}\frac{y_{j2}}{\pi_{j}}}{\hat{p}_{c}^{1}\sum_{j\in s_{2}\cap s_{1}^{c}}k_{j}^{1}}\right)^{2}\\ & \quad+\sum_{c=1}^{C(0)}\left(1-\hat{p}_{c}^{2}\right)\sum_{i\in s_{2}\cap s_{1}^{c}}\left(\frac{y_{i2}}{\pi_{i}\hat{p}_{c}^{1}\hat{p}_{c}^{2}}-k_{i}^{2}\frac{\sum_{j\in s_{2}\cap s_{1}^{c}}\frac{y_{j2}}{\pi_{j}}}{\hat{p}_{c}^{1}\hat{p}_{c}^{2}\sum_{j\in s_{2}\cap s_{1}^{c}}k_{j}^{2}}\right)^{2}.\qquad(\text{A}.7)\end{array}$

If we further assume that ${k}_{i}^{\delta }$ is constant over times $\delta =1,\text{\hspace{0.17em}}2,$ and may thus be rewritten as ${k}_{i},$ the expression in (A.7) simplifies as

$\hat{V}_{2}^{\text{nr}}\left(\hat{Y}_{2}\right)=\sum_{c=1}^{C(0)}\frac{\left(1-\hat{p}_{c}^{1\,\to\,2}\right)}{\left(\hat{p}_{c}^{1\,\to\,2}\right)^{2}}\sum_{i\in s_{2}\cap s_{1}^{c}}\left(\frac{y_{i2}}{\pi_{i}}-k_{i}\frac{\sum_{j\in s_{2}\cap s_{1}^{c}}\frac{y_{j2}}{\pi_{j}}}{\sum_{j\in s_{2}\cap s_{1}^{c}}k_{j}}\right)^{2},\qquad(\text{A}.8)$

with ${\stackrel{^}{p}}_{c}^{1\text{\hspace{0.17em}}\to \text{\hspace{0.17em}}2}={\prod }_{\delta =1}^{2}\text{\hspace{0.17em}}{\stackrel{^}{p}}_{c}^{\delta }$ for $c=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}C\left(0\right).$ This simplification of the variance estimator can be extended to the reweighted estimator at time $t.$ Assuming that the RHGs are kept over time, and that ${k}_{i}^{\delta }={k}_{i}$ for any $\delta =1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}t,$ the variance estimator in (2.12) may be written as

${\stackrel{^}{V}}_{t}^{\text{nr}}\left({\stackrel{^}{Y}}_{t}\right)=\sum _{c=1}^{C\left(0\right)}\frac{\left(1-{\stackrel{^}{p}}_{c}^{1\text{\hspace{0.17em}}\to \text{\hspace{0.17em}}t}\right)}{{\left({\stackrel{^}{p}}_{c}^{1\text{\hspace{0.17em}}\to \text{\hspace{0.17em}}t}\right)}^{2}}\sum _{i\in {s}_{t}\cap {s}_{t-1}^{c}}{\left(\frac{{y}_{it}}{{\pi }_{i}}-{k}_{i}\frac{{\sum }_{j\in {s}_{t}\cap {s}_{t-1}^{c}}\frac{{y}_{jt}}{{\pi }_{j}}}{{\sum }_{j\in {s}_{t}\cap {s}_{t-1}^{c}}{k}_{j}}\right)}^{2}\text{ }\text{ }\text{ }\text{ }\text{ }\left(\text{A}.9\right)$

with ${\stackrel{^}{p}}_{c}^{1\text{\hspace{0.17em}}\to \text{\hspace{0.17em}}t}={\prod }_{\delta =1}^{t}\text{\hspace{0.17em}}{\stackrel{^}{p}}_{c}^{\delta }$ for $c=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}C\left(0\right).$
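The general expression (A.9) differs from (A.4) only through the cumulative response probability $\hat{p}_{c}^{1\,\to\,t},$ which the following minimal Python sketch makes explicit. The sketch rests on our own assumptions: the function name `vnr_time_t` is hypothetical, the arrays describe the respondents in $s_t$ only, the same RHGs (coded $0,\,\dots,\,C(0)-1$) and the same weights $k_i$ are used at every wave, each group contains at least one respondent, and the toy data are invented.

```python
import numpy as np

def vnr_time_t(group, y, pi, k, p_hat_by_wave):
    """Non-response variance estimator (A.9) with fixed RHGs and constant k_i.

    group         : RHG label (0..C-1) of each respondent in s_t (assumed coding)
    y, pi         : study variable y_it and inclusion probability pi_i
    k             : weights k_i, constant over waves
    p_hat_by_wave : array of shape (t, C); row delta-1 holds the p_hat_c^delta
    """
    group = np.asarray(group)
    k = np.asarray(k, dtype=float)
    p_waves = np.atleast_2d(np.asarray(p_hat_by_wave, dtype=float))
    # Cumulative probability p_hat_c^{1->t}: product over waves, per group.
    p_cum = np.prod(p_waves, axis=0)
    a = np.asarray(y, dtype=float) / np.asarray(pi, dtype=float)  # y_it / pi_i
    n_groups = p_cum.shape[0]
    ratio = (np.bincount(group, weights=a, minlength=n_groups)
             / np.bincount(group, weights=k, minlength=n_groups))
    resid = a - k * ratio[group]
    sum_sq = np.bincount(group, weights=resid ** 2, minlength=n_groups)
    return float(np.sum((1.0 - p_cum) / p_cum ** 2 * sum_sq))

# Invented toy data: four respondents at time t = 2, two groups, two waves.
group = np.array([0, 0, 1, 1])
y = np.array([10.0, 20.0, 30.0, 50.0])
pi = np.array([0.5, 0.5, 0.25, 0.25])
k = np.ones(4)
v_two_waves = vnr_time_t(group, y, pi, k, [[0.5, 0.8], [0.8, 0.5]])
v_one_wave = vnr_time_t(group, y, pi, k, [[0.5, 0.8]])
```

Passing a single wave reproduces (A.4). In the two-wave toy example both groups have cumulative probability 0.4, so the common factor is $0.6/0.16=3.75$ and the estimate is $3.75\times 3{,}400=12{,}750.$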

## References

Beaumont, J.-F. (2005). Calibrated imputation in surveys under a quasimodel-assisted approach. Journal of the Royal Statistical Society, Series B, 67, 445-458.

Beaumont, J.-F., and Haziza, D. (2016). A note on the concept of invariance in two-phase sampling designs. Survey Methodology, 42, 2, 319-323. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2016002/article/14662-eng.pdf.

Berger, Y. (2004). Variance estimation for measures of change in probability sampling. Canadian Journal of Statistics, 32, 4, 451-467.

Caron, N., and Ravalet, P. (2000). Estimation dans les enquêtes répétées : application à l’enquête emploi en continu. Technical report INSEE, Paris.

Chauvet, G., and Goga, C. (2018). Linearization versus bootstrap for variance estimation of the change between Gini indexes. Survey Methodology, 44, 1, 17-42. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2018001/article/54926-eng.pdf.

Clarke, P., and Tate, P. (2002). An application of non-ignorable non-response models for gross flows estimation in the British labour force survey. Australian & New Zealand Journal of Statistics, 4, 413-425.

Deville, J.-C., and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382.

Ekholm, A., and Laaksonen, S. (1991). Weighting via response modeling in the Finnish household budget survey. Journal of Official Statistics, 7, 325-327.

Fay, R. (1992). When are inferences from multiple imputation valid? Proceedings of the Survey Research Methods Section, American Statistical Association, 81, 1, 227-232.

Fuller, W., and An, A. (1998). Regression adjustment for non-response. Journal of the Indian Society of Agricultural Statistics, 51, 331-342.

Fuller, W.A., Loughin, M.M. and Baker, H.D. (1994). Regression weighting in the presence of nonresponse with application to the 1987-1988 Nationwide Food Consumption Survey. Survey Methodology, 20, 1, 75-85. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/1994001/article/14429-eng.pdf.

Goga, C., Deville, J.-C. and Ruiz-Gazen, A. (2009). Composite estimation and linearization method for two-sample survey data. Biometrika, 96, 691-709.

Hawkes, D., and Plewis, I. (2009). Modelling nonresponse in the national child development study. Journal of the Royal Statistical Society, Series A, 169, 479-491.

Juillard, H., Chauvet, G. and Ruiz-Gazen, A. (2017). Estimation under cross-classified sampling with application to a childhood survey. Journal of the American Statistical Association, 112, 850-858.

Kalton, G. (2009). Design for surveys over time. Handbook of Statistics, 29, 89-108.

Kim, J.K., and Kim, J.J. (2007). Nonresponse weighting adjustment using estimated response probability. Canadian Journal of Statistics, 35, 501-514.

Kim, J.K., Kwon, Y. and Park, M. (2016). Calibrated propensity score method for survey nonresponse in cluster sampling. Biometrika, 103, 461-473.

Laaksonen, S. (2007). Weighting for two-phase surveyed data. Survey Methodology, 33, 2, 121-130. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2007002/article/10489-eng.pdf.

Laaksonen, S., and Chambers, R.L. (2006). Survey estimation under informative nonresponse with follow-up. Journal of Official Statistics, 22, 81-95.

Laniel, N. (1988). Variances for a rotating sample from a changing population. Proceedings of the Business and Economics Statistics Section, American Statistical Association, 246-250.

Laurie, H., Smith, R. and Scott, L. (1999). Strategies for reducing nonresponse in a longitudinal panel survey. Journal of Official Statistics, 15, 269-282.

Lynn, P. (2009). Methods for longitudinal surveys. Methodology of Longitudinal Surveys, 1-19.

Nordberg, L. (2000). On variance estimation for measures of change when samples are coordinated by the use of permanent random numbers. Journal of Official Statistics, 16, 363-378.

Pirus, C., Bois, C., Dufourg, M., Lanoë, J., Vandentorren, S., Leridon, H. and the Elfe team (2010). Constructing a cohort: Experience with the French Elfe project. Population, 65, 637-670.

Qualité, L., and Tillé, Y. (2008). Variance estimation of changes in repeated surveys and its application to the Swiss survey of value added. Survey Methodology, 34, 2, 173-181. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/2008002/article/10758-eng.pdf.

Rendtel, U., and Harms, T. (2009). Weighting and calibration for household panels. Methodology of Longitudinal Surveys, 265-286.

Rizzo, L., Kalton, G. and Brick, J.M. (1996). A comparison of some weighting adjustment methods for panel nonresponse. Survey Methodology, 22, 1, 43-53. Paper available at https://www150.statcan.gc.ca/n1/pub/12-001-x/1996001/article/14386-eng.pdf.

Silva, P., and Skinner, C. (1997). Variable selection for regression estimation in finite populations. Survey Methodology, 23, 1, 23-32.

Skinner, C. (2015). Cross-classified sampling: Some estimation theory. Statistics & Probability Letters, 104, 163-168.

Skinner, C., and D’Arrigo, J. (2011). Inverse probability weighting for clustered non-response. Biometrika, 98, 953-966.

Skinner, C., and Vieira, M. (2005). Design effects in the analysis of longitudinal survey data. S3RI Methodology Working Papers, M05/13. Southampton, UK: Southampton Statistical Sciences Research Institute.

Slud, E.V., and Bailey, L. (2010). Evaluation and selection of models for attrition nonresponse adjustment. Journal of Official Statistics, 26, 1-18.

Tam, S. (1984). On covariance from overlapping samples. The American Statistician, 38, 1-18.

Vandecasteele, L., and Debels, A. (2007). Attrition in panel data: The effectiveness of weighting. European Sociological Review, 23, 1, 81-97.

Zhou, M., and Kim, J. (2012). An efficient method of estimation for longitudinal surveys with monotone missing data. Biometrika, 99, 631-648.

