
7. Discussion

Cyril Favre Martinoz, David Haziza and Jean-François Beaumont

In this paper, we proposed a method for determining the threshold of winsorized estimators. The method is simple to apply in practice and can be used with unequal probability sampling designs. We also proposed a calibration method that satisfies a consistency relation between the domain-level winsorized estimates and a population-level winsorized estimate. Although we applied the method to winsorized estimators, it can be used with any type of robust estimator.

Acknowledgements

The authors are grateful to an associate editor and two reviewers for their comments and suggestions, which substantially improved the quality of this paper. David Haziza's research was funded by a grant from the Natural Sciences and Engineering Research Council of Canada.

Appendix

We want to show that there exists a solution to the equation

$- Δ (K) = \sum_{j \in S} a_{j} \max (0, d_{j} y_{j} - K) = \frac{{\hat{B}}_{min} + {\hat{B}}_{max}}{2} = \hat{t} - {\hat{t}}_{R}$

under the conditions $π_{i j} - π_{i} π_{j} \leq 0$ for all $i \neq j$ and $\frac{1}{2} ({\hat{B}}_{min} + {\hat{B}}_{max}) \geq 0.$
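The left-hand side $- Δ (K)$ can be evaluated directly from the values $b_{j} = d_{j} y_{j}$ and the coefficients $a_{j}.$ The minimal Python sketch below does this. The function names are hypothetical, and the coefficients are our assumption rather than a statement of this appendix: $a_{j} = (d_{j} - 1)/d_{j}$ for the Dalén-Tambay estimator (the choice that gives $a_{k} b_{k} = (d_{k} - 1) y_{k}$ in the proof below) and $a_{j} = 1$ for the standard winsorized estimator. The sample values are made up.

```python
from collections.abc import Sequence


def neg_delta(K: float, b: Sequence[float], a: Sequence[float]) -> float:
    """Evaluate -Delta(K) = sum_j a_j * max(0, b_j - K), where b_j = d_j * y_j."""
    return sum(a_j * max(0.0, b_j - K) for a_j, b_j in zip(a, b))


def dalen_tambay_weights(d: Sequence[float]) -> list[float]:
    # Assumption: a_j = (d_j - 1) / d_j, so that a_j * b_j = (d_j - 1) * y_j.
    return [(d_j - 1.0) / d_j for d_j in d]


def standard_weights(d: Sequence[float]) -> list[float]:
    # Assumption: standard winsorization replaces d_j * y_j by K, so a_j = 1.
    return [1.0 for _ in d]


# Made-up sample: design weights d_j and values y_j.
d = [2.0, 4.0, 10.0]
y = [5.0, 3.0, 20.0]
b = [d_j * y_j for d_j, y_j in zip(d, y)]  # [10.0, 12.0, 200.0]
a = dalen_tambay_weights(d)                # [0.5, 0.75, 0.9]
print(neg_delta(0.0, b, a))    # 194.0
print(neg_delta(50.0, b, a))   # 135.0
print(neg_delta(200.0, b, a))  # 0.0 (every term vanishes once K >= b_n)
```

Only the units with $b_{j} > K$ contribute, which is why $- Δ (K)$ falls to zero at $K = b_{n}.$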

First, we sort the units in increasing order of $b_{i} = d_{i} y_{i}, i \in S,$ so that unit 1 has the smallest value of $b_{i}$ and unit $n$ the largest. We begin with the case $\frac{1}{2} ({\hat{B}}_{min} + {\hat{B}}_{max}) = 0.$ We then have to solve the equation $- Δ (K) = 0,$ which is satisfied for all $K \geq b_{n},$ since every term $\max (0, b_{j} - K)$ is then zero.

We now turn to the case of $\frac{1}{2} ({\hat{B}}_{min} + {\hat{B}}_{max}) > 0.$ We note first that the function $- Δ (K)$ is continuous and piecewise linear for $0 \leq K \leq b_{n} .$ The pieces are defined by the intervals $[b_{j - 1}, b_{j}), j = 1, \ldots, n,$ where $b_{0} = 0.$ We also note that $- Δ (0) = \sum_{j = m}^{n} a_{j} b_{j} > 0,$ where $m$ is the smallest index such that $b_{m} \geq 0.$ By the intermediate value theorem, there is a solution to equation (4.7) if we can show that

$- Δ (b_{n}) = 0 < \frac{1}{2} ({\hat{B}}_{min} + {\hat{B}}_{max}) \leq - Δ (0) = \sum_{j = m}^{n} a_{j} b_{j}.$     (A.1)

The first inequality follows directly from the condition $\frac{1}{2} ({\hat{B}}_{min} + {\hat{B}}_{max}) > 0.$ To prove the second inequality, we first note that $\frac{1}{2} ({\hat{B}}_{min} + {\hat{B}}_{max}) \leq {\hat{B}}_{max},$ since ${\hat{B}}_{min} \leq {\hat{B}}_{max}.$ Using the estimator of the conditional bias (2.2) and the condition $π_{i j} - π_{i} π_{j} \leq 0,$ we observe that ${\hat{B}}_{max} \leq (d_{k} - 1) y_{k},$ where $k$ denotes the unit with the largest estimated conditional bias. For the Dalén-Tambay winsorized estimator, this inequality can be rewritten as ${\hat{B}}_{max} \leq a_{k} b_{k}.$ Since $a_{k} b_{k} \geq {\hat{B}}_{max} > 0,$ we have $k \geq m,$ so $a_{k} b_{k}$ is one of the terms of $- Δ (0) = \sum_{j = m}^{n} a_{j} b_{j}$ and therefore $a_{k} b_{k} \leq - Δ (0).$ Hence $\frac{1}{2} ({\hat{B}}_{min} + {\hat{B}}_{max}) \leq {\hat{B}}_{max} \leq a_{k} b_{k} \leq - Δ (0),$ which completes the proof that equation (4.7) has a solution. For the standard winsorized estimator, it is also easy to show that ${\hat{B}}_{max} \leq a_{k} b_{k},$ so a solution exists in that case as well. In addition, if the $y_{i}, i \in S,$ are all positive, the function $- Δ (K)$ is monotonically decreasing for $0 \leq K \leq b_{n}$ and the solution is unique.
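The continuity, piecewise linearity and monotonicity invoked in this proof can be read off an explicit form of $- Δ (K).$ The short derivation below is a check under two stated assumptions: the values are sorted as above ($b_{1} \leq \cdots \leq b_{n}$) and $a_{j} \geq 0$ for all $j.$

```latex
% Assumes b_1 <= ... <= b_n and a_j >= 0 for all j.
\begin{align*}
-\Delta(K) &= \sum_{j = l + 1}^{n} a_j \, (b_j - K),
  && b_l \le K \le b_{l+1},\; l = 1, \ldots, n - 1,\\
\frac{d}{dK}\bigl[-\Delta(K)\bigr] &= -\sum_{j = l + 1}^{n} a_j \le 0,
  && b_l < K < b_{l+1}.
\end{align*}
```

Units with $j \leq l$ drop out because $b_{j} \leq b_{l} \leq K.$ At $K = b_{l + 1}$ the term $a_{l + 1} (b_{l + 1} - K)$ is zero, so the expressions for adjacent pieces agree and $- Δ (K)$ is continuous. The slopes are nonpositive, so $- Δ (K)$ is nonincreasing, and it is strictly decreasing on $[0, b_{n}]$ whenever $a_{n} > 0,$ because every slope on that range contains $- a_{n}.$ Since each piece is exactly linear, linear interpolation between consecutive $b_{l}$ returns the exact root rather than an approximation.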

To find the solution $K_{opt},$ we find the largest index $l \leq n$ such that $- Δ (b_{l}) \geq \frac{1}{2} ({\hat{B}}_{min} + {\hat{B}}_{max}).$ Because $- Δ (K)$ is linear between $b_{l}$ and $b_{l + 1},$ the solution is obtained exactly by linear interpolation between these two points; that is,

$K_{opt} = b_{l} \frac{Δ (b_{l + 1}) - Δ (K_{opt})}{Δ (b_{l + 1}) - Δ (b_{l})} + b_{l + 1} \frac{Δ (K_{opt}) - Δ (b_{l})}{Δ (b_{l + 1}) - Δ (b_{l})},$

where $Δ (K_{opt}) = - \frac{1}{2} ({\hat{B}}_{min} + {\hat{B}}_{max}) .$
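The search for $l$ followed by interpolation can be sketched in Python. This is a minimal, hypothetical implementation under stated assumptions: $a_{j} \geq 0,$ the knots are $b_{0} = 0$ together with the positive $b_{j}$ (units with $b_{j} \leq 0$ never contribute for $K \geq 0$), and ${\hat{B}}_{min}$ and ${\hat{B}}_{max}$ are supplied by the user rather than computed from (2.2). The helper `k_opt` and the numbers in the example are made up; the function applies the interpolation formula above with a common denominator.

```python
from collections.abc import Sequence


def neg_delta(K: float, b: Sequence[float], a: Sequence[float]) -> float:
    """-Delta(K) = sum_j a_j * max(0, b_j - K), with b_j = d_j * y_j."""
    return sum(a_j * max(0.0, b_j - K) for a_j, b_j in zip(a, b))


def k_opt(b: Sequence[float], a: Sequence[float],
          b_min_hat: float, b_max_hat: float) -> float:
    """Hypothetical helper: K_opt with Delta(K_opt) = -(B_min + B_max) / 2."""
    target = 0.5 * (b_min_hat + b_max_hat)
    if target == 0.0:
        return max(b)  # -Delta(K) = 0 for every K >= b_n; return b_n
    if not 0.0 < target <= neg_delta(0.0, b, a):
        raise ValueError("condition (A.1) does not hold")

    def delta(K: float) -> float:
        return -neg_delta(K, b, a)

    # Knots of the piecewise-linear function on [0, b_n]: b_0 = 0 plus the positive b_j.
    knots = [0.0] + sorted(b_j for b_j in b if b_j > 0.0)
    # Largest l with -Delta(b_l) >= target. It exists since -Delta(0) >= target,
    # and it is below the last knot since -Delta(b_n) = 0 < target.
    l_idx = max(i for i, k in enumerate(knots) if neg_delta(k, b, a) >= target)
    b_l, b_next = knots[l_idx], knots[l_idx + 1]
    d_l, d_next, d_opt = delta(b_l), delta(b_next), -target
    # Linear interpolation between (b_l, Delta(b_l)) and (b_{l+1}, Delta(b_{l+1})).
    return (b_l * (d_next - d_opt) + b_next * (d_opt - d_l)) / (d_next - d_l)


b = [10.0, 12.0, 200.0]  # made-up b_j = d_j * y_j
a = [0.5, 0.75, 0.9]     # made-up a_j
K = k_opt(b, a, b_min_hat=70.0, b_max_hat=200.0)  # made-up, so the target is 135
print(round(K, 6))  # 50.0
```

This sketch evaluates $- Δ$ at every knot, which costs $O(n^{2})$ operations. For large samples, one could instead accumulate the slopes $\sum_{j > l} a_{j}$ while scanning the sorted $b_{j}$ from the top, so that the sort dominates at $O(n \log n).$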

References

Beaumont, J.-F., Haziza, D. and Ruiz-Gazen, A. (2013). A unified approach to robust estimation in finite population sampling. Biometrika, 100, 555-569.

Berger, Y.G. (1998). Rate of convergence for asymptotic variance of the Horvitz-Thompson estimator. Journal of Statistical Planning and Inference, 74, 149-168.

Clark, R.G. (1995). Winsorization methods in sample surveys. Master's thesis, Department of Statistics, Australian National University.

Dalén, J. (1987). Practical estimators of a population total which reduce the impact of large observations. R and D Report. Statistics Sweden.

Datta, G.S., Ghosh, M., Steorts, R. and Maples, J. (2011). Bayesian benchmarking with applications to small area estimation. Test, 20, 574-588.

Deville, J.-C., and Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382.

Fattorini, L. (2006). Applying the Horvitz-Thompson criterion in complex designs: A computer-intensive perspective for estimating inclusion probabilities. Biometrika, 93, 269-278.

Haziza, D., Mecatti, F. and Rao, J.N.K. (2008). Evaluation of some approximate variance estimators under the Rao-Sampford unequal probability sampling design. Metron, 66, 91-108.

Kokic, P.N., and Bell, P.A. (1994). Optimal Winsorizing cutoffs for a stratified finite population estimator. Journal of Official Statistics, 10, 419-435.

Moreno-Rebollo, J.L., Muñoz-Reyes, A.M. and Muñoz-Pichardo, J.M. (1999). Influence diagnostics in survey sampling: Conditional bias. Biometrika, 86, 923-928.

Moreno-Rebollo, J.L., Muñoz-Reyes, A.M., Jiménez-Gamero, M.D. and Muñoz-Pichardo, J. (2002). Influence diagnostics in survey sampling: Estimating the conditional bias. Metrika, 55, 209-214.

Rivest, L.-P. (1994). Statistical properties of Winsorized means for skewed distributions. Biometrika, 81, 373-383.

Rivest, L.-P., and Hidiroglou, M. (2004). Outlier treatment for disaggregated estimates. Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, Virginia, 4248-4256.

Rivest, L.-P., and Hurtubise, D. (1995). On Searls' Winsorized mean for skewed populations. Survey Methodology, 21, 2, 107-116.

Tambay, J.-L. (1988). An integrated approach for the treatment of outliers in sub-annual surveys. Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, Virginia, 229-234.

Thompson, M.E., and Wu, C. (2008). Simulation-based randomized systematic PPS sampling under substitution of units. Survey Methodology, 34, 1, 3-10.

You, Y., Rao, J.N.K. and Dick, P. (2004). Benchmarking hierarchical Bayes small area estimators in the Canadian census undercoverage estimation. Statistics in Transition, 6, 631-640.

Date modified: 2015-11-27

Survey Methodology
