5. Conclusion
Guillaume Chauvet and Guylène Tandeau de Marsac
We examined the Hartley (1962), Kalton and Anderson (1986) and Bankier (1986)
estimators for pooling the samples from two survey waves. More specifically,
we studied the case where the first sample represents the entire population
(fully representative sample), while the second represents only part of it
(partially representative sample). Within the framework considered in the
simulations (see also the Appendix for a more general framework), using the
partially representative sample did not improve accuracy: as its size
increases, the accuracy of the estimators in the Hartley class remains stable
or improves slightly, while the accuracy of the Kalton and Anderson and Bankier
estimators deteriorates. Even Hartley’s optimal estimator, although more
complex to calculate, is only slightly more accurate than the classic
Horvitz-Thompson estimator calculated on the fully representative sample.
Although our simulation study is limited, the results suggest that the
estimator should be chosen carefully when there are multiple sampling frames,
and that a simple estimator is sometimes preferable, even if it uses only part
of the information collected.
Acknowledgements
The authors would like to thank an associate editor and a referee for their
careful reading and comments, which helped to improve the article
significantly, and David Haziza for useful discussions.
Appendix
A1. Comparison of Hartley’s optimal estimator and the Horvitz-Thompson estimator
We use the framework and notation of Section 4: the two samples are selected
using a two-stage design with a common first-stage selection. Stratified simple
random sampling is used at the first stage, and simple random sampling within
each primary sampling unit at the second stage. The first sampling frame
corresponds to the entire population, while the second covers only part of the
population.
With Hartley’s optimal estimator, the formula (3.6)
gives
After some calculation, we get
with , and .
The Horvitz-Thompson estimator based on the single
sample and Hartley’s
optimal estimator agree if the coefficient is equal to , which is the case if . This condition
will be verified in particular if in (A.1) the terms between the brackets agree
for each primary sampling unit . We get
therefore if
Let us suppose that the mean value of is
approximately the same in the frames and for each primary
sampling unit, i.e. that . Then, the
condition (A.2) will be verified approximately if is close to , with
In summary, the Horvitz-Thompson estimator based on
the single sample and Hartley’s
optimal estimator will be close if within each primary sampling unit : (a)
there is not much difference in the mean value of between the two
frames, and (b) the variable has low
dispersion within . In the
simulations, the condition (a) is approximately met since the distribution of
individuals between the sampling frames and is completely
random; the condition (b) is approximately met with values of varying from to for population
1, and from to for population
2.
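As a rough illustration of the comparison above, the following Python sketch contrasts the Horvitz-Thompson estimator computed on the fully representative sample alone with a Hartley-type composite estimator that uses a fixed coefficient. It is a minimal sketch under simplifying assumptions that are not those of the paper: both samples are drawn independently by simple random sampling without replacement (rather than by a two-stage design with a common first stage), the coefficient `theta` is set by hand rather than optimized, and all names (`ht_full_sample`, `hartley_fixed_theta`, `theta`, `in_b`) as well as the synthetic population are hypothetical.

```python
import numpy as np


def ht_full_sample(y, sample_a, n_pop):
    """Horvitz-Thompson total using only the fully representative sample."""
    weight_a = n_pop / len(sample_a)  # SRS without replacement: weight N / n_A
    return weight_a * y[sample_a].sum()


def hartley_fixed_theta(y, in_b, sample_a, sample_b, n_pop, theta):
    """Hartley-type composite total with a fixed (not optimal) coefficient.

    theta weights the overlap-domain estimate from the full-frame sample;
    (1 - theta) weights the estimate from the partial-frame sample.
    """
    weight_a = n_pop / len(sample_a)
    weight_b = in_b.sum() / len(sample_b)  # SRS within the partial frame
    a_in_b = in_b[sample_a]                # full-frame sample units also in frame B
    total_a_only = weight_a * y[sample_a][~a_in_b].sum()
    total_overlap_from_a = weight_a * y[sample_a][a_in_b].sum()
    total_overlap_from_b = weight_b * y[sample_b].sum()
    return (total_a_only
            + theta * total_overlap_from_a
            + (1.0 - theta) * total_overlap_from_b)


# Synthetic population (assumption): the partial frame is a random subset,
# so the mean of y is roughly the same inside and outside it.
rng = np.random.default_rng(0)
n_pop = 1000
y = rng.gamma(shape=2.0, scale=5.0, size=n_pop)
in_b = rng.random(n_pop) < 0.4

sample_a = rng.choice(n_pop, size=100, replace=False)
sample_b = rng.choice(np.flatnonzero(in_b), size=50, replace=False)

ht = ht_full_sample(y, sample_a, n_pop)
hartley_one = hartley_fixed_theta(y, in_b, sample_a, sample_b, n_pop, theta=1.0)
hartley_half = hartley_fixed_theta(y, in_b, sample_a, sample_b, n_pop, theta=0.5)
hartley_zero = hartley_fixed_theta(y, in_b, sample_a, sample_b, n_pop, theta=0.0)

print("true total      :", y.sum())
print("HT, full sample :", ht)
print("Hartley, theta=1:", hartley_one)
print("Hartley, theta=.5:", hartley_half)
```

In this parametrization, setting `theta = 1` discards the partially representative sample, so the composite estimator coincides with the Horvitz-Thompson estimator; other values of `theta` blend in the second sample linearly.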
References
Bankier, M.D.
(1986). Estimators based on several stratified samples with applications to multiple
frame surveys. Journal of the American
Statistical Association, 81, p. 1074-1079.
Bourdalle, G., Christine, M. and
Wilms, L. (2000). Échantillons maître et emploi. Série
INSEE Méthodes, 21, p. 139-173.
Hansen, M.H. and Hurwitz, W.N.
(1943). On the theory of sampling from finite populations. Annals of Mathematical Statistics, 14,
p. 333-362.
Hartley, H.O.
(1962). Multiple frame surveys. Proceedings
of the Social Statistics Section, American Statistical Association, p. 203-206.
Horvitz, D.G. and Thompson, D.J.
(1952). A generalization of sampling without replacement from a
finite universe. Journal of the American
Statistical Association, 47, p. 663-685.
Kalton, G. and Anderson, D.W. (1986).
Sampling rare populations. Journal of the Royal Statistical Society, A, 149, p. 65-82.
Lavallée, P. (2002). Le sondage indirect, ou la méthode
généralisée du partage des poids. Éditions de l'Université de Bruxelles (Belgium)
and Éditions Ellipses (France).
Lavallée, P.
(2007). Indirect sampling. New York:
Springer.
Lohr, S.L.
(2007). Recent developments in multiple frame surveys. Proceedings of the Survey Research Methods Section, American
Statistical Association, p. 3257-3264.
Lohr, S.L.
(2009). Multiple frame surveys. In Handbook
of Statistics, Sample Surveys: Design, Methods and Applications, Eds., D.
Pfeffermann and C.R. Rao. Amsterdam: North Holland, Vol. 29A, p. 71-88.
Lohr, S.L. (2011). Alternative survey
sample designs: Sampling with multiple overlapping frames. Survey Methodology, 37, 2, p. 197-213.
Mecatti, F. (2007). A single frame
multiplicity estimator for multiple frame surveys. Survey Methodology, 33, 2, p. 151-157.
Narain, R.D.
(1951). On sampling without replacement with varying probabilities. Journal of the Indian Society of
Agricultural Statistics, 3, p. 169-175.
Rao, J.N.K. and Wu, C. (2010). Pseudo-empirical
likelihood inference for dual frame surveys. Journal of the American Statistical Association, 105, p. 1494-1503.
Saigo, H.
(2010). Comparing four bootstrap methods for stratified three-stage sampling. Journal of Official Statistics, 26, 1, p. 193-207.