Scope and purpose
Sampling is the selection of a set of units from a survey population.
This set of units is referred to as the sample. The choice of
sampling method has a direct impact on data quality. It is influenced
by many factors, including the desired level of precision and detail of
the information to be produced, the availability of appropriate sampling
frames, the availability of suitable auxiliary variables for stratification
and sample selection, the estimation methods that will be used and the
available budgets.
Principles
Probability sampling is used to select a sample from the survey
population. The intention is to gather useful information from the sampled
units to allow inferences about the survey population. Probability sampling
implies a probabilistic selection of units from the frame in such a way
that all survey population units have known and positive inclusion
probabilities. Sample size is determined by the required precision
and available budget for observing the selected units. The probability
distribution that governs the sample selection, along with the stages
and units of sampling, the stratification, and so on, are collectively
called the sampling design or sample design. A combination
of sampling design and estimation method (see section on Estimation)
is chosen so that the resulting estimates attain the best possible precision
under the given budget, or so as to incur the lowest possible cost for
a fixed precision. Information collected for sampled units may, where
appropriate, be supplemented at the estimation stage with auxiliary information
from sources other than the survey itself (such as administrative records
and census projections) to improve the precision of the estimates. The
choice of sampling design will take into account the availability of such
auxiliary information. These concepts are discussed in Särndal, Swensson
and Wretman (1992) and Tillé (2001).
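The idea of known, positive inclusion probabilities can be illustrated with a minimal sketch. It assumes simple random sampling without replacement, under which every unit has inclusion probability n/N, and uses the Horvitz-Thompson estimator of a total, a standard choice that this section does not itself prescribe. The frame, the variable name y and the function names are hypothetical.

```python
import random


def srs_without_replacement(frame, n, seed=None):
    """Select n units from the frame by simple random sampling
    without replacement.

    Returns a list of (unit, inclusion_probability) pairs. Under this
    design every unit has the same inclusion probability, n / N.
    """
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)
    pi = n / len(frame)
    return [(unit, pi) for unit in rng.sample(frame, n)]


def horvitz_thompson_total(sample, value_of):
    """Estimate a population total by weighting each sampled value
    by the inverse of its inclusion probability."""
    return sum(value_of(unit) / pi for unit, pi in sample)


# Hypothetical frame of 1,000 units with a made-up variable y.
frame = [{"id": i, "y": 10 + (i % 7)} for i in range(1000)]

sample = srs_without_replacement(frame, n=100, seed=1)
estimate = horvitz_thompson_total(sample, lambda u: u["y"])
true_total = sum(u["y"] for u in frame)
print(f"estimated total: {estimate:.0f}, true total: {true_total}")
```

With other designs, only the inclusion probabilities attached to each unit change; the estimator keeps the same form as long as those probabilities are known and positive.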
Guidelines
- Stratification consists of dividing the population
into subsets (called strata) within each of which an independent
sample is selected. The choice of strata is determined based on the
objective of the survey, the distribution characteristics of the variable
of interest, and the desired precision of the estimates. Most surveys
are used to produce estimates for various domains of interest
(e.g., provinces). If feasible, take this into account in the design
by stratifying appropriately (e.g., by province). Otherwise, it will
be necessary to consider special methods at the estimation stage to
produce estimates for these domains (see section on Estimation).
To achieve statistical efficiency, create strata in such a way that
each stratum contains units that are as homogeneous as possible with
respect to the information requested in the survey. For longitudinal
surveys, choose stratification variables that correspond to characteristics
that are stable through time.
- For highly skewed populations, create a stratum of large units
to be included in the survey with certainty. These large units would
normally account for a significant part of the estimates of the population
totals.
- Sometimes the information needed to stratify the population is not
available on the frame. In such cases, a two-phase sampling
scheme may be used, whereby a large sample is selected in the first
phase to obtain the required stratification information. This first
sample is then stratified and in the second phase, a subsample is selected
from each stratum within the first sample. Consider the cost of sampling
at each phase, the availability of the information required at each
phase, and the gain in precision obtained by stratifying the first-phase
sample.
- In practice, particularly in the case of area frames, it is sometimes
difficult, inconvenient or not cost-effective to directly select and
contact the units that will report the requested information.
In such cases, a two-stage sampling scheme may be used by first
selecting clusters (called primary sampling units) of reporting
units, and then subsampling within each of the selected primary sampling
units to obtain a sample of the reporting units. Budgetary or other
constraints may necessitate more than two stages. Determine how many
stages of sampling are needed and which sampling units are appropriate
at each stage. For each possible type of unit, consider the availability
of a suitable frame of such units at each stage or the possibility of
creating such a frame for the survey, ease of contact and of data collection/measurement,
the quality of the data provided by the unit, and the cost of collection.
- When determining sample size, take into account the levels of
precision required for the survey estimates, the type of design and
estimator to be used, the availability of auxiliary information, budgetary
constraints, as well as both sampling factors (e.g., clustering, stratification)
and nonsampling factors (e.g., nonresponse, presence of out-of-scope
units, attrition in longitudinal surveys). For periodic surveys, take
into account expected births and deaths of units within the changing
survey population.
- It is important to remember that most surveys produce estimates for
many different variables, and optimizing the sample for one particular
variable may have detrimental effects on other important variables.
Handle this problem by first identifying the most important variables
and then using this subset of variables to determine the sampling strategy
to be adopted, which often requires a compromise between optimum strategies
for the variables in the subset.
- In determining sample allocation and size for stratified samples,
account for expected rates of misclassification of units and other deficiencies
on the frame. If these deficiencies are not properly considered at the
sampling stage, survey estimates will not be as precise as planned. Also
address this problem at the estimation stage (see section on Estimation).
- Conduct studies to evaluate alternative sampling methods, stratification
options and allocation possibilities. The usefulness of these studies
depends on the availability and vintage of the data used to conduct
them, whether from previous censuses, surveys or administrative data,
and on their relation to the variables of importance to the survey.
- At the implementation stage, compare the size and characteristics
of the actual sample to what was expected. Compare the precision of
the estimates to the planned objectives.
- For periodic surveys that use designs in which the sample size
grows as the population increases, it is often appropriate to develop
a method to keep the sample size, and therefore collection costs, stable.
The precision of survey estimates is usually influenced more
by the total sample size than by the sampling fraction (ratio
of the sample size to the population size).
- For periodic surveys, make the design as flexible as possible
to deal with future changes, such as increases or decreases in sample
size, restratification, resampling and updating of selection probabilities.
If estimates are required for specified domains of interest (e.g., subprovincial
estimates), form the strata by combining small stable units related
to the identified domains (e.g., small geographical areas), if possible.
Future changes in definitions of the strata would then be easier to
accommodate.
- For periodic surveys, if efficient estimates of change are required
or if response burden is a concern, use a rotation sampling
scheme that replaces part of the sample in each period. The choice of
the rotation rate will be a compromise between the precision required
for the estimates of change, and the response burden on the reporting
units. Lowering the rotation rate will increase the precision of the
estimates of change, but may lower the response rate over time. A low
rotation rate has the additional benefit of reducing costs if the first
contact is substantially more expensive than subsequent contacts.
- For periodic surveys, develop procedures to monitor the quality
of the sample design over time. Set up an update strategy for selective
redesign of strata that have suffered serious deterioration.
- For longitudinal panel surveys, determine the length of the
panel (its duration of time in the sample) by balancing the need for
duration data against attrition and conditioning effects. Use a design
with overlapping panels (i.e., with overlapping time span) when there
is a need to produce cross-sectional estimates along with the longitudinal
ones.
- Use generalized sample selection software instead of tailor-made
systems. One such system is the Generalized Sampling System (GSAM) developed
by Statistics Canada. GSAM is especially useful for managing sample
selection and rotation for periodic surveys. Another option is the software
MICROSTRATE developed by Eurostat to control sample overlap. By using
generalized systems, one can expect fewer programming errors, as well
as some reduction in development costs and time.
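The guidelines on stratification, take-all strata and sample allocation can be sketched as follows. This is an illustration under stated assumptions, not a method prescribed above: it uses Neyman allocation (sample sizes proportional to the stratum size times the stratum standard deviation, as described in Cochran (1977)), computed from a single auxiliary variable assumed to be available on the frame. The stratum names, values and function name are hypothetical.

```python
import statistics


def neyman_allocation(strata, n_total, take_all=()):
    """Allocate a total sample size across strata.

    strata:   dict mapping stratum name -> list of auxiliary values,
              one per frame unit (hypothetical input format).
    take_all: names of strata included with certainty, for example
              the large units of a highly skewed population.

    Remaining strata receive sample sizes proportional to N_h * S_h.
    Because of rounding, the allocated sizes may not add up exactly
    to n_total.
    """
    # Take-all strata are enumerated completely.
    alloc = {h: len(strata[h]) for h in take_all}
    remaining = n_total - sum(alloc.values())
    take_some = [h for h in strata if h not in alloc]
    if remaining < len(take_some):
        raise ValueError("n_total is too small for this stratification")

    # The population standard deviation stands in for S_h here.
    weights = {h: len(strata[h]) * statistics.pstdev(strata[h])
               for h in take_some}
    total_weight = sum(weights.values())

    for h in take_some:
        if total_weight > 0:
            share = weights[h] / total_weight
        else:
            share = 1 / len(take_some)
        n_h = round(remaining * share)
        # At least one unit per stratum, never more than the stratum holds.
        alloc[h] = min(len(strata[h]), max(1, n_h))
    return alloc


# Hypothetical skewed population: many small units, a few very large ones.
strata = {
    "small": list(range(1, 101)),
    "medium": list(range(100, 500, 10)),
    "large": [5000, 8000, 12000],
}

alloc = neyman_allocation(strata, n_total=23, take_all=("large",))
print(alloc)
```

In a multipurpose survey, an allocation driven by one variable would normally be only a starting point; a compromise across the most important variables, such as the approach discussed in Bethel (1989), would be weighed against it.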
References
Bethel, J. (1989). Sample allocation in multivariate surveys. Survey
Methodology, 15, 47-57.
Cochran, W.G. (1977). Sampling Techniques. Wiley,
New York.
Gambino, J.G., Singh, M.P., Dufour, J., Kennedy, B. and Lindeyer, J.
(1998). Methodology of the Canadian Labour Force Survey.
Statistics Canada, Catalogue No. 71-526.
Hidiroglou, M.A. (1994). Sampling and estimation for establishment
surveys: stumbling blocks and progress. Proceedings of the Section
on Survey Research Methods, American Statistical Association,
153-162.
Hidiroglou, M.A. and Srinath, K.P. (1993). Problems associated with
designing sub annual business surveys. Journal of Economic Statistics,
11, 397-405.
Kalton, G. and Citro, C.F. (1993). Panel surveys: adding the fourth
dimension. Survey Methodology, 19, 205-215.
Kish, L. (1965). Survey Sampling. Wiley, New York.
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model
Assisted Survey Sampling. Springer-Verlag, New York.
Tillé, Y. (2001). Théorie des sondages –
Échantillonnage et estimation en populations finies.
Dunod, Paris.