12-539 Data Quality Guidelines

Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived.

Sampling

Scope and purpose

Sampling is the selection of a set of units from a survey population. This set of units is referred to as the sample. The choice of sampling method has a direct impact on data quality. It is influenced by many factors, including the desired level of precision and detail of the information to be produced, the availability of appropriate sampling frames, the availability of suitable auxiliary variables for stratification and sample selection, the estimation methods that will be used and the available budgets.

Principles

Probability sampling is used to select a sample from the survey population. The intention is to gather useful information from the sampled units to allow inferences about the survey population. Probability sampling implies a probabilistic selection of units from the frame in such a way that all survey population units have known and positive inclusion probabilities. Sample size is determined by the required precision and the budget available for observing the selected units. The probability distribution that governs the sample selection, along with the stages and units of sampling, the stratification, and so on, are collectively called the sampling design or sample design. A combination of sampling design and estimation method (see section on Estimation) is chosen so that the resulting estimates attain the best possible precision under the given budget, or so as to incur the lowest possible cost for a fixed precision. Information collected for sampled units may, where appropriate, be supplemented at the estimation stage with auxiliary information from sources other than the survey itself (such as administrative records and census projections) to improve the precision of the estimates. The choice of sampling design will take into account the availability of such auxiliary information. These concepts are discussed in Särndal, Swensson and Wretman (1992) and Tillé (2001).
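As an illustration of known and positive inclusion probabilities, the following minimal sketch selects a simple random sample without replacement and weights each sampled value by the inverse of its inclusion probability to estimate a population total. The frame, the function names and the choice of simple random sampling are assumptions made for the example, not a design prescribed by these guidelines.

```python
import random

def srs_without_replacement(frame, n, seed=None):
    """Select n units from the frame with equal probability, without replacement."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def inclusion_probability(N, n):
    """Under simple random sampling without replacement, each unit has probability n/N."""
    return n / N

def estimate_total(sample_values, pi):
    """Weight each sampled value by the inverse of its inclusion probability."""
    return sum(y / pi for y in sample_values)

# Hypothetical frame: 1,000 units whose value equals their identifier.
frame = list(range(1, 1001))
sample = srs_without_replacement(frame, 100, seed=1)
pi = inclusion_probability(len(frame), len(sample))
print(estimate_total(sample, pi))  # an estimate of the true total, 500500
```

Because every unit has the same known probability of selection here, each sampled unit simply represents N/n units of the population. Unequal-probability designs follow the same weighting idea with unit-specific probabilities.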

Guidelines

Stratification consists of dividing the population into subsets (called strata) within each of which an independent sample is selected. The choice of strata is determined based on the objective of the survey, the distribution characteristics of the variable of interest, and the desired precision of the estimates. Most surveys are used to produce estimates for various domains of interest (e.g., provinces). If feasible, take this into account in the design by stratifying appropriately (e.g., by province). Otherwise, it will be necessary to consider special methods at the estimation stage to produce estimates for these domains (see section on Estimation). To achieve statistical efficiency, create strata in such a way that each stratum contains units that are as homogeneous as possible with respect to the information requested in the survey. For longitudinal surveys, choose stratification variables that correspond to characteristics that are stable through time.
For highly skewed populations, create a stratum of large units to be included in the survey with certainty. These large units would normally account for a significant part of the estimates of the population totals.
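A take-all stratum for a skewed population might be sketched as follows. The size cutoff, the two-stratum layout, the population and the function names are all hypothetical. Units at or above the cutoff are selected with certainty (weight 1), and the remaining units are sampled by simple random sampling without replacement.

```python
import random

def select_with_take_all(units, size_of, cutoff, n_take_some, seed=None):
    """Split units into a certainty stratum and a sampled stratum."""
    rng = random.Random(seed)
    take_all = [u for u in units if size_of(u) >= cutoff]
    remainder = [u for u in units if size_of(u) < cutoff]
    n = min(n_take_some, len(remainder))
    take_some = rng.sample(remainder, n)
    return take_all, take_some, len(remainder)

def estimate_total(take_all, take_some, remainder_count, value_of):
    """Certainty units count once; sampled units are weighted by N_h / n_h."""
    total = sum(value_of(u) for u in take_all)
    if take_some:
        weight = remainder_count / len(take_some)
        total += weight * sum(value_of(u) for u in take_some)
    return total

# Hypothetical population: 95 small units (revenue 10) and 5 large ones (revenue 1000).
population = [("small", 10)] * 95 + [("large", 1000)] * 5
take_all, take_some, remainder_count = select_with_take_all(
    population, size_of=lambda u: u[1], cutoff=500, n_take_some=10, seed=7)
estimate = estimate_total(take_all, take_some, remainder_count,
                          value_of=lambda u: u[1])
print(len(take_all), len(take_some), estimate)  # 5 10 5950.0
```

In this toy population the five large units contribute 5,000 of the 5,950 total, so enumerating them completely removes most of the sampling variability.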
Sometimes the information needed to stratify the population is not available on the frame. In such cases, a two-phase sampling scheme may be used, whereby a large sample is selected in the first phase to obtain the required stratification information. This first sample is then stratified and in the second phase, a subsample is selected from each stratum within the first sample. Consider the cost of sampling at each phase, the availability of the information required at each phase, and the gain in precision obtained by stratifying the first-phase sample.
In practice, particularly in the case of area frames, it is sometimes difficult, inconvenient or not cost-effective to directly select and contact the units that will report the requested information. In such cases, a two-stage sampling scheme may be used by first selecting clusters (called primary sampling units) of reporting units, and then subsampling within each of the selected primary sampling units to obtain a sample of the reporting units. Budgetary or other constraints may necessitate more than two stages. Determine how many stages of sampling are needed and which sampling units are appropriate at each stage. For each possible type of unit, consider the availability of a suitable frame of such units at each stage or the possibility of creating such a frame for the survey, ease of contact and of data collection/measurement, the quality of the data provided by the unit, and the cost of collection.
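The two-stage scheme described above can be sketched as follows, assuming simple random sampling without replacement at both stages. The cluster structure, the function name and the parameters are hypothetical. The design weight of each reporting unit is the inverse of the product of its first-stage and second-stage selection probabilities.

```python
import random

def two_stage_sample(clusters, n_psu, m_per_psu, seed=None):
    """clusters maps a PSU id to its list of reporting units.

    Stage 1: select n_psu primary sampling units.
    Stage 2: select up to m_per_psu reporting units within each selected PSU.
    Returns a list of (psu, unit, design_weight) tuples.
    """
    rng = random.Random(seed)
    psu_ids = sorted(clusters)
    selected = rng.sample(psu_ids, n_psu)
    pi1 = n_psu / len(psu_ids)          # first-stage inclusion probability
    result = []
    for psu in selected:
        units = clusters[psu]
        if not units:                   # nothing to subsample in an empty PSU
            continue
        m = min(m_per_psu, len(units))
        pi2 = m / len(units)            # second-stage (conditional) probability
        for unit in rng.sample(units, m):
            result.append((psu, unit, 1.0 / (pi1 * pi2)))
    return result

# Hypothetical area frame: 20 PSUs of 50 reporting units each.
clusters = {psu: [f"{psu}-{i}" for i in range(50)] for psu in range(20)}
sample = two_stage_sample(clusters, n_psu=4, m_per_psu=10, seed=3)
print(len(sample), round(sum(w for _, _, w in sample)))  # 40 1000
```

The weights sum to the population size (1,000) in this example because all PSUs have the same size. With unequal PSU sizes, the sum would vary from sample to sample under this equal-probability first stage.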
When determining sample size, take into account the required levels of precision needed for the survey estimates, the type of design and estimator to be used, the availability of auxiliary information, budgetary constraints, as well as both sampling factors (e.g., clustering, stratification) and nonsampling factors (e.g., nonresponse, presence of out-of-scope units, attrition in longitudinal surveys). For periodic surveys, take into account expected births and deaths of units within the changing survey population.
It is important to remember that most surveys produce estimates for many different variables, and optimizing the sample for one particular variable may have detrimental effects on other important variables. Handle this problem by first identifying the most important variables and then using this subset of variables to determine the sampling strategy to be adopted, which often requires a compromise between optimum strategies for the variables in the subset.
In determining sample allocation and size for stratified samples, account for expected rates of misclassification of units and other deficiencies on the frame. If these deficiencies are not properly considered at the sampling stage, survey estimates will not be as precise as planned. Also address this problem at the estimation stage (see section on Estimation).
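One common way to allocate a fixed total sample among strata is Neyman allocation, which assigns each stratum a share proportional to its size multiplied by its standard deviation (Cochran, 1977). These guidelines do not prescribe a particular allocation method, so the sketch below is only one option, and the stratum names and figures are hypothetical. It handles a single variable and does not adjust for misclassification.

```python
def neyman_allocation(strata, n):
    """strata maps a stratum name to (N_h, S_h); returns name -> n_h summing to n."""
    products = {h: N_h * S_h for h, (N_h, S_h) in strata.items()}
    total = sum(products.values())
    if total <= 0:
        raise ValueError("at least one stratum needs positive size and variability")
    exact = {h: n * p / total for h, p in products.items()}
    alloc = {h: int(x) for h, x in exact.items()}        # round down first
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

# Hypothetical strata: (number of units, anticipated standard deviation).
strata = {"small": (800, 5), "medium": (150, 20), "large": (50, 100)}
print(neyman_allocation(strata, 120))
# {'small': 40, 'medium': 30, 'large': 50}
```

In this example the stratum of large units receives an allocation equal to its population size, so it becomes a take-all stratum. The function does not cap an allocation at the stratum size. In practice an allocation that exceeds N_h would be capped and the excess redistributed among the other strata.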
Conduct studies to evaluate alternative sampling methods, stratification options and allocation possibilities. The usefulness of these studies depends on the availability and vintage of the data used to conduct them, whether from previous censuses, surveys or administrative sources, and on how closely those data relate to the variables of importance to the survey.
At the implementation stage, compare the size and characteristics of the actual sample to what was expected. Compare the precision of the estimates to the planned objectives.
For periodic surveys that use designs in which the sample size grows as the population increases, it is often appropriate to develop a method to keep the sample size, and therefore collection costs, stable. The precision of survey estimates is usually influenced more by the total sample size than by the sampling fraction (ratio of the sample size to the population size).
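The point about sample size and sampling fraction can be checked numerically. The sketch below uses the standard error of a sample mean under simple random sampling without replacement, sqrt((1 - n/N) * S^2 / n). The population sizes and the standard deviation are hypothetical values chosen for illustration.

```python
import math

def srs_standard_error(S, n, N):
    """Standard error of the sample mean under SRS without replacement."""
    return math.sqrt((1 - n / N) * S * S / n)

# Same sample size, very different sampling fractions.
for N in (10_000, 1_000_000, 100_000_000):
    print(N, round(srs_standard_error(S=50, n=1_000, N=N), 3))

# Doubling the sample size in the smallest population.
print(round(srs_standard_error(S=50, n=2_000, N=10_000), 3))
```

With n fixed at 1,000, the standard error moves only from 1.5 to about 1.58 as the population grows from ten thousand to one hundred million units. Doubling n to 2,000 in the smallest population brings it down to 1.0.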
For periodic surveys, make the design as flexible as possible to deal with future changes, such as increases or decreases in sample size, restratification, resampling and updating of selection probabilities. If estimates are required for specified domains of interest (e.g., subprovincial estimates), form the strata by combining small stable units related to the identified domains (e.g., small geographical areas), if possible. Future changes in definitions of the strata would then be easier to accommodate.
For periodic surveys, if efficient estimates of change are required or if response burden is a concern, use a rotation sampling scheme that replaces part of the sample in each period. The choice of the rotation rate will be a compromise between the precision required for the estimates of change, and the response burden on the reporting units. Lowering the rotation rate will increase the precision of the estimates of change, but may lower the response rate over time. A low rotation rate has the additional benefit of reducing costs if the first contact is substantially more expensive than subsequent contacts.
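A rotation scheme can be sketched by splitting the sample into rotation groups and replacing the oldest group each period. The number of groups (six), the group identifiers and the function names are hypothetical choices for the example. The overlap between consecutive periods, which drives the precision of estimates of change, is then (G - 1) / G for G groups.

```python
from collections import deque

def rotate(groups, next_group_id):
    """Drop the oldest rotation group, add a fresh one, and return the next unused id."""
    groups.popleft()
    groups.append(next_group_id)
    return next_group_id + 1

def overlap_fraction(before, after):
    """Share of the earlier period's rotation groups still in the sample."""
    return len(set(before) & set(after)) / len(before)

groups = deque(range(6))        # hypothetical: six rotation groups, ids 0 to 5
next_id = 6
previous = list(groups)
next_id = rotate(groups, next_id)
print(list(groups))             # [1, 2, 3, 4, 5, 6]
overlap = overlap_fraction(previous, groups)
print(overlap)                  # five of the six groups are common to both periods
```

Using fewer, larger rotation groups raises the rotation rate and lowers the overlap. Using more groups does the opposite, at the price of keeping each unit in the sample for more periods.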
For periodic surveys, develop procedures to monitor the quality of the sample design over time. Set up an update strategy for selective redesign of strata that have suffered serious deterioration.
For longitudinal panel surveys, determine the length of the panel (its duration of time in the sample) by balancing the need for duration data versus attrition and conditioning effects. Use a design with overlapping panels (i.e., with overlapping time span) when there is a need to produce cross-sectional estimates along with the longitudinal ones.
Use generalized sample selection software instead of tailor-made systems. One such system is the Generalized Sampling System (GSAM) developed by Statistics Canada. GSAM is especially useful for managing sample selection and rotation for periodic surveys. Another option is the software MICROSTRATE developed by Eurostat to control sample overlap. By using generalized systems, one can expect fewer programming errors, as well as some reduction in development costs and time.

References

Bethel, J. (1989). Sample allocation in multivariate surveys. Survey Methodology, 15, 47-57.

Cochran, W.G. (1977). Sampling Techniques. Wiley, New York.

Gambino, J.G., Singh, M.P., Dufour, J., Kennedy, B. and Lindeyer, J. (1998). Methodology of the Canadian Labour Force Survey. Statistics Canada, Catalogue No. 71-526.

Hidiroglou, M.A. (1994). Sampling and estimation for establishment surveys: stumbling blocks and progress. Proceedings of the Section on Survey Research Methods, American Statistical Association, 153-162.

Hidiroglou, M.A. and Srinath, K.P. (1993). Problems associated with designing subannual business surveys. Journal of Business and Economic Statistics, 11, 397-405.

Kalton, G. and Citro, C.F. (1993). Panel surveys: adding the fourth dimension. Survey Methodology, 19, 205-215.

Kish, L. (1965). Survey Sampling. Wiley, New York.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag, New York.

Tillé, Y. (2001). Théorie des sondages – Échantillonnage et estimation en populations finies. Dunod, Paris.

Date Modified: 2014-04-10