  Articles and reports: 12-001-X201100111447

    This paper introduces an R package for stratifying a survey population using a univariate stratification variable X and for calculating stratum sample sizes. Non-iterative methods, such as the cumulative root frequency method and geometric stratum boundaries, are implemented. Optimal designs, with stratum boundaries that minimize either the CV of the simple expansion estimator for a fixed sample size n or the value of n for a fixed CV, can be constructed. Two iterative algorithms are available to find the optimal stratum boundaries. The design can feature a user-defined certainty stratum in which all units are sampled. Take-all and take-none strata can be included in the stratified design, as they may lead to smaller sample sizes. The sample size calculations are based on the anticipated moments of the survey variable Y given the stratification variable X. The package handles conditional distributions of Y given X that follow either a heteroscedastic linear model or a log-linear model. Stratum-specific non-response can be accounted for in the design construction and in the sample size calculations.

    Release date: 2011-06-29
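
    The two non-iterative boundary rules named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the package's code: the function names, the equal-width binning and the default number of bins are hypothetical choices, and the optimal (iterative) boundaries and sample size calculations are not shown.

```python
import math


def cum_root_freq_boundaries(x, n_strata, n_bins=None):
    """Dalenius-Hodges cumulative root frequency boundaries.

    The range of x is split into n_bins equal-width bins, the square roots
    of the bin counts are accumulated, and the cumulative total is cut into
    n_strata equal parts. Returns the n_strata - 1 interior boundaries,
    each placed at the upper edge of a bin. Assumes min(x) < max(x).
    """
    lo, hi = min(x), max(x)
    if n_bins is None:
        # hypothetical default; the number of bins is a tuning choice
        n_bins = max(n_strata, int(math.sqrt(len(x))))
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in x:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    cum, running = [], 0.0
    for c in counts:
        running += math.sqrt(c)
        cum.append(running)
    boundaries = []
    for h in range(1, n_strata):
        target = running * h / n_strata
        # first bin whose cumulative sqrt(frequency) reaches the target
        k = next(i for i, s in enumerate(cum) if s >= target)
        boundaries.append(lo + (k + 1) * width)
    return boundaries


def geometric_boundaries(x, n_strata):
    """Geometric boundaries b_h = a * r**h with r = (b / a)**(1 / L),
    where a = min(x) > 0 and b = max(x); returns b_1, ..., b_{L-1}."""
    a, b = min(x), max(x)
    r = (b / a) ** (1.0 / n_strata)
    return [a * r ** h for h in range(1, n_strata)]
```

    For example, geometric_boundaries([1, 1000], 3) returns values very close to 10 and 100, because the common ratio is r = (1000/1)^(1/3) = 10.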

  Articles and reports: 12-001-X201100111448

    In two-phase sampling for stratification, the second-phase sample is selected as a stratified sample based on information observed in the first-phase sample. We develop a replication-based, bias-adjusted variance estimator that extends the method of Kim, Navarro and Fuller (2006). The proposed method is also applicable when the first-phase sampling rate is not negligible and when the second-phase sample is selected by unequal-probability Poisson sampling within each stratum. The proposed method can be extended to variance estimation for two-phase regression estimators. Results from a limited simulation study are presented.

    Release date: 2011-06-29
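
    As a point of reference for the variance estimator, the usual point estimator of the mean under two-phase sampling for stratification weights each second-phase stratum mean by that stratum's share of the first-phase sample. The sketch below assumes simple random sampling at both phases and at least one second-phase unit in every first-phase stratum; it does not implement the replication variance estimator of the paper, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def two_phase_stratified_mean(first_phase_strata, second_phase):
    """Estimate the population mean under two-phase sampling for stratification.

    first_phase_strata: stratum label of every first-phase unit.
    second_phase: (stratum, y) pairs for the second-phase units.
    Assumes SRS at both phases and at least one second-phase unit per stratum.
    """
    n1 = len(first_phase_strata)
    n1h = Counter(first_phase_strata)  # first-phase stratum counts
    y_sum, n2h = defaultdict(float), Counter()
    for h, y in second_phase:
        y_sum[h] += y
        n2h[h] += 1
    # weight each second-phase stratum mean by the first-phase share n1h / n1
    return sum((n1h[h] / n1) * (y_sum[h] / n2h[h]) for h in n1h)
```

    With six first-phase units in stratum a and four in stratum b, and second-phase means of 2 and 10, the estimate is 0.6 × 2 + 0.4 × 10 = 5.2.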

  Articles and reports: 12-001-X201100111449

    We analyze the statistical and economic efficiency of different designs of cluster surveys collected in two consecutive time periods, or waves. In an independent design, two cluster samples in two waves are taken independently from one another. In a cluster-panel design, the same clusters are used in both waves, but samples within clusters are taken independently in the two time periods. In an observation-panel design, both clusters and observations are retained from one wave of data collection to the next. By assuming a simple population structure, we derive design variances and costs of the surveys conducted according to these designs. We first consider a situation in which the interest lies in estimating the change in the population mean between two time periods, and derive the optimal sample allocations for the three designs of interest. We then propose a utility maximization framework, borrowed from microeconomics, to illustrate a possible approach to choosing a design that strives to optimize several variances simultaneously. Incorporating the contemporaneous means and their variances tends to shift the preferences away from the observation-panel design towards the simpler cluster-panel and independent designs if the panel mode of data collection is too expensive. We present numerical illustrations demonstrating how a survey designer may choose an efficient design given the population parameters and data collection costs.

    Release date: 2011-06-29

  Articles and reports: 12-001-X201000211382

    The size of the cell-phone-only population in the USA has increased rapidly in recent years and, correspondingly, researchers have begun to experiment with sampling and interviewing cell-phone subscribers. We discuss statistical issues involved in the sampling design and estimation phases of cell-phone studies. This work is presented primarily in the context of a nonoverlapping dual-frame survey, in which one frame and sample are employed for the landline population and a second frame and sample are employed for the cell-phone-only population. Additional considerations necessary for overlapping dual-frame surveys (where the cell-phone frame and sample include some of the landline population) are also discussed. We illustrate the methods using the design of the National Immunization Survey (NIS), which monitors the vaccination rates of children aged 19-35 months and teens aged 13-17 years. The NIS is a nationwide telephone survey, followed by a provider record check, conducted by the Centers for Disease Control and Prevention.

    Release date: 2010-12-21
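
    For the overlapping dual-frame case, one standard way to combine the landline and cell-phone samples is Hartley's composite estimator, which estimates the overlap (dual-user) domain from both samples and mixes them with a factor λ. It is used here only as an illustration, not necessarily the NIS estimator; the data layout (lists of weight, value, overlap-flag tuples), the function name and the λ = 0.5 default are assumptions.

```python
def hartley_dual_frame_total(landline_sample, cell_sample, lam=0.5):
    """Hartley-type composite estimator of a population total.

    landline_sample: (weight, y, also_has_cell) tuples from the landline frame.
    cell_sample: (weight, y, also_has_landline) tuples from the cell frame.
    lam: compositing factor in [0, 1] for the overlap domain.
    """
    landline_only = sum(w * y for w, y, both in landline_sample if not both)
    overlap_from_landline = sum(w * y for w, y, both in landline_sample if both)
    cell_only = sum(w * y for w, y, both in cell_sample if not both)
    overlap_from_cell = sum(w * y for w, y, both in cell_sample if both)
    return (landline_only + lam * overlap_from_landline
            + (1 - lam) * overlap_from_cell + cell_only)
```

    In a nonoverlapping design the cell sample is screened to cell-phone-only units, so overlap_from_cell is zero; setting lam=1.0 then reduces the estimator to the landline total plus the cell-phone-only total.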

  Articles and reports: 12-001-X201000211385

    In this short note, we show that simple random sampling without replacement and Bernoulli sampling have approximately the same entropy when the population size is large. An empirical example is given as an illustration.

    Release date: 2010-12-21
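
    The claim can be checked numerically. Under SRSWOR every one of the C(N, n) samples is equally likely, so the entropy is log C(N, n); under Bernoulli sampling with p = n/N the entropy is N times the binary entropy of p. The sketch below, with hypothetical function names, compares the two for a 10% sampling fraction; the ratio approaches 1 as N grows, because log C(N, n) differs from N H(n/N) only by a term of order log N.

```python
import math


def srswor_entropy(N, n):
    """Entropy of SRSWOR: all C(N, n) samples are equally likely, so the
    entropy is log C(N, n), computed with lgamma to avoid overflow."""
    return math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)


def bernoulli_entropy(N, p):
    """Entropy of Bernoulli sampling: N independent inclusion
    indicators, each equal to 1 with probability 0 < p < 1."""
    return -N * (p * math.log(p) + (1 - p) * math.log(1 - p))


for N in (100, 10_000, 1_000_000):
    n = N // 10
    h_srs, h_ber = srswor_entropy(N, n), bernoulli_entropy(N, n / N)
    print(N, round(h_srs, 1), round(h_ber, 1), round(h_srs / h_ber, 5))
```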

  Articles and reports: 12-001-X201000111243

    The 2003 National Assessment of Adult Literacy (NAAL) and the international Adult Literacy and Lifeskills (ALL) surveys each involved stratified multi-stage area sample designs. During the last stage, a household roster was constructed, the eligibility status of each individual was determined, and a selection procedure was invoked to randomly select one or two eligible persons within the household. The objective of this paper is to evaluate within-household selection rules under a multi-stage design, with a view to improving the procedure in future literacy surveys. The analysis is based on the current US household size distribution and on intracluster correlation coefficients obtained from the adult literacy data. Several feasible household selection rules are evaluated, taking into account the effects of clustering, differential sampling rates, cost per interview, and household burden. In doing so, an evaluation of within-household sampling under a two-stage design is extended to a four-stage design, and some generalizations are made to multi-stage samples with different cost ratios.

    Release date: 2010-06-29
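
    As a small illustration of the differential-sampling-rate effect mentioned above, selecting one eligible adult per household gives each selected person a within-household weight equal to the number of eligible adults, and unequal weights inflate variance. The sketch below uses Kish's approximation 1 + CV² for the weighting design effect; the household data and function names are hypothetical, and clustering, cost and burden, which the paper also considers, are ignored.

```python
import random


def select_one_adult(eligible, rng):
    """Select one eligible person with equal probability 1/len(eligible);
    the within-household weight is the inverse, len(eligible)."""
    person = rng.choice(eligible)
    return person, len(eligible)


def kish_deff(weights):
    """Kish's approximate design effect from unequal weighting:
    n * sum(w^2) / (sum(w))^2, which equals 1 + CV^2 of the weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


rng = random.Random(2010)
households = [["A"], ["B", "C"], ["D", "E"], ["F", "G", "H", "I"]]
picks = [select_one_adult(h, rng) for h in households]
print(picks, kish_deff([w for _, w in picks]))
```

    For weights 1, 2, 2 and 4, the approximation gives 4 × 25 / 81 ≈ 1.23, that is, roughly a 23% variance increase from weighting alone.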

  Articles and reports: 12-001-X201000111249

    For many designs, there is a nonzero probability of selecting a sample that provides poor estimates for known quantities. Stratified random sampling reduces the set of such possible samples by fixing the sample size within each stratum. However, undesirable samples are still possible with stratification. Rejective sampling removes poorly performing samples by retaining a sample only if specified functions of sample estimates are within a tolerance of known values. The resulting samples are often said to be balanced on the functions of the variables used in the rejection procedure. We provide modifications to the rejection procedure of Fuller (2009a) that allow more flexible rejection rules. Through simulation, we compare the estimation properties of a rejective sampling procedure with those of cube sampling.

    Release date: 2010-06-29
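
    A bare-bones version of rejective sampling can be sketched as follows: simple random samples are drawn repeatedly, and a sample is kept only when the sample mean of an auxiliary variable x lies within a tolerance of its known population mean. This single-variable rule with an absolute tolerance is an assumption made for illustration; it is not Fuller's (2009a) procedure or the modifications proposed in the paper, and the names are hypothetical.

```python
import random


def rejective_srs(x, n, tol, rng, max_tries=10_000):
    """Draw SRSWOR samples of size n until the sample mean of the auxiliary
    variable x is within tol of the known population mean of x.

    Returns the accepted sample as a list of unit indices."""
    pop_mean = sum(x) / len(x)
    for _ in range(max_tries):
        s = rng.sample(range(len(x)), n)
        if abs(sum(x[i] for i in s) / n - pop_mean) <= tol:
            return s
    raise RuntimeError("no acceptable sample found; consider loosening tol")


rng = random.Random(1)
x = list(range(100))  # hypothetical auxiliary values
s = rejective_srs(x, 10, 1.0, rng)
print(sorted(s), sum(x[i] for i in s) / 10)
```

    Because rejection changes the design, the inclusion probabilities of the accepted samples generally differ from those of SRS, which is why the estimation properties of such procedures need to be studied.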

  Articles and reports: 75F0002M2010002

    This report compares aggregate income estimates published by four different statistical programs. The System of National Accounts provides a portrait of economic activity at the macroeconomic level. The other three programs generate data from a microeconomic perspective: two are survey-based (the Census of Population and the Survey of Labour and Income Dynamics), and the third derives all its results from administrative data (Annual Estimates for Census Families and Individuals). A review of the conceptual differences across the sources is followed by a discussion of coverage issues and processing discrepancies that might influence the estimates. Aggregate income estimates, adjusted where possible to account for known conceptual differences, are then compared. Even allowing for statistical variability, some reconciliation issues remain. These are sometimes explained by the use of different methodologies or data-gathering instruments, but some remain unexplained.

    Release date: 2010-04-06

  Articles and reports: 12-001-X200900211036

    Surveys are frequently required to produce estimates for subpopulations, sometimes for a single subpopulation and sometimes for several subpopulations in addition to the total population. When membership of a rare subpopulation (or domain) can be determined from the sampling frame, selecting the required domain sample size is relatively straightforward. In this case the main issue is the extent of oversampling to employ when survey estimates are required for several domains and for the total population. Sampling and oversampling rare domains whose members cannot be identified in advance present a major challenge. A variety of methods has been used in this situation. In addition to large-scale screening, these methods include disproportionate stratified sampling, two-phase sampling, the use of multiple frames, multiplicity sampling, panel surveys, and the use of multi-purpose surveys. This paper illustrates the application of these methods in a range of social surveys.

    Release date: 2009-12-23

  Articles and reports: 12-001-X200900211037

    Randomized response strategies, which were originally developed as statistical methods to reduce nonresponse as well as untruthful answering, can also be applied in the field of statistical disclosure control for public-use microdata files. In this paper, a standardization of randomized response techniques for estimating the proportions of identifying or sensitive attributes is presented. The statistical properties of the standardized estimator are derived for general probability sampling. To analyse how different choices of the method's implicit "design parameters" affect the performance of the estimator, measures of privacy protection must be included in the analysis. These yield variance-optimal design parameters for a given level of privacy protection. To this end, the variables have to be classified into different categories of sensitivity. A real-data example applies the technique in a survey on academic cheating behaviour.

    Release date: 2009-12-23
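
    The trade-off between privacy protection and variance can be seen in Warner's classic randomized response model, used here as a familiar special case rather than the standardized technique of the paper. Each respondent answers the sensitive statement with probability p and its negation otherwise, so P(yes) = pπ + (1 - p)(1 - π). The estimator and variance below assume simple random sampling with replacement; the function names are hypothetical.

```python
def warner_estimate(yes_fraction, p):
    """Warner estimator of the sensitive proportion pi.

    Solves P(yes) = p*pi + (1 - p)*(1 - pi) for pi using the observed
    fraction of "yes" answers; requires p != 0.5."""
    if p == 0.5:
        raise ValueError("p = 0.5 makes pi unidentifiable")
    return (yes_fraction - (1 - p)) / (2 * p - 1)


def warner_variance(pi, p, n):
    """Variance of the Warner estimator under SRS with replacement of size n:
    the usual binomial term plus a penalty for the randomization."""
    return pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)
```

    Moving p towards 0.5 protects privacy better but inflates the second variance term: for π = 0.2 and n = 100, the variance is about 0.0030 with p = 0.9 and about 0.0147 with p = 0.7.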

  Articles and reports: 12-001-X197900254834
    Description: An alternative to the direct selection of a πPS sample is suggested which, while retaining the same efficiency, simplifies the selection and variance estimation processes in a wide variety of situations. If n* is the largest feasible πPS sample size that can be drawn from a given population of size N, the proposed method entails selecting m (= N - n*) units using a πPS scheme and rejecting these units from the population, so that the remainder is a πPS sample of n* units; the final sample of n units is then selected as a subsample of the remainder. This method for selecting a πPS sample can be seen as an analogue of SRS, where it is well known that the “unsampled” part of the population, as well as any subsample from this part, is also an SRS from the entire population. The method is very practical in situations where m is less than the actual sample size n. Moreover, the method has an additional advantage in continuing surveys, e.g. the Canadian Labour Force Survey (LFS), where the number of primary sampling units (PSUs) may have to be increased (or decreased) after the initial selection of the sample. The method also has advantages for sample rotation. The main features of the proposed scheme and its limitations are given, and the efficiency of the method is evaluated empirically.
    Release date: 1979-12-15

  Articles and reports: 12-001-X197900254835
    Description: The problem considered in this paper is the estimation of various agricultural variables using a multiple frame approach. The list frame is completely contained within the area frame, and the list and area frames are stratified on different criteria. Overall, the multiple frame estimator shows some gains in variance over the area frame estimator. However, a more careful analysis reveals problems associated with the list frame, such as the method of stratification and the degeneration of list strata over time.
    Release date: 1979-12-15
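
    When the list frame is contained in the area frame, one common multiple frame estimator adds the list frame estimate to the area frame estimate for units that are not on the list (the nonoverlap domain). The sketch assumes that design weights are already available for both samples and that each area-frame unit can be matched against the list; stratification is ignored, and the names and data layout are hypothetical.

```python
def multiple_frame_total(list_sample, area_sample):
    """Multiple frame estimate of a total when the list frame is
    contained in the area frame.

    list_sample: (weight, y) pairs from the list frame sample.
    area_sample: (weight, y, on_list) tuples from the area frame sample.
    Area units found on the list are dropped, since the list sample
    already represents them; only the nonoverlap domain is added."""
    list_total = sum(w * y for w, y in list_sample)
    nonoverlap_total = sum(w * y for w, y, on_list in area_sample if not on_list)
    return list_total + nonoverlap_total
```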

  Articles and reports: 12-001-X197900100004
    Description: Let U = {1, 2, …, i, …, N} be a finite population of N identifiable units. A known “size measure” x_i is associated with unit i; i = 1, 2, ..., N. A sampling procedure for selecting a sample of size n (2 < n < N) with probability proportional to size (PPS) and without replacement (WOR) from the population is proposed. With this method, the inclusion probability is proportional to size (IPPS) for each unit in the population.
    Release date: 1979-06-15
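
    The proposed procedure is not reproduced here. For comparison, systematic PPS sampling is a well-known, simpler scheme that also selects n units without replacement with inclusion probabilities pi_i = n x_i / X, where X is the sum of the x_i, provided every pi_i ≤ 1. The sketch below uses it as a stand-in, with hypothetical names; unlike some IPPS schemes, it can leave pairs of units with zero joint inclusion probability, which complicates variance estimation.

```python
import random


def systematic_pps(x, n, rng):
    """Systematic PPS sampling without replacement.

    Inclusion probabilities are pi_i = n * x_i / sum(x); each must be <= 1.
    A random start u in [0, 1) is drawn and unit i is selected when one of
    the points u, u + 1, ..., u + n - 1 falls in its cumulative interval."""
    total = sum(x)
    pi = [n * xi / total for xi in x]
    if max(pi) > 1:
        raise ValueError("some n * x_i / X exceeds 1; use a take-all stratum")
    u = rng.random()
    sample, cum, k = [], 0.0, 0
    for i, p in enumerate(pi):
        cum += p
        # unit i covers [cum - p, cum); with length <= 1 it can contain
        # at most one selection point
        while k < n and u + k < cum:
            sample.append(i)
            k += 1
    return sample, pi


rng = random.Random(3)
s, pi = systematic_pps([1, 2, 3, 4, 5, 5], 4, rng)
print(s, pi)
```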

  Articles and reports: 12-001-X197900100005
    Description: Approximate cutoff rules for stratifying a population into take-all and take-some universes have been given by Dalenius (1950) and Glasser (1962). They expressed the cutoff value (the value that delineates the boundary between the take-all and take-some universes) as a function of the mean, the sampling weight and the population variance. Their cutoff values were derived on the assumption that a single random sample of size n was to be drawn without replacement from the population of size N.

    In the present context, exact and approximate cutoff rules have been worked out for a similar situation in which the required precision (coefficient of variation), rather than the sample size, is given. In many sampling situations, the sampler is given a set of objectives in terms of reliability rather than sample size. The result is particularly useful for determining the take-all/take-some boundary for samples drawn from a known population. The procedure is also extended to ratio estimation.
    Release date: 1979-06-15
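
    The exact and approximate rules of the paper are not reproduced here, but the underlying question can be illustrated by brute force when the values are known for every unit: sort the units by size, try each take-all set made of the t largest units, and for the remaining take-some stratum compute the smallest SRSWOR sample size n_s at which the expansion estimator of the total reaches the target CV. The sketch assumes the variance N_s² (1/n_s - 1/N_s) S_s² and a minimum of two take-some units; the function name and that minimum are assumptions.

```python
import math


def min_sample_size_with_take_all(y, target_cv):
    """Brute-force choice of a take-all stratum for a target CV.

    y: known values for every unit. The t largest units form the take-all
    stratum; an SRSWOR sample of size n_s is drawn from the rest. Returns
    (t, t + n_s) for the t that minimizes the total sample size."""
    ys = sorted(y, reverse=True)
    N, T = len(ys), sum(ys)
    best = None
    for t in range(N + 1):
        rest = ys[t:]
        Ns = len(rest)
        if Ns <= 1:
            n_s = Ns  # zero or one unit left: take it (complete enumeration)
        else:
            mean = sum(rest) / Ns
            S2 = sum((v - mean) ** 2 for v in rest) / (Ns - 1)
            # smallest n_s with Ns**2 * (1/n_s - 1/Ns) * S2 <= (target_cv * T)**2
            n_s = math.ceil(Ns ** 2 * S2 / ((target_cv * T) ** 2 + Ns * S2))
            n_s = min(max(n_s, 2), Ns)  # assumed minimum of 2 take-some units
        if best is None or t + n_s < best[1]:
            best = (t, t + n_s)
    return best
```

    With fifty units of size 1 and one unit of size 1000, the search places the large unit in the take-all stratum, and the homogeneous remainder then needs only the assumed minimum of two units.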

  Articles and reports: 12-001-X197800154832
    Description: This paper describes a survey design established to measure truck commodity flows in Peru. The article addresses the conceptual and operational features of the survey design and describes its elements and implementation techniques in the context of a pilot project. Finally, the paper illustrates how the results of this pilot might be used to design and implement a full-scale national survey.
    Release date: 1978-06-15

  Articles and reports: 12-001-X197500254824

    Madow (1968) proposed a two-phase sampling scheme under which response bias can be eliminated from sample surveys by obtaining “true” values for a subsample of the original sample. Often, in censuses or ongoing surveys, the subsample data are used not to correct the main survey estimates but to assess their reliability. The main purpose of this paper is to present methods for obtaining reliability estimates when true values can be determined for a subsample of units.

    Release date: 1975-12-15
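
    In the simplest case, a reliability measure for a quantitative item is the mean difference between reported and “true” values in the subsample, which estimates the net response bias, together with its standard error. The sketch below assumes a simple random subsample and ignores the finite population correction; it is an illustration with hypothetical names, not the methods of the paper.

```python
import math


def net_response_bias(reported, true_values):
    """Mean of (reported - true) over the reconciled subsample and its
    standard error, assuming a simple random subsample (no fpc)."""
    d = [r - t for r, t in zip(reported, true_values)]
    m = len(d)
    mean_d = sum(d) / m
    var_d = sum((x - mean_d) ** 2 for x in d) / (m - 1)
    return mean_d, math.sqrt(var_d / m)
```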

Reference (1)

  Surveys and statistical programs – Documentation: 75F0002M1992001

    Starting in 1994, the Survey of Labour and Income Dynamics (SLID) will follow individuals and families for at least six years, tracking their labour market experiences and changes in their income and family circumstances. An initial proposal for the content of SLID, entitled "Content of the Survey of Labour and Income Dynamics: Discussion Paper", was distributed in February 1992.

    That paper served as a background document for consultation with, and review by, interested users. The content underwent significant change during this process. Based on the revised content, a large-scale test of SLID will be conducted in February and May 1993.

    The present document outlines the income and wealth content to be tested in May 1993. It is a continuation of SLID Research Paper Series 92-01A, which outlines the demographic and labour content used in the January/February 1993 test.

    Release date: 2008-02-29
