Survey design

Results

All (266) (10 to 20 of 266 results)

  • Articles and reports: 12-001-X202200100010
    Description:

    This study combines simulated annealing with delta evaluation to solve the joint stratification and sample allocation problem. In this problem, atomic strata are partitioned into mutually exclusive and collectively exhaustive strata. Each partition of atomic strata is a possible solution to the stratification problem, the quality of which is measured by its cost. The Bell number of possible solutions is enormous, for even a moderate number of atomic strata, and an additional layer of complexity is added with the evaluation time of each solution. Many larger scale combinatorial optimisation problems cannot be solved to optimality, because the search for an optimum solution requires a prohibitive amount of computation time. A number of local search heuristic algorithms have been designed for this problem but these can become trapped in local minima preventing any further improvements. We add, to the existing suite of local search algorithms, a simulated annealing algorithm that allows for an escape from local minima and uses delta evaluation to exploit the similarity between consecutive solutions, and thereby reduces the evaluation time. We compared the simulated annealing algorithm with two recent algorithms. In both cases, the simulated annealing algorithm attained a solution of comparable quality in considerably less computation time.

    Release date: 2022-06-21
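
    A minimal sketch of the simulated annealing with delta evaluation idea described above (the partition representation, the Neyman-style cost proxy, and the cooling schedule are illustrative assumptions, not the authors' implementation): when a single atomic stratum is moved between strata, only the two affected strata are re-costed.

        import math
        import random

        def stratum_cost(members, size, sd):
            """Contribution of one stratum: N_h * S_h (a crude Neyman-allocation proxy)."""
            if not members:
                return 0.0
            N_h = sum(size[a] for a in members)
            S_h = max(sd[a] for a in members)   # stand-in for the pooled stratum spread
            return N_h * S_h

        def simulated_annealing(n_atomic, H, size, sd, iters=20000, t0=1.0, alpha=0.9995):
            assign = [random.randrange(H) for _ in range(n_atomic)]
            members = [set(a for a in range(n_atomic) if assign[a] == h) for h in range(H)]
            costs = [stratum_cost(members[h], size, sd) for h in range(H)]
            cur = best = sum(costs)
            t = t0
            for _ in range(iters):
                a = random.randrange(n_atomic)
                h_old, h_new = assign[a], random.randrange(H)
                if h_new == h_old:
                    continue
                # delta evaluation: only the donor and receiver strata are re-costed
                c_old = stratum_cost(members[h_old] - {a}, size, sd)
                c_new = stratum_cost(members[h_new] | {a}, size, sd)
                delta = (c_old + c_new) - (costs[h_old] + costs[h_new])
                if delta < 0 or random.random() < math.exp(-delta / t):
                    assign[a] = h_new
                    members[h_old].discard(a)
                    members[h_new].add(a)
                    costs[h_old], costs[h_new] = c_old, c_new
                    cur += delta
                    best = min(best, cur)
                t *= alpha   # geometric cooling schedule
            return assign, best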

  • Articles and reports: 12-001-X202100200008
    Description:

    Multiple-frame surveys, in which independent probability samples are selected from each of Q sampling frames, have long been used to improve coverage, to reduce costs, or to increase sample sizes for subpopulations of interest. Much of the theory has been developed assuming that (1) the union of the frames covers the population of interest, (2) a full-response probability sample is selected from each frame, (3) the variables of interest are measured in each sample with no measurement error, and (4) sufficient information exists to account for frame overlap when computing estimates. After reviewing design, estimation, and calibration for traditional multiple-frame surveys, I consider modifications of the assumptions that allow a multiple-frame structure to serve as an organizing principle for other data combination methods such as mass imputation, sample matching, small area estimation, and capture-recapture estimation. Finally, I discuss how results from multiple-frame survey research can be used when designing and evaluating data collection systems that integrate multiple sources of data.

    Release date: 2022-01-06
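
    For the simplest case (Q = 2 frames, A and B), one classical way to account for the frame overlap in assumption (4) is Hartley's (1962) estimator; the notation here is generic rather than the article's. With domains a (covered by A only), b (covered by B only) and ab (the overlap),

        \[
          \hat{Y}_H = \hat{Y}_a + \hat{Y}_b + \theta\,\hat{Y}_{ab}^{(A)} + (1 - \theta)\,\hat{Y}_{ab}^{(B)},
          \qquad 0 \le \theta \le 1,
        \]

    where \(\hat{Y}_{ab}^{(A)}\) and \(\hat{Y}_{ab}^{(B)}\) estimate the overlap total from the frame-A and frame-B samples respectively, and \(\theta\) can be chosen, for example, to minimize the variance of \(\hat{Y}_H\).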

  • Articles and reports: 11-522-X202100100024
    Description: The Economic Directorate of the U.S. Census Bureau is developing coordinated design and sample selection procedures for the Annual Integrated Economic Survey. The unified sample will replace the directorate’s existing practice of independently developing sampling frames and sampling procedures for a suite of separate annual surveys, which optimizes sample design features at the cost of increased response burden. Size attributes of business populations, e.g., revenues and employment, are highly skewed. A high percentage of companies operate in more than one industry. Therefore, many companies are sampled into multiple surveys, compounding the response burden, especially for “medium-sized” companies.

    This component of response burden is reduced by selecting a single coordinated sample but will not be completely alleviated. Response burden is a function of several factors, including (1) questionnaire length and complexity, (2) accessibility of data, (3) expected number of repeated measures, and (4) frequency of collection. The sample design can have profound effects on the third and fourth factors. To help inform decisions about the integrated sample design, we use regression trees to identify covariates from the sampling frame that are related to response burden. Using historic frame and response data from four independently sampled surveys, we test a variety of algorithms, then grow regression trees that explain relationships between expected levels of response burden (as measured by response rate) and frame covariates common to more than one survey. We validate initial findings by cross-validation, examining results over time. Finally, we make recommendations on how to incorporate our robust findings into the coordinated sample design.
    Release date: 2021-10-29
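
    A hypothetical sketch of the tree-growing step (the input file, column names, and tuning values are assumptions, not the Census Bureau's analysis): a regression tree relates a burden proxy, here the historical response rate, to frame covariates, and cross-validation guards against overfitting.

        import pandas as pd
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeRegressor

        frame = pd.read_csv("historic_frame_and_response.csv")   # hypothetical extract
        X = frame[["revenue", "employment", "n_industries", "n_surveys_sampled"]]
        y = frame["response_rate"]

        tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=50, random_state=0)
        print(cross_val_score(tree, X, y, cv=5, scoring="neg_mean_squared_error"))

        tree.fit(X, y)
        for name, importance in zip(X.columns, tree.feature_importances_):
            print(f"{name}: {importance:.3f}")   # which frame covariates drive the splits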

  • Articles and reports: 11-522-X202100100007
    Description: The National Center for Health Statistics (NCHS) annually administers the National Ambulatory Medical Care Survey (NAMCS) to assess practice characteristics and ambulatory care provided by office-based physicians in the United States, including interviews with sampled physicians. After the onset of the COVID-19 pandemic, NCHS adapted NAMCS methodology to assess the impacts of COVID-19 on office-based physicians, including: shortages of personal protective equipment; COVID-19 testing in physician offices; providers testing positive for COVID-19; and telemedicine use during the pandemic. This paper describes challenges and opportunities in administering the 2020 NAMCS and presents key findings regarding physician experiences during the COVID-19 pandemic.

    Key Words: National Ambulatory Medical Care Survey (NAMCS); Office-based physicians; Telemedicine; Personal protective equipment.

    Release date: 2021-10-22

  • Articles and reports: 11-522-X202100100016
    Description: To build data capacity and address the U.S. opioid public health emergency, the National Center for Health Statistics received funding for two projects. The projects involve development of algorithms that use all available structured and unstructured data submitted for the 2016 National Hospital Care Survey (NHCS) to enhance identification of opioid-involvement and the presence of co-occurring disorders (coexistence of a substance use disorder and a mental health issue). A description of the algorithm development process is provided, and lessons learned from integrating data science methods like natural language processing to produce official statistics are presented. Efforts to make the algorithms and analytic datafiles accessible to researchers are also discussed.

    Key Words: Opioids; Co-Occurring Disorders; Data Science; Natural Language Processing; Hospital Care

    Release date: 2021-10-22

  • Articles and reports: 12-001-X202100100002
    Description:

    We consider the problem of deciding on sampling strategy, in particular sampling design. We propose a risk measure whose minimization guides the choice. The method makes use of a superpopulation model and takes into account uncertainty about its parameters through a prior distribution. The method is illustrated with a real dataset, yielding satisfactory results. As a baseline, we use the strategy that couples probability proportional-to-size sampling with the difference estimator, as it is known to be optimal when the superpopulation model is fully known. We show that, even under moderate misspecifications of the model, this strategy is not robust and can be outperformed by some alternatives.

    Release date: 2021-06-24
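
    As a reminder of the baseline strategy mentioned above (standard notation, not the article's): under probability-proportional-to-size sampling with inclusion probabilities \(\pi_k \propto x_k\), the difference estimator of the total is

        \[
          \hat{Y}_{\mathrm{diff}} = \sum_{k \in U} \hat{y}_k + \sum_{k \in s} \frac{y_k - \hat{y}_k}{\pi_k},
        \]

    where the predictions \(\hat{y}_k\) come from the superpopulation model; this strategy is optimal when the model, including its parameters, is fully known, which is exactly the assumption the proposed risk measure relaxes.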

  • Articles and reports: 12-001-X202000200001
    Description:

    This paper constructs a probability-proportional-to-size (PPS) ranked-set sample from a stratified population. A PPS-ranked-set sample partitions the units in a PPS sample into groups of similar observations. The construction of similar groups relies on relative positions (ranks) of units in small comparison sets. Hence, the ranks induce more structure (stratification) in the sample in addition to the data structure created by unequal selection probabilities in a PPS sample. This added data structure makes the PPS-ranked-set sample more informative than a PPS sample. The stratified PPS-ranked-set sample is constructed by selecting a PPS-ranked-set sample from each stratum population. The paper constructs unbiased estimators for the population mean and total, and for their variances. The new sampling design is applied to apple production data to estimate the total apple production in Turkey.

    Release date: 2020-12-15
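
    For orientation, a sketch of classical (equal-probability) ranked-set sampling; the article's design additionally carries the unequal PPS selection probabilities into the comparison sets, which this simplified sketch does not implement.

        import random

        def ranked_set_sample(population, m, cycles, rank_key):
            """Return a ranked-set sample of size m * cycles.

            For each rank position i, draw a comparison set of m units, rank it by
            a cheap auxiliary criterion (rank_key), and measure only the i-th
            ranked unit.
            """
            sample = []
            for _ in range(cycles):
                for i in range(m):
                    comparison_set = random.sample(population, m)
                    comparison_set.sort(key=rank_key)
                    sample.append(comparison_set[i])
            return sample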

  • Articles and reports: 12-001-X202000100002
    Description:

    Model-based methods are required to estimate small area parameters of interest, such as totals and means, when traditional direct estimation methods cannot provide adequate precision. Unit level and area level models are the most commonly used ones in practice. In the case of the unit level model, efficient model-based estimators can be obtained if the sample design is such that the sample and population models coincide: that is, the sampling design is non-informative for the model. If, on the other hand, the sampling design is informative for the model, the selection probabilities will be related to the variable of interest, even after conditioning on the available auxiliary data. This will imply that the population model no longer holds for the sample. Pfeffermann and Sverchkov (2007) used the relationships between the population and sample distribution of the study variable to obtain approximately unbiased semi-parametric predictors of the area means under informative sampling schemes. Their procedure is valid for both sampled and non-sampled areas.

    Release date: 2020-06-30
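
    The unit-level model referred to above is, in its most common form, the nested error regression model (standard notation rather than the article's):

        \[
          y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\beta} + u_i + e_{ij},
          \qquad u_i \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2),
        \]

    for unit j in area i. The design is non-informative when this model also holds for the sampled units; under informative sampling the sample distribution of \(y_{ij}\) differs from this population model, which is what the Pfeffermann-Sverchkov approach corrects for.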

  • Articles and reports: 12-001-X202000100005
    Description:

    Selecting the right sample size is central to ensuring the quality of a survey. The state of the art is to account for complex sampling designs by calculating effective sample sizes. These effective sample sizes are determined using the design effect of central variables of interest. However, in face-to-face surveys, empirical estimates of design effects are often suspected of being conflated with the impact of the interviewers. This typically leads to an over-estimation of design effects and consequently risks misallocating resources towards a higher sample size instead of using more interviewers or improving measurement accuracy. Therefore, we propose a corrected design effect that separates the interviewer effect from the effects of the sampling design on the sampling variance. The ability to estimate the corrected design effect is tested using a simulation study. In this respect, we address the problem of disentangling cluster and interviewer variance. Corrected design effects are estimated for data from the European Social Survey (ESS) round 6 and compared with conventional design effect estimates. Furthermore, we show that for some countries in ESS round 6 the conventional design effect estimates are indeed strongly inflated by interviewer effects.

    Release date: 2020-06-30
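
    The conventional design effect that the correction starts from is, in its textbook clustered-sampling form (this is the standard Kish approximation, not the corrected estimator proposed in the article),

        \[
          \mathit{deff} \approx 1 + (\bar{b} - 1)\,\rho, \qquad n_{\mathrm{eff}} = n / \mathit{deff},
        \]

    where \(\bar{b}\) is the average cluster (or interviewer workload) size and \(\rho\) the intraclass correlation. When interviewers each work whole clusters, interviewer variance is absorbed into \(\rho\) and inflates \(\mathit{deff}\); the corrected design effect aims to attribute that part of the variance to the interviewers rather than to the sampling design.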

  • Articles and reports: 12-001-X201900300001
    Description:

    Standard linearization estimators of the variance of the general regression estimator are often too small, leading to confidence intervals that do not cover at the desired rate. Hat matrix adjustments, which help remedy this problem, can be used in two-stage sampling. We present theory for several new variance estimators and compare them to standard estimators in a series of simulations. The proposed estimators correct negative biases and improve confidence interval coverage rates in a variety of situations that mirror those met in practice.

    Release date: 2019-12-17
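
    One common form of such adjustments (shown here for a single-stage design as an illustration in the spirit of HC-type corrections; the article's two-stage estimators are not reproduced) rescales the regression residuals by the leverages from the survey-weighted fit:

        \[
          \hat{e}_k = y_k - \mathbf{x}_k'\hat{B}, \qquad
          \hat{e}_k^{\,\mathrm{adj}} = \frac{\hat{e}_k}{1 - h_{kk}}, \qquad
          h_{kk} = w_k\,\mathbf{x}_k'\Big(\sum_{j \in s} w_j \mathbf{x}_j \mathbf{x}_j'\Big)^{-1}\mathbf{x}_k,
        \]

    with \(w_k = 1/\pi_k\); the adjusted residuals then replace \(\hat{e}_k\) in the linearization variance formula, counteracting its tendency to be too small.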

Data (0) (0 results)

No content available at this time.

Analysis (266) (20 to 30 of 266 results)

  • Articles and reports: 12-001-X201900300004
    Description:

    Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts which characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows that controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population. The need for good models for predicting the unknown variables or the links is also demonstrated.

    Release date: 2019-12-17
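
    The weight-share mechanism that indirect sampling relies on can be written, in its simplest unit-level form (this is the generalized weight share method in general, not the article's optimal design), as follows: a sample \(s^A\) is drawn from \(U^A\) with inclusion probabilities \(\pi_j\), links \(l_{jk} \in \{0,1\}\) connect units of \(U^A\) to units \(k\) of the target population \(U^B\), and each target unit receives the weight

        \[
          w_k = \frac{1}{L_k} \sum_{j \in s^A} \frac{l_{jk}}{\pi_j},
          \qquad L_k = \sum_{j \in U^A} l_{jk},
        \]

    so that \(\hat{Y}^B = \sum_k w_k y_k\) is design-unbiased provided every target unit has at least one link. The optimization described in the abstract essentially chooses the \(\pi_j\), using whatever link information is available, so that this indirect estimation meets the precision targets at minimum cost.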

  • Articles and reports: 12-001-X201900300007
    Description:

    Finding the optimal stratification and sample size in univariate and multivariate sample design is hard when the population frame is large. There are alternative ways of modelling and solving this problem, and one of the most natural is to use genetic algorithms (GA) combined with the Bethel-Chromy evaluation algorithm. The GA iteratively searches for the minimum sample size necessary to meet precision constraints in partitionings of atomic strata created by the Cartesian product of auxiliary variables. We point out a drawback with classical GAs when applied to the grouping problem, and propose a new GA approach using “grouping” genetic operators instead of traditional operators. Experiments show a significant improvement in solution quality for similar computational effort.

    Release date: 2019-12-17
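
    An illustrative sketch of what a "grouping" crossover can look like for this problem (the representation and repair rule are assumptions in the spirit of grouping genetic algorithms, not the operators defined in the article): children inherit whole groups rather than individual gene positions, so good strata survive recombination intact.

        import random

        def grouping_crossover(parent_a, parent_b, rng=random):
            """parent_a, parent_b: dict group_id -> set of atomic-stratum ids."""
            # 1. keep a random subset of parent A's groups as whole groups
            child = {g: set(m) for g, m in parent_a.items() if rng.random() < 0.5}
            assigned = set().union(*child.values()) if child else set()
            # 2. fill the gaps with parent B's groups, stripped of already-placed items
            for g, members in parent_b.items():
                remainder = members - assigned
                if remainder:
                    child[("b", g)] = remainder
                    assigned |= remainder
            return child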

  • Articles and reports: 12-001-X201900200003
    Description:

    Merging available sources of information is becoming increasingly important for improving estimates of population characteristics in a variety of fields. In the presence of several independent probability samples from a finite population, we investigate options for a combined estimator of the population total, based on either a linear combination of the separate estimators or on the combined sample approach. A linear combination estimator based on estimated variances can be biased, as the separate estimators of the population total can be highly correlated with their respective variance estimators. We illustrate the possibility of using the combined sample to estimate the variances of the separate estimators, which results in general pooled variance estimators. These pooled variance estimators use all available information and have the potential to significantly reduce the bias of a linear combination of separate estimators.

    Release date: 2019-06-27
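
    The first option above can be made concrete as follows (standard notation, not the article's): given independent estimators \(\hat{Y}_1, \dots, \hat{Y}_Q\) of the same total, a linear combination estimator is

        \[
          \hat{Y}_{\mathrm{comb}} = \sum_{q=1}^{Q} \lambda_q \hat{Y}_q, \qquad \sum_q \lambda_q = 1,
          \qquad \lambda_q \propto 1 / V(\hat{Y}_q),
        \]

    the weights being variance-minimizing when the true variances are known. In practice the \(V(\hat{Y}_q)\) must be estimated, and when \(\hat{V}(\hat{Y}_q)\) is correlated with \(\hat{Y}_q\) the plug-in combination can be biased, which is the problem the pooled variance estimators address.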

  • Articles and reports: 12-001-X201900200006
    Description:

    This paper presents a new algorithm to solve the one-dimensional optimal stratification problem, which reduces to just determining stratum boundaries. When the number of strata H and the total sample size n are fixed, the stratum boundaries are obtained by minimizing the variance of the estimator of a total for the stratification variable. This algorithm uses the Biased Random Key Genetic Algorithm (BRKGA) metaheuristic to search for the optimal solution. This metaheuristic has been shown to produce good quality solutions for many optimization problems in modest computing times. The algorithm is implemented in the R package stratbr available from CRAN (de Moura Brito, do Nascimento Silva and da Veiga, 2017a). Numerical results are provided for a set of 27 populations, enabling comparison of the new algorithm with some competing approaches available in the literature. The algorithm outperforms simpler approximation-based approaches as well as a couple of other optimization-based approaches. It also matches the performance of the best available optimization-based approach due to Kozak (2004). Its main advantage over Kozak’s approach is the coupling of the optimal stratification with the optimal allocation proposed by de Moura Brito, do Nascimento Silva, Silva Semaan and Maculan (2015), thus ensuring that if the stratification bounds obtained achieve the global optimum, then the overall solution will be globally optimal for both the stratification bounds and the sample allocation.

    Release date: 2019-06-27
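
    The objective being minimized is the usual stratified-sampling variance of the estimator of the total of the stratification variable (standard formulas, not specific to the article): for fixed H and n,

        \[
          V(\hat{Y}_{st}) = \sum_{h=1}^{H} N_h^2 \frac{S_h^2}{n_h}\Big(1 - \frac{n_h}{N_h}\Big),
          \qquad n_h^{\mathrm{Ney}} = n \,\frac{N_h S_h}{\sum_{g=1}^{H} N_g S_g},
        \]

    where the boundaries determine the stratum sizes \(N_h\) and standard deviations \(S_h\), and the second expression is the classical Neyman allocation for a single stratification variable.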

  • Articles and reports: 12-001-X201900200007
    Description:

    When fitting an ordered categorical variable with L > 2 levels to a set of covariates using complex survey data, it is common to assume that the elements of the population fit a simple cumulative logistic regression model (proportional-odds logistic-regression model). This means the probability that the categorical variable is at or below some level is a binary logistic function of the model covariates. Moreover, except for the intercept, the values of the logistic-regression parameters are the same at each level. The conventional “design-based” method used for fitting the proportional-odds model is based on pseudo-maximum likelihood. We compare estimates computed using pseudo-maximum likelihood with those computed by assuming an alternative design-sensitive robust model-based framework. We show with a simple numerical example how estimates using the two approaches can differ. The alternative approach is easily extended to fit a general cumulative logistic model, in which the parallel-lines assumption can fail. A test of that assumption easily follows.

    Release date: 2019-06-27
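
    The proportional-odds model referred to above is, in standard notation,

        \[
          \Pr(Y \le l \mid \mathbf{x}) = \frac{\exp(\alpha_l + \mathbf{x}'\boldsymbol{\beta})}
                                              {1 + \exp(\alpha_l + \mathbf{x}'\boldsymbol{\beta})},
          \qquad l = 1, \dots, L - 1,
        \]

    with a level-specific intercept \(\alpha_l\) but a single slope vector \(\boldsymbol{\beta}\) shared across levels; this shared slope is the parallel-lines assumption that the more general cumulative logistic model relaxes. Under the design-based approach, the census log-likelihood is replaced by its survey-weighted (pseudo-maximum likelihood) analogue.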

  • Articles and reports: 12-001-X201800254953
    Description:

    Sample coordination seeks to create a probabilistic dependence between the selection of two or more samples drawn from the same population or from overlapping populations. Positive coordination increases the expected sample overlap, while negative coordination decreases it. There are numerous applications for sample coordination with varying objectives. A spatially balanced sample is a sample that is well-spread in some space. Forcing a spread within the selected samples is a general and very efficient variance reduction technique for the Horvitz-Thompson estimator. The local pivotal method and the spatially correlated Poisson sampling are two general schemes for achieving well-spread samples. We aim to introduce coordination for these sampling methods based on the concept of permanent random numbers. The goal is to coordinate such samples while preserving spatial balance. The proposed methods are motivated by examples from forestry, environmental studies, and official statistics.

    Release date: 2018-12-20
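
    A minimal sketch of coordination through permanent random numbers under Poisson sampling (the shift rule is a standard textbook device and the function names are assumptions; the article's contribution is carrying the idea over to the local pivotal method and spatially correlated Poisson sampling while preserving spatial balance):

        import random

        def assign_prns(units, seed=20180101):
            """Attach one permanent random number to each unit, reused for every survey."""
            rng = random.Random(seed)
            return {k: rng.random() for k in units}

        def poisson_sample(prn, pik, shift=0.0):
            """Select unit k iff its (shifted) PRN falls below its inclusion probability.

            shift = 0 reuses the same numbers across occasions (positive coordination);
            a non-zero shift, taken modulo 1, pushes successive samples apart
            (negative coordination).
            """
            return [k for k, u in prn.items() if (u + shift) % 1.0 < pik[k]]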

  • Articles and reports: 12-001-X201800254954
    Description:

    In recent years, balanced sampling techniques have experienced renewed interest. These techniques constrain the Horvitz-Thompson estimators of the totals of auxiliary variables to be equal, at least approximately, to the corresponding true totals, in order to avoid the occurrence of bad samples. Several procedures are available to carry out balanced sampling: the cube method (see Deville and Tillé, 2004) and an alternative, the rejective algorithm introduced by Hájek (1964). After a brief review of these sampling methods, motivated by the planning of an angler survey, we investigate, using Monte Carlo simulations, the survey designs produced by these two sampling algorithms.

    Release date: 2018-12-20
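
    The constraint referred to above is the set of balancing equations (standard notation): with inclusion probabilities \(\pi_k\) and auxiliary vectors \(\mathbf{x}_k\), a sample s is balanced when

        \[
          \sum_{k \in s} \frac{\mathbf{x}_k}{\pi_k} = \sum_{k \in U} \mathbf{x}_k,
        \]

    exactly or approximately. The cube method selects samples satisfying these equations while respecting the \(\pi_k\); the rejective approach instead draws candidate samples and rejects those that fail the balance criterion.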

  • Articles and reports: 12-001-X201800254959
    Description:

    This article proposes a criterion for calculating the trade-off in so-called “mixed” allocations, which combine two classic allocations in sampling theory. In INSEE (National Institute of Statistics and Economic Studies) business surveys, it is common to use the arithmetic mean of a proportional allocation and a Neyman allocation (corresponding to a trade-off of 0.5). It is possible to obtain a trade-off value resulting in better properties for the estimators. This value belongs to a region that is obtained by solving an optimization program. Different methods for calculating the trade-off are presented. An application to business surveys is given, as well as a comparison with other commonly used trade-off allocations.

    Release date: 2018-12-20
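
    Written out, the mixed allocation for stratum h is (standard notation; the article's criterion for choosing the trade-off is not reproduced)

        \[
          n_h(\alpha) = \alpha\, n_h^{\mathrm{prop}} + (1 - \alpha)\, n_h^{\mathrm{Ney}},
          \qquad n_h^{\mathrm{prop}} = n\,\frac{N_h}{N},
          \qquad n_h^{\mathrm{Ney}} = n\,\frac{N_h S_h}{\sum_{g} N_g S_g},
        \]

    so the INSEE practice described above corresponds to \(\alpha = 0.5\), and the proposed criterion searches for values of \(\alpha\) that give the estimators better properties.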

  • Articles and reports: 12-001-X201800154925
    Description:

    This paper develops statistical inference based on a superpopulation model in a finite population setting using ranked set samples (RSS). The samples are constructed without replacement. It is shown that the sample mean of the RSS is model unbiased and has a smaller mean square prediction error (MSPE) than the MSPE of a simple random sample mean. Using an unbiased estimator of the MSPE, the paper also constructs a prediction confidence interval for the population mean. A small-scale simulation study shows that the estimator is as good as a simple random sample (SRS) estimator when ranking information is poor. On the other hand, it is more efficient than the SRS estimator when the quality of the ranking information is good and the cost ratio of obtaining a single unit under RSS versus SRS is not very high. The simulation study also indicates that the coverage probabilities of the prediction intervals are very close to the nominal coverage probabilities. The proposed inferential procedure is applied to a real data set.

    Release date: 2018-06-21

  • Articles and reports: 12-001-X201800154929
    Description:

    The U.S. Census Bureau is investigating nonrespondent subsampling strategies for use in the 2017 Economic Census. Design constraints include a mandated lower bound on the unit response rate, along with targeted industry-specific response rates. This paper presents research on allocation procedures for subsampling nonrespondents, conditional on the subsampling being systematic. We consider two approaches: (1) equal-probability sampling and (2) optimized allocation with constraints on unit response rates and sample size, with the objective of selecting larger samples in industries that initially have lower response rates. We present a simulation study that examines the relative bias and mean squared error for the proposed allocations, assessing each procedure’s sensitivity to the size of the subsample, the response propensities, and the estimation procedure.

    Release date: 2018-06-21

Reference (1) (1 result)

  • Surveys and statistical programs – Documentation: 75F0002M1992001
    Description:

    Starting in 1994, the Survey of Labour and Income Dynamics (SLID) will follow individuals and families for at least six years, tracking their labour market experiences and changes in income and family circumstances. An initial proposal for the content of SLID, entitled "Content of the Survey of Labour and Income Dynamics: Discussion Paper", was distributed in February 1992.

    That paper served as a background document for consultation with, and review by, interested users. The content underwent significant change during this process. Based upon the revised content, a large-scale test of SLID will be conducted in February and May 1993.

    The present document outlines the income and wealth content to be tested in May 1993. It is a continuation of SLID Research Paper Series 92-01A, which outlines the demographic and labour content used in the January/February 1993 test.

    Release date: 2008-02-29