All (72) (0 to 10 of 72 results)

  • Articles and reports: 12-001-X201400214089
    Description:

    This manuscript describes the use of multiple imputation to combine information from multiple surveys of the same underlying population. We use a newly developed method to generate synthetic populations nonparametrically using a finite population Bayesian bootstrap that automatically accounts for complex sample designs. We then analyze each synthetic population with standard complete-data software for simple random samples and obtain valid inference by combining the point and variance estimates using extensions of existing combining rules for synthetic data. We illustrate the approach by combining data from the 2006 National Health Interview Survey (NHIS) and the 2006 Medical Expenditure Panel Survey (MEPS).

    Release date: 2014-12-19
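
The combining step described above can be sketched with Rubin's classic multiple-imputation rules (the paper uses extensions of these rules tailored to synthetic data; the function name and numbers below are invented for illustration):

```python
from statistics import mean

def combine_mi(point_estimates, variance_estimates):
    # Rubin-style combining rules: pool m point and variance estimates
    # from m completed (here, synthetic) data sets.
    m = len(point_estimates)
    q_bar = mean(point_estimates)                  # pooled point estimate
    w_bar = mean(variance_estimates)               # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in point_estimates) / (m - 1)  # between
    t = w_bar + (1 + 1 / m) * b                    # total variance
    return q_bar, t

# Hypothetical estimates from m = 3 synthetic populations
q_hat, var_hat = combine_mi([10.0, 10.4, 9.8], [0.25, 0.30, 0.27])
```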

  • Articles and reports: 12-001-X201400214090
    Description:

    When studying a finite population, it is sometimes necessary to select samples from several sampling frames in order to represent all individuals. Here we are interested in the scenario where two samples are selected using a two-stage design, with common first-stage selection. We apply the Hartley (1962), Bankier (1986) and Kalton and Anderson (1986) methods, and we show that these methods can be applied conditional on first-stage selection. We also compare the performance of several estimators as part of a simulation study. Our results suggest that the estimator should be chosen carefully when there are multiple sampling frames, and that a simple estimator is sometimes preferable, even if it uses only part of the information collected.

    Release date: 2014-12-19

  • Articles and reports: 12-001-X201400214091
    Description:

    Parametric fractional imputation (PFI), proposed by Kim (2011), is a tool for general-purpose parameter estimation under missing data. We propose fractional hot deck imputation (FHDI), which is more robust than PFI or multiple imputation. In the proposed method, the imputed values are chosen from the set of respondents and assigned proper fractional weights. The weights are then adjusted to meet certain calibration conditions, which makes the resulting FHDI estimator efficient. Two simulation studies are presented to compare the proposed method with existing methods.

    Release date: 2014-12-19
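
A minimal sketch of the fractional hot deck idea (function and data are hypothetical; the actual method also adjusts the fractional weights to meet calibration conditions):

```python
def fhdi_mean(observed, n_missing, donor_values, frac_weights):
    # Each nonrespondent receives every donor value with a fractional
    # weight; the weights for one recipient must sum to 1.
    assert abs(sum(frac_weights) - 1.0) < 1e-9
    imputed = sum(w * d for w, d in zip(frac_weights, donor_values))
    total = sum(observed) + n_missing * imputed
    return total / (len(observed) + n_missing)

# 3 respondents, 2 nonrespondents, equal fractional weights
est = fhdi_mean([2.0, 4.0, 6.0], 2, [2.0, 4.0, 6.0], [1/3, 1/3, 1/3])
```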

  • Articles and reports: 12-001-X201400214092
    Description:

    Survey methodologists have long studied the effects of interviewers on the variance of survey estimates. Statistical models including random interviewer effects are often fitted in such investigations, and research interest lies in the magnitude of the interviewer variance component. One question that might arise in a methodological investigation is whether or not different groups of interviewers (e.g., those with prior experience on a given survey vs. new hires, or CAPI interviewers vs. CATI interviewers) have significantly different variance components in these models. Significant differences may indicate a need for additional training in particular subgroups, or sub-optimal properties of different modes or interviewing styles for particular survey items (in terms of the overall mean squared error of survey estimates). Survey researchers seeking answers to these types of questions have different statistical tools available to them. This paper aims to provide an overview of alternative frequentist and Bayesian approaches to the comparison of variance components in different groups of survey interviewers, using a hierarchical generalized linear modeling framework that accommodates a variety of different types of survey variables. We first consider the benefits and limitations of each approach, contrasting the methods used for estimation and inference. We next present a simulation study, empirically evaluating the ability of each approach to efficiently estimate differences in variance components. We then apply the two approaches to an analysis of real survey data collected in the U.S. National Survey of Family Growth (NSFG). We conclude that the two approaches tend to result in very similar inferences, and we provide suggestions for practice given some of the subtle differences observed.

    Release date: 2014-12-19

  • Articles and reports: 12-001-X201400214096
    Description:

    To obtain better coverage of the population of interest at lower cost, a number of surveys employ a dual frame structure, in which independent samples are taken from two overlapping sampling frames. This research considers chi-squared tests in dual frame surveys when categorical data are encountered. We extend the generalized Wald test (Wald 1943) and the Rao-Scott first-order and second-order corrected tests (Rao and Scott 1981) from a single survey to a dual frame survey and derive the asymptotic distributions. Simulation studies show that both Rao-Scott type corrected tests work well and are thus recommended for use in dual frame surveys. An example is given to illustrate the use of the developed tests.

    Release date: 2014-12-19
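
In the single-survey setting, the first-order Rao-Scott correction divides the Pearson statistic by an average estimated design effect; the paper derives dual-frame analogues. A generic single-survey sketch (all values invented):

```python
def rao_scott_first_order(pearson_x2, cell_deffs):
    # First-order correction: scale the Pearson chi-squared statistic
    # by the mean estimated design effect of the cell proportions.
    deff_bar = sum(cell_deffs) / len(cell_deffs)
    return pearson_x2 / deff_bar

# Pearson X^2 = 9.0 with estimated design effects around 2
x2_rs = rao_scott_first_order(9.0, [1.5, 2.0, 2.5, 2.0])
```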

  • Articles and reports: 12-001-X201400214097
    Description:

    When monthly business surveys are not completely overlapping, there are two different estimators for the monthly growth rate of the turnover: (i) one that is based on the monthly estimated population totals and (ii) one that is purely based on enterprises observed on both occasions in the overlap of the corresponding surveys. The resulting estimates and variances might be quite different. This paper proposes an optimal composite estimator for the growth rate as well as the population totals.

    Release date: 2014-12-19
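
The tension between the two growth-rate estimators can be illustrated with the textbook minimum-variance linear combination (a generic sketch, not the paper's exact optimal composite estimator; all numbers hypothetical):

```python
def composite(theta1, theta2, v1, v2, cov=0.0):
    # Weight minimizing the variance of w*theta1 + (1-w)*theta2,
    # given variances v1, v2 and covariance cov of the two estimators.
    w = (v2 - cov) / (v1 + v2 - 2 * cov)
    est = w * theta1 + (1 - w) * theta2
    var = w ** 2 * v1 + (1 - w) ** 2 * v2 + 2 * w * (1 - w) * cov
    return est, var

# (i) total-based and (ii) overlap-based growth-rate estimates
est, var = composite(1.02, 1.05, v1=0.004, v2=0.001)
```

The composite leans toward the less variable estimator while never being worse than either input.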

  • Articles and reports: 12-001-X201400214110
    Description:

    In developing the sample design for a survey we attempt to produce a good design for the funds available. Information on costs can be used to develop sample designs that minimise the sampling variance of an estimator of total for fixed cost. Improvements in survey management systems mean that it is now sometimes possible to estimate the cost of including each unit in the sample. This paper develops relatively simple approaches to determine whether the potential gains arising from using this unit level cost information are likely to be of practical use. It is shown that the key factor is the coefficient of variation of the costs relative to the coefficient of variation of the relative error on the estimated cost coefficients.

    Release date: 2014-12-19
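
The role of cost information can be illustrated with the textbook cost-aware Neyman allocation, which the unit-level analysis above generalizes (stratum figures below are invented):

```python
import math

def cost_neyman(n_total, sizes, stds, costs):
    # Allocate n_total sample units across strata in proportion to
    # N_h * S_h / sqrt(c_h): cheap, variable strata get more sample.
    raw = [N * S / math.sqrt(c) for N, S, c in zip(sizes, stds, costs)]
    scale = n_total / sum(raw)
    return [r * scale for r in raw]

# Two equal strata; units in the second cost four times as much
alloc = cost_neyman(100, sizes=[1000, 1000], stds=[10.0, 10.0], costs=[1.0, 4.0])
```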

  • Articles and reports: 12-001-X201400214113
    Description:

    Rotating panel surveys are used to calculate estimates of gross flows between two consecutive periods of measurement. This paper considers a general procedure for the estimation of gross flows when the rotating panel survey has been generated from a complex survey design with random nonresponse. A pseudo maximum likelihood approach is considered through a two-stage model of Markov chains for the allocation of individuals among the categories in the survey and for modeling nonresponse.

    Release date: 2014-12-19
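
Under complete response, gross flows reduce to row-normalizing a flow table into Markov transition probabilities; the pseudo maximum likelihood approach above extends this to complex designs with random nonresponse. A complete-response sketch (counts invented):

```python
def transition_probs(flow_counts):
    # Row-normalize observed counts of moves between categories
    # at times t and t+1 into estimated transition probabilities.
    return [[c / sum(row) for c in row] for row in flow_counts]

# e.g. employed/unemployed counts across two consecutive months
p = transition_probs([[30, 10], [20, 40]])
```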

  • Articles and reports: 12-001-X201400214118
    Description:

    Bagging is a powerful computational method used to improve the performance of inefficient estimators. This article is a first exploration of the use of bagging in survey estimation, and we investigate the effects of bagging on non-differentiable survey estimators including sample distribution functions and quantiles, among others. The theoretical properties of bagged survey estimators are investigated under both design-based and model-based regimes. In particular, we show the design consistency of the bagged estimators, and obtain the asymptotic normality of the estimators in the model-based context. The article describes how implementation of bagging for survey estimators can take advantage of replicates developed for survey variance estimation, providing an easy way for practitioners to apply bagging in existing surveys. A major remaining challenge in implementing bagging in the survey context is variance estimation for the bagged estimators themselves, and we explore two possible variance estimation approaches. Simulation experiments reveal the improvement of the proposed bagging estimator relative to the original estimator and compare the two variance estimation approaches.

    Release date: 2014-12-19
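
The replicate-based implementation can be sketched as follows: apply the non-differentiable estimator (here a median) to each replicate sample and average the results, which is how bagging smooths the estimator (toy data, not from the paper):

```python
from statistics import mean, median

def bagged(estimator, replicate_samples):
    # Average the estimator over replicate samples; for estimators
    # such as quantiles this smooths the non-differentiability.
    return mean(estimator(rep) for rep in replicate_samples)

# Replicates as produced, e.g., by a survey bootstrap
reps = [[1.0, 2.0, 9.0], [2.0, 3.0, 9.0], [1.0, 3.0, 8.0]]
smooth_median = bagged(median, reps)
```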

  • Articles and reports: 12-001-X201400214119
    Description:

    When considering sample stratification by several variables, we often face the case where the expected number of sample units to be selected in each stratum is very small and the total number of units to be selected is smaller than the total number of strata. These stratified sample designs are represented by tabular arrays of real numbers, called controlled selection problems, and are beyond the reach of conventional methods of allocation. Many algorithms for solving these problems have been studied over about 60 years, beginning with Goodman and Kish (1950). Those developed more recently are especially computer intensive and always find a solution. However, one question remains unanswered: in what sense are the solutions to a controlled selection problem obtained from those algorithms optimal? We introduce a general concept of optimal solutions and propose a new controlled selection algorithm based on typical distance functions to achieve them. This algorithm can be easily run using new SAS-based software. This study focuses on two-way stratification designs. The controlled selection solutions from the new algorithm are compared with those from existing algorithms using several examples. The new algorithm successfully obtains robust solutions to two-way controlled selection problems that meet the optimality criteria.

    Release date: 2014-12-19
Stats in brief (0) (0 results)

No content available at this time.

Articles and reports (72) (30 to 40 of 72 results)

  • Articles and reports: 11-522-X201300014273
    Description:

    More and more data are being produced by an increasing number of electronic devices physically surrounding us and on the internet. The large amount of data and the high frequency at which they are produced have resulted in the introduction of the term ‘Big Data’. Because these data reflect many different aspects of our daily lives, and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. However, first experiences with analyses of large amounts of Dutch traffic loop detection records, call detail records of mobile phones and Dutch social media messages reveal that a number of challenges need to be addressed before these data sources can be applied to official statistics. These challenges, and the lessons learned during the initial studies, are addressed and illustrated with examples. More specifically, the following topics are discussed: the three general types of Big Data discerned, the need to access and analyse large amounts of data, how we deal with noisy data and look at selectivity (and our own bias towards this topic), how to go beyond correlation, how we found people with the right skills and mind-set to perform the work, and how we have dealt with privacy and security issues.

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014274
    Description:

    What is big data? Can it replace and/or supplement official surveys? What are some of the challenges associated with using big data for official statistics? What are some of the possible solutions? Last fall, Statistics Canada invested in a Big Data Pilot project to answer some of these questions. This was the first business survey project of its kind. This paper covers some of the lessons learned from the Big Data Pilot Project using Smart Meter Data.

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014275
    Description:

    In July 2014, the Office for National Statistics committed to a predominantly online 2021 UK Census. Item-level imputation will play an important role in adjusting the 2021 Census database. Research indicates that the internet may yield cleaner data than paper-based capture and attract people with particular characteristics. Here, we provide preliminary results from research directed at understanding how we might manage these features in a 2021 UK Census imputation strategy. Our findings suggest that a donor-based imputation method may need to include response mode as a matching variable in the underlying imputation model.

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014276
    Description:

    In France, budget restrictions are making it more difficult to hire casual interviewers to deal with collection problems. As a result, it has become necessary to adhere to a predetermined annual work quota. For surveys of the National Institute of Statistics and Economic Studies (INSEE), which use a master sample, problems arise when an interviewer is on extended leave throughout the entire collection period of a survey. When that occurs, an area may cease to be covered by the survey, and this effectively generates a bias. In response to this new problem, we have implemented two methods, depending on when the problem is identified: If an area is ‘abandoned’ before or at the very beginning of collection, we carry out a ‘sub-allocation’ procedure. The procedure involves interviewing a minimum number of households in each collection area at the expense of other areas in which no collection problems have been identified. The idea is to minimize the dispersion of weights while meeting collection targets. If an area is ‘abandoned’ during collection, we prioritize the remaining surveys. Prioritization is based on a representativeness indicator (R indicator) that measures the degree of similarity between a sample and the base population. The goal of this prioritization process during collection is to get as close as possible to equal response probability for respondents. The R indicator is based on the dispersion of the estimated response probabilities of the sampled households, and it is composed of partial R indicators that measure representativeness variable by variable. These R indicators are tools that we can use to analyze collection by isolating underrepresented population groups. We can increase collection efforts for groups that have been identified beforehand. In the oral presentation, we covered these two points concisely. By contrast, this paper deals exclusively with the first point: sub-allocation. 
Prioritization is being implemented for the first time at INSEE for the assets survey, and it will be covered in a specific paper by A. Rebecq.

    Release date: 2014-10-31
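
The representativeness indicator mentioned above is commonly computed as R = 1 - 2·sd(p̂), where p̂ are the estimated response propensities; equal propensities give R = 1. An unweighted sketch (propensity values invented):

```python
import math

def r_indicator(propensities):
    # R = 1 - 2 * standard deviation of the estimated response
    # propensities; less dispersion means better representativeness.
    n = len(propensities)
    p_bar = sum(propensities) / n
    var = sum((p - p_bar) ** 2 for p in propensities) / n
    return 1.0 - 2.0 * math.sqrt(var)

r_equal = r_indicator([0.6, 0.6, 0.6, 0.6])   # no dispersion
r_spread = r_indicator([0.2, 0.8])            # strong dispersion
```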

  • Articles and reports: 11-522-X201300014277
    Description:

    This article gives an overview of adaptive design elements introduced to the PASS panel survey in waves four to seven. The main focus is on experimental interventions in later phases of the fieldwork. These interventions aim at balancing the sample by prioritizing low-propensity sample members. In wave 7, interviewers received a double premium for completing interviews with low-propensity cases in the final phase of the fieldwork. This premium was restricted to a random half of the cases with low estimated response propensity and no final status after four months of prior fieldwork. The incentive was effective in increasing interviewer effort; however, it led to no significant increase in response rates.

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014279
    Description:

    As part of the European SustainCity project, a microsimulation model of individuals and households was created to simulate the population of various European cities. The aim of the project was to combine several transportation and land-use microsimulation models (land-use modelling), add on a dynamic population module and apply these microsimulation approaches to three geographic areas of Europe (the Île-de-France region and the Brussels and Zurich agglomerations).

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014280
    Description:

    During the last decade, web panel surveys have become established as a fast and cost-efficient method in market surveys. The rationale for this is new developments in information technology, in particular the continued rapid growth of internet and computer use among the public. Growing nonresponse rates and downward price pressure in the survey industry also lie behind this change. However, there are some serious inherent risks connected with web panel surveys, not least selection bias due to the self-selection of respondents. There are also risks of coverage and measurement errors. The absence of an inferential framework and of data quality indicators is an obstacle to using the web panel approach for high-quality statistics about general populations. Still, some national statistical institutes seem to face increasing challenges from a new form of competition for ad hoc statistics, and even official statistics, from web panel surveys. This paper explores the question of designing and using web panels in a scientifically sound way. An outline is given of a standard from the Swedish Survey Society for performance metrics to assess some quality aspects of results from web panel surveys. Decomposition of bias and mitigation of bias risks are discussed in some detail. Some ideas are presented for combining web panel surveys and traditional surveys to achieve controlled cost-efficient inference.

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014281
    Description:

    Web surveys exclude the entire non-internet population and often have low response rates. Therefore, statistical inference based on Web survey samples will require availability of additional information about the non-covered population, careful choice of survey methods to account for potential biases, and caution with interpretation and generalization of the results to a target population. In this paper, we focus on non-coverage bias, and explore the use of weighted estimators and hot-deck imputation estimators for bias adjustment under the ideal scenario where covariate information was obtained for a simple random sample of individuals from the non-covered population. We illustrate empirically the performance of the proposed estimators under this scenario. Possible extensions of these approaches to more realistic scenarios are discussed.

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014282
    Description:

    The IAB-Establishment Panel is the most comprehensive establishment survey in Germany, with almost 16,000 firms participating every year. Face-to-face interviews with paper and pencil (PAPI) have been conducted since 1993. An ongoing project is examining possible effects of switching the survey to computer-assisted personal interviews (CAPI) combined with a web-based version of the questionnaire (CAWI). As a first step, questions about internet access, willingness to complete the questionnaire online and reasons for refusal were included in the 2012 wave. First results indicate widespread refusal to take part in a web survey. A closer look reveals that smaller establishments, long-time participants and older respondents are reluctant to use the internet.

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014283
    Description:

    The project MIAD of the Statistical Network aims at developing methodologies for the integrated use of administrative data (AD) in the statistical process. MIAD's main target is to provide guidelines for exploiting AD for statistical purposes. In particular, a quality framework has been developed, a mapping of possible uses has been provided, and a schema of alternative informative contexts is proposed. This paper focuses on the latter aspect. In particular, we distinguish between dimensions that relate to features of the source connected with accessibility, and characteristics connected to the AD structure and its relationship with statistical concepts. We denote the first class of features the framework for access and the second class the data framework. In this paper we concentrate mainly on the second class of characteristics, which relate specifically to the kind of information that can be obtained from the secondary source. In particular, these features concern the target administrative population, measurement on this population, and how it is (or may be) connected with the target population and the target statistical concepts.

    Release date: 2014-10-31
Journals and periodicals (0) (0 results)

No content available at this time.