Inference and foundations

Results

All (69) (0 to 10 of 69 results)

  • Articles and reports: 12-001-X201800254956
    Description:

    In Italy, the Labor Force Survey (LFS) is conducted quarterly by the National Statistical Institute (ISTAT) to produce estimates of the labor force status of the population at different geographical levels. In particular, ISTAT provides LFS estimates of employed and unemployed counts for local Labor Market Areas (LMAs). LMAs are 611 sub-regional clusters of municipalities and are unplanned domains for which direct estimates have overly large sampling errors. This implies the need for Small Area Estimation (SAE) methods. In this paper, we develop a new area-level SAE method that uses a Latent Markov Model (LMM) as the linking model. In LMMs, the characteristic of interest and its evolution over time are represented by a latent process that follows a Markov chain, usually of first order. Areas are therefore allowed to change their latent state over time. The proposed model is applied to quarterly LFS data for the period 2004 to 2014 and fitted within a hierarchical Bayesian framework using a data augmentation Gibbs sampler. Estimates are compared with those obtained from the classical Fay-Herriot model, from a time-series area-level SAE model, and with data from the 2011 Population Census.

    Release date: 2018-12-20
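
    A stylized Python sketch of the linking idea in the entry above: an area's latent labour-market state follows a first-order Markov chain, and each quarterly direct estimate is the state-specific mean plus sampling error. The function name, its arguments, and the Gaussian sampling-error assumption are illustrative only, not the model fitted in the paper.

        import numpy as np

        rng = np.random.default_rng(2018)

        def simulate_lmm_area(n_quarters, transition, state_means, sampling_sd):
            """Simulate one area's direct estimates under a stylized area-level
            latent Markov linking model: the latent state follows a first-order
            Markov chain, and each direct estimate equals the state-specific
            mean plus sampling error."""
            n_states = transition.shape[0]
            state = rng.integers(n_states)                    # initial latent state
            direct = np.empty(n_quarters)
            for t in range(n_quarters):
                direct[t] = state_means[state] + rng.normal(0.0, sampling_sd)
                state = rng.choice(n_states, p=transition[state])   # Markov step
            return direct

        # Example: two latent states (low/high unemployment) over 44 quarters.
        P = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
        series = simulate_lmm_area(44, P, state_means=[0.05, 0.11], sampling_sd=0.01)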

  • Articles and reports: 12-001-X201800154928
    Description:

    A two-phase process was used by the Substance Abuse and Mental Health Services Administration to estimate the proportion of US adults with serious mental illness (SMI). The first phase was the annual National Survey on Drug Use and Health (NSDUH), while the second phase was a random subsample of adult respondents to the NSDUH. Respondents to the second phase of sampling were clinically evaluated for serious mental illness. A logistic prediction model was fit to this subsample with the SMI status (yes or no) determined by the second-phase instrument treated as the dependent variable and related variables collected on the NSDUH from all adults as the model's explanatory variables. Estimates were then computed for SMI prevalence among all adults and within adult subpopulations by assigning an SMI status to each NSDUH respondent based on comparing his or her estimated probability of having SMI to a chosen cut point on the distribution of the predicted probabilities. We investigate alternatives to this standard cut point estimator, such as the probability estimator, which assigns an estimated probability of having SMI to each NSDUH respondent. The estimated prevalence of SMI is then the weighted mean of those estimated probabilities. Using data from NSDUH and its subsample, we show that, although the probability estimator has a smaller mean squared error when estimating SMI prevalence among all adults, it has a greater tendency to be biased at the subpopulation level than the standard cut point estimator.

    Release date: 2018-06-21
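
    A minimal Python sketch of the two estimators contrasted in the entry above. The predicted probabilities, survey weights, and cut point are hypothetical inputs, and the function is illustrative rather than SAMHSA's production procedure.

        import numpy as np

        def smi_prevalence(p_hat, weights, cut_point):
            """Contrast the two prevalence estimators: classify-then-average
            versus averaging the predicted probabilities directly."""
            p_hat = np.asarray(p_hat, dtype=float)
            weights = np.asarray(weights, dtype=float)

            # Standard cut point estimator: assign SMI status 1 when the predicted
            # probability reaches the cut point, then take the weighted mean.
            cut_point_est = np.average(p_hat >= cut_point, weights=weights)

            # Probability estimator: weighted mean of the predicted probabilities.
            probability_est = np.average(p_hat, weights=weights)
            return cut_point_est, probability_est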

  • Articles and reports: 12-001-X201700254872
    Description:

    This note discusses the theoretical foundations for the extension of the Wilson two-sided coverage interval to an estimated proportion computed from complex survey data. The interval is shown to be asymptotically equivalent to an interval derived from a logistic transformation. A mildly better version is discussed, but users may prefer constructing a one-sided interval already in the literature.

    Release date: 2017-12-21
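
    For context, one common way to carry the Wilson construction over to survey data is to replace the sample size with an effective sample size implied by a design-based variance estimate. The sketch below shows that generic device; it is not necessarily the exact interval analyzed in the note.

        import math

        def wilson_interval_survey(p_hat, var_hat, z=1.96):
            """Wilson-type two-sided interval for a survey-estimated proportion,
            using the effective sample size implied by a design-based variance."""
            n_eff = p_hat * (1.0 - p_hat) / var_hat        # effective sample size
            denom = 1.0 + z**2 / n_eff
            centre = (p_hat + z**2 / (2.0 * n_eff)) / denom
            half = (z / denom) * math.sqrt(
                p_hat * (1.0 - p_hat) / n_eff + z**2 / (4.0 * n_eff**2))
            return centre - half, centre + half

        # Example: estimated proportion 0.15 with design-based standard error 0.02.
        print(wilson_interval_survey(0.15, 0.02**2))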

  • Articles and reports: 12-001-X201700114822
    Description:

    We use a Bayesian method to make inferences about a finite population proportion when binary data are collected using a two-fold sample design from small areas. The two-fold sample design has a two-stage cluster sample design within each area. An earlier hierarchical Bayesian model assumes that, for each area, the first-stage binary responses are independent Bernoulli random variables, and that their probabilities have beta distributions parameterized by a mean and a correlation coefficient. The means vary with areas but the correlation is the same over areas. However, to gain some flexibility, we have now extended this model to accommodate different correlations. The means and the correlations have independent beta distributions. We call the former model a homogeneous model and the new model a heterogeneous model. All hyperparameters have proper noninformative priors. An additional complexity is that some of the parameters are weakly identified, making it difficult to use a standard Gibbs sampler for computation. We therefore use unimodal constraints for the beta prior distributions and a blocked Gibbs sampler to perform the computation. We compare the heterogeneous and homogeneous models using an illustrative example and a simulation study. As expected, the two-fold model with heterogeneous correlations is preferred.

    Release date: 2017-06-22
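
    The beta prior "parameterized by a mean and a correlation coefficient" in the entry above can be read against the usual beta-binomial identity relating the intra-cluster correlation to the shape parameters. The conversion below is that standard identity, shown for orientation rather than as the authors' exact parameterization.

        def beta_shapes_from_mean_corr(mu, rho):
            """Map a (mean, intra-cluster correlation) pair to beta shape
            parameters, using rho = 1 / (alpha + beta + 1) from the usual
            beta-binomial setup, so alpha + beta = (1 - rho) / rho."""
            total = (1.0 - rho) / rho
            return mu * total, (1.0 - mu) * total

        # Example: mean 0.3 and correlation 0.1 give alpha = 2.7 and beta = 6.3.
        print(beta_shapes_from_mean_corr(0.3, 0.1))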

  • Articles and reports: 12-001-X201600214662
    Description:

    Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. In this note, we shed some light on the concept of invariance, which is often mentioned in the context of two-phase sampling designs. We define two types of invariant two-phase designs: strongly invariant and weakly invariant two-phase designs. Some examples are given. Finally, we describe the implications of strong and weak invariance from an inference point of view.

    Release date: 2016-12-20

  • Articles and reports: 12-001-X201600114545
    Description:

    The estimation of quantiles is an important topic not only in the regression framework, but also in sampling theory. Expectiles are a natural alternative or addition to quantiles. As a generalization of the mean, expectiles have become popular in recent years because they not only give a more detailed picture of the data than the ordinary mean, but can also serve as a basis for calculating quantiles through their close relationship. We show how to estimate expectiles under sampling with unequal probabilities and how expectiles can be used to estimate the distribution function. The resulting fitted distribution function estimator can be inverted to yield quantile estimates. We run a simulation study to investigate and compare the efficiency of the expectile-based estimator.

    Release date: 2016-06-22
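
    A minimal Python sketch of computing a design-weighted expectile by iteratively reweighted asymmetric least squares. The weighting scheme and stopping rule are generic illustrations, not necessarily the exact estimator studied in the paper.

        import numpy as np

        def weighted_expectile(y, w, tau, tol=1e-10, max_iter=200):
            """tau-expectile of y under design weights w, computed by
            iteratively reweighted (asymmetric) least squares."""
            y = np.asarray(y, dtype=float)
            w = np.asarray(w, dtype=float)
            m = np.average(y, weights=w)      # the weighted mean is the 0.5-expectile
            for _ in range(max_iter):
                # asymmetric weights: tau above the current value, 1 - tau below
                v = w * np.where(y > m, tau, 1.0 - tau)
                m_new = np.sum(v * y) / np.sum(v)
                if abs(m_new - m) < tol:
                    break
                m = m_new
            return m

        # Example: 0.8-expectile of skewed data under unequal design weights.
        y = np.array([1.0, 2.0, 2.5, 3.0, 10.0])
        w = np.array([4.0, 4.0, 2.0, 2.0, 1.0])
        print(weighted_expectile(y, w, tau=0.8))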

  • Articles and reports: 11-522-X201700014704
    Description:

    We identify several research areas and topics for methodological research in official statistics. We argue why these are important, and why they are the most important ones for official statistics. We describe the main topics in these research areas and sketch what seem to be the most promising ways to address them. Here we focus on: (i) the quality of national accounts, in particular the rate of growth of GNI; and (ii) big data, in particular how to create representative estimates and how to make the most of big data when this is difficult or impossible. We also touch upon: (i) increasing the timeliness of preliminary and final statistical estimates; and (ii) statistical analysis, in particular of complex and coherent phenomena. These topics are elements of the Strategic Methodological Research Program recently adopted at Statistics Netherlands.

    Release date: 2016-03-24

  • Articles and reports: 11-522-X201700014713
    Description:

    Big data is a term that means different things to different people. To some, it means datasets so large that our traditional processing and analytic systems can no longer accommodate them. To others, it simply means taking advantage of existing datasets of all sizes and finding ways to merge them with the goal of generating new insights. The former view poses a number of important challenges to traditional market, opinion, and social research. In either case, there are implications for the future of surveys that are only beginning to be explored.

    Release date: 2016-03-24

  • Articles and reports: 11-522-X201700014727
    Description:

    "Probability samples of near-universal frames of households and persons, administered standardized measures, yielding long multivariate data records, and analyzed with statistical procedures reflecting the design – these have been the cornerstones of the empirical social sciences for 75 years. That measurement structure have given the developed world almost all of what we know about our societies and their economies. The stored survey data form a unique historical record. We live now in a different data world than that in which the leadership of statistical agencies and the social sciences were raised. High-dimensional data are ubiquitously being produced from Internet search activities, mobile Internet devices, social media, sensors, retail store scanners, and other devices. Some estimate that these data sources are increasing in size at the rate of 40% per year. Together their sizes swamp that of the probability-based sample surveys. Further, the state of sample surveys in the developed world is not healthy. Falling rates of survey participation are linked with ever-inflated costs of data collection. Despite growing needs for information, the creation of new survey vehicles is hampered by strained budgets for official statistical agencies and social science funders. These combined observations are unprecedented challenges for the basic paradigm of inference in the social and economic sciences. This paper discusses alternative ways forward at this moment in history. "

    Release date: 2016-03-24

  • Articles and reports: 11-522-X201700014738
    Description:

    In the standard design approach to missing observations, the construction of weight classes and calibration are used to adjust the design weights for the respondents in the sample. Here we use these adjusted weights to define a Dirichlet distribution which can be used to make inferences about the population. Examples show that the resulting procedures have better performance properties than the standard methods when the population is skewed.

    Release date: 2016-03-24
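
    One plausible reading of the construction described above, sketched with hypothetical inputs: the nonresponse-adjusted weights act as concentration parameters of a Dirichlet distribution over the respondents, and posterior draws of the population mean are weighted averages under random probability vectors. The exact definition used in the paper may differ.

        import numpy as np

        rng = np.random.default_rng(2016)

        def posterior_mean_draws(y, adjusted_w, n_draws=1000):
            """Posterior draws of a population mean from a Dirichlet over the
            respondents whose concentration parameters are the adjusted weights."""
            y = np.asarray(y, dtype=float)
            alpha = np.asarray(adjusted_w, dtype=float)
            draws = np.empty(n_draws)
            for d in range(n_draws):
                p = rng.dirichlet(alpha)      # random probabilities over respondents
                draws[d] = np.sum(p * y)      # one posterior draw of the mean
            return draws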

Analysis (69) (40 to 50 of 69 results)

  • Articles and reports: 11-522-X20020016745
    Description:

    The attractiveness of the Regression Discontinuity Design (RDD) rests on its close similarity to a standard experimental design. On the other hand, it is of limited applicability, since it is not often the case that units are assigned to the treatment group on the basis of a pre-program measure observable to the analyst. Moreover, it only allows identification of the mean impact for a very specific subpopulation. In this technical paper, we show that the RDD generalizes straightforwardly to settings in which units' eligibility is established on an observable pre-program measure and eligible units are allowed to freely self-select into the program. This set-up also proves to be very convenient for building a specification test on conventional non-experimental estimators of the program mean impact. The data requirements are clearly described.

    Release date: 2004-09-13
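
    For orientation, a textbook sharp-RDD estimate of the mean impact at the cutoff, using separate local linear fits on each side of the threshold. This is background to the entry above, not the self-selection generalization or the specification test it develops; the bandwidth choice is left to the user.

        import numpy as np

        def sharp_rdd_estimate(score, outcome, cutoff, bandwidth):
            """Sharp-RDD impact estimate at the cutoff: fit a local linear
            regression on each side within the bandwidth and take the
            difference of the two fitted values at the cutoff."""
            x = np.asarray(score, dtype=float) - cutoff
            y = np.asarray(outcome, dtype=float)

            def fit_at_cutoff(mask):
                # ordinary least squares of outcome on (1, x) for one side
                X = np.column_stack([np.ones(mask.sum()), x[mask]])
                beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
                return beta[0]                # fitted value at x = 0

            right = (x >= 0) & (x <= bandwidth)
            left = (x < 0) & (x >= -bandwidth)
            return fit_at_cutoff(right) - fit_at_cutoff(left)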

  • Articles and reports: 11-522-X20020016750
    Description:

    Analyses of data from social and economic surveys sometimes use generalized variance function models to approximate the design variance of point estimators of population means and proportions. Analysts may use the resulting standard error estimates to compute associated confidence intervals or test statistics for the means and proportions of interest. In comparison with design-based variance estimators computed directly from survey microdata, generalized variance function models have several potential advantages, as will be discussed in this paper, including operational simplicity; increased stability of standard errors; and, for cases involving public-use datasets, reduction of disclosure limitation problems arising from the public release of stratum and cluster indicators.

    These potential advantages, however, may be offset in part by several inferential issues. First, the properties of inferential statistics based on generalized variance functions (e.g., confidence interval coverage rates and widths) depend heavily on the relative empirical magnitudes of the components of variability associated, respectively, with:

    (a) the random selection of a subset of items used in estimation of the generalized variance function model;
    (b) the selection of sample units under a complex sample design;
    (c) the lack of fit of the generalized variance function model; and
    (d) the generation of a finite population under a superpopulation model.

    Second, under certain conditions, one may link each of components (a) through (d) with different empirical measures of the predictive adequacy of a generalized variance function model. Consequently, these measures of predictive adequacy can offer some insight into the extent to which a given generalized variance function model may be appropriate for inferential use in specific applications.

    Some of the proposed diagnostics are applied to data from the US Survey of Doctoral Recipients and the US Current Employment Survey. For the Survey of Doctoral Recipients, components (a), (c) and (d) are of principal concern. For the Current Employment Survey, components (b), (c) and (d) receive principal attention, and the availability of population microdata allows the development of especially detailed models for components (b) and (c).

    Release date: 2004-09-13
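
    A minimal sketch of fitting one common generalized variance function specification, the relvariance model relvar(x) = a + b / x, by ordinary least squares; the model forms and fitting methods used in the paper's applications may be more elaborate.

        import numpy as np

        def fit_gvf(estimates, relvariances):
            """Fit relvar = a + b / x by ordinary least squares on a set of
            (point estimate, estimated relvariance) pairs."""
            x = np.asarray(estimates, dtype=float)
            v2 = np.asarray(relvariances, dtype=float)
            X = np.column_stack([np.ones_like(x), 1.0 / x])
            (a, b), *_ = np.linalg.lstsq(X, v2, rcond=None)
            return a, b

        def gvf_standard_error(estimate, a, b):
            """Approximate standard error implied by the fitted model."""
            relvar = a + b / estimate
            return abs(estimate) * relvar ** 0.5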

  • Articles and reports: 12-001-X20030026785
    Description:

    To avoid disclosures, one approach is to release partially synthetic, public use microdata sets. These comprise the units originally surveyed, but some collected values, for example sensitive values at high risk of disclosure or values of key identifiers, are replaced with multiple imputations. Although partially synthetic approaches are currently used to protect public use data, valid methods of inference have not been developed for them. This article presents such methods. They are based on the concepts of multiple imputation for missing data but use different rules for combining point and variance estimates. The combining rules also differ from those for fully synthetic data sets developed by Raghunathan, Reiter and Rubin (2003). The validity of these new rules is illustrated in simulation studies.

    Release date: 2004-01-27
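
    A sketch of the combining rules usually attributed to this line of work for partially synthetic data, where the variance estimate is u_bar + b_m / m rather than the familiar (1 + 1/m) b_m + u_bar rule for missing data. Treat this as an illustrative restatement, not a verbatim transcription of the article's rules.

        import numpy as np

        def combine_partially_synthetic(point_ests, var_ests):
            """Combine a point estimate and its variance across m partially
            synthetic datasets."""
            q = np.asarray(point_ests, dtype=float)
            u = np.asarray(var_ests, dtype=float)
            m = len(q)
            q_bar = q.mean()                  # combined point estimate
            b_m = q.var(ddof=1)               # between-synthesis variance
            u_bar = u.mean()                  # average within-synthesis variance
            T_p = u_bar + b_m / m             # combined variance estimate
            return q_bar, T_p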

  • Articles and reports: 12-001-X20030016610
    Description:

    In the presence of item nonresponse, unweighted imputation methods are often used in practice, but they generally lead to biased estimators under uniform response within imputation classes. Following Skinner and Rao (2002), we propose a bias-adjusted estimator of a population mean under unweighted ratio imputation and random hot-deck imputation, and derive linearization variance estimators. A small simulation study is conducted to study the performance of the methods in terms of bias and mean square error. Relative bias and relative stability of the variance estimators are also studied.

    Release date: 2003-07-31
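
    For context, a sketch of the two unweighted imputation mechanisms named above, ratio imputation and random hot-deck; the bias adjustment and the linearization variance estimators proposed in the paper are not reproduced here.

        import numpy as np

        rng = np.random.default_rng(2003)

        def impute_missing(y, x, respondent):
            """Fill in nonrespondent y-values two ways: unweighted ratio
            imputation and random hot-deck imputation with respondent donors."""
            y = np.asarray(y, dtype=float)
            x = np.asarray(x, dtype=float)
            r = np.asarray(respondent, dtype=bool)

            # Unweighted ratio imputation: y_i* = (sum of respondent y over
            # sum of respondent x) * x_i for each nonrespondent.
            y_ratio = y.copy()
            y_ratio[~r] = (y[r].sum() / x[r].sum()) * x[~r]

            # Random hot-deck: each nonrespondent receives the y-value of a
            # donor drawn at random (with replacement) from the respondents.
            y_hotdeck = y.copy()
            y_hotdeck[~r] = rng.choice(y[r], size=int((~r).sum()), replace=True)

            return y_ratio, y_hotdeck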

  • Articles and reports: 92F0138M2003002
    Description:

    This working paper describes the preliminary 2006 census metropolitan areas and census agglomerations and is presented for user feedback. The paper briefly describes the factors that have resulted in changes to some of the census metropolitan areas and census agglomerations and includes tables and maps that list and illustrate these changes to their limits and to the component census subdivisions.

    Release date: 2003-07-11

  • Articles and reports: 92F0138M2003001
    Description:

    The goal of this working paper is to assess how well Canada's current method of delineating Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs) reflects the metropolitan nature of these geographic areas according to the facilities and services they provide. The effectiveness of Canada's delineation methodology can be evaluated by applying a functional model to Statistics Canada's CMAs and CAs.

    As a consequence of the research undertaken for this working paper, Statistics Canada has proposed lowering the urban core population threshold it uses to define CMAs: a CA will be promoted to a CMA if it has a total population of at least 100,000, of which 50,000 or more live in the urban core. User consultation on this proposal took place in the fall of 2002 as part of the 2006 Census content determination process.

    Release date: 2003-03-31

  • Articles and reports: 11F0019M2003199
    Geography: Canada
    Description:

    Using a nationally representative sample of establishments, we have examined whether selected alternative work practices (AWPs) tend to reduce quit rates. Overall, our analysis provides strong evidence of a negative association between these AWPs and quit rates among establishments of more than 10 employees operating in high-skill services. We also found some evidence of a negative association in low-skill services. However, the magnitude of this negative association was reduced substantially when we added an indicator of whether the workplace has a formal policy of information sharing. There was very little evidence of a negative association in manufacturing. While establishments with self-directed workgroups have lower quit rates than others, none of the bundles of work practices considered yielded a negative and statistically significant effect. We surmise that key AWPs might be more successful in reducing labour turnover in technologically complex environments than in low-skill ones.

    Release date: 2003-03-17

  • Articles and reports: 12-001-X20020026428
    Description:

    The analysis of survey data from different geographical areas where the data from each area are polychotomous can be easily performed using hierarchical Bayesian models, even if there are small cell counts in some of these areas. However, there are difficulties when the survey data have missing information in the form of non-response, especially when the characteristics of the respondents differ from those of the non-respondents. We use the selection approach for estimation when there are non-respondents because it permits inference for all the parameters. Specifically, we describe a hierarchical Bayesian model to analyse multinomial non-ignorable non-response data from different geographical areas, some of which may be small. For the model, we use a Dirichlet prior density for the multinomial probabilities and a beta prior density for the response probabilities. This permits a 'borrowing of strength' from the data of larger areas to improve the reliability of the estimates of the model parameters corresponding to the smaller areas. Because the joint posterior density of all the parameters is complex, inference is sampling-based and Markov chain Monte Carlo methods are used. We apply our method to provide an analysis of body mass index (BMI) data from the third National Health and Nutrition Examination Survey (NHANES III). For simplicity, the BMI is categorized into 3 natural levels, and this is done for each of 8 age-race-sex domains and 34 counties. We assess the performance of our model using the NHANES III data and simulated examples, which show that our model works reasonably well.

    Release date: 2003-01-29

  • Articles and reports: 11-522-X20010016277
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    The advent of computerized record-linkage methodology has facilitated the conduct of cohort mortality studies in which exposure data in one database are electronically linked with mortality data from another database. In this article, the impact of linkage errors on estimates of epidemiological indicators of risk, such as standardized mortality ratios and relative risk regression model parameters, is explored. It is shown that these indicators can be subject to bias and additional variability in the presence of linkage errors, with false links and non-links leading to positive and negative bias, respectively, in estimates of the standardized mortality ratio. Although linkage errors always increase the uncertainty in the estimates, bias can be effectively eliminated in the special case in which the false positive rate equals the false negative rate within homogeneous states defined by cross-classification of the covariates of interest.

    Release date: 2002-09-12
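
    A toy illustration, with made-up counts, of the direction-of-bias claim in the entry above: false links add spurious deaths to the numerator of the standardized mortality ratio, while missed links remove true ones.

        def smr(observed_deaths, expected_deaths):
            """Standardized mortality ratio: observed over expected deaths."""
            return observed_deaths / expected_deaths

        # Hypothetical cohort: 120 true deaths, 100 expected under reference rates.
        true_smr = smr(120, 100)                 # 1.20
        with_false_links = smr(120 + 10, 100)    # 1.30 -> upward bias
        with_missed_links = smr(120 - 15, 100)   # 1.05 -> downward bias
        print(true_smr, with_false_links, with_missed_links)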

  • Articles and reports: 89-552-M2000007
    Geography: Canada
    Description:

    This paper addresses the problem of statistical inference with ordinal variates and examines the robustness of rankings of average literacy, and of estimates of the impact of literacy on individual earnings, to alternative literacy measurement and scaling choices.

    Release date: 2000-06-02

Reference (3) (3 results)

  • Surveys and statistical programs – Documentation: 12-001-X19970013101
    Description:

    In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections is inverted, so as to yield in the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them is developed. Not all original samples can be inverted, but many practical special cases are discussed which cover a wide range of practices.

    Release date: 1997-08-18
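
    A rough sketch of the inversion idea using a Poisson thinning step: units that were over-represented at the first phase are retained with proportionally smaller probabilities, so every population unit ends up with the same overall inclusion probability. The constructions in the paper are more refined and yield genuine simple random samples; the function below is only illustrative.

        import numpy as np

        rng = np.random.default_rng(1997)

        def thin_to_equal_probability(sample_ids, pi):
            """Keep sampled unit i with probability c / pi_i, where c = min(pi),
            so the overall inclusion probability equals the constant c for every
            population unit (Poisson thinning of the complex sample)."""
            pi = np.asarray(pi, dtype=float)
            keep = rng.random(pi.size) < pi.min() / pi
            return [unit for unit, k in zip(sample_ids, keep) if k]

        # Example: four sampled units with unequal first-phase probabilities.
        print(thin_to_equal_probability(["a", "b", "c", "d"], [0.1, 0.2, 0.4, 0.8]))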

  • Surveys and statistical programs – Documentation: 12-001-X19970013102
    Description:

    The selection of auxiliary variables is considered for regression estimation in finite populations under a simple random sampling design. This problem is a basic one for model-based and model-assisted survey sampling approaches and is of practical importance when the number of variables available is large. An approach is developed in which a mean squared error estimator is minimised. This approach is compared to alternative approaches using a fixed set of auxiliary variables, a conventional significance test criterion, a condition number reduction approach and a ridge regression approach. The proposed approach is found to perform well in terms of efficiency. It is noted that the variable selection approach affects the properties of standard variance estimators and thus leads to a problem of variance estimation.

    Release date: 1997-08-18

  • Surveys and statistical programs – Documentation: 12-001-X19960022980
    Description:

    In this paper, we study a confidence interval estimation method for a finite population average when some auxiliary information is available. As demonstrated by Royall and Cumberland in a series of empirical studies, naive use of existing methods to construct confidence intervals for population averages may result in very poor conditional coverage probabilities, conditional on the sample mean of the covariate. When this happens, we propose to transform the data to improve the precision of the normal approximation. The transformed data are then used to make inferences about the original population average, and the auxiliary information is incorporated into the inference directly, or by calibration with empirical likelihood. Our approach is design-based. We apply our approach to six real populations and find that, when transformation is needed, our approach performs well compared to the usual regression method.

    Release date: 1997-01-30