Inference and foundations

Filter results by

Search Help
Currently selected filters that can be removed

Keyword(s)

Geography

1 facets displayed. 0 facets selected.

Survey or statistical program

2 facets displayed. 0 facets selected.

Content

1 facets displayed. 0 facets selected.
Sort Help

Results

All (69)

All (69) (0 to 10 of 69 results)

  • Articles and reports: 12-001-X201800254956
    Description:

    In Italy, the Labor Force Survey (LFS) is conducted quarterly by the National Statistical Institute (ISTAT) to produce estimates of the labor force status of the population at different geographical levels. In particular, ISTAT provides LFS estimates of employed and unemployed counts for local Labor Market Areas (LMAs). LMAs are 611 sub-regional clusters of municipalities and are unplanned domains for which direct estimates have overly large sampling errors. This implies the need of Small Area Estimation (SAE) methods. In this paper, we develop a new area level SAE method that uses a Latent Markov Model (LMM) as linking model. In LMMs, the characteristic of interest, and its evolution in time, is represented by a latent process that follows a Markov chain, usually of first order. Therefore, areas are allowed to change their latent state across time. The proposed model is applied to quarterly data from the LFS for the period 2004 to 2014 and fitted within a hierarchical Bayesian framework using a data augmentation Gibbs sampler. Estimates are compared with those obtained by the classical Fay-Herriot model, by a time-series area level SAE model, and on the basis of data coming from the 2011 Population Census.

    Release date: 2018-12-20

  • Articles and reports: 12-001-X201800154928
    Description:

    A two-phase process was used by the Substance Abuse and Mental Health Services Administration to estimate the proportion of US adults with serious mental illness (SMI). The first phase was the annual National Survey on Drug Use and Health (NSDUH), while the second phase was a random subsample of adult respondents to the NSDUH. Respondents to the second phase of sampling were clinically evaluated for serious mental illness. A logistic prediction model was fit to this subsample with the SMI status (yes or no) determined by the second-phase instrument treated as the dependent variable and related variables collected on the NSDUH from all adults as the model’s explanatory variables. Estimates were then computed for SMI prevalence among all adults and within adult subpopulations by assigning an SMI status to each NSDUH respondent based on comparing his (her) estimated probability of having SMI to a chosen cut point on the distribution of the predicted probabilities. We investigate alternatives to this standard cut point estimator such as the probability estimator. The latter assigns an estimated probability of having SMI to each NSDUH respondent. The estimated prevalence of SMI is the weighted mean of those estimated probabilities. Using data from NSDUH and its subsample, we show that, although the probability estimator has a smaller mean squared error when estimating SMI prevalence among all adults, it has a greater tendency to be biased at the subpopulation level than the standard cut point estimator.

    Release date: 2018-06-21

  • Articles and reports: 12-001-X201700254872
    Description:

    This note discusses the theoretical foundations for the extension of the Wilson two-sided coverage interval to an estimated proportion computed from complex survey data. The interval is shown to be asymptotically equivalent to an interval derived from a logistic transformation. A mildly better version is discussed, but users may prefer constructing a one-sided interval already in the literature.

    Release date: 2017-12-21

  • Articles and reports: 12-001-X201700114822
    Description:

    We use a Bayesian method to infer about a finite population proportion when binary data are collected using a two-fold sample design from small areas. The two-fold sample design has a two-stage cluster sample design within each area. A former hierarchical Bayesian model assumes that for each area the first stage binary responses are independent Bernoulli distributions, and the probabilities have beta distributions which are parameterized by a mean and a correlation coefficient. The means vary with areas but the correlation is the same over areas. However, to gain some flexibility we have now extended this model to accommodate different correlations. The means and the correlations have independent beta distributions. We call the former model a homogeneous model and the new model a heterogeneous model. All hyperparameters have proper noninformative priors. An additional complexity is that some of the parameters are weakly identified making it difficult to use a standard Gibbs sampler for computation. So we have used unimodal constraints for the beta prior distributions and a blocked Gibbs sampler to perform the computation. We have compared the heterogeneous and homogeneous models using an illustrative example and simulation study. As expected, the two-fold model with heterogeneous correlations is preferred.

    Release date: 2017-06-22

  • Articles and reports: 12-001-X201600214662
    Description:

    Two-phase sampling designs are often used in surveys when the sampling frame contains little or no auxiliary information. In this note, we shed some light on the concept of invariance, which is often mentioned in the context of two-phase sampling designs. We define two types of invariant two-phase designs: strongly invariant and weakly invariant two-phase designs. Some examples are given. Finally, we describe the implications of strong and weak invariance from an inference point of view.

    Release date: 2016-12-20

  • Articles and reports: 12-001-X201600114545
    Description:

    The estimation of quantiles is an important topic not only in the regression framework, but also in sampling theory. A natural alternative or addition to quantiles are expectiles. Expectiles as a generalization of the mean have become popular during the last years as they not only give a more detailed picture of the data than the ordinary mean, but also can serve as a basis to calculate quantiles by using their close relationship. We show, how to estimate expectiles under sampling with unequal probabilities and how expectiles can be used to estimate the distribution function. The resulting fitted distribution function estimator can be inverted leading to quantile estimates. We run a simulation study to investigate and compare the efficiency of the expectile based estimator.

    Release date: 2016-06-22

  • Articles and reports: 11-522-X201700014704
    Description:

    We identify several research areas and topics for methodological research in official statistics. We argue why these are important, and why these are the most important ones for official statistics. We describe the main topics in these research areas and sketch what seems to be the most promising ways to address them. Here we focus on: (i) Quality of National accounts, in particular the rate of growth of GNI (ii) Big data, in particular how to create representative estimates and how to make the most of big data when this is difficult or impossible. We also touch upon: (i) Increasing timeliness of preliminary and final statistical estimates (ii) Statistical analysis, in particular of complex and coherent phenomena. These topics are elements in the present Strategic Methodological Research Program that has recently been adopted at Statistics Netherlands

    Release date: 2016-03-24

  • Articles and reports: 11-522-X201700014713
    Description:

    Big data is a term that means different things to different people. To some, it means datasets so large that our traditional processing and analytic systems can no longer accommodate them. To others, it simply means taking advantage of existing datasets of all sizes and finding ways to merge them with the goal of generating new insights. The former view poses a number of important challenges to traditional market, opinion, and social research. In either case, there are implications for the future of surveys that are only beginning to be explored.

    Release date: 2016-03-24

  • Articles and reports: 11-522-X201700014727
    Description:

    "Probability samples of near-universal frames of households and persons, administered standardized measures, yielding long multivariate data records, and analyzed with statistical procedures reflecting the design – these have been the cornerstones of the empirical social sciences for 75 years. That measurement structure have given the developed world almost all of what we know about our societies and their economies. The stored survey data form a unique historical record. We live now in a different data world than that in which the leadership of statistical agencies and the social sciences were raised. High-dimensional data are ubiquitously being produced from Internet search activities, mobile Internet devices, social media, sensors, retail store scanners, and other devices. Some estimate that these data sources are increasing in size at the rate of 40% per year. Together their sizes swamp that of the probability-based sample surveys. Further, the state of sample surveys in the developed world is not healthy. Falling rates of survey participation are linked with ever-inflated costs of data collection. Despite growing needs for information, the creation of new survey vehicles is hampered by strained budgets for official statistical agencies and social science funders. These combined observations are unprecedented challenges for the basic paradigm of inference in the social and economic sciences. This paper discusses alternative ways forward at this moment in history. "

    Release date: 2016-03-24

  • Articles and reports: 11-522-X201700014738
    Description:

    In the standard design approach to missing observations, the construction of weight classes and calibration are used to adjust the design weights for the respondents in the sample. Here we use these adjusted weights to define a Dirichlet distribution which can be used to make inferences about the population. Examples show that the resulting procedures have better performance properties than the standard methods when the population is skewed.

    Release date: 2016-03-24
Data (0)

Data (0) (0 results)

No content available at this time.

Analysis (69)

Analysis (69) (60 to 70 of 69 results)

  • Articles and reports: 12-001-X199200214487
    Description:

    This paper reviews the idea of robustness for randomisation and model-based inference for descriptive and analytic surveys. The lack of robustness for model-based procedures can be partially overcome by careful design. In this paper a robust model-based approach to analysis is proposed based on smoothing methods.

    Release date: 1992-12-15

  • Articles and reports: 12-001-X199200214488
    Description:

    In many finite population sampling problems the design that is optimal in the sense of minimizing the variance of the best linear unbiased estimator under a particular working model is bad in the sense of robustness - it leaves the estimator extremely vulnerable to bias if the working model is incorrect. However there are some important models under which one design provides both efficiency and robustness. We present a theorem that identifies such models and their optimal designs.

    Release date: 1992-12-15

  • Articles and reports: 12-001-X199100214504
    Description:

    Simple or marginal quota surveys are analyzed using two methods: (1) behaviour modelling (superpopulation model) and prediction estimation, and (2) sample modelling (simple restricted random sampling) and estimation derived from the sample distribution. In both cases the limitations of the theory used to establish the variance formulas and estimates when measuring totals are described. An extension of the quota method (non-proportional quotas) is also briefly described and analyzed. In some cases, this may provide a very significant improvement in survey precision. The advantages of the quota method are compared with those of random sampling. The latter remains indispensable in the case of large scale surveys within the framework of Official Statistics.

    Release date: 1991-12-16

  • Articles and reports: 12-001-X199100114521
    Description:

    Marginal and approximate conditional likelihoods are given for the correlation parameters in a normal linear regression model with correlated errors. This general likelihood approach is applied to obtain marginal and approximate conditional likelihoods for the correlation parameters in sampling on successive occasions under both simple random sampling on each occasion and more complex surveys.

    Release date: 1991-06-14

  • Articles and reports: 12-001-X199000114560
    Description:

    Early developments in sampling theory and methods largely concentrated on efficient sampling designs and associated estimation techniques for population totals or means. More recently, the theoretical foundations of survey based estimation have also been critically examined, and formal frameworks for inference on totals or means have emerged. During the past 10 years or so, rapid progress has also been made in the development of methods for the analysis of survey data that take account of the complexity of the sampling design. The scope of this paper is restricted to an overview and appraisal of some of these developments.

    Release date: 1990-06-15

  • Articles and reports: 12-001-X198900214568
    Description:

    The paper describes a Monte Carlo study of simultaneous confidence interval procedures for k > 2 proportions, under a model of two-stage cluster sampling. The procedures investigated include: (i) standard multinomial intervals; (ii) Scheffé intervals based on sample estimates of the variances of cell proportions; (iii) Quesenberry-Hurst intervals adapted for clustered data using Rao and Scott’s first and second order adjustments to X^2; (iv) simple Bonferroni intervals; (v) Bonferroni intervals based on transformations of the estimated proportions; (vi) Bonferroni intervals computed using the critical points of Student’s t. In several realistic situations, actual coverage rates of the multinomial procedures were found to be seriously depressed compared to the nominal rate. The best performing intervals, from the point of view of coverage rates and coverage symmetry (an extension of an idea due to Jennings), were the t-based Bonferroni intervals derived using log and logit transformations. Of the Scheffé-like procedures, the best performance was provided by Quesenberry-Hurst intervals in combination with first-order Rao-Scott adjustments.

    Release date: 1989-12-15

  • Articles and reports: 12-001-X198500114364
    Description:

    Conventional methods of inference in survey sampling are critically examined. The need for conditioning the inference on recognizable subsets of the population is emphasized. A number of real examples involving random sample sizes are presented to illustrate inferences conditional on the realized sample configuration and associated difficulties. The examples include the following: estimation of (a) population mean under simple random sampling; (b) population mean in the presence of outliers; (c) domain total and domain mean; (d) population mean with two-way stratification; (e) population mean in the presence of non-responses; (f) population mean under general designs. The conditional bias and the conditional variance of estimators of a population mean (or a domain mean or total), and the associated confidence intervals, are examined.

    Release date: 1985-06-14

  • Articles and reports: 12-001-X198400114351
    Description:

    Most sample surveys conducted by organizations such as Statistics Canada or the U.S. Bureau of the Census employ complex designs. The design-based approach to statistical inference, typically the institutional standard of inference for simple population statistics such as means and totals, may be extended to parameters of analytic models as well. Most of this paper focuses on application of design-based inferences to such models, but rationales are offered for use of model-based alternatives in some instances, by way of explanation for the author’s observation that both modes of inference are used in practice at his own institution.

    Within the design-based approach to inference, the paper briefly describes experience with linear regression analysis. Recently, variance computations for a number of surveys of the Census Bureau have been implemented through “replicate weighting”; the principal application has been for variances of simple statistics, but this technique also facilitates variance computation for virtually any complex analytic model. Finally, approaches and experience with log-linear models are reported.

    Release date: 1984-06-15

  • Articles and reports: 12-001-X198100214319
    Description:

    The problems associated with making analytical inferences from data based on complex sample designs are reviewed. A basic issue is the definition of the parameter of interest and whether it is a superpopulation model parameter or a finite population parameter. General methods based on a generalized Wald Statistics and its modification or on modifications of classical test statistics are discussed. More detail is given on specific methods-on linear models and regression and on categorical data analysis.

    Release date: 1981-12-15
Reference (3)

Reference (3) ((3 results))

  • Surveys and statistical programs – Documentation: 12-001-X19970013101
    Description:

    In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections are inverted, so as to yield at the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them developed. Not all original samples can be inverted; but many practical special cases are discussed which cover a wide range of practices.

    Release date: 1997-08-18

  • Surveys and statistical programs – Documentation: 12-001-X19970013102
    Description:

    The selection of auxiliary variables is considered for regression estimation in finite populations under a simple random sampling design. This problem is a basic one for model-based and model-assisted survey sampling approaches and is of practical importance when the number of variables available is large. An approach is developed in which a mean squared error estimator is minimised. This approach is compared to alternative approaches using a fixed set of auxiliary variables, a conventional significance test criterion, a condition number reduction approach and a ridge regression approach. The proposed approach is found to perform well in terms of efficiency. It is noted that the variable selection approach affects the properties of standard variance estimators and thus leads to a problem of variance estimation.

    Release date: 1997-08-18

  • Surveys and statistical programs – Documentation: 12-001-X19960022980
    Description:

    In this paper, we study a confidence interval estimation method for a finite population average when some auxiliairy information is available. As demonstrated by Royall and Cumberland in a series of empirical studies, naive use of existing methods to construct confidence intervals for population averages may result in very poor conditional coverage probabilities, conditional on the sample mean of the covariate. When this happens, we propose to transform the data to improve the precision of the normal approximation. The transformed data are then used to make inference on the original population average, and the auxiliary information is incorporated into the inference directly, or by calibration with empirical likelihood. Our approach is design-based. We apply our approach to six real populations and find that when transformation is needed, our approach performs well compared to the usual regression method.

    Release date: 1997-01-30
Date modified: