Inference and foundations

Filter results by

Search Help
Currently selected filters that can be removed

Keyword(s)

Geography

1 facets displayed. 0 facets selected.

Survey or statistical program

2 facets displayed. 0 facets selected.

Content

1 facets displayed. 0 facets selected.
Sort Help
entries

Results

All (100)

All (100) (50 to 60 of 100 results)

  • Articles and reports: 11F0024M20050008805
    Description:

    This paper reports on the potential development of sub-annual indicators for selected service industries using Goods and Services Tax (GST) data. The services sector is now of central importance to advanced economies; however, our knowledge of this sector remains incomplete, partly due to a lack of data. The Voorburg Group on Service Statistics has been meeting for almost twenty years to develop and incorporate better measures for the services sector. Despite this effort, many sub-annual economic measures continue to rely on output data for the goods-producing sector and, with the exception of distributive trades, on employment data for service industries.

    The development of sub-annual indicators for service industries raises two questions regarding the national statistical program. First, is there a need for service output indicators to supplement existing sub-annual measures? And second, what service industries are the most promising for development? The paper begins by reviewing the importance of service industries and how they behave during economic downturns. Next, it examines considerations in determining which service industries to select as GST-based, sub-annual indicators. A case study of the accommodation services industry serves to illustrate improving timeliness and accuracy. We conclude by discussing the opportunities for, and limitations of, these indicators.

    Release date: 2005-10-20

  • Articles and reports: 12-002-X20050018030
    Description:

    People often wish to use survey micro-data to study whether the rate of occurrence of a particular condition in a subpopulation is the same as the rate of occurrence in the full population. This paper describes some alternatives for making inferences about such a rate difference and shows whether and how these alternatives may be implemented in three different survey software packages. The software packages illustrated - SUDAAN, WesVar and Bootvar - all can make use of bootstrap weights provided by the analyst to carry out variance estimation.

    Release date: 2005-06-23

  • Articles and reports: 12-001-X20040027753
    Description:

    Samplers often distrust model-based approaches to survey inference because of concerns about misspecification when models are applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator can be very inefficient. In Zheng and Little (2003, 2004) we used penalized splines (p-splines) to model smoothly - varying relationships between the outcome and the inclusion probabilities in one-stage probability proportional to size (PPS) samples. We showed that p spline model-based estimators are in general more efficient than the HT estimator, and can provide narrower confidence intervals with close to nominal confidence coverage. In this article, we extend this approach to two-stage sampling designs. We use a p-spline based mixed model that fits a nonparametric relationship between the primary sampling unit (PSU) means and a measure of PSU size, and incorporates random effects to model clustering. For variance estimation we consider the empirical Bayes model-based variance, the jackknife and balanced repeated replication (BRR) methods. Simulation studies on simulated data and samples drawn from public use microdata in the 1990 census demonstrate gains for the model-based p-spline estimator over the HT estimator and linear model-assisted estimators. Simulations also show the variance estimation methods yield confidence intervals with satisfactory confidence coverage. Interestingly, these gains can be seen for a common equal-probability design, where the first stage selection is PPS and the second stage selection probabilities are proportional to the inverse of the first stage inclusion probabilities, and the HT estimator leads to the unweighted mean. In situations that most favor the HT estimator, the model-based estimators have comparable efficiency.

    Release date: 2005-02-03

  • Articles and reports: 11-522-X20030017700
    Description:

    This paper suggests a useful framework for exploring the effects of moderate deviations from idealized conditions. It offers evaluation criteria for point estimators and interval estimators.

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20030017722
    Description:

    This paper shows how to adapt design-based and model-based frameworks to the case of two-stage sampling.

    Release date: 2005-01-26

  • Surveys and statistical programs – Documentation: 12-002-X20040027035
    Description:

    As part of the processing of the National Longitudinal Survey of Children and Youth (NLSCY) cycle 4 data, historical revisions have been made to the data of the first 3 cycles, either to correct errors or to update the data. During processing, particular attention was given to the PERSRUK (Person Identifier) and the FIELDRUK (Household Identifier). The same level of attention has not been given to the other identifiers that are included in the data base, the CHILDID (Person identifier) and the _IDHD01 (Household identifier). These identifiers have been created for the public files and can also be found in the master files by default. The PERSRUK should be used to link records between files and the FIELDRUK to determine the household when using the master files.

    Release date: 2004-10-05

  • Articles and reports: 11-522-X20020016708
    Description:

    In this paper, we discuss the analysis of complex health survey data by using multivariate modelling techniques. Main interests are in various design-based and model-based methods that aim at accounting for the design complexities, including clustering, stratification and weighting. Methods covered include generalized linear modelling based on pseudo-likelihood and generalized estimating equations, linear mixed models estimated by restricted maximum likelihood, and hierarchical Bayes techniques using Markov Chain Monte Carlo (MCMC) methods. The methods will be compared empirically, using data from an extensive health interview and examination survey conducted in Finland in 2000 (Health 2000 Study).

    The data of the Health 2000 Study were collected using personal interviews, questionnaires and clinical examinations. A stratified two-stage cluster sampling design was used in the survey. The sampling design involved positive intra-cluster correlation for many study variables. For a closer investigation, we selected a small number of study variables from the health interview and health examination phases. In many cases, the different methods produced similar numerical results and supported similar statistical conclusions. Methods that failed to account for the design complexities sometimes led to conflicting conclusions. We also discuss the application of the methods in this paper by using standard statistical software products.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016717
    Description:

    In the United States, the National Health and Nutrition Examination Survey (NHANES) is linked to the National Health Interview Survey (NHIS) at the primary sampling unit level (the same counties, but not necessarily the same persons, are in both surveys). The NHANES examines about 5,000 persons per year, while the NHIS samples about 100,000 persons per year. In this paper, we present and develop properties of models that allow NHIS and administrative data to be used as auxiliary information for estimating quantities of interest in the NHANES. The methodology, related to Fay-Herriot (1979) small-area models and to calibration estimators in Deville and Sarndal (1992), accounts for the survey designs in the error structure.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016719
    Description:

    This study takes a look at the modelling methods used for public health data. Public health has a renewed interest in the impact of the environment on health. Ecological or contextual studies ideally investigate these relationships using public health data augmented with environmental characteristics in multilevel or hierarchical models. In these models, individual respondents in health data are the first level and community data are the second level. Most public health data use complex sample survey designs, which require analyses accounting for the clustering, nonresponse, and poststratification to obtain representative estimates of prevalence of health risk behaviours.

    This study uses the Behavioral Risk Factor Surveillance System (BRFSS), a state-specific US health risk factor surveillance system conducted by the Center for Disease Control and Prevention, which assesses health risk factors in over 200,000 adults annually. BRFSS data are now available at the metropolitan statistical area (MSA) level and provide quality health information for studies of environmental effects. MSA-level analyses combining health and environmental data are further complicated by joint requirements of the survey sample design and the multilevel analyses.

    We compare three modelling methods in a study of physical activity and selected environmental factors using BRFSS 2000 data. Each of the methods described here is a valid way to analyse complex sample survey data augmented with environmental information, although each accounts for the survey design and multilevel data structure in a different manner and is thus appropriate for slightly different research questions.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016727
    Description:

    The census data are widely used in the distribution and targeting of resources at national, regional and local levels. In the United Kingdom (UK), a population census is conducted every 10 years. As time elapses, the census data become outdated and less relevant, thus making the distribution of resources less equitable. This paper examines alternative methods in rectifying this.

    A number of small area methods have been developed for producing postcensal estimates, including the Structural Preserving Estimation technique as a result of Purcell and Kish (1980). This paper develops an alternative approach that is based on a linear mixed modelling approach to producing postcensal estimates. The validity of the methodology is tested on simulated data from the Finnish population register and the technique is applied to producing updated estimates for a number of the 1991 UK census variables.

    Release date: 2004-09-13
Data (0)

Data (0) (0 results)

No content available at this time.

Analysis (92)

Analysis (92) (10 to 20 of 92 results)

  • Articles and reports: 12-001-X202200200008
    Description:

    This response contains additional remarks on a few selected issues raised by the discussants.

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200200011
    Description:

    Two-phase sampling is a cost effective sampling design employed extensively in surveys. In this paper a method of most efficient linear estimation of totals in two-phase sampling is proposed, which exploits optimally auxiliary survey information. First, a best linear unbiased estimator (BLUE) of any total is formally derived in analytic form, and shown to be also a calibration estimator. Then, a proper reformulation of such a BLUE and estimation of its unknown coefficients leads to the construction of an “optimal” regression estimator, which can also be obtained through a suitable calibration procedure. A distinctive feature of such calibration is the alignment of estimates from the two phases in an one-step procedure involving the combined first-and-second phase samples. Optimal estimation is feasible for certain two-phase designs that are used often in large scale surveys. For general two-phase designs, an alternative calibration procedure gives a generalized regression estimator as an approximate optimal estimator. The proposed general approach to optimal estimation leads to the most effective use of the available auxiliary information in any two-phase survey. The advantages of this approach over existing methods of estimation in two-phase sampling are shown both theoretically and through a simulation study.

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200100004
    Description:

    When the sample size of an area is small, borrowing information from neighbors is a small area estimation technique to provide more reliable estimates. One of the famous models in small area estimation is a multinomial-Dirichlet hierarchical model for multinomial counts. Due to natural characteristics of the data, making unimodal order restriction assumption to parameter spaces is relevant. In our application, body mass index is more likely at an overweight level, which means the unimodal order restriction may be reasonable. The same unimodal order restriction for all areas may be too strong to be true for some cases. To increase flexibility, we add uncertainty to the unimodal order restriction. Each area will have similar unimodal patterns, but not the same. Since the order restriction with uncertainty increases the inference difficulty, we make comparison with the posterior summaries and approximated log-pseudo marginal likelihood.

    Release date: 2022-06-21

  • Articles and reports: 12-001-X202200100009
    Description:

    In finite population estimation, the inverse probability or Horvitz-Thompson estimator is a basic tool. Even when auxiliary information is available to model the variable of interest, it is still used to estimate the model error. Here, the inverse probability estimator is generalized by introducing a positive definite matrix. The usual inverse probability estimator is a special case of the generalized estimator, where the positive definite matrix is the identity matrix. Since calibration estimation seeks weights that are close to the inverse probability weights, it too can be generalized by seeking weights that are close to those of the generalized inverse probability estimator. Calibration is known to be optimal, in the sense that it asymptotically attains the Godambe-Joshi lower bound. That lower bound has been derived under a model where no correlation is present. This too, can be generalized to allow for correlation. With the correct choice of the positive definite matrix that generalizes the calibration estimators, this generalized lower bound can be asymptotically attained. There is often no closed-form formula for the generalized estimators. However, simple explicit examples are given here to illustrate how the generalized estimators take advantage of the correlation. This simplicity is achieved here, by assuming a correlation of one between some population units. Those simple estimators can still be useful, even if the correlation is smaller than one. Simulation results are used to compare the generalized estimators to the ordinary estimators.

    Release date: 2022-06-21

  • Articles and reports: 12-001-X202100200003
    Description:

    Calibration weighting is a statistically efficient way for handling unit nonresponse. Assuming the response (or output) model justifying the calibration-weight adjustment is correct, it is often possible to measure the variance of estimates in an asymptotically unbiased manner. One approach to variance estimation is to create jackknife replicate weights. Sometimes, however, the conventional method for computing jackknife replicate weights for calibrated analysis weights fails. In that case, an alternative method for computing jackknife replicate weights is usually available. That method is described here and then applied to a simple example.

    Release date: 2022-01-06

  • Articles and reports: 12-001-X202100200006
    Description:

    Sample-based calibration occurs when the weights of a survey are calibrated to control totals that are random, instead of representing fixed population-level totals. Control totals may be estimated from different phases of the same survey or from another survey. Under sample-based calibration, valid variance estimation requires that the error contribution due to estimating the control totals be accounted for. We propose a new variance estimation method that directly uses the replicate weights from two surveys, one survey being used to provide control totals for calibration of the other survey weights. No restrictions are set on the nature of the two replication methods and no variance-covariance estimates need to be computed, making the proposed method straightforward to implement in practice. A general description of the method for surveys with two arbitrary replication methods with different numbers of replicates is provided. It is shown that the resulting variance estimator is consistent for the asymptotic variance of the calibrated estimator, when calibration is done using regression estimation or raking. The method is illustrated in a real-world application, in which the demographic composition of two surveys needs to be harmonized to improve the comparability of the survey estimates.

    Release date: 2022-01-06

  • Articles and reports: 12-001-X202000100001
    Description:

    For several decades, national statistical agencies around the world have been using probability surveys as their preferred tool to meet information needs about a population of interest. In the last few years, there has been a wind of change and other data sources are being increasingly explored. Five key factors are behind this trend: the decline in response rates in probability surveys, the high cost of data collection, the increased burden on respondents, the desire for access to “real-time” statistics, and the proliferation of non-probability data sources. Some people have even come to believe that probability surveys could gradually disappear. In this article, we review some approaches that can reduce, or even eliminate, the use of probability surveys, all the while preserving a valid statistical inference framework. All the approaches we consider use data from a non-probability source; data from a probability survey are also used in most cases. Some of these approaches rely on the validity of model assumptions, which contrasts with approaches based on the probability sampling design. These design-based approaches are generally not as efficient; yet, they are not subject to the risk of bias due to model misspecification.

    Release date: 2020-06-30

  • Articles and reports: 12-001-X201800254956
    Description:

    In Italy, the Labor Force Survey (LFS) is conducted quarterly by the National Statistical Institute (ISTAT) to produce estimates of the labor force status of the population at different geographical levels. In particular, ISTAT provides LFS estimates of employed and unemployed counts for local Labor Market Areas (LMAs). LMAs are 611 sub-regional clusters of municipalities and are unplanned domains for which direct estimates have overly large sampling errors. This implies the need of Small Area Estimation (SAE) methods. In this paper, we develop a new area level SAE method that uses a Latent Markov Model (LMM) as linking model. In LMMs, the characteristic of interest, and its evolution in time, is represented by a latent process that follows a Markov chain, usually of first order. Therefore, areas are allowed to change their latent state across time. The proposed model is applied to quarterly data from the LFS for the period 2004 to 2014 and fitted within a hierarchical Bayesian framework using a data augmentation Gibbs sampler. Estimates are compared with those obtained by the classical Fay-Herriot model, by a time-series area level SAE model, and on the basis of data coming from the 2011 Population Census.

    Release date: 2018-12-20

  • Articles and reports: 12-001-X201800154928
    Description:

    A two-phase process was used by the Substance Abuse and Mental Health Services Administration to estimate the proportion of US adults with serious mental illness (SMI). The first phase was the annual National Survey on Drug Use and Health (NSDUH), while the second phase was a random subsample of adult respondents to the NSDUH. Respondents to the second phase of sampling were clinically evaluated for serious mental illness. A logistic prediction model was fit to this subsample with the SMI status (yes or no) determined by the second-phase instrument treated as the dependent variable and related variables collected on the NSDUH from all adults as the model’s explanatory variables. Estimates were then computed for SMI prevalence among all adults and within adult subpopulations by assigning an SMI status to each NSDUH respondent based on comparing his (her) estimated probability of having SMI to a chosen cut point on the distribution of the predicted probabilities. We investigate alternatives to this standard cut point estimator such as the probability estimator. The latter assigns an estimated probability of having SMI to each NSDUH respondent. The estimated prevalence of SMI is the weighted mean of those estimated probabilities. Using data from NSDUH and its subsample, we show that, although the probability estimator has a smaller mean squared error when estimating SMI prevalence among all adults, it has a greater tendency to be biased at the subpopulation level than the standard cut point estimator.

    Release date: 2018-06-21

  • Articles and reports: 12-001-X201700254872
    Description:

    This note discusses the theoretical foundations for the extension of the Wilson two-sided coverage interval to an estimated proportion computed from complex survey data. The interval is shown to be asymptotically equivalent to an interval derived from a logistic transformation. A mildly better version is discussed, but users may prefer constructing a one-sided interval already in the literature.

    Release date: 2017-12-21
Reference (8)

Reference (8) ((8 results))

No content available at this time.

Date modified: