Results

All (39) (0 to 10 of 39 results)

  • Articles and reports: 12-001-X202300200003
    Description: We investigate small area prediction of general parameters based on two models for unit-level counts. We construct predictors of parameters, such as quartiles, that may be nonlinear functions of the model response variable. We first develop a procedure to construct empirical best predictors and mean square error estimators of general parameters under a unit-level gamma-Poisson model. We then use a sampling importance resampling algorithm to develop predictors for a generalized linear mixed model (GLMM) with a Poisson response distribution. We compare the two models through simulation and an analysis of data from the Iowa Seat-Belt Use Survey.
    Release date: 2024-01-03
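
A minimal sketch of the idea behind the first predictor described above, assuming a generic conjugate parameterization of a unit-level gamma-Poisson model; the paper's exact specification, its parameter estimation, and its mean square error estimator are not reproduced, and the area counts, hyperparameters, and target quartile below are hypothetical.

```python
import numpy as np
from scipy.stats import poisson

# Sketch only. Assumed model: y_ij | lambda_i ~ Poisson(lambda_i), with a
# conjugate Gamma(shape=a, rate=b) distribution on the area effect lambda_i.
# Conjugacy gives lambda_i | y_i ~ Gamma(a + sum(y_i), b + n_i), so a best
# predictor of a nonlinear parameter h(lambda_i) -- here the third quartile of
# the unit-level count distribution -- can be approximated by averaging h over
# draws from that conditional distribution. In the empirical version, a and b
# would be estimated from all areas rather than fixed as below.
rng = np.random.default_rng(42)

def best_predictor(y_area, a, b, h, n_draws=5000):
    shape, rate = a + np.sum(y_area), b + len(y_area)
    draws = rng.gamma(shape, 1.0 / rate, size=n_draws)   # numpy's gamma takes scale = 1/rate
    return np.mean(h(draws))

y_area = np.array([2, 0, 3, 1, 4])                    # hypothetical counts for one small area
third_quartile = lambda lam: poisson.ppf(0.75, lam)   # Q3 of Poisson(lambda), vectorized
print(best_predictor(y_area, a=2.0, b=1.0, h=third_quartile))
```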

  • Articles and reports: 12-001-X202200200002
    Description:

    We provide a critical review and some extended discussions on theoretical and practical issues with the analysis of non-probability survey samples. We attempt to present rigorous inferential frameworks and valid statistical procedures under commonly used assumptions, and to address issues concerning the justification and verification of assumptions in practical applications. Some current methodological developments are showcased, and problems that require further investigation are mentioned. While the focus of the paper is on non-probability samples, the essential role of probability survey samples with rich and relevant information on auxiliary variables is highlighted.

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202100200005
    Description:

    Variance estimation is a challenging problem in surveys because there are several nontrivial factors contributing to the total survey error, including sampling and unit non-response. Initially devised to capture the variance of nontrivial statistics based on independent and identically distributed data, the bootstrap method has since been adapted in various ways to address survey-specific factors. In this paper we look into one of those variants, the with-replacement bootstrap. We consider household surveys, with or without sub-sampling of individuals. We make explicit the benchmark variance estimators that the with-replacement bootstrap aims at reproducing. We explain how the bootstrap can be used to account for the impact that sampling, the treatment of non-response, and calibration have on the total survey error. For clarity, the proposed methods are illustrated on a running example. They are evaluated through a simulation study and applied to the French Panel for Urban Policy. Two SAS macros to perform the bootstrap methods are also developed.

    Release date: 2022-01-06
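
A minimal sketch of a with-replacement bootstrap of primary sampling units in its common Rao-Wu rescaled form, assuming a stratified design with at least two PSUs per stratum; the non-response and calibration adjustments discussed above, and the SAS macros, are not reproduced, and the data below are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

def with_replacement_bootstrap_variance(y, w, psu, stratum, estimator, n_reps=200):
    """Resample n_h - 1 PSUs with replacement within each stratum, rescale the
    design weights (Rao-Wu adjustment), recompute the estimator, and take the
    variance of the replicate estimates. Assumes >= 2 PSUs per stratum."""
    reps = []
    for _ in range(n_reps):
        w_star = np.zeros_like(w, dtype=float)
        for h in np.unique(stratum):
            in_h = stratum == h
            psus = np.unique(psu[in_h])
            n_h = len(psus)
            draws = rng.choice(psus, size=n_h - 1, replace=True)
            for p in psus:
                times_drawn = np.sum(draws == p)
                mask = in_h & (psu == p)
                w_star[mask] = w[mask] * (n_h / (n_h - 1)) * times_drawn
        reps.append(estimator(y, w_star))
    return np.var(reps, ddof=1)

# Example: bootstrap variance of a weighted mean on simulated household data.
y = rng.normal(size=200)
w = rng.uniform(1, 3, size=200)
psu = rng.integers(0, 10, size=200)      # 10 PSUs ...
stratum = psu % 2                        # ... split over 2 strata (5 PSUs each)
weighted_mean = lambda y, w: np.sum(w * y) / np.sum(w)
print(with_replacement_bootstrap_variance(y, w, psu, stratum, weighted_mean))
```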

  • Articles and reports: 12-001-X201900100002
    Description:

    Item nonresponse is frequently encountered in sample surveys. Hot-deck imputation is commonly used to fill in missing item values within homogeneous groups called imputation classes. We propose a fractional hot-deck imputation procedure and an associated empirical likelihood for inference on the population mean of a function of a variable of interest with missing data, under probability-proportional-to-size sampling with negligible sampling fractions. We derive the limiting distributions of the maximum empirical likelihood estimator and the empirical likelihood ratio, and propose two related asymptotically valid bootstrap procedures to construct confidence intervals for the population mean. Simulation studies show that the proposed bootstrap procedures outperform the customary bootstrap procedures, which are shown to be asymptotically incorrect when the number of random draws in the fractional imputation is fixed. Moreover, the proposed bootstrap procedure based on the empirical likelihood ratio is seen to perform significantly better than the method based on the limiting distribution of the maximum empirical likelihood estimator when the inclusion probabilities vary considerably or when the sample size is not large.

    Release date: 2019-05-07
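
A rough sketch of the fractional hot-deck step described above, assuming equal-probability donor selection within each imputation class and a fixed number of donors per missing item; the empirical likelihood inference, the PPS design, and the proposed bootstrap procedures are not shown, and the data and class labels are made up.

```python
import numpy as np

rng = np.random.default_rng(7)

def fractional_hot_deck(y, observed, classes, M=5):
    """Each missing item receives M donated values drawn (with replacement) from
    the observed units in its imputation class, each carrying fractional weight
    1/M. Assumes every class contains at least one observed donor."""
    values, weights = [], []
    for i in range(len(y)):
        if observed[i]:
            values.append(y[i]); weights.append(1.0)
        else:
            donors = y[observed & (classes == classes[i])]
            for d in rng.choice(donors, size=M, replace=True):
                values.append(d); weights.append(1.0 / M)
    return np.array(values), np.array(weights)

# Example: estimate the mean of a function of y (here y**2) under item nonresponse.
y = rng.normal(10, 2, size=100)
observed = rng.random(100) > 0.3                 # ~70% response
classes = rng.integers(0, 4, size=100)           # 4 imputation classes
vals, wts = fractional_hot_deck(y, observed, classes)
print(np.sum(wts * vals**2) / np.sum(wts))
```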

  • Articles and reports: 12-001-X201900100003
    Description:

    In this short article, I will attempt to provide some highlights of my chancy life as a statistician, in chronological order, spanning over sixty years, from 1954 to the present.

    Release date: 2019-05-07

  • Articles and reports: 12-001-X201800154926
    Description:

    This paper investigates linearization and bootstrap variance estimation for the Gini coefficient and for the change between Gini indexes at two periods of time. For the one-sample case, we use the influence function linearization approach suggested by Deville (1999), the without-replacement bootstrap suggested by Gross (1980) for simple random sampling without replacement, and the with-replacement bootstrap of primary sampling units described in Rao and Wu (1988) for multistage sampling. To obtain a two-sample variance estimator, we use the linearization technique by means of partial influence functions (Goga, Deville and Ruiz-Gazen, 2009). We also develop an extension of the studied bootstrap procedures for two-dimensional sampling. The two approaches are compared on simulated data.

    Release date: 2018-06-21
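
A minimal sketch of a design-weighted Gini estimator via the mean-absolute-difference formulation, paired with a plain with-replacement bootstrap over units for the one-sample case; the influence-function linearization, the without-replacement bootstrap of Gross (1980), and the two-sample extension discussed above are not reproduced, and the income data and weights are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)

def weighted_gini(y, w):
    """G = sum_ij w_i w_j |y_i - y_j| / (2 * (sum_i w_i)^2 * weighted mean of y)."""
    y = np.asarray(y, float); w = np.asarray(w, float)
    mad = np.sum(w[:, None] * w[None, :] * np.abs(y[:, None] - y[None, :]))
    return mad / (2.0 * np.sum(w) ** 2 * (np.sum(w * y) / np.sum(w)))

def unit_bootstrap_se(y, w, stat, n_reps=300):
    """Naive with-replacement resampling of units; a real design would resample
    PSUs or use supplied replicate weights instead."""
    reps = []
    for _ in range(n_reps):
        idx = rng.integers(0, len(y), size=len(y))
        reps.append(stat(y[idx], w[idx]))
    return np.sqrt(np.var(reps, ddof=1))

y = rng.lognormal(mean=10, sigma=0.6, size=300)   # hypothetical incomes
w = rng.uniform(1, 4, size=300)                   # hypothetical design weights
print(weighted_gini(y, w), unit_bootstrap_se(y, w, weighted_gini))
```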

  • Articles and reports: 12-001-X201500214238
    Description:

    Félix-Medina and Thompson (2004) proposed a variant of link-tracing sampling to sample hidden and/or hard-to-detect human populations, such as drug users and sex workers. In their variant, an initial sample of venues is selected and the people found in the sampled venues are asked to name other members of the population to be included in the sample. Those authors derived maximum likelihood estimators of the population size under the assumption that the probability that a person is named by another in a sampled venue (link-probability) does not depend on the named person (homogeneity assumption). In this work we extend their research to the case of heterogeneous link-probabilities and derive unconditional and conditional maximum likelihood estimators of the population size. We also propose profile likelihood and bootstrap confidence intervals for the size of the population. The results of our simulation studies show that, in the presence of heterogeneous link-probabilities, the proposed estimators perform reasonably well provided that relatively large sampling fractions, say larger than 0.5, are used, whereas the estimators derived under the homogeneity assumption perform badly. The outcomes also show that the proposed confidence intervals are not very robust to deviations from the assumed models.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500114200
    Description:

    We consider the observed best prediction (OBP; Jiang, Nguyen and Rao 2011) for small area estimation under the nested-error regression model, where both the mean and variance functions may be misspecified. We show via a simulation study that the OBP may significantly outperform the empirical best linear unbiased prediction (EBLUP) method not just in the overall mean squared prediction error (MSPE) but also in the area-specific MSPE for every one of the small areas. A bootstrap method is proposed for estimating the design-based area-specific MSPE, which is simple and always produces positive MSPE estimates. The performance of the proposed MSPE estimator is evaluated through a simulation study. An application to the Television School and Family Smoking Prevention and Cessation study is considered.

    Release date: 2015-06-29

  • Articles and reports: 12-002-X201400111901
    Description:

    This document is intended for analysts and researchers who are considering doing research with data from a survey where both survey weights and bootstrap weights are provided in the data files. It gives directions, for some selected software packages, on how to get started in using survey weights and bootstrap weights for an analysis of survey data. We give brief directions for obtaining survey-weighted estimates, bootstrap variance estimates (and other desired error quantities), and some typical test statistics for each software package in turn. While these directions are provided only for the chosen examples, we also give information about the range of weighted and bootstrapped analyses that can be carried out by each software package.

    Release date: 2014-08-07
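
A generic sketch of the workflow described above, assuming (hypothetically) a main survey weight column named wt and bootstrap weight columns bsw1 ... bswB already supplied on the file; the package-specific directions in the document itself are not reproduced, and the toy data and replicate weights below are fabricated for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

def weighted_mean(df, ycol, wcol):
    return np.sum(df[wcol] * df[ycol]) / np.sum(df[wcol])

def bootstrap_se(df, ycol, bsw_cols):
    """Bootstrap standard error: compute the statistic once per bootstrap weight
    column and take the standard deviation of the replicate estimates. (Some
    agencies prescribe squared deviations from the full-sample estimate instead;
    follow the documentation shipped with the file.)"""
    reps = np.array([weighted_mean(df, ycol, c) for c in bsw_cols])
    return np.sqrt(np.var(reps, ddof=1))

# Toy stand-in for a survey file that ships a main weight and B bootstrap weights.
n, B = 500, 200
df = pd.DataFrame({"y": rng.normal(50, 10, n), "wt": rng.uniform(50, 150, n)})
for b in range(1, B + 1):
    df[f"bsw{b}"] = df["wt"] * rng.exponential(1.0, n)   # fabricated replicate weights

est = weighted_mean(df, "y", "wt")
se = bootstrap_se(df, "y", [f"bsw{b}" for b in range(1, B + 1)])
print(f"estimate = {est:.2f}, bootstrap SE = {se:.2f}, t = {est / se:.1f}")
```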

  • Articles and reports: 12-001-X201400114003
    Description:

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Applying these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered, unequal-probability-of-selection sample design, and we use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), both of which are stratified, clustered, unequal-probability-of-selection sample designs.

    Release date: 2014-06-27
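
A highly simplified sketch of drawing a synthetic population from a posterior predictive distribution, using the ordinary (unweighted) Bayesian bootstrap of Rubin (1981) as a stand-in; the paper's extension to weighted, stratified, clustered designs is not shown, and the sample values and population size below are invented.

```python
import numpy as np

rng = np.random.default_rng(9)

def synthetic_population(sample, N):
    """One posterior predictive draw of a population of size N under the plain
    Bayesian bootstrap: Dirichlet(1, ..., 1) weights over the n observed units
    define a drawn population distribution, from which N units are resampled."""
    n = len(sample)
    probs = rng.dirichlet(np.ones(n))
    idx = rng.choice(n, size=N, replace=True, p=probs)
    return sample[idx]

sample = rng.lognormal(3, 0.5, size=400)     # hypothetical observed y-values
pop = synthetic_population(sample, N=10_000)
print(pop.mean(), np.quantile(pop, 0.5))
```
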
Data (0) (0 results)

No content available at this time.

Reference (1) (1 result)

  • Surveys and statistical programs – Documentation: 11-522-X19980015017
    Description:

    Longitudinal studies with repeated observations on individuals permit better characterizations of change and assessment of possible risk factors, but there has been little experience applying sophisticated models for longitudinal data to the complex survey setting. We present results from a comparison of different variance estimation methods for random effects models of change in cognitive function among older adults. The sample design is a stratified sample of people aged 65 and older, drawn as part of a community-based study designed to examine risk factors for dementia. The model summarizes the population heterogeneity in overall level and rate of change in cognitive function using random effects for intercept and slope. We discuss an unweighted regression including covariates for the stratification variables, a weighted regression, and bootstrapping; we also present preliminary work on using balanced repeated replication and jackknife repeated replication.

    Release date: 1999-10-22
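
A rough sketch of one of the resampling ideas mentioned above, assuming a simple random-intercept, random-slope model of change and a nonparametric bootstrap that resamples whole individuals; the stratified design, survey weights, balanced repeated replication, and jackknife variants are not reproduced, and the longitudinal data below are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)

# Simulated longitudinal data: 60 people, 4 visits, with person-specific
# intercepts and slopes for a cognitive score ("cog").
n_people, n_visits = 60, 4
pid = np.repeat(np.arange(n_people), n_visits)
time = np.tile(np.arange(n_visits), n_people)
b0 = rng.normal(0, 1.0, n_people)[pid]
b1 = rng.normal(0, 0.2, n_people)[pid]
cog = 25 + b0 + (-0.5 + b1) * time + rng.normal(0, 1.0, len(pid))
df = pd.DataFrame({"id": pid, "time": time, "cog": cog})

def fitted_slope(d):
    """Fixed-effect slope from a random-intercept, random-slope model."""
    res = smf.mixedlm("cog ~ time", d, groups=d["id"], re_formula="~time").fit()
    return res.params["time"]

# Cluster (person-level) bootstrap: resample whole individuals with replacement,
# relabel them so duplicated people stay distinct clusters, and refit.
people = df["id"].unique()
boot = []
for _ in range(60):
    chosen = rng.choice(people, size=len(people), replace=True)
    resampled = pd.concat(
        [df[df["id"] == p].assign(id=k) for k, p in enumerate(chosen)],
        ignore_index=True,
    )
    boot.append(fitted_slope(resampled))

print("slope:", fitted_slope(df), "bootstrap SE:", np.std(boot, ddof=1))
```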