Results

All (901) (10 to 20 of 901 results)

  • Articles and reports: 12-001-X202400100010
    Description: This discussion summarizes the interesting new findings around measurement errors in opt-in surveys by Kennedy, Mercer and Lau (KML). While KML enlighten readers about “bogus responding” and its possible patterns, this discussion suggests combining these new-found results with other avenues of research in nonprobability sampling, such as improvement of representativeness.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100011
    Description: Kennedy, Mercer, and Lau explore misreporting by respondents in non-probability samples and discover a new feature, namely that of deliberate misreporting of demographic characteristics. This finding suggests that the “arms race” between researchers and those determined to disrupt the practice of social science is not over and researchers need to account for such respondents if using high-quality probability surveys to help reduce error in non-probability samples.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100012
    Description: Nonprobability samples are quick and low-cost and have become popular for some types of survey research. Kennedy, Mercer and Lau examine data quality issues associated with opt-in nonprobability samples frequently used in the United States. They show that the estimates from these samples have serious problems that go beyond representativeness. A total survey error perspective is important for evaluating all types of surveys.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100013
    Description: Statistical approaches developed for nonprobability samples generally focus on nonrandom selection as the primary reason survey respondents might differ systematically from the target population. Well-established theory states that in these instances, by conditioning on the necessary auxiliary variables, selection can be rendered ignorable and survey estimates will be free of bias. But this logic rests on the assumption that measurement error is nonexistent or small. In this study we test this assumption in two ways. First, we use a large benchmarking study to identify subgroups for which errors in commercial, online nonprobability samples are especially large in ways that are unlikely due to selection effects. Then we present a follow-up study examining one cause of the large errors: bogus responding (i.e., survey answers that are fraudulent, mischievous or otherwise insincere). We find that bogus responding, particularly among respondents identifying as young or Hispanic, is a significant and widespread problem in commercial, online nonprobability samples, at least in the United States. This research highlights the need for statisticians working with commercial nonprobability samples to address bogus responding and issues of representativeness – not just the latter.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100014
    Description: This paper is an introduction to the special issue on the use of nonprobability samples featuring three papers that were presented at the 29th Morris Hansen Lecture by Courtney Kennedy, Yan Li and Jean-François Beaumont.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202300200001
    Description: When a Medicare healthcare provider is suspected of billing abuse, a population of payments X made to that provider over a fixed timeframe is isolated. A certified medical reviewer, in a time-consuming process, can determine the overpayment Y = X - (amount justified by the evidence) associated with each payment. Typically, there are too many payments in the population to examine each with care, so a probability sample is selected. The sample overpayments are then used to calculate a 90% lower confidence bound for the total population overpayment. This bound is the amount demanded for recovery from the provider. Unfortunately, classical methods for calculating this bound sometimes fail to provide the 90% confidence level, especially when using a stratified sample.

    In this paper, 166 redacted samples from Medicare integrity investigations are displayed and described, along with 156 associated payment populations. The 7,588 examined (Y, X) sample pairs show (1) Medicare audits have high error rates: more than 76% of these payments were considered to have been paid in error; and (2) the patterns in these samples support an “All-or-Nothing” mixture model for (Y, X) previously defined in the literature. Model-based Monte Carlo testing procedures for Medicare sampling plans are discussed, as well as stratification methods based on anticipated model moments. In terms of viability (achieving the 90% confidence level) a new stratification method defined here is competitive with the best of the many existing methods tested and seems less sensitive to choice of operating parameters. In terms of overpayment recovery (equivalent to precision) the new method is also comparable to the best of the many existing methods tested. Unfortunately, no stratification algorithm tested was ever viable for more than about half of the 104 test populations.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200002
    Description: Being able to quantify the accuracy (bias, variance) of published output is crucial in official statistics. Output in official statistics is nearly always divided into subpopulations according to some classification variable, such as mean income by categories of educational level. Such output is also referred to as domain statistics. In the current paper, we limit ourselves to binary classification variables. In practice, misclassifications occur and these contribute to the bias and variance of domain statistics. Existing analytical and numerical methods to estimate this effect have two disadvantages. The first disadvantage is that they require that the misclassification probabilities are known beforehand and the second is that the bias and variance estimates are biased themselves. In the current paper we present a new method, a Gaussian mixture model estimated by an Expectation-Maximisation (EM) algorithm combined with a bootstrap, referred to as the EM bootstrap method. This new method does not require that the misclassification probabilities are known beforehand, although it is more efficient when a small audit sample is used that yields a starting value for the misclassification probabilities in the EM algorithm. We compared the performance of the new method with currently available numerical methods: the bootstrap method and the SIMEX method. Previous research has shown that for non-linear parameters the bootstrap outperforms the analytical expressions. For nearly all conditions tested, the bias and variance estimates that are obtained by the EM bootstrap method are closer to their true values than those obtained by the bootstrap and SIMEX methods. We end this paper by discussing the results and possible future extensions of the method.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200003
    Description: We investigate small area prediction of general parameters based on two models for unit-level counts. We construct predictors of parameters, such as quartiles, that may be nonlinear functions of the model response variable. We first develop a procedure to construct empirical best predictors and mean square error estimators of general parameters under a unit-level gamma-Poisson model. We then use a sampling importance resampling algorithm to develop predictors for a generalized linear mixed model (GLMM) with a Poisson response distribution. We compare the two models through simulation and an analysis of data from the Iowa Seat-Belt Use Survey.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200004
    Description: We present a novel methodology to benchmark county-level estimates of crop area totals to a preset state total subject to inequality constraints and random variances in the Fay-Herriot model. For planted area of the National Agricultural Statistics Service (NASS), an agency of the United States Department of Agriculture (USDA), it is necessary to incorporate the constraint that the estimated totals, derived from survey and other auxiliary data, are no smaller than administrative planted area totals prerecorded by USDA agencies other than NASS. These administrative totals are treated as fixed and known, and this additional coherence requirement adds to the complexity of benchmarking the county-level estimates. A fully Bayesian analysis of the Fay-Herriot model offers an appealing way to incorporate the inequality and benchmarking constraints, and to quantify the resulting uncertainties, but sampling from the posterior densities involves difficult integration, and reasonable approximations must be made. First, we describe a single-shrinkage model, shrinking the means while the variances are assumed known. Second, we extend this model to accommodate double shrinkage, borrowing strength across means and variances. This extended model has two sources of extra variation, but because we are shrinking both means and variances, it is expected that this second model should perform better in terms of goodness of fit (reliability) and possibly precision. The computations are challenging for both models, which are applied to simulated data sets with properties resembling the Illinois corn crop.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202300200005
    Description: Population undercoverage is one of the main hurdles faced by statistical analysis with non-probability survey samples. We discuss two typical scenarios of undercoverage, namely, stochastic undercoverage and deterministic undercoverage. We argue that existing estimation methods under the positivity assumption on the propensity scores (i.e., the participation probabilities) can be directly applied to handle the scenario of stochastic undercoverage. We explore strategies for mitigating biases in estimating the mean of the target population under deterministic undercoverage. In particular, we examine a split population approach based on a convex hull formulation, and construct estimators with reduced biases. A doubly robust estimator can be constructed if a follow-up subsample of the reference probability survey with measurements on the study variable becomes feasible. The performance of six competing estimators is investigated through a simulation study, and issues that require further investigation are briefly discussed.
    Release date: 2024-01-03

Articles and reports (897) (0 to 10 of 897 results)

  • Articles and reports: 12-001-X202400100001
    Description: Inspired by the two excellent discussions of our paper, we offer some new insights and developments into the problem of estimating participation probabilities for non-probability samples. First, we propose an improvement of the method of Chen, Li and Wu (2020), based on best linear unbiased estimation theory, that more efficiently leverages the available probability and non-probability sample data. We also develop a sample likelihood approach, similar in spirit to the method of Elliott (2009), that properly accounts for the overlap between both samples when it can be identified in at least one of the samples. We use best linear unbiased prediction theory to handle the scenario where the overlap is unknown. Interestingly, our two proposed approaches coincide in the case of unknown overlap. Then, we show that many existing methods can be obtained as a special case of a general unbiased estimating function. Finally, we conclude with some comments on nonparametric estimation of participation probabilities.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100002
    Description: We provide comparisons among three parametric methods for the estimation of participation probabilities and some brief comments on homogeneous groups and post-stratification.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100003
    Description: Beaumont, Bosa, Brennan, Charlebois and Chu (2024) propose innovative model selection approaches for estimation of participation probabilities for non-probability sample units. We focus our discussion on the choice of a likelihood and parameterization of the model, which are key for the effectiveness of the techniques developed in the paper. We consider alternative likelihood- and pseudo-likelihood-based methods for estimation of participation probabilities and present simulations implementing and comparing the AIC-based variable selection. We demonstrate that, under important practical scenarios, the approach based on a likelihood formulated over the observed pooled non-probability and probability samples performed better than the pseudo-likelihood-based alternatives. The contrast in sensitivity of the AIC criteria is especially large for small probability sample sizes and low overlap in covariate domains.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100004
    Description: Non-probability samples are being increasingly explored in National Statistical Offices as an alternative to probability samples. However, it is well known that the use of a non-probability sample alone may produce estimates with significant bias due to the unknown nature of the underlying selection mechanism. Bias reduction can be achieved by integrating data from the non-probability sample with data from a probability sample provided that both samples contain auxiliary variables in common. We focus on inverse probability weighting methods, which involve modelling the probability of participation in the non-probability sample. First, we consider the logistic model along with pseudo maximum likelihood estimation. We propose a variable selection procedure based on a modified Akaike Information Criterion (AIC) that properly accounts for the data structure and the probability sampling design. We also propose a simple rank-based method of forming homogeneous post-strata. Then, we extend the Classification and Regression Trees (CART) algorithm to this data integration scenario, while again properly accounting for the probability sampling design. A bootstrap variance estimator is proposed that reflects two sources of variability: the probability sampling design and the participation model. Our methods are illustrated using Statistics Canada’s crowdsourcing and survey data.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100005
    Description: In this rejoinder, I address the comments from the discussants, Dr. Takumi Saegusa, Dr. Jae-Kwang Kim and Ms. Yonghyun Kwon. Dr. Saegusa’s comments about the differences between the conditional exchangeability (CE) assumption for causal inferences versus the CE assumption for finite population inferences using nonprobability samples, and the distinction between design-based versus model-based approaches for finite population inference using nonprobability samples, are elaborated and clarified in the context of my paper. Subsequently, I respond to Dr. Kim and Ms. Kwon’s comprehensive framework for categorizing existing approaches for estimating propensity scores (PS) into conditional and unconditional approaches. I expand their simulation studies to vary the sampling weights, allow for misspecified PS models, and include an additional estimator, i.e., scaled adjusted logistic propensity estimator (Wang, Valliant and Li (2021), denoted by sWBS). In my simulations, it is observed that the sWBS estimator consistently outperforms or is comparable to the other estimators under the misspecified PS model. The sWBS, as well as WBS or ABS described in my paper, do not assume that the overlapped units in both the nonprobability and probability reference samples are negligible, nor do they require the identification of overlap units as needed by the estimators proposed by Dr. Kim and Ms. Kwon.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100006
    Description: In some of the non-probability sample literature, the conditional exchangeability assumption is considered to be necessary for valid statistical inference. This assumption is rooted in causal inference, though its potential outcome framework differs greatly from that of non-probability samples. We describe similarities and differences between the two frameworks and discuss issues to consider when adopting the conditional exchangeability assumption in non-probability sample setups. We also discuss the role of finite population inference in different approaches to propensity score and outcome regression modeling for non-probability samples.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100007
    Description: Pseudo weight construction for data integration can be understood in the two-phase sampling framework. Using the two-phase sampling framework, we discuss two approaches to the estimation of propensity scores and develop a new way to construct the propensity score function for data integration using the conditional maximum likelihood method. Results from a limited simulation study are also presented.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100008
    Description: Nonprobability samples emerge rapidly to address time-sensitive priority topics in different areas. These data are timely but subject to selection bias. To reduce selection bias, a wide literature in survey research has investigated the use of propensity-score (PS) adjustment methods to improve the population representativeness of nonprobability samples, using probability-based survey samples as external references. The conditional exchangeability (CE) assumption is one of the key assumptions required by PS-based adjustment methods. In this paper, I first explore the validity of the CE assumption conditional on various balancing score estimates that are used in existing PS-based adjustment methods. An adaptive balancing score is proposed for unbiased estimation of population means. The population mean estimators under the three CE assumptions are evaluated via Monte Carlo simulation studies and illustrated using the NIH SARS-CoV-2 seroprevalence study to estimate the proportion of U.S. adults with COVID-19 antibodies from April 1 to August 4, 2020.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100009
    Description: Our comments respond to discussion from Sen, Brick, and Elliott. We weigh the potential upside and downside of Sen’s suggestion of using machine learning to identify bogus respondents through interactions and improbable combinations of variables. We join Brick in reflecting on bogus respondents’ impact on the state of commercial nonprobability surveys. Finally, we consider Elliott’s discussion of solutions to the challenge raised in our study.
    Release date: 2024-06-25


Journals and periodicals (4) (4 results)

  • Journals and periodicals: 12-001-X
    Geography: Canada
    Description: The journal publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves.
    Release date: 2024-06-25

  • Journals and periodicals: 11-008-X
    Geography: Canada
    Description: This publication discusses the social, economic, and demographic changes affecting the lives of Canadians. Free downloadable PDF and HTML files: published every six weeks. Printed issue: published every six months (twice per year).
    Release date: 2012-07-30

  • Journals and periodicals: 11-010-X
    Geography: Canada
    Description: This monthly periodical is Statistics Canada's flagship publication for economic statistics. Each issue contains a monthly summary of the economy, major economic events and a feature article. A statistical summary contains a wide range of tables and graphs on the principal economic indicators for Canada, the provinces and the major industrial nations. A historical listing of this same data is contained in the Canadian economic observer: historical supplement (Catalogue no. 11-210-XPB and XIB).
    Release date: 2012-06-15

  • Journals and periodicals: 87-003-X
    Geography: Canada
    Description: Travel-log is a quarterly tourism newsletter that examines international travel trends, international travel accounts and the travel price index. It also features the latest tourism indicators and includes feature articles related to tourism.
    Release date: 2005-01-26