Results

All (24)

  • Articles and reports: 12-001-X202400100009
    Description: Our comments respond to discussion from Sen, Brick, and Elliott. We weigh the potential upside and downside of Sen’s suggestion of using machine learning to identify bogus respondents through interactions and improbable combinations of variables. We join Brick in reflecting on bogus respondents’ impact on the state of commercial nonprobability surveys. Finally, we consider Elliott’s discussion of solutions to the challenge raised in our study.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100011
    Description: Kennedy, Mercer, and Lau explore misreporting by respondents in non-probability samples and discover a new feature, namely deliberate misreporting of demographic characteristics. This finding suggests that the “arms race” between researchers and those determined to disrupt the practice of social science is not over, and researchers need to account for such respondents when using high-quality probability surveys to help reduce error in non-probability samples.
    Release date: 2024-06-25

  • Articles and reports: 12-001-X202400100013
    Description: Statistical approaches developed for nonprobability samples generally focus on nonrandom selection as the primary reason survey respondents might differ systematically from the target population. Well-established theory states that in these instances, by conditioning on the necessary auxiliary variables, selection can be rendered ignorable and survey estimates will be free of bias. But this logic rests on the assumption that measurement error is nonexistent or small. In this study we test this assumption in two ways. First, we use a large benchmarking study to identify subgroups for which errors in commercial, online nonprobability samples are especially large in ways that are unlikely to be due to selection effects. Then we present a follow-up study examining one cause of the large errors: bogus responding (i.e., survey answers that are fraudulent, mischievous or otherwise insincere). We find that bogus responding, particularly among respondents identifying as young or Hispanic, is a significant and widespread problem in commercial, online nonprobability samples, at least in the United States. This research highlights the need for statisticians working with commercial nonprobability samples to address bogus responding and issues of representativeness – not just the latter.
    Release date: 2024-06-25

  • Journals and periodicals: 12-206-X
    Description: This report summarizes the annual achievements of the Methodology Research and Development Program (MRDP) sponsored by the Modern Statistical Methods and Data Science Branch at Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the agency’s statistical programs; these activities would otherwise be less likely to be carried out during the provision of regular methodology services to those programs. The MRDP also includes activities that provide support in the application of past successful developments in order to promote the use of the results of research and development work. Selected prospective research activities are also presented.
    Release date: 2023-10-11

  • Articles and reports: 12-001-X202100100007
    Description: We consider the estimation of a small area mean under the basic unit-level model. The sum of the resulting model-dependent estimators may not add up to estimates obtained with a direct survey estimator that is deemed to be accurate for the union of these small areas. Benchmarking forces the model-based estimators to agree with the direct estimator at the aggregated area level. The generalized regression estimator is the direct estimator that we benchmark to. In this paper we compare small area benchmarked estimators based on four procedures. The first procedure produces benchmarked estimators by ratio adjustment. The second procedure is based on the empirical best linear unbiased estimator obtained under the unit-level model augmented with a suitable variable that ensures benchmarking. The third procedure uses pseudo-empirical estimators constructed with suitably chosen sampling weights so that, when aggregated, they agree with the reliable direct estimator for the larger area. The fourth procedure produces benchmarked estimators that are the result of a minimization problem subject to the constraint given by the benchmark condition. These benchmark procedures are applied to the small area estimators when the sampling rates are non-negligible. The resulting benchmarked estimators are compared in terms of relative bias and mean squared error using both a design-based simulation study and an example with real survey data.
    (An illustrative sketch of the ratio-adjustment procedure appears after this results list.)
    Release date: 2021-06-24

  • Surveys and statistical programs – Documentation: 12-539-X
    Description: This document brings together guidelines and checklists on many issues that need to be considered in the pursuit of quality objectives in the execution of statistical activities. Its focus is on how to assure quality through effective and appropriate design or redesign of a statistical project or program from inception through to data evaluation, dissemination and documentation. These guidelines draw on the collective knowledge and experience of many Statistics Canada employees. It is expected that Quality Guidelines will be useful to staff engaged in the planning and design of surveys and other statistical projects, as well as to those who evaluate and analyze the outputs of these projects.
    Release date: 2019-12-04

  • Articles and reports: 12-001-X201900200002
    Description: The National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) is responsible for estimating average cash rental rates at the county level. A cash rental rate refers to the market value of land rented on a per acre basis for cash only. Estimates of cash rental rates are useful to farmers, economists, and policy makers. NASS collects data on cash rental rates using a Cash Rent Survey. Because realized sample sizes at the county level are often too small to support reliable direct estimators, predictors based on mixed models are investigated. We specify a bivariate model to obtain predictors of 2010 cash rental rates for non-irrigated cropland using data from the 2009 Cash Rent Survey and auxiliary variables from external sources such as the 2007 Census of Agriculture. We use Bayesian methods for inference and present results for Iowa, Kansas, and Texas. Incorporating the 2009 survey data through a bivariate model leads to predictors with smaller mean squared errors than predictors based on a univariate model.
    Release date: 2019-06-27

  • Articles and reports: 13-605-X201900100004
    Description: The revisions to the National Tourism Indicators are the result of new benchmarks from the 2015 supply and use tables and revisions to the Canadian System of Macroeconomic Accounts. Constant dollar estimates were also updated to base year 2012.
    Release date: 2019-03-28

  • Articles and reports: 12-001-X201800154927
    Description: Benchmarking monthly or quarterly series to annual data is a common practice in many national statistical institutes. The benchmarking problem arises when time series data for the same target variable are measured at different frequencies and there is a need to remove discrepancies between the sums of the sub-annual values and their annual benchmarks. Several benchmarking methods are available in the literature. The Growth Rates Preservation (GRP) benchmarking procedure is often considered the best method, and it is frequently claimed to be grounded on an ideal movement preservation principle. However, we show that GRP has important drawbacks, relevant for practical applications, that have not been recognized in the literature. We also consider alternative benchmarking models that do not suffer from some of GRP’s side effects.
    (An illustrative sketch of a simpler pro-rata benchmarking adjustment appears after this results list.)
    Release date: 2018-06-21

  • Articles and reports: 12-001-X201500114193
    Description: Imputed micro data often contain conflicting information. The situation may arise, for example, from partial imputation, where one part of the imputed record consists of the observed values of the original record and the other part of the imputed values. Edit rules that involve variables from both parts of the record will often be violated. Alternatively, inconsistency may be caused by adjustment for errors in the observed data, also referred to as imputation in editing. Under the assumption that the remaining inconsistency is not due to systematic errors, we propose to make adjustments to the micro data such that all constraints are simultaneously satisfied and the adjustments are minimal according to a chosen distance metric. Different approaches to the distance metric are considered, as well as several extensions of the basic situation, including the treatment of categorical data, unit imputation and macro-level benchmarking. The properties and interpretations of the proposed methods are illustrated using business-economic data.
    (An illustrative sketch of a minimal least-squares adjustment under linear edit rules appears after this results list.)
    Release date: 2015-06-29
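
The sketch below is not part of any catalogue entry; it illustrates the ratio-adjustment benchmarking procedure mentioned in 12-001-X202100100007 above, under the simplifying assumption that model-based area totals are scaled by a single common factor so that they add up to a reliable direct estimate for the larger area. The function name and all figures are invented for illustration; the paper itself works with area means, design weights and non-negligible sampling rates, which this sketch ignores.

```python
import numpy as np

def ratio_benchmark(model_totals, direct_total):
    """Scale model-based small area totals by one common factor so that
    they add up to the direct (benchmark) estimate for the larger area."""
    model_totals = np.asarray(model_totals, dtype=float)
    factor = direct_total / model_totals.sum()   # common adjustment ratio
    return factor * model_totals                 # benchmarked totals

# Illustrative numbers: three model-based area totals summing to 290,
# benchmarked to a direct estimate of 300 for their union.
benchmarked = ratio_benchmark([120.0, 95.0, 75.0], 300.0)
print(benchmarked, benchmarked.sum())            # the sum is 300 by construction
```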
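
The next sketch relates to 12-001-X201800154927. It does not implement the GRP procedure discussed there, which requires solving a nonlinear optimization problem that preserves period-to-period growth rates; it only illustrates the benchmarking problem itself using the simplest possible method, pro-rata adjustment of each year's monthly values to that year's annual benchmark. The function name and all figures are invented for illustration.

```python
import numpy as np

def prorata_benchmark(monthly, annual_benchmarks):
    """Pro-rata benchmarking: scale each year's 12 monthly values by a single
    factor so that they sum to that year's annual benchmark. Unlike GRP or
    Denton-type methods, this makes no attempt to preserve month-to-month
    growth rates across year boundaries (the well-known 'step problem')."""
    monthly = np.asarray(monthly, dtype=float).reshape(len(annual_benchmarks), 12)
    factors = np.asarray(annual_benchmarks, dtype=float) / monthly.sum(axis=1)
    return (monthly * factors[:, None]).ravel()

# Illustrative data: two years of a flat monthly indicator benchmarked to
# annual totals of 1260 and 1320.
adjusted = prorata_benchmark(np.full(24, 100.0), [1260.0, 1320.0])
print(adjusted[:12].sum(), adjusted[12:].sum())   # 1260.0 1320.0
```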
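
The last sketch relates to 12-001-X201500114193 and shows the basic idea of a minimal adjustment in the special case of a least-squares (Euclidean) distance and linear equality edit rules, where a closed form exists. The paper covers far more than this (other distance metrics, categorical data, unit imputation, macro-level benchmarking); the record, edit rule and variable names below are invented for illustration.

```python
import numpy as np

def minimal_adjust(x0, A, b):
    """Smallest least-squares adjustment of record x0 so that the linear
    edit rules A @ x == b hold exactly:
        minimise ||x - x0||^2  subject to  A x = b.
    Closed form: x = x0 + A' (A A')^{-1} (b - A x0)."""
    x0, A, b = np.asarray(x0, float), np.asarray(A, float), np.asarray(b, float)
    correction = A.T @ np.linalg.solve(A @ A.T, b - A @ x0)
    return x0 + correction

# Illustrative edit rule: turnover must equal costs plus profit.  The partially
# imputed record violates it; the adjustment restores consistency while moving
# the values as little as possible in Euclidean distance.
record = [105.0, 60.0, 40.0]              # turnover, costs, profit
A = np.array([[1.0, -1.0, -1.0]])         # turnover - costs - profit = 0
b = np.array([0.0])
print(minimal_adjust(record, A, b))       # satisfies the edit rule exactly
```
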
Data (0)

Analysis (21)

  • Articles and reports: 12-001-X201300111830
    Description: We consider two different self-benchmarking methods for the estimation of small area means based on the Fay-Herriot (FH) area level model: the method of You and Rao (2002) applied to the FH model and the method of Wang, Fuller and Qu (2008) based on augmented models. We derive an estimator of the mean squared prediction error (MSPE) of the You-Rao (YR) estimator of a small area mean that, under the true model, is correct to second-order terms. We report the results of a simulation study on the relative bias of the MSPE estimator of the YR estimator and the MSPE estimator of the Wang, Fuller and Qu (WFQ) estimator obtained under an augmented model. We also study the MSPE and the estimators of MSPE for the YR and WFQ estimators obtained under a misspecified model.
    Release date: 2013-06-28
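
The sketch below is not the You-Rao or Wang-Fuller-Qu procedure from 12-001-X201300111830; it only illustrates the Fay-Herriot area-level predictor that both self-benchmarking methods build on, with the model variance treated as known for simplicity. All data and names are invented for illustration.

```python
import numpy as np

def fh_predictor(y, X, D, sigma2_v):
    """Basic Fay-Herriot predictor for area means,
        theta_i = gamma_i * y_i + (1 - gamma_i) * x_i' beta,
    with gamma_i = sigma2_v / (sigma2_v + D_i).  The model variance sigma2_v is
    taken as known here; in practice it is estimated, and the You-Rao and
    Wang-Fuller-Qu approaches further modify the predictor so that a
    benchmarking constraint is satisfied by construction."""
    y, X, D = np.asarray(y, float), np.asarray(X, float), np.asarray(D, float)
    w = 1.0 / (sigma2_v + D)                                  # GLS weights
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    gamma = sigma2_v / (sigma2_v + D)
    return gamma * y + (1.0 - gamma) * (X @ beta)

# Illustrative data: five areas, intercept-only model, known sampling variances D.
y = np.array([12.0, 9.5, 15.2, 11.1, 8.7])
X = np.ones((5, 1))
D = np.array([4.0, 1.0, 6.0, 2.0, 1.5])
print(fh_predictor(y, X, D, sigma2_v=3.0))
```
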
Reference (3)

  • Surveys and statistical programs – Documentation: 15-547-X
    Description: Like most statistical agencies, Statistics Canada publishes three Gross Domestic Product (GDP) series: output-based GDP, income-based GDP and expenditure-based GDP. This document describes the concepts, definitions, classifications and statistical methods underlying the output-based GDP series, also known as GDP by industry or simply monthly GDP.

    The report is organized into seven chapters. Chapter 1 defines what GDP by industry is, describes its various uses and how it connects with the other components of the Canadian System of National Accounts. Chapter 2 deals with the calculation of the GDP by industry estimates. Chapter 3 examines industry and commodity classification schemes. Chapter 4 discusses the subject of deflation; the choice of deflators, the role of the base year and the method of rebasing are all addressed in this chapter. Chapter 5 looks at such technical issues as benchmarking, trading day and seasonal adjustment. Chapter 6 is devoted to the presentation of the GDP by industry, detailing the format, release dates and modes of dissemination, as well as the need for and frequency of revisions to the estimates. Finally, Chapter 7 reviews the historical development of monthly GDP from 1926 to the present.

    Release date: 2002-11-29

  • Surveys and statistical programs – Documentation: 13F0031M2001009
    Description: The work on input-output (IO) tables in Canada started in the early 1960s. At the very beginning, it was decided that IO tables must fulfill several roles and provide: (a) an audit and management tool to improve economic statistics for their consistency, accuracy and comprehensiveness; (b) benchmarks for gross domestic product (GDP), its income side and components, its expenditure side and components, and GDP by industry estimates, both at current and constant prices; and (c) a framework for structural analysis.
    Release date: 2001-04-10