Results

All (91)

Stats in brief (8 results)

  • Stats in brief: 89-20-00082021001
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use SAS to perform the dominance and homogeneity test while using the Census.
    Release date: 2022-04-29

  • Stats in brief: 89-20-00082021002
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use SAS to create proportion output for researchers working with confidential data.
    Release date: 2022-04-27

  • Stats in brief: 89-20-00082021003
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use Stata to create proportion output for researchers working with confidential data.
    Release date: 2022-04-27

  • Stats in brief: 89-20-00082021004
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use Stata to perform the dominance and homogeneity test while using the Census.
    Release date: 2022-04-27

  • Stats in brief: 89-20-00082021005
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use R to create proportion output for researchers working with confidential data.
    Release date: 2022-04-27

  • Stats in brief: 89-20-00082021006
    Description: This video is part of the confidentiality vetting support series and presents examples of how to use R to perform the dominance and homogeneity test while using the Census.
    Release date: 2022-04-27

  • Stats in brief: 11-627-M2020072
    Description: This infographic provides an overview of the Canadian Research and Development Classification (CRDC), a national standard jointly developed by the Canada Foundation for Innovation (CFI), the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), the Social Sciences and Humanities Research Council of Canada (SSHRC), and Statistics Canada.
    Release date: 2020-10-05

  • Stats in brief: 11-627-M2020051
    Description: This infographic provides an overview of national statistical standards, explaining what they are and where they are used, the advantages of using them, and the role they play in the collection and dissemination of disaggregated data.
    Release date: 2020-07-24
Articles and reports (82) (1 to 10 of 82 results)

  • Articles and reports: 11-522-X202200100008
    Description: The publication of more disaggregated data can increase transparency and provide important information on underrepresented groups. Developing more readily available access options increases the amount of information available to and produced by researchers. Increasing the breadth and depth of the information released allows for a better representation of the Canadian population, but also puts a greater responsibility on Statistics Canada to do this in a way that preserves confidentiality, and thus it is helpful to develop tools which allow Statistics Canada to quantify the risk from the additional data granularity. In an effort to evaluate the risk of a database reconstruction attack on Statistics Canada’s published Census data, this investigation follows the strategy of the US Census Bureau, who outlined a method to use a Boolean satisfiability (SAT) solver to reconstruct individual attributes of residents of a hypothetical US Census block, based just on a table of summary statistics. The technique is expanded to attempt to reconstruct a small fraction of Statistics Canada’s Census microdata. This paper will discuss the findings of the investigation, the challenges involved in mounting a reconstruction attack, and the effect of an existing confidentiality measure in mitigating these attacks. Furthermore, the existing strategy is compared to other potential methods used to protect data – in particular, releasing tabular data perturbed by some random mechanism, such as those suggested by differential privacy.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100013
    Description: Respondents to typical household surveys tend to significantly underreport their potential use of food aid distributed by associations. This underreporting is most likely related to the social stigma felt by people experiencing great financial difficulty. As a result, survey estimates of the number of recipients of that aid are much lower than the direct counts from the associations. Those counts, in turn, tend to overestimate because of double counting. Through its adapted protocol, the Enquête Aide alimentaire (EAA), conducted in late 2021 in France at a sample of food aid distribution sites run by associations, controls for the biases that affect the other sources and determines the extent to which this aid is used.
    Release date: 2024-03-25

  • Articles and reports: 12-001-X202300200017
    Description: Jean-Claude Deville, who passed away in October 2021, was one of the most influential researchers in the field of survey statistics over the past 40 years. This article traces some of his contributions that have had a profound impact on both survey theory and practice. It covers balanced sampling using the cube method, calibration, the weight-sharing method, the development of variance expressions for complex estimators using the influence function, and quota sampling.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X202200200009
    Description: Multiple imputation (MI) is a popular approach for dealing with missing data arising from non-response in sample surveys. Multiple imputation by chained equations (MICE) is one of the most widely used MI algorithms for multivariate data, but it lacks theoretical foundation and is computationally intensive. Recently, missing data imputation methods based on deep learning models have been developed with encouraging results in small studies. However, there has been limited research on evaluating their performance in realistic settings compared to MICE, particularly in big surveys. We conduct extensive simulation studies based on a subsample of the American Community Survey to compare the repeated sampling properties of four machine learning based MI methods: MICE with classification trees, MICE with random forests, generative adversarial imputation networks, and multiple imputation using denoising autoencoders. We find the deep learning imputation methods are superior to MICE in terms of computational time. However, with the default choice of hyperparameters in the common software packages, MICE with classification trees consistently outperforms, often by a large margin, the deep learning imputation methods in terms of bias, mean squared error, and coverage under a range of realistic settings.
    Release date: 2022-12-15

  • Articles and reports: 11-522-X202100100016
    Description: To build data capacity and address the U.S. opioid public health emergency, the National Center for Health Statistics received funding for two projects. The projects involve development of algorithms that use all available structured and unstructured data submitted for the 2016 National Hospital Care Survey (NHCS) to enhance identification of opioid-involvement and the presence of co-occurring disorders (coexistence of a substance use disorder and a mental health issue). A description of the algorithm development process is provided, and lessons learned from integrating data science methods like natural language processing to produce official statistics are presented. Efforts to make the algorithms and analytic datafiles accessible to researchers are also discussed.
    Key Words: Opioids; Co-Occurring Disorders; Data Science; Natural Language Processing; Hospital Care
    Release date: 2021-10-22

  • Articles and reports: 18-001-X2020001
    Description: This paper presents the methodology used to generate the first nationwide database of proximity measures and the results obtained with a first set of ten measures. The computational methods are presented as a generalizable model because similar methods can now be applied to a multitude of other services or amenities, in a variety of alternative specifications.
    Release date: 2021-02-15

  • Articles and reports: 11-633-X2021001
    Description: Using data from the Canadian Housing Survey, this project aimed to construct a measure of social inclusion, using indicators identified by the Canada Mortgage and Housing Corporation (CMHC), to report a social inclusion score for each geographic stratum separately for dwellings that are and are not in social and affordable housing. This project also sought to examine associations between social inclusion and a set of economic, social and health variables.
    Release date: 2021-01-05

  • Articles and reports: 12-001-X201900300006
    Description: High nonresponse is a very common problem in sample surveys today. In statistical terms we are worried about increased bias and variance of estimators for population quantities such as totals or means. Different methods have been suggested in order to compensate for this phenomenon. We can roughly divide them into imputation and calibration and it is the latter approach we will focus on here. A wide spectrum of possibilities is included in the class of calibration estimators. We explore linear calibration, where we suggest using a nonresponse version of the design-based optimal regression estimator. Comparisons are made between this estimator and a GREG type estimator. Distance measures play a very important part in the construction of calibration estimators. We show that an estimator of the average response propensity (probability) can be included in the “optimal” distance measure under nonresponse, which will help to reduce the bias of the resulting estimator. To illustrate empirically the theoretically derived results for the suggested estimators, a simulation study has been carried out. The population is called KYBOK and consists of clerical municipalities in Sweden, where the variables include financial as well as size measurements. The results are encouraging for the “optimal” estimator in combination with the estimated average response propensity, where the bias was reduced for most of the Poisson sampling cases in the study.
    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900200005
    Description: We present an approach for imputation of missing items in multivariate categorical data nested within households. The approach relies on a latent class model that (i) allows for household-level and individual-level variables, (ii) ensures that impossible household configurations have zero probability in the model, and (iii) can preserve multivariate distributions both within households and across households. We present a Gibbs sampler for estimating the model and generating imputations. We also describe strategies for improving the computational efficiency of the model estimation. We illustrate the performance of the approach with data that mimic the variables collected in typical population censuses.
    Release date: 2019-06-27

  • Articles and reports: 11-633-X2019002
    Description: Survey data collection through mobile devices, such as tablets and smartphones, is underway in Canada. However, little is known about the representativeness of the data collected through these devices. In March 2017, Statistics Canada commissioned survey data collection through the Carrot Rewards Application and included 11 questions on the Carrot Rewards Mobile App Survey (Carrot) drawn from the 2017 Canadian Community Health Survey (CCHS).
    Release date: 2019-06-04
Journals and periodicals (1 result)

  • Journals and periodicals: 84F0013X
    Geography: Canada, Province or territory
    Description: This study was initiated to test the validity of probabilistic linkage methods used at Statistics Canada. It compared the results of data linkages on infant deaths in Canada with infant death data from Nova Scotia and Alberta. It also compared the availability of fetal deaths on the national and provincial files.
    Release date: 1999-10-08