Keyword search

Results

All (56) (0 to 10 of 56 results)

  • Stats in brief: 11-627-M2021092
    Description:

    This infographic provides a high-level overview of Statistics Canada’s Disaggregated Data Action Plan, which will produce detailed statistical information on specific population groups. This plan is essential to highlight the lived experiences of diverse groups of people in Canada, such as women, Indigenous peoples, racialized populations and people living with disabilities.

    Release date: 2021-12-08

  • Articles and reports: 11-633-X2021008
    Description:

    The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years. The IMDB includes Immigration, Refugees and Citizenship Canada (IRCC) administrative records, which contain exhaustive information about immigrants who have been admitted to Canada since 1952. It also includes data about non-permanent residents who have been issued temporary resident permits since 1980. This report will discuss the IMDB data sources, concepts and variables, record linkage, data processing, dissemination, data evaluation and quality indicators, comparability with other immigration datasets, and the analyses possible with the IMDB.

    Release date: 2021-12-06

  • Articles and reports: 11-633-X2021007
    Description:

    Statistics Canada continues to use a variety of data sources to provide neighbourhood-level variables across an expanding set of domains, such as sociodemographic characteristics, income, services and amenities, crime, and the environment. Yet, despite these advances, information on the social aspects of neighbourhoods is still unavailable. In this paper, answers to the Canadian Community Health Survey on respondents’ sense of belonging to their local community were pooled over the four survey years from 2016 to 2019. Individual responses were aggregated up to the census tract (CT) level.

    Release date: 2021-11-16

  • Articles and reports: 75F0002M2021007
    Description:

    This discussion paper describes the proposed methodology for a Northern Market Basket Measure (MBM-N) for Yukon and the Northwest Territories, and identifies research that could be conducted in preparation for the 2023 review. The paper presents initial MBM-N thresholds and provides preliminary poverty estimates for reference years 2018 and 2019. A review period will follow the release of this paper, during which time Statistics Canada and Employment and Social Development Canada will welcome feedback from interested parties and work with experts, stakeholders, Indigenous organizations, and federal, provincial and territorial officials to validate the results.

    Release date: 2021-11-12

  • Articles and reports: 11-522-X202100100010
    Description:

    As part of processing for the 2021 Canadian Census, the write-in responses to 31 census questions must be coded. Up to and including 2016, this was a three-stage process, with an “interactive (human) coding” step as the second stage. This human coding step is both lengthy and expensive, spanning many months and requiring the hiring and training of a large number of temporary employees. With this in mind, for 2021, this stage was either augmented with or replaced entirely by machine learning models using the "fastText" algorithm. This presentation will discuss the implementation of this algorithm and the challenges faced and decisions taken along the way.

    Key Words: Natural Language Processing, Machine Learning, fastText, Coding

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100011
    Description: The ways in which AI may affect the world of official statistics are manifold and Statistics Netherlands (CBS) is actively exploring how it can use AI within its societal role. The paper describes a number of AI-related areas where CBS is currently active: use of AI for its own statistics production and statistical R&D, the development of a national AI monitor, the support of other government bodies with expertise on fair data and fair algorithms, data sharing under safe and secure conditions, and engaging in AI-related collaborations.

    Key Words: Artificial Intelligence; Official Statistics; Data Sharing; Fair Algorithms; AI monitoring; Collaboration.

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100012
    Description: The modernization of price statistics by National Statistical Offices (NSO) such as Statistics Canada focuses on the adoption of alternative data sources that include the near-universe of all products sold in the country, a scale that requires machine learning classification of the data. The process of evaluating classifiers to select appropriate ones for production, as well as monitoring classifiers once in production, needs to be based on robust metrics to measure misclassification. As commonly used metrics such as the Fβ-score may not, in all cases, take into account key aspects applicable to price statistics, such as the unequal importance of categories, a careful consideration of the metric space is necessary to select appropriate methods to evaluate classifiers. This working paper provides insight on the metric space applicable to price statistics and proposes an operational framework to evaluate and monitor classifiers, focusing specifically on the needs of the Canadian Consumer Price Index and demonstrating the discussed metrics using a publicly available dataset.

    Key Words: Consumer price index; supervised classification; evaluation metrics; taxonomy

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100013
    Description: Statistics Canada’s Labour Force Survey (LFS) plays a fundamental role in the mandate of Statistics Canada. The labour market information provided by the LFS is among the most timely and important measures of the Canadian economy’s overall performance. An integral part of the LFS monthly data processing is the coding of respondents’ industry according to the North American Industry Classification System (NAICS), occupation according to the National Occupational Classification (NOC) and the Primary Class of Workers (PCOW). Each month, up to 20,000 records are coded manually. In 2020, Statistics Canada worked on developing machine learning models using fastText to code responses to the LFS questionnaire according to the three classifications mentioned previously. This article will provide an overview of the methodology developed and the results obtained from a potential application of fastText to the LFS coding process.

    Key Words: Machine Learning; Labour Force Survey; Text classification; fastText.

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100029
    Description:

    In line with the path taken by the European Statistical System, Istat is investing in innovative methods to harness Big Data sources and to use them for the production of new and enriched Official Statistics products. Big Data sources are not, in general, directly tractable with traditional statistical techniques: consider specific data types such as images and texts, which are examples of the Variety dimension of Big Data. This motivates and justifies the growing interest of National Statistical Institutes in data science techniques. Istat is currently using data science techniques, including machine learning techniques, in innovation projects and for the publication of experimental statistics. This paper will provide an overview of the main current projects by Istat and will focus on two specific Big Data-based production pipelines, related to the processing of text sources and imagery sources, respectively. The paper will highlight the main challenges of these two pipelines and the solutions put in place to address them.

    Key Words: Machine Learning; Text Processing; Image Processing; Big Data

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100008
    Description:

    Non-probability samples are being increasingly explored by National Statistical Offices as a complement to probability samples. We consider the scenario where the variable of interest and auxiliary variables are observed in both a probability and non-probability sample. Our objective is to use data from the non-probability sample to improve the efficiency of survey-weighted estimates obtained from the probability sample. Recently, Sakshaug, Wisniowski, Ruiz and Blom (2019) and Wisniowski, Sakshaug, Ruiz and Blom (2020) proposed a Bayesian approach to integrating data from both samples for the estimation of model parameters. In their approach, non-probability sample data are used to determine the prior distribution of model parameters, and the posterior distribution is obtained under the assumption that the probability sampling design is ignorable (or not informative). We extend this Bayesian approach to the prediction of finite population parameters under non-ignorable (or informative) sampling by conditioning on appropriate survey-weighted statistics. We illustrate the properties of our predictor through a simulation study.

    Key Words: Bayesian prediction; Gibbs sampling; Non-ignorable sampling; Statistical data integration.

    Release date: 2021-10-29

Reference (3) (3 results)

  • Surveys and statistical programs – Documentation: 12-004-X
    Description:

    Statistics: Power from Data! is a web resource that was created in 2001 to assist secondary students and teachers of Mathematics and Information Studies in getting the most from statistics. Over the past 20 years, this product has become one of Statistics Canada’s most popular references for students, teachers, and many other members of the general population. This product was last updated in 2021.

    Release date: 2021-09-02

  • Surveys and statistical programs – Documentation: 11-633-X2021005
    Description:

    The Analytical Studies and Modelling Branch (ASMB) is the research arm of Statistics Canada mandated to provide high-quality, relevant and timely information on economic, health and social issues that are important to Canadians. The branch strategically makes use of expert knowledge and a broad range of data sources and modelling techniques to address the information needs of a wide variety of government, academic and public sector partners and stakeholders through analysis and research, modelling and predictive analytics, and data development. The branch strives to deliver relevant, high-quality, timely, comprehensive, horizontal and integrated research and to enable the use of its research through capacity building and strategic dissemination to meet the needs of policy makers, academics and the general public.

    This Multi-year Consolidated Plan for Research, Modelling and Data Development outlines the priorities for the branch over the next two years.

    Release date: 2021-08-12

  • Surveys and statistical programs – Documentation: 11-633-X2021002
    Description:

    The Longitudinal Immigration Database (IMDB) is a comprehensive source of data that plays a key role in the understanding of the economic behaviour of immigrants. It is the only annual Canadian dataset that allows users to study the characteristics of immigrants to Canada at the time of admission and their economic outcomes and regional (inter-provincial) mobility over a time span of more than 35 years. The IMDB includes Immigration, Refugees and Citizenship Canada (IRCC) administrative records, which contain exhaustive information about immigrants who have been admitted to Canada since 1952. It also includes data about non-permanent residents who have been issued temporary resident permits since 1980. This report will discuss the IMDB data sources, concepts and variables, record linkage, data processing, dissemination, data evaluation and quality indicators, comparability with other immigration datasets, and the analyses possible with the IMDB.

    Release date: 2021-02-01