Other content related to Statistical methods

Results

All (162) (20 to 30 of 162 results)

  • Articles and reports: 11-522-X202100100012
    Description: The modernization of price statistics by National Statistical Offices (NSO) such as Statistics Canada focuses on the adoption of alternative data sources that include the near-universe of all products sold in the country, a scale that requires machine learning classification of the data. The process of evaluating classifiers to select appropriate ones for production, as well as monitoring classifiers once in production, needs to be based on robust metrics to measure misclassification. As commonly utilized metrics, such as the Fβ-score, may not in all cases take into account key aspects applicable to price statistics, such as unequal importance of categories, a careful consideration of the metric space is necessary to select appropriate methods to evaluate classifiers. This working paper provides insight on the metric space applicable to price statistics and proposes an operational framework to evaluate and monitor classifiers, focusing specifically on the needs of the Canadian Consumer Price Index and demonstrating the discussed metrics using a publicly available dataset.

    Key Words: Consumer price index; supervised classification; evaluation metrics; taxonomy

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100013
    Description: Statistics Canada’s Labour Force Survey (LFS) plays a fundamental role in the mandate of Statistics Canada. The labour market information provided by the LFS is among the most timely and important measures of the Canadian economy’s overall performance. An integral part of the LFS monthly data processing is the coding of respondents’ industry according to the North American Industry Classification System (NAICS), occupation according to the National Occupational Classification (NOC) and the Primary Class of Workers (PCOW). Each month, up to 20,000 records are coded manually. In 2020, Statistics Canada worked on developing machine learning models using fastText to code responses to the LFS questionnaire according to the three classifications mentioned previously. This article provides an overview of the methodology developed and the results obtained from a potential application of fastText in the LFS coding process.

    Key Words: Machine Learning; Labour Force Survey; Text classification; fastText.

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100028
    Description:

    Many Government of Canada groups are developing code to process and visualize various kinds of data, often duplicating each other’s efforts, with sub-optimal efficiency and limited code quality review. This paper informally presents a working-level approach to addressing this technical problem. The idea is to collaboratively build a common repository of code and knowledgebase for use by anyone in the public sector to perform many common data science tasks, and, in doing that, help each other to master both the data science coding skills and the industry standard collaborative practices. The paper explains why the R language is used as the language of choice for collaborative data science code development. It summarizes R’s advantages and addresses its limitations, establishes the taxonomy of discussion topics of highest interest to the GC data scientists working with R, provides an overview of the collaborative platforms used, and presents the results obtained to date. Even though the code knowledgebase is developed mainly in R, it is meant to be valuable also for data scientists coding in Python and other development environments.

    Key Words: Collaboration; Data science; Data Engineering; R; Open Government; Open Data; Open Science

    Release date: 2021-10-29

  • Articles and reports: 11-522-X202100100001
    Description:

    We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration which introduces a “working” reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the empirical likelihood method. The proposed method is particularly attractive for combining information from several sources with different missing patterns. The proposed method is applied to a real data example combining survey data from Korean National Health and Nutrition Examination Survey and big data from National Health Insurance Sharing Service in Korea.

    Key Words: Big data; Empirical likelihood; Measurement error models; Missing covariates.

    Release date: 2021-10-15

  • Articles and reports: 11-522-X202100100002
    Description:

    A framework for the responsible use of machine learning processes has been developed at Statistics Canada. The framework includes guidelines for the responsible use of machine learning and a checklist, which are organized into four themes: respect for people, respect for data, sound methods, and sound application. All four themes work together to ensure the ethical use of both the algorithms and results of machine learning. The framework is anchored in a vision that seeks to create a modern workplace and provide direction and support to those who use machine learning techniques. It applies to all statistical programs and projects conducted by Statistics Canada that use machine learning algorithms. This includes supervised and unsupervised learning algorithms. The framework and associated guidelines will be presented first. The process of reviewing projects that use machine learning, i.e., how the framework is applied to Statistics Canada projects, will then be explained. Finally, future work to improve the framework will be described.

    Keywords: Responsible machine learning, explainability, ethics

    Release date: 2021-10-15

  • Articles and reports: 11-522-X202100100003
    Description:

    The increasing size and richness of digital data allow for modeling more complex relationships and interactions, which is the strong point of machine learning. Here we applied gradient boosting to the Dutch system of social statistical datasets to estimate transition probabilities into and out of poverty. Individual estimates are reasonable, but the main advantages of the approach in combination with SHAP and global surrogate models are the simultaneous ranking of hundreds of features by their importance, detailed insight into their relationship with the transition probabilities, and the data-driven identification of subpopulations with relatively high and low transition probabilities. In addition, we decompose the difference in feature importance between the general population and a subpopulation into a frequency effect and a feature effect. We caution against misinterpretation and discuss future directions.

    Key Words: Classification; Explainability; Gradient boosting; Life event; Risk factors; SHAP decomposition.

    Release date: 2021-10-15

  • Articles and reports: 11-522-X202100100019
    Description: Official statistical agencies must continually seek new methods and techniques that can increase both program efficiency and product relevance. The U.S. Census Bureau’s measurement of construction activity is currently a resource-intensive endeavor, relying heavily on monthly survey response via questionnaires and extensive field data collection. While our data users continually require more timely and granular data products, the traditional survey approach and its associated collection cost and respondent burden limit our ability to meet that need. In 2019, we began research on whether the application of machine learning techniques to satellite imagery could accurately estimate housing starts and completions while meeting existing monthly indicator timelines at a cost equal to or less than existing methods. Using historical Census construction survey data in combination with targeted satellite imagery, the team trained, tested, and validated convolutional neural networks capable of classifying images by their stage of construction, demonstrating the viability of a data science-based approach to producing official measures of construction activity.

    Key Words: Official Statistics; Housing Starts; Machine Learning; Satellite Imagery

    Release date: 2021-10-15

  • 19-22-0007
    Description:

    Course Duration: 2 days

    Course Cost: There is no cost for Statistics Canada employees. The cost for external participants is $200 per day.

    Course Language: Offered in English and in French

    Pre-requisites: Knowledge of SAS is highly recommended. Knowledge equivalent to the SAS 9 Programming 1: Essentials course is a minimum.

    To familiarize participants with raking methods and software. Raking deals with the problem of restoring cross-sectional aggregation constraints in time series systems. Optionally, temporal constraints can also be preserved. We also use the words reconciliation and balancing.

    Benefits to Participants: Upon completion of the course, participants will understand some of the raking techniques in use at Statistics Canada. They will acquire the technical knowledge to run PROC TSRAKING, a SAS procedure developed at Statistics Canada. The course is practical, technical and theoretical.

    Course outline: Introduction; One and two dimensional raking with or without annual constraints; Alterability coefficients; Pro-rating and proportional iterative raking methods; Raking method implemented in PROC TSRAKING: numerical optimization approach with alterability coefficients; Time series system with multiple raking rules; Movement preservation.

    Release date: 2021-10-13
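
    The pro-rating and proportional iterative raking methods named in the course outline above can be illustrated with a minimal sketch. It assumes a plain two-dimensional table of positive cells with known row and column totals; the function name `rake_table`, its parameters and the sample figures are hypothetical and chosen only for illustration. It is not PROC TSRAKING, which the outline describes as a numerical optimization approach with alterability coefficients.

```python
def rake_table(table, row_totals, col_totals, max_iter=100, tol=1e-9):
    """Proportional iterative raking of a 2-D table (illustrative sketch).

    Rows and columns are rescaled in turn (pro-rating) until the row sums
    and column sums both agree with the requested totals.
    """
    cells = [list(row) for row in table]  # work on a copy of the input
    for _ in range(max_iter):
        # Pro-rate each row so that it adds up to its target total.
        for i, target in enumerate(row_totals):
            current = sum(cells[i])
            if current > 0:
                cells[i] = [value * target / current for value in cells[i]]
        # Pro-rate each column so that it adds up to its target total.
        for j, target in enumerate(col_totals):
            current = sum(row[j] for row in cells)
            if current > 0:
                for row in cells:
                    row[j] *= target / current
        # Columns now match exactly; stop once the rows match as well.
        gap = max(abs(sum(cells[i]) - target)
                  for i, target in enumerate(row_totals))
        if gap < tol:
            break
    return cells


# Hypothetical figures: both sets of totals must add up to the same grand total.
raked = rake_table([[10.0, 20.0], [30.0, 40.0]],
                   row_totals=[40.0, 60.0],
                   col_totals=[45.0, 55.0])
for row in raked:
    print([round(value, 3) for value in row])
```

    The sketch treats every cell as equally adjustable. Giving some cells more or less freedom to move is the role the outline assigns to alterability coefficients, which this example does not model.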

  • Surveys and statistical programs – Documentation: 11-633-X2021005
    Description:

    The Analytical Studies and Modelling Branch (ASMB) is the research arm of Statistics Canada mandated to provide high-quality, relevant and timely information on economic, health and social issues that are important to Canadians. The branch strategically makes use of expert knowledge and a broad range of data sources and modelling techniques to address the information needs of a broad range of government, academic and public sector partners and stakeholders through analysis and research, modeling and predictive analytics, and data development. The branch strives to deliver relevant, high-quality, timely, comprehensive, horizontal and integrated research and to enable the use of its research through capacity building and strategic dissemination to meet the user needs of policy makers, academics and the general public.

    This Multi-year Consolidated Plan for Research, Modelling and Data Development outlines the priorities for the branch over the next two years.

    Release date: 2021-08-12

  • Stats in brief: 89-20-00062020002
    Description:

    This video is intended to teach viewers the differences between three fundamental statistical concepts. First, the mean, then the median and finally, the mode.

    Release date: 2021-05-03
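
    The three concepts named in the entry above can be shown side by side in a short sketch. It uses Python's standard `statistics` module; the sample values and the name `summarize` are made up for illustration and do not come from the video.

```python
import statistics


def summarize(values):
    """Return the mean, median and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),      # sum of the values divided by their count
        "median": statistics.median(values),  # middle value once the data are sorted
        "mode": statistics.mode(values),      # most frequently occurring value
    }


# Hypothetical data with one large value, to show how the three measures differ.
sample = [2, 3, 3, 5, 7, 10, 40]
print(summarize(sample))
```

    With a single large value in the data, the mean is pulled upward while the median and mode stay close to the bulk of the observations.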
Data (1) (1 result)

  • Table: 82-567-X
    Description:

    The National Population Health Survey (NPHS) is designed to enhance the understanding of the processes affecting health. The survey collects cross-sectional as well as longitudinal data. In 1994/95 the survey interviewed a panel of 17,276 individuals, then returned to interview them a second time in 1996/97. The response rate for these individuals was 96% in 1996/97. Data collection from the panel will continue for up to two decades. For cross-sectional purposes, data were collected for a total of 81,000 household residents in all provinces (except people on Indian reserves or on Canadian Forces bases) in 1996/97.

    This overview illustrates the variety of information available by presenting data on perceived health, chronic conditions, injuries, repetitive strains, depression, smoking, alcohol consumption, physical activity, consultations with medical professionals, use of medications and use of alternative medicine.

    Release date: 1998-07-29
Analysis (102) (0 to 10 of 102 results)

  • Articles and reports: 11-522-X202200100002
    Description: The authors used the Splink probabilistic linkage package, developed by the UK Ministry of Justice, to link census data from England and Wales to itself to find duplicate census responses. A large gold standard of confirmed census duplicates was available, meaning that the results of the Splink implementation could be quality assured. This paper describes the implementation and features of Splink, gives details of the settings and parameters that we used to tune Splink for our particular project, and gives the results that we obtained.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100020
    Description: The reconciliation of 2021 census dwellings with the new Statistical Building Register (SBgR) presented linkage challenges. The Census of Population collected information from various dwelling types. For a large proportion of the population, mailing addresses were at the centre: they were used for reaching out to people and collected as contact information. In parallel, the register environment has been evolving. The agency is transitioning from the Address Register (AR) to the SBgR, which holds both mailing and location addresses while also covering non-residential buildings. The reconciliation was conducted using a combination of systems, notably the new Register Matching Engine (RME) for difficult cases. The RME holds an interesting range of sophisticated string comparators. A deterministic linkage approach was used, while incorporating some data knowledge such as entropy. Through metadata, the matching expert could also reduce the number of false positives and false negatives.
    Release date: 2024-03-25

  • Journals and periodicals: 11-522-X
    Description: Since 1984, an annual international symposium on methodological issues has been sponsored by Statistics Canada. Proceedings have been available since 1987.
    Release date: 2024-03-25

  • Journals and periodicals: 12-001-X
    Geography: Canada
    Description: The journal publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves.
    Release date: 2024-01-03

  • Articles and reports: 82-003-X202301200002
    Description: The validity of survival estimates from cancer registry data depends, in part, on the identification of the deaths of deceased cancer patients. People whose deaths are missed seemingly live on forever and are informally referred to as “immortals”, and their presence in registry data can result in inflated survival estimates. This study assesses the issue of immortals in the Canadian Cancer Registry (CCR) using a recently proposed method that compares the survival of long-term survivors of cancers for which “statistical” cure has been reported with that of similar people from the general population.
    Release date: 2023-12-20

  • Journals and periodicals: 12-206-X
    Description: This report summarizes the annual achievements of the Methodology Research and Development Program (MRDP) sponsored by the Modern Statistical Methods and Data Science Branch at Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the agency’s statistical programs; these activities would otherwise be less likely to be carried out during the provision of regular methodology services to those programs. The MRDP also includes activities that provide support in the application of past successful developments in order to promote the use of the results of research and development work. Selected prospective research activities are also presented.
    Release date: 2023-10-11

  • Articles and reports: 75F0002M2022003
    Description: This discussion paper describes the proposed methodology for a Northern Market Basket Measure (MBM-N) for Nunavut and identifies research that could be conducted in preparation for the 2023 review. The paper presents initial MBM-N thresholds and provides preliminary poverty estimates for reference years 2018 to 2021. A review period will follow the release of this paper, during which time Statistics Canada and Employment and Social Development Canada will welcome feedback from interested parties and work with experts, stakeholders, Indigenous organizations, and federal, provincial and territorial officials to validate the results.
    Release date: 2023-06-21

  • Articles and reports: 75F0002M2022004
    Description:

    This technical paper describes the results of the review period, including small adjustments to the disposable income amounts used in the discussion paper Construction of a Northern Market Basket Measure (MBM-N) of poverty for Yukon and the Northwest Territories. It also marks the end of the review period for the MBM-N for Yukon and the Northwest Territories by presenting the latest poverty estimates for reference year 2020.

    Release date: 2022-11-03

  • Articles and reports: 11-633-X2022002
    Description:

    This paper provides a description of the conceptual framework of the modernized system of national quality-of-life statistics that Statistics Canada is planning to implement within the next 5 to 10 years. Consistent with 50 years of dialogue on the improvement of social statistics, the conceptual framework proposes the adoption of a micro-level approach to describe how society operates and help create a cohesive and integrated system of quality-of-life statistics.

    Release date: 2022-06-01

  • Stats in brief: 45-20-00032022002
    Description:

    Canada’s diversity and rich cultural heritage have been shaped by the people who have come from all over the world to call it home. But even in our multicultural society, eliminating all forms of discrimination remains a challenge. In this episode, we turn a critical eye to the ways that cognitive bias risks perpetuating systemic racism. Statistics are supposed to accurately reflect the world around us, but are all data created equal? Join our guests, Sarah Messou-Ghelazzi, Communications Officer, Filsan Hujaleh, Analyst with the Centre for Social Data Insights and Innovation, and Jeff Latimer, Director General - Accountable for Health, Justice, Diversity and Populations at Statistics Canada as we explore the role data can play to make Canada a more equal society for all.

    Release date: 2022-03-16
Reference (54) (40 to 50 of 54 results)

  • Surveys and statistical programs – Documentation: 62F0026M2004003
    Geography: Province or territory
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending, which gathers information on the spending habits, dwelling characteristics and household equipment of Canadian households.

    This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. One section describes the statistics that can be created using expenditure data (e.g., budget share, market share and aggregates).

    Release date: 2004-12-13

  • Surveys and statistical programs – Documentation: 92-390-X
    Description:

    This report includes a definition of the 2001 place of work concept and the place of work geography, standard text on data collection and coverage (including data collection methods, special coverage studies, sampling and weighting, edit and follow-up, coverage and content considerations). Both standard and subject-matter specific text pieces are also included for data assimilation (automated as well as interactive coding), edit and imputation and data evaluation. Finally, this technical report includes a section on historical comparability.

    Release date: 2004-08-26

  • Surveys and statistical programs – Documentation: 92-383-X
    Description:

    This report discusses various aspects of the quality of data on mother tongue, language spoken at home, knowledge of language and language at work. In the 2001 Census questionnaire, there are five questions on these four language categories. These questions, complemented by questions on ethnicity, religious affiliation and immigration, provide an opportunity to study linguistic and cultural characteristics of Canadians. These questions on languages are designed to collect demolinguistic data. Demolinguistics, a subdiscipline of demography (not of linguistics), involves the demographic analysis of data on languages. Such analysis is useful for our understanding of, for instance, the linguistic diversity of Canadians, the evolution of language groups, or the transmission of mother tongue between generations. For each of the four categories of language questions mentioned above, the report describes briefly the procedures of data collection, some aspects of coverage, the processing stages of the data verification operation and the procedures used for editing and imputing the language variables. Finally, the report describes how the data were evaluated.

    Release date: 2004-01-27

  • Surveys and statistical programs – Documentation: 92-384-X
    Description:

    This report provides information on various aspects of the data on mobility and migration. It provides a review of the questions, concepts and definitions, along with a discussion of limitations inherent in the measurement of one-year and five-year mobility and migration in the censuses of Canada. Some background is provided on the processing of mobility data, from collection through to retrieval. The historical comparability of mobility and migration data from 1961 to 2001 is examined in terms of conceptual and processing changes. The analysis of the quality of 2001 data focuses mainly on the quality at the national and provincial level. Where possible, the five-year data and the one-year data are discussed separately.

    Release date: 2004-01-27

  • Surveys and statistical programs – Documentation: 62F0026M2003002
    Geography: Province or territory
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending. Data are collected via personal interviews, using a paper questionnaire, conducted in January, February and March after the reference year. Information is gathered about the spending habits, dwelling characteristics and household equipment of Canadian households during the reference year. The survey covers private households in the 10 provinces. (The territories are surveyed every second year, starting in 2001.) This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. There is also a section describing the various statistics that can be created using expenditure data (e.g., budget share, market share and aggregates).

    Release date: 2003-12-17

  • Surveys and statistical programs – Documentation: 92-380-X
    Description:

    This report focuses on five demographic variables: date of birth, age, sex, marital status and common-law status. The report describes how the data were collected, verified, processed, edited and imputed. The final section covers how the data were evaluated.

    Release date: 2003-10-28

  • Surveys and statistical programs – Documentation: 62F0026M2002002
    Geography: Province or territory
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending. Data are collected via paper questionnaires and personal interviews conducted in January, February and March after the reference year. Information is gathered about the spending habits, dwelling characteristics and household equipment of Canadian households during the reference year. The survey covers private households in the 10 provinces and the 3 territories. (The territories are surveyed every second year, starting in 2001.) This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. There is also a section describing the various statistics that can be created using expenditure data (e.g., budget share, market share and aggregates).

    Release date: 2002-12-11

  • Surveys and statistical programs – Documentation: 62F0026M2001004
    Geography: Province or territory
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending. Data are collected via personal interview conducted in January, February and March after the reference year using a paper questionnaire. Information is gathered about the spending habits, dwelling characteristics and household equipment of Canadian households during the reference year. The survey covers private households in the ten provinces. (The three territories are surveyed every second year starting in 2001.)

    This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. There is also a section describing the various statistics that can be created using expenditure data (e.g., budget share, market share, and aggregates).

    Release date: 2001-12-12

  • Surveys and statistical programs – Documentation: 62F0026M2001003
    Description:

    This document provides a detailed description of the methodology of the Survey of Household Spending. Topics covered include: target population; sample design; data collection; data processing; weighting and estimation; estimation of sampling error; and data suppression and confidentiality.

    Release date: 2001-10-15

  • Notices and consultations: 87-004-X20000035566
    Geography: Canada
    Description:

    As with many other areas in Statistics Canada, the Culture Statistics Program (CSP) benefits from the informed advice of an external advisory committee. The National Advisory Committee on Culture Statistics (NACCS) was created in 1984 with a mandate to provide advice for the development of statistical activities related to all aspects of art and culture in Canada.

    Release date: 2001-03-16