Other content related to Statistical methods

Results

All (78) (50 to 60 of 78 results)

  • Articles and reports: 11-522-X20030017699
    Description:

    This paper illustrates the link between the strategic needs of a national statistical office (NSO) and the methodological needs they generate.

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20030017716
    Description:

    This paper examines how risk and quality can be used to assist with investment decisions across the Office for National Statistics (ONS) in the United Kingdom. It discusses the construction of a table developed to provide measures of the strengths and weaknesses of statistical inputs and outputs.

    Release date: 2005-01-26

  • Articles and reports: 12-002-X20040027034
    Description:

    The use of command files in Stat/Transfer can expedite the transfer of several data sets in an efficient, replicable manner. This note outlines a simple step-by-step method for creating command files and provides sample code.

    Release date: 2004-10-05

  • Articles and reports: 11-522-X20020016729
    Description:

    For most survey samples, if not all, we have to deal with the problem of missing values. Missing values are usually caused by nonresponse (for example, a participant refuses to take part or the interviewer is unable to contact the respondent), but they can also be produced at the editing step of the survey in an attempt to resolve problems of inconsistent or suspect responses. The presence of missing values (nonresponse) generally leads to bias and uncertainty in the estimates. To treat this problem, the appropriate use of all available auxiliary information permits the maximum reduction of nonresponse bias and variance.

    In practice, it is very difficult to estimate the nonresponse bias. However, it is possible to estimate the nonresponse variance by assuming that the bias is negligible. In the last decade, many methods were indeed proposed to estimate this variance, and some of these have been implemented in the System for Estimation of Variance due to Nonresponse and Imputation (SEVANI).

    The methodology used to develop SEVANI is based on the theory of two-phase sampling, where we assume that the second phase of selection is nonresponse. However, contrary to two-phase sampling, an imputation or nonresponse model is required for variance estimation. SEVANI also assumes that nonresponse is treated by reweighting respondent units or by imputing their missing values. Three imputation methods are considered: imputation of an auxiliary variable, regression imputation (deterministic or random) and nearest-neighbour imputation.

    In this presentation, we define the problem, describe the methodology on which SEVANI is based and discuss potential uses of the system. We end by presenting some examples based on real data to illustrate the theory in practice.

    Release date: 2004-09-13
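
The abstract above lists nearest-neighbour imputation as one of the three imputation methods SEVANI handles. Purely as an illustration of that imputation method (not of SEVANI or its variance estimators), here is a minimal Python sketch in which each missing value is filled with the value of the respondent whose auxiliary variable is closest; the data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical survey microdata: auxiliary variable x is observed for all
# units; study variable y is missing (NaN) for nonrespondents.
x = np.array([1.2, 2.5, 3.1, 4.8, 5.0, 6.3])
y = np.array([10.0, np.nan, 12.5, np.nan, 15.2, 16.0])

respondents = ~np.isnan(y)
resp_idx = np.where(respondents)[0]

y_imputed = y.copy()
for i in np.where(np.isnan(y))[0]:
    # Donor = respondent whose auxiliary value is closest to the recipient's.
    donor = resp_idx[np.argmin(np.abs(x[resp_idx] - x[i]))]
    y_imputed[i] = y[donor]

print(y_imputed)  # missing entries replaced by donor values
```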

  • Articles and reports: 11-522-X20020016751
    Description:

    Closing remarks

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016752
    Description:

    Opening remarks of the Symposium 2002: Modelling Survey Data for Social and Economic Research, presented by David Binder.

    Release date: 2004-09-13

  • Keynote address
    Articles and reports: 11-522-X20020016753
    Description:

    Keynote Address.

    Release date: 2004-09-13

  • Articles and reports: 75F0002M2004001
    Description:

    The purpose of this document is to describe the detailed methodology and assumptions behind the construction of the market basket measure (MBM) of low income. The document also raises some issues and highlights some data limitations related to the MBM.

    The MBM represents the cost of a basket that includes a nutritious diet, clothing and footwear, shelter, transportation, and other necessary goods and services (such as personal care items or household supplies).

    The MBM methodology was developed by the Federal/Provincial/Territorial Working Group on Social Development Research and Information for Human Resources Development Canada (HRDC).

    Release date: 2004-02-04
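
As a purely illustrative sketch of the logic behind the MBM described above: a family is counted as low income when its disposable income falls below the cost of the reference basket for its region and family size. The figures below are invented for illustration and are not actual MBM thresholds.

```python
# Hypothetical MBM-style thresholds (invented numbers, not official values).
basket_cost = {
    ("rural_community", 4): 31_000,   # annual basket cost, family of four
    ("large_city", 4): 38_000,
}

def is_low_income(disposable_income: float, region: str, family_size: int) -> bool:
    """A family is in low income if it cannot afford the reference basket."""
    return disposable_income < basket_cost[(region, family_size)]

print(is_low_income(29_500, "rural_community", 4))  # True
print(is_low_income(41_000, "large_city", 4))       # False
```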

  • Articles and reports: 81-595-M2003011
    Geography: Canada
    Description:

    This report presents a rethinking of the fundamental concepts used to guide statistical work on postsecondary education.

    Release date: 2003-12-23

  • Articles and reports: 12-001-X20030016600
    Description:

    International comparability of Official Statistics is important for domestic uses within any country. But international comparability also matters for the international uses of statistics, in particular the development and monitoring of global policies and the assessment of economic and social development throughout the world. Additionally, statistics are used by international agencies and bilateral technical assistance programmes to monitor the impact of technical assistance.

    The first part of this paper describes how statistical indicators are used by the United Nations and other agencies. The framework of statistical indicators for these purposes is described and some issues concerning the choice and quality of these indicators are identified.

    In the past there has been considerable methodological research in support of Official Statistics, particularly by the strongest National Statistical Offices and some academics. This has established the basic methodologies for Official Statistics and has led to considerable developments and quality improvements over time. Much has been achieved. However, the focus has, to an extent, been on national uses of Official Statistics. These developments have, of course, benefited the international uses, and some specific developments have also occurred. There is, however, a need to foster more methodological development on the international requirements. In the second part of this paper a number of examples illustrate this need.

    Release date: 2003-07-31
Data (0) (0 results)

No content available at this time.

Analysis (78) (10 to 20 of 78 results)

  • Articles and reports: 11-522-X202100100011
    Description: The ways in which AI may affect the world of official statistics are manifold and Statistics Netherlands (CBS) is actively exploring how it can use AI within its societal role. The paper describes a number of AI-related areas where CBS is currently active: use of AI for its own statistics production and statistical R&D, the development of a national AI monitor, the support of other government bodies with expertise on fair data and fair algorithms, data sharing under safe and secure conditions, and engaging in AI-related collaborations.

    Key Words: Artificial Intelligence; Official Statistics; Data Sharing; Fair Algorithms; AI monitoring; Collaboration.

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100012
    Description: The modernization of price statistics by National Statistical Offices (NSOs) such as Statistics Canada focuses on the adoption of alternative data sources that include the near-universe of all products sold in the country, a scale that requires machine learning classification of the data. The process of evaluating classifiers to select appropriate ones for production, as well as monitoring classifiers once in production, needs to be based on robust metrics to measure misclassification. Since commonly utilized metrics, such as the Fβ-score, may not take into account key aspects applicable to price statistics in all cases, such as the unequal importance of categories, careful consideration of the metric space is necessary to select appropriate methods to evaluate classifiers. This working paper provides insight on the metric space applicable to price statistics and proposes an operational framework to evaluate and monitor classifiers, focusing specifically on the needs of the Canadian Consumer Price Index and demonstrating the discussed metrics using a publicly available dataset.

    Key Words: Consumer price index; supervised classification; evaluation metrics; taxonomy

    Release date: 2021-11-05
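
As a rough sketch of the kind of category-weighted evaluation the abstract above calls for, the snippet below computes per-category Fβ scores with scikit-learn and combines them using unequal category weights. The categories, weights and β value are assumptions chosen for illustration; they are not CPI basket weights, nor the metric the paper proposes.

```python
import numpy as np
from sklearn.metrics import fbeta_score

# Hypothetical product categories: true labels vs. classifier predictions.
y_true = np.array(["food", "food", "shelter", "transport", "food", "shelter"])
y_pred = np.array(["food", "shelter", "shelter", "transport", "food", "food"])
labels = ["food", "shelter", "transport"]

# Per-category F-beta; beta > 1 weights recall more heavily than precision.
per_class = fbeta_score(y_true, y_pred, beta=2.0, labels=labels, average=None)

# Illustrative weights expressing unequal importance of categories.
weights = np.array([0.5, 0.3, 0.2])
overall = float(np.sum(per_class * weights))

print(dict(zip(labels, per_class.round(3))), round(overall, 3))
```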

  • Articles and reports: 11-522-X202100100013
    Description: Statistics Canada’s Labour Force Survey (LFS) plays a fundamental role in the mandate of Statistics Canada. The labour market information provided by the LFS is among the most timely and important measures of the Canadian economy’s overall performance. An integral part of the LFS monthly data processing is the coding of respondents’ industry according to the North American Industrial Classification System (NAICS), occupation according to the National Occupational Classification System (NOC) and the Primary Class of Workers (PCOW). Each month, up to 20,000 records are coded manually. In 2020, Statistics Canada worked on developing Machine Learning models using fastText to code responses to the LFS questionnaire according to the three classifications mentioned previously. This article provides an overview of the methodology developed and the results obtained from a potential application of fastText in the LFS coding process.

    Key Words: Machine Learning; Labour Force Survey; Text classification; fastText.

    Release date: 2021-11-05
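
For readers unfamiliar with fastText's supervised mode mentioned above, the sketch below shows generic usage of the fasttext Python package for text classification. The file names, label scheme and hyperparameters are illustrative assumptions and do not describe the actual LFS models.

```python
import fasttext

# Training data in fastText's supervised format, one example per line, e.g.:
#   __label__NAICS_44-45 cashier at a grocery store
# (the file and label scheme here are hypothetical)
model = fasttext.train_supervised(
    input="lfs_train.txt",   # hypothetical path
    epoch=25,
    lr=0.5,
    wordNgrams=2,
)

# Predict the most likely code for a free-text questionnaire response.
labels, probabilities = model.predict("cashier at a grocery store", k=1)
print(labels, probabilities)

# Evaluate on a held-out file: returns (n_examples, precision@1, recall@1).
print(model.test("lfs_valid.txt"))  # hypothetical path
```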

  • Articles and reports: 11-522-X202100100028
    Description:

    Many Government of Canada groups are developing code to process and visualize various kinds of data, often duplicating each other’s efforts, with sub-optimal efficiency and a limited level of code quality review. This paper informally presents a working-level approach to addressing this technical problem. The idea is to collaboratively build a common repository of code and a knowledge base for use by anyone in the public sector to perform many common data science tasks and, in doing so, to help each other master both data science coding skills and industry-standard collaborative practices. The paper explains why the R language is used as the language of choice for collaborative data science code development. It summarizes R's advantages and addresses its limitations, establishes a taxonomy of the discussion topics of highest interest to GC data scientists working with R, provides an overview of the collaborative platforms used, and presents the results obtained to date. Even though the code knowledge base is developed mainly in R, it is meant to be valuable also for data scientists coding in Python and other development environments.

    Key Words: Collaboration; Data science; Data Engineering; R; Open Government; Open Data; Open Science

    Release date: 2021-10-29

  • Articles and reports: 11-522-X202100100001
    Description:

    We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration, which introduces a “working” reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the empirical likelihood method. The proposed method is particularly attractive for combining information from several sources with different missing patterns. The proposed method is applied to a real data example combining survey data from the Korean National Health and Nutrition Examination Survey and big data from the National Health Insurance Sharing Service in Korea.

    Key Words: Big data; Empirical likelihood; Measurement error models; Missing covariates.

    Release date: 2021-10-15
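
In generic terms (not necessarily the exact estimator used in the paper above), model calibration via empirical likelihood can be sketched as maximizing the empirical log-likelihood subject to a constraint that aligns the working reduced model's predictions with the estimate available from the external source:

$$
\max_{p_1,\dots,p_n}\ \sum_{i \in S} \log p_i
\quad \text{subject to} \quad
p_i \ge 0, \quad \sum_{i \in S} p_i = 1, \quad \sum_{i \in S} p_i\, \hat m(\mathbf{x}_i) = \hat\mu_{\mathrm{ext}},
$$

where $\hat m(\cdot)$ is the working reduced model fitted on the observed covariates and $\hat\mu_{\mathrm{ext}}$ is the corresponding quantity estimated from the external source; the resulting weights $p_i$ then enter the regression analysis.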

  • Articles and reports: 11-522-X202100100002
    Description:

    A framework for the responsible use of machine learning processes has been developed at Statistics Canada. The framework includes guidelines for the responsible use of machine learning and a checklist, which are organized into four themes: respect for people, respect for data, sound methods, and sound application. All four themes work together to ensure the ethical use of both the algorithms and results of machine learning. The framework is anchored in a vision that seeks to create a modern workplace and provide direction and support to those who use machine learning techniques. It applies to all statistical programs and projects conducted by Statistics Canada that use machine learning algorithms. This includes supervised and unsupervised learning algorithms. The framework and associated guidelines will be presented first. The process of reviewing projects that use machine learning, i.e., how the framework is applied to Statistics Canada projects, will then be explained. Finally, future work to improve the framework will be described.

    Keywords: Responsible machine learning, explainability, ethics

    Release date: 2021-10-15

  • Articles and reports: 11-522-X202100100003
    Description:

    The increasing size and richness of digital data allow for modeling more complex relationships and interactions, which is the strong point of machine learning. Here we applied gradient boosting to the Dutch system of social statistical datasets to estimate transition probabilities into and out of poverty. Individual estimates are reasonable, but the main advantages of the approach, in combination with SHAP and global surrogate models, are the simultaneous ranking of hundreds of features by their importance, detailed insight into their relationship with the transition probabilities, and the data-driven identification of subpopulations with relatively high and low transition probabilities. In addition, we decompose the difference in feature importance between the general population and a subpopulation into a frequency effect and a feature effect. We caution against misinterpretation and discuss future directions.

    Key Words: Classification; Explainability; Gradient boosting; Life event; Risk factors; SHAP decomposition.

    Release date: 2021-10-15
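
As a small, self-contained illustration of the gradient boosting plus SHAP workflow described above, the sketch below uses synthetic data with scikit-learn and the shap package; it is not the model or data of the paper, and the feature ranking it prints is meaningless beyond the toy example.

```python
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for linked register data (the paper uses the Dutch
# system of social statistical datasets).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           random_state=0)

# Gradient boosting model for a binary transition (e.g., into poverty).
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# SHAP values on the log-odds scale for the tree ensemble.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global importance: rank features by mean absolute SHAP value.
importance = np.abs(shap_values).mean(axis=0)
print("Top 5 features:", np.argsort(importance)[::-1][:5])
```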

  • Articles and reports: 11-522-X202100100019
    Description: Official statistical agencies must continually seek new methods and techniques that can increase both program efficiency and product relevance. The U.S. Census Bureau’s measurement of construction activity is currently a resource-intensive endeavor, relying heavily on monthly survey response via questionnaires and extensive field data collection. While our data users continually require more timely and granular data products, the traditional survey approach and its associated collection cost and respondent burden limit our ability to meet that need. In 2019, we began research on whether the application of machine learning techniques to satellite imagery could accurately estimate housing starts and completions while meeting existing monthly indicator timelines at a cost equal to or less than that of existing methods. Using historical Census construction survey data in combination with targeted satellite imagery, the team trained, tested, and validated convolutional neural networks capable of classifying images by their stage of construction, demonstrating the viability of a data science-based approach to producing official measures of construction activity.

    Key Words: Official Statistics; Housing Starts; Machine Learning; Satellite Imagery

    Release date: 2021-10-15
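
The abstract above describes convolutional neural networks that classify satellite images by construction stage. Below is a toy PyTorch sketch of such an image classifier; the architecture, image size and three stage labels are assumptions made for illustration and are unrelated to the Census Bureau's actual models.

```python
import torch
from torch import nn

# Hypothetical stages: not started / under construction / completed.
NUM_STAGES = 3

# A deliberately small CNN for 3-channel 64x64 image chips (illustrative only).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, NUM_STAGES),
)

# Forward pass on a random batch standing in for satellite image chips.
images = torch.randn(4, 3, 64, 64)
logits = model(images)              # shape: (4, NUM_STAGES)
predicted_stage = logits.argmax(dim=1)
print(predicted_stage)
```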

  • Articles and reports: 18-001-X2020001
    Description:

    This paper presents the methodology used to generate the first nationwide database of proximity measures and the results obtained with a first set of ten measures. The computational methods are presented as a generalizable model, since similar methods can now be applied to a multitude of other services or amenities, in a variety of alternative specifications.

    Release date: 2021-02-15

  • Articles and reports: 11-633-X2020002
    Description:

    The concepts of urban and rural are widely debated and vary depending on a country’s geopolitical and sociodemographic composition. In Canada, population centres and statistical area classifications are widely used to distinguish urban and rural communities. However, neither of these classifications precisely classifies Canadian communities into urban, rural and remote areas. A group of researchers at Statistics Canada developed an alternative tool called the “remoteness index” to measure the relative remoteness of Canadian communities. This study builds on the remoteness index, which is a continuous index, by examining how it can be classified into five discrete categories of remoteness geographies. When properly categorized, the remoteness index can be a useful tool to distinguish urban, rural and remote communities in Canada, while protecting the privacy and confidentiality of citizens. This study considers five methodological approaches and recommends three methods.

    Release date: 2020-08-11
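
The abstract above does not name the five discretization approaches the study evaluates. Purely to illustrate the general task of cutting a continuous index into five categories, the sketch below applies two common approaches (quintile breaks and one-dimensional k-means) to synthetic scores; these are not necessarily among the methods the study considers or recommends.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic continuous remoteness scores in [0, 1]; the real index is
# published by Statistics Canada for Canadian communities.
rng = np.random.default_rng(0)
index = rng.beta(2, 5, size=1000)

# Approach 1: equal-count (quintile) breaks.
quintile_breaks = np.quantile(index, [0.2, 0.4, 0.6, 0.8])
quintile_class = np.digitize(index, quintile_breaks)

# Approach 2: one-dimensional k-means with five clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(index.reshape(-1, 1))
kmeans_class = kmeans.labels_

print(np.bincount(quintile_class), np.bincount(kmeans_class))
```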
Reference (0) (0 results)

No content available at this time.
