Other content related to Statistical methods

Results

All (79) (10 to 20 of 79 results)

  • Articles and reports: 11-522-X202100100010
    Description:

    As part of processing for the 2021 Canadian Census, the write-in responses to 31 census questions must be coded. Up to and including 2016, this was a three-stage process, with an “interactive (human) coding” step as the second stage. This human coding step is both lengthy and expensive, spanning many months and requiring the hiring and training of a large number of temporary employees. With this in mind, for 2021, this stage was either augmented with or replaced entirely by machine learning models using the "fastText" algorithm. This presentation will discuss the implementation of this algorithm, the challenges encountered, and the decisions taken along the way.

    Key Words: Natural Language Processing, Machine Learning, fastText, Coding

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100011
    Description: The ways in which AI may affect the world of official statistics are manifold and Statistics Netherlands (CBS) is actively exploring how it can use AI within its societal role. The paper describes a number of AI-related areas where CBS is currently active: use of AI for its own statistics production and statistical R&D, the development of a national AI monitor, the support of other government bodies with expertise on fair data and fair algorithms, data sharing under safe and secure conditions, and engaging in AI-related collaborations.

    Key Words: Artificial Intelligence; Official Statistics; Data Sharing; Fair Algorithms; AI monitoring; Collaboration.

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100012
    Description: The modernization of price statistics by National Statistical Offices (NSO) such as Statistics Canada focuses on the adoption of alternative data sources that include the near-universe of all products sold in the country, a scale that requires machine learning classification of the data. The process of evaluating classifiers to select appropriate ones for production, as well as monitoring classifiers once in production, needs to be based on robust metrics to measure misclassification. As commonly utilized metrics, such as the Fβ-score, may not take into account key aspects applicable to price statistics in all cases, such as unequal importance of categories, a careful consideration of the metric space is necessary to select appropriate methods to evaluate classifiers. This working paper provides insight on the metric space applicable to price statistics and proposes an operational framework to evaluate and monitor classifiers, focusing specifically on the needs of the Canadian Consumer Price Index and demonstrating the discussed metrics using a publicly available dataset.

    Key Words: Consumer price index; supervised classification; evaluation metrics; taxonomy

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100013
    Description: Statistics Canada’s Labour Force Survey (LFS) plays a fundamental role in the mandate of Statistics Canada. The labour market information provided by the LFS is among the most timely and important measures of the Canadian economy’s overall performance. An integral part of the LFS monthly data processing is the coding of respondents’ industry according to the North American Industry Classification System (NAICS), occupation according to the National Occupational Classification System (NOC) and the Primary Class of Workers (PCOW). Each month, up to 20,000 records are coded manually. In 2020, Statistics Canada worked on developing machine learning models using fastText to code responses to the LFS questionnaire according to the three classifications mentioned previously. This article will provide an overview of the methodology developed and the results obtained from a potential application of fastText to the LFS coding process.

    Key Words: Machine Learning; Labour Force Survey; Text classification; fastText.

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100028
    Description:

    Many Government of Canada groups are developing code to process and visualize various kinds of data, often duplicating each other’s efforts, with sub-optimal efficiency and limited code quality review. This paper informally presents a working-level approach to addressing this technical problem. The idea is to collaboratively build a common repository of code and a knowledge base for use by anyone in the public sector to perform many common data science tasks and, in doing so, help each other master both data science coding skills and industry-standard collaborative practices. The paper explains why R is used as the language of choice for collaborative data science code development. It summarizes R's advantages and addresses its limitations, establishes a taxonomy of the discussion topics of highest interest to GC data scientists working with R, provides an overview of the collaborative platforms used, and presents the results obtained to date. Even though the code knowledge base is developed mainly in R, it is meant to be valuable also for data scientists coding in Python and other development environments.

    Key Words: Collaboration; Data science; Data Engineering; R; Open Government; Open Data; Open Science

    Release date: 2021-10-29

  • Articles and reports: 11-522-X202100100001
    Description:

    We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration, which introduces a “working” reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the empirical likelihood method. The proposed method is particularly attractive for combining information from several sources with different missing patterns. The proposed method is applied to a real data example combining survey data from the Korean National Health and Nutrition Examination Survey and big data from the National Health Insurance Sharing Service in Korea.

    Key Words: Big data; Empirical likelihood; Measurement error models; Missing covariates.

    Release date: 2021-10-15

  • Articles and reports: 11-522-X202100100002
    Description:

    A framework for the responsible use of machine learning processes has been developed at Statistics Canada. The framework includes guidelines for the responsible use of machine learning and a checklist, which are organized into four themes: respect for people, respect for data, sound methods, and sound application. All four themes work together to ensure the ethical use of both the algorithms and results of machine learning. The framework is anchored in a vision that seeks to create a modern workplace and provide direction and support to those who use machine learning techniques. It applies to all statistical programs and projects conducted by Statistics Canada that use machine learning algorithms. This includes supervised and unsupervised learning algorithms. The framework and associated guidelines will be presented first. The process of reviewing projects that use machine learning, i.e., how the framework is applied to Statistics Canada projects, will then be explained. Finally, future work to improve the framework will be described.

    Key Words: Responsible machine learning, explainability, ethics

    Release date: 2021-10-15

  • Articles and reports: 11-522-X202100100003
    Description:

    The increasing size and richness of digital data allow for modeling more complex relationships and interactions, which is the strong point of machine learning. Here we applied gradient boosting to the Dutch system of social statistical datasets to estimate transition probabilities into and out of poverty. Individual estimates are reasonable, but the main advantages of the approach in combination with SHAP and global surrogate models are the simultaneous ranking of hundreds of features by their importance, detailed insight into their relationship with the transition probabilities, and the data-driven identification of subpopulations with relatively high and low transition probabilities. In addition, we decompose the difference in feature importance between the general population and a subpopulation into a frequency effect and a feature effect. We caution against misinterpretation and discuss future directions.

    Key Words: Classification; Explainability; Gradient boosting; Life event; Risk factors; SHAP decomposition.

    Release date: 2021-10-15

  • Articles and reports: 11-522-X202100100019
    Description: Official statistical agencies must continually seek new methods and techniques that can increase both program efficiency and product relevance. The U.S. Census Bureau’s measurement of construction activity is currently a resource-intensive endeavor, relying heavily on monthly survey response via questionnaires and extensive field data collection. While our data users continually require more timely and granular data products, the traditional survey approach and its associated collection cost and respondent burden limit our ability to meet that need. In 2019, we began research on whether the application of machine learning techniques to satellite imagery could accurately estimate housing starts and completions while meeting existing monthly indicator timelines at a cost equal to or less than existing methods. Using historical Census construction survey data in combination with targeted satellite imagery, the team trained, tested, and validated convolutional neural networks capable of classifying images by their stage of construction, demonstrating the viability of a data science-based approach to producing official measures of construction activity.

    Key Words: Official Statistics; Housing Starts; Machine Learning; Satellite Imagery

    Release date: 2021-10-15

  • Articles and reports: 18-001-X2020001
    Description:

    This paper presents the methodology used to generate the first nationwide database of proximity measures and the results obtained with a first set of ten measures. The computational methods are presented as a generalizable model because similar methods can now be applied to a multitude of other services or amenities, in a variety of alternative specifications.

    Release date: 2021-02-15

Data (0) (0 results)

No content available at this time.

Analysis (79) (60 to 70 of 79 results)

  • Articles and reports: 12-001-X20030016600
    Description:

    International comparability of Official Statistics is important for domestic uses within any country. But international comparability also matters for the international uses of statistics, in particular the development and monitoring of global policies and the assessment of economic and social development throughout the world. Additionally, statistics are used by international agencies and bilateral technical assistance programmes to monitor the impact of technical assistance. The first part of this paper describes how statistical indicators are used by the United Nations and other agencies. The framework of statistical indicators for these purposes is described and some issues concerning the choice and quality of these indicators are identified. In the past there has been considerable methodological research in support of Official Statistics, particularly by the strongest National Statistical Offices and some academics. This has established the basic methodologies for Official Statistics and has led to considerable developments and quality improvements over time. Much has been achieved. However, the focus has, to an extent, been on national uses of Official Statistics. These developments have, of course, benefited the international uses, and some specific developments have also occurred. There is, however, a need to foster more methodological development on the international requirements. In the second part of this paper a number of examples illustrate this need.

    Release date: 2003-07-31

  • Articles and reports: 12-001-X20030018802
    Description:

    In this Issue is a column where the Editor briefly presents each paper of the current issue of Survey Methodology. It also sometimes contains information on structural or management changes in the journal.

    Release date: 2003-07-31

  • Articles and reports: 11-522-X20010016263
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    This paper describes the Annual Business Inquiry (ABI) project to integrate the Office for National Statistics' (ONS) main, annual business surveys, regardless of economic sectors. The ABI project also brings together employment and financial data surveys and is capable of generating a wide range of subnational analyses, another objective of the development. Methodological aspects covered by the paper include sample design; estimation and outlier treatment; apportionment of data from reporting units to local units (individual sites) and the methodology for subnational and small area estimation. The subnational methodology involves the use of logistic and loglinear models.

    Release date: 2002-09-12

  • Articles and reports: 11-522-X20010016264
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    Conducting a census by traditional methods is becoming more difficult. The possibility of cross-linking administrative files provides an attractive alternative to conducting periodic censuses (Laihonen, 2000; Borchsenius, 2000). This method was proposed in a recent article by Nathan (2001). The Institut National de la Statistique et des Études Économiques (INSEE) census redesign is based on the idea of a "continuous census," originally suggested by Kish (1981, 1990) and Horvitz (1986). The first approach, which could be feasible in France, can be found in Deville and Jacod's paper (1996). This particular article reviews the methodological developments and approaches used since INSEE started its population census redesign program.

    Release date: 2002-09-12

  • Articles and reports: 11-522-X20010016272
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    The French survey of homeless people using support services is unique because of its scope and the conditions under which it was conducted. About 4,000 users of shelters and soup kitchens were surveyed in January and February 2001. Because some users move from one service point to another, it was necessary to collect precise data on the number of times each respondent used such services (meals and person-nights) during the week preceding the survey. Data quality is extremely important since it has a major impact on the sampling weight assigned to each individual.

    Release date: 2002-09-12

  • Articles and reports: 88F0006X2002012
    Description:

    Statistics Canada's Survey of Innovation 1999 surveyed manufacturing in the fall of 1999. It was the first innovation survey of selected natural resource industries. This paper is part of a series of working papers based on the Survey of Innovation 1999. This paper details the survey methodology, including decisions taken and lessons learned regarding survey design.

    Release date: 2002-06-28

  • Articles and reports: 11-522-X19990015646
    Geography: Canada
    Description:

    The current economic context obliges all partners of health-care systems, whether public or private, to identify those factors that determine the use of health-care services. To increase our understanding of the phenomena that underlie these relationships, Statistics Canada and the Manitoba Centre for Health Policy and Evaluation have established a new database. For a representative sample of the province of Manitoba, cross-sectional micro-data on the level of health of individuals and on their socioeconomic characteristics, and detailed longitudinal data on the use of health-care services have been linked. In this presentation, we will discuss the general context of the linkage of records from various organizations and the protection of privacy and confidentiality. We will also present results of studies that could not have been performed in the absence of the linked database.

    Release date: 2000-03-02

  • Articles and reports: 12-001-X19990024875
    Geography: Canada
    Description:

    Dr. Fellegi considers the challenges facing government statistical agencies and strategies to prepare for these challenges. He first describes the environment of changing information needs and the social, economic and technological developments driving this change. He goes on to describe both internal and external elements of a strategy to meet these evolving needs. Internally, a flexible capacity for survey taking and information gathering must be developed. Externally, contacts must be developed to ensure continuing relevance of statistical programs while maintaining non-political objectivity.

    Release date: 2000-03-01

  • Articles and reports: 88F0006X1999006
    Description:

    This study provides background information towards developing working definitions of e-commerce. In addition, through selected case studies it examines whether respondents could provide information for such measurements. The study distinguishes between e-commerce and e-business, the former being a sub-set of the latter and emphasizes computer-mediation as an important feature of this phenomenon. A definition of e-commerce is then proposed: "Transactions carried over computer-mediated channels that comprise the transfer of ownership or the entitlement to use tangible or intangible assets".

    Release date: 1999-08-20

  • Articles and reports: 82-003-X19980044511
    Geography: Canada
    Description:

    This article discusses some of the benefits and challenges of data from a longitudinal panel as exemplified by the National Population Health Survey (NPHS). It provides an overview of content and collection methods, sample design, response rates, and some of the special methodological and operational approaches for this longitudinal survey.

    Release date: 1999-04-29

Reference (0) (0 results)

No content available at this time.
