Statistical Methodology Research and Development Program Achievements, 2024/2025
5.  Support (Resource Centres)

5.1  Time Series Research and Analysis Centre

The objective of the Time Series Research and Analysis Centre is to maintain high-level expertise and offer consultation in time series throughout the agency. The Centre provides consultation and advice on problems related to time series, explores problems that do not currently have known or satisfactory solutions, and develops and maintains tools to apply solutions to real-life time series problems.

The projects can be split into four sub-topics with emphasis on the following:

Progress:

Consultation and training in time series

The Time Series Research and Analysis Centre is responsible for developing and delivering training on time series methods including seasonal adjustment, benchmarking, reconciliation, and time series modelling to participants from Statistics Canada as well as those from other agencies. In addition, the Centre provides guidance and consultation on time series projects in general for programs throughout Statistics Canada.

The Centre offered courses on time series components, seasonal adjustment, benchmarking, and reconciliation during the year to internal and external participants through the Statistics Canada training centre (Statistics Canada, 2024). In particular, the introductory course on time series components and seasonal adjustment was offered to members of the Lao Statistics Bureau. The Centre also participated in outreach and ad hoc training to other groups in Statistics Canada on time series topics (methodology branch seminar series for recruits, data navigator course and seasonal adjustment dashboard demonstrations).

The Centre has also offered consultation to various internal programs (seasonal adjustment, time series modelling, backcasting, nowcasting, forecasting, trend estimation, calendarization, etc.). In particular, the Centre provided time series support to the System of National Accounts in a number of areas, including the quarterly and monthly Gross Domestic Product programs, balance of payments, international trade in services, and securities transactions. Representatives from the Centre also periodically attend a weekly analyst forum to maintain a presence in the analyst community. The Centre regularly consults on backcasting to preserve or restore comparability across time. Work on producing guidelines on time series continuity continued and was integrated into a larger project on updating and expanding guidelines on time series methods for Statistics Canada’s programs. Recent work involved discussing the proposed guidelines with our Departmental Project Management Office to determine how the guidelines could be integrated into the project management framework.

In addition, to support various internal programs, the Centre consulted and exchanged externally on time series topics (seasonal adjustment strategy during the pandemic, backcasting, deflation, software tools, etc.) with multiple federal and provincial public agencies, as well as national statistical organizations.

Support and enhancement of the time series processing system and tools

The Time Series Research and Analysis Centre develops and maintains a number of important tools used to process and analyse time series data for the Statistics Canada programs producing seasonally adjusted data, in particular the Generalized System G-Series for benchmarking and reconciliation (raking and balancing) (Statistics Canada, 2016; Ferland, 2025), the Time Series Processing System (Ferland, 2022), and the Seasonal Adjustment Dashboard (Verret, 2021).

The remaining work on a new, open-source, version of G-Series in R was completed, with the addition of the balancing functionality, utility functions, and expanded bilingual documentation. The new version of G-Series is developed and maintained in Statistics Canada's internal GitLab environment using Continuous Integration/Continuous Delivery (CI/CD) best practices. It underwent alpha and beta testing, as well as a code review and a cybersecurity application vulnerability assessment. The new G-Series will be released officially as an R package on GitHub and CRAN in May 2025.
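To illustrate the benchmarking problem that G-Series addresses (this is a conceptual sketch, not the gseries API), the snippet below pro-rates a quarterly indicator so each year's quarters sum to an annual benchmark; production methods such as the Denton variants also smooth the adjustment across year boundaries.

```python
def prorate_benchmark(quarterly, annual_totals):
    """Adjust each year's four quarterly values so they sum to the annual
    benchmark, preserving the within-year quarterly pattern (pro-rating)."""
    adjusted = []
    for year, total in enumerate(annual_totals):
        block = quarterly[4 * year: 4 * year + 4]
        factor = total / sum(block)          # one multiplicative factor per year
        adjusted.extend(v * factor for v in block)
    return adjusted

quarterly = [100.0, 110.0, 120.0, 130.0,     # indicator series, year 1
             105.0, 115.0, 125.0, 135.0]     # year 2
annual = [506.0, 528.0]                      # benchmark totals from an annual source
benched = prorate_benchmark(quarterly, annual)
```

After adjustment, each year's quarters sum exactly to the annual benchmark while quarter-to-quarter movements within the year are preserved.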

The Time Series Processing System (TSPS) is a customizable SAS-based application for applying time series techniques including seasonal adjustment, benchmarking, and reconciliation. It is used extensively in the production of seasonally adjusted estimates for sub-annual programs within Statistics Canada (many of them mission critical). The system is in a mature and stable state, but it requires ongoing updates to broaden functionality and address new needs of programs in the agency. Just like G-Series, the TSPS will be redeveloped in R. The work is planned to start next year and will provide the flexibility to incorporate tools and new techniques available from open-source software.

The Seasonal Adjustment Dashboard is now hosted on Statistics Canada’s internal GitLab and has been reconfigured to make it easier to install. Other minor improvements to the dashboard include reduced dependencies and a shorter load time. The dashboard was implemented for two additional labour programs this year, with a minor revision to incorporate a program-specific calendar effect. The Centre also provided training to subject-matter analysts.

Time series modelling and forecasting

The Centre completed a project on producing early indications of structural breaks using state-space models. An R tool was created to identify and model shocks (sudden changes) in time series, either in measurement equations (additive outliers) or state equations (more permanent effects).
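The idea behind flagging such shocks can be sketched with a minimal local-level Kalman filter (a simplified stand-in for the Centre's R tool, whose internals are not described here): an additive outlier shows up as an unusually large standardized one-step-ahead innovation.

```python
def flag_shocks(y, q=0.1, r=1.0, threshold=3.0):
    """Minimal local-level Kalman filter: flag time points whose standardized
    one-step-ahead innovation exceeds `threshold` (candidate shocks).
    q: level (state) noise variance; r: measurement noise variance."""
    level, p = y[0], 1.0
    flags = []
    for t, obs in enumerate(y[1:], start=1):
        p_pred = p + q                    # predict the level variance
        f = p_pred + r                    # innovation variance
        innov = obs - level               # one-step-ahead innovation
        if abs(innov) / f ** 0.5 > threshold:
            flags.append(t)
        k = p_pred / f                    # Kalman gain
        level += k * innov                # update the level estimate
        p = p_pred * (1 - k)
    return flags

series = [10.0] * 10 + [25.0] + [10.0] * 10   # single spike at t = 10
flags = flag_shocks(series)
```

A sudden spike is flagged immediately; distinguishing a one-off additive outlier from a permanent level shift requires inspecting whether subsequent innovations revert, which is where the full state-space treatment comes in.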

The Labour Force Survey calibrates some of its estimates to population counts. The non-permanent resident portion of the population varies seasonally (in part due to international students and temporary workers). To stabilize calibration, a simple methodology is currently used to smooth out these fluctuations (12-month moving average). The Centre started a study to determine whether seasonal adjustment and trend-cycle estimation techniques could provide an improved methodology, especially in times of rapid changes in that segment of the population. The investigation is ongoing.
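For illustration, a standard way to implement a 12-month moving average on monthly data is the centered 2x12 filter below; whether the LFS smoothing uses this exact centered variant is an assumption made for the sketch.

```python
def centered_ma12(values):
    """Centered 12-term moving average (the classic 2x12 filter): a 13-term
    weighted mean with half weight on the two end points, so the result is
    centered on an actual month."""
    out = {}
    for t in range(6, len(values) - 6):
        window = values[t - 6: t + 7]     # 13 points centered at t
        s = 0.5 * window[0] + sum(window[1:12]) + 0.5 * window[12]
        out[t] = s / 12.0
    return out

population = list(range(24))              # a linearly growing monthly series
trend = centered_ma12(population)
```

Because the filter is symmetric, a linear trend passes through unchanged; the trade-off is that the six most recent months have no smoothed value, which is one reason trend-cycle estimation methods with asymmetric end filters are being considered.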

The COVID-19 pandemic heavily affected many economic time series during the 2020-2022 period. With data now available for 2023 and 2024, the impact of the pandemic on seasonal patterns, the trend-cycle, and volatility can start to be assessed. The Centre has begun a preliminary investigation using representative time series from various statistical programs (labour, trade, manufacturing, building permits and investment in building construction, tourism, and consumer price index) to determine whether these series have returned to pre-pandemic patterns or have shifted to a new post-pandemic regime. The investigation will continue to be refined as more post-pandemic data becomes available.

The Centre also contributed to a nowcasting project as part of Statistics Canada’s Methodological Acceleration Initiative, created to significantly accelerate the Agency’s ability to implement modern solutions to data challenges. The Centre developed, tested, and validated one scenario of macro-estimate nowcasting for the Retail Commodity Survey, using ARIMA models and neural networks. Additionally, one scenario of micro-modelling, at the enterprise level, was developed, tested, and validated.
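As a toy illustration of model-based nowcasting (far simpler than the ARIMA and neural network models actually used), an AR(1) fit by least squares can project the next value of a series:

```python
def ar1_nowcast(series):
    """Fit an AR(1) model by ordinary least squares on lagged pairs
    (y[t-1], y[t]) and return the one-step-ahead nowcast."""
    pairs = list(zip(series[:-1], series[1:]))
    xbar = sum(x for x, _ in pairs) / len(pairs)
    ybar = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - xbar) * (y - ybar) for x, y in pairs)
    den = sum((x - xbar) ** 2 for x, _ in pairs)
    phi = num / den                      # autoregressive coefficient
    intercept = ybar - phi * xbar
    return intercept + phi * series[-1]

nowcast = ar1_nowcast([1.0, 2.0, 3.0, 4.0, 5.0])
```

On this trending example the fitted model extrapolates the pattern one step ahead; real nowcasting models add seasonal terms, auxiliary indicators, and error diagnostics.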

Methodological support to consumer and producer price index programs

The Time Series Research and Analysis Centre also has a unit dedicated to providing methodological support to consumer and producer price index (CPI and PPI) programs.

This unit developed and tested a sampling protocol for quality control of machine learning classifiers used in large-scale multi-class classification problems and longitudinal collection. For the last few years, the CPI has incorporated point-of-sale scanner data, giving a census of transactions from a few major retailers each month. Product text descriptions need to be classified into CPI’s hierarchical aggregation structure of over 700 commodity classes. A linear support vector machine classifier has been trained to do this classification. However, it makes too many errors to be acceptable for the CPI, requiring additional review by human annotators. The cost of human annotation has been the limiting factor in expanding the use of scanner data.

Three sampling strategies were developed to select cases for human annotation and provide estimates of misclassification error rates with modified Wilson score confidence intervals. The Centre collaborated with data scientists from the Consumer Prices Division to implement these sampling strategies in Python and test them on simulated cases of food sales, with the goal of making the code publicly available for other national statistical offices and researchers. Simulation results favoured a hybrid between two strategies: 1) a stratified simple random sample of new products each month to estimate misclassification rates and monitor model performance, and 2) a probability-proportional-to-size sample among unreviewed units from previous months, targeted at cases most likely to cause misclassification bias in the CPI estimate, based on an influence measure derived for price indices (Spackman, Francis and Goussev, 2025). The strategy and simulation results were presented at an April 2025 United Nations Economic Commission for Europe (UNECE) meeting on consumer price index methods in Geneva, Switzerland.
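The standard Wilson score interval underlying such error-rate estimates can be computed directly; the "modified" variant used in the paper may differ in its details, so treat this as a sketch of the basic formula.

```python
import math

def wilson_interval(errors, n, z=1.96):
    """Wilson score confidence interval for a misclassification rate
    estimated from n human-reviewed cases, `errors` of which were wrong."""
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

low, high = wilson_interval(errors=12, n=400)   # observed error rate of 3%
```

Unlike the naive Wald interval, the Wilson interval stays inside [0, 1] and behaves well for the small error proportions typical of quality-control samples.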

The Centre also consulted on the development of a set of quality indicators for the CPI. The first phase of this project produced a set of quantitative indicators and an aggregate measure for potential collection and processing issues. As well, a preliminary set of qualitative indicators covering Statistics Canada’s six dimensions of quality were developed (Francis, Carrillo-Garcia, Yélou and Rassart, 2025).

For more information, please contact:
Etienne Rassart (etienne.rassart@statcan.gc.ca).

References

Ferland, M. (2022). Time Series Processing System – v3.08. Internal document, Statistics Canada.

Ferland, M. (2025). gseries: Improve the Coherence of Your Time Series Data. R package version 3.0.2, https://StatCan.github.io/gensol-gseries/en/.

Francis, J., Carrillo-Garcia, I., Yélou, C. and Rassart, E. (2025). Developing CPI Quality Indicators: Project Update. Internal document, Statistics Canada.

Spackman, W., Francis, J. and Goussev, S. (2025). Optimal use of machine learning at scale: Designing quality control of machine learning classification and mitigating misclassification error. Proceedings of the 17th Meeting of the Group of Experts on Consumer Price Indices, UNECE, Geneva, Switzerland. Available at: https://unece.org/sites/default/files/2025-04/Spackman%20et%20al%20%282025%29%20-%20QC%20for%20mitigating%20misclassification%20bias%20-%20paper.pdf.

Statistics Canada (2024). Training. Available at: https://www.statcan.gc.ca/eng/wtc/training.

Statistics Canada (2016). G-Series 2.00.001 User Guides. Internal document, Statistics Canada.

Verret, F. (2021). Statistics Canada’s seasonal adjustment dashboard. Proceedings: Symposium 2021, Adopting Data Science in Official Statistics to Meet Society's Emerging Needs, Statistics Canada, Ottawa, Canada.

5.2  Resource Centre for Economic Statistical Tools and Innovation

The Economic Generalized Systems unit has been reorganized as a section called the Resource Centre for Economic Statistical Tools and Innovation. This new Centre is responsible, among other functions, for the support and development of three generalized systems: G-Sam (generalized sampling system), Banff (generalized system for statistical data editing), and G-Est (generalized system for estimation).

Progress:

Projects in the Centre can generally be categorized as support (for users of the systems), research, and development.

The Centre handled a typical volume of support cases for G-Sam, Banff, and G-Est. These included cases for the current SAS-based versions of the systems as well as support for migration to the Python version of Banff. While some cases were resolved with a few email exchanges, several others required more in-depth involvement and included recommendations on the appropriate implementation of the system to achieve statistical goals. A summary of these cases is provided below.

The Banff team:

The G-Sam team:

Most of the research and development in the fiscal year was devoted to Statistics Canada’s Shift to Open Source (SOS) initiative. The team took an active role in updating users, both inside and outside the branch, on the progress of the systems. This included presentations to the Generalized Systems Steering Committee, Branch Seminars, Field Planning Board Meetings, and an external article (Gray and Pierre, 2025). The Centre also met with representatives from the Federal Statistical Office of Germany (Destatis) to discuss transitioning Banff to open source and to initiate a collaborative project on developing a framework and tool for assessing imputation methods.

The Banff modernization project was successfully completed within the defined scope, budget, and timeline. Banff is a modular Statistical Data Editing (SDE) system for identifying and treating reporting errors and non-response. In January 2025, the Python-based version was released, enhancing flexibility, supporting advanced imputation methods, and facilitating international collaboration. Banff includes nine procedures performing various SDE functions including outlier detection, error localization, imputation and prorating, and methods for reviewing and imputing data constrained by linear relationships. The Banff Processor is a metadata-driven tool that executes data editing in production, calling built-in Banff procedures alongside custom modules in a user-specified process flow. Users are encouraged to share custom Banff-compatible modules in the Banff Plugin Repository, open to all users, to foster collaboration and reduce duplication. This Python version brings the system in line with the Generic Statistical Data Editing Model (UNECE, 2019). A keynote presentation discussing the new system was given at the United Nations Economic Commission for Europe (UNECE) Expert Meeting on Statistical Data Editing (Gray, 2024). To promote the system, the Banff team delivered a launch seminar (Gray and Seffal, 2025a) and shared lessons learned during the Modern Statistical Methods and Data Science Branch Learning week (Gray and Seffal, 2025b).

The R version of G-Sam has been rebuilt from scratch, retaining core methods from the SAS version while significantly improving performance and expanding functionality. New allocation and selection functions handle a broader range of problems and outperform their SAS counterparts with less code and fewer inputs. Dependencies are minimal, and input structures have been redesigned for better usability. To ease the transition, the team is developing migration tools and guidance. Outputs remain largely consistent, with only minor differences due to the change in optimization engines. A beta version is scheduled for release in June 2025, with the production release planned for September 2025.
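As one example of the allocation problems such functions solve (an illustration of the technique, not the G-Sam interface), Neyman allocation distributes a fixed sample size across strata in proportion to each stratum's size times its standard deviation:

```python
def neyman_allocation(strata, n_total):
    """Neyman allocation: allocate n_total sample units across strata in
    proportion to N_h * S_h (stratum population size times the standard
    deviation of the study variable in that stratum)."""
    weights = {h: N * S for h, (N, S) in strata.items()}
    total = sum(weights.values())
    # Rounded allocation; in general a final adjustment may be needed
    # so the rounded sizes sum exactly to n_total.
    return {h: round(n_total * w / total) for h, w in weights.items()}

strata = {"small": (5000, 2.0), "medium": (1500, 8.0), "large": (200, 30.0)}
alloc = neyman_allocation(strata, n_total=500)
```

Note how the small stratum of 200 highly variable units receives far more sample relative to its size than the 5,000 homogeneous units, which is exactly the efficiency gain Neyman allocation delivers over proportional allocation.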

The R version of G-Est will offer weighting and variance estimation functions for complex survey designs and include most of the same features as the SAS version. It will support bootstrap replicates, calibration, nonresponse adjustments, small area estimation, and variance due to imputation (via the System for Estimation of the VAriance due to Nonresponse and Imputation, SEVANI, module). Like G-Sam, the implementation has been built from scratch, optimized for R with minimal external dependencies and the potential to add methods in the future. Alpha versions of calibration and one-phase variance estimation are completed, and initial tests show significant performance improvements relative to the SAS version. For small area estimation, the transition team is reviewing an existing package for suitability, while development of the bootstrap replication functions and SEVANI modules are scheduled to begin this spring. A beta version is scheduled for release in December 2025, with the production release planned for December 2026.
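The simplest calibration case, post-stratification, can be sketched as follows (an illustration of the method, not the G-Est interface): design weights are scaled within each group so weighted counts match known control totals.

```python
def poststratify(weights, groups, controls):
    """Post-stratification: scale design weights within each group so the
    weighted counts match the known control totals for that group."""
    wsum = {}
    for w, g in zip(weights, groups):
        wsum[g] = wsum.get(g, 0.0) + w
    factors = {g: controls[g] / wsum[g] for g in wsum}
    return [w * factors[g] for w, g in zip(weights, groups)]

weights = [10.0, 10.0, 20.0, 20.0]        # design weights
groups = ["m", "m", "f", "f"]             # post-stratum membership
controls = {"m": 30.0, "f": 50.0}         # known population counts
cal = poststratify(weights, groups, controls)
```

General calibration (e.g., raking or GREG weighting) extends this idea to several sets of control totals at once, solving for weights that satisfy all constraints while staying close to the design weights.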

For more information, please contact:
Fritz Pierre (fritz.pierre@statcan.gc.ca).

References

Gray, D. (2024). Building the new Banff: An open-source data editing system based on GSDEM concepts. UNECE Expert Meeting on Statistical Data Editing, 7-9 October 2024, Vienna.

Gray, D. and Pierre, F. (2025). De SAS vers les sources libres : conversion des systèmes généralisés de Statistique Canada. To be published in the next issue of Convergence, a journal of the Association des statisticiennes et statisticiens du Québec.

Gray, D. and Seffal, M. (2025a). The New Banff: A Modern Edit and Imputation Platform for Everyone. Internal presentation, Statistics Canada.

Gray, D. and Seffal, M. (2025b). From SAS to Open-Source: What we learned from the Banff project. Internal presentation, Modern Statistical Methods and Data Science Branch Learning Week, Statistics Canada.

United Nations Economic Commission for Europe (UNECE) (2019). Generic Statistical Data Editing Model (GSDEM), https://statswiki.unece.org/display/sde/GSDEM.

5.3  Record Linkage Resource Centre

The objectives of the Record Linkage Resource Centre (RLRC) are to provide consultation services to internal and external users of record linkage methods, which includes making recommendations about the software and methods to be used, and collaborating on record linkage applications. We also facilitate the dissemination of information on record linkage methods, software, and policy, as well as the analysis of linked data, to interested parties inside and outside Statistics Canada.

Progress:

We continued to support the development team of G-Link, the record linkage system developed at Statistics Canada. The RLRC also offered support to internal and external G-Link users who requested assistance, provided comments or submitted suggestions through requests to the G-Link_info mailbox.

During the year, most of the methodological work focused on maintenance and support for users of version 3.5 of G-Link on SAS servers in cloud computing. A typical volume of support cases for G-Link was processed by the project team. Most of these were resolved with suggestions on how to apply the system in practical terms; however, several required more involvement.

Development work focused on an update of tools for the creation of synthetic linkage-ready data for testing and training purposes. The work was completed in R and Python as part of the move to open-source tools in the agency.

The RLRC has also worked on a variety of other probabilistic linkages in the Social Data Linkage Environment (SDLE). These linkages helped us analyze the performance of the software and identify the solutions to be provided. Work on these projects has resulted in more systematic approaches to defining and adjusting record linkages on cloud-based SAS servers. Work was also undertaken on reweighting to compensate for bias introduced by missed links, including work for external clients on Census data linked to multiple longitudinal administrative files.
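Probabilistic linkage of this kind typically rests on the classical Fellegi-Sunter framework. The sketch below computes a composite match weight from per-field m- and u-probabilities (the probability values are illustrative, and this is not a description of G-Link internals):

```python
import math

def match_weight(agreements, m_probs, u_probs):
    """Fellegi-Sunter composite match weight: sum log2(m/u) over agreeing
    fields and log2((1-m)/(1-u)) over disagreeing fields. Large positive
    weights suggest a link; large negative weights suggest a non-link."""
    w = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        w += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return w

m = [0.95, 0.90, 0.85]          # P(field agrees | records truly match)
u = [0.01, 0.05, 0.10]          # P(field agrees | records do not match)
w_all = match_weight([True, True, True], m, u)     # all fields agree
w_none = match_weight([False, False, False], m, u) # no field agrees
```

Record pairs are then classified as links, possible links (sent to clerical review), or non-links by comparing the weight to two thresholds chosen for target error rates.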

Members of the team offered formal courses through Statistics Canada’s Training Centre, prepared materials for a workshop at the 2025 Canadian Research Data Centres Network conference, and updated reference and strategy documentation for users of probabilistic record linkage techniques.

For more information, please contact:
Abdelnasser Saïdi
(abdelnasser.saidi@statcan.gc.ca).

5.4  Data Analysis Resource Centre

The main goal of the Data Analysis Resource Centre (DARC) is to provide advice on the appropriate use of data analysis tools and methods, and to promote best practices in this area. DARC’s services – which focus mainly on survey, census, or administrative data – are available to the employees of the Agency and other departments, as well as to analysts and researchers from academia and Research Data Centres (RDCs).

Progress:

Consultations

Consultation services were provided as requested by internal and external clients. Between April 1, 2024 and March 31, 2025, DARC responded to approximately 50 requests. The questions varied in complexity and included topics such as logistic regression with survey data, variance estimation with bootstrap weights, confidence intervals for small proportions, comparison of dependent subpopulations, chi-square tests, calculation of degrees of freedom for survey data and quantile regression. DARC also helped clients with the implementation of statistical methods in R, SAS, SUDAAN and Stata software. In addition, DARC reviewed analytical papers for scientific journals, conferences and for internal publication.
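Variance estimation with bootstrap weights follows a simple recipe: recompute the estimate once per set of replicate weights, then take the mean squared deviation of the replicate estimates around the full-sample estimate. A minimal sketch (illustrative values):

```python
def bootstrap_variance(theta_hat, replicate_estimates):
    """Survey bootstrap variance: the mean squared deviation of the B
    replicate estimates (each computed with one set of bootstrap weights)
    around the full-sample estimate theta_hat."""
    b = len(replicate_estimates)
    return sum((t - theta_hat) ** 2 for t in replicate_estimates) / b

theta = 50.0                                   # full-sample estimate
reps = [49.0, 51.0, 50.0, 52.0, 48.0]          # B = 5 replicate estimates
var = bootstrap_variance(theta, reps)
se = var ** 0.5                                # standard error
```

In practice hundreds or thousands of replicate weight sets are supplied with the microdata file, and the same formula yields standard errors, coefficients of variation, and confidence intervals for any estimate the analyst can recompute per replicate.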

Provision of Training

DARC presented, in English, the internal course 0438A “Statistical Analysis of Survey Data – Module 1”. This six-day course is a mix of theory and practice. Exercises and examples were presented using R, SUDAAN and SAS code.

DARC presented at Statistics Canada’s Data Interpretation Workshop on data analysis with complex survey data, in English and French. DARC again presented the sessions on linear regression with complex survey data of the Statistical Modelling Course at Statistics Canada, in English. DARC also gave the seminar for recruits on analysis of data from a complex survey.

DARC, in collaboration with the Health Analysis Division (HAD), developed and delivered a 12-hour course titled “Surviving the Transition to R for Survey Analysts.” This hands-on training was provided to survey analysts within HAD and included practical examples and exercises focused on data wrangling, data analysis using the R survey package, and data visualization.

Collaboration

DARC collaborated in developing measurement strategies for the Workplace Mental Health Performance Measurement Project with the Treasury Board Secretariat (TBS). This project used data from the 2019, 2020 and 2022 cycles of the Public Service Employee Survey (PSES) to measure latent variables such as psychological risk factors and behaviours, and to calculate factor scores for different levels of aggregation. The factor scores developed for this project were used to create the Federal Public Service Workplace Mental Health Dashboard (tbs-sct.gc.ca). The measurement models were developed using factor analysis and structural equation modelling, as discussed by Blais, Mach, Michaud and Simard (2020) and Blais, Michaud, Simard, Mach and Houle (2021). This year, DARC estimated the variance of the factor scores, enabling the calculation of coefficients of variation and confidence intervals for all domains presented in the dashboard.

For more information, please contact:
Pierre-Olivier Julien (pierre-olivier.julien@statcan.gc.ca) or
Isabelle Michaud (isabelle.michaud@statcan.gc.ca).

References

Blais, A.-R., Mach, L., Michaud, I. and Simard, J.-F. (2020). Analysis of the Public Service Employee Survey Items as Measures of the Psychosocial Risk Factors. Presentation to the Workplace Mental Health Performance Measurement Steering Committee, October 7, 2020.

Blais, A.-R., Michaud, I., Simard, J.-S., Mach, L. and Houle, S. (2021). Measuring workplace psychosocial factors in the federal government. Health Reports, 32, 12.

5.5  Centre for Confidentiality and Access

The methodology group responsible for confidentiality and access methods continued to offer consultation and support services to internal and external partners on the various access solutions and disclosure avoidance strategies.

Anonymization

The confidentiality support group continued to offer its expertise in the understanding and development of methods related to de-identification and anonymization. Statistics Canada is continuing to enhance its own internal strategies to ensure that internal information is de-identified whenever possible to minimize risks of disclosure.

Statistics Canada has contributed to a set of fact sheets that will be published by the Information and Privacy Commissioner of Ontario.

The Centre for Confidentiality and Access continues to offer governance, guidance and strategic advice in producing open datasets. This year, over 15 new Public Use Microdata Files were reviewed and made available on the Statistics Canada website.

External consultation

Statistics Canada has continued to offer its expertise to several groups internationally and within Canada.

Internationally, Statistics Canada continued to participate in a pilot study between Canada (StatCan), France (Institut national du cancer) and the United States (National Cancer Institute) meant to study the feasibility of sharing cancer data between nations.

Domestically, we are continuing to work with Health Canada in providing non-disclosive tables of daily mortality counts for input into their Air Quality Health Index (AQHI) project. Rather than conventional rounding techniques, noise addition methods are being proposed, aiming to provide better utility for the same level of disclosure risk.
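The exact noise mechanism under consideration is not described here; the sketch below only illustrates the generic idea behind noise addition for count tables: perturb each cell with zero-mean noise and keep cells non-negative, rather than rounding to a fixed base.

```python
import random

def add_noise(counts, scale=2.0, seed=None):
    """Illustrative noise addition for a count table: perturb each cell with
    symmetric, zero-mean integer noise and clip at zero. (A sketch of the
    general idea, not the specific mechanism proposed for the AQHI tables.)"""
    rng = random.Random(seed)
    noisy = []
    for c in counts:
        perturbed = c + round(rng.gauss(0.0, scale))
        noisy.append(max(0, perturbed))          # counts cannot go negative
    return noisy

daily_deaths = [12, 0, 7, 15, 3]
protected = add_noise(daily_deaths, scale=2.0, seed=42)
```

Because the noise is zero-mean, aggregates over many cells remain approximately unbiased, which is the utility advantage over deterministic rounding at a comparable disclosure-risk level.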

Statistics Canada has offered its expertise in disclosure control strategies to Correctional Service Canada, analysts for the National Inuit Health Survey, Health Canada and the Public Health Agency of Canada in their release of information from the Canada Vigilance Adverse Reaction Online Database, and the Bank of Canada for their Canadian Survey of Consumer Expectations.

For more information, please contact:
Steven Thomas 
(steven.thomas@statcan.gc.ca).

5.6  Support and Research Activities in Artificial Intelligence

The Methodology Research and Development Program (MRDP) at Statistics Canada has supported many activities for the Centre of Artificial Intelligence, Research and Excellence (CAIRE) and the Artificial Intelligence and Methods Division (AIMD). The support from the MRDP has enabled in-depth research, prototypes, communities of practice, centres of expertise, and guidelines that benefit the agency.

Activities, Mandates and Products:

The funding enabled the Natural Language Processing (NLP) Centre of Expertise (COE), which is mandated to centralize resources for knowledge sharing and capacity building in text analytics using machine learning, and to create, maintain, and promote best practices and guidelines in text analytics. Its activities include providing reviews, consultation, and guidance to NLP practitioners within Statistics Canada, as well as maintaining a list of completed and ongoing NLP projects across the agency.

Another community funded by the MRDP deals with Privacy Enhancing Technologies (PET). The support enabled the launch of the community of practice, which is still active. Public servants who require PET can now seek information on the latest techniques, reach out to the PET team at Statistics Canada, and, as the site evolves, find lectures and articles. The support of the MRDP also enabled the launch of the Responsible PET team, which will develop guidelines and create a review committee for any internal or external PET project (work is in progress).

Aside from the communities and centres, the funding enabled a continuation of an in-depth study that involves leveraging state-of-the-art Denoising Diffusion Probabilistic Models (DDPMs) to generate large volumes of synthetic tabular data that closely mimic real-world data distributions. The project integrates a debiasing process, inspired by Prediction-Powered Inference (PPI) and the generalized difference estimator. This proposed solution is now being tested for implementation, which could save time, effort, and money when conducting sample surveys.
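The debiasing idea can be sketched as a difference estimator: compute the estimate on the large synthetic/predicted set, then correct it with the average prediction error measured on a small labeled subsample. This is a simplified illustration of the PPI-style correction, not the project's actual estimator.

```python
def debiased_mean(synthetic_preds, labeled_preds, labeled_truth):
    """Difference-estimator debiasing, in the spirit of prediction-powered
    inference: the mean over the large synthetic/predicted set, minus the
    average prediction error estimated on a small labeled subsample."""
    big_mean = sum(synthetic_preds) / len(synthetic_preds)
    bias = sum(p - y for p, y in zip(labeled_preds, labeled_truth)) / len(labeled_truth)
    return big_mean - bias

synthetic = [1.1, 0.9, 1.2, 1.0, 0.8, 1.05]   # model output on many units
preds = [1.2, 0.8, 1.1]                       # model output on the labeled units
truth = [1.0, 1.0, 1.0]                       # observed values for those units
est = debiased_mean(synthetic, preds, truth)
```

The correction removes the systematic error of the generator or model, so the final estimate inherits the low variance of the large synthetic set while the small labeled sample keeps it (approximately) unbiased.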

The second promising research project funded by the MRDP is an investigative study aimed at creating evaluation metrics for generative Artificial Intelligence (AI), i.e., a framework for Large Language Models. Unlike classical problems such as classification or regression, which have established metrics (such as the Root Mean Squared Error, RMSE, or the F1 score), generative AI lacks standardized evaluation methods; those used in practice vary by task and have reliability issues. An evaluation pipeline framework was designed to improve usability and reproducibility while offering practitioner-friendly tools and artifacts that do not require deep theoretical knowledge.

Other research was enabled through the MRDP, which highlights its importance in supporting AI research, modernization of methodology and the government AI community at large.

For more information, please contact:
Marie-Eve Bedard (marie-eve.bedard@statcan.gc.ca) or
Nabila Ould-Brahim (nabila.ould-brahim@statcan.gc.ca).

5.7  Questionnaire Design Resource Centre

The Questionnaire Design Resource Centre (QDRC) is a focal point of expertise at Statistics Canada for questionnaire design and evaluation. The QDRC provides consultation and support services and carries out projects and research related to the development, testing and evaluation of survey questionnaires. The QDRC plays a very important role in quality management and responds to program requirements throughout Statistics Canada by consulting with clients, respondents and data users, and by pre-testing survey questionnaires.

While much of the QDRC’s work is carried out on a cost-recovery basis, the section is frequently approached on an ad hoc basis for expert reviews and consultation services on a wide variety of surveys. The group also offers courses on questionnaire design.

Progress:

The QDRC conducted many reviews of survey questionnaires. While most of these involved Statistics Canada questionnaires, several were conducted for surveys being done by other government organizations such as Public Works and Government Services Canada, Canada Energy Regulator, Public Services and Procurement Canada and others.

The group also contributed to various corporate consultation initiatives.

For more information, please contact:
Jeremy Solomon (jeremy.solomon@statcan.gc.ca).

5.8  Quality Assurance Resource Centre

The Quality Assurance Resource Centre (QARC) is committed to advancing research and development in statistical methods that enhance quality assurance and control processes. Our primary objective is to raise the standards of survey data collection and processing operations within the bureau. To achieve this objective, we explore a range of methodologies with a particular focus on improving the outgoing quality of data.

At the core of our efforts is the provision of methodological services for G-Code, a generalized system developed at Statistics Canada for creating coded databases and implementing machine learning algorithms in data processing. Our research spans a wide array of quality assurance and control practices, addressing challenges related to efficiency and automation. The insights gained from this research not only benefit our internal operations but also have broad applicability across various stages of survey processes.

Progress:

The methodological support team assisted the G-Code development team and monitored user inputs to identify potential improvements for G-Code. Additionally, QARC extended support to both internal and external G-Code users whenever help, comments, or suggestions concerning G-Code were required.

Throughout the year, QARC has been dedicated to implementing an innovative methodology called “Quality Control by Score” to enhance the quality control (QC) of machine learning (ML) text coding processes. As ML technology becomes increasingly integral to data processing, maintaining the quality of generated codes is more important than ever. To meet this challenge, Statistics Canada has actively pursued a strategy for determining optimal QC sampling rates, leveraging scores derived from the ML process. This methodology promotes a responsible approach to data classification while facilitating the broader adoption of machine learning. Our goal is to apply this approach to QC across various classifications within key surveys, including the Labour Force Survey (LFS), Job Vacancy and Wage Survey (JVWS), Canadian Community Health Survey (CCHS), and the Statistical Business Register (SBR). A paper detailing this methodology was presented at the Advisory Committee on Statistical Methods (Oyarzun, Wile and Evans, 2023).
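The published details of "Quality Control by Score" are not reproduced here; the hypothetical rule below merely illustrates the core idea of tying the QC sampling rate to the classifier's confidence score, so that scarce human review is concentrated where the model is least certain.

```python
def qc_sampling_rate(score, floor=0.02, ceiling=1.0, cutoff=0.95):
    """Hypothetical score-based QC rule (illustrative only): confident
    predictions (score >= cutoff) receive a small audit rate `floor`;
    below the cutoff, the review rate rises linearly to `ceiling` at score 0."""
    if score >= cutoff:
        return floor
    return ceiling - (ceiling - floor) * (score / cutoff)

rate_confident = qc_sampling_rate(0.99)   # high-confidence prediction
rate_uncertain = qc_sampling_rate(0.40)   # low-confidence prediction
```

An actual implementation would calibrate the rate curve so that the expected number of undetected errors, given the score distribution, stays below an outgoing-quality target.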

QARC also assisted the LFS in updating its data to align with the latest industry and occupational classifications, which involved significant structural changes. Traditionally, split-off classes were handled through manual recoding and random allocation. A new hybrid framework, combining machine learning (fastText) with linear programming, was developed to improve efficiency while maintaining consistency with traditional estimates. This work was presented at the Statistics Canada Symposium 2024 (Evans and Wile, 2024).
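The constrained-assignment idea behind the hybrid framework can be sketched as follows. Here `scores[i][c]` stands in for an ML (e.g. fastText) confidence that record `i` belongs to split-off class `c`, and `targets[c]` is the number of records each new class must receive so published totals stay consistent. The actual work solved this with linear programming; a greedy pass over sorted scores is shown only to illustrate the idea, and all names and data are hypothetical.

```python
def allocate_split_classes(scores, targets):
    """Assign records from a discontinued class to its split-off classes,
    maximizing (approximately) total model confidence while respecting
    per-class target counts. Greedy stand-in for the LP formulation."""
    # Flatten to (score, record, class) triples, highest scores first.
    pairs = sorted(
        ((s, i, c) for i, row in enumerate(scores) for c, s in row.items()),
        reverse=True,
    )
    assignment, remaining = {}, dict(targets)
    for s, i, c in pairs:
        if i not in assignment and remaining.get(c, 0) > 0:
            assignment[i] = c
            remaining[c] -= 1
    return assignment

# Three records once coded to one old class, now split into classes "A" and "B",
# with two records required in "A" and one in "B".
scores = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.6, "B": 0.4},
    {"A": 0.8, "B": 0.2},
]
assignment = allocate_split_classes(scores, {"A": 2, "B": 1})
```

A linear-programming solver would guarantee the score-optimal allocation under the same count constraints; the greedy version simply makes the trade-off between model confidence and consistency constraints concrete.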

For more information, please contact:
Javier Oyarzun (javier.oyarzun@statcan.gc.ca).

References

Evans, J. and Wile, L. (2024). Life in the FastText Lane: Harnessing linear programming constrained machine learning for classifications revision. Proceedings: Symposium 2024, The Future of Official Statistics, Statistics Canada, Ottawa, Canada.

Oyarzun, J., Wile, L. and Evans, J. (2023). Quality control by score. Paper presented to the Advisory Committee on Statistical Methods, October 2023, Statistics Canada.

5.9  Data Ethics Secretariat

The role of the Data Ethics Secretariat is to implement the Necessity and Proportionality Framework. Concretely, the Data Ethics Secretariat conducts ethical reviews of new data acquisitions, via surveys or other sources, and of new data uses such as microdata linkages. This work has recently been expanded to include reviews related to machine learning and artificial intelligence. The purpose of these ethical reviews is to ensure responsible use of data throughout the data lifecycle. The Data Ethics Secretariat raises ethical considerations, holds discussions with program managers and makes recommendations to the Principal Data Ethics, Quality, and Scientific Integrity Officer. The Data Ethics Secretariat also supports the internal Data Ethics Committee and has a capacity building role.

Progress:

In addition to conducting over 150 ethical reviews over the past year alone, members of the Data Ethics Secretariat have given numerous presentations to inform internal partners, as well as colleagues from other federal departments and international organizations, about Statistics Canada's approach to assessing data ethics. The team gathers information to remain up to date on topics that could be perceived as sensitive by the public. This is done by conducting literature reviews on targeted topics and by holding informal discussions with internal partners, such as Communications and the Questionnaire Design Resource Centre, and with counterparts from other federal departments and national statistical offices around the world.

In addition to its internal activities, the team is also active internationally, playing a leadership role in the United Nations Economic Commission for Europe (UNECE) Task Team on Ethical Leadership. The main objective of this task team is to write a reference book on ethics for National Statistical Organizations. Work on this reference book was completed in 2025.

For more information, please contact:
Ryan Chepita (ryan.chepita@statcan.gc.ca).

5.10  Quality Secretariat

The Quality Secretariat’s mandate includes designing and managing quality management studies and responding to requests for quality management information or assistance from Statistics Canada’s various programs or other organizations.

PROJECT: Capacity building with internal, national and international partners

The Quality Secretariat’s objective is to provide advice and undertake capacity-building measures internally, with national partners (other departments or other organizations) and international partners, primarily by giving a general overview of Statistics Canada’s quality management practices and official quality-related documents (the Quality Assurance Framework and the Quality Guidelines) and by providing quality management support services.

Progress:

The Quality Secretariat undertook capacity building for many partners during the reporting period. Internally, training was offered through various courses for staff. At the national partner level, a presentation on quality management practices in relation to the Generic Statistical Business Process Model was made to the First Nations Information Governance Centre.

Discussions occurred within the Government of Canada Enterprise Data Community of Practice Data Quality Working Group. This working group, co-chaired by Statistics Canada, released an abridged data quality guide, Guidance on Data Quality, in January 2024. Since its relaunch in autumn 2024, the group has aimed to identify gaps in quality governance by conducting an environmental scan on topics such as metadata, open data, and quality reporting.

At the international level, the Quality Secretariat met with the Abu Dhabi Statistics Centre to provide counsel on data quality dashboards, while continuing its involvement with the United Nations Expert Group on National Quality Assurance Frameworks. Statistics Canada served as co-chair of the Subgroup on administrative and other data sources, whose purpose was to prepare a module on quality assurance when administrative and other data sources are used to produce official statistics. This module, released in early 2025, aims to provide statistical agencies with practical, concise guidance and best practices for assuring the quality of official statistics produced from alternative data sources; it is to be used as a complement to the United Nations National Quality Assurance Frameworks Manual for Official Statistics (United Nations, 2019).

PROJECT: Triennial Program Reviews

As part of Statistics Canada’s Triennial Program Reviews, the Quality Secretariat designed a self-assessment questionnaire which was completed by the seven programs that were in scope for the 2024-2025 reviews. The questionnaire allowed each program to gauge their quality preparedness and current best practices as they pertain to their culture and Statistics Canada’s six dimensions of quality. The Quality Secretariat will continue to play a key role in these reviews as their scope expands in 2025-2026.

For more information, please contact:
Ryan Chepita (ryan.chepita@statcan.gc.ca).

Reference

United Nations (2019). United Nations National Quality Assurance Frameworks Manual for Official Statistics. Available at: https://unstats.un.org/unsd/methodology/dataquality/un-nqaf-manual/.
