Statistical Methodology Research and Development Program Achievements, 2020/2021
4. Support (Resource Centre)

4.1 Record Linkage Resource Centre

The objectives of the Record Linkage Resource Centre (RLRC) are to provide consulting services to both internal and external users of record linkage methods, including recommendations on software and methodology and collaborative work on record linkage applications, and to evaluate alternative record linkage methods and develop improved ones. We evaluate software packages for record linkage and, where necessary, develop prototype versions of software incorporating methods not available in existing packages. We also assist in disseminating information on record linkage methods, software and applications to interested persons both within and outside Statistics Canada.

Progress:

We continued to support the G-Link development team and to follow up on any issues raised. The RLRC also supported internal and external G-Link users who sought help or submitted comments and suggestions through requests at G-Link_info.

During the year, much of the methodology work revolved around developing the new G-Link version (Version 3.5) and supporting its users. The new version added profile-based linkage, identification and treatment of orphan records, and integrated pseudokeys. Additionally, work was done to explore record linkage in the cloud using the Databricks platform, to integrate a clerical review tool (quality assessment), and to correct and improve some threshold estimators.
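To illustrate the role that thresholds and clerical review play in probabilistic record linkage, the following is a minimal sketch assuming a Fellegi-Sunter style scoring of candidate record pairs. The field names, m- and u-probabilities and threshold values are hypothetical and are not taken from G-Link.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "postal_code": (0.85, 0.02),
}


def linkage_weight(agreements):
    """Sum the log2 likelihood-ratio weights over all comparison fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if agreements[field]:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Classify a pair with two hypothetical thresholds.

    Pairs scoring between the thresholds are sent to clerical review.
    """
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


if __name__ == "__main__":
    pair = {"surname": True, "birth_year": False, "postal_code": False}
    print(classify(linkage_weight(pair)))
```

In a scheme of this kind, moving the two thresholds trades off false links against missed links and changes the volume of pairs sent to clerical review, which is why the quality of threshold estimation matters.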

The RLRC also worked on a variety of other record linkage-related projects during the year, including holding more instances of the Record Linkage Forum.

Our record linkage work helped us document performance and issues for management and developers. It also served as an opportunity to field test new G-Link 3.5 features and to develop more systematic and theoretically coherent approaches to defining and adjusting record linkages on servers and the SAS Grid. The RLRC updated the tutorial and the user guide for G-Link 3.5.

For more information, please contact:
Abdelnasser Saïdi (613-863-7863, abdelnasser.saidi@statcan.gc.ca).

4.2 Generalized Systems

The Economic Generalized Systems team is responsible for the support and development of four Generalized Systems, namely G-SAM – the generalized sampling system, BANFF – the generalized system for edit and imputation, G-EST – the generalized system for estimation and G-SERIES – the generalized system for time series techniques.

Progress:

A large volume of support cases was processed by the project team, mainly on G-EST, BANFF and G-SAM. Most of these were resolved with suggestions on how to apply the systems properly. However, several required more involvement. Two new versions of the generalized systems were released this year. G-EST 2.03.002 was released with performance improvements for calibration and parallel processing for sampling variance (Statistics Canada, 2020). BANFF version 2.08 was also released with improvements to the BANFF processor and bug fixes to improve error localization (Statistics Canada, 2021).

Two support cases were identified that involved processing issues with large complex datasets in applying SEVANI (part of G-EST) to estimate variance due to imputation. Consequently, development work was done to build a prototype to allow for unnecessary calculations to be omitted for specific cases, and for parallel processing. These prototypes are being tested and will be part of a future release of G-EST.

The development of ImpACT, a system to visualize imputation, was continued, and the system was used to evaluate imputation strategies from two Statistics Canada projects. This work was presented to the Scientific Review Committee of the Modern Statistical Methods Branch (Gray, 2020a). Expansion of this tool to include further visualisations, and its application to other programs, is planned for the future.

Members of the team participated in training through formal courses with Statistics Canada’s training institute, as well as seminars for recently recruited statisticians and other ad hoc presentations to analysts and other organisations. The team also contributed to the organisation of the UNECE data editing workshop, and presented work on the ImpACT tool (Gray, 2020b). This presentation generated interest from international colleagues and is expected to lead to collaborations.

A major initiative for the generalized systems is a long-term planning exercise to consider the upcoming evolution of the systems, including the possibility of using open-source tools, inclusion of modern methods, and revisiting the approach to manage future developments. As part of this work, a number of over-arching requirements for Generalized Systems were outlined (Matthews, 2020a) and work was begun to define the business requirements for each individual system in the short, medium and long term. These requirements will feed directly into the long-term evolution plan for each system that will be elaborated over the next year in partnership with the Informatics Technology team (Matthews, 2021a).

For more information, please contact:
Steve Matthews (613-854-3174, Steve.Matthews@statcan.gc.ca).

4.3 Questionnaire Design Resource Centre

The Questionnaire Design Resource Centre (QDRC) is a focal point of expertise at Statistics Canada for questionnaire design and evaluation. The QDRC provides consultation and support services, and carries out projects and research related to the development, testing and evaluation of survey questionnaires. The QDRC plays a very important role in quality management and responds to program requirements throughout Statistics Canada by consulting with clients, respondents and data users and by pre-testing survey questionnaires.

While much of the QDRC’s work is carried out on a cost-recovery basis, the Centre is frequently approached on an ad hoc basis for expert reviews and consultation services on a wide variety of surveys. The group also offers courses on questionnaire design.

Progress:

The QDRC conducted many reviews of survey questionnaires. While most of these involved Statistics Canada questionnaires, several were conducted for surveys being done by other government organizations such as the Bank of Canada, the Office of the Superintendent of Bankruptcy, the Canadian Office of the Ombudsman for Responsible Enterprise, Global Affairs Canada and others.

In response to the very quick questionnaire development schedules for new COVID-19 related surveys, the QDRC developed and implemented a new qualitative research panel that allowed for rapid testing of these new data collection tools.

The QDRC continued to experiment with mixed-method research. In addition, some research and experimentation began with asynchronous qualitative methods. The group also contributed to various corporate consultation initiatives.

For further information, please contact:
Paul Kelly (613-371-1489, paul.kelly2@statcan.gc.ca).

4.4 Quality Secretariat

The Quality Secretariat’s mandate includes designing and managing quality management studies and responding to requests for quality management information or assistance from Statistics Canada’s various programs or other organizations.

SUB-PROJECT: Capacity building with internal, national and international partners

The Quality Secretariat’s objective is to provide advice and undertake capacity-building measures internally, with national partners (other departments or others) and international partners, primarily by giving a general overview of Statistics Canada’s quality management practices and official quality-related documents (the Quality Assurance Framework and the Quality Guidelines) and by providing quality management support services.

Progress:

The Quality Secretariat undertook capacity building for many partners during the reporting period. Internally, training workshops were offered through various courses for staff. At the national partner level, formal presentations on quality management practices were made to three organizations, in addition to a number of workshops and seminars. Material on data quality and good quality management practices was provided to Statistics Canada’s Data Literacy Training Initiative. Discussions took place within the Data Governance Standardization Collaborative and the Data Quality Working Group. The latter group, co-chaired by Statistics Canada, aims to define a data quality framework applicable to all Government of Canada organizations as part of the implementation of the Data Strategy. Validations of the quality of a statistical process carried out, and of a data source used, by another federal agency were also completed. At the international level, involvement in the United Nations Expert Group on National Quality Assurance Frameworks continued in preparation for the implementation of the United Nations National Quality Assurance Frameworks Manual for Official Statistics (United Nations, 2019).

SUB-PROJECT: Quality indicators for statistics from integrated data

Details on this sub-project can be found in Section 3.3 (Quality indicator research).

For more information, please contact:
Martin Beaulieu (613-854-2406, martin-j.beaulieu@statcan.gc.ca).

Reference

United Nations (2019). United Nations National Quality Assurance Frameworks Manual for Official Statistics. https://unstats.un.org/unsd/methodology/dataquality/un-nqaf-manual/

4.5 Quality Assurance Resource Centre

The objectives of the Quality Assurance Resource Centre (QARC) are to conduct research and development activities on statistical methods of quality assurance and control, aimed at improving the outgoing quality of survey data collection and processing operations within the bureau. This includes offering methodological services for G-Code which is used at Statistics Canada to create coding databases for data processing. Research on quality assurance and control is often generic in nature and involves issues of efficiencies and automation that are frequently applied to many steps of survey operations.

Progress:

The methodological support team helped the development team and tracked user input to identify potential improvements to G-Code. The QARC also supported internal and external (international) G-Code users who needed help or had comments and suggestions regarding G-Code.

During the year, work revolved around the implementation of a new version of G-Code (Version 3.2), which added machine learning capabilities (XGBoost and FastText). The QARC team has been involved in a coding and classification proof of concept looking at the integration of the FastText algorithm into our Generalized Coding tool (G-Code). The new algorithms have been widely used to code industry and occupation for the Business Register, the Labour Force Survey and many other programs and surveys (including the Census of Population). Additionally, these new functionalities have been presented to external agencies (the Australian Bureau of Statistics and Statistics New Zealand). Lately, the QARC team has been helping with the integration of PyTorch into G-Code. PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, and primarily developed by Facebook’s AI Research lab.

The QARC also worked on implementing a variety of quality controls on the coding processes for the Labour Force Survey, Canadian Community Health Survey, Job Vacancy and Wage Survey and the Business Register.

For further information, please contact:
Javier Oyarzun (613-302-8454, javier.oyarzun@statcan.gc.ca).

4.6 Data Analysis Resource Centre and consultation

The main goal of the Data Analysis Resource Centre (DARC) is to give advice on the appropriate use of data analysis tools and methods, and to promote best practices in this area. DARC’s services, which focus mainly on survey, census or administrative data, are available to employees of the Agency or other departments, as well as to analysts and researchers from academia or the Research Data Centres (RDCs).

Progress:

Consultations

Consultation services were provided as requested by internal and external clients. The questions varied in complexity and covered topics such as the use of survey bootstrap weights, p-values, constructing confidence intervals and testing hypotheses with survey data, estimating effect size with survey data, analysis with linked data, and fitting logistic regression. We also helped our clients implement methods in SUDAAN, SAS, Stata and R.
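As an illustration of how survey bootstrap weights are typically used, the following is a minimal sketch of bootstrap variance estimation for a weighted mean. The function names and the toy data are hypothetical. The sketch assumes the common convention of measuring replicate estimates against the full-sample estimate; some software centres on the mean of the replicates instead.

```python
def weighted_mean(values, weights):
    """Weighted mean of values using the given survey weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values, main_weights, bootstrap_weights):
    """Estimate a weighted mean and its bootstrap variance.

    bootstrap_weights is a list of B replicate weight vectors, each the
    same length as values. The variance is the average squared deviation
    of the B replicate estimates from the full-sample estimate.
    """
    theta = weighted_mean(values, main_weights)
    replicates = [weighted_mean(values, bw) for bw in bootstrap_weights]
    variance = sum((r - theta) ** 2 for r in replicates) / len(replicates)
    return theta, variance


if __name__ == "__main__":
    # Toy data: three respondents and two bootstrap replicates (illustrative only).
    y = [10, 20, 30]
    w = [1, 1, 1]
    bw = [[2, 1, 0], [0, 1, 2]]
    estimate, var = bootstrap_variance(y, w, bw)
    print(estimate, var, var ** 0.5)
```

The same pattern applies to other estimators: the statistic is recomputed once per replicate weight vector, and the spread of the replicate estimates gives the design-based variance.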

Provision of Training and Training Material

DARC developed and presented Session 1 of the Statistical Modelling Course, entitled “Statistical Modelling with Complex Survey Data – Linear Regression” (Mach and Michaud, 2020). This new course was offered virtually by the Statistical Talent Development Working Group to employees of the Modern Statistical Methods and Data Science Branch of Statistics Canada.

The document “Data Visualization - Best practices” (DARC, 2020a) was finalized and is available internally in both official languages. It is intended to be a tool for Statistics Canada analysts and dissemination teams.

Collaboration

We collaborated in developing measurement strategies for three projects: i) Workplace Mental Health Performance Measurement Project, ii) Beyond2020 Public Service Renewal Index, and iii) Accessibility Strategy for the Public Service of Canada.

All three projects use data collected by the Public Service Employee Survey (PSES) to measure latent variables like psychological risk factors, behaviors, etc. The measurement models were developed using factor analysis and structural equation modelling as discussed by DARC (2020b) and Blais et al. (2020 and 2021).

For further information, please contact:
Harold Mantel (613-863-9135, harold.mantel@statcan.gc.ca).

4.7 Time Series Research and Analysis Centre

The objective of the time series research is to maintain high-level expertise and offer needed consultation in the area, to develop and maintain tools to apply solutions to real-life time series problems as well as to explore current problems without known or acceptable solutions. The projects can be split into various sub-topics with emphasis on the following:

SUB-PROJECT: Consultation and training in Time Series Methods

The Time Series Research and Analysis Centre is responsible for developing and delivering training on time series methods including seasonal adjustment, reconciliation and time series modelling to participants internal to Statistics Canada as well as those from other agencies. In addition, the Centre provides guidance and consultation on time series projects in general for Statistics Canada, including requests coming from within and outside of the agency.

Progress:

With the sudden change to remote working, a reduced number of courses were delivered. Those that were delivered required modifications to their content and organisation to accommodate shorter but more frequent sessions. Specifically, courses on benchmarking (H-0436), raking (H-0437) and seasonal adjustment (H-0434) were delivered during the year via the Statistics Canada training centre (Statistics Canada, 2021). In addition, a course on time series modelling and forecasting (H-0433) was delivered remotely to participants from Immigration, Refugees and Citizenship Canada. Members of the Centre also participated in outreach and training on time series topics for other groups in Statistics Canada, as part of training for recent recruits and financial officers.

The Centre also participated in review of papers for refereed journals related to the COVID-19 pandemic, in particular on forecasting of case counts for selected countries, and informal discussions with statisticians involved in statistical modelling related to the pandemic. Members of the Centre also participated in the United Nations Economic Commission for Europe - High Level Group on Machine Learning, contributing to a comparison of machine learning and traditional forecasting methods (Picard, 2020). The Centre also reviewed the methodology behind several new statistical products released by Statistics Canada, namely the provincial economic indicator (Statistics Canada, 2021b), and excess mortality estimates (Statistics Canada, 2020) to validate time series features in the development of these indicators.

SUB-PROJECT: Support and Enhancement of Time Series Processing System

The Time Series Processing System is a customizable SAS-based application that applies time series techniques, including file validations, revision strategies, seasonal adjustment and reconciliation. It is used extensively in the production of seasonally adjusted estimates for mission-critical programs within Statistics Canada. Developed over 10 years ago, the system is in a fairly mature and stable state. However, it requires updating on an ongoing basis to broaden functionality and address new needs of programs in the agency. In the longer term, a new version of the system may need to be developed to support processing and allow flexibility to use new techniques available from open-source software.

Progress:

Minor fixes were applied to version 3.08 of the Time Series Processing System (TSPS) described in Ferland (2019) as well as to a number of supporting tools for analysis and system development. Notably, a tool that is used as a key component in quality assurance was expanded to include improved diagnostics on residual seasonality and relative roughness statistics at different stages in the seasonal adjustment process to allow for more efficient trouble-shooting and development of solutions.

Open-source tools for benchmarking, including those available in R, were investigated. A comparison was made between the current functionality of G-Series and an R package called tempdisagg, in terms of available methods and their parametrization (Picard, 2021). Statistics Canada joined the Seasonal Adjustment Centre of Excellence of Eurostat as a partner organization to participate in discussions and development of related tools.
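For readers unfamiliar with benchmarking, the following is a minimal sketch of its simplest form, pro-rata benchmarking, in which each year of a sub-annual indicator series is scaled so that it sums to the annual benchmark. It is a simplified illustration only and does not reproduce the methods implemented in G-Series or tempdisagg; the function name and data are hypothetical.

```python
def prorate_benchmark(indicator, benchmarks, periods_per_year=4):
    """Scale each year of a sub-annual series to match its annual benchmark.

    indicator  : sub-annual values (e.g. quarterly), covering whole years
    benchmarks : one annual total per year
    """
    if len(indicator) != len(benchmarks) * periods_per_year:
        raise ValueError("indicator must cover exactly the benchmarked years")
    adjusted = []
    for year, target in enumerate(benchmarks):
        start = year * periods_per_year
        block = indicator[start:start + periods_per_year]
        factor = target / sum(block)  # one adjustment factor per year
        adjusted.extend(x * factor for x in block)
    return adjusted


if __name__ == "__main__":
    quarterly = [10, 20, 30, 40, 20, 20, 30, 30]  # illustrative values
    annual = [110, 90]
    print(prorate_benchmark(quarterly, annual))
```

Because the adjustment factor changes abruptly from one year to the next, pro-rata benchmarking can distort the movement between the last period of one year and the first period of the next. Methods that aim to preserve period-to-period movement, such as Denton-type approaches, are designed to avoid this.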

SUB-PROJECT: Development and Support for Seasonal Adjustment and Trend-Cycle Estimation

This sub-project covers the analysis and evaluation of new methods and techniques for seasonal adjustment, as well as consultation and the centralization of expertise in applying seasonal adjustment.

Progress:

The Time Series Research and Analysis Centre provided extensive support for seasonal adjustment in order to ensure the quality of results during the unprecedented economic shocks due to the COVID-19 pandemic. Many exchanges were held with representatives from national statistical offices, via email, one-on-one conversations or videoconferences in small groups. The exchanges included members from Eurostat, the Office for National Statistics, the United States Census Bureau and the Bureau of Labor Statistics, Statistics Norway, INSEE (Institut national de la statistique et des études économiques), Statistics Israel, the Australian Bureau of Statistics, Statistics New Zealand and others. These exchanges were extremely valuable for comparing and contrasting approaches for the short and long term; the approaches discussed were almost universally in line with the guidance in Eurostat (2020). The Centre shared findings and recommendations from these consultations with relevant contacts within the agency through periodic updates to ensure that the information was widely available.

For the programs whose seasonal adjustment is directly supported by the Centre, a quality assurance strategy was developed and implemented. It included increased communication with subject-matter analysts to ensure that important information was shared and could guide seasonal adjustment decisions. In addition, a strategy for the annual review of seasonal adjustment parameters was developed for each program, taking into account the timing of the review as well as the extent of shocks in the time series. This overall strategy was documented and shared with other seasonal adjustment practitioners during a round-table discussion organized by the Government Statistics Section of the American Statistical Association; see American Statistical Association (2021).

The Time Series Research and Analysis Centre also provided extensive consultations on seasonal adjustment within the agency for programs not formally supported by the team. Consultations were provided to groups producing seasonally adjusted statistics on trade in services, tourism, trade and exporter characteristics, Business Register entry and exit counts, and various components within the system of national accounts. Explorations were done to seasonally adjust statistics from other programs. Notably, statistics on railcar loadings were analysed and scenarios to produce seasonally adjusted estimates for some high-level time series were suggested. A similar analysis was done for electricity generation. In both cases, many seasonal series were identified and a decision is pending on whether seasonally adjusted estimates will be produced for dissemination.

Progress continued on making the Seasonal Adjustment Dashboard available to analysts for individual programs. Two mission-critical surveys can now access the tool to understand and explain seasonally adjusted results, and there is a plan to extend access to four more mission-critical surveys in the first quarter of the coming fiscal year. The dashboard is presented in Matthews (2019).

As well, trend-cycle estimation methods were evaluated in the context of the COVID-19 pandemic. In particular, given the sharp nature of the economic shocks, the trend-cycle may present an overly smooth impression of the economy, so a number of alternative measures were identified, including breaking the series at a particular point and introducing outlier effects to model shocks more directly. These methods will be further evaluated, taking into account their advantages and disadvantages, for potential application in some programs.
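The effect of breaking a series can be sketched with a toy example. The code below uses a simple centred moving average as a stand-in for a trend-cycle filter; this is an assumption made for illustration, and production trend-cycle estimates rely on more elaborate filters. The series, the break point and the function names are hypothetical.

```python
def centered_ma(series, window=3):
    """Centred moving average; end points average the available neighbours."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def smooth_with_break(series, break_index, window=3):
    """Smooth the segments before and after break_index separately."""
    before = centered_ma(series[:break_index], window)
    after = centered_ma(series[break_index:], window)
    return before + after


if __name__ == "__main__":
    # Illustrative series with a sudden drop at index 4.
    series = [100, 100, 100, 100, 60, 60, 60, 60]
    print(centered_ma(series))           # the drop is spread over neighbouring periods
    print(smooth_with_break(series, 4))  # the drop stays at the break point
```

Smoothing across the shock spreads the drop over the adjacent periods, which is the overly smooth impression described above, whereas smoothing each segment separately keeps the full drop at the break point.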

SUB-PROJECT: Modelling and Forecasting, particularly in the context of real-time estimation

Details on this sub-project can be found in Section 1.2 (Real-time estimation via time series methods).

For more information, please contact:
Steve Matthews (613-854-3174, Steve.Matthews@statcan.gc.ca).

References

Matthews, S. (2019). De-Mystifying Seasonal Adjustment: A visual tool to understand the process. Presentation to the Seasonal Adjustment Practitioners Workshop, United States Census Bureau.

Eurostat (2020). Guidance On Time Series Treatment In The Context Of The Covid-19 Crisis. https://ec.europa.eu/eurostat/documents/10186/10693286/Time_series_treatment_guidance.pdf.

Ferland, M. (2019). What’s new in the Time Series Processing System – v3.08. Internal Document, Statistics Canada.

Statistics Canada (2020). Excess mortality in Canada during the COVID-19 pandemic. https://www150.statcan.gc.ca/n1/pub/45-28-0001/2020001/article/00076-eng.htm.

Statistics Canada (2021). Workshops, training and references. https://www.statcan.gc.ca/eng/wtc/training.

Statistics Canada (2021b). Experimental indexes of economic activity in the provinces and territories, December 2020. https://www150.statcan.gc.ca/n1/daily-quotidien/210413/dq210413d-eng.htm.

4.8 Confidentiality

Part of Statistics Canada’s role and responsibility continues to be outreach and support for confidentiality strategies. Activities carried out during the year included consultation with Employment and Social Development Canada (ESDC) on Pay Transparency Statistics, a presentation to the G20 DGI-2 workshop on access strategies, and confidentiality workshops for Health Canada and ESDC.

Other research and development activities related to this topic are described in Section 2.

For more information, please contact:
Steven Thomas (613-882-0851, Steven.Thomas@statcan.gc.ca).

4.9 Data Science Communities of Practice

SUB-PROJECT: Machine Learning Community of Practice

The Statistics Canada Machine Learning Community of Practice has the goal of facilitating collaboration and knowledge transfer as well as improving our machine learning operations at Statistics Canada.

Through its active presence, the Community continues to contribute to developing the machine learning capacities of Statistics Canada’s employees. Its activities, which bring together 50 to 80 people, include lunch-and-learns, presentations, reading groups, viewing groups and information sharing on a site developed and updated by the members.

Progress:

Despite remote work, the Community organized a number of presentations on several areas of machine learning, such as a series of four presentations on machine learning applications developed by Statistics Canada to help frontline agencies assess and prepare for COVID-19 spread scenarios. The Community also continued the activity of viewing free online machine learning courses, enabling participants to discuss the topics covered after the viewing. The Community has updated the list of existing methodology projects that explore or use machine learning and continues to collaborate with the other communities of practice, including a new collaboration with the Citizen Development Working Group. Finally, the Community continues to provide information to its members on a variety of relevant external activities related to data science.

SUB-PROJECT: Machine Learning Text Analysis Community of Practice (CoP)

The Machine Learning Text Analysis CoP is a centralized inter-departmental forum where practitioners with varied expertise discuss practical applications of Natural Language Processing (NLP). Practitioners from across the Government of Canada come together to learn about, discuss and adopt ethical applications of NLP. Monthly meetings bring together about 50 to 60 attendees to share their solutions and problems. About half of the participants are from federal departments outside Statistics Canada.

Progress:

Throughout 2020-2021, practitioners from different fields at Statistics Canada and from other departments, such as the Office of the Superintendent of Financial Institutions, the Canada Border Services Agency, Immigration, Refugees and Citizenship Canada and the Canada Revenue Agency, presented their high-quality NLP solutions. Each presenter illustrated a modern methodology for quickly processing their data source, whether survey data, administrative data or public reports. Discussions after each presentation broke down the complex concepts to help attendees understand them. The attendees came from over 20 different departments.

COVID-19 and the move to teleworking were an unexpected setback, but we capitalized on them as an opportunity to strengthen our teleconferencing framework. This made for a more inclusive inter-departmental community.

For more information, please contact:
Yanick Beaucage (613-854-2397, Yanick.Beaucage@statcan.gc.ca).

