Statistical Methodology Research and Development Program Achievements, 2017/2018
2. Support Activities

2.1 Record Linkage Resource Centre (RLRC)

The objectives of the Record Linkage Resource Centre (RLRC) are to provide consulting services on record linkage methods to internal and external users, including recommendations on software and methods and collaborative work on record linkage applications. Our mandate is to evaluate various record linkage methods and software packages and, where necessary, develop prototype versions of software incorporating methods not available in existing packages. We also assist in the dissemination of information about record linkage methods, software and applications to interested persons both at and outside Statistics Canada.

Progress:

We continued to support the G-Link development team and worked jointly to identify potential sources of current and past bugs, fixes and improvements for G-Link. The RLRC also provided support to internal and external G-Link users who sought help or offered comments and suggestions.

Mixmatch exclusion and conversion table prototypes were integrated into Version 3.4 of G-Link and were tested by the RLRC on linkage projects of various sizes. To improve the Fellegi and Sunter classification and to reduce the burden of manual review, the RLRC developed and tested a series of SAS macros that generate automated weight thresholds using various techniques: an unsupervised machine learning technique (K-means classification), a two-phase technique (K-means followed by probit modelling), a technique founded on the theory of extreme values to measure tail ranges and, finally, two techniques based on the linked and unlinked weight distributions of the profiles. Our record linkages using data from the “tobacco court case” project helped us analyze software performance and identify solutions. The work on these data contributed to the development of more systematic and theoretically more coherent approaches for defining and adjusting record linkages on servers and on the SAS Grid platform.
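The K-means thresholding idea above can be sketched in simplified form. The following is a minimal illustration (in Python rather than SAS, with invented weights, and not the RLRC's actual macros): cluster the Fellegi-Sunter linkage weights into two groups with a one-dimensional 2-means and place the threshold midway between the two cluster centres.

```python
# Minimal sketch of K-means-based weight thresholding for record
# linkage. The weights below are illustrative, not real linkage output.

def two_means_threshold(weights, iters=50):
    """Cluster 1-D weights into two groups; return the cut point."""
    lo, hi = min(weights), max(weights)  # initial cluster centres
    for _ in range(iters):
        low_grp = [w for w in weights if abs(w - lo) <= abs(w - hi)]
        high_grp = [w for w in weights if abs(w - lo) > abs(w - hi)]
        new_lo = sum(low_grp) / len(low_grp)
        new_hi = sum(high_grp) / len(high_grp)
        if (new_lo, new_hi) == (lo, hi):  # converged
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2  # threshold between the two centres

# Illustrative weights: non-links cluster near 0, links near 20.
weights = [0.5, 1.2, 2.0, 1.8, 0.9, 18.5, 19.7, 21.0, 20.2, 19.1]
threshold = two_means_threshold(weights)
```

In practice, a second threshold and a clerical-review zone would typically be defined between the two groups rather than a single cut point.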

The new alpha version of G-Link 3.4 boasts many improvements, including an interface for loading data and conversion, exclusion and look-up tables. The RLRC contributed to the development of the G-Link 3 user guide.

The RLRC assessed SAS Enterprise Miner in local mode to study the implementation of unsupervised (K-means) and supervised (support vector machines, decision trees, etc.) machine learning algorithms to classify record pairs into two groups (true pairs and true non-pairs).

The inventory of record linkages done by the Methodology Branch was updated in 2017 and the results were presented.

For further information, please contact:
Abdelnasser Saïdi (613-863-7863; abdelnasser.saidi@canada.ca)

2.2 Time Series Research and Analysis Centre (TSRAC)

The objectives of the time series research are to maintain high-level expertise and offer needed consultation in the area, to develop and maintain tools for applying solutions to real-life time series problems, and to explore current problems without known or acceptable solutions.

The projects can be split into various subtopics with emphasis on the following:

Progress:

Consultation in time series

As part of the Time Series Research and Analysis Centre (TSRAC) mandate, consultation was offered as requested by various clients. Topics most frequently covered in the review period related to the identification of breaks in series; interpolation and forecasting for various programs (education, justice, tourism); the application of seasonal adjustment in various situations (for example, the System of National Accounts, small area estimates and capacity utilization estimates for manufacturing); and specific applications of benchmarking and reconciliation. TSRAC members also contributed, jointly with the System of National Accounts, to the development of a directive and guidelines for time series continuity (Statistics Canada, 2018).

TSRAC members continued their participation in various internal analytical and dissemination groups, such as the Forum for Daily analysts and the forum on seasonal adjustment and economic signals. TSRAC members also met with international visitors (Manpower Singapore) and participated in exchanges, both in person and virtually, with other agencies, including the Office for National Statistics, Eurostat, the US Census Bureau, the Bureau of Labor Statistics and the Bureau of Economic Analysis, to discuss current issues with seasonal adjustment and other time series tools and techniques under development. TSRAC staff also reviewed several papers on benchmarking, reconciliation and seasonal adjustment software packages that were under consideration by journals and other publications.

Time series processing and seasonal adjustment

This project monitors activities related to the support and development of the Time Series Processing System. Seasonal adjustment is done using X-12-ARIMA and X-13ARIMA-SEATS (for analysis and development or production) or SAS Proc X12 (for production).

Several enhancements were implemented in the Time Series Processing System (TSPS) in order to prepare outputs needed for diagnostic reports and to make the system more robust and flexible in processing environments as required for the SAS grid (Ferland, 2017).

Initial work was done to empirically compare X-12-ARIMA with other seasonal adjustment methods, such as SEATS and other model-based methods. In addition, state space models were used to approximate each method and provide insight into the similarities and differences between the methods. This work will be presented at an upcoming conference of time series specialists.

An evaluation of interaction effects between trading days and holidays was also completed to determine if trading day effects can reasonably be assumed constant across different calendar months, notably those such as July or December with important holidays which may not exhibit the same change in level, given the different consumer habits surrounding certain holidays (Verret and Matthews, 2018).

The quality assurance process of Seasonal Adjustment was further developed, with an emphasis on clarifying client expectations, and documenting available methods in different situations (Matthews, 2018). Summary tools were prepared to help guide clients on resource requirements and what conditions need to be met to consider seasonal adjustment methods.

Support for G-Series (benchmarking and reconciliation)

This project entails the support and development of G-Series 2.0, which includes PROC BENCHMARKING and PROC TSRAKING, two SAS procedures, as well as the Macro TSBALANCING, which solves multi-dimensional reconciliation problems through a numerical approach. TSBALANCING was introduced with the release of version 2.0 and the training materials for reconciliation were updated to include this methodology. The macro has now been implemented for production for one survey and use will be expanded in other upcoming development work. Some of the challenges related to the application of these methods in the context of seasonal adjustment and some applied solutions were documented in Fortier and Ferland (2017).
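The basic purpose of benchmarking can be sketched with a toy example. The following illustrates pro-rating, the simplest benchmarking method, in Python with invented data; PROC BENCHMARKING itself uses a movement-preserving (Denton-type) regression model rather than the simple scaling shown here.

```python
# Hedged sketch of benchmarking by pro-rating: scale each year of a
# monthly indicator so that it sums to that year's annual benchmark.
# This only illustrates the basic idea; G-Series methods additionally
# preserve period-to-period movement. Data below are invented.

def prorate(monthly, benchmarks, period=12):
    """Scale each block of `period` values to sum to its benchmark."""
    out = []
    for year, bench in enumerate(benchmarks):
        block = monthly[year * period:(year + 1) * period]
        factor = bench / sum(block)       # annual adjustment factor
        out.extend(v * factor for v in block)
    return out

# Twelve monthly values summing to 120, with an annual benchmark of 132.
monthly = [10.0] * 12
adjusted = prorate(monthly, [132.0])
```

A drawback of pro-rating, and the reason movement-preserving methods exist, is the "step" it can introduce between December and January when adjacent years get different factors.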

A decommissioning plan was developed to guide the transition of production projects throughout the agency to the new version of G-Series. According to this plan, as of March 2019, no internal production applications would be using earlier versions of G-Series.

In terms of the methods available for benchmarking, research was done to identify and compare methods suitable for benchmarking of stock variables (Leung, 2018). Several options were considered, including benchmarking of first differences, and intermittent benchmarking to individual periods (closing inventories at end of year, for example). Of those identified, the most appealing is the direct BI method, which will be developed as a module in the Time Series Processing System. This module would include flexibility for more challenging aspects such as additive or multiplicative adjustments and the behaviour of the adjustment outside of the span of available benchmarks.
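The mechanics of a benchmark-to-indicator (BI) ratio adjustment can be sketched as follows. This is a simplified illustration with invented quarterly data: ratios are computed at the periods carrying benchmarks, interpolated linearly in between, and the last ratio is carried forward beyond the span of benchmarks. The details of the direct BI method studied in Leung (2018) may differ.

```python
# Hedged sketch of a BI-ratio adjustment for a stock series with
# intermittent benchmarks (e.g., end-of-year closing inventories).

def bi_adjust(indicator, benchmarks):
    """`benchmarks` maps a period index to the benchmark stock value."""
    ratios = {t: b / indicator[t] for t, b in benchmarks.items()}
    knots = sorted(ratios)
    out = []
    for t, v in enumerate(indicator):
        if t <= knots[0]:
            r = ratios[knots[0]]           # carry first ratio backward
        elif t >= knots[-1]:
            r = ratios[knots[-1]]          # carry last ratio forward
        else:                              # interpolate between knots
            t0 = max(k for k in knots if k <= t)
            t1 = min(k for k in knots if k > t)
            r = ratios[t0] + (ratios[t1] - ratios[t0]) * (t - t0) / (t1 - t0)
        out.append(v * r)
    return out

# Quarterly stock indicator with benchmarks at the two year-ends.
indicator = [100, 102, 104, 106, 108, 110, 112, 114]
adjusted = bi_adjust(indicator, {3: 116.6, 7: 119.7})
```

By construction the adjusted series hits each benchmark exactly at the benchmark periods, while the indicator's movement drives the values in between.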

Modelling and forecasting

The recently acquired software SAS Forecast Studio (SAS/HPF) continued to be explored and used for various projects related to time series modelling. It proved to be an efficient preliminary tool to evaluate breaks in series and to detect large time series outliers. Forecasting techniques were applied in a number of projects to detect breaks in series in the absence of a formal parallel run. It has also been used to impute key units for periodic non-response and to forecast preliminary estimates for individual domains.

Specific research was conducted to become more familiar with state-space models. Benchmarking and reconciliation were expressed within this framework, as were splines and other time series models (Picard, 2018; Picard, 2018b). A number of advantages were identified, including the ability to handle missing data, the efficiency of model estimation via the Kalman filter (relative to ARIMA and other models), and the sophisticated seasonal structures that can be specified. Before the approach could be used for seasonal adjustment in a production setting, a number of challenges would need to be addressed, such as the extent of revisions when new periods are added and ways to make the method robust to extreme observations.
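Two of the properties noted above, estimation via the Kalman filter and the natural treatment of missing data, can be shown with the simplest state-space model. The following is a toy local-level filter in pure Python (not TSRAC's models); when an observation is missing, the update step is simply skipped and the state prediction carries through.

```python
# Minimal Kalman filter for a local-level model: the state a_t is a
# random walk and y_t observes it with noise. `None` marks a missing
# observation. Variances q and r are illustrative defaults.

def kalman_local_level(y, q=1.0, r=1.0):
    """Return the filtered state means for series y."""
    a, p = y[0], 1.0                # initial state mean and variance
    means = []
    for obs in y:
        p = p + q                   # predict: state variance grows by q
        if obs is not None:         # update only when period is observed
            k = p / (p + r)         # Kalman gain
            a = a + k * (obs - a)
            p = (1 - k) * p
        means.append(a)
    return means

# Illustrative series with a gap: the filter carries the state through.
series = [10.0, 10.4, None, 10.9, 11.2]
filtered = kalman_local_level(series)
```

During the missing period the filtered mean stays at its last value while the state variance grows, which is exactly how these models "handle and treat" gaps without a separate imputation step.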

Additional analysis was carried out to estimate the effect of extreme weather on economic time series. While weather effects are not inherently included in seasonal adjustment, average weather patterns (climate) do tend to account for some of the periodic movements represented by the seasonal component. The approach proposed by the Office for National Statistics is very much in line with X-12-ARIMA, using an ARIMA model with specific weather-based regressors to estimate the effect of a given dimension of the weather on an economic time series. A paper describing the approach and presenting examples using Statistics Canada data was presented at the 2017 Joint Statistical Meetings (Matthews and Patak, 2017). The approach was applied to monthly data on retail sales to explore whether it would perform well on a larger scale, with more automation (Aston and Patak, 2018). In addition, progress was made toward simplifying access to the weather data used in the analysis. Weather data was also used in a project to explore the modelling of tourism counts and was found to be a useful explanatory variable. In both of these cases, the weather effects were validated by subject-matter experts and contributed to a better understanding of aggregate data.
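The regression idea at the core of this weather adjustment can be reduced to a toy example. The sketch below (invented data) keeps only the least-squares step of regressing a series on a weather-based regressor; the real applications additionally model ARIMA errors and include calendar regressors.

```python
# Toy illustration of estimating a weather effect by ordinary least
# squares: regress the series on a temperature-anomaly regressor.

def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Invented data: sales move 2 units per degree above seasonal normal.
temp_anomaly = [-3.0, -1.0, 0.0, 1.0, 3.0]
sales = [94.0, 98.0, 100.0, 102.0, 106.0]
effect = ols_slope(temp_anomaly, sales)
```

Using the *anomaly* (departure from climate normals) rather than raw weather is what keeps the estimated effect separate from the seasonal component, which already absorbs the average weather pattern.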

Analysis was carried out to explore modelling of high-frequency (daily) data, with daily frontier counts used for tourism. Development is required in general to detect and estimate patterns in higher frequency data, especially when the number of occurrences of the cycle is small. In this project, we were able to integrate monthly patterns from monthly data and weekly patterns from daily data, as well as integrating extraneous weather variables to explain the series. An evaluation of the model used demonstrates some potential to produce advance indicators or nowcasts of tourism counts on a more timely basis, using the daily data accumulated for the partial month. This work has been submitted for a talk at the 2018 Statistics Canada Methodology Symposium. A summary of initial work related to weather and daily data can be found in Fortier, Matthews and Patak (2017).

Trend-cycle estimation

Following up on the trend-cycle lines added in The Daily, further documentation was prepared for external users and is now available on the Statistics Canada website (Statistics Canada, 2018b). This documentation includes the precise weights used in the moving averages so that external users can reproduce the results. In addition, the comparative study originally used to select the method was re-applied to recent estimates, and the findings confirmed the original choice. The expansion of trend-cycle estimates to other programs will be considered further. The use of trend-cycle estimates in the detection of residual seasonality is also currently being explored.
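The general form of such a weighted moving average can be illustrated with the standard centred 2x12 filter, a common first step in monthly trend-cycle estimation. This is a sketch only; the published trend-cycle lines use a different, documented set of weights (Statistics Canada, 2018b).

```python
# Centred 2x12 moving average: a 13-term filter with weights 1/24 at
# the two ends and 1/12 in between, which averages exactly one year of
# data centred on each month.

def centred_2x12(y):
    """Apply the 13-term 2x12 MA; the first/last 6 months are lost."""
    weights = [1/24] + [1/12] * 11 + [1/24]
    half = 6
    return [
        sum(w * y[t - half + j] for j, w in enumerate(weights))
        for t in range(half, len(y) - half)
    ]

# On a purely periodic 12-month pattern, the filter removes all
# seasonality, leaving the constant mean (5.5 here).
seasonal = [i % 12 for i in range(36)]
trend = centred_2x12(seasonal)
```

Because a symmetric filter cannot be applied at the ends of the series, production trend-cycle methods must also specify asymmetric weights for the most recent months.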

For further information, please contact:
Steve Matthews (613-854-3174; steve.matthews@canada.ca).

2.3 Research Data Centres and Confidentiality support

The research data centres (RDC) provide researchers with access to microdata from population and household surveys in a secure university setting. They are operated under the provisions of the Statistics Act in accordance with all the confidentiality rules and are accessible only to researchers with approved projects who have been sworn in under the Act as “deemed employees.” The role of the methodologist is to provide support to the RDC analysts and researchers on vetting requests. Methodologists also develop survey-specific guidelines whenever a new survey becomes available in the RDCs.

Progress:

A Methodology Expert Panel (MEP) on the creation of Public Use Microdata Files (PUMFs) was put in place. The team of methodologists has the mandate to review, guide and recommend the approval of PUMFs to the Microdata Release Committee (MRC). The MEP reviewed about half a dozen PUMFs this year, refined its process and prepared a seminar.

General support is provided to clients and other methodologists on disclosure control. The Methodology Branch team has been actively involved in a working group that created a simple checklist for managers to assess the risk of disclosure from their products. Development also continues on specific projects related to disclosure control, such as the Companion and Confid-on-the-fly.

The Companion was further developed to the point where it is ready for piloting in selected survey program areas. It now also has the capacity to advise managers on measures for reducing the risk of disclosure from products that contain sensitive statistical information. Our continued communication with the Australian Bureau of Statistics concerning Confid-on-the-fly has renewed their efforts to resolve the reported issues with errors in multinomial regression and with the parameters needed to create the perturbation tables required to produce its confidentialized outputs. A synthetic version of the MacKay file (data from the Survey on Financing and Growth of Small and Medium Enterprises) was produced. Preliminary investigations were also made into the methodology and workings of the R synthpop package.
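The perturbation-table idea behind tools such as Confid-on-the-fly can be sketched as follows. In the cell-key approach, each record carries a fixed random key; the keys of the records in a table cell are combined into a cell key that determines a small noise value, so the same cell is always perturbed the same way no matter how the table is requested. This is a hedged illustration only: the noise rule below is invented, whereas the actual method uses calibrated perturbation tables.

```python
# Hedged sketch of cell-key perturbation for frequency tables.
import random

def make_record_keys(n_records, seed=42):
    """Assign each record a fixed pseudo-random key (illustrative)."""
    rng = random.Random(seed)
    return [rng.randrange(1000) for _ in range(n_records)]

def perturbed_count(record_ids, keys):
    """Perturb a cell count deterministically from its records' keys."""
    count = len(record_ids)
    cell_key = sum(keys[i] for i in record_ids) % 1000
    noise = (cell_key % 5) - 2        # invented rule: noise in -2..+2
    return max(count + noise, 0)      # never report a negative count

keys = make_record_keys(100)
cell = [3, 17, 42, 58]                # records falling in one table cell
a = perturbed_count(cell, keys)
b = perturbed_count(cell, keys)       # same cell gives the same result
```

The key property is consistency: because the noise depends only on which records are in the cell, repeated or overlapping queries cannot be averaged to remove it.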

For further information, please contact:
Michelle Simard (613-293-3192; michelle.simard@canada.ca).

2.4 Quality Secretariat

The mandate of the Quality Secretariat is to promote and support the use of sound quality management practices across Statistics Canada.

The projects can be split into various subtopics with emphasis on the following:

Progress:

Capacity building

In 2016/2017, the Quality Secretariat developed a two-day pilot course on quality management for middle managers in subject-matter fields. The feedback was so overwhelmingly positive that it was clear that the Quality Secretariat did not have the capacity to offer such training to all potential participants. Instead, this fiscal year, the material was re-packaged into modules and made available to all Statistics Canada employees on the Internal Communications Network.

Another vehicle for capacity building is the Quality Guidelines. A revision of the Guidelines was undertaken this year. The most significant improvements are in terms of accessibility and relevance. The revised Guidelines are aligned with Version 5 of the Generic Statistical Business Process Model (GSBPM), offering succinct and specific quality assurance practices for activities throughout the statistical process, and covering sub-processes such as data integration and preparation of national accounts.

Other federal government departments have enquired about the quality management tools used at Statistics Canada. We openly share the Quality Assurance Framework and the Quality Guidelines, but these tools are very StatCan-centric. As the journey toward good quality management is long, with many small steps, the Quality Secretariat developed a Data Quality Toolkit, intended for anyone outside Statistics Canada who produces or uses data. The objective of the toolkit is to raise awareness about quality assurance practices. It offers two checklists, one for self-assessment by data producers, and the other to help users assess the fitness-for-use of a data set.

Quality indicators

There is a growing need for tools to evaluate the sources of error in administrative data and the magnitude of that error. The Quality Secretariat worked with the Agriculture Division on providing the metadata for a preliminary admin-based release for the province of Alberta. For this situation, the established quality indicators for survey-based Statistics Canada releases were not applicable. We therefore ensured that the note to users provided the available metadata (e.g., timeliness), a warning about the lack of a true accuracy indicator, and advice to proceed with caution.

The Quality Secretariat was also approached by the Canadian Housing Statistics Program to discuss the data quality statement for their release in December 2017. It is clear that not only do we need new methods for measuring quality of administrative data, we also need new indicators and metadata to report the quality of data products composed in part or in whole of non-sample survey data.

These two collaborations led to the launch of a research project into methods to measure and report on the quality of non-sample survey data. The greatest technical challenge will be to develop a quantitative measure of accuracy incorporating both variability (noise) and bias. The anticipated outcome of this research project is that metadata about data quality will facilitate informed decision-making by data users.
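A natural starting point for such a measure, sketched here with invented numbers, is the mean squared error, which combines variance (noise) and squared bias into a single accuracy indicator.

```python
# Sketch of a combined accuracy measure: MSE = variance + bias^2.

def mse(variance, bias):
    """Mean squared error of an estimator; RMSE is its square root."""
    return variance + bias ** 2

# Illustration: an administrative estimate with standard error 3 but a
# suspected bias of 4 is no more accurate than an unbiased survey
# estimate with standard error 5.
admin_mse = mse(variance=3 ** 2, bias=4)    # 9 + 16
survey_mse = mse(variance=5 ** 2, bias=0)   # 25 + 0
```

The hard part, of course, is not the formula but estimating the bias term for administrative sources, which is precisely what the research project aims to address.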

For further information, please contact:
Steven Thomas (613-882-0851; Steven.Thomas@Canada.ca).

2.5 Data Analysis Resource Centre

The Data Analysis Resource Centre (DARC) is a team of statistical consultants and researchers in the Methodology Branch. The main goals of DARC are to give advice on the appropriate use of data analysis tools and methods and to promote best practices in this area. DARC’s services—which focus mainly on survey, census or administrative data—are available to the employees of the agency or other departments, as well as analysts and researchers from academia or the research data centres (RDCs).

Progress:

Consultations

As part of DARC’s mandate, consultation was offered as requested by various clients. Specific consultation services were provided to Statistics Canada’s analysts from a dozen different divisions. These consultations covered topics such as the use of bootstrap weights, estimation of medians and their standard errors, tests of independence and other statistical tests, estimation of confidence intervals, variance estimation for age-standardized rates, and help with SUDAAN and SAS SURVEY procedures.
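The bootstrap-weight approach that recurs in these consultations can be sketched briefly. With invented data and weights, the estimate is re-computed with each set of replicate weights and the variance is taken from the spread of the replicates around the full-sample estimate; one common convention for the divisor is shown, but the correct scaling depends on how the replicate weights were constructed.

```python
# Sketch of variance estimation with bootstrap replicate weights for a
# complex survey design. Data, weights and replicates are illustrative.

def weighted_mean(y, w):
    """Survey-weighted mean of y."""
    return sum(a * b for a, b in zip(y, w)) / sum(w)

def bootstrap_variance(y, w_full, w_reps):
    """Average squared deviation of replicate estimates around the
    full-sample estimate (scaling convention varies by survey)."""
    theta = weighted_mean(y, w_full)
    reps = [weighted_mean(y, w) for w in w_reps]
    return sum((r - theta) ** 2 for r in reps) / len(reps)

y = [12.0, 15.0, 9.0, 20.0]
w_full = [10.0, 20.0, 15.0, 5.0]
w_reps = [[12.0, 18.0, 15.0, 5.0], [8.0, 22.0, 15.0, 5.0],
          [10.0, 20.0, 17.0, 3.0], [10.0, 20.0, 13.0, 7.0]]
var_hat = bootstrap_variance(y, w_full, w_reps)
```

The same machinery applies to medians, rates or regression coefficients: only the statistic recomputed under each replicate changes, which is why bootstrap weights are so widely used with complex designs.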

The group also provided services to other methodologists. These consultations included questions on Poisson regression, survival analysis, degrees of freedom for variance estimation, and the analysis of older General Social Survey (GSS) cycles, for which mean bootstrap weights are provided, together with newer GSS cycles that come with “standard” bootstrap weights.

External consultations were also delivered to a variety of clients from other federal and provincial governments. The requests included using STATA for analyzing Canadian Health Measures Survey (CHMS) data, statistical testing using the 2016 Survey on Sexual Misconduct in the Canadian Armed Forces, statistical inference using a census of employees with high non-response, etc.

Finally, expert advice was given to the analysts and researchers from the RDCs. The topics included bootstrap variance estimation, combining survey cycles, multiple imputation using SAS, longitudinal analysis and weights, etc.

Training

The team presented the newly redesigned special seminar for recruits, “Analyzing Data from a Survey with a Complex Design.”

The course Statistical Analysis of Survey Data, Module 2, “Linear, Logistic and Generalized Logistic Regression Analysis”, was redesigned and presented in the fall.

Several other training activities were developed and/or presented, in particular at the annual RDC analysts conference in November (with new presentations on structural equation modeling and weighting). An introductory course on descriptive analysis using survey data and presentations at the Data Interpretation Workshop were also delivered.

Collaboration with analysts

An article co-authored by Isabelle Michaud, in collaboration with Dr. Jean-Philippe Chaput and Suzy L. Wong (Health Analysis Division [HAD]), entitled “Duration and quality of sleep among Canadians aged 18 to 79”, was published in the September issue of Health Reports (Vol. 28, no. 9, pp. 28-33, September 2017, Statistics Canada, Catalogue no. 82-003-X).

Another article entitled “The effect of reallocating time between sleep, sedentary or active behaviours on obesity and health in Canadian adults”, coauthored by Rachel Colley, Isabelle Michaud, and Didier Garriguet was accepted for publication in Health Reports.

For further information, please contact:
Harold Mantel (613-863-9135; harold.mantel@canada.ca).

2.6 Questionnaire Design Resource Centre (QDRC)

The Questionnaire Design Resource Centre (QDRC), Methodology Branch, is a focal point of expertise at Statistics Canada for questionnaire design and evaluation. The QDRC provides consultation and support services, and carries out projects and research related to the development, testing and evaluation of survey questionnaires. The QDRC plays a very important role in quality management and responds to program requirements throughout Statistics Canada by consulting with clients, respondents and data users and by pre-testing survey questionnaires.

While much of the QDRC’s work is carried out on a cost-recovery basis, the section is frequently approached on an ad hoc basis for expert reviews and consultation services on a wide variety of surveys.

Progress:

Questionnaire review and consultation

During the review period, the QDRC responded to requests for expert reviews and consultation services on a variety of survey topics. These included the Canadian Forest Service’s questionnaire, the New Tourism Vision (various visitor exit surveys for the three northern territories), the Newfoundland and Labrador Tourism Visitor Exit Survey, the PRASC – Organisation of Eastern Caribbean States (OECS) Advocacy Project, the Organisation for Economic Co-operation and Development (OECD) Survey on Policy Responses to New Forms of Work, and Statistics Canada’s Client Satisfaction Survey.

Training

The QDRC presented the three-day “Questionnaire Design” workshop (Course 410) three times in 2017/2018 (June, September and March). The course is offered as part of Statistics Canada’s Training and Development Program.

Other work

The QDRC is an active member of several ongoing Statistics Canada committees related to questionnaire development, including a team that reviews and applies a LEAN process to the development of electronic questionnaires (EQ) for efficiency purposes.

For further information, please contact:
Paul Kelly (613-371-1489; paulkelly2@canada.ca).

2.7 Statistical Consultation Group

Sub-project: Reducing non-response Bias with replacements

The Teaching and Learning International Survey (TALIS) is a large-scale survey that focuses on the working conditions of teachers and the learning environment in schools. In this survey, samples of schools are selected at the first stage for each participating country. For each sampled school, a pair of replacement schools is selected so that, if the sampled school refuses to participate, the corresponding replacement school can be invited to participate in the survey, minimizing total nonresponse at the school level. Although this approach is supported in the literature (Chapman, 1976, 1982; Platek, Singh and Tremblay, 1978), the investigation carried out by Chapman (1982) failed to validate its appropriateness because the studies were not carried out under ideal conditions. This research uses simulation studies to compare TALIS’s current approach with three alternative approaches (teacher imputation, conditional probability estimation and triplet clustering) and to investigate the appropriateness of using replacements.

Progress:

The goal of this research is to assess the appropriateness of the use of replacements in TALIS. The research is carried out in three parts. The first part focuses on identifying the characteristics of response rates for participating countries in TALIS; based on the results of this analysis, appropriate assumptions were made for the next step. The second part, which has been completed, pertains to the development of equations for the above-mentioned approaches. The third and final part requires simulating schools with characteristics similar to those of schools from participating countries in TALIS and producing estimates to compare the above-mentioned approaches. An extensive SAS program was created to simulate the required samples and produce the estimates, and an R program was created to display the results graphically.
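The replacement mechanism being simulated can be sketched in miniature. The following toy simulation (in Python rather than SAS, with an invented population and response model) draws a school sample, swaps nonresponding schools for designated replacements, and compares the resulting estimate with the population mean; the actual TALIS simulations are far richer.

```python
# Toy simulation of school nonresponse with designated replacements.
import random

rng = random.Random(2018)
population = [rng.gauss(50, 10) for _ in range(2000)]  # school scores

def sample_with_replacements(pop, n, response_rate):
    """Estimate the mean from n sampled schools, substituting a
    designated replacement school for each nonrespondent."""
    idx = rng.sample(range(len(pop)), 3 * n)  # sample + 2n replacements
    sampled, replacements = idx[:n], idx[n:]
    values, r = [], 0
    for i in sampled:
        if rng.random() < response_rate:
            values.append(pop[i])                 # school responds
        else:
            values.append(pop[replacements[r]])   # use a replacement
            r += 1
    return sum(values) / len(values)

est = sample_with_replacements(population, 200, response_rate=0.8)
true_mean = sum(population) / len(population)
```

In this toy setup response is independent of the school's value, so replacements are harmless; the interesting scenarios in the research are precisely those where response depends on school characteristics and substitution can mask nonresponse bias.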

Documentation of the method is complete; the results of these analyses will be added soon.

For further information, please contact:
Ahalya Sivathayalan (613-302-6647; Ahalya.Sivathayalan@canada.ca).

References

Chapman, D.W. (1976). A survey of nonresponse imputation procedures. Proceedings of the Social Statistics Section, American Statistical Association, Part I, 245-251.

Chapman, D.W. (1982). Substitution for Missing Units. Washington, DC: U.S. Bureau of the Census.

Platek, R., Singh, M.P. and Tremblay, V. (1978). Adjustment for nonresponse in surveys. In Survey Sampling and Measurement (Ed., N. Krishnan Namboodiri), New York: Academic Press.

2.8 Knowledge Transfer - Statistical Training

In 2017-2018, there was a pause in the automatic scheduling of the large curriculum of courses under the statistical training umbrella. This time was used to think about new strategies to enable us to achieve our objectives for capacity building and talent development. Generally speaking, the curriculum is now split into thematic blocks under the responsibility of the appropriate resource centres (relevant activities are reported in their respective sections). For core survey methods, a small group of experts is reviewing and reorganizing the content, with new courses scheduled to start in 2018-2019. Moreover, two year-long activities focusing on a very active learning approach were successfully piloted for machine learning and data science. These learning groups were a hybrid between a reading group and a working group, tasked with evaluating and organizing other learning activities for their peers.

For further information, please contact:
Susie Fortier (613-220-1948; susie.fortier@canada.ca).

2.9 Knowledge Transfer – Survey Methodology

Survey Methodology is an international journal available at http://www.statcan.gc.ca/surveymethodology that publishes articles in both official languages on various aspects of statistical development relevant to a statistical agency. Its editorial board includes world-renowned leaders in survey methods from the government, academic and private sectors. The journal is released in fully accessible HTML format and in PDF.

The work related to the editorial and production processes includes correspondence with authors, referees, associate editors and subscribers; review of referees’ comments and author revisions; re-formatting and copy editing of manuscripts; liaison with translation and dissemination; and maintenance of a database of submitted papers. It is part of the knowledge transfer activities.

Progress:

The June and December 2017 issues (43-1 and 43-2) were released in both PDF and HTML versions. The June issue contains seven regular papers. The December 2017 issue contains a special paper discussing the past, present and future of sample surveys, followed by four short discussions of the paper, two regular papers and one short note.

Most of the journal's historical papers are available online, including all papers published after December 1981 (volume 7-2). Older papers can still be obtained upon request. A subset of the papers published before issue 7-2 was selected based on relevance, prepared, translated and added to the website.

From April 2017 to March 2018, Survey Methodology pages were viewed 48,500 times and 64,800 copies of papers were downloaded. Forty-three papers were submitted for publication.

A new online tool to manage the editorial process, ScholarOne, was tested and is being implemented. A special edition for the 9th colloque francophone sur les sondages (2016) will be released in December 2018. Some papers presented at a conference titled "Contemporary Theory and Practice in Survey Sampling: A Celebration of Research Contributions by J.N.K. Rao" have been selected and will be released in a special issue published in collaboration with the International Journal of Statistics.

For further information, please contact:
Susie Fortier (613-220-1948; susie.fortier@canada.ca). 