Support Activities

Time series
Record linkage resource centre
Data analysis resource centre (DARC) support
Generalized systems
Quality assurance
Statistical training
Conferences
Survey methodology

For more information on the program as a whole, contact:
Mike Hidiroglou (613-951-0251, mike.hidiroglou@statcan.gc.ca).

Time series

The projects can be split into seven sub-topics:

  • Consultation in Time Series (including course development)
  • Time Series Processing and Seasonal Adjustment
  • Benchmarking (including development and support to G-Series)
  • Reconciliation (including development and support to G-Series)
  • Calendarization
  • Trend Estimation
  • Other R&D Time Series Projects

Consultation in Time Series (including course development)

As part of the Time Series Research and Analysis Centre (TSRAC) mandate, consultation was offered as requested to various clients. Following one of these requests, an investigation of treatments for abrupt changes in seasonality (such as seasonal outliers from X-13 and change of regime) was performed and shared within TSRAC.

Some papers were reviewed for official Statistics Canada publications or refereed for external journals.

To support the publication of Wyman (2010), a working paper on gain and phase shift was produced (Quenneville, 2010).

Various courses and introductory workshops were given. The new course on raking was completed and successfully presented as a pilot course. The course will be incorporated in the regular training offering.

Time Series Processing and Seasonal Adjustment

This project monitors high-level activities related to the support and development of the Time Series Processing System. Seasonal adjustment is done using X-12-ARIMA (for analysis and development or production) or SAS Proc X12 (for production). TSRAC had various exchanges with the SAS Institute and ONS on the use of PROC X12. The latest versions of X-12-ARIMA, Win X-12 and X-12 Graph packages were downloaded and evaluated as needed.

A beta version of the new Time Series Processing System now includes a benchmarking module (from the G-series/Forillon software). The input and output requirements were also documented (TSRAC, 2010).

Following a suggestion from both the Policy and Methods Committee and the Standards Committee, a horizontal review on seasonal adjustment was carried out within the agency (Julien and Fortier, 2011).

Benchmarking (including development and support to G-series/Forillon)

Benchmarking refers to techniques used to ensure coherence between time series data of the same target variable measured at different frequencies, for example, sub-annually and annually.
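As a minimal illustration of the idea, the sketch below applies simple pro-rata benchmarking: each sub-annual value is scaled so that the values within a year add up to the annual benchmark. This is only a sketch under stated assumptions. It is not the method implemented in PROC BENCHMARKING, and the function name and the data are hypothetical.

```python
def prorate_benchmark(sub_annual, annual, periods_per_year=4):
    """Scale sub-annual values so each year sums to its annual benchmark.

    Hypothetical helper for illustration only: pro-rata scaling is the
    simplest benchmarking technique and can create steps between years.
    """
    if len(sub_annual) != len(annual) * periods_per_year:
        raise ValueError("sub-annual series must cover every benchmark year")

    adjusted = []
    for year, benchmark in enumerate(annual):
        start = year * periods_per_year
        block = sub_annual[start:start + periods_per_year]
        total = sum(block)
        if total == 0:
            raise ValueError("cannot prorate a year whose values sum to zero")
        factor = benchmark / total  # one correction factor per year
        adjusted.extend(value * factor for value in block)
    return adjusted


# Made-up quarterly series and annual benchmarks for two years.
quarterly = [10.0, 12.0, 14.0, 14.0, 11.0, 13.0, 15.0, 16.0]
annual = [55.0, 50.0]
benchmarked = prorate_benchmark(quarterly, annual)
```

After the adjustment, the four quarters of each year sum to the corresponding annual value, which is the coherence property described above.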

Support for PROC BENCHMARKING was provided as needed. Many information requests were received from international agencies and national banks. Draft methodological specifications to include alterability coefficients and to facilitate the use of implicit forecasted benchmarks were prepared.

The use of benchmarking in the context of seasonal adjustment was summarized in a review paper (Quenneville and Fortier, 2010).

Development of compatibility tests continued and was documented. The technique was first presented at the Advisory Committee on Statistical Methods and then updated and submitted for publication (Quenneville and Gagné, 2010).

Benchmarking techniques from an error-model point of view continue to be explored and documented (Chen and Wu, 2010a and 2010b).

Reconciliation (including development and support to G-series/Forillon)

Reconciliation is a method for imposing contemporaneous aggregation constraints on tables of time series, so that the "cell" series add up to the appropriate "marginal total" series.
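For a single time period, the constraint can be illustrated with classical raking (iterative proportional fitting) of a two-way table. The sketch below is a simplified illustration under stated assumptions. It is not the algorithm used by PROC TSRAKING, and the function name, tolerance and data are hypothetical.

```python
def rake(table, row_totals, col_totals, max_iter=100, tol=1e-10):
    """Adjust the cells of a two-way table to match given marginal totals.

    Hypothetical helper: alternately rescales rows and columns until the
    row sums are within `tol` of their targets (the column sums are exact
    after each column step).
    """
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(max_iter):
        # Row step: scale each row to its marginal total.
        for i, target in enumerate(row_totals):
            row_sum = sum(cells[i])
            if row_sum > 0:
                factor = target / row_sum
                cells[i] = [value * factor for value in cells[i]]
        # Column step: scale each column to its marginal total.
        for j, target in enumerate(col_totals):
            col_sum = sum(row[j] for row in cells)
            if col_sum > 0:
                factor = target / col_sum
                for row in cells:
                    row[j] *= factor
        # Stop once the row constraints also hold.
        gap = max(abs(sum(cells[i]) - row_totals[i]) for i in range(len(cells)))
        if gap < tol:
            break
    return cells


# Made-up cell values whose margins do not match the published totals.
table = [[40.0, 30.0], [35.0, 50.0]]
raked = rake(table, row_totals=[80.0, 90.0], col_totals=[75.0, 95.0])
```

The row and column totals must share the same grand total (170 in this example) for the constraints to be attainable.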

Support for PROC TSRAKING was provided as needed. The procedure was publicized with a poster presentation at the SAS Global Forum (Bérubé and Fortier, 2010).

Time series reconciliation or raking is most often used on seasonally adjusted data to resolve the inconsistencies between a direct and an indirect adjustment. This use is presented in Mazzi, Fortier and Quenneville (2010).

The last few years of development of the methods and tools for benchmarking and reconciliation were summarized in a review paper (Quenneville and Fortier, 2011).

Calendarization

This sub-topic deals with the support and development of calendarization methods both from a benchmarking point of view and regarding the more recent and extremely promising spline interpolation techniques.

Spline interpolation techniques for calendarization were developed and presented at the Joint Statistical Meeting (Vancouver, August 2010). The paper was also submitted for publication in a refereed journal (Quenneville, Picard and Fortier, 2010).

An update of the calendarization methods and their applications in business surveys was also prepared for the Advisory Committee on Statistical Methods (Fortier, Quenneville and Picard, 2010).

Trend estimation

Methods used for long-term trend estimation in environmental data were reviewed. The use of a trend-cycle approach was presented in Bemrose, Meszaros, Quenneville, Henry, Kemp and Soulard (2010) and in Bemrose, Meszaros and Quenneville (2011). This paper won Statistics Canada's Tom-Symons Award. The work continued to further develop the method and to incorporate the statistical properties of the linear trend estimate computed on smoothed data (Quenneville, Benmore and Meszaros, 2011).

Methods for short-term trend analysis (also referred to as "real-time" analysis) were studied, especially those related to reproducing kernel Henderson filters (Bianconcini and Quenneville, 2010).

Other R&D time series projects

Modeling is another area of time series research, and unit-root testing is a core part of identifying ARIMA models. In collaboration with professors from the Chinese University of Hong Kong and the Hangzhou Normal University, work continued on the theoretical distribution – exact, approximate, and asymptotic – of the sample variogram to obtain a test for unit root (Chen, Wu and Wang, 2011).

For further information, contact:
Susie Fortier (613-951-4751, susie.fortier@statcan.gc.ca).

Reference

Wyman, D. (2010). Seasonal adjustment and identifying economic trends. Statistics Canada: Canadian Economic Observer, March 2010, Catalogue no. 11-010-X.

Record linkage resource centre

The objectives of the Record Linkage Resource Centre (RLRC) are the following:

  • To provide consulting services to both internal and external users of record linkage methods, including recommendations on software and methodology and collaborative work on record linkage applications.
  • To evaluate alternative record linkage methods and develop improved methods.
  • To evaluate software packages for record linkage and, where necessary, develop prototype versions of software incorporating methods not available in existing packages.
  • To assist in the dissemination of information concerning record linkage methods, software and applications to interested persons both within and outside Statistics Canada.

Below is a list of our activities in 2010-2011:

  • Continued to provide support work to the development team of G-Link, including participating in Record Linkage User Group meetings.
  • Provided various consulting services to internal users of record linkage methods.
  • Provided consultation services (steps to access data for record linkage, provision of documentation and information on record linkage policy) and services related to using the probabilities calculated by G-Link to select better rules and thresholds.
  • Shared pre-processing macros for cleaning up much of the data (treatment of dates, standardization and parsing of names and surnames, and validation of identifiers such as SIN and HIN).
  • Presentation of a seminar on the Record Linkage Resource Centre (RLRC) and all the resources available to methodologists and external users in Methodology Branch's Learning Week.
  • Preparations for the presentation of an article by Winkler (2009) entitled "Record linkage" for Methodology Branch's article-of-the-month program. The focus will be on the following topics: the Fellegi-Sunter theoretical approach, the generalized EM algorithm approach, string comparators (Jaro-Winkler), the importance of pre-processing steps, estimation of the error rate, and adjustment of the linear regression analysis using matched data.
  • An update of Methodology Branch's list of probabilistic linkage projects.
  • A study of the linkage error rate due to false positives and false negatives under the usual probabilistic matching assumptions to automatically generate the thresholds for charts of observed error rates. For approaches using conditional weights, certain tools must first be developed in G-Link when frequency weights are used. A project is under way to create that functionality.
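The role of thresholds in probabilistic linkage can be sketched with the Fellegi-Sunter approach mentioned above: each compared field contributes a log-likelihood-ratio weight, and the total weight of a record pair is compared with an upper and a lower threshold. The sketch below is an illustration only and does not reproduce G-Link. The m- and u-probabilities, the thresholds and the function names are hypothetical.

```python
import math


def field_weight(agrees, m, u):
    """Fellegi-Sunter weight for one field.

    m: probability that the field agrees when the pair is a true match.
    u: probability that the field agrees when the pair is not a match.
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def classify(agreement, m_probs, u_probs, upper, lower):
    """Sum the field weights and compare the total with two thresholds."""
    total = sum(
        field_weight(agreement[field], m_probs[field], u_probs[field])
        for field in agreement
    )
    if total >= upper:
        status = "link"
    elif total <= lower:
        status = "non-link"
    else:
        status = "possible link"  # typically sent to manual review
    return total, status


# Hypothetical probabilities and thresholds, chosen for illustration only.
m_probs = {"surname": 0.95, "birth_date": 0.90}
u_probs = {"surname": 0.01, "birth_date": 0.05}

both_agree = classify({"surname": True, "birth_date": True},
                      m_probs, u_probs, upper=8.0, lower=-3.0)
mixed = classify({"surname": True, "birth_date": False},
                 m_probs, u_probs, upper=8.0, lower=-3.0)
both_differ = classify({"surname": False, "birth_date": False},
                       m_probs, u_probs, upper=8.0, lower=-3.0)
```

Raising the upper threshold reduces false positives at the cost of more pairs left for review, and lowering the lower threshold reduces false negatives in the same way, which is why the choice of thresholds drives the linkage error rates.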

For further information, contact:
Abdelnasser Saidi (613-951-0328, abdelnasser.saidi@statcan.gc.ca).

Data analysis resource centre (DARC) support

The Data Analysis Resource Centre (DARC) is a centre of data analysis expertise within Statistics Canada whose purpose is to encourage, suggest and provide good analytical methods and tools for use with Statistics Canada data. While DARC gets funding for its support activities from a number of sources, this report will concentrate mainly on work charged to the Methodology Block Fund. DARC also uses a portion of its Methodology Block Fund support activities for doing research. Those research activities are also partially funded by Data Analysis Research (1919) and are described in the report for that project.

Consultations

In 2010/11, DARC did a great number of consultations on statistical methods with analysts from many areas within Statistics Canada. As well as giving advice on a methodological approach, many consultations also included support for determining and using suitable software for an analysis. Many of the problems were related to the more traditional analytical approaches with survey data, such as estimation and inference involving descriptive statistics and fitting of linear and logistic models to survey data. However, other consultations involved extensions of the traditional methods and newer, more complex issues, such as accounting for the correlation of repeated measures, the use of hierarchical models with survey data and the integration of data from more than one survey into a single analysis. Some consultations were not based on survey data, such as the estimation of confidence intervals for ratios of standardized rates estimated from administrative sources when the counts are small.

Again this year, there was considerable consultation with other methodologists who were either doing analysis themselves or were supporting subject-matter analysts of the survey data for which they were responsible. One example of this was the support to Canadian Health Measures Survey (CHMS) methodologists in determining how to modify analytical methods to account for the limited degrees of freedom for variance estimation, and how joint research might be done with CHMS and the American survey NHANES. Another example is the support given for an analysis of data from the 2007 Survey of Drinking Water Plants; this resulted in a coauthorship on a paper in Envirostats (see Nelligan, Wirth, De Cuypere and Mach, 2011).

Several consultations also took place with people external to both Statistics Canada and the Research Data Centre (RDC) network. One consultation was with researchers from the Institute for Clinical Evaluative Sciences, wishing assistance with centering and standardization and with calculating survey bootstrap variance estimates. Another consultation involved assistance to a student doing research with a survey PUMF file.

The consultation function also included technical reviewing and refereeing of articles for both internal and external journals, including Envirostats, Canadian Social Trends, and the proceedings of the Survey Methods Section of the Statistical Society of Canada (SSC) and of the Joint Statistical Meetings.

Provision of training

During 2010/11, DARC was responsible for presenting Course 0438, Statistical Analysis of Survey Data. Both parts (0438A and 0438B) were presented in both English and French. Additional material was also developed. Some methodologists outside of DARC were involved in the teaching.

DARC members were involved in the brainstorming sessions for each session of the Data Interpretation Workshop, in the project reviews and in presenting a seminar on the analysis of survey data. There has been follow-up with some of the participants who wanted further assistance with their analysis projects.

DARC presented a seminar in the seminar series aimed at new recruits on the topic of analysis of survey data.

Several "brownbag" telephone seminars were provided to RDC analysts on analytical topics of interest to them. An overview seminar was also given at the COOL RDC. While most of the preparation time for these seminars was covered by budget from the RDCs, a small amount was charged to the Methodology Research and Development Program (MRDC) since the resulting material is useful outside of the RDCs.

Software evaluation, development and promotion

During 2010/11, DARC continued to examine various commercial software packages with respect to their suitability for analyzing Statistics Canada survey data. One activity was keeping abreast of new features in software that can make use of survey bootstrap weights for design-based analysis (e.g., SUDAAN 10, Stata 11 and SAS 9.2). Feedback was provided to StataCorp on the new bootstrapping features they were releasing. As well, a "how-to" document was prepared to introduce new researchers to the implementation of weighted estimation and bootstrap variance estimation in a number of different software packages (see Gagné, Roberts and Keown, 2011). Such a document was requested by RDC analysts (and its preparation was mainly covered by RDC support resources), but it has been made available for evaluation, in draft form, to a number of other researchers.
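The general idea behind weighted estimation with bootstrap weights can be sketched as follows: the estimate is computed once with the full-sample weights and once with each set of bootstrap weights, and the variance is estimated from the spread of the replicate estimates. This is a minimal sketch under stated assumptions, not the content of the "how-to" document or the exact computation of any package named above. It assumes the common convention of averaging squared deviations from the full-sample estimate, and the function names and data are hypothetical.

```python
def weighted_mean(values, weights):
    """Design-weighted mean of a survey variable."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values, full_weights, replicate_weights):
    """Estimate a weighted mean and its bootstrap variance.

    Assumed convention: variance = average squared deviation of the
    replicate estimates from the full-sample estimate.
    """
    full_estimate = weighted_mean(values, full_weights)
    replicates = [weighted_mean(values, w) for w in replicate_weights]
    variance = sum((r - full_estimate) ** 2 for r in replicates) / len(replicates)
    return full_estimate, variance


# Toy data: three respondents, three sets of bootstrap weights.
values = [10.0, 20.0, 30.0]
full_weights = [1.0, 1.0, 2.0]
replicate_weights = [
    [2.0, 0.0, 2.0],
    [0.0, 2.0, 2.0],
    [1.0, 1.0, 2.0],
]
estimate, variance = bootstrap_variance(values, full_weights, replicate_weights)
```

A real survey file would carry several hundred bootstrap weight variables rather than three. The same pattern applies to any estimator that can be recomputed with a different weight vector.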

A seminar called "Introduction to SAS-callable SUDAAN" was presented to a division in Statistics Canada that had newly gained access to SUDAAN.

Work was done on updating our knowledge of diagnostics available in survey analysis software related to logistic regression of survey data. These findings will be incorporated into our training and written material.

DARC spent a little time on supporting the use of Stata and SUDAAN at Statistics Canada.

Other activities

DARC resources were used to support one of the PhD students at Statistics Canada on NICDS/MITACS internships.
 
DARC members were involved in a variety of organizing functions for the program of the Survey Methods Section of the SSC in 2010 and 2011 and for the Methodology Interchange of Statistics Canada and the U.S. Census Bureau held in March 2011.

Lenka Mach and Karla Fox organized a case study entitled "Gender Gap in Earnings among Young People" for the 2011 annual meeting of the SSC. This involved finding appropriate data, developing research questions and writing background material for the students. Groups of students registered for the case study will compete for an award.

Georgia Roberts sat on the editorial boards of Health Reports and Canadian Social Trends.

For further information, contact:
Georgia Roberts (613-951-1471, georgia.roberts@statcan.gc.ca).

Generalized systems

G-Confid

G-Confid (formerly known as Confid2) is the generalized software used to suppress sensitive cells in a data release. It preserves the confidentiality of data providers while masking the minimum amount of information. Version 1.0 of Confid2 was released in May 2009, and version 1.02 was released in January 2010. Version 1.03 will be released in April 2011. Enhancements in version 1.03 include the following:

Re-engineering

  • Although G-Confid is a relatively new system, advances in SAS itself and in programming techniques prompted a re-engineering of G-Confid that resulted in faster run-time.
  • Users now have the option to scale the cost coefficient used in cell suppression, another strategy for reducing run-time.

Greater flexibility

  • Users can specify aggregates on which the confidentiality will be assessed.
  • An additional macro was created to allow users to create tables from the suppression patterns produced by G-Confid.
  • An additional macro was created to give users more information about the aggregates.
  • Users can perform G-Confid tasks using SAS Enterprise Guide 4.3.

Banff

The Banff system for edit and imputation was developed to satisfy most of the edit and imputation requirements of economic surveys at Statistics Canada. Banff can be called from SAS programs, from tasks in SAS Enterprise Guide, or through metadata with the Banff Processor. The Banff Processor allows users to enter their edits in a spreadsheet format which is then read and parsed into SAS code. The Banff Processor was released internally in August 2009 and user support was provided. The production version of Banff 2.04 (including the Processor) was expected to be released in March 2011. Additional enhancements have been investigated and will be included in the next version of Banff, expected to be released in December 2012. These include:

  • Sigma-gap outlier detection method;
  • Identification of top contributors as an alternative outlier detection method;
  • The ability to keep track of the fields that have already been imputed, so that the user can prioritize these for imputation again if the record still does not pass edits;
  • Allowing the user the option to limit the number of times a record is used as a donor.

StatMx

StatMx is a collection of SAS macros that provide additional functions and capabilities beyond those currently available in either the Generalized Sampling System (GSAM) or the Generalized Estimation System (GES).

In April, StatMx 3.6 was released. This version includes updated documentation and support for SAS 9.2.

A literature review on jackknife and bootstrap replication methods for estimating sampling variance was undertaken. Some design issues are still to be resolved, after which a prototype will be built.

Documentation describing the macro parameters and the methodology was completed and made available for the calibration estimation and calibration weights functions. The system documentation for the stratification and allocation functions was also completed.

For further information, contact:
Laurie Reedman (613-951-7301, laurie.reedman@statcan.gc.ca).

Quality assurance

Quality Control Data Analysis System (QCDAS)

QCDAS is SAS-based with a menu-driven user interface. It is used to analyze, estimate and tabulate summary statistics for quality control programs. The system is also used to generate reports to reflect the analysis. The system is performing well in production, allowing the team to identify additional functionality to add. This includes several reports that allow users to interpret data capture quality at the operator or the survey level.

Generic intelligent character recognition (ICR)

ICR is the technology used for data capture, which makes use of a combination of automated machine capture (using optical character, mark and image recognition), along with the manual heads-up capture of data by operators. The software used is called "AnyDoc". BSMD has developed and implemented a generic Quality Control for this ICR software for the document preparation, scanning, automated data entry (ADE) and manual key-from-image (KFI) stages of data processing. A simulation study was designed and executed to use the software to not only detect the presence of data on a questionnaire but to capture that data as well. The study involved manually capturing data from a set of test questionnaires, then allowing the software to capture that same data, using different error tolerance levels that can be set as parameters in the software itself. The results from the first type of questionnaire studied were promising. Next, more tests will be performed on different types of questionnaires. If successful, this will improve the efficiency of the capture process by reducing the amount of data that has to be captured manually by an operator, while maintaining the high accuracy rate of the captured data.

Automated coding, parsing and scoring methodology (G-Code)

G-Code (formerly known as ACTR) is used to automatically assign predefined codes to responses to open-ended questions. This is done in two steps. In the first step, both the input and reference text are parsed by a user-defined parsing strategy in order to reduce the text to a standard form. Parsing deals with problems such as common spelling variations, abbreviations, etc. The parsing strategy plays a strong role in determining the success rate of the coding process. The second step is to match the parsed input text to a list of parsed descriptions in a reference file and assign the associated code when a match is successful. Direct or indirect matching can be done. For indirect matching, a weight is assigned to each matching word in the input phrase and a score for this phrase is computed based on weights and the number of words in common between the input and the reference descriptions.

Research to improve indirect matching was continued using the Levenshtein algorithm. Simulation studies demonstrated that the new strategy is complementary to the one currently in the G-Code software. Users can expect to achieve more matches if they use one strategy followed by the other rather than using only one of the two strategies. The new strategy will be included in a new version of the G-Code software, and the interface is being enhanced to allow the users more flexibility and additional functionality.
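The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another, which makes it tolerant of small spelling variations that survive parsing. The sketch below shows the standard dynamic-programming computation together with a hypothetical nearest-description lookup. It does not reproduce the G-Code matching strategy, and the reference descriptions, codes and distance cut-off are made up for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or exact match)
            ))
        previous = current
    return previous[-1]


def closest_code(text, reference, max_distance=2):
    """Return the code of the nearest reference description, or None.

    Hypothetical helper: `reference` maps parsed descriptions to codes,
    and `max_distance` is an assumed cut-off, not a G-Code parameter.
    """
    best = None
    for description, code in reference.items():
        distance = levenshtein(text, description)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, code)
    return best[1] if best else None


# Made-up reference file with illustrative codes.
reference = {"carpenter": "A01", "plumber": "A02"}
matched = closest_code("carpentr", reference)
unmatched = closest_code("astronaut", reference)
```

In a two-stage set-up of the kind described above, a lookup like this would only be applied to responses that the first strategy failed to code.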

Methodology acted as a consultant to the Standards Division regarding which parsing strategy to implement in the new generalized interactive coding tool to be launched in April 2011.

For further information, contact:
Laurie Reedman (613-951-7301, laurie.reedman@statcan.gc.ca).

Statistical training

The Statistical Training Committee (CFS) coordinates the development and delivery of 24 regularly scheduled courses in survey methods, sampling theory and practice, questionnaire design, time series methods and statistical methods for data analysis. During the year, a record number of 31 scheduled sessions (104 days of training) were given in either English or French.

The suite of courses continues to expand: four new courses were given for the first time in 2010-2011:

  • 0437: Theory and Application of Raking
  • 0448: Calibration and balanced sampling in surveys
  • 0488: Variance estimation: Theory and Practice
  • 0495: Using Paradata in Applied Operational and Methodological Research for Social Surveys

In addition, three other new courses are currently being developed:

  • 0494: Introduction to Longitudinal Survey Data Analysis
  • 0460: Methodological Issues for New Data Collection Methods
  • 0492: Automated Coding.

For more information, contact:
François Gagnon (613-951-1463, francois.gagnon@statcan.gc.ca).

Conferences

2009 Symposium:

The 25th International Methodology Symposium was held from October 27 to 30, 2009 at the Palais des congrès in Gatineau under the theme of "Longitudinal Surveys: From Design to Analysis". The individual papers for the 2009 Symposium proceedings were translated and verified. Final formatting, approval by the Dissemination Division, loading of the papers onto the Statistics Canada website and creation of a CD-ROM are upcoming.

2010 Symposium:

The 26th International Methodology Symposium was held from October 26 to 29, 2010 at the Crowne Plaza Hotel under the theme of "Social Statistics: The Interplay among Censuses, Surveys and Administrative Data". Close to 530 people were in attendance. Jelke Bethlehem was the keynote speaker, and Ivan P. Fellegi was the Waksberg Award speaker. Three workshops were also given on the first day of the symposium:

  • Karla Fox and Lori Stratychuk: "Record Linkage Methods";
  • André Cyr, Julien Bérard-Chagnon, Éric Caron Malenfant, and Dominic Grenier: "From Traditional Demographic Calculations to Projections by Microsimulations";
  • Fritz Scheuren and Young Chun: "Using Administrative/Operating Systems to Strengthen Statistical Survey/Census Systems".

In the months leading up to the Symposium, the organizing committee put together a program made up of over 87 presentations from 25 different countries. It also took care of the details surrounding registration, operations and facilities. The proceedings should be disseminated by late 2011.

2011 Symposium:

The preparations for the 27th International Methodology Symposium are well under way. The Symposium will take place from November 1 to 4 at the Ottawa Convention Centre in Ottawa and the theme will be "Strategies for Standardization of Methods and Tools - How to get there". Susan Linacre will be the keynote speaker, and Danny Pfeffermann will be the Waksberg Award speaker. The Symposium will be preceded by a day of workshops.

For further information, contact:
Colin Babyak (613-951-2045, colin.babyak@statcan.gc.ca).

Survey methodology

Survey Methodology (SM) is an international journal that publishes articles in both official languages on various aspects of statistical development relevant to a statistical agency. Its editorial board includes world-renowned leaders in survey methods from the government, academic and private sectors. It is one of just two major journals in the world dealing with methodology for official statistics.

The June 2010 issue, SM 36-1, was released on June 29, 2010. The issue contains 10 papers.

The December 2010 issue, SM 36-2, was released on December 21, 2010. It contains 11 papers including the tenth in the annual invited paper series in honour of Joseph Waksberg. The recipient of the 2010 Waksberg Award is Ivan Fellegi.

In 2010, the journal received 57 submissions of papers by various authors.

The available on-line electronic versions of SM now range from 36-2 (December 2010) to 25-1 (June 1999). Electronic index and abstracts are also available up to 22-2 (December 1996). The conversion of back issues to an electronic version continues.

During the review period, the editorial board and the assistant editors worked to standardize, streamline and document the editorial process, including the various contacts with the authors and the associate editors.

For more information, contact:
Susie Fortier (613-951-4751, susie.fortier@statcan.gc.ca).