Results

All (31) (0 to 10 of 31 results)

  • Articles and reports: 11-522-X202100100021
    Description: Istat has launched a new project for its short-term statistical processes to meet the forthcoming EU Regulation, which requires estimates to be released more quickly. The assessment and analysis of the current process of the Short Term Survey on Turnover in Services (FAS) aims to identify how the best features of current methods and practices can be exploited to design a more "efficient" process. In particular, the project is expected to deliver methods that allow substantial economies of scale, scope and knowledge to be applied across the short-term statistics (STS) production context, which usually operates with limited resources. Analysis of the AS-IS process revealed that the FAS survey incurs substantial editing and imputation (E&I) costs, mainly because of the intensive follow-up and interactive editing applied to every type of detected error. In this light, we sought to exploit the lessons learned by participating in the work of the High-Level Group for the Modernisation of Official Statistics (HLG-MOS, UNECE) on the use of machine learning in official statistics. In this work, we present a first experiment using Random Forest models to (i) predict which units contain "suspicious" data, (ii) assess the potential of these predictions on new data, and (iii) explore the data to identify hidden rules and patterns. In particular, we focus on using Random Forest modelling to compare alternative methods in terms of error-prediction efficiency and to address the major aspects of the new E&I design.
    Release date: 2021-10-15
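
The following is a minimal, hypothetical sketch of the kind of Random Forest screening described in the entry above: a classifier trained on historically edited records is used to rank incoming records for follow-up. The feature names, the edited_flag target and the selection threshold are illustrative assumptions, not Istat's actual specification.

```python
# Minimal sketch: screen incoming survey records with a Random Forest,
# using historical editing outcomes as the training signal.
# Feature names and the top-10 cut-off are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
history = pd.DataFrame({
    "turnover": rng.lognormal(10, 1, n),
    "ratio_to_previous": rng.normal(1.0, 0.2, n),
    "days_late": rng.integers(0, 30, n),
})
# Target: 1 if the record was changed during past interactive editing.
history["edited_flag"] = (np.abs(history["ratio_to_previous"] - 1) > 0.3).astype(int)

features = ["turnover", "ratio_to_previous", "days_late"]
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(history[features], history["edited_flag"])

# Score this month's records and send only the most suspicious to editors.
incoming = history.sample(50, random_state=1)[features]
incoming = incoming.assign(suspicion=model.predict_proba(incoming)[:, 1])
follow_up = incoming.sort_values("suspicion", ascending=False).head(10)
print(follow_up)
```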

  • 19-22-0004
    Description: One of the main objectives of statistics is to distill data into information which can be summarized and easily understood. Data visualizations, which include graphs and charts, are powerful ways of doing so. The purpose of this information session is to provide examples of common graphs and charts, highlight practical advice to help the audience choose the right display for their data, and identify what to avoid and why. An overall objective is to build capacity and increase understanding of fundamental techniques which foster accurate and effective dissemination of statistics and research findings.

    https://www.statcan.gc.ca/en/wtc/information/19220004
    Release date: 2020-10-30

  • Surveys and statistical programs – Documentation: 71-543-G
    Description:

    The Guide to the Labour Force Survey contains a dictionary of concepts and definitions and covers topics such as survey methodology, data collection, data processing and data quality. It also contains information on products and services, sub-provincial geography descriptions as well as the survey questionnaire.

    Release date: 2020-04-09

  • Articles and reports: 12-001-X201600114538
    Description:

    The aim of automatic editing is to use a computer to detect and amend erroneous values in a data set, without human intervention. Most automatic editing methods that are currently used in official statistics are based on the seminal work of Fellegi and Holt (1976). Applications of this methodology in practice have shown systematic differences between data that are edited manually and automatically, because human editors may perform complex edit operations. In this paper, a generalization of the Fellegi-Holt paradigm is proposed that can incorporate a large class of edit operations in a natural way. In addition, an algorithm is outlined that solves the resulting generalized error localization problem. It is hoped that this generalization may be used to increase the suitability of automatic editing in practice, and hence to improve the efficiency of data editing processes. Some first results on synthetic data are promising in this respect.

    Release date: 2016-06-22
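
As background for the entry above, a naive sketch of Fellegi-Holt error localisation: find the smallest set of fields that can be changed so the record satisfies all edits. The single balance edit, unit field weights and brute-force search are illustrative assumptions; production systems use set-covering or integer-programming formulations over much larger edit sets.

```python
# Minimal sketch of Fellegi-Holt error localisation: find the smallest set of
# fields that can be changed so the record satisfies all edits. Brute force
# over subsets; feasibility of each subset is checked with a zero-objective LP.
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

fields = ["turnover", "costs", "profit"]
record = np.array([100.0, 60.0, 30.0])   # fails: turnover - costs - profit != 0

# Edits in the form A_eq @ x = b_eq with x >= 0 (illustrative balance edit).
A_eq = np.array([[1.0, -1.0, -1.0]])
b_eq = np.array([0.0])

def feasible_if_freed(free_idx):
    """Can all edits be satisfied if only the fields in free_idx may change?"""
    bounds = [(0.0, None) if i in free_idx else (record[i], record[i])
              for i in range(len(fields))]
    res = linprog(c=np.zeros(len(fields)), A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.status == 0

for size in range(len(fields) + 1):
    solutions = [set(c) for c in combinations(range(len(fields)), size)
                 if feasible_if_freed(set(c))]
    if solutions:
        print("minimal field sets to change:",
              [[fields[i] for i in s] for s in solutions])
        break
```

With only one edit, every single-field change is a minimal solution; in practice, field reliability weights and additional edits narrow the choice.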

  • Articles and reports: 12-001-X201400214089
    Description:

    This manuscript describes the use of multiple imputation to combine information from multiple surveys of the same underlying population. We use a newly developed method to generate synthetic populations nonparametrically using a finite population Bayesian bootstrap that automatically accounts for complex sample designs. We then analyze each synthetic population with standard complete-data software for simple random samples and obtain valid inference by combining the point and variance estimates using extensions of existing combining rules for synthetic data. We illustrate the approach by combining data from the 2006 National Health Interview Survey (NHIS) and the 2006 Medical Expenditure Panel Survey (MEPS).

    Release date: 2014-12-19
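
For the combining step mentioned in the entry above, the sketch below applies Rubin's standard multiple-imputation combining rules (pooled point estimate, within- and between-imputation variance). The paper itself uses extensions of combining rules designed for synthetic data, which differ from these standard rules; this only shows the generic pooling pattern, with made-up numbers.

```python
# Minimal sketch of combining estimates from L completed data sets using
# Rubin's standard multiple-imputation rules. The paper uses extensions of
# combining rules tailored to synthetic data; this is only the generic pattern.
import numpy as np

# Point estimates q_l and their estimated variances u_l from L analyses.
q = np.array([4.9, 5.2, 5.0, 5.3, 4.8])
u = np.array([0.20, 0.22, 0.19, 0.25, 0.21])
L = len(q)

q_bar = q.mean()                 # pooled point estimate
w_bar = u.mean()                 # within-imputation variance
b = q.var(ddof=1)                # between-imputation variance
t = w_bar + (1 + 1 / L) * b      # total variance (Rubin's rule)

print(f"estimate = {q_bar:.3f}, std. error = {np.sqrt(t):.3f}")
```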

  • Articles and reports: 12-001-X201400214091
    Description:

    Parametric fractional imputation (PFI), proposed by Kim (2011), is a tool for general purpose parameter estimation under missing data. We propose a fractional hot deck imputation (FHDI) which is more robust than PFI or multiple imputation. In the proposed method, the imputed values are chosen from the set of respondents and assigned proper fractional weights. The weights are then adjusted to meet certain calibration conditions, which makes the resulting FHDI estimator efficient. Two simulation studies are presented to compare the proposed method with existing methods.

    Release date: 2014-12-19
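
A minimal sketch of the fractional hot deck idea summarised above: each recipient with a missing item receives several donor values from respondents in the same imputation cell, each carrying a fractional weight, and the weights sum to one over donors. The cell definition, the number of donors M and the uniform initial weights are illustrative assumptions; the calibration step that adjusts the weights is omitted.

```python
# Minimal sketch of fractional hot deck imputation: each missing item gets
# M donor values from respondents in the same cell, each with fractional
# weight 1/M. The calibration adjustment of the weights is omitted here.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cell": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y":    [10.0, 12.0, 11.0, np.nan, 50.0, 55.0, np.nan, 52.0],
})
M = 3  # number of fractional donors per recipient (illustrative)

imputed_rows = []
for idx, row in df[df["y"].isna()].iterrows():
    donors = df.loc[(df["cell"] == row["cell"]) & df["y"].notna(), "y"]
    picks = rng.choice(donors.to_numpy(), size=min(M, len(donors)), replace=False)
    for value in picks:
        imputed_rows.append({"recipient": idx, "cell": row["cell"],
                             "y_imputed": value, "frac_weight": 1.0 / len(picks)})

print(pd.DataFrame(imputed_rows))
```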

  • Articles and reports: 12-001-X201300111825
    Description:

    A considerable limitation of current methods for automatic data editing is that they treat all edits as hard constraints. That is to say, an edit failure is always attributed to an error in the data. In manual editing, however, subject-matter specialists also make extensive use of soft edits, i.e., constraints that identify (combinations of) values that are suspicious but not necessarily incorrect. The inability of automatic editing methods to handle soft edits partly explains why in practice many differences are found between manually edited and automatically edited data. The object of this article is to present a new formulation of the error localisation problem which can distinguish between hard and soft edits. Moreover, it is shown how this problem may be solved by an extension of the error localisation algorithm of De Waal and Quere (2003).

    Release date: 2013-06-28
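
A minimal sketch of the hard/soft distinction drawn in the entry above, with made-up edit rules: a hard-edit failure means the record cannot be correct as reported, while a soft-edit failure only marks the record as suspicious and worth review.

```python
# Minimal sketch distinguishing hard edits (must hold) from soft edits
# (suspicious if violated). The rules and thresholds are illustrative.
records = [
    {"id": 1, "employees": 12, "wages": 480_000, "turnover": 2_000_000},
    {"id": 2, "employees": 0,  "wages": 100_000, "turnover": 300_000},
    {"id": 3, "employees": 5,  "wages": 9_000_000, "turnover": 1_000_000},
]

hard_edits = [
    ("wages require employees",
     lambda r: not (r["wages"] > 0 and r["employees"] == 0)),
]
soft_edits = [
    ("wages per employee <= 500k",
     lambda r: r["employees"] == 0 or r["wages"] / r["employees"] <= 500_000),
]

for r in records:
    hard_fail = [name for name, ok in hard_edits if not ok(r)]
    soft_fail = [name for name, ok in soft_edits if not ok(r)]
    if hard_fail:
        status = f"ERROR (hard): {hard_fail}"        # must be localised and amended
    elif soft_fail:
        status = f"suspicious (soft): {soft_fail}"   # review, but may be correct
    else:
        status = "passes all edits"
    print(r["id"], status)
```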

  • Public use microdata: 89M0017X
    Description:

    The public use microdata file from the 2010 Canada Survey of Giving, Volunteering and Participating is now available. This file contains information collected from nearly 15,000 respondents aged 15 and over residing in private households in the provinces. The public use microdata file provides provincial-level information about the ways in which Canadians donate money and in-kind gifts to charitable and nonprofit organizations, volunteer their time to these organizations, and provide help directly to others. Socio-demographic, income and labour force data are also included on the file.

    Release date: 2012-05-04

  • Articles and reports: 11-522-X200800011010
    Description:

    The Survey of Employment, Payrolls and Hours (SEPH) is a monthly survey using two sources of data: a census of payroll deduction (PD7) forms (administrative data) and a survey of business establishments. This paper focuses on the processing of the administrative data, from the weekly receipt of data from the Canada Revenue Agency to the production of monthly estimates produced by SEPH.

    The edit and imputation methods used to process the administrative data have been revised in the last several years. The goals of this redesign were primarily to improve data quality and to increase consistency with another administrative data source (T4), which is a benchmark measure for Statistics Canada's System of National Accounts. An additional goal was to ensure that the new process would be easier to understand and to modify, if needed. As a result, a new processing module was developed to edit and impute PD7 forms before their data are aggregated to the monthly level.

    This paper presents an overview of both the current and new processes, including a description of challenges that we faced during development. Improved quality is demonstrated both conceptually (by presenting examples of PD7 forms and their treatment under the old and new systems) and quantitatively (by comparison to T4 data).

    Release date: 2009-12-03

  • Articles and reports: 11-522-X200800011014
    Description:

    In many countries, improved quality of economic statistics is one of the most important goals of the 21st century. First and foremost, the quality of the National Accounts is in focus, regarding both annual and quarterly accounts. To achieve this goal, data quality regarding the largest enterprises is of vital importance. To ensure that the quality of data for the largest enterprises is good, coherence analysis is an important tool. Coherence means that data from different sources fit together and give a consistent view of the development within these enterprises. Working with coherence analysis in an efficient way is normally a work-intensive task consisting mainly of collecting data from different sources and comparing them in a structured manner. Over the last two years, Statistics Sweden has made great progress in improving its routines for coherence analysis. An IT tool that collects data for the largest enterprises from a large number of sources and presents it in a structured and logical manner has been built, and a systematic approach to analysing data for the National Accounts on a quarterly basis has been developed. The paper describes the work in both these areas and gives an overview of the IT tool and the agreed routines.

    Release date: 2009-12-03
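
A minimal sketch of the kind of coherence check described above, assuming two hypothetical sources reporting turnover for the same large enterprises; large relative differences between sources are flagged for analyst review. The source names and the 5% tolerance are assumptions, not Statistics Sweden's actual rules.

```python
# Minimal sketch of a coherence check: compare the same variable for the
# largest enterprises across two sources and flag large discrepancies.
# Source names and the 5 % tolerance are illustrative assumptions.
import pandas as pd

survey = pd.DataFrame({"enterprise": ["E1", "E2", "E3"],
                       "turnover_survey": [1000.0, 250.0, 780.0]})
vat = pd.DataFrame({"enterprise": ["E1", "E2", "E3"],
                    "turnover_vat": [1010.0, 310.0, 775.0]})

merged = survey.merge(vat, on="enterprise")
merged["rel_diff"] = ((merged["turnover_survey"] - merged["turnover_vat"])
                      .abs() / merged["turnover_vat"])
merged["flag"] = merged["rel_diff"] > 0.05

print(merged)
```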

Data (6) (6 results)

  • Public use microdata: 82M0011X
    Description:

    The main objective of the 2002 Youth Smoking Survey (YSS) is to provide current information on the smoking behaviour of students in grades 5 to 9 (in Quebec primary school grades 5 and 6 and secondary school grades 1 to 3), and to measure changes that occurred since the last time the survey was conducted in 1994. Additionally, the 2002 survey collected basic data on alcohol and drug use by students in grades 7 to 9 (in Quebec secondary 1 to 3). Results of the Youth Smoking Survey will help with the evaluation of anti-smoking and anti-drug use programs, as well as with the development of new programs.

    Release date: 2004-07-14

  • Public use microdata: 82M0010X
    Description:

    The National Population Health Survey (NPHS) program is designed to collect information related to the health of the Canadian population. The first cycle of data collection began in 1994. The institutional component includes long-term residents (expected to stay longer than six months) in health care facilities with four or more beds in Canada, with the principal exclusion of the Yukon and the Northwest Territories. This document has been produced to facilitate the manipulation of the 1996-1997 microdata file containing the survey results. The main variables include demography, health status, chronic conditions, restriction of activity, socio-demographic characteristics and others.

    Release date: 2000-08-02

  • Public use microdata: 12M0010X
    Description:

    Cycle 10 collected data from persons 15 years and older and concentrated on the respondent's family. Topics covered include marital history, common-law unions, biological, adopted and stepchildren, family origins, child leaving and fertility intentions.

    The target population of the GSS (General Social Survey) consisted of all individuals aged 15 and over living in a private household in one of the ten provinces.

    Release date: 1997-02-28

  • Public use microdata: 82F0001X
    Description:

    The National Population Health Survey (NPHS) uses the Labour Force Survey sampling frame to draw a sample of approximately 22,000 households. The sample is distributed over four quarterly collection periods. In each household, some limited information is collected from all household members and one person, aged 12 years and over, in each household is randomly selected for a more in-depth interview.

    The questionnaire includes content related to health status, use of health services, determinants of health and a range of demographic and economic information. For example, the health status information includes self-perception of health, a health status index, chronic conditions, and activity restrictions. The use of health services is probed through visits to health care providers, both traditional and non-traditional, and the use of drugs and other medications. Health determinants include smoking, alcohol use, physical activity and in the first survey, emphasis has been placed on the collection of selected psycho-social factors that may influence health, such as stress, self-esteem and social support. The demographic and economic information includes age, sex, education, ethnicity, household income and labour force status.

    Release date: 1995-11-21

  • Public use microdata: 82M0008X
    Description:

    The survey, begun in February 1994, monitors the smoking patterns of Canadians over a 12-month period and measures any changes in smoking resulting from the decrease in cigarette taxes that took place in February 1994 in some provinces. It is related to MDF 82M0006. Updates are included in the price of the microdata file. A guide for this microdata file is available.

    Release date: 1995-06-08

Analysis (16) (0 to 10 of 16 results)

  • Articles and reports: 11-522-X20040018755
    Description:

    This paper reviews the robustness of methods dealing with response errors for rare populations. It also reviews problems with weighting schemes for these populations, and develops an asymptotic framework intended to deal with such problems.

    Release date: 2005-10-27

  • Articles and reports: 12-001-X20050018087
    Description:

    In official statistics, the data editing process plays an important role in terms of timeliness, data accuracy and survey costs. Techniques introduced to identify and eliminate errors from data must consider all of these aspects simultaneously. Among others, a frequent and pervasive systematic error in surveys collecting numerical data is the unity measure error. It strongly affects the timeliness, data accuracy and costs of the editing and imputation phase. In this paper we propose a probabilistic formalisation of the problem based on finite mixture models. This setting allows us to deal with the problem in a multivariate context, and also provides a number of useful diagnostics for prioritising cases to be investigated more deeply through clerical review. Prioritising units is important in order to increase data accuracy while avoiding wasting time on following up units that are not really critical.

    Release date: 2005-07-21
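
In the spirit of the entry above, a minimal sketch that flags unity measure errors with a two-component Gaussian mixture fitted to log-ratios of reported values against a reference: correctly reported records cluster near zero, while records reported in the wrong unit (for example, units instead of thousands) form a second, shifted component that can be prioritised for clerical review. The reference variable, the univariate two-component setup and the simulated data are assumptions, not the paper's actual multivariate model.

```python
# Minimal sketch of flagging unity-measure errors with a finite mixture:
# log10 ratios of reported to reference values form two clusters, one near 0
# (correct unit) and one near 3 (values reported 1 000 times too large).
# The reference variable and two-component setup are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
reference = rng.lognormal(6, 0.5, 300)                 # e.g. last year's value
reported = reference * rng.lognormal(0, 0.1, 300)      # mostly correct
reported[:30] *= 1_000                                 # 30 unity-measure errors

log_ratio = np.log10(reported / reference).reshape(-1, 1)
mix = GaussianMixture(n_components=2, random_state=0).fit(log_ratio)

# The component with the larger mean corresponds to the suspected errors.
error_comp = int(np.argmax(mix.means_.ravel()))
suspect = mix.predict(log_ratio) == error_comp
print(f"{suspect.sum()} records flagged as possible unity-measure errors")
```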

  • Articles and reports: 11-522-X20030017708
    Description:

    This article provides an overview of the work to date at Statistics Canada on using GST data as a direct replacement in imputation or estimation, or as a data certification tool.

    Release date: 2005-01-26

Reference (8) (8 results)

  • Surveys and statistical programs – Documentation: 82-225-X
    Description:

    The compendium of Canadian Cancer Registry procedures manuals sets out the rules for reporting cancer data to the CCR for all provincial and territorial cancer registries.

    Release date: 2008-01-18

  • Surveys and statistical programs – Documentation: 13-604-M2004045
    Description:

    How "good" are the National Tourism Indicators (NTI)? How can their quality be measured? This study looks to answer these questions by analysing the revisions to the NTI estimates for the period 1997 through 2001.

    Release date: 2004-10-25

  • Surveys and statistical programs – Documentation: 92-393-X
    Description:

    This report is a brief guide to users of census income data. It provides a general description of the various 2001 Census phases, from data collection, through processing for non-response, to dissemination. Descriptions of, and summary data on, the changes to income data that occurred during the processing stages are given. Comparative data from national accounts and tax data sources at a highly aggregated level are also presented to put the quality of the 2001 Census income data into perspective. For users wishing to compare census income data over time, changes in income content and universe coverage over the years are explained. Finally, a complete description of all census products containing income data is also supplied.

    Release date: 2004-09-16

  • Surveys and statistical programs – Documentation: 92-390-X
    Description:

    This report includes a definition of the 2001 place of work concept and the place of work geography, standard text on data collection and coverage (including data collection methods, special coverage studies, sampling and weighting, edit and follow-up, coverage and content considerations). Both standard and subject-matter specific text pieces are also included for data assimilation (automated as well as interactive coding), edit and imputation and data evaluation. Finally, this technical report includes a section on historical comparability.

    Release date: 2004-08-26

  • Surveys and statistical programs – Documentation: 92-388-X
    Description:

    This report contains basic conceptual and data quality information to help users interpret and make use of census occupation data. It gives an overview of the collection, coding (to the 2001 National Occupational Classification), edit and imputation of the occupation data from the 2001 Census. The report describes procedural changes between the 2001 and earlier censuses, and provides an analysis of the quality level of the 2001 Census occupation data. Finally, it details the revision of the 1991 Standard Occupational Classification used in the 1991 and 1996 Censuses to the 2001 National Occupational Classification for Statistics used in 2001. The historical comparability of data coded to the two classifications is discussed. Appendices to the report include a table showing historical data for the 1991, 1996 and 2001 Censuses.

    Release date: 2004-07-15

  • Surveys and statistical programs – Documentation: 75F0002M2000010
    Description:

    This report explains the concept of income and provides definitions of the various sources of income and derived income variables. It also documents the various aspects of the census that can have an impact on census income estimates.

    Release date: 2000-07-26

  • Surveys and statistical programs – Documentation: 75F0002M1998012
    Description:

    This paper looks at the work of the task force responsible for reviewing Statistics Canada's household and family income statistics programs, and at one of the associated program changes, namely the integration of two major sources of annual income data in Canada: the Survey of Consumer Finances (SCF) and the Survey of Labour and Income Dynamics (SLID).

    Release date: 1998-12-30