Statistical techniques

Results

All (149) (0 to 10 of 149 results)

  • Stats in brief: 89-20-00062021001
    Description:

    As Canada's national statistical organization, Statistics Canada is committed to sharing our knowledge and expertise to help all Canadians develop their data literacy skills. The goal is to provide learners with information on the basic concepts and skills with regard to a range of data literacy topics.

    The training is aimed at those who are new to data or those who have some experience with data but may need a refresher or want to expand their knowledge. We invite you to check out our Learning catalogue to learn more about our offerings including a great collection of short videos. Be sure to check back regularly as we will be continuing to release new training.

    Release date: 2021-05-03

  • Stats in brief: 89-20-00062021003
    Description:

    In this video, viewers will learn the differences between three types of measure: proportions, ratios, and rates. By the end of the video, viewers will also be able to determine how each measure is calculated and when it is best to use one measure rather than another.

    Release date: 2021-05-03

  • Stats in brief: 89-20-00062021004
    Description:

    One important distinction we will make in this video is the difference between Data Science, Artificial Intelligence and Machine Learning. You'll learn what machine learning can be used for, how it works, and some different methods for doing it. You'll also learn how to build and use machine learning processes responsibly.

    This video is recommended for those who already have some familiarity with the concepts and techniques associated with computer programming and using algorithms to analyze data.

    Release date: 2021-05-03

  • Stats in brief: 89-20-00062021005
    Description:

    By the end of this video, you should have a deeper understanding of the fundamentals of using data to tell a story. We will go over some of the principal components of storytelling, including the data, the narrative and the visualization, and discuss how they can be used to construct concise, informative and interesting messages your audience can trust. You will then learn the importance of a well-planned data story, which includes learning who your audience will be, what they should know and how best to deliver that information.

    Release date: 2021-05-03

  • Stats in brief: 89-20-00062021006
    Description:

    In this video, you'll learn what we can do to the data itself to make it easier to work with. That's the role of data standards. You'll also learn what extra information we can provide to make data easier to use. That's the role of metadata.

    Release date: 2021-05-03

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2021-03-16

  • Articles and reports: 12-001-X202000200005
    Description:

    In surveys, text answers from open-ended questions are important because they allow respondents to provide more information without constraints. When open-ended questions are classified automatically using supervised learning, the accuracy is often not high enough. Alternatively, a semi-automated classification strategy can be considered: answers in the easy-to-classify group are classified automatically, while answers in the hard-to-classify group are classified manually. This paper presents a semi-automated classification method for multi-label open-ended questions, where text answers may be associated with multiple classes simultaneously. The proposed method effectively combines multiple probabilistic classifier chains while avoiding prohibitive computational costs. The performance evaluation on three different data sets demonstrates the effectiveness of the proposed method.

    Release date: 2020-12-15

  • Articles and reports: 11-637-X202000100001
    Description:

    Under the first goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to end poverty in all its forms everywhere by 2030. This fact sheet provides an overview of indicators underlying the first Sustainable Development Goal in support of eradicating poverty, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2020-10-20

  • Articles and reports: 11-637-X202000100002
    Description:

    Under the second goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to end hunger, achieve food security and improved nutrition, and promote sustainable agriculture by 2030. This fact sheet provides an overview of indicators underlying the second Sustainable Development Goal in support of ending hunger, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2020-10-20

  • Articles and reports: 11-637-X202000100003
    Description:

    Under the third goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to ensure healthy lives and promote well-being for all at all ages by 2030. This fact sheet provides an overview of indicators underlying the third Sustainable Development Goal in support of Good Health and Well-being, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2020-10-20
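
The three measures named in the entry for 89-20-00062021003 above (proportions, ratios, and rates) can be illustrated with a minimal sketch. It assumes the standard textbook definitions: a proportion divides a part by the whole that contains it, a ratio compares two separate quantities, and a rate relates a count of events to the population at risk, scaled by a multiplier. The function names and all numbers are hypothetical and are not taken from the video.

```python
def proportion(part, whole):
    """Share of the whole made up by the part (the part is included in the whole)."""
    return part / whole


def ratio(a, b):
    """Size of one quantity relative to another, separate quantity."""
    return a / b


def rate(events, population_at_risk, per=1000):
    """Events per `per` members of the population at risk (for one time period)."""
    return events * per / population_at_risk


# Hypothetical group: 60 women and 40 men.
women, men = 60, 40
share_of_women = proportion(women, women + men)   # women as a share of everyone
women_per_man = ratio(women, men)                 # women compared with men

# Hypothetical year: 25 events among 10,000 people at risk.
events_per_thousand = rate(25, 10_000, per=1000)

print(share_of_women, women_per_man, events_per_thousand)
```

The proportion is bounded between 0 and 1 because its numerator is part of its denominator, whereas the ratio and the rate are not bounded in that way.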
Data (1) (1 result)

  • Table: 11-10-0074-01
    Geography: Census tract
    Frequency: Occasional
    Description:

    The divergence index (D-index) describes the degree to which families with different income levels mix together in neighbourhoods. It compares the discrete income distribution of each neighbourhood (census tract, CT) with a base distribution, namely the income quintiles of the neighbourhood’s census metropolitan area (CMA).

    Release date: 2020-06-22
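
One common way to compare a discrete distribution with a base distribution, as the D-index description above does, is the Kullback-Leibler divergence. The sketch below assumes that form; whether the published table uses exactly this formula is an assumption, and the function name and the example shares are hypothetical. Because the base distribution consists of CMA income quintiles, each base share is taken to be 0.2.

```python
import math


def divergence_index(tract_shares, base_shares=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Kullback-Leibler divergence of a tract's income shares from the base shares.

    tract_shares: share of the tract's families in each CMA income quintile.
    Returns 0 when the tract mirrors the base distribution and grows as the
    tract becomes more concentrated in particular quintiles.
    """
    if len(tract_shares) != len(base_shares):
        raise ValueError("distributions must have the same number of categories")
    if not math.isclose(sum(tract_shares), 1.0):
        raise ValueError("tract shares must sum to 1")
    # Categories with a zero share contribute nothing to the sum.
    return sum(p * math.log(p / q)
               for p, q in zip(tract_shares, base_shares) if p > 0)


mixed = divergence_index([0.2, 0.2, 0.2, 0.2, 0.2])         # mirrors the CMA
skewed = divergence_index([0.05, 0.05, 0.10, 0.30, 0.50])   # tilted to high income
single = divergence_index([1.0, 0.0, 0.0, 0.0, 0.0])        # one quintile only

print(mixed, skewed, single)
```

Under this assumed form, a tract whose families are spread evenly across the CMA quintiles scores zero, and a tract made up of a single quintile reaches the maximum of log(5).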
Analysis (140) (0 to 10 of 140 results)

Reference (11) (0 to 10 of 11 results)

  • Surveys and statistical programs – Documentation: 84-538-X
    Geography: Canada
    Description:

    This document presents the methodology underlying the production of the life tables for Canada, provinces and territories, from reference period 1980/1982 and onward.

    Release date: 2019-05-30

  • Surveys and statistical programs – Documentation: 82-225-X200701010508
    Description:

    The Record Linkage Overview describes the process used in the annual internal record linkage of the Canadian Cancer Registry. The steps include: preparation; pre-processing; record linkage; post-processing; analysis and resolution; resolution entry; and resolution processing.

    Release date: 2008-01-18

  • Surveys and statistical programs – Documentation: 11-522-X20050019476
    Description:

    The paper will show how, using data published by Statistics Canada and available from member libraries of the CREPUQ, a linkage approach using postal codes makes it possible to link the data from the outcomes file to a set of contextual variables. These variables could then contribute to producing, on an exploratory basis, a better index to explain the varied outcomes of students from schools. In terms of the impact, the proposed index could show more effectively the limitations of ranking students and schools when this information is not given sufficient weight.

    Release date: 2007-03-02

  • Surveys and statistical programs – Documentation: 68-514-X
    Description:

    Statistics Canada's approach to gathering and disseminating economic data has developed over several decades into a highly integrated system for collection and estimation that feeds the framework of the Canadian System of National Accounts.

    The key to this approach was the creation of the Unified Enterprise Survey (UES), the goal of which was to improve the consistency, coherence, breadth and depth of business survey data.

    The UES did so by bringing many of Statistics Canada's individual annual business surveys under a common framework. This framework included a single survey frame, a sample design framework, conceptual harmonization of survey content, means of using relevant administrative data, common data collection, processing and analysis tools, and a common data warehouse.

    Release date: 2006-11-20

  • Surveys and statistical programs – Documentation: 89-612-X
    Description:

    This paper describes the structure and linkage of two databases: the Longitudinal Administrative Databank (LAD), and the Longitudinal Immigration Database (IMDB). The combined data associate landed immigrant taxfilers on the LAD with their key characteristics upon immigration. The paper highlights how the combined information, referred to here as the LAD_IMDB, enhances and complements the existing separate databases. The paper compares the full IMDB file with the sample of immigrants to assess the representativeness of the sample file.

    Release date: 2004-01-05

  • Surveys and statistical programs – Documentation: 12-001-X20030016609
    Description:

    To automate the data editing process, the so-called error localization problem, i.e., the problem of identifying the erroneous fields in an erroneous record, has to be solved. A paradigm for identifying errors automatically was proposed by Fellegi and Holt in 1976. Over the years their paradigm has been generalized to the following: the data of a record should be made to satisfy all edits by changing the values of the variables with the smallest possible sum of reliability weights. A reliability weight of a variable is a non-negative number that expresses how reliable one considers the value of this variable to be. Given this paradigm, the resulting mathematical problem has to be solved. In the present paper we examine how vertex generation methods can be used to solve this mathematical problem in mixed data, i.e., a combination of categorical (discrete) and numerical (continuous) data. The main aim of this paper is not to present new results, but rather to combine the ideas of several other papers in order to give a "complete", self-contained description of the use of vertex generation methods to solve the error localization problem in mixed data. In our exposition we will focus on describing how methods for numerical data can be adapted to mixed data.

    Release date: 2003-07-31

  • Surveys and statistical programs – Documentation: 81-595-M2003005
    Geography: Canada
    Description:

    This paper develops technical procedures that may enable ministries of education to link provincial tests with national and international tests in order to compare standards and report results on a common scale.

    Release date: 2003-05-29

  • Surveys and statistical programs – Documentation: 85-602-X
    Description:

    The purpose of this report is to provide an overview of existing methods and techniques making use of personal identifiers to support record linkage. Record linkage can be loosely defined as a methodology for manipulating and/or transforming personal identifiers from individual data records from one or more operational databases and subsequently attempting to match these personal identifiers to create a composite record about an individual. Record linkage is not intended to uniquely identify individuals for operational purposes; however, it does provide probabilistic matches of varying degrees of reliability for use in statistical reporting. Techniques employed in record linkage may also be of use for investigative purposes to help narrow the field of search against existing databases when some form of personal identification information exists.

    Release date: 2000-12-05

  • Surveys and statistical programs – Documentation: 12-001-X19980013910
    Description:

    Let A be a population domain of interest and assume that the elements of A cannot be identified on the sampling frame and the number of elements in A is not known. Further assume that a sample of fixed size (say n) is selected from the entire frame and the resulting domain sample size (say n_A) is random. The problem addressed is the construction of a confidence interval for a domain parameter such as the domain aggregate T_A = \sum_{i \in A} x_i. The usual approach to this problem is to redefine x_i, by setting x_i = 0 if i \notin A. Thus, the construction of a confidence interval for the domain total is recast as the construction of a confidence interval for a population total which can be addressed (at least asymptotically in n) by normal theory. As an alternative, we condition on n_A and construct confidence intervals which have approximately nominal coverage under certain assumptions regarding the domain population. We evaluate the new approach empirically using artificial populations and data from the Bureau of Labor Statistics (BLS) Occupational Compensation Survey.

    Release date: 1998-07-31

  • Surveys and statistical programs – Documentation: 12-001-X19970023613
    Description:

    Many policy decisions are best made when there is supporting statistical evidence based on analyses of appropriate microdata. Sometimes all the needed data exist but reside in multiple files for which common identifiers (e.g., SINs, EINs, or SSNs) are unavailable. This paper demonstrates a methodology for analyzing two such files: (1) when there is common nonunique information subject to significant error and (2) when each source file contains uncommon quantitative data that can be connected with appropriate models. Such a situation might arise with files of businesses only having difficult-to-use name and address information in common, one file with the energy products consumed by the companies, and the other file containing the types and amounts of goods they produce. Another situation might arise with files on individuals in which one file has earnings data, another information about health-related expenses, and a third information about receipts of supplemental payments. The goal of the methodology presented is to produce valid statistical analyses; appropriate microdata files may or may not be produced.

    Release date: 1998-03-12
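
The generalized Fellegi-Holt paradigm described in the entry for 12-001-X20030016609 above (make a record satisfy all edits by changing the set of variables with the smallest sum of reliability weights) can be sketched as a brute-force search. This is not the vertex generation method that paper examines; it is a plain enumeration over field subsets, and it simplifies numerical fields to a short list of candidate values. The record, edits, weights, candidate domains and function names are all hypothetical.

```python
from itertools import combinations, product


def can_repair(record, fields, edits, domains):
    """Return a copy of the record that passes every edit after changing only
    `fields`, or None if no combination of candidate values works."""
    for values in product(*(domains[f] for f in fields)):
        candidate = {**record, **dict(zip(fields, values))}
        if all(edit(candidate) for edit in edits):
            return candidate
    return None


def localize_errors(record, edits, weights, domains):
    """Find the field set with the smallest total reliability weight whose
    values can be changed so that all edits are satisfied.

    Returns (fields, cost, repaired_record), or None if no repair exists.
    """
    best = None
    for k in range(len(record) + 1):
        for fields in combinations(record, k):
            cost = sum(weights[f] for f in fields)
            if best is not None and cost >= best[1]:
                continue  # cannot improve on the cheapest repair found so far
            repaired = can_repair(record, fields, edits, domains)
            if repaired is not None:
                best = (set(fields), cost, repaired)
    return best


# Hypothetical record that fails both edits below.
record = {"age": 5, "marital": "married", "income": 40000}
edits = [
    lambda r: not (r["age"] < 16 and r["marital"] == "married"),
    lambda r: not (r["age"] < 16 and r["income"] > 0),
]
# Higher weight = value considered more reliable, so more costly to change.
weights = {"age": 1.0, "marital": 2.0, "income": 2.0}
# Assumed finite candidate values, standing in for the true variable domains.
domains = {"age": [5, 30, 45], "marital": ["single", "married"], "income": [0, 40000]}

fields, cost, repaired = localize_errors(record, edits, weights, domains)
print(fields, cost, repaired)
```

In this example, changing age alone (weight 1.0) satisfies both edits, whereas the alternative of changing marital status and income would cost 4.0, so age is flagged as the erroneous field. The enumeration grows exponentially with the number of fields, which is one reason more specialized methods are studied for this problem.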