Statistical techniques

Results

All (188) (0 to 10 of 188 results)

  • Articles and reports: 75-005-M2024004
    Description: This article provides information about population totals in the Labour Force Survey (LFS), including details on who is included in the survey target population, and a description of the methodology used to produce monthly population totals in the LFS. The note also provides guidance on how to interpret population statistics in the LFS, and discusses the extent to which the LFS can be used to examine disaggregated labour market indicators for new immigrants and non-permanent residents.
    Release date: 2024-09-20

  • Journals and periodicals: 11-633-X
    Description: Papers in this series provide background discussions of the methods used to develop data for economic, health, and social analytical studies at Statistics Canada. They are intended to provide readers with information on the statistical methods, standards and definitions used to develop databases for research purposes. All papers in this series have undergone peer and institutional review to ensure that they conform to Statistics Canada's mandate and adhere to generally accepted standards of good professional practice.
    Release date: 2024-09-11

  • Articles and reports: 11-522-X202200100008
    Description: The publication of more disaggregated data can increase transparency and provide important information on underrepresented groups. Developing more readily available access options increases the amount of information available to and produced by researchers. Increasing the breadth and depth of the information released allows for a better representation of the Canadian population, but also puts a greater responsibility on Statistics Canada to do this in a way that preserves confidentiality, and thus it is helpful to develop tools which allow Statistics Canada to quantify the risk from the additional data granularity. In an effort to evaluate the risk of a database reconstruction attack on Statistics Canada’s published Census data, this investigation follows the strategy of the US Census Bureau, who outlined a method to use a Boolean satisfiability (SAT) solver to reconstruct individual attributes of residents of a hypothetical US Census block, based just on a table of summary statistics. The technique is expanded to attempt to reconstruct a small fraction of Statistics Canada’s Census microdata. This paper will discuss the findings of the investigation, the challenges involved in mounting a reconstruction attack, and the effect of an existing confidentiality measure in mitigating these attacks. Furthermore, the existing strategy is compared to other potential methods used to protect data – in particular, releasing tabular data perturbed by some random mechanism, such as those suggested by differential privacy.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100014
    Description: Ethnic minorities are often underrepresented in survey research, due to the challenges many researchers face in including these populations. While some studies discuss several methods in comparison, few have directly compared these methods empirically, leaving researchers seeking to include ethnic minorities in their studies unsure of their best options. In this article, I briefly review the methodological and ethical reasons for increasing ethnic minority representation in social science research, as well as challenges of doing so. I then present findings from ten studies which empirically compare methods of sampling and/or recruiting ethnic minority individuals. Finally, I discuss some implications for future research.
    Release date: 2024-03-25

  • Articles and reports: 12-001-X202300200005
    Description: Population undercoverage is one of the main hurdles faced by statistical analysis with non-probability survey samples. We discuss two typical scenarios of undercoverage, namely, stochastic undercoverage and deterministic undercoverage. We argue that existing estimation methods under the positivity assumption on the propensity scores (i.e., the participation probabilities) can be directly applied to handle the scenario of stochastic undercoverage. We explore strategies for mitigating biases in estimating the mean of the target population under deterministic undercoverage. In particular, we examine a split population approach based on a convex hull formulation, and construct estimators with reduced biases. A doubly robust estimator can be constructed if a followup subsample of the reference probability survey with measurements on the study variable becomes feasible. Performances of six competing estimators are investigated through a simulation study and issues which require further investigation are briefly discussed.
    Release date: 2024-01-03

  • Articles and reports: 11-633-X2023003
    Description: This paper spans the academic work and estimation strategies used in national statistics offices. It addresses the issue of producing fine, grid-level geography estimates for Canada by exploring the measurement of subprovincial and subterritorial gross domestic product using Yukon as a test case.
    Release date: 2023-12-15

  • Surveys and statistical programs – Documentation: 84-538-X
    Geography: Canada
    Description: This electronic publication presents the methodology underlying the production of the life tables for Canada, provinces and territories.
    Release date: 2023-08-28

  • Articles and reports: 12-001-X202300100001
    Description: Recent work in survey domain estimation allows for estimation of population domain means under a priori assumptions expressed in terms of linear inequality constraints. For example, it might be known that the population means are non-decreasing along ordered domains. Imposing the constraints has been shown to provide estimators with smaller variance and tighter confidence intervals. In this paper we consider a formal test of the null hypothesis that all the constraints are binding, versus the alternative that at least one constraint is non-binding. The test of constant versus increasing domain means is a special case. The power of the test is substantially better than the test with the same null hypothesis and an unconstrained alternative. The new test is used with data from the National Survey of College Graduates, to show that salaries are positively related to the subject’s father’s educational level, across fields of study and over several years of cohorts.
    Release date: 2023-06-30

  • Articles and reports: 12-001-X202300100002
    Description: We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration which introduces a “working” reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the information projection and model calibration weighting. The proposed method is particularly attractive for combining information from several sources with different missing patterns. The proposed method is applied to a real data example combining survey data from Korean National Health and Nutrition Examination Survey and big data from National Health Insurance Sharing Service in Korea.
    Release date: 2023-06-30

  • Articles and reports: 11-637-X202200100007
    Description:

    As the seventh goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to ensure access to affordable, reliable, sustainable and modern energy for all by 2030. This 2022 infographic provides an overview of indicators underlying the seventh Sustainable Development Goal in support of affordable and clean energy, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2022-12-13
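
The database reconstruction idea described in entry 11-522-X202200100008 above can be illustrated with a toy sketch. It swaps the SAT solver for plain brute-force enumeration, and the block size, age range and published statistics are all invented for illustration; none of them come from the paper.

```python
from itertools import combinations_with_replacement
from statistics import median


def reconstruct(count, mean_age, median_age, ages):
    """Return every sorted multiset of ages consistent with the published statistics."""
    return [
        combo
        for combo in combinations_with_replacement(ages, count)
        # Compare sums rather than means to avoid floating-point comparisons.
        if sum(combo) == mean_age * count and median(combo) == median_age
    ]


# Hypothetical summary table for a block of three adult residents.
candidates = reconstruct(count=3, mean_age=30, median_age=20, ages=range(18, 60))
for combo in candidates:
    print(combo)
```

Only a handful of candidate age combinations survive these two statistics. Each further published statistic would prune the set again, which is the mechanism a reconstruction attack relies on.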

Data (1) (1 result)

  • Table: 11-10-0074-01
    Geography: Census tract
    Frequency: Occasional
    Description:

    The divergence index (D-index) describes the degree that families with different income levels are mixing together in neighbourhoods. It compares neighbourhood (census tract, CT) discrete income distributions to a base distribution, which is the income quintiles of the neighbourhood’s census metropolitan area (CMA).

    Release date: 2020-06-22
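
A comparison of this kind can be sketched as follows. The description above does not give the formula, so this sketch assumes the index takes the form of a relative entropy (Kullback-Leibler divergence) between a tract's income distribution and the CMA quintile baseline. The function name and the example shares are hypothetical.

```python
import math


def divergence_index(tract_shares, base_shares):
    """Relative entropy of a tract's income distribution against a base distribution.

    Assumed form: sum of p * ln(p / q) over income groups, skipping empty groups.
    """
    if len(tract_shares) != len(base_shares):
        raise ValueError("distributions must have the same number of groups")
    return sum(
        p * math.log(p / q)
        for p, q in zip(tract_shares, base_shares)
        if p > 0
    )


# By construction, each CMA income quintile holds one fifth of families.
cma_quintiles = [0.2] * 5

# A tract that mirrors the CMA, and one concentrated in the lowest quintile.
mixed = divergence_index([0.2] * 5, cma_quintiles)
concentrated = divergence_index([0.6, 0.1, 0.1, 0.1, 0.1], cma_quintiles)
print(mixed, concentrated)
```

Under this assumed form, a tract whose income mix matches the CMA scores zero, and the score grows as the tract's distribution moves away from the baseline.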

Analysis (180) (20 to 30 of 180 results)

  • Articles and reports: 11-637-X202200100003
    Description:

    As the third goal outlined in the 2030 Agenda for Sustainable Development, Canada and other UN member states have committed to ensure healthy lives and promote well-being for all at all ages by 2030. This 2022 infographic provides an overview of indicators underlying the third Sustainable Development Goal in support of Good Health and Well-being, and the statistics and data sources used to monitor and report on this goal in Canada.

    Release date: 2022-06-23

  • Articles and reports: 12-001-X202200100007
    Description:

    By record linkage one joins records residing in separate files which are believed to be related to the same entity. In this paper we approach record linkage as a classification problem, and adapt the maximum entropy classification method in machine learning to record linkage, both in the supervised and unsupervised settings of machine learning. The set of links will be chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969); on the other hand, the proposed algorithm is fully automatic, unlike the classical approach that generally requires clerical review to resolve the undecided cases.

    Release date: 2022-06-21

  • Stats in brief: 89-20-00062022001
    Description:

    Gathering, exploring, analyzing and interpreting data are essential steps in producing information that benefits society, the economy and the environment. To properly conduct these processes, data ethics must be upheld in order to ensure the appropriate use of data.

    Release date: 2022-05-24

  • Stats in brief: 89-20-00062022002
    Description:

    This video will break down what it means to be FAIR in terms of data and metadata, and how each pillar of FAIR serves to guide data users and producers alike, as they navigate their way through the data journey, in order to gain maximum, long term value.

    Release date: 2022-05-24

  • Stats in brief: 89-20-00062022003
    Description:

    By the end of this video you will understand what confidence intervals are, why we use them, and what factors have an impact on them.

    Release date: 2022-05-24

  • Articles and reports: 12-001-X202100200002
    Description:

    When linking massive data sets, blocking is used to select a manageable subset of record pairs at the expense of losing a few matched pairs. This loss is an important component of the overall linkage error, because blocking decisions are made early on in the linkage process, with no way to revise them in subsequent steps. Yet, measuring this contribution is still a major challenge because of the need to model all the pairs in the Cartesian product of the sources, not just those satisfying the blocking criteria. Unfortunately, previous error models are of little use because they typically do not meet this requirement. This paper addresses the issue with a new finite mixture model, which dispenses with clerical reviews, training data, or the assumption that the linkage variables are conditionally independent. The model applies when a standard blocking procedure is used to link a file to a register or a census with complete coverage, where both sources are free of duplicate records.

    Release date: 2022-01-06

  • Stats in brief: 11-001-X202134332266
    Description: Release published in The Daily – Statistics Canada’s official release bulletin
    Release date: 2021-12-09

  • Articles and reports: 11-522-X202100100010
    Description:

    As part of processing for the 2021 Canadian Census, the write-in responses to 31 census questions must be coded. Up to and including 2016, this was a three-stage process, with an “interactive (human) coding” step as the second stage. This human coding step is both lengthy and expensive, spanning many months and requiring the hiring and training of a large number of temporary employees. With this in mind, for 2021, this stage was either augmented with or replaced entirely by machine learning models using the "fastText" algorithm. This presentation will discuss the implementation of this algorithm, the challenges encountered and the decisions taken along the way.

    Key Words: Natural Language Processing, Machine Learning, fastText, Coding

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100011
    Description: The ways in which AI may affect the world of official statistics are manifold and Statistics Netherlands (CBS) is actively exploring how it can use AI within its societal role. The paper describes a number of AI-related areas where CBS is currently active: use of AI for its own statistics production and statistical R&D, the development of a national AI monitor, the support of other government bodies with expertise on fair data and fair algorithms, data sharing under safe and secure conditions, and engaging in AI-related collaborations.

    Key Words: Artificial Intelligence; Official Statistics; Data Sharing; Fair Algorithms; AI monitoring; Collaboration.

    Release date: 2021-11-05

  • Articles and reports: 11-522-X202100100012
    Description: The modernization of price statistics by National Statistical Offices (NSO) such as Statistics Canada focuses on the adoption of alternative data sources that include the near-universe of all products sold in the country, a scale that requires machine learning classification of the data. The process of evaluating classifiers to select appropriate ones for production, as well as monitoring classifiers once in production, needs to be based on robust metrics to measure misclassification. Because commonly used metrics, such as the Fβ-score, may not always take into account key aspects of price statistics, such as the unequal importance of categories, careful consideration of the metric space is necessary to select appropriate methods to evaluate classifiers. This working paper provides insight on the metric space applicable to price statistics and proposes an operational framework to evaluate and monitor classifiers, focusing specifically on the needs of the Canadian Consumer Prices Index and demonstrating discussed metrics using a publicly available dataset.

    Key Words: Consumer price index; supervised classification; evaluation metrics; taxonomy

    Release date: 2021-11-05
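
For readers unfamiliar with the Fβ-score named in the last entry above, here is a minimal sketch of the standard definition, computed from true positives, false positives and false negatives for a single category. The counts are invented, and the sketch does not attempt the category weighting that the working paper discusses.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 gives more weight to recall, beta < 1 gives more weight to precision.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one product category: precision 0.8, recall 0.5.
f1 = f_beta(tp=8, fp=2, fn=8)
f2 = f_beta(tp=8, fp=2, fn=8, beta=2.0)
print(round(f1, 4), round(f2, 4))
```

Because recall is the weaker of the two here, the recall-weighted F2 comes out lower than F1. The score treats every category alike, which is the limitation the entry points to.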

Reference (7) (7 results)

  • Surveys and statistical programs – Documentation: 84-538-X
    Geography: Canada
    Description: This electronic publication presents the methodology underlying the production of the life tables for Canada, provinces and territories.
    Release date: 2023-08-28

  • Surveys and statistical programs – Documentation: 82-225-X200701010508
    Description:

    The Record Linkage Overview describes the process used in annual internal record linkage of the Canadian Cancer Registry. The steps include: preparation; pre-processing; record linkage; post-processing; analysis and resolution; resolution entry; and, resolution processing.

    Release date: 2008-01-18

  • Surveys and statistical programs – Documentation: 11-522-X20050019476
    Description:

    The paper will show how, using data published by Statistics Canada and available from member libraries of the CREPUQ, a linkage approach using postal codes makes it possible to link the data from the outcomes file to a set of contextual variables. These variables could then contribute to producing, on an exploratory basis, a better index to explain the varied outcomes of students from schools. In terms of the impact, the proposed index could show more effectively the limitations of ranking students and schools when this information is not given sufficient weight.

    Release date: 2007-03-02

  • Surveys and statistical programs – Documentation: 68-514-X
    Description:

    Statistics Canada's approach to gathering and disseminating economic data has developed over several decades into a highly integrated system for collection and estimation that feeds the framework of the Canadian System of National Accounts.

    The key to this approach was creation of the Unified Enterprise Survey, the goal of which was to improve the consistency, coherence, breadth and depth of business survey data.

    The UES did so by bringing many of Statistics Canada's individual annual business surveys under a common framework. This framework included a single survey frame, a sample design framework, conceptual harmonization of survey content, means of using relevant administrative data, common data collection, processing and analysis tools, and a common data warehouse.

    Release date: 2006-11-20

  • Surveys and statistical programs – Documentation: 89-612-X
    Description:

    This paper describes the structure and linkage of two databases: the Longitudinal Administrative Databank (LAD), and the Longitudinal Immigration Database (IMDB). The combined data associate landed immigrant taxfilers on the LAD with their key characteristics upon immigration. The paper highlights how the combined information, referred to here as the LAD_IMDB, enhances and complements the existing separate databases. The paper compares the full IMDB file with the sample of immigrants to assess the representativeness of the sample file.

    Release date: 2004-01-05

  • Surveys and statistical programs – Documentation: 81-595-M2003005
    Geography: Canada
    Description:

    This paper develops technical procedures that may enable ministries of education to link provincial tests with national and international tests in order to compare standards and report results on a common scale.

    Release date: 2003-05-29

  • Surveys and statistical programs – Documentation: 85-602-X
    Description:

    The purpose of this report is to provide an overview of existing methods and techniques making use of personal identifiers to support record linkage. Record linkage can be loosely defined as a methodology for manipulating and / or transforming personal identifiers from individual data records from one or more operational databases and subsequently attempting to match these personal identifiers to create a composite record about an individual. Record linkage is not intended to uniquely identify individuals for operational purposes; however, it does provide probabilistic matches of varying degrees of reliability for use in statistical reporting. Techniques employed in record linkage may also be of use for investigative purposes to help narrow the field of search against existing databases when some form of personal identification information exists.

    Release date: 2000-12-05
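
One well-known way to obtain probabilistic matches of the kind described in the last entry is the agreement weighting of Fellegi and Sunter (1969), cited in entry 12-001-X202200100007 above. The sketch below is not taken from any of the listed reports. It assumes field comparisons are conditionally independent, and the field names and the m- and u-probabilities are hypothetical.

```python
import math

# Hypothetical parameters per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "postal_code": (0.90, 0.02),
}


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum of log2 likelihood ratios over fields; higher means more likely a match."""
    total = 0.0
    for field, (m, u) in params.items():
        if record_a[field] == record_b[field]:
            total += math.log2(m / u)  # agreement adds evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


a = {"surname": "Tremblay", "birth_year": 1980, "postal_code": "K1A 0T6"}
b = {"surname": "Tremblay", "birth_year": 1980, "postal_code": "H2X 1Y4"}
print(round(match_weight(a, a), 2), round(match_weight(a, b), 2))
```

Pairs scoring above an upper threshold would be treated as links and pairs below a lower threshold as non-links. The thresholds themselves are a design choice not shown here.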