Results

All (94) (0 to 10 of 94 results)

  • Articles and reports: 11-522-X202200100001
    Description: Record linkage aims at identifying record pairs related to the same unit and observed in two different data sets, say A and B. Fellegi and Sunter (1969) suggest testing whether each record pair was generated from the set of matched or unmatched pairs. The decision function consists of the ratio between m(y) and u(y), the probabilities of observing a comparison y of a set of k>3 key identifying variables in a record pair under the assumptions that the pair is a match or a non-match, respectively. These parameters are usually estimated by means of the EM algorithm, using as data the comparisons on all the pairs of the Cartesian product Ω=A×B. These observations (on the comparisons and on the pairs' status as match or non-match) are assumed to be generated independently of other pairs, an assumption that characterizes most of the literature on record linkage and is implemented in software tools (e.g. RELAIS, Cibella et al. 2012). On the contrary, comparisons y and matching status in Ω are deterministically dependent. As a result, estimates of m(y) and u(y) based on the EM algorithm are usually bad. This fact jeopardizes the effective application of the Fellegi-Sunter method, as well as the automatic computation of quality measures and the possibility of applying efficient methods for model estimation on linked data (e.g. regression functions), as in Chambers et al. (2015). We propose to explore Ω by a set of samples, each one drawn so as to preserve independence of comparisons among the selected record pairs. Simulations are encouraging.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100002
    Description: The authors used the Splink probabilistic linkage package, developed by the UK Ministry of Justice, to link census data from England and Wales to itself to find duplicate census responses. A large gold standard of confirmed census duplicates was available, meaning that the results of the Splink implementation could be quality assured. This paper describes the implementation and features of Splink, gives details of the settings and parameters that we used to tune Splink for our particular project, and gives the results that we obtained.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100012
    Description: At Statistics Netherlands (SN) for some economic sectors two partly-independent intra-annual turnover index series are available: a monthly series based on survey data and a quarterly series based on value added tax data for the smaller units and re-used survey data for the other units. SN aims to benchmark the monthly turnover index series to the quarterly census data on a quarterly basis. This cannot currently be done because the tax data has a different quarterly pattern: the turnover is relatively large in the fourth quarter of the year and smaller in the first quarter. With the current study we aim to describe this deviating quarterly pattern at micro level. In the past we developed a mixture model using absolute turnover levels that could explain part of the quarterly patterns. Because the absolute turnover levels differ between the two series, in the current study we use a model based on relative quarterly turnover levels within a year.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100019
    Description: The purpose of this article is to compare the linkage results for individuals from French tax sources with those of the 2019 Enquête Annuelle de Recensement (EAR), obtained through different methods. Such a comparison will decide whether the Répertoires Statistiques d'Individus et de Logements (Résil) program should be equipped with a probabilistic matching tool for its administrative source identification and matching engine.
    Release date: 2024-03-25

  • Articles and reports: 11-522-X202200100020
    Description: The reconciliation of 2021 census dwellings with the new Statistical Building Register (SBgR) presented linkage challenges. The Census of Population collected information from various dwelling types. For a large proportion of the population, mailing addresses were at the centre: they were used for reaching out to people and collected as contact info. In parallel, the register environment has been evolving. The agency is transitioning from the Address Register (AR) to the SBgR holding both mailing and location addresses, while also covering non-residential buildings. The reconciliation was conducted using a combination of systems, notably the new Register Matching Engine (RME) for difficult cases. The RME holds an interesting range of sophisticated string comparators. A deterministic linkage approach was used, while incorporating some data knowledge like the entropy. Through metadata, the matching expert could also reduce the amounts of false positives and false negatives.
    Release date: 2024-03-25

  • Articles and reports: 91F0015M2024002
    Description: This paper examines the emigration of immigrants using the Longitudinal Immigration Database (IMDB). An indirect definition of emigration is proposed that leverages the information available in the IMDB. This study found that emigration of immigrants is a significant phenomenon. Certain characteristics of immigrants, such as having children, admission category and country of birth, have a strong correlation with emigration.
    Release date: 2024-02-02

  • Articles and reports: 91F0015M2023001
    Description: Using record linkage, this article compares marital status as identified in the 2015 T1 tax data to what was provided in the 2016 Census.
    Release date: 2023-07-11

  • Articles and reports: 12-001-X202200100007
    Description:

    By record linkage one joins records residing in separate files which are believed to be related to the same entity. In this paper we approach record linkage as a classification problem, and adapt the maximum entropy classification method in machine learning to record linkage, both in the supervised and unsupervised settings of machine learning. The set of links will be chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969); on the other hand, the proposed algorithm is fully automatic, unlike the classical approach that generally requires clerical review to resolve the undecided cases.

    Release date: 2022-06-21

  • Articles and reports: 11-522-X202100100006
    Description:

    In the context of its "admin-first" paradigm, Statistics Canada is prioritizing the use of non-survey sources to produce official statistics. This paradigm critically relies on non-survey sources that may have a nearly perfect coverage of some target populations, including administrative files or big data sources. Yet, this coverage must be measured, e.g., by applying the capture-recapture method, where they are compared to other sources with good coverage of the same populations, including a census. However, this is a challenging exercise in the presence of linkage errors, which arise inevitably when the linkage is based on quasi-identifiers, as is typically the case. To address the issue, a new methodology is described where the capture-recapture method is enhanced with a new error model that is based on the number of links adjacent to a given record. It is applied in an experiment with public census data.

    Key Words: dual system estimation, data matching, record linkage, quality, data integration, big data.

    Release date: 2021-10-22

  • Surveys and statistical programs – Documentation: 12-539-X
    Description:

    This document brings together guidelines and checklists on many issues that need to be considered in the pursuit of quality objectives in the execution of statistical activities. Its focus is on how to assure quality through effective and appropriate design or redesign of a statistical project or program from inception through to data evaluation, dissemination and documentation. These guidelines draw on the collective knowledge and experience of many Statistics Canada employees. It is expected that Quality Guidelines will be useful to staff engaged in the planning and design of surveys and other statistical projects, as well as to those who evaluate and analyze the outputs of these projects.

    Release date: 2019-12-04

Data (2)

  • Table: 95F0303X
    Description:

    This product presents selected 2001 and historical data from the Census of Agriculture - Census of Population Linkage database. The data are available at the Canada and province levels for free. The data variables include: age; sex; marital status; mother tongue; highest level of schooling; net farm income; as well as farm population counts and income profiles for census farm families and households.

    (No linkage databases were created for the 1966 and 1976 Census years, so historical comparisons are not possible for those years.)

    Release date: 2003-12-02

  • Table: 16-200-X
    Description:

    Part of Statistics Canada's Econnections: linking the environment and the economy statistical series, this product consists of a printed publication combined with a CD-ROM. The product offers summary indicators plus detailed statistics that quantify the relationship between economic activity and the environment. Information is presented for issues ranging from greenhouse gas emissions, water and energy use, to natural resource wealth, environmental expenditures and beyond. The printed publication provides convenient reference to the summary indicators, including analysis of important trends, while the CD-ROM offers straightforward access to dozens of detailed statistical tables that underlie the indicators. An electronic version of the printed publication is included on the CD-ROM and each indicator in the publication is hypertext linked to a group of related statistical tables, allowing the user to easily select detailed statistics for viewing in association with any given indicator. Simple analysis of the statistics can be done directly within the CD-ROM's software. For those who carry out more complex analysis, downloading of data from the CD-ROM in standard spreadsheet format is easily accomplished.

    Release date: 2001-02-23

Analysis (73) (10 to 20 of 73 results)

  • Articles and reports: 82-003-X201800800001
    Description:

    The objective of this study is to report the population rate of surgical treatment of incident primary female breast tumours diagnosed from 2010 to 2012 overall, and by disease stage in Canada (excluding Quebec). This study uses newly linked Canadian Cancer Registry and hospital discharge data, created in the Canadian Cancer Treatment Linkage Project by Statistics Canada in 2016.

    Release date: 2018-08-15

  • Articles and reports: 11-633-X2018014
    Description:

    The Canadian Mortality Database (CMDB) is an administrative database that collects information on cause of death from all provincial and territorial vital statistics registries in Canada. The CMDB lacks subpopulation identifiers to examine mortality rates and disparities among groups such as First Nations, Métis, Inuit and members of visible minority groups. Linkage between the CMDB and the Census of Population is an approach to circumvent this limitation. This report describes a linkage between the CMDB (2006 to 2011) and the 2006 Census of Population, which was carried out using hierarchical deterministic exact matching, with a focus on methodology and validation.

    Release date: 2018-02-14

  • Articles and reports: 11-633-X2017006
    Description:

    This paper describes a method of imputing missing postal codes in a longitudinal database. The 1991 Canadian Census Health and Environment Cohort (CanCHEC), which contains information on individuals from the 1991 Census long-form questionnaire linked with T1 tax return files for the 1984-to-2011 period, is used to illustrate and validate the method. The cohort contains up to 28 consecutive fields for postal code of residence, but because of frequent gaps in postal code history, missing postal codes must be imputed. To validate the imputation method, two experiments were devised where 5% and 10% of all postal codes from a subset with full history were randomly removed and imputed.

    Release date: 2017-03-13

  • Articles and reports: 18-001-X2016001
    Description:

    Although the record linkage of business data is not a completely new topic, the fact remains that the public and many data users are unaware of the programs and practices commonly used by statistical agencies across the world.

    This report is a brief overview of the main practices, programs and challenges of record linkage at statistical agencies across the world that answered a short survey on this subject, supplemented by publicly available documentation produced by these agencies. The document shows that linkage practices are similar across these statistical agencies; the main differences lie in the procedures in place to access data and in the regulatory policies that govern record linkage permissions and the dissemination of data.

    Release date: 2016-10-27

  • Articles and reports: 11-633-X2016003
    Description:

    Large national mortality cohorts are used to estimate mortality rates for different socioeconomic and population groups, and to conduct research on environmental health. In 2008, Statistics Canada created a cohort linking the 1991 Census to mortality. The present study describes a linkage of the 2001 Census long-form questionnaire respondents aged 19 years and older to the T1 Personal Master File and the Amalgamated Mortality Database. The linkage tracks all deaths over a 10.6-year period (until the end of 2011, to date).

    Release date: 2016-10-26

  • Articles and reports: 89-648-X2016001
    Description:

    Linkages between survey and administrative data are an increasingly common practice, due in part to the reduced burden to respondents, and to the data that can be obtained at a relatively low cost. Historical linkage, or the linkage of administrative data from previous years to the year of the survey, compounds these benefits by providing additional years of data. This paper examines the Longitudinal and International Study of Adults (LISA), which was linked to historical tax data on personal income tax returns (T1) and those collected from employers’ files (T4), among others not mentioned in this paper. It presents trends in historical linkage rates, compares the coherence of administrative data between the T1 and T4, presents the ability to use the data to create balanced panels, and uses the T1 data to produce age-earnings profiles by sex. The results show that the historical linkage rate is high (over 90% in most cases) and stable over time for respondents who are likely to file a tax return, and that the T1 and T4 administrative sources show similar earnings. Moreover, long balanced panels of up to 30 years in length (at the time of writing) can be created using LISA administrative linkage data.

    Release date: 2016-08-18

  • Articles and reports: 82-003-X201600814647
    Description:

    This study is based on 2006 Census (long-form) socio-demographic information (including Aboriginal identity) that was linked to the Discharge Abstract Database to create a sample for analysis from all provinces and territories except Quebec. The purpose is to provide national figures on acute care hospitalizations of Aboriginal (First Nations living on and off reserve, Métis, Inuit in Inuit Nunangat) and non-Aboriginal people.

    Release date: 2016-08-17

  • Articles and reports: 82-003-X201600814648
    Description:

    This study reports the initial results of the recent Immigrant Landing File-to-Discharge Abstract Database linkage – specifically, a bivariate overview of acute care hospitalization rates by immigration category, landing year, and source world region at the national level.

    Release date: 2016-08-17

  • Articles and reports: 11-633-X2016002
    Description:

    Immigrants comprise an ever-increasing percentage of the Canadian population—at more than 20%, which is the highest percentage among the G8 countries (Statistics Canada 2013a). This figure is expected to rise to 25% to 28% by 2031, when at least one in four people living in Canada will be foreign-born (Statistics Canada 2010).

    This report summarizes the linkage of the Immigrant Landing File (ILF) for all provinces and territories, excluding Quebec, to hospital data from the Discharge Abstract Database (DAD), a national database containing information about hospital inpatient and day-surgery events. A deterministic exact-matching approach was used to link data from the 1980-to-2006 ILF and from the DAD (2006/2007, 2007/2008 and 2008/2009) with the 2006 Census, which served as a “bridge” file. This was a secondary linkage in that it used linkage keys created in two previous projects (primary linkages) that separately linked the ILF and the DAD to the 2006 Census. The ILF–DAD linked data were validated by means of a representative sample of 2006 Census records containing immigrant information previously linked to the DAD.

    Release date: 2016-08-17

  • Articles and reports: 82-003-X201600114306
    Description:

    This article is an overview of the creation, content, and quality of the 2006 Canadian Birth-Census Cohort Database.

    Release date: 2016-01-20

Reference (19) (0 to 10 of 19 results)

  • Surveys and statistical programs – Documentation: 12-539-X
    Description:

    This document brings together guidelines and checklists on many issues that need to be considered in the pursuit of quality objectives in the execution of statistical activities. Its focus is on how to assure quality through effective and appropriate design or redesign of a statistical project or program from inception through to data evaluation, dissemination and documentation. These guidelines draw on the collective knowledge and experience of many Statistics Canada employees. It is expected that Quality Guidelines will be useful to staff engaged in the planning and design of surveys and other statistical projects, as well as to those who evaluate and analyze the outputs of these projects.

    Release date: 2019-12-04

  • Surveys and statistical programs – Documentation: 82-225-X200701010508
    Description:

    The Record Linkage Overview describes the process used in annual internal record linkage of the Canadian Cancer Registry. The steps include: preparation; pre-processing; record linkage; post-processing; analysis and resolution; resolution entry; and, resolution processing.

    Release date: 2008-01-18

  • Surveys and statistical programs – Documentation: 82-225-X
    Description:

    The compendium of Canadian Cancer Registry procedures manuals sets out the rules for reporting cancer data to the CCR for all provincial and territorial cancer registries.

    Release date: 2008-01-18

  • Surveys and statistical programs – Documentation: 82-225-X20070109648
    Description:

    The Record Linkage Overview describes the process used in annual internal record linkage of the Canadian Cancer Registry. The steps include: preparation; pre-processing; record linkage; post-processing; analysis and resolution; resolution entry; and, resolution processing.

    Release date: 2007-06-21

  • Surveys and statistical programs – Documentation: 82-225-X20070109650
    Description:

    The User Guide to Record Linkage Feedback Reports C1 and C2 is intended for the users of the reports. The reports were developed to facilitate the exchange of information and decisions between the Canadian Cancer Registry and the Provincial and Territorial Cancer Registries.

    Release date: 2007-06-21

  • Surveys and statistical programs – Documentation: 82-225-X20060099202
    Description:

    The User Guide to Record Linkage Feedback Reports C1 and C2 is intended for the users of the reports. The reports were developed to facilitate the exchange of information and decisions between the Canadian Cancer Registry and the Provincial and Territorial Cancer Registries.

    Release date: 2006-07-07

  • Surveys and statistical programs – Documentation: 82-225-X20060099203
    Description:

    The user guide to Death Clearance Feedback Reports is intended for users of the feedback reports. The feedback reports were developed to facilitate the exchange of information and decisions between the Canadian Cancer Registry and the Provincial and Territorial Cancer Registries.

    Release date: 2006-07-07

  • Surveys and statistical programs – Documentation: 82-225-X20060099204
    Description:

    The Record Linkage Overview describes the process used in annual internal record linkage of the Canadian Cancer Registry. The steps include: preparation; pre-processing; record linkage; post-processing; analysis and resolution; resolution entry; and, resolution processing.

    Release date: 2006-07-07

  • Surveys and statistical programs – Documentation: 82-225-X20060099206
    Description:

    The Guidelines for Abstracting and Determining Death Certificate Only Cases are intended for use by all provincial and territorial cancer registries during their Death Clearance Process. The guidelines should be used when performing a comparison between the Death Certificate Notification and the cancer registry database.

    Release date: 2006-07-07

  • Surveys and statistical programs – Documentation: 16-505-G
    Description:

    Part of Statistics Canada's Econnections: linking the environment and the economy statistical series, this publication describes in detail the conceptual frameworks, data sources and empirical methods used to compile the Canadian System of Environmental and Resource Accounts (CSERA). Designed to be compatible with the accounting frameworks of the System of National Accounts, the CSERA allows users to easily analyze the linkages between economic activity and the environment in terms of material and energy flows, environmental expenditures and natural resource stocks. This publication will be of interest to researchers in both the economic and environmental fields who want to familiarize themselves with the accounting concepts of the CSERA. It is a companion volume to Environment-economy indicators and detailed statistics (catalogue no. 16-200-XKE), another product in the Econnections series.

    Statistics Canada has updated its 1997 documentation on environmental accounts, Econnections: Concepts, Sources and Methods of the Canadian System of Environmental and Resource Accounts, with publication of the Methodological Guide: Canadian System of Environmental-Economic Accounting.

    Release date: 2006-04-12