Keyword search

Results

All (94) (0 to 10 of 94 results)

1. A proposal for the problem of matching probabilities estimation in record linkage Archived
Articles and reports: 11-522-X202200100001
Description: Record linkage aims at identifying record pairs related to the same unit and observed in two different data sets, say A and B. Fellegi and Sunter (1969) suggest testing whether each record pair is generated from the set of matched or unmatched pairs. The decision function consists of the ratio between m(y) and u(y), the probabilities of observing a comparison y of a set of k>3 key identifying variables in a record pair under the assumptions that the pair is a match or a non-match, respectively. These parameters are usually estimated by means of the EM algorithm, using as data the comparisons on all the pairs of the Cartesian product Ω=A×B. These observations (on the comparisons and on the pairs' status as match or non-match) are assumed to be generated independently of other pairs, an assumption characterizing most of the literature on record linkage and implemented in software tools (e.g. RELAIS, Cibella et al. 2012). On the contrary, comparisons y and matching status in Ω are deterministically dependent. As a result, estimates of m(y) and u(y) based on the EM algorithm are usually poor. This fact jeopardizes the effective application of the Fellegi-Sunter method, as well as the automatic computation of quality measures and the possibility of applying efficient methods for model estimation on linked data (e.g. regression functions), as in Chambers et al. (2015). We propose to explore Ω by a set of samples, each one drawn so as to preserve independence of comparisons among the selected record pairs. Simulations are encouraging.
Release date: 2024-03-25
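As a rough illustration of the decision function described in this abstract, the ratio m(y)/u(y) can be computed for a single record pair as sketched below. This is not the authors' method: the function name `fs_weight`, the parameter values, and the conditional-independence assumption across the k comparisons are all hypothetical choices made only to keep the example small.

```python
import math


def fs_weight(agreements, m, u):
    """Return log2 of m(y)/u(y) for one record pair.

    Assumption (illustration only): the k field comparisons are
    conditionally independent given the match status, so the ratio
    factorizes into one term per field.
    """
    weight = 0.0
    for agree, m_i, u_i in zip(agreements, m, u):
        if agree:
            # Field agrees: contributes log2(m_i / u_i).
            weight += math.log2(m_i / u_i)
        else:
            # Field disagrees: contributes log2((1 - m_i) / (1 - u_i)).
            weight += math.log2((1.0 - m_i) / (1.0 - u_i))
    return weight


# Hypothetical m and u probabilities for k = 4 identifying variables.
m = [0.9, 0.9, 0.8, 0.95]
u = [0.1, 0.05, 0.2, 0.01]

print(fs_weight([True, True, True, True], m, u))      # all fields agree
print(fs_weight([False, False, False, False], m, u))  # all fields disagree
```

With these made-up values, full agreement gives a positive weight and full disagreement a negative one; in practice m and u would have to be estimated, which is the problem the paper addresses.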
2. A case study of using Splink: Census duplicate matching Archived
Articles and reports: 11-522-X202200100002
Description: The authors used the Splink probabilistic linkage package developed by the UK Ministry of Justice, to link census data from England and Wales to itself to find duplicate census responses. A large gold standard of confirmed census duplicates was available meaning that the results of the Splink implementation could be quality assured. This paper describes the implementation and features of Splink, gives details of the settings and parameters that we used to tune Splink for our particular project, and gives the results that we obtained.
Release date: 2024-03-25
3. Modelling intra-annual measurement in linked administrative and survey data Archived
Articles and reports: 11-522-X202200100012
Description: At Statistics Netherlands (SN) for some economic sectors two partly-independent intra-annual turnover index series are available: a monthly series based on survey data and a quarterly series based on value added tax data for the smaller units and re-used survey data for the other units. SN aims to benchmark the monthly turnover index series to the quarterly census data on a quarterly basis. This cannot currently be done because the tax data has a different quarterly pattern: the turnover is relatively large in the fourth quarter of the year and smaller in the first quarter. With the current study we aim to describe this deviating quarterly pattern at micro level. In the past we developed a mixture model using absolute turnover levels that could explain part of the quarterly patterns. Because the absolute turnover levels differ between the two series, in the current study we use a model based on relative quarterly turnover levels within a year.
Release date: 2024-03-25
4. Probabilistic or deterministic? Linkage methods tested for the Résil program Archived
Articles and reports: 11-522-X202200100019
Description: The purpose of this article is to compare the linkage results for individuals from French tax sources with those of the 2019 Enquête Annuelle de Recensement (EAR), obtained through different methods. Such a comparison will decide whether the Répertoires Statistiques d'Individus et de Logements (Résil) program should be equipped with a probabilistic matching tool for its administrative source identification and matching engine.
Release date: 2024-03-25
5. Record linkage techniques to identify 2021 Canadian Census dwellings in the new Statistical Building Register Archived
Articles and reports: 11-522-X202200100020
Description: The reconciliation of 2021 census dwellings with the new Statistical Building Register (SBgR) presented linkage challenges. The Census of Population collected information from various dwelling types. For a large proportion of the population, mailing addresses were at the centre: they were used for reaching out to people and collected as contact information. In parallel, the register environment has been evolving. The agency is transitioning from the Address Register (AR) to the SBgR holding both mailing and location addresses, while also covering non-residential buildings. The reconciliation was conducted using a combination of systems, notably the new Register Matching Engine (RME) for difficult cases. The RME holds an interesting range of sophisticated string comparators. A deterministic linkage approach was used, while incorporating some data knowledge such as entropy. Through metadata, the matching expert could also reduce the number of false positives and false negatives.
Release date: 2024-03-25
6. Emigration of Immigrants: Results from the Longitudinal Immigration Database
Articles and reports: 91F0015M2024002
Description: This paper examines the emigration of immigrants using the Longitudinal Immigration Database (IMDB). An indirect definition of emigration is proposed that leverages the information available in the IMDB. This study found that emigration of immigrants is a significant phenomenon. Certain characteristics of immigrants, such as having children, admission category and country of birth, have a strong correlation with emigration.
Release date: 2024-02-02
7. Examining the consistency of de facto marital status between tax data and the 2016 Census
Articles and reports: 91F0015M2023001
Description: Using record linkage, this article compares marital status as identified in the 2015 T1 tax data to what was provided in the 2016 Census.
Release date: 2023-07-11
8. Maximum entropy classification for record linkage
Articles and reports: 12-001-X202200100007
Description:
By record linkage one joins records residing in separate files which are believed to be related to the same entity. In this paper we approach record linkage as a classification problem, and adapt the maximum entropy classification method in machine learning to record linkage, both in the supervised and unsupervised settings of machine learning. The set of links will be chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969); on the other hand, the proposed algorithm is fully automatic, unlike the classical approach that generally requires clerical review to resolve the undecided cases.
Release date: 2022-06-21
9. Measuring the undercoverage of two data sources with a nearly perfect coverage through capture and recapture in the presence of linkage errors Archived
Articles and reports: 11-522-X202100100006
Description:
In the context of its "admin-first" paradigm, Statistics Canada is prioritizing the use of non-survey sources to produce official statistics. This paradigm critically relies on non-survey sources that may have a nearly perfect coverage of some target populations, including administrative files or big data sources. Yet, this coverage must be measured, e.g., by applying the capture-recapture method, where they are compared to other sources with good coverage of the same populations, including a census. However, this is a challenging exercise in the presence of linkage errors, which arise inevitably when the linkage is based on quasi-identifiers, as is typically the case. To address the issue, a new methodology is described where the capture-recapture method is enhanced with a new error model that is based on the number of links adjacent to a given record. It is applied in an experiment with public census data.
Key Words: dual system estimation, data matching, record linkage, quality, data integration, big data.

Release date: 2021-10-22
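For readers unfamiliar with the capture-recapture (dual system) method mentioned in this abstract, the classical Lincoln-Petersen estimator is sketched below. It assumes error-free linkage, which is exactly the assumption the paper relaxes; the paper's error model is not reproduced here, and the function names and counts are hypothetical.

```python
def lincoln_petersen(n1, n2, linked):
    """Classical dual system estimate of the population size.

    n1, n2 : number of records in each source
    linked : number of records found in both sources
    Assumption (illustration only): linkage is error-free.
    """
    if linked <= 0:
        raise ValueError("at least one linked record is required")
    return n1 * n2 / linked


def undercoverage(n_source, n_hat):
    """Estimated share of the population missing from one source."""
    return 1.0 - n_source / n_hat


# Hypothetical counts: 900 and 800 records, 720 of them linked.
n_hat = lincoln_petersen(900, 800, 720)
print(n_hat)                      # estimated population size
print(undercoverage(900, n_hat))  # estimated undercoverage of the first source
```

False links or missed links change the `linked` count directly, which is why linkage errors bias this simple estimator.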
10. Statistics Canada Quality Guidelines
Surveys and statistical programs – Documentation: 12-539-X
Description:
This document brings together guidelines and checklists on many issues that need to be considered in the pursuit of quality objectives in the execution of statistical activities. Its focus is on how to assure quality through effective and appropriate design or redesign of a statistical project or program from inception through to data evaluation, dissemination and documentation. These guidelines draw on the collective knowledge and experience of many Statistics Canada employees. It is expected that Quality Guidelines will be useful to staff engaged in the planning and design of surveys and other statistical projects, as well as to those who evaluate and analyze the outputs of these projects.
Release date: 2019-12-04

Data (2) (2 results)

1. Agriculture-Population Linkage Data for the 2001 Census Archived
Table: 95F0303X
Description:
This product presents selected 2001 and historical data from the Census of Agriculture - Census of Population Linkage database. The data are available at the Canada and province levels for free. The data variables include: age; sex; marital status; mother tongue; highest level of schooling; net farm income; as well as farm population counts and income profiles for census farm families and households.
(No linkage databases were created for the 1966 and 1976 Census years, so historical comparisons are not possible for those years.)
Release date: 2003-12-02
2. Indicators and Detailed Statistics (Econnections: Linking the Environment and the Economy) Archived
Table: 16-200-X
Description:
Part of Statistics Canada's Econnections: linking the environment and the economy statistical series, this product consists of a printed publication combined with a CD-ROM. The product offers summary indicators plus detailed statistics that quantify the relationship between economic activity and the environment. Information is presented for issues ranging from greenhouse gas emissions, water and energy use, to natural resource wealth, environmental expenditures and beyond. The printed publication provides convenient reference to the summary indicators, including analysis of important trends, while the CD-ROM offers straightforward access to dozens of detailed statistical tables that underlie the indicators. An electronic version of the printed publication is included on the CD-ROM and each indicator in the publication is hypertext linked to a group of related statistical tables, allowing the user to easily select detailed statistics for viewing in association with any given indicator. Simple analysis of the statistics can be done directly within the CD-ROM's software. For those who carry out more complex analysis, downloading of data from the CD-ROM in standard spreadsheet format is easily accomplished.

Release date: 2001-02-23

Analysis (73) (40 to 50 of 73 results)

41. The effect of record linkage errors on risk estimates in cohort mortality studies Archived
Articles and reports: 12-001-X20050018083
Description:
The advent of computerized record linkage methodology has facilitated the conduct of cohort mortality studies in which exposure data in one database are electronically linked with mortality data from another database. This, however, introduces linkage errors due to mismatching an individual from one database with a different individual from the other database. In this article, the impact of linkage errors on estimates of epidemiological indicators of risk such as standardized mortality ratios and relative risk regression model parameters is explored. It is shown that the observed and expected numbers of deaths are affected in opposite directions and, as a result, these indicators can be subject to bias and additional variability in the presence of linkage errors.
Release date: 2005-07-21
42. A case study in record linkage Archived
Articles and reports: 12-001-X20050018085
Description:
Record linkage is a process of pairing records from two files and trying to select the pairs that belong to the same entity. The basic framework uses a match weight to measure the likelihood of a correct match and a decision rule to assign record pairs as "true" or "false" match pairs. Weight thresholds for selecting a record pair as matched or unmatched depend on the desired control over linkage errors. Current methods to determine the selection thresholds and estimate linkage errors can provide divergent results, depending on the type of linkage error and the approach to linkage. This paper presents a case study that uses existing linkage methods to link record pairs but a new simulation approach (SimRate) to help determine selection thresholds and estimate linkage errors. SimRate uses the observed distribution of data in matched and unmatched pairs to generate a large simulated set of record pairs, assigns a match weight to each pair based on specified match rules, and uses the weight curves of the simulated pairs for error estimation.
Release date: 2005-07-21
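The threshold-based decision rule described in this abstract can be sketched as follows. This is a generic two-threshold rule, not the SimRate approach; the function name `classify_pair`, the labels, and the threshold values are hypothetical.

```python
def classify_pair(weight, lower, upper):
    """Assign a record pair to a class from its match weight.

    Pairs at or above `upper` are treated as matches, pairs at or
    below `lower` as non-matches, and anything in between is left
    undecided. Both thresholds are hypothetical tuning values.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "undecided"


# Hypothetical weights and thresholds.
for w in (12.5, 3.0, -6.0):
    print(w, classify_pair(w, lower=0.0, upper=10.0))
```

Raising `upper` reduces false matches at the cost of more undecided or missed pairs, which is the trade-off the thresholds are meant to control.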
43. Using matched substitutes to improve imputations for geographically linked databases Archived
Articles and reports: 12-001-X20050018088
Description:
When administrative records are geographically linked to census block groups, local-area characteristics from the census can be used as contextual variables, which may be useful supplements to variables that are not directly observable from the administrative records. Often databases contain records that have insufficient address information to permit geographical links with census block groups; the contextual variables for these records are therefore unobserved. We propose a new method that uses information from "matched cases" and multivariate regression models to create multiple imputations for the unobserved variables. Our method outperformed alternative methods in simulation evaluations using census data, and was applied to the dataset for a study on treatment patterns for colorectal cancer patients.
Release date: 2005-07-21
44. Data Quality in the 2003 Survey of Labour and Income Dynamics (SLID) Archived
Articles and reports: 75F0002M2005004
Description:
The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes.
Sample surveys are subject to errors. As with all surveys conducted at Statistics Canada, considerable time and effort is taken to control such errors at every stage of the Survey of Labour and Income Dynamics. Nonetheless errors do occur. It is the policy at Statistics Canada to furnish users with measures of data quality so that the user is able to interpret the data properly. This report summarizes a set of quality measures that has been produced in an attempt to describe the overall quality of SLID data. Among the measures included in the report are sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.
Release date: 2005-05-12
45. Simultaneous use of multiple imputation for missing data and disclosure limitation Archived
Articles and reports: 12-001-X20040027755
Description:
Several statistical agencies use, or are considering the use of, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use data files. For example, agencies can release partially synthetic datasets, comprising the units originally surveyed with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. This article presents an approach for generating multiply-imputed, partially synthetic datasets that simultaneously handles disclosure limitation and missing data. The basic idea is to fill in the missing data first to generate m completed datasets, then replace sensitive or identifying values in each completed dataset with r imputed values. This article also develops methods for obtaining valid inferences from such multiply-imputed datasets. New rules for combining the multiple point and variance estimates are needed because the double duty of multiple imputation introduces two sources of variability into point estimates, which existing methods for obtaining inferences from multiply-imputed datasets do not measure accurately. A reference t-distribution appropriate for inferences when m and r are moderate is derived using moment matching and Taylor series approximations.
Release date: 2005-02-03
46. Statistical disclosure control for tables: Determining which method to use Archived
Articles and reports: 11-522-X20030017690
Description:
This paper analyses several important statistical disclosure control (SDC) methods for tables with respect to confidentiality rules, development time and runtime of the software used, and the way tables are used.
Release date: 2005-01-26
47. A strategy for a system of coverage samples for an integrated census Archived
Articles and reports: 11-522-X20030017729
Description:
This paper describes the design of the samples and analyses factors that affect the scope of the direct data collection for the first Integrated Census (IC) experiment.
Release date: 2005-01-26
48. Inferences for finite populations using multiple data sources with different reference times Archived
Articles and reports: 11-522-X20020016733
Description:
While censuses and surveys are often said to measure populations as they are, most reflect information about individuals as they were at the time of measurement, or even at some prior time point. Inferences from such data therefore should take into account change over time at both the population and individual levels. In this paper, we provide a unifying framework for such inference problems, illustrating it through a diverse series of examples including: (1) estimating residency status on Census Day using multiple administrative records, (2) combining administrative records for estimating the size of the US population, (3) using rolling averages from the American Community Survey, and (4) estimating the prevalence of human rights abuses.
Specifically, at the population level, the estimands of interest, such as the size or mean characteristics of a population, might be changing. At the same time, individual subjects might be moving in and out of the frame of the study or changing their characteristics. Such changes over time can affect statistical studies of government data that combine information from multiple data sources, including censuses, surveys and administrative records, an increasingly common practice. Inferences from the resulting merged databases often depend heavily on specific choices made in combining, editing and analysing the data that reflect assumptions about how populations of interest change or remain stable over time.
Release date: 2004-09-13
49. An information-rich environment: linked-record systems and data quality in Canada Archived
Articles and reports: 11-522-X20010016238
Description:
This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.
Research programs building on population-based, longitudinal administrative data and record-linkage techniques are found in England, Scotland, the United States (the Mayo Clinic), Western Australia and Canada. These systems can markedly expand both the methodological and the substantive research in health and health care.
This paper summarizes published, Canadian data quality studies regarding registries, hospital discharges, prescription drugs, and physician claims. It makes suggestions for improving registries, facilitating record linkage and expanding research into social epidemiology. New trends in case identification and health status measurement using administrative data have also been noted. And the differing needs for data quality research in each province have been highlighted.
Release date: 2002-09-12
50. On measuring the quality of indirect small area estimates Archived
Articles and reports: 11-522-X20010016246
Description:
This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.
Sample sizes in small population areas are typically very small. As a result, customary, area-specific, direct estimators of Small Area Means do not provide acceptable quality in terms of Mean Square Error (MSE). Indirect estimators that borrow strength from related areas by linking models based on similar auxiliary data are now widely used for small area estimation. Such linking models are either implicit (as in the case of synthetic estimators) or explicit (as in the case of model-based estimators). In the Frequentist approach, the quality of an indirect estimator is measured by its estimated MSE while the posterior variance of the Small Area Mean is used in the Bayesian approach. This paper reviews some recent work on estimating MSE and the evaluation of posterior variance.
Release date: 2002-09-12

Reference (19) (0 to 10 of 19 results)

1. Statistics Canada Quality Guidelines
Surveys and statistical programs – Documentation: 12-539-X
Description:
This document brings together guidelines and checklists on many issues that need to be considered in the pursuit of quality objectives in the execution of statistical activities. Its focus is on how to assure quality through effective and appropriate design or redesign of a statistical project or program from inception through to data evaluation, dissemination and documentation. These guidelines draw on the collective knowledge and experience of many Statistics Canada employees. It is expected that Quality Guidelines will be useful to staff engaged in the planning and design of surveys and other statistical projects, as well as to those who evaluate and analyze the outputs of these projects.
Release date: 2019-12-04
2. Canadian Cancer Registry System Guide, 2007 Edition Archived
Surveys and statistical programs – Documentation: 82-225-X200701010508
Description:
The Record Linkage Overview describes the process used in annual internal record linkage of the Canadian Cancer Registry. The steps include: preparation; pre-processing; record linkage; post-processing; analysis and resolution; resolution entry; and, resolution processing.
Release date: 2008-01-18
3. Canadian Cancer Registry Manuals Archived
Surveys and statistical programs – Documentation: 82-225-X
Description:
The compendium of Canadian Cancer Registry procedures manuals sets out the rules for reporting cancer data to the CCR for all provincial and territorial cancer registries.
Release date: 2008-01-18
4. Record linkage overview, 2007 edition Archived
Surveys and statistical programs – Documentation: 82-225-X20070109648
Description:
The Record Linkage Overview describes the process used in annual internal record linkage of the Canadian Cancer Registry. The steps include: preparation; pre-processing; record linkage; post-processing; analysis and resolution; resolution entry; and, resolution processing.
Release date: 2007-06-21
5. User guide to record linkage feedback, C1, C2 and C3, 2007 edition Archived
Surveys and statistical programs – Documentation: 82-225-X20070109650
Description:
The User Guide to Record Linkage Feedback Reports C1 and C2 is intended for the users of the reports. The reports were developed to facilitate the exchange of information and decisions between the Canadian Cancer Registry and the Provincial and Territorial Cancer Registries.
Release date: 2007-06-21
6. User guide to record linkage feedback reports C1 and C2, 2006 edition Archived
Surveys and statistical programs – Documentation: 82-225-X20060099202
Description:
The User Guide to Record Linkage Feedback Reports C1 and C2 is intended for the users of the reports. The reports were developed to facilitate the exchange of information and decisions between the Canadian Cancer Registry and the Provincial and Territorial Cancer Registries.
Release date: 2006-07-07
7. User guide to death clearance feedback reports, 2006 edition Archived
Surveys and statistical programs – Documentation: 82-225-X20060099203
Description:
The user guide to Death Clearance Feedback Reports is intended for users of the feedback reports. The feedback reports were developed to facilitate the exchange of information and decisions between the Canadian Cancer Registry and the Provincial and Territorial Cancer Registries.
Release date: 2006-07-07
8. Record linkage overview, 2006 edition Archived
Surveys and statistical programs – Documentation: 82-225-X20060099204
Description:
The Record Linkage Overview describes the process used in annual internal record linkage of the Canadian Cancer Registry. The steps include: preparation; pre-processing; record linkage; post-processing; analysis and resolution; resolution entry; and, resolution processing.
Release date: 2006-07-07
9. Guidelines for abstracting and determining death certificate only (DCO) cases for provincial/territorial cancer registries (PTCRs) in Canada, 2006 edition Archived
Surveys and statistical programs – Documentation: 82-225-X20060099206
Description:
The Guidelines for Abstracting and Determining Death Certificate Only Cases are intended for use by all provincial and territorial cancer registries during their Death Clearance Process. The guidelines should be used when performing a comparison between the Death Certificate Notification and the cancer registry database.
Release date: 2006-07-07
10. Concepts, Sources and Methods of the Canadian System of Environmental and Resource Accounts Archived
Surveys and statistical programs – Documentation: 16-505-G
Description:
Part of Statistics Canada's Econnections: linking the environment and the economy statistical series, this publication describes in detail the conceptual frameworks, data sources and empirical methods used to compile the Canadian System of Environmental and Resource Accounts (CSERA). Designed to be compatible with the accounting frameworks of the System of National Accounts, the CSERA allows users to easily analyze the linkages between economic activity and the environment in terms of material and energy flows, environmental expenditures and natural resource stocks. This publication will be of interest to researchers in both the economic and environmental fields who want to familiarize themselves with the accounting concepts of the CSERA. It is a companion volume to Environment-economy indicators and detailed statistics (catalogue no. 16-200-XKE), another product in the Econnections series.
Statistics Canada has updated its 1997 documentation on environmental accounts, Econnections: Concepts, Sources and Methods of the Canadian System of Environmental and Resource Accounts, with publication of the Methodological Guide: Canadian System of Environmental-Economic Accounting.

Release date: 2006-04-12

Date modified: 2024-06-04
