Estimating the false negatives due to blocking in record linkage

Articles and reports: 12-001-X202100200002
Description:

When linking massive data sets, blocking is used to select a manageable subset of record pairs at the expense of losing a few matched pairs. This loss is an important component of the overall linkage error, because blocking decisions are made early in the linkage process, with no way to revise them in subsequent steps. Yet measuring this contribution is still a major challenge, because it requires modelling all the pairs in the Cartesian product of the sources, not just those satisfying the blocking criteria. Unfortunately, previous error models are of little use here because they typically do not meet this requirement. This paper addresses the issue with a new finite mixture model, which dispenses with clerical reviews, training data and the assumption that the linkage variables are conditionally independent. It applies when a standard blocking procedure is used to link a file to a register or a census with complete coverage, where both sources are free of duplicate records.
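To make the blocking loss concrete, here is a toy Python illustration. It is not the authors' finite mixture model. The records, the helper names `blocking_key` and `blocked_pairs`, and the choice of exact agreement on postal code as the blocking criterion are all hypothetical. Because the toy data carry the true match status, the missed matches can simply be counted. The paper's contribution is to estimate this quantity when no such truth, clerical review or training data is available.

```python
from itertools import product

# Hypothetical toy data: (record id, name, postal code).
# The true match status is known here (a shared id means the same person);
# real linkage problems lack this truth.
file_a = [
    (1, "anne roy", "K1A0B1"),
    (2, "paul tremblay", "H2X1Y4"),
    (3, "mary smith", "M5V2T6"),
    (4, "li wei", "V6B3K9"),
]
# Register with complete coverage of file_a and no duplicate records.
register = [
    (1, "anne roy", "K1A0B1"),
    (2, "paul tremblay", "H2X1Y5"),  # postal code disagrees with file_a
    (3, "mary smyth", "M5V2T6"),     # name typo, but postal code agrees
    (4, "li wei", "V6B3K9"),
    (5, "john doe", "K1A0B1"),
]

def blocking_key(record):
    """Assumed blocking criterion: exact agreement on the full postal code."""
    return record[2]

def blocked_pairs(a, b, key):
    """Return the pairs of the Cartesian product a x b that share a blocking key."""
    index = {}
    for rec in b:
        index.setdefault(key(rec), []).append(rec)
    return [(ra, rb) for ra in a for rb in index.get(key(ra), [])]

all_pairs = list(product(file_a, register))
kept = blocked_pairs(file_a, register, blocking_key)

# Matched pairs in the full Cartesian product versus those surviving blocking.
true_matches = {(ra[0], rb[0]) for ra, rb in all_pairs if ra[0] == rb[0]}
kept_matches = {(ra[0], rb[0]) for ra, rb in kept if ra[0] == rb[0]}
missed = true_matches - kept_matches  # false negatives due to blocking
fn_rate = len(missed) / len(true_matches)

print(len(all_pairs), len(kept), sorted(missed), fn_rate)
```

Here the Cartesian product has 20 pairs, blocking keeps 4, and one of the 4 true matches (record 2) is lost because its postal code disagrees between the sources, giving a false negative rate of 0.25 due to blocking. No later comparison step can recover that pair, which is why the loss must be estimated from a model of all pairs rather than only the blocked ones.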

Issue Number: 2021002
Author(s): Dasylva, Abel; Goussanou, Arthur
Main Product: Survey Methodology
Formats: HTML, PDF
Release date: January 6, 2022
