Measuring the undercoverage of two data sources with a nearly perfect coverage through capture and recapture in the presence of linkage errors

Articles and reports: 11-522-X202100100006
Description:

In the context of its "admin-first" paradigm, Statistics Canada is prioritizing the use of non-survey sources to produce official statistics. This paradigm relies critically on non-survey sources, such as administrative files or big data sources, that may have nearly perfect coverage of some target populations. Yet this coverage must be measured, for example by applying the capture-recapture method, in which these sources are compared to other sources with good coverage of the same populations, such as a census. However, this is a challenging exercise in the presence of linkage errors, which inevitably arise when the linkage is based on quasi-identifiers, as is typically the case. To address this issue, a new methodology is described in which the capture-recapture method is enhanced with a new error model based on the number of links adjacent to a given record. The methodology is applied in an experiment with public census data.
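
For readers unfamiliar with capture-recapture, the following minimal sketch shows the classical two-source (Lincoln-Petersen) dual system estimator that the description refers to. It is not the error model proposed in the paper: it assumes error-free linkage and independent sources. The function name, argument names, and the counts used are hypothetical and chosen only for illustration.

```python
def lincoln_petersen(n_a: int, n_b: int, n_linked: int) -> dict:
    """Classical dual system estimate from two linked sources.

    Assumptions (illustrative, not taken from the paper): linkage is
    error-free and inclusion in source A is independent of inclusion
    in source B.
    """
    if n_linked <= 0:
        raise ValueError("at least one linked pair is required")
    if n_linked > min(n_a, n_b):
        raise ValueError("links cannot exceed the size of either source")
    return {
        # Estimated size of the target population.
        "population": n_a * n_b / n_linked,
        # Share of source B records also found in A estimates A's coverage.
        "coverage_a": n_linked / n_b,
        # Share of source A records also found in B estimates B's coverage.
        "coverage_b": n_linked / n_a,
    }


# Hypothetical counts: an administrative file (A), a census (B), and their links.
est = lincoln_petersen(n_a=9_800, n_b=9_500, n_linked=9_310)

# Same sources, but with some true links missed by the linkage (false negatives).
est_missed = lincoln_petersen(n_a=9_800, n_b=9_500, n_linked=9_124)
```

In this sketch, missed links lower the linked count, which inflates the population estimate and understates the coverage of both sources. This is one way linkage errors can distort the measurement when coverage is close to perfect.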

Key Words: dual system estimation, data matching, record linkage, quality, data integration, big data.

Issue Number: 2021001
Author(s): Dasylva, Abel; Goussanou, Arthur; Nambeu, Christian-Olivier
Main Product: Statistics Canada International Symposium Series: Proceedings
Format: PDF
Release date: October 22, 2021