Results

All (3)

  • Articles and reports: 12-001-X19970023613
    Description:

    Many policy decisions are best made when there is supporting statistical evidence based on analyses of appropriate microdata. Sometimes all the needed data exist but reside in multiple files for which common identifiers (e.g., SIN's, EIN's, or SSN's) are unavailable. This paper demonstrates a methodology for analyzing two such files: (1) when there is common nonunique information subject to significant error and (2) when each source file contains uncommon quantitative data that can be connected with appropriate models. Such a situation might arise with files of businesses only having difficult-to-use name and address information in common, one file with the energy products consumed by the companies, and the other file containing the types and amounts of goods they produce. Another situation might arise with files on individuals in which one file has earnings data, another information about health-related expenses, and a third information about receipts of supplemental payments. The goal of the methodology presented is to produce valid statistical analyses; appropriate microdata files may or may not be produced.

    Release date: 1998-03-12

  • Articles and reports: 12-001-X199300114476
    Description:

    This paper focuses on how to deal with record linkage errors when engaged in regression analysis. Recent work by Rubin and Belin (1991) and by Winkler and Thibaudeau (1991) provides the theory, computational algorithms, and software necessary for estimating matching probabilities. These advances allow us to update the work of Neter, Maynes, and Ramanathan (1965). Adjustment procedures are outlined and some successful simulations are described. Our results are preliminary and intended largely to stimulate further work. (The attenuating effect of such linkage errors on a regression slope is illustrated in the first sketch after these listings.)

    Release date: 1993-06-15

  • Articles and reports: 12-001-X198900114574
    Description:

    Let A × B be the product space of two sets A and B, which is divided into matches (pairs representing the same entity) and nonmatches (pairs representing different entities). Linkage rules are those that divide A × B into links (designated matches), possible links (pairs for which we delay a decision), and nonlinks (designated nonmatches). Under fixed bounds on the error rates, Fellegi and Sunter (1969) provided a linkage rule that is optimal in the sense that it minimizes the set of possible links. The optimality is dependent on knowledge of certain probabilities that are used in a crucial likelihood ratio. In applying the record linkage model, an independence assumption is often made that allows estimation of the probabilities. If the assumption is not met, then a record linkage procedure using estimates computed under the assumption may not be optimal. This paper contains an examination of methods for adjusting linkage rules when the independence assumption is not valid. The presentation takes the form of an empirical analysis of lists of businesses for which the truth of matches is known. The number of possible links obtained using standard and adjusted computational procedures may be dependent on different samples. Bootstrap methods (Efron 1987) are used to examine the variation due to different samples. (The basic Fellegi-Sunter decision rule is illustrated in the second sketch after these listings.)

    Release date: 1989-06-15
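
The first sketch below relates to the second listing (12-001-X199300114476). It is not the paper's adjustment procedure: it only illustrates, under the strong simplifying assumption that a false match pairs x with an unrelated y, how linkage errors attenuate an ordinary least squares slope toward q times its true value (q being the probability that a linked pair is a true match), and how dividing by a known q crudely undoes that bias. All names and numbers are invented for the example.

    # Simplified simulation of regression on linked data with matching errors.
    # Not the adjustment procedure from the paper; q, beta, and n are invented,
    # and the correction (divide by q) is valid only under the independence
    # assumption stated above.
    import numpy as np

    rng = np.random.default_rng(0)
    n, q, beta = 10_000, 0.8, 2.0   # q = probability a linked pair is a true match

    x = rng.normal(size=n)
    y_true = beta * x + rng.normal(size=n)

    # With probability 1 - q, the linked y comes from an unrelated record.
    false_match = rng.random(n) > q
    y_linked = np.where(false_match, rng.permutation(y_true), y_true)

    naive_slope = np.polyfit(x, y_linked, 1)[0]   # attenuated toward q * beta
    adjusted_slope = naive_slope / q              # crude correction when q is known

    print(f"true={beta:.2f}  naive={naive_slope:.2f}  adjusted={adjusted_slope:.2f}")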
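
The second sketch relates to the third listing (12-001-X198900114574). It shows the basic Fellegi and Sunter (1969) decision rule under the conditional-independence assumption that the paper examines relaxing; the adjustments the paper develops for when that assumption fails are not reproduced here. The fields, the m- and u-probabilities, and the thresholds are invented for illustration.

    # Basic Fellegi-Sunter classification under conditional independence.
    # All probabilities and cutoffs below are illustrative, not from the paper.
    import math

    # P(field agrees | match) and P(field agrees | nonmatch) for each field.
    FIELDS = {
        "name":   {"m": 0.95, "u": 0.10},
        "street": {"m": 0.85, "u": 0.05},
        "city":   {"m": 0.90, "u": 0.30},
    }

    UPPER, LOWER = 6.0, -3.0   # hypothetical cutoffs derived from error-rate bounds

    def match_weight(agreement):
        """Sum per-field log likelihood ratios, assuming independence across fields."""
        total = 0.0
        for field, p in FIELDS.items():
            m, u = p["m"], p["u"]
            if agreement[field]:
                total += math.log2(m / u)               # agreement weight
            else:
                total += math.log2((1 - m) / (1 - u))   # disagreement weight
        return total

    def classify(agreement):
        """Designate a pair as a link, possible link, or nonlink."""
        w = match_weight(agreement)
        if w >= UPPER:
            return "link"
        if w <= LOWER:
            return "nonlink"
        return "possible link"

    print(classify({"name": True,  "street": True,  "city": True}))    # link
    print(classify({"name": False, "street": False, "city": False}))   # nonlink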