Automatic editing with hard and soft edits



Sander Scholtus1

Abstract

A considerable limitation of current methods for automatic data editing is that they treat all edits as hard constraints. That is to say, an edit failure is always attributed to an error in the data. In manual editing, however, subject-matter specialists also make extensive use of soft edits, i.e., constraints that identify (combinations of) values that are suspicious but not necessarily incorrect. The inability of automatic editing methods to handle soft edits partly explains why in practice many differences are found between manually edited and automatically edited data. The object of this article is to present a new formulation of the error localisation problem which can distinguish between hard and soft edits. Moreover, it is shown how this problem may be solved by an extension of the error localisation algorithm of De Waal and Quere (2003).
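To make the distinction concrete, the following sketch illustrates the difference between a hard edit (a constraint that must hold exactly, so any failure is an error) and a soft edit (a plausibility check whose failure only flags a suspicious value). The variables, the accounting identity, and the plausibility bounds are hypothetical examples, not taken from the article:

```python
# Illustrative sketch only: hard vs. soft edits on a single record,
# using hypothetical business-survey variables and bounds.

def check_record(record):
    """Classify edit failures as hard (certain error) or soft (suspicious)."""
    failures = {"hard": [], "soft": []}

    # Hard edit: an accounting identity that must hold exactly.
    # A violation can only be caused by an error in the data.
    if record["profit"] != record["turnover"] - record["costs"]:
        failures["hard"].append("profit = turnover - costs violated")

    # Soft edit: turnover per employee outside a plausible range is
    # suspicious, but the values are not necessarily incorrect.
    if record["employees"] > 0:
        ratio = record["turnover"] / record["employees"]
        if not (10 <= ratio <= 1000):  # hypothetical plausibility bounds
            failures["soft"].append("turnover per employee implausible")

    return failures

record = {"turnover": 5000, "costs": 3000, "profit": 1500, "employees": 2}
print(check_record(record))
# → {'hard': ['profit = turnover - costs violated'],
#    'soft': ['turnover per employee implausible']}
```

Current automatic editing methods treat both kinds of failure alike; the article's contribution is an error localisation formulation in which only hard failures force a change to the data.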

Key Words

Automatic error localisation; Fellegi-Holt paradigm; Branch-and-bound algorithm; Numerical data; Categorical data; Mixed data.

Table of contents

1 Introduction

2 Background

3 An error localisation problem with hard and soft edits

4 A short theory of edit failures

5 Solving the error localisation problem with hard and soft edits

6 Example

7 Application

8 Conclusion

1  Sander Scholtus, Statistics Netherlands, P.O. Box 24500, 2490 HA The Hague, The Netherlands. E-mail: sshs@cbs.nl.
