A generalized Fellegi-Holt paradigm for automatic error localization 8. Conclusion

In this article, a new formulation was proposed of the error localization problem in automatic editing. It was suggested to find the (weighted) minimal number of edit operations needed to make an observed record consistent with the edits. The new error localization problem can be seen as a generalization of the problem proposed in a seminal paper by Fellegi and Holt (1976), because the operation that imputes a new value for one variable at a time is an important special case of an edit operation.

The main focus here has been on developing the mathematical theory behind the new error localization problem. It turns out that FM elimination, a technique that has been used in the past to solve the Fellegi-Holt-based error localization problem, can be applied also in the context of the new problem (Section 5). Nevertheless, the task of solving the new error localization problem is challenging from a computational point of view, at least for the numbers of variables, edits, and edit operations that would be encountered in practical applications at statistical institutes. A possible error localization algorithm was outlined in Section 6. More efficient algorithms probably could and should be developed. Similarly to FM elimination, it may be possible to adapt other ideas that have been used to solve the Fellegi-Holt-based problem to the generalized problem considered here.

The discussion in this article was restricted to numerical data and linear edits. The original Fellegi-Holt paradigm has been applied also to categorical and mixed data. Several authors, including Bruni (2004) and de Jonge and van der Loo (2014), have shown that a large class of edits for mixed data can be re-formulated in terms of numerical data and linear edits, with the additional restriction that some of the variables have to be integer-valued. In principle, this means that the results in this article could be applied also to mixed data. To accommodate the fact that some variables are integer-valued, Pugh’s (1992) extension of FM elimination to integers could be used; see also de Waal et al. (2011) for a discussion of this extended elimination technique in the context of Fellegi-Holt-based error localization. It remains to be seen whether this approach is computationally feasible.

Remark 4 in Section 4 hinted at an analogy between error localization in statistical microdata and the field of approximate string matching. In approximate string matching, text strings are compared under the assumption that they may have been partially corrupted (Navarro 2001). Various distance functions have been proposed for this task. The Hamming distance, which counts the number of positions on which two strings differ, may be seen as an analogue of the Fellegi-Holt-based target function (2.2). The generalized error localization problem defined in this paper has its counterpart in the use of the Levenshtein distance or “edit distance” for approximate string matching. It may be interesting to explore this analogy further. In particular, efficient algorithms have been developed for computing edit distances between strings; it might be possible to apply some of the underlying ideas also to the generalized error localization problem.

The new error localization algorithm was applied successfully to a small synthetic data set (Section 7). Overall, the results of this simulation study suggest that the new error localization approach has the potential to achieve a substantial improvement of the quality of automatic editing compared to the approach that is currently used in practice. However, this does require that sufficient information be available to identify all MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9IqqrpepC0xbbL8F4rqqr=hbbG8pue9Fbe9q8 qqvqFr0dXdbrVc=b0P0xb9peuD0xXddrpe0=1qpeea0=yrVue9Fve9 Fve8meaabaqaciGacaGaaeWabaWaaeaaeaaakeaaieaajugybabaaa aaaaaapeGaa83eGaaa@384B@ or at least most MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9IqqrpepC0xbbL8F4rqqr=hbbG8pue9Fbe9q8 qqvqFr0dXdbrVc=b0P0xb9peuD0xXddrpe0=1qpeea0=yrVue9Fve9 Fve8meaabaqaciGacaGaaeWabaWaaeaaeaaakeaaieaajugybabaaa aaaaaapeGaa83eGaaa@384B@ of the relevant edit operations in a particular application. Possible gains in the quality of error localization also have to be weighed in practice against the higher computational demands of the generalized error localization problem.

An obvious candidate for applying the new methodology in practice would be the SBS. However, more research is needed before this method could be applied during regular production. To apply the method in a particular context, it is necessary first to specify the relevant edit operations. Ideally, each edit operation should correspond to a combination of amendments to the data that human editors consider to be a correction for one particular error. In addition, a suitable set of weights w g MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadEhapaWaaSbaaSqaa8qacaWGNbaapaqabaaaaa@3A01@ has to be determined for these edit operations. This would require information about the relative frequencies of the most common types of amendments made during manual editing. Both aspects could be investigated based on historical data before and after manual editing, editing instructions and other documentation used by the editors, and interviews with editors and/or supervisors of editing.

On a more fundamental level, a question of demarcation arises between deductive correction methods and automatic editing under the new error localization problem. In principle, many known types of error could be resolved either by automatic correction rules or by error localization using edit operations. Each approach has its own advantages and disadvantages (Scholtus 2014). It is likely that some compromise will produce the best results, with some errors handled deductively and others by edit operations. However, it is not obvious how best to make this division in practice.

Ultimately, the aim of the new methodology proposed in this article is to improve the usefulness of automatic editing in practice. So far, the results are promising.

Acknowledgements

The views expressed in this article are those of the author and do not necessarily reflect the policies of Statistics Netherlands. The author would like to thank Jeroen Pannekoek, Ton de Waal, and Mark van der Loo for their comments on earlier versions of this article, as well as the Associate Editor and two anonymous referees.

Appendix

Fourier-Motzkin elimination

Consider a system of linear constraints (2.1) and let x f MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaaaaa@3A01@ be the variable to be eliminated. First, suppose that x f MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaaaaa@3A01@ is involved only in inequalities. For ease of exposition, suppose that the edits are normalized so that all inequalities use the MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiabgwMiZcaa@3985@ operator. The FM elimination method considers all pairs ( r , s ) MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa Wdbmaabmaapaqaa8qacaWGYbGaaiilaiaadohaaiaawIcacaGLPaaa aaa@3C06@ of inequalities in which the coefficients of x f MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaaaaa@3A01@ have opposite signs; that is, a r f a s f < 0. MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadggapaWaaSbaaSqaa8qacaWGYbGaamOzaaWdaeqaaOWdbiaa dggapaWaaSbaaSqaa8qacaWGZbGaamOzaaWdaeqaaOWdbiabgYda8i aaicdacaGGUaaaaa@40A8@ Suppose without loss of generality that a r f < 0 MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadggapaWaaSbaaSqaa8qacaWGYbGaamOzaaWdaeqaaOWdbiab gYda8iaaicdaaaa@3CB9@ and a s f > 0. MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadggapaWaaSbaaSqaa8qacaWGZbGaamOzaaWdaeqaaOWdbiab g6da+iaaicdacaGGUaaaaa@3D70@ From the original pair of edits, the following implied constraint is derived:

j = 1 p a j * x j + b * 0 , ( A .1 ) MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbmaaqahabaGaamyya8aadaqhaaWcbaWdbiaadQgaa8aabaWdbiaa cQcaaaGccaWG4bWdamaaBaaaleaapeGaamOAaaWdaeqaaOWdbiabgU caRiaadkgapaWaaWbaaSqabeaapeGaaiOkaaaakiabgwMiZkaaicda aSqaaiaadQgacqGH9aqpcaaIXaaabaGaamiCaaqdcqGHris5aOGaai ilaiaaywW7caaMf8UaaGzbVlaaywW7caaMf8UaaiikaiaacgeacaGG UaGaaGymaiaacMcaaaa@5455@

with a j * = a s f a r j a r f a s j MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadggadaqhaaWcbaGaamOAaaqaaiaacQcaaaGccqGH9aqpcaWG HbWdamaaBaaaleaapeGaam4CaiaadAgaa8aabeaakiaadggadaWgaa WcbaWdbiaadkhacaWGQbaapaqabaGcpeGaeyOeI0Iaamyya8aadaWg aaWcbaWdbiaadkhacaWGMbaapaqabaGccaWGHbWaaSbaaSqaa8qaca WGZbGaamOAaaWdaeqaaaaa@490E@ and b * = a s f b r a r f b s . MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadkgapaWaaWbaaSqabeaapeGaaiOkaaaakiabg2da9iaadgga paWaaSbaaSqaa8qacaWGZbGaamOzaaWdaeqaaOWdbiaadkgapaWaaS baaSqaa8qacaWGYbaapaqabaGcpeGaeyOeI0Iaamyya8aadaWgaaWc baWdbiaadkhacaWGMbaapaqabaGcpeGaamOya8aadaWgaaWcbaWdbi aadohaa8aabeaakiaac6caaaa@475D@ Note that a f * = 0 , MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadggapaWaa0baaSqaa8qacaWGMbaapaqaa8qacaGGQaaaaOGa eyypa0JaaGimaiaacYcaaaa@3D23@ so x f MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaaaaa@3A01@ is not involved in (A.1). An inequality of the form (A.1) is derived from each of the above-mentioned pairs ( r , s ) . MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa Wdbmaabmaapaqaa8qacaWGYbGaaiilaiaadohaaiaawIcacaGLPaaa caGGUaaaaa@3CB8@ The full implied system of constraints obtained by FM elimination now consists of these derived constraints, together with all original constraints that do not involve x f . MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaGccaGGUaaaaa@3ABD@

If there are linear equalities that involve x f , MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaGccaGGSaaaaa@3ABB@ the above technique could be applied after replacing each linear equality with two equivalent linear inequalities. de Waal and Quere (2003) suggested a more efficient alternative for this case. Suppose that the r th MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadkhapaWaaWbaaSqabeaapeGaaeiDaiaabIgaaaaaaa@3AE4@ constraint in (2.1) is an equality that involves x f . MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaGccaGGUaaaaa@3ABD@ This constraint can be rewritten as

x f = 1 a r f ( b r + j f a r j x j ) . ( A .2 ) MathType@MTEF@5@5@+= feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaGcpeGaeyypa0Za aSaaa8aabaWdbiabgkHiTiaaigdaa8aabaWdbiaadggapaWaaSbaaS qaa8qacaWGYbGaamOzaaWdaeqaaaaak8qadaqadaWdaeaapeGaamOy a8aadaWgaaWcbaWdbiaadkhaa8aabeaak8qacqGHRaWkdaaeqaqaai aadggapaWaaSbaaSqaa8qacaWGYbGaamOAaaWdaeqaaOWdbiaadIha paWaaSbaaSqaa8qacaWGQbaapaqabaaapeqaaiaadQgacqGHGjsUca WGMbaabeqdcqGHris5aaGccaGLOaGaayzkaaGaaiOlaiaaywW7caaM f8UaaGzbVlaaywW7caaMf8UaaiikaiaacgeacaGGUaGaaGOmaiaacM caaaa@5C4B@

By substituting the expression on the right-hand-side of (A.2) for x f MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaaaaa@3A01@ in all other constraints, one again obtains an implied system of constraints that does not involve x f MathType@MTEF@5@5@+= feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr 4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrVfpeea0xe9Lqpe0x e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9 Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa WdbiaadIhapaWaaSbaaSqaa8qacaWGMbaapaqabaaaaa@3A01@ and that can be rewritten in the form (2.1).

For a proof that FM elimination has the fundamental property mentioned in Section 2, see, e.g., de Waal et al. (2011, pages 69-70).

References

Agrawal, R., and Srikant, R. (1994). Fast Algorithms for Mining Association Rules. Technical report, IBM Almaden Research Center, San Jose, California.

Bruni, R. (2004). Discrete models for data imputation. Discrete Applied Mathematics, 144, 59-69.

Chen, B., Thibaudeau, Y. and Winkler, W.E. (2003). A Comparison Study of ACS If-Then-Else, NIM, DISCRETE Edit and Imputation Systems Using ACS Data. Working Paper No. 7, UN/ECE Work Session on Statistical Data Editing, Madrid.

de Jonge, E., and van der Loo, M. (2014). Error Localization as a Mixed Integer Problem with the Editrules Package. Discussion Paper 2014-07, Statistics Netherlands, The Hague. Available at: http://www.cbs.nl.

de Waal, T. (2003). Solving the error localization problem by means of vertex generation. Survey Methodology, 29, 1, 71-79.

de Waal, T., and Coutinho, W. (2005). Automatic editing for business surveys: An assessment for selected algorithms. International Statistical Review, 73, 73-102.

de Waal, T., and Quere, R. (2003). A fast and simple algorithm for automatic editing of mixed data. Journal of Official Statistics, 19, 383-402.

de Waal, T., Pannekoek, J. and Scholtus, S. (2011). Handbook of Statistical Data Editing and Imputation. Hoboken, New Jersey: John Wiley & Sons, Inc.

EDIMBUS (2007). Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys. Eurostat manual prepared by ISTAT, Statistics Netherlands, and SFSO.

Fellegi, I.P., and Holt, D. (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association, 71, 17-35.

Garfinkel, R.S., Kunnathur, A.S. and Liepins, G.E. (1988). Error localization for erroneous data: Continuous data, linear constraints. SIAM Journal on Scientific and Statistical Computing, 9, 922-931.

Ghosh-Dastidar, B., and Schafer, J.L. (2006). Outlier detection and editing procedures for continuous multivariate data. Journal of Official Statistics, 22, 487-506.

Giles, P. (1988). A model for generalized edit and imputation of survey data. The Canadian Journal of Statistics, 16, 57-73.

Granquist, L. (1995). Improving the traditional editing process. In Business Survey Methods, (Eds., B.G. Cox, D.A. Binder, B.N. Chinnappa, A. Christianson, M.J. Colledge and P.S. Kott), John Wiley & Sons, Inc., 385-401.

Granquist, L. (1997). The new view on editing. International Statistical Review, 65, 381-387.

Granquist, L., and Kovar, J. (1997). Editing of survey data: How much is enough? In Survey Measurement and Process Quality, (Eds., L.E. Lyberg, P. Biemer, M. Collins, E.D. de Leeuw, C. Dippo, N. Schwartz and D. Trewin), John Wiley & Sons, Inc., 415-435.

Hedlin, D. (2003). Score functions to reduce business survey editing at the U.K. Office for National Statistics. Journal of Official Statistics, 19, 177-199.

Hidiroglou, M.A., and Berthelot, J.-M. (1986). Statistical editing and imputation for periodic business surveys. Survey Methodology, 12, 1, 73-83.

Kovar, J., and Whitridge, P. (1990). Generalized edit and imputation system; Overview and applications. Revista Brasileira de Estadistica, 51, 85-100.

Kruskal, J.B. (1983). An overview of sequence comparison. In Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, (Eds., D. Sankoff and J.B. Kruskal), Addison-Wesley, 1-44.

Lawrence, D., and McKenzie, R. (2000). The general application of significance editing. Journal of Official Statistics, 16, 243-253.

Liepins, G.E. (1980). A Rigorous, Systematic Approach to Automatic Data Editing and its Statistical Basis. Report ORNL/TM-7126, Oak Ridge National Laboratory.

Liepins, G.E., Garfinkel, R.S. and Kunnathur, A.S. (1982). Error localization for erroneous data: A survey. TIMS/Studies in the Management Sciences, 19, 205-219.

Little, R.J.A., and Smith, P.J. (1987). Editing and imputation of quantitative survey data. Journal of the American Statistical Association, 82, 58-68.

Naus, J.I., Johnson, T.G. and Montalvo, R. (1972). A probabilistic model for identifying errors in data editing. Journal of the American Statistical Association, 67, 943-950.

Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Surveys, 33, 31-88.

Pannekoek, J., Scholtus, S. and van der Loo, M. (2013). Automated and manual data editing: A view on process design and methodology. Journal of Official Statistics, 29, 511-537.

Pugh, W. (1992). The omega test: A fast and practical integer programming algorithm for data dependence analysis. Communications of the ACM, 35, 102-114.

R Development Core Team (2015). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. URL: http://www.R-project.org/.

Ragsdale, C.T., and McKeown, P.G. (1996). On solving the continuous data editing problem. Computers & Operations Research, 23, 263-273.

Riera-Ledesma, J., and Salazar-González, J.J. (2003). New Algorithms for the Editing and Imputation Problem. Working Paper No. 5, UN/ECE Work Session on Statistical Data Editing, Madrid.

Riera-Ledesma, J., and Salazar-González, J.J. (2007). A branch-and-cut algorithm for the continuous error localization problem in data cleaning. Computers & Operations Research, 34, 2790-2804.

Schaffer, J. (1987). Procedure for solving the data-editing problem with both continuous and discrete data types. Naval Research Logistics, 34, 879-890.

Scholtus, S. (2011). Algorithms for correcting sign errors and rounding errors in business survey data. Journal of Official Statistics, 27, 467-490.

Scholtus, S. (2014). Error Localisation using General Edit Operations. Discussion Paper 2014-14, Statistics Netherlands, The Hague. Available at: http://www.cbs.nl.

Tempelman, D.C.G. (2007). Imputation of Restricted Data. Ph. D. Thesis, University of Groningen. Available at: http://www.cbs.nl.

van der Loo, M., and de Jonge, E. (2012). Automatic Data Editing with Open Source R. Working Paper No. 33, UN/ECE Work Session on Statistical Data Editing, Oslo.

Williams, H.P. (1986). Fourier’s method of linear programming and its dual. The American Mathematical Monthly, 93, 681-695.

Report a problem on this page

Is something not working? Is there information outdated? Can't find what you're looking for?

Please contact us and let us know how we can help you.

Privacy notice

Date modified: