# A generalized Fellegi-Holt paradigm for automatic error localization 5. Implied edits for general edit operationsA generalized Fellegi-Holt paradigm for automatic error localization 5. Implied edits for general edit operations

In this section, a result will be derived that establishes whether a given path of edit operations of the form (3.1) can be used to make a given record consistent with a given system of edit rules (i.e., is a feasible solution to the error localization problem). This result uses the FM elimination technique discussed in Section 2.

Let $x$ be a given record and let ${y}_{t}$ be any record that can be obtained by applying, in sequence, the edit operations ${g}_{1},\dots ,{g}_{t}$ to $x:$

${y}_{t}={g}_{t}\circ {g}_{t-1}\circ \cdots \circ {g}_{1}\left(x\right).\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.1\right)$

Write ${g}_{n}\left(x\right)={T}_{n}x+{S}_{n}{\alpha }_{n}+{c}_{n},$ for $n\in \left\{1,\dots ,t\right\}.$ From (5.1) it follows by induction that

$\begin{array}{ll}{y}_{1}\hfill & ={T}_{1}x+{S}_{1}{\alpha }_{1}+{c}_{1},\hfill \\ {y}_{2}\hfill & ={T}_{2}{T}_{1}x+{S}_{2}{\alpha }_{2}+{c}_{2}+{T}_{2}\left({S}_{1}{\alpha }_{1}+{c}_{1}\right),\hfill \end{array}$

and, in general,

${y}_{t}={T}_{t}\cdots {T}_{1}x+{S}_{t}{\alpha }_{t}+{c}_{t}+\sum _{n=2}^{t}{T}_{t}\cdots {T}_{n}\left({S}_{n-1}{\alpha }_{n-1}+{c}_{n-1}\right),\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.2\right)$

where the sum over $n$ is defined to be zero when $t=1.$ Moreover, all terms involving ${S}_{n}{\alpha }_{n}$ vanish in these expressions when ${g}_{n}$ does not contain any free parameters.

The path of edit operations $P=\left[{g}_{1},\dots ,{g}_{t}\right]$ can be applied to $x$ to obtain a record that is consistent with the edits (2.1) if, and only if, there exists a ${y}_{t}$ of the form (5.2) that satisfies $A{y}_{t}+b\odot 0$ and all relevant additional restrictions of the form (3.2) on ${\alpha }_{1},\dots ,{\alpha }_{t}.$ Using (5.2), $A{y}_{t}+b\odot 0$ can be written as:

$\left(A{T}_{t}\cdots {T}_{1}\right)x+\left(A{S}_{t}\right){\alpha }_{t}+\sum _{n=2}^{t}\left(A{T}_{t}\cdots {T}_{n}{S}_{n-1}\right){\alpha }_{n-1}+{b}_{t}\odot 0,\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.3\right)$

with ${b}_{t}=b+A{c}_{t}+{\sum }_{n=2}^{t}A{T}_{t}\cdots {T}_{n}{c}_{n-1}$ a vector of constants.

Interestingly, (5.3) and the possible additional restrictions of the form (3.2) constitute a linear system of the form (2.1) on the extended record ${\left({x}^{\prime },{{\alpha }^{\prime }}_{1},\dots ,{{\alpha }^{\prime }}_{t}\right)}^{\prime }.$ Therefore, FM elimination may be used to remove all free parameters from this system. This yields a system of implied restrictions for $x.$ Moreover, a repeated application of the fundamental property of FM elimination establishes that $x$ satisfies this system of implied edits if, and only if, there exist parameter values for ${\alpha }_{1},\dots ,{\alpha }_{t}$ that, together with $x,$ satisfy (5.3) and (3.2). Hence, it follows that the path of edit operations $P=\left[{g}_{1},\dots ,{g}_{t}\right]$ can lead to a consistent record for $x$ if, and only if, $x$ satisfies the system of implied edits obtained by eliminating ${\alpha }_{1},\dots ,{\alpha }_{t}$ from (5.3) and (if relevant) additional restrictions of the form (3.2).

Example. Consider the following edits in ${x}_{1}$ and ${x}_{2}:$

${x}_{1}\ge 0,\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.4\right)$

${x}_{2}\ge 0,\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.5\right)$

${x}_{1}+{x}_{2}\le 5.\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.6\right)$

Let $g$ be the edit operation that transfers an amount of at most four units between ${x}_{1}$ and ${x}_{2},$ in either direction: $g\left(\left({x}_{1},{x}_{2}\right)\prime \right)=\left({x}_{1}+\alpha ,{x}_{2}-\alpha \right)\prime$ with $-4\le \alpha \le 4.$ For this single edit operation, the system of transformed edits (5.3) is:

${x}_{1}+\alpha \ge 0,\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.7\right)$

${x}_{2}-\alpha \ge 0,\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.8\right)$

${x}_{1}+{x}_{2}\le 5.\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.9\right)$

I also add the following restrictions of the form (3.2) on $\alpha :$

$\alpha \ge -4,\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.10\right)$

$\alpha \le 4.\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.11\right)$

This yields five linear constraints (5.7) $-$ (5.11) on ${x}_{1},$ ${x}_{2},$ and $\alpha ,$ from which $\alpha$ may be removed by FM elimination to obtain:

${x}_{1}\ge -4,\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.12\right)$

${x}_{2}\ge -4,\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.13\right)$

${x}_{1}+{x}_{2}\ge 0,\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.14\right)$

${x}_{1}+{x}_{2}\le 5.\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.15\right)$

According to the theory, any record $\left({x}_{1},{x}_{2}\right)\prime$ that satisfies (5.12) $-$ (5.15) can be made consistent with the original edits (5.4) $-$ (5.6) by transferring a certain amount $-4\le \alpha \le 4$ between ${x}_{1}$ and ${x}_{2}.$ The example record $\left({x}_{1},{x}_{2}\right)\prime =\left(-2,3\right)\prime$ is inconsistent with the original edit rules (5.4) $-$ (5.6) but satisfies (5.12) $-$ (5.15). This implies that the record can be made consistent with the original edits by applying $g.$ It is easy to see that this is true; any choice $2\le \alpha \le 3$ will do.

It is interesting to note that, for the special case that $P$ consists of the single FH operation that imputes ${x}_{j},$ the transformed system of edits (5.3) is obtained by replacing every occurrence of ${x}_{j}$ in the original edits by an unrestricted parameter $\alpha .$ Eliminating $\alpha$ from (5.3) is equivalent in this case to eliminating ${x}_{j}$ directly from the original edits. In this sense, the above result generalizes the fundamental property of FM elimination for FH operations to all edit operations of the form (3.1).

In general, the set of records defined by expression (5.2) depends on the way the edit operations are ordered. Thus, two paths consisting of the same set of edit operations in a different order need not yield the same solution to the error localization problem. In this respect, general edit operations differ from FH operations (Scholtus 2014).

Is something not working? Is there information outdated? Can't find what you're looking for?