# 5. Non-optimal properties of existing methods

Sun Woong Kim, Steven G. Heeringa and Peter W. Solenberger

As described in Section 4, the algorithms for controlled selection may be divided into two parts, manual algorithms before 1980s and computer-intensive algorithms since then. For large controlled selection problems with many cells, the latter class of algorithms may be preferred. But when the problem is small, the former can be easily used without the complexity of the latter. Therefore we would not say that the former is always inferior to the latter. More objective criteria for comparing them would be necessary, and the optimal solution may be adopted as one of the better criteria to compare their strengths or weaknesses.

As discussed by Jessen (1978, pages 375-376), the algorithms of Jessen (1970) aim to minimize the number of arrays in a solution set ${\mathfrak{B}}^{\prime }$ , and the algorithm of Jessen (1978) quite easily achieves that purpose relative to those of Jessen (1970). Thus his algorithms pursue “simplicity” in formulating a solution rather than an optimal solution.

The algorithm of Causey et al. (1985) may give a “partially” optimal solution. Other than the original problem, $A$ , it sequentially creates a small number of new controlled selection problems, and then as a solution it finds only one array ${B}_{k}\text{ }\left(\in \mathfrak{B}\right)$ to be nearest to each problem, starting with $A$ . Each problem is regarded as the transportation problem of Cox and Ernst (1982), which is formed by the objective function mimicking the behavior of

Note that since function (5.1) violates the triangle inequality axiom (iii), it is not a distance function. It needs the inclusion of the $p\text{-th}$ root to be a distance function. Also, each $p\left({B}_{k}\right)$  is calculated by a simple formula. In view of the optimality requirements given by R1 and R2 the Causey et al. algorithm has the following weaknesses: 1) Since other controlled selection problems in addition to the original problem $A$ are involved, it is difficult to obtain the solution consistently based on the closeness between the unique $A$ and every individual ${B}_{k}$ in $\mathfrak{B}$ ; 2) The maximization of the probabilities of selection for the arrays nearest to $A$ is not guaranteed.

Winkler (2001) presented a modification of the method of Causey et al. (1985). Instead of using the transportation problem, he proposed integer linear programming, resulting in slight changes of the $p\left({B}_{k}\right).$ Nevertheless, the Winkler (2001) algorithm is not free from the weaknesses of the Causey et al. (1985) method.

Adopting a network flow problem approach, the Huang and Lin (1998) algorithm imposes the additional subgroup constraints in $A$ , raised by Goodman and Kish (1950). However, it does not attain objectives R1 and R2, just as in Causey et al. (1985) and Winkler (2001), since a new network, instead of a new controlled selection problem, is generated at every iteration, an arbitrary ${B}_{k}\text{\hspace{0.17em}}\left(\in \mathfrak{B}\right)$ is obtained as a solution to the network, and $p\left({B}_{k}\right)$ is calculated by a simple formula.

In contrast, the LP algorithms proposed by Sitter and Skinner (1994) and Tiwari and Nigam (1998) use all possible arrays in $\mathfrak{B}$ . Note that finding all those arrays is an important issue, and that $p\left({B}_{k}\right)$ for all possible arrays are simultaneously obtained by running the software for LP only once. The key idea underlying the algorithm of Sitter and Skinner (1994) is to use a “loss function” defined by

$\sum _{i=1}^{R}{\left({b}_{i.k}-{a}_{i.}\right)}^{2}+\sum _{j=1}^{C}{\left({b}_{.jk}-{a}_{.j}\right)}^{2}.\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.2\right)$

In terms of R1 and R2, their algorithm has the following disadvantages: 1) The closeness between $A$ and ${B}_{k}$ is not well captured by loss function (5.2). This is because it is not a distance function that satisfies axiom (iii), as the marginal totals are used, instead of the cell entries; 2) Loss function (5.2) is irrelevant to the maximization of the probabilities of selection over the arrays nearest to $A$ in Problems 2.1, 2.2, and 2.4, since it is always zero.

The LP method of Tiwari and Nigam (1998) can be used to reduce the selection probabilities of non-preferred arrays (e.g., arrays not containing the PSU corresponding to the cell $ij=23$ in Problem 2.1), which are initially determined by the samplers. For controlled selection problems with integer margins and without considering the non-preferred arrays, their method will give the same solutions as that of Sitter and Skinner (1994).

The solutions from these previous methods will be compared with those from the proposed method in Section 6, on several examples in Section 8.

Is something not working? Is there information outdated? Can't find what you're looking for?