# 4. Optimal solutions

Sun Woong Kim, Steven G. Heeringa and Peter W. Solenberger

Given the set of $L$ possible arrays in $\mathfrak{B}$ , consider the subset ${\mathfrak{B}}^{\text{'}}\text{\hspace{0.17em}}\left(\subseteq \mathfrak{B}\text{\hspace{0.17em}}\right)$ where

$p\left({B}_{k}\right)>0.$

A solution set to a controlled selection problem $A$ denoted as

$\left\{\left({B}_{k},p\left({B}_{k}\right)\right),\text{\hspace{0.17em}}\text{ }{B}_{k}\in {\mathfrak{B}}^{\text{'}}\right\}$

is the set of the arrays that have the required positive selection probabilities $\left(p\left({B}_{k}\right)>0\right)$ . This solution set, or simply a “solution” to the controlled selection problem, is usually obtained by an algorithm to control the constraints in (3.1) through (3.6). As described in the introduction, since Goodman and Kish (1950), many algorithms for obtaining solutions to controlled selection problems have been developed.

Until Groves and Hess (1975) suggested a computer algorithm, most solutions were manually obtained in a process that resembled solving a mathematical puzzle. Furthermore, for most problems it is possible that there is more than one solution set that meets the constraints. Since the 1980s, the computer-intensive controlled selection algorithms using transportation theory, network flow, integer programming, and LP have been developed. These algorithms may depend on highly specialized software or may be programmed to run in major software systems.

However, previous solutions ranging from manual to computer-intensive algorithms have rarely been compared empirically using a standard set of performance criteria. Therefore, we begin with the description of a concept called optimal solution sets, or more simply, optimal solutions.

The controlled selection problem $A$ is only one array, but there may be many possible arrays in $\mathfrak{B}$ . Also, only one array ${B}_{k}$ from any solution to $A$ is randomly chosen by $p\left({B}_{k}\right)$ as the basis for choosing the stratified sample. In general then, we might define an optimal solution as that satisfying the following requirements (R1 and R2):

R1. The solution is obtained based on appropriate and objective measurements of the closeness between $A$ and every single array ${B}_{k}$ in $\mathfrak{B}$ .

R2. The solution, as much as possible, maximizes the probabilities of selection over the arrays nearest to $A$ under such measurements as referenced in R1.

The remainder of this section will address how to specify R1 and R2 for optimal solutions. First, in order to define closeness in R1, a real number $d\left({B}_{k}:A\right)$ representing the distance between $A$ and ${B}_{k}$ , can be considered, where $d$ is a distance function that satisfies the following axioms:

(i)
(ii)
$d\left({B}_{k},A\right)=d\left(A,{B}_{k}\right)$ ;
(iii)

Axiom (iii) is termed the triangle inequality axiom. Distance functions satisfying (i), (ii), and (iii) can be defined by using the two-ordered $RC\text{-tuples}$ $\left({a}_{11},{a}_{12},\cdot \cdot \cdot ,{a}_{RC}\right)$ and $\left({b}_{11k},{b}_{12k},\cdot \cdot \cdot ,{b}_{RCk}\right)$ for $A$ and ${B}_{k}$ . We first define the ordinary or Euclidean distance (2-norm distance):

This function is probably the most familiar measure to define the distance between ${B}_{k}$ and $A$ .

Also, we can define the function called the Chebyshev distance (infinite norm distance):

These distance functions give rise to distinct distance spaces. Owing to (3.2), for any ${B}_{k}$ , the following holds.

$0\le {d}_{2}\left({B}_{k},A\right)<{\left(RC\right)}^{1/2}\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.3\right)$

and

$0\le {d}_{\infty }\left({B}_{k},A\right)<1.\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.4\right)$

For instance, for the $3×3$ array in Problem 2.1 and the $8×3$ array in Problem 2.3, $0<{d}_{2}\left({B}_{k},A\right)<3$ and $0<{d}_{2}\left({B}_{k},A\right)<4.9$ , respectively.

Second, as mentioned in R2, regarding the arrays nearest to $A$ under such measurements described in R1, consider the set of arrays in $\mathfrak{B}$ having the minimum $\text{\hspace{0.17em}}{d}_{2}$ or ${d}_{\infty }$ value from $A$ . Let ${\mathfrak{B}}_{2}\left(\subseteq {\mathfrak{B}}^{\prime }\right)$ be the set of the arrays having the minimum $\text{\hspace{0.17em}}{d}_{2}$ value from $A$ and ${\mathfrak{B}}_{\infty }\left(\subseteq {\mathfrak{B}}^{\prime }\right)$ be the set of the arrays having the minimum ${d}_{\infty }$ value from $A$ .

Assuming that all possible arrays in $\mathfrak{B}$ are known, we define optimum arrays as follows.

Definition. The arrays in ${\mathfrak{B}}_{2}\cup {\mathfrak{B}}_{\infty }$ are called the optimum arrays.

Note that in the new algorithm for controlled selection to be described in Section 6, $\text{\hspace{0.17em}}{d}_{2}$ or $\text{\hspace{0.17em}}{d}_{\infty }$ are chosen based on preference. We avoid defining the intersection of ${\mathfrak{B}}_{2}$ and ${\mathfrak{B}}_{\infty }$ as the optimum arrays because this may exclude the other arrays not in ${\mathfrak{B}}_{2}\cap {\mathfrak{B}}_{\infty }$ with the same minimum $\text{\hspace{0.17em}}{d}_{2}$ ( $\text{\hspace{0.17em}}{d}_{\infty }$ ) value. We illustrate below that there may exist a very small number of optimum arrays relative to the number of all possible arrays in $\mathfrak{B}$ for any $A$ . The details of how to find all possible arrays will be described in Section 6 and Section 7.

Illustrations.

For Problem 2.1 through Problem 2.4, it is noted that ${\mathfrak{B}}_{2}\subseteq {\mathfrak{B}}_{\infty }$ . Thus, it is possible to use ${d}_{\infty }$ only in illustrating the optimum arrays.

1. For Problem 2.1, there are six possible arrays satisfying (3.1), (3.2), (3.3), and (3.4). That is, $\mathfrak{B}=\left\{{B}_{k},\text{ }\text{\hspace{0.17em}}k=1,\dots ,6\right\}\text{ }\text{ }$ , as given in Table 4.1. There exists only one optimum array, ${B}_{2}$ , with the minimum value of ${d}_{\infty }=0.5$ .

Table 4.1
$3×3$ Controlled selection problem, optimum array with ${d}_{\infty }=0.5$ and the other arrays
Table summary
This table displays the results of $3×3$ Controlled selection problem. The information is grouped by $A$ , ${B}_{1}$, ${B}_{2}$, ${B}_{3}$, ${B}_{5}$, ${B}_{4}$, ${B}_{6}$(appearing as column headers).
$A$ ${B}_{1}$ ${B}_{2}$ ${B}_{3}$ ${B}_{4}$ ${B}_{5}$ ${B}_{6}$
$\begin{array}{ccc}0.8& 0.5& 0.7\\ 0.7& 0.8& 0.5\\ 0.5& 0.7& 0.8\end{array}$ $\begin{array}{ccc}0& 1& 1\\ 1& 0& 1\\ 1& 1& 0\end{array}$ $\begin{array}{ccc}1& 0& 1\\ 1& 1& 0\\ 0& 1& 1\end{array}$ $\begin{array}{ccc}1& 1& 0\\ 0& 1& 1\\ 1& 0& 1\end{array}$ $\begin{array}{ccc}0& 1& 1\\ 1& 1& 0\\ 1& 0& 1\end{array}$ $\begin{array}{ccc}1& 0& 1\\ 0& 1& 1\\ 1& 1& 0\end{array}$ $\begin{array}{ccc}1& 1& 0\\ 1& 0& 1\\ 0& 1& 1\end{array}$
${d}_{\infty }$ 0.8 0.5 0.7 0.8 0.8 0.8
1. Problem 2.2 has 30 possible arrays, and there are three optimum arrays, shown in Table 4.2

Table 4.2
$4×4$ Optimum arrays with ${d}_{\infty }=0.6$

$\begin{array}{cccc}0& 0& 1& 1\\ 1& 1& 0& 0\\ 0& 0& 1& 1\\ 1& 1& 0& 0\end{array}$
$\begin{array}{cccc}0& 1& 1& 0\\ 1& 0& 0& 1\\ 0& 0& 1& 1\\ 1& 1& 0& 0\end{array}$
$\begin{array}{cccc}0& 1& 1& 0\\ 1& 0& 1& 0\\ 1& 0& 0& 1\\ 0& 1& 0& 1\end{array}$
1. Problem 2.3 has 141 possible arrays. There are six optimum arrays, where each array has the same ${d}_{\infty }=0.6$ . One of them is given in Table 4.3.

Table 4.3
One of six optimum arrays with ${d}_{\infty }=0.6$

$\begin{array}{ccc}1& 2& 0\\ 1& 0& 1\\ 0& 0& 0\\ 1& 1& 0\\ 1& 1& 0\\ 0& 0& 1\\ 0& 0& 0\\ 0& 0& 0\end{array}$
1. There are 159 possible arrays for Problem 2.4, and there is only one optimum array given in Table 4.4.

Table 4.4
$5×5$ Optimum array with ${d}_{\infty }=0.517$

$\begin{array}{ccccc}2& 3& 1& 0& 0\\ 2& 1& 1& 1& 1\\ 0& 2& 2& 2& 1\\ 1& 0& 1& 3& 3\\ 1& 0& 2& 2& 5\end{array}$

Accordingly, based on the definition of optimum arrays as well as $\text{\hspace{0.17em}}{d}_{2}$ and ${d}_{\infty }$ satisfying the axioms (i), (ii), and (iii), we suggest the following specifications (S1 and S2) of R1 and R2 of optimal solutions:

S1. The solution is based on the values of the distance $\text{\hspace{0.17em}}{d}_{2}\left({d}_{\infty }\right)$ between $A$ and every single array ${B}_{k}$ in $\mathfrak{B}$ .

S2. The solution maximizes the probabilities of selection of optimum arrays.

S1 and S2 will be the rudiments of a new algorithm presented in Section 6, and in the next section we turn into the discussion on the previous algorithms from the viewpoint of optimal solutions.

Is something not working? Is there information outdated? Can't find what you're looking for?