Sun Woong Kim, Steven G. Heeringa and Peter W. Solenberger
In the term “controlled selection” (or “controlled sampling”), the word “control” has a broad meaning. The pioneering paper of Goodman and Kish (1950, p. 351) defined controlled selection as “...any process of selection in which, while maintaining the assigned probability for each unit, the probabilities of selection for some or all preferred combinations of n out of N units are larger than in stratified random sampling”.
This paper focuses on the controls required to decide the number of units (e.g., primary sampling units (PSUs)) allocated to each stratum cell in a two-way stratification design, where either the total number of units to be selected is smaller than the number of stratum cells or the expected number of units to be selected from each stratum cell is very small. We assume that, given the precision and cost constraints of the design, simply reducing the number of stratum cells or increasing the number of sampled units is not appropriate.
Here, controlled selection refers to the following two-stage procedure. First, the controlled selection problem, represented by a tabular array of real numbers formed by the two-way stratification design, is solved according to a specified algorithm (or technique). The solution is a set of feasible arrays, each with a nonnegative integer sample allocation to its cells, together with a probability of selection for each array. Second, one of the solution arrays is selected at random using the assigned probabilities. The integer in each cell of the selected array then serves as the number of sample units allocated to that cell of the two-way stratification. The key to controlled selection is the algorithm that defines a set of solution arrays achieving the controls required by the problem.
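To make the second stage concrete, the Python sketch below assumes that some algorithm has already produced a solution set. It checks that the selection probabilities sum to one and that the probability-weighted average of the integer arrays reproduces the original real-valued array. It then draws one array. The 2 × 3 array, the two solution arrays and their probabilities are a hypothetical toy example, not taken from the literature. The names `EXPECTED`, `SOLUTIONS`, `expected_allocation`, `is_valid_solution_set` and `select_array` are illustrative only. The sketch does not show how any particular algorithm finds the solution set.

```python
import random
from fractions import Fraction

# Hypothetical two-way problem: expected number of sample units per cell
# (rows = strata of one stratifier, columns = strata of the other).
EXPECTED = [
    [Fraction(3, 5), Fraction(2, 5), Fraction(1)],
    [Fraction(2, 5), Fraction(3, 5), Fraction(0)],
]

# Hypothetical solution set: integer allocation arrays paired with their
# selection probabilities, as a controlled selection algorithm might return.
SOLUTIONS = [
    ([[1, 0, 1], [0, 1, 0]], Fraction(3, 5)),
    ([[0, 1, 1], [1, 0, 0]], Fraction(2, 5)),
]


def expected_allocation(solutions):
    """Probability-weighted average of the solution arrays, cell by cell."""
    rows, cols = len(solutions[0][0]), len(solutions[0][0][0])
    total = [[Fraction(0)] * cols for _ in range(rows)]
    for array, prob in solutions:
        for i in range(rows):
            for j in range(cols):
                total[i][j] += prob * array[i][j]
    return total


def is_valid_solution_set(expected, solutions):
    """Probabilities must sum to 1 and reproduce every expected cell value."""
    if sum(p for _, p in solutions) != 1:
        return False
    return expected_allocation(solutions) == expected


def select_array(solutions, rng=random):
    """Second stage: draw one solution array with its assigned probability."""
    arrays = [a for a, _ in solutions]
    weights = [float(p) for _, p in solutions]
    return rng.choices(arrays, weights=weights, k=1)[0]


assert is_valid_solution_set(EXPECTED, SOLUTIONS)
chosen = select_array(SOLUTIONS, random.Random(2024))
print(chosen)  # integer number of sample units to allocate to each cell
```

Exact fractions are used so that the consistency check compares cell expectations without floating-point rounding error. Only the final random draw converts the probabilities to floats.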
Many controlled selection techniques have been developed since Goodman and Kish (1950) first described the application of controlled selection to a specific problem of choosing 17 PSUs to represent the North Central States of the United States. Bryant, Hartley and Jessen (1960) proposed a simple method that was applicable only in a limited number of sampling situations. Raghunandanan and Bryant (1971) generalized their method, and Chernick and Wright (1983) suggested an alternative. Jessen (1970) proposed two methods, called “method 2” and “method 3”, both of which are quite complicated to implement and sometimes fail to provide a solution. Jessen (1978, chapter 11) introduced a simpler algorithm for solving controlled selection problems.
Hess, Riedel and Fitzpatrick (1975) gave a detailed explanation of how to use controlled selection to select a representative sample of Michigan’s hospitals. Groves and Hess (1975) first suggested a formal computer algorithm for obtaining solutions to controlled selection problems with two- and three-way stratification. Heeringa and Hess (1983) reported the answer to Roe Goodman’s question: How does a computer solution of highly controlled selection compare with a manual solution? The answer was “For the same sample design, computer generated controlled selection often leads to slightly higher variances than does manual controlled selection; but since the differences in precision are small and manual controlled selection is laborious, computer generated controlled selection is preferred.” Lin (1992) improved the algorithm of Groves and Hess (1975), and software for this algorithm, called “PCCONSEL”, was presented by Heeringa (1998). Huang and Lin (1998) proposed a more efficient algorithm that imposes additional constraints on the controlled selection problem with two-way stratification and can be solved with any standard network flow software package. Hess and Heeringa (2002) summarized 40 years of investigations into controlled selection at the Survey Research Center, University of Michigan.
Taking a different approach, Causey, Cox and Ernst (1985) proposed an algorithm that applies a transportation model to controlled selection problems with two-way stratification, based on theory originally developed in an earlier paper by Cox and Ernst (1982). Winkler (2001) developed an integer programming algorithm quite similar to that of Causey et al. (1985). Deville and Tillé (2004) suggested an algorithm called the Cube method.
Following Rao and Nigam (1990, 1992), Sitter and Skinner (1994) applied a linear programming (LP) approach to solve controlled selection problems. Later, Tiwari and Nigam (1998) proposed an LP method that reduces the probabilities of selecting non-preferred samples.
In summary, many different algorithms for controlled selection have been investigated and described in the literature. The most recently developed ones are especially computer-intensive and depend heavily on specialized software and high-speed computers. However, despite this evolution of algorithms over about 60 years, a question remains: in what sense are the solutions to a controlled selection problem obtained from these algorithms optimal?
In this paper, we define the two-way controlled selection problem in Section 2 and revisit several problems of this type from the historical literature. In Section 3, we present the desirable constraints. In Section 4, we introduce our concept of optimal solutions to controlled selection problems. In Section 5, we describe weaknesses of the previous algorithms. In Section 6, we propose a new algorithm that uses the LP approach to achieve optimal solutions, and in Section 7 we present new publicly available software implementing it. In Section 8, to demonstrate the robustness of the new algorithm, we apply it to several example controlled selection problems and compare the results with those obtained using existing algorithms. We conclude in Section 9.