# Adaptive survey designs to minimize survey mode effects – a case study on the Dutch Labor Force Survey 3. An algorithm for solving the multi-mode optimization problem

In the previous section, we introduced the quality and cost functions and constructed a multi-mode optimization problem. The subpopulation comparability constraint, i.e., the upper limit to the maximum absolute difference between group method effects, makes the problem nonconvex and hard to solve. As a consequence, when trying to solve the multi-mode optimization problem, most general-purpose nonlinear solvers cannot do better than a local optimum. Therefore, the choice of starting points in the solvers plays an important role. As such, we propose a two-step approach. In the first step, we solve a linear programming problem (LP) that addresses the linear constraints (2.1), (2.5), (2.6) and (2.9) $-$ (2.10). In the second step, we use the optimal solution obtained in step 1 as a starting point for a local search algorithm to solve the nonconvex nonlinear problem (NNLP).

We reformulate the optimization problem to make it computationally more tractable. Since $\left|\text{\hspace{0.17em}}f\left(x\right)\text{\hspace{0.17em}}\right|=\mathrm{max}\left\{f\left(x\right)\mathrm{,}-f\left(x\right)\right\},$ we can rewrite the objective function via an additional variable $t$ and impose that $f\left(x\right)\le t$ and $-f\left(x\right)\le t.$ Clearly, $t$ has to be nonnegative. The constraints themselves do not change, they are simply replaced. The multi-mode optimization problem is given in (3.2).

We can derive the LP by removing the non-linear constraints on the comparability of method effects across subpopulations and by replacing the non-linear objective function by one of the linear constraints. We choose for minimization of costs as the LP objective. The resulting LP problem formulation is given by

$$\begin{array}{ll}\underset{p\left(s\mathrm{,}g\right)}{\text{minimize}}\hfill & {\displaystyle \sum _{s\mathrm{,}g}{N}_{g}}p\left(s\mathrm{,}g\right)c\left(s\mathrm{,}g\right)\hfill \\ \text{subjectto}\hfill & {\displaystyle \sum _{s\in {\mathcal{S}}^{R}}{N}_{g}}p\left(s\mathrm{,}g\right)\rho \left(s\mathrm{,}g\right)\ge {R}_{g}\mathrm{,}\text{\hspace{0.17em}}\forall g\in \mathcal{G}\hfill \\ \hfill & {\displaystyle \sum _{s\mathrm{,}g}{N}_{g}}p\left(s\mathrm{,}g\right)\le {S}_{\text{max}}\hfill \\ \hfill & 0\le p\left(s\mathrm{,}g\right)\le \mathrm{1,}\text{\hspace{0.17em}}\forall s\in \mathcal{S}\mathrm{,}\text{\hspace{0.17em}}g\in \mathcal{G}\hfill \\ \hfill & {\displaystyle \sum _{s\in \mathcal{S}}p\left(s\mathrm{,}g\right)}=1,\text{\hspace{0.17em}}\forall g\in \mathcal{G}\hfill \\ \hfill & {\displaystyle \sum _{s\in {\mathcal{S}}^{R}}p\left(s\mathrm{,}g\right)}\text{\hspace{0.17em}}\text{}\text{\hspace{0.17em}}0,\text{\hspace{0.17em}}\forall g\in \mathcal{G}\mathrm{.}\hfill \end{array}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}(3.1)$$

$$\begin{array}{ll}\text{Minimize}\hfill & t\hfill \\ \text{subjectto}\hfill & {\displaystyle \sum _{s\mathrm{,}g}}\frac{{w}_{g}p\left(s\mathrm{,}g\right)\rho \left(s\mathrm{,}g\right)D\left(s\mathrm{,}g\right)}{{\displaystyle \sum _{{s}^{\prime}\in {\mathcal{S}}^{R}}}p\left({s}^{\prime}\mathrm{,}g\right)\rho \left({s}^{\prime}\mathrm{,}g\right)}\le t\hfill \\ \hfill & -{\displaystyle \sum _{s\mathrm{,}g}}\frac{{w}_{g}p\left(s\mathrm{,}g\right)\rho \left(s\mathrm{,}g\right)D\left(s\mathrm{,}g\right)}{{\displaystyle \sum _{{s}^{\prime}\in {\mathcal{S}}^{R}}}p\left({s}^{\prime}\mathrm{,}g\right)\rho \left({s}^{\prime}\mathrm{,}g\right)}\le t\hfill \\ \hfill & {\displaystyle \sum _{s\mathrm{,}g}{N}_{g}}p\left(s\mathrm{,}g\right)c\left(s\mathrm{,}g\right)\le B\hfill \\ \hfill & {\displaystyle \sum _{s\in {\mathcal{S}}^{R}}}{N}_{g}p\left(s\mathrm{,}g\right)\rho \left(s\mathrm{,}g\right)\ge {R}_{g}\mathrm{,}\text{\hspace{0.17em}}\forall g\in \mathcal{G}\hfill \\ \hfill & \frac{{\displaystyle \sum _{s\in {\mathcal{S}}^{R}}}p\left(s\mathrm{,}g\right)\rho \left(s\mathrm{,}g\right)D\left(s\mathrm{,}g\right)}{{\displaystyle \sum _{s\in {\mathcal{S}}^{R}}}p\left(s\mathrm{,}g\right)\rho \left(s\mathrm{,}g\right)}-\frac{{\displaystyle \sum _{s\in {\mathcal{S}}^{R}}}p\left(s\mathrm{,}h\right)\rho \left(s\mathrm{,}h\right)D\left(s\mathrm{,}h\right)}{{\displaystyle \sum _{s\in {\mathcal{S}}^{R}}}p\left(s\mathrm{,}h\right)\rho \left(s\mathrm{,}h\right)}\le M\hfill \\ \hfill & {\displaystyle \sum _{s\mathrm{,}g}{N}_{g}}p\left(s\mathrm{,}g\right)\le {S}_{\text{max}}\hfill \\ \hfill & 0\le p\left(s\mathrm{,}g\right)\le \mathrm{1,}\text{\hspace{0.17em}}\forall s\in \mathcal{S}\mathrm{,}\text{\hspace{0.17em}}g\in \mathcal{G}\hfill \\ \hfill & {\displaystyle \sum _{s\in \mathcal{S}}p\left(s\mathrm{,}g\right)}=1,\text{\hspace{0.17em}}\forall g\in \mathcal{G}\hfill \\ \hfill & {\displaystyle \sum _{s\in {\mathcal{S}}^{R}}}p\left(s\mathrm{,}g\right)\text{\hspace{0.17em}}\text{}\text{\hspace{0.17em}}0,\text{\hspace{0.17em}}\forall g\in \mathcal{G}\hfill \\ \hfill & 0\le t\mathrm{.}\hfill \end{array}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}(3.2)$$

To solve the linear problem, we use the simplex method available in R in package $boot.$ Our proposed two-step algorithm thus handles (3.1) in the first step. Denote by ${x}_{\text{LP}}^{*}$ the optimal solution obtained in the LP. In the second step, ${x}_{\text{LP}}^{*}$ is submitted to a nonlinear optimization algorithm as a starting point in order to solve (3.2). For this step, we use nonlinear algorithms available in NLOPT (see Johnson 2013), an open-source library for nonlinear optimization that can be called from R through the $nloptr$ package. The NNLP second step of the algorithm is performed only if the minimal required budget found in the LP first step is smaller than or equal to the available budget $B.$ If the minimal budget is larger, then there is no feasible solution to the optimization problem.

Given that the performance of these algorithms is problem-dependent, we choose to combine two local search algorithms in order to increase the convergence speed. Global optimization algorithms are available in the NLOPT library but their performance for our problem was significantly worse than the selected local optimization algorithms. The two selected local search algorithms are COBYLA (Constrained Optimization by Linear Approximations), introduced by Powell (1998) (see Roy 2007 for an implementation in $\mathcal{C})$ and the Augmented Lagrangian Algorithm (AUGLAG), described in Conn, Gould and Toint (1991) and Birgin and Martinez (2008). The COBYLA method builds successive linear approximations of the objective function and constraints via a simplex of $n+1$ points (in $n$ dimensions), and optimizes these approximations in a trust region at each step. The AUGLAG method combines the objective function and the nonlinear constraints into a single function, i.e., the objective plus a penalty for any violated constraint. The resulting function is then passed to another optimization algorithm as an unconstrained problem. If the constraints are violated by the solution of this sub-problem, then the size of the penalties is increased and the process is repeated. Eventually, the process must converge to the desired solution, if that exists.

As local optimizer for the AUGLAG method we choose MMA (Method of Moving Asymptotes, introduced in Svanberg 2002), based on its performance for our numerical experiments. The strategy behind MMA is as follows. At each point ${x}_{k},$ MMA forms a local approximation, that is both convex and separable, using the gradient of $f\left({x}_{k}\right)$ and the constraint functions, plus a quadratic penalty term to make the approximations conservative, e.g., upper bounds for the exact functions. Optimizing the approximation leads to a new candidate point ${x}_{k+1}.$ If the constraints are met, then the process continues from the new point ${x}_{k+1},$ otherwise, the penalty term is increased and the process is repeated.

The reason for using two local search algorithms is that AUGLAG performs better in finding the neighborhood of the global optimum but COBYLA provides a greater accuracy in locating the optimum. Therefore, the LP optimal solution is first submitted to AUGLAG and after a number of iterations, when the improvement in the objective value is below a specified threshold, the current solution of AUGLAG is submitted to COBYLA for increased accuracy. For our case study, given the precision requirements of the obtained statistics in the survey (0.5%), the results are considered accurate enough if the obtained objective value is within ${10}^{-4}$ away from the global optimum. Any further accuracy gains are completely blurred by the sampling variation and accuracy of the input parameters themselves. The computational times can run up to a few hours. Since the optimization problem does not need to be solved during data collection, this will, however, not pose practical problems.

- Date modified: