Linearly Constrained Optimization
Mathwrist White Paper Series

Copyright © Mathwrist LLC 2023
(January 20, 2023)
Abstract

This document presents a technical overview of the linearly constrained (LC) optimization feature in Mathwrist's C++ Numerical Programming Library (NPL). The objective is to minimize general n-dimensional smooth functions subject to linear and simple bound constraints. In Mathwrist NPL, we provide several linearly constrained optimization solvers that combine the active-set algorithm with line search methods.

1 Introduction

Let $\psi(\mathbf{x}): \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function with gradient $\mathbf{g}(\mathbf{x})$ and Hessian $\mathbf{H}(\mathbf{x})$. We want to solve the following optimization problem subject to a set of general linear constraints and simple bound constraints,

$$\min_{\mathbf{x}\in\mathbb{R}^n} \psi(\mathbf{x}), \quad \text{s.t. } \mathbf{b}_l \le \mathbf{A}\mathbf{x} \le \mathbf{b}_u \text{ and } \mathbf{x}_l \le \mathbf{x} \le \mathbf{x}_u \qquad (1)$$

As we discussed in our Linear Programming (LP) and Quadratic Programming (QP) white papers, we rely on an active-set based framework to handle general linear constraints and simple bounds. The formulation (1) of linearly constrained (LC) optimization problems differs from an LP or QP problem only in the form of the objective function.

The NPL LC optimization solvers naturally extend this active-set framework, customizing it to compute descent directions with line search methods. Like the LP and QP solvers, the LC solvers directly accept constraint specifications in formulation (1). For brevity and without loss of generality, the discussion hereinafter assumes a canonical formulation,

$$\min_{\mathbf{x}\in\mathbb{R}^n} \psi(\mathbf{x}), \quad \text{s.t. } \mathbf{A}\mathbf{x} \ge \mathbf{b} \qquad (2)$$

Note that our implementation does not convert formulation (1) to the canonical form (2). Please refer to our LP and QP white papers for the details of the active-set method, i.e. how simple bounds and upper-bounded linear constraints are handled. For the details of the line search method, users can refer to our unconstrained optimization (UC) white paper.

We will use the same notation as our LP and QP white papers. In particular, $\mathcal{W}$ and $\mathcal{N}$ denote the index sets of working and non-working constraints respectively. $\mathcal{E}$ denotes the subset of equality constraints in the LC problem (2). By construction, $\mathcal{E} \subseteq \mathcal{W}$. $\mathbf{Y}$ and $\mathbf{Z}$ represent the range-space and null-space matrices with respect to the active constraints $\mathbf{A}_\mathcal{W}$.

2 Optimality Conditions

The solution $\mathbf{x}^*$ of an LC problem satisfies the following necessary and sufficient conditions:

  1. $\mathbf{x}^*$ is feasible: $\mathbf{A}_\mathcal{W}\mathbf{x}^* = \mathbf{b}_\mathcal{W}$, $\mathbf{A}_\mathcal{N}\mathbf{x}^* \ge \mathbf{b}_\mathcal{N}$;

  2. $\mathbf{g}(\mathbf{x}^*) = \mathbf{A}_\mathcal{W}^T\boldsymbol{\lambda}$, or equivalently $\mathbf{Z}^T\mathbf{g}(\mathbf{x}^*) = 0$;

  3. $\mathbf{Z}^T\mathbf{H}(\mathbf{x}^*)\mathbf{Z}$ is positive semi-definite (necessary) or positive definite (sufficient);

  4. the Lagrange multipliers satisfy $\lambda_i \ge 0$ (necessary) or $\lambda_i > 0$ (sufficient), $\forall i \in \mathcal{W} \setminus \mathcal{E}$.

Once an initial feasible point $\mathbf{x}_0$ is found, the active-set method automatically ensures that all subsequent points $\mathbf{x}_k$ are feasible as well. The second condition is the stationarity condition; effectively, it requires the reduced gradient to vanish,

$$\tilde{\mathbf{g}}(\mathbf{x}^*) = \mathbf{Z}^T\mathbf{g}(\mathbf{x}^*) = 0.$$

The third condition requires the reduced Hessian

$$\tilde{\mathbf{H}}(\mathbf{x}^*) = \mathbf{Z}^T\mathbf{H}(\mathbf{x}^*)\mathbf{Z}$$

to be at least positive semi-definite; this is also called the curvature condition. The stationarity and curvature conditions together determine whether we can find a feasible descent direction with respect to the current working set $\mathcal{W}$.

The last condition is the Lagrange multiplier condition. If $\lambda_i < 0$, one can show that we can delete $\mathbf{A}_i$ from the current working set and obtain a feasible descent direction with respect to the updated working set.
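As an illustration, the following minimal C++ sketch (using the Eigen linear algebra library; the function name and interface are our own choices, not the NPL API) picks the working inequality constraint with the most negative multiplier as the candidate to delete.

    #include <Eigen/Dense>

    // Sketch (not NPL code): at a stationary point, find the working
    // inequality constraint with the most negative Lagrange multiplier;
    // deleting it from W yields a feasible descent direction w.r.t. the
    // updated working set. Returns -1 if all multipliers pass the test.
    int constraint_to_delete(const Eigen::VectorXd& lambda_ineq, double eps)
    {
        int worst = -1;
        double most_negative = -eps;          // tolerance for "negative"
        for (int i = 0; i < lambda_ineq.size(); ++i) {
            if (lambda_ineq(i) < most_negative) {
                most_negative = lambda_ineq(i);
                worst = i;
            }
        }
        return worst;
    }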

3 Review of Active-set Workflow

Given an initial guess $\mathbf{x}$, we use the "crash start" procedure in [2] to find an initial feasible point $\mathbf{x}_0$ and, at the same time, identify an initial working set of constraints. This is the phase-I feasibility stage. Next, the algorithm enters iterations that repeatedly move along a feasible descent direction $\mathbf{p}$, update the working set and test the optimality conditions.

At the $k$-th iteration, if $\mathbf{x}_k$ is not a stationary point with respect to the current working set, or if $\tilde{\mathbf{H}}(\mathbf{x}_k)$ does not have positive curvature, a descent direction $\mathbf{p}$ is computed as a null-space direction $\mathbf{p} = \mathbf{Z}\mathbf{p}_z$ for some $\mathbf{p}_z$. Moving along $\mathbf{p}$ with a step length $\alpha$, we first hit some non-working constraint $i \in \mathcal{N}$, at which point equality holds: $\mathbf{A}_i(\mathbf{x}_k + \alpha\mathbf{p}) = \mathbf{b}_i$. Constraint $i$ is then added to the working set $\mathcal{W}$, and we update $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha\mathbf{p}$.

If $\mathbf{x}_k$ is a stationary point with positive curvature, the optimality conditions are tested. A working constraint that violates the Lagrange multiplier condition is moved from $\mathcal{W}$ to $\mathcal{N}$. At such an iteration only the working set is updated, and we start the $(k+1)$-th iteration with $\mathbf{x}_{k+1} = \mathbf{x}_k$. Whenever the working set changes, i.e. constraints are added to or removed from $\mathcal{W}$, $\mathbf{Y}$ and $\mathbf{Z}$ are efficiently updated by a low-rank matrix update scheme instead of being refactorized from scratch.
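The step at which we first hit a non-working constraint is found by a ratio test over the rows of $\mathbf{A}_\mathcal{N}$. Below is a minimal C++ sketch under the canonical form (2), again Eigen-based and with hypothetical names of our own rather than NPL code.

    #include <Eigen/Dense>
    #include <utility>

    // Ratio test (sketch, not NPL code): given a feasible x, a search
    // direction p and the non-working rows A_N, b_N of A x >= b, return
    // the largest step alpha <= alpha_max that keeps every non-working
    // constraint feasible, plus the index of the first blocking
    // constraint (-1 if no constraint blocks within alpha_max).
    std::pair<double, int>
    max_feasible_step(const Eigen::VectorXd& x, const Eigen::VectorXd& p,
                      const Eigen::MatrixXd& A_N, const Eigen::VectorXd& b_N,
                      double alpha_max)
    {
        double alpha = alpha_max;
        int blocking = -1;
        for (int i = 0; i < A_N.rows(); ++i) {
            double slope = A_N.row(i).dot(p);
            if (slope < 0.0) {                    // moving toward this boundary
                double step = (A_N.row(i).dot(x) - b_N(i)) / (-slope);
                if (step < alpha) { alpha = step; blocking = i; }
            }
        }
        return {alpha, blocking};
    }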

4 Direction from Line Search

For any search direction $\mathbf{p}$ and step length $\hat{\alpha}$, by the Taylor expansion of $\psi(\mathbf{x})$ around $\mathbf{x}_k$,

$$\psi(\mathbf{x}_k + \hat{\alpha}\mathbf{p}) = \psi(\mathbf{x}_k) + \hat{\alpha}\,\mathbf{p}^T\mathbf{g}(\mathbf{x}_k) + \frac{1}{2}\hat{\alpha}^2\,\mathbf{p}^T\mathbf{H}(\hat{\mathbf{x}})\mathbf{p},$$

where

$$\hat{\mathbf{x}} = \mathbf{x}_k + \hat{\alpha}\theta\mathbf{p}, \quad 0 \le \theta \le 1.$$

We always move in a null-space direction, $\mathbf{p} = \mathbf{Z}\mathbf{p}_z$. The above expansion can then be written as

$$\psi(\mathbf{x}_k + \hat{\alpha}\mathbf{p}) = \psi(\mathbf{x}_k) + \hat{\alpha}\,\mathbf{p}_z^T\tilde{\mathbf{g}}(\mathbf{x}_k) + \frac{1}{2}\hat{\alpha}^2\,\mathbf{p}_z^T\tilde{\mathbf{H}}(\hat{\mathbf{x}})\mathbf{p}_z, \qquad (3)$$

where

$\tilde{\mathbf{g}}(\mathbf{x}_k) = \mathbf{Z}^T\mathbf{g}(\mathbf{x}_k)$ is the reduced gradient, and  (4)
$\tilde{\mathbf{H}}(\hat{\mathbf{x}}) = \mathbf{Z}^T\mathbf{H}(\hat{\mathbf{x}})\mathbf{Z}$ is the reduced Hessian.  (5)

At a non-stationary point, we are able to compute a descent direction such that the linear term $\mathbf{p}_z^T\tilde{\mathbf{g}}(\mathbf{x}_k) < 0$. However, when the reduced Hessian has positive curvature, the quadratic term in equation (3) will dominate if $\hat{\alpha}$ is too large. We use a line search method to compute an acceptable step length $\hat{\alpha}$, which serves as an upper bound on the actual step length $\alpha$.
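The line search itself is covered in the UC white paper. As one concrete possibility, a basic backtracking search enforcing the Armijo sufficient-decrease condition could look like the following sketch (Eigen-based, our own illustration rather than the NPL line search; psi is a callable evaluating $\psi$).

    #include <Eigen/Dense>
    #include <functional>

    // Backtracking line search sketch (not the NPL line search): shrink
    // alpha until the Armijo sufficient-decrease condition
    //   psi(x + alpha p) <= psi(x) + c1 * alpha * g(x)^T p
    // holds.
    double armijo_step(const std::function<double(const Eigen::VectorXd&)>& psi,
                       const Eigen::VectorXd& x, const Eigen::VectorXd& p,
                       const Eigen::VectorXd& grad_x,
                       double alpha0 = 1.0, double c1 = 1e-4,
                       double shrink = 0.5)
    {
        const double f0 = psi(x);
        const double slope = grad_x.dot(p);   // negative for a descent direction
        double alpha = alpha0;
        for (int i = 0;
             i < 60 && psi(x + alpha * p) > f0 + c1 * alpha * slope; ++i)
            alpha *= shrink;                  // reject and shrink
        return alpha;
    }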

Mathwrist NPL offers two LC solver classes,

  1. class GS_ActiveSetNewton

  2. class GS_ActiveSetQuasiNewton

Both solvers are hierarchically derived from the active-set framework base class. Both use the line search method to compute the step length $\hat{\alpha}$; they differ from each other in how the step direction $\mathbf{p}$ is computed.

4.1 Modified Newton Method

In the LC solver class GS_ActiveSetNewton, we compute the null-space direction component $\mathbf{p}_z$ by solving

$$\tilde{\mathbf{H}}(\mathbf{x}_k)\,\mathbf{p}_z = -\tilde{\mathbf{g}}(\mathbf{x}_k) \qquad (6)$$

A modified Cholesky decomposition is applied to the reduced Hessian to solve this linear equation. Whenever possible, we use an efficient low-rank update scheme to update the Cholesky factor instead of performing a full factorization [3].

As a by-product, the modified Cholesky decomposition also identifies whether the reduced Hessian $\tilde{\mathbf{H}}(\mathbf{x}_k)$ is positive definite. At a stationary point, if $\tilde{\mathbf{H}}(\mathbf{x}_k)$ has negative curvature, the method allows us to compute a negative curvature direction. This is an advantage over the quasi-Newton method.

On the other hand, the disadvantage of the modified Newton method is that we need to explicitly compute the Hessian matrix $\mathbf{H}(\mathbf{x}_k)$. NPL users have an option to let the LC solver numerically approximate $\mathbf{H}(\mathbf{x}_k)$ by finite differences. This option is controlled by the last parameter in the constructor of class GS_ActiveSetNewton. When enabled, $\mathbf{H}(\mathbf{x}_k)$ is approximated by the function FunctionND::hess_fwd_diff_grad().
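As a much-simplified stand-in for the modified Cholesky factorization of [3] (the actual algorithm is more refined), the following Eigen-based sketch shifts the reduced Hessian by a multiple of the identity until a plain Cholesky succeeds, then solves equation (6); it is our illustration, not NPL code.

    #include <Eigen/Dense>

    // Simplified stand-in (not the Gill-Murray method of [3], not NPL
    // code): make the reduced Hessian positive definite by adding
    // tau * I until a plain Cholesky succeeds, then solve
    // H~ p_z = -g~ for the modified Newton direction.
    Eigen::VectorXd modified_newton_direction(const Eigen::MatrixXd& H_red,
                                              const Eigen::VectorXd& g_red)
    {
        double tau = 0.0;
        const double beta = 1e-3;
        Eigen::LLT<Eigen::MatrixXd> llt;
        for (;;) {
            llt.compute(H_red + tau * Eigen::MatrixXd::Identity(
                                    H_red.rows(), H_red.cols()));
            if (llt.info() == Eigen::Success) break;   // positive definite
            tau = (tau == 0.0) ? beta : 10.0 * tau;    // increase the shift
        }
        return llt.solve(-g_red);
    }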

4.2 Quasi-Newton Method

In the LC solver class GS_ActiveSetQuasiNewton, we compute the null-space direction component $\mathbf{p}_z$ by solving

$$\mathbf{Z}^T\mathbf{B}_k\mathbf{Z}\,\mathbf{p}_z = -\tilde{\mathbf{g}}(\mathbf{x}_k) \qquad (7)$$

where $\mathbf{B}_k$ is a quasi-Newton approximation to the true Hessian matrix $\mathbf{H}(\mathbf{x}_k)$, obtained from $\mathbf{B}_{k-1}$ by a BFGS update. We can also define the reduced Hessian approximation $\tilde{\mathbf{B}}_k = \mathbf{Z}^T\mathbf{B}_k\mathbf{Z}$; a Cholesky decomposition of $\tilde{\mathbf{B}}_k$ is applied when we solve for $\mathbf{p}_z$.
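The BFGS update itself is not spelled out in this paper; for reference, its standard form (see, e.g., [1]) is

$$\mathbf{B}_k = \mathbf{B}_{k-1} - \frac{\mathbf{B}_{k-1}\mathbf{s}\mathbf{s}^T\mathbf{B}_{k-1}}{\mathbf{s}^T\mathbf{B}_{k-1}\mathbf{s}} + \frac{\mathbf{y}\mathbf{y}^T}{\mathbf{y}^T\mathbf{s}}, \quad \mathbf{s} = \mathbf{x}_k - \mathbf{x}_{k-1}, \ \mathbf{y} = \mathbf{g}(\mathbf{x}_k) - \mathbf{g}(\mathbf{x}_{k-1}),$$

which keeps $\mathbf{B}_k$ symmetric and, provided $\mathbf{y}^T\mathbf{s} > 0$, positive definite.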

5 Box Constrained Problems

A special case of the linearly constrained optimization problem arises when only simple bounds are imposed on the unknown variable $\mathbf{x}$,

$$\min_{\mathbf{x}\in\mathbb{R}^n} \psi(\mathbf{x}), \quad \text{s.t. } \mathbf{l} \le \mathbf{x} \le \mathbf{u}, \qquad (8)$$

which is also called a box-constrained optimization problem. The overall logic for solving box-constrained problems still follows an active-set strategy, but in this situation the working set $\mathcal{W}$ simply holds the subset of variables that are fixed at their bounds, and the non-working set $\mathcal{N}$ holds the free variables.

Searching along a null-space direction $\mathbf{p}$ now amounts to searching in the subspace of free variables. We use two separate solver classes, GS_SimpleBoundNewton and GS_SimpleBoundQuasiNewton, to handle this special case more efficiently. Once a descent search direction $\mathbf{p}$ is computed, we still use a line search method to compute the step-length upper bound $\hat{\alpha}$.

The special-case solver class GS_SimpleBoundNewton computes the null-space direction component $\mathbf{p}_z$ in the same way as the general case in equation (6). Here the reduced gradient and reduced Hessian are simply the free-variable parts of the gradient $\mathbf{g}(\mathbf{x}_k)$ and Hessian $\mathbf{H}(\mathbf{x}_k)$. A modified Cholesky decomposition is used to solve the linear equation.

The special-case solver class GS_SimpleBoundQuasiNewton computes $\mathbf{p}_z$ as a quasi-Newton direction (7). Again, $\mathbf{B}_k$ is a Hessian approximation maintained by BFGS updates, and the reduced Hessian approximation $\tilde{\mathbf{B}}_k$ is simply the free-variable part of $\mathbf{B}_k$.
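To make the "free-variable part" concrete, here is a minimal Eigen-based sketch (our own helper, not NPL code) that extracts the reduced gradient and Hessian for a given set of free-variable indices.

    #include <Eigen/Dense>
    #include <vector>

    // Sketch (not NPL code): for box constraints, the "reduced" gradient
    // and Hessian are just the entries of g and the rows/columns of H
    // that correspond to free variables, i.e. variables not fixed at a
    // bound.
    void reduce_to_free(const Eigen::VectorXd& g, const Eigen::MatrixXd& H,
                        const std::vector<int>& free_idx,
                        Eigen::VectorXd& g_red, Eigen::MatrixXd& H_red)
    {
        const int m = static_cast<int>(free_idx.size());
        g_red.resize(m);
        H_red.resize(m, m);
        for (int i = 0; i < m; ++i) {
            g_red(i) = g(free_idx[i]);
            for (int j = 0; j < m; ++j)
                H_red(i, j) = H(free_idx[i], free_idx[j]);
        }
    }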

6 Termination Control

All four LC solvers

  • GS_ActiveSetNewton

  • GS_ActiveSetQuasiNewton

  • GS_SimpleBoundNewton

  • GS_SimpleBoundQuasiNewton

are indirectly derived from the IterativeMethod base class. They all share the maximum-iteration control and the convergence tolerance control defined in the IterativeMethod base class. Let $\epsilon_c$ be the convergence tolerance. At a stationary point, we test the Lagrange multiplier condition by $\lambda_i \ge -\epsilon_c$.

In addition, all four solvers use a stationarity tolerance control $\epsilon_s$ to determine whether the current iterate is at a stationary point. If the max norm of the reduced gradient satisfies $\|\tilde{\mathbf{g}}(\mathbf{x}_k)\|_\infty < \epsilon_s$, or if $|\mathbf{g}(\mathbf{x}_k)^T\mathbf{p}| < \epsilon_s$, we consider $\mathbf{x}_k$ to be at a stationary point with respect to the current working set of constraints.
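A literal transcription of these two stationarity tests into C++ (Eigen-based sketch, not NPL code) follows.

    #include <Eigen/Dense>
    #include <cmath>

    // Stationarity tests sketch (not NPL code): g_red is the reduced
    // gradient Z^T g(x_k), p the current search direction. Either test
    // passing marks x_k as stationary w.r.t. the current working set.
    bool at_stationary_point(const Eigen::VectorXd& g_red,
                             const Eigen::VectorXd& g,
                             const Eigen::VectorXd& p, double eps_s)
    {
        return g_red.lpNorm<Eigen::Infinity>() < eps_s
            || std::abs(g.dot(p)) < eps_s;
    }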

References

  • [1] Jorge Nocedal and Stephen J. Wright: Numerical Optimization, Springer, 1999.
  • [2] Philip E. Gill, Walter Murray and Margaret H. Wright: Practical Optimization, Academic Press, 1981.
  • [3] Philip E. Gill and Walter Murray: Newton-Type Methods for Unconstrained and Linearly Constrained Optimization, Mathematical Programming, Volume 7, 1974, pages 311-350.
  • [4] Philip E. Gill, G. H. Golub, Walter Murray and Michael A. Saunders: Methods for Modifying Matrix Factorizations, Mathematics of Computation, Volume 28, Number 126, April 1974, pages 505-535.
  • [5] Philip E. Gill, Walter Murray and Michael A. Saunders: Methods for Computing and Modifying the LDV Factors of a Matrix, Mathematics of Computation, Volume 29, Number 132, October 1975, pages 1051-1077.
  • [6] Anders Forsgren, Philip E. Gill and Walter Murray: Computing Modified Newton Directions Using a Partial Cholesky Factorization, SIAM Journal on Scientific Computing, Volume 16, Number 1, pages 139-150.