Linearly Constrained Optimization
Mathwrist White Paper Series

Copyright © Mathwrist LLC 2023
(January 20, 2023)
Abstract

This document presents a technical overview of the linearly constrained (LC) optimization feature in Mathwrist's C++ Numerical Programming Library (NPL). The objective is to minimize general n-dimensional smooth functions subject to linear and simple bound constraints. In Mathwrist NPL, we provide several linearly constrained optimization solvers that combine the active-set algorithm with line search methods.

1 Introduction

Let $\psi(\mathbf{x}): \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function with gradient $\mathbf{g}(\mathbf{x})$ and Hessian $\mathbf{H}(\mathbf{x})$. We want to solve the following optimization problem subject to a set of general linear constraints and simple bound constraints,

$$\min_{\mathbf{x}\in\mathbb{R}^n} \psi(\mathbf{x}), \quad \text{s.t. } \mathbf{b}_l \le \mathbf{A}\mathbf{x} \le \mathbf{b}_u \text{ and } \mathbf{x}_l \le \mathbf{x} \le \mathbf{x}_u \qquad (1)$$

As we discussed in our Linear Programming (LP) and Quadratic Programming (QP) white papers, we rely on an active-set based framework to handle general linear constraints and simple bounds. The formulation (1) of linearly constrained (LC) optimization problems differs from an LP or QP problem only in the form of the objective function.

The NPL LC optimization solvers naturally extend this active-set framework, customizing it to compute descent directions with line search methods. Like the LP and QP solvers, the LC solvers directly accept constraint specifications in formulation (1). For brevity and without loss of generality, the discussion hereinafter assumes a canonical formulation,

$$\min_{\mathbf{x}\in\mathbb{R}^n} \psi(\mathbf{x}), \quad \text{s.t. } \mathbf{A}\mathbf{x} \ge \mathbf{b} \qquad (2)$$

Note that our implementation does not convert formulation (1) to the canonical form (2). Please refer to our LP and QP white papers for the details of the active-set method, i.e. how simple bounds and upper-bounded linear constraints are handled. For the details of the line search method, users can refer to our unconstrained optimization (UC) white paper.

We will use the same notation as our LP and QP white papers. In particular, $\mathcal{W}$ and $\mathcal{N}$ denote the index sets of working and non-working constraints respectively. $\mathcal{E}$ denotes the subset of equality constraints in the LC problem (2). By construction, $\mathcal{E} \subseteq \mathcal{W}$. $\mathbf{Y}$ and $\mathbf{Z}$ represent the range-space and null-space matrices with respect to the active constraints $\mathbf{A}_\mathcal{W}$.

2 Optimality Conditions

The solution $\mathbf{x}^*$ of an LC problem satisfies the following necessary and sufficient conditions:

  1. $\mathbf{x}^*$ is feasible: $\mathbf{A}_\mathcal{W}\mathbf{x}^* = \mathbf{b}_\mathcal{W}$, $\mathbf{A}_\mathcal{N}\mathbf{x}^* \ge \mathbf{b}_\mathcal{N}$;

  2. $\mathbf{g}(\mathbf{x}^*) = \mathbf{A}_\mathcal{W}^T\boldsymbol{\lambda}$, or equivalently $\mathbf{Z}^T\mathbf{g}(\mathbf{x}^*) = 0$;

  3. $\mathbf{Z}^T\mathbf{H}(\mathbf{x}^*)\mathbf{Z}$ is positive semi-definite (necessary) or positive definite (sufficient);

  4. the Lagrange multipliers satisfy $\lambda_i \ge 0$ (necessary) or $\lambda_i > 0$ (sufficient), $\forall i \in \mathcal{W} \setminus \mathcal{E}$.

Once an initial feasible point $\mathbf{x}_0$ is found, the active-set method automatically ensures that all subsequent points $\mathbf{x}_k$ are feasible as well. The second condition is the stationarity condition; effectively, it requires the reduced gradient to vanish,

$$\tilde{\mathbf{g}}(\mathbf{x}^*) = \mathbf{Z}^T\mathbf{g}(\mathbf{x}^*) = 0.$$

The third condition requires the reduced Hessian

$$\tilde{\mathbf{H}}(\mathbf{x}^*) = \mathbf{Z}^T\mathbf{H}(\mathbf{x}^*)\mathbf{Z}$$

to be at least positive semi-definite; this is also called the curvature condition. The stationarity and curvature conditions together determine whether we can find a feasible descent direction with respect to the current working set $\mathcal{W}$.

The last condition is the Lagrange multiplier condition. If $\lambda_i < 0$, one can show that we can delete $\mathbf{A}_i$ from the current working set and obtain a feasible descent direction with respect to the updated working set.
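As an illustration, the following minimal C++ sketch (using the Eigen linear algebra library; the function name and interface are our own choices, not the NPL API) picks the working inequality constraint with the most negative multiplier as the candidate to delete.

    #include <Eigen/Dense>

    // Sketch (not NPL code): at a stationary point, find the working
    // inequality constraint with the most negative Lagrange multiplier;
    // deleting it from W yields a feasible descent direction w.r.t. the
    // updated working set. Returns -1 if all multipliers pass the test.
    int constraint_to_delete(const Eigen::VectorXd& lambda_ineq, double eps)
    {
        int worst = -1;
        double most_negative = -eps;          // tolerance for "negative"
        for (int i = 0; i < lambda_ineq.size(); ++i) {
            if (lambda_ineq(i) < most_negative) {
                most_negative = lambda_ineq(i);
                worst = i;
            }
        }
        return worst;
    }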

3 Review of Active-set Workflow

Given an initial guess $\mathbf{x}$, we use the "crash start" procedure in [2] to find an initial feasible point $\mathbf{x}_0$ and, at the same time, identify an initial working set of constraints. This is the phase-I feasibility stage. Next, the algorithm enters iterations that repeatedly move along a feasible descent direction $\mathbf{p}$, update the working set and test the optimality conditions.

At the $k$-th iteration, if $\mathbf{x}_k$ is not a stationary point with respect to the current working set, or if $\tilde{\mathbf{H}}(\mathbf{x}_k)$ does not have positive curvature, a descent direction $\mathbf{p}$ is computed as a null-space direction $\mathbf{p} = \mathbf{Z}\mathbf{p}_z$ for some $\mathbf{p}_z$. Moving along $\mathbf{p}$ with a step length $\alpha$, we first hit some non-working constraint $i \in \mathcal{N}$, at which point equality holds: $\mathbf{A}_i(\mathbf{x}_k + \alpha\mathbf{p}) = \mathbf{b}_i$. Constraint $i$ is then added to the working set $\mathcal{W}$, and we update $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha\mathbf{p}$.

If $\mathbf{x}_k$ is a stationary point with positive curvature, the optimality conditions are tested. A working constraint that violates the Lagrange multiplier condition is moved from $\mathcal{W}$ to $\mathcal{N}$. At such an iteration only the working set is updated, and we start the $(k+1)$-th iteration with $\mathbf{x}_{k+1} = \mathbf{x}_k$. Whenever the working set changes, i.e. constraints are added to or removed from $\mathcal{W}$, $\mathbf{Y}$ and $\mathbf{Z}$ are efficiently updated by a low-rank matrix update scheme instead of being refactorized from scratch.
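The step at which we first hit a non-working constraint is found by a ratio test over the rows of $\mathbf{A}_\mathcal{N}$. Below is a minimal C++ sketch under the canonical form (2), again Eigen-based and with hypothetical names of our own rather than NPL code.

    #include <Eigen/Dense>
    #include <utility>

    // Ratio test (sketch, not NPL code): given a feasible x, a search
    // direction p and the non-working rows A_N, b_N of A x >= b, return
    // the largest step alpha <= alpha_max that keeps every non-working
    // constraint feasible, plus the index of the first blocking
    // constraint (-1 if no constraint blocks within alpha_max).
    std::pair<double, int>
    max_feasible_step(const Eigen::VectorXd& x, const Eigen::VectorXd& p,
                      const Eigen::MatrixXd& A_N, const Eigen::VectorXd& b_N,
                      double alpha_max)
    {
        double alpha = alpha_max;
        int blocking = -1;
        for (int i = 0; i < A_N.rows(); ++i) {
            double slope = A_N.row(i).dot(p);
            if (slope < 0.0) {                    // moving toward this boundary
                double step = (A_N.row(i).dot(x) - b_N(i)) / (-slope);
                if (step < alpha) { alpha = step; blocking = i; }
            }
        }
        return {alpha, blocking};
    }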

4 Direction from Line Search

For any search direction $\mathbf{p}$ and step length $\hat{\alpha}$, by the Taylor expansion of $\psi(\mathbf{x})$ around $\mathbf{x}_k$,

$$\psi(\mathbf{x}_k + \hat{\alpha}\mathbf{p}) = \psi(\mathbf{x}_k) + \hat{\alpha}\,\mathbf{p}^T\mathbf{g}(\mathbf{x}_k) + \frac{1}{2}\hat{\alpha}^2\,\mathbf{p}^T\mathbf{H}(\hat{\mathbf{x}})\mathbf{p},$$

where

$$\hat{\mathbf{x}} = \mathbf{x}_k + \hat{\alpha}\theta\mathbf{p}, \quad 0 \le \theta \le 1.$$

We always move in a null-space direction, $\mathbf{p} = \mathbf{Z}\mathbf{p}_z$. The above expansion can then be written as

$$\psi(\mathbf{x}_k + \hat{\alpha}\mathbf{p}) = \psi(\mathbf{x}_k) + \hat{\alpha}\,\mathbf{p}_z^T\tilde{\mathbf{g}}(\mathbf{x}_k) + \frac{1}{2}\hat{\alpha}^2\,\mathbf{p}_z^T\tilde{\mathbf{H}}(\hat{\mathbf{x}})\mathbf{p}_z, \qquad (3)$$

where

$\tilde{\mathbf{g}}(\mathbf{x}_k) = \mathbf{Z}^T\mathbf{g}(\mathbf{x}_k)$ is the reduced gradient, and  (4)
$\tilde{\mathbf{H}}(\hat{\mathbf{x}}) = \mathbf{Z}^T\mathbf{H}(\hat{\mathbf{x}})\mathbf{Z}$ is the reduced Hessian.  (5)

At a non-stationary point, we are able to compute a descent direction such that the linear term $\mathbf{p}_z^T\tilde{\mathbf{g}}(\mathbf{x}_k) < 0$. However, when the reduced Hessian has positive curvature, the quadratic term in equation (3) will dominate if $\hat{\alpha}$ is too large. We use a line search method to compute an acceptable step length $\hat{\alpha}$, which serves as an upper bound on the actual step length $\alpha$.
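The line search itself is covered in the UC white paper. As one concrete possibility, a basic backtracking search enforcing the Armijo sufficient-decrease condition could look like the following sketch (Eigen-based, our own illustration rather than the NPL line search; psi is a callable evaluating $\psi$).

    #include <Eigen/Dense>
    #include <functional>

    // Backtracking line search sketch (not the NPL line search): shrink
    // alpha until the Armijo sufficient-decrease condition
    //   psi(x + alpha p) <= psi(x) + c1 * alpha * g(x)^T p
    // holds.
    double armijo_step(const std::function<double(const Eigen::VectorXd&)>& psi,
                       const Eigen::VectorXd& x, const Eigen::VectorXd& p,
                       const Eigen::VectorXd& grad_x,
                       double alpha0 = 1.0, double c1 = 1e-4,
                       double shrink = 0.5)
    {
        const double f0 = psi(x);
        const double slope = grad_x.dot(p);   // negative for a descent direction
        double alpha = alpha0;
        for (int i = 0;
             i < 60 && psi(x + alpha * p) > f0 + c1 * alpha * slope; ++i)
            alpha *= shrink;                  // reject and shrink
        return alpha;
    }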

Mathwrist NPL offers two LC solver classes,

  1. class GS_ActiveSetNewton

  2. class GS_ActiveSetQuasiNewton

Both solvers are hierarchically derived from the active-set framework base class. Both use the line search method to compute the step length $\hat{\alpha}$; they differ from each other in how the step direction $\mathbf{p}$ is computed.

4.1 Modified Newton Method

In the LC solver class GS_ActiveSetNewton, we compute the null-space direction component $\mathbf{p}_z$ by solving

$$\tilde{\mathbf{H}}(\mathbf{x}_k)\,\mathbf{p}_z = -\tilde{\mathbf{g}}(\mathbf{x}_k) \qquad (6)$$

A modified Cholesky decomposition is applied to the reduced Hessian to solve this linear equation. Whenever possible, we use an efficient low-rank update scheme to update the Cholesky factor instead of performing a full factorization [3].

As a by-product, the modified Cholesky decomposition also identifies whether the reduced Hessian $\tilde{\mathbf{H}}(\mathbf{x}_k)$ is positive definite. At a stationary point, if $\tilde{\mathbf{H}}(\mathbf{x}_k)$ has negative curvature, the method allows us to compute a negative curvature direction. This is an advantage over the quasi-Newton method.

On the other hand, the disadvantage of the modified Newton method is that we need to explicitly compute the Hessian matrix $\mathbf{H}(\mathbf{x}_k)$. NPL users have an option to let the LC solver numerically approximate $\mathbf{H}(\mathbf{x}_k)$ by finite differences. This option is controlled by the last parameter in the constructor of class GS_ActiveSetNewton. When enabled, $\mathbf{H}(\mathbf{x}_k)$ is approximated by the function FunctionND::hess_fwd_diff_grad().
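As a much-simplified stand-in for the modified Cholesky factorization of [3] (the actual algorithm is more refined), the following Eigen-based sketch shifts the reduced Hessian by a multiple of the identity until a plain Cholesky succeeds, then solves equation (6); it is our illustration, not NPL code.

    #include <Eigen/Dense>

    // Simplified stand-in (not the Gill-Murray method of [3], not NPL
    // code): make the reduced Hessian positive definite by adding
    // tau * I until a plain Cholesky succeeds, then solve
    // H~ p_z = -g~ for the modified Newton direction.
    Eigen::VectorXd modified_newton_direction(const Eigen::MatrixXd& H_red,
                                              const Eigen::VectorXd& g_red)
    {
        double tau = 0.0;
        const double beta = 1e-3;
        Eigen::LLT<Eigen::MatrixXd> llt;
        for (;;) {
            llt.compute(H_red + tau * Eigen::MatrixXd::Identity(
                                    H_red.rows(), H_red.cols()));
            if (llt.info() == Eigen::Success) break;   // positive definite
            tau = (tau == 0.0) ? beta : 10.0 * tau;    // increase the shift
        }
        return llt.solve(-g_red);
    }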

4.2 Quasi-Newton Method

In the LC solver class GS_ActiveSetQuasiNewton, we compute the null-space direction component $\mathbf{p}_z$ by solving

$$\mathbf{Z}^T\mathbf{B}_k\mathbf{Z}\,\mathbf{p}_z = -\tilde{\mathbf{g}}(\mathbf{x}_k) \qquad (7)$$

where $\mathbf{B}_k$ is a quasi-Newton approximation to the true Hessian matrix $\mathbf{H}(\mathbf{x}_k)$, obtained from $\mathbf{B}_{k-1}$ by a BFGS update. We can also define the reduced Hessian approximation $\tilde{\mathbf{B}}_k = \mathbf{Z}^T\mathbf{B}_k\mathbf{Z}$; a Cholesky decomposition of $\tilde{\mathbf{B}}_k$ is applied when we solve for $\mathbf{p}_z$.
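The BFGS update itself is not spelled out in this paper; for reference, its standard form (see, e.g., [1]) is

$$\mathbf{B}_k = \mathbf{B}_{k-1} - \frac{\mathbf{B}_{k-1}\mathbf{s}\mathbf{s}^T\mathbf{B}_{k-1}}{\mathbf{s}^T\mathbf{B}_{k-1}\mathbf{s}} + \frac{\mathbf{y}\mathbf{y}^T}{\mathbf{y}^T\mathbf{s}}, \quad \mathbf{s} = \mathbf{x}_k - \mathbf{x}_{k-1}, \ \mathbf{y} = \mathbf{g}(\mathbf{x}_k) - \mathbf{g}(\mathbf{x}_{k-1}),$$

which keeps $\mathbf{B}_k$ symmetric and, provided $\mathbf{y}^T\mathbf{s} > 0$, positive definite.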

5 Box Constrained Problems

A special case of the linearly constrained optimization problem arises when only simple bounds are imposed on the unknown variable $\mathbf{x}$,

$$\min_{\mathbf{x}\in\mathbb{R}^n} \psi(\mathbf{x}), \quad \text{s.t. } \mathbf{l} \le \mathbf{x} \le \mathbf{u}, \qquad (8)$$

which is also called a box-constrained optimization problem. The overall logic for solving box-constrained problems still follows an active-set strategy, but in this situation the working set $\mathcal{W}$ simply holds the subset of variables that are fixed at their bounds, and the non-working set $\mathcal{N}$ holds the free variables.

Searching along a null-space direction $\mathbf{p}$ now amounts to searching in the subspace of free variables. We use two separate solver classes, GS_SimpleBoundNewton and GS_SimpleBoundQuasiNewton, to handle this special case more efficiently. Once a descent search direction $\mathbf{p}$ is computed, we still use a line search method to compute the step-length upper bound $\hat{\alpha}$.

The special-case solver class GS_SimpleBoundNewton computes the null-space direction component $\mathbf{p}_z$ in the same way as the general case in equation (6). Here the reduced gradient and reduced Hessian are simply the free-variable parts of the gradient $\mathbf{g}(\mathbf{x}_k)$ and Hessian $\mathbf{H}(\mathbf{x}_k)$. A modified Cholesky decomposition is used to solve the linear equation.

The special-case solver class GS_SimpleBoundQuasiNewton computes $\mathbf{p}_z$ as a quasi-Newton direction (7). Again, $\mathbf{B}_k$ is a Hessian approximation maintained by BFGS updates, and the reduced Hessian approximation $\tilde{\mathbf{B}}_k$ is simply the free-variable part of $\mathbf{B}_k$.
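To make the "free-variable part" concrete, here is a minimal Eigen-based sketch (our own helper, not NPL code) that extracts the reduced gradient and Hessian for a given set of free-variable indices.

    #include <Eigen/Dense>
    #include <vector>

    // Sketch (not NPL code): for box constraints, the "reduced" gradient
    // and Hessian are just the entries of g and the rows/columns of H
    // that correspond to free variables, i.e. variables not fixed at a
    // bound.
    void reduce_to_free(const Eigen::VectorXd& g, const Eigen::MatrixXd& H,
                        const std::vector<int>& free_idx,
                        Eigen::VectorXd& g_red, Eigen::MatrixXd& H_red)
    {
        const int m = static_cast<int>(free_idx.size());
        g_red.resize(m);
        H_red.resize(m, m);
        for (int i = 0; i < m; ++i) {
            g_red(i) = g(free_idx[i]);
            for (int j = 0; j < m; ++j)
                H_red(i, j) = H(free_idx[i], free_idx[j]);
        }
    }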

6 Termination Control

All four LC solvers

  • GS_ActiveSetNewton

  • GS_ActiveSetQuasiNewton

  • GS_SimpleBoundNewton

  • GS_SimpleBoundQuasiNewton

are indirectly derived from the IterativeMethod base class. They all share the maximum-iteration control and the convergence tolerance control defined in the IterativeMethod base class. Let $\epsilon_c$ be the convergence tolerance. At a stationary point, we test the Lagrange multiplier condition by $\lambda_i \ge -\epsilon_c$.

In addition, all four solvers use a stationarity tolerance control $\epsilon_s$ to determine whether the current iterate is at a stationary point. If the max norm of the reduced gradient satisfies $\|\tilde{\mathbf{g}}(\mathbf{x}_k)\|_\infty < \epsilon_s$, or if $|\mathbf{g}(\mathbf{x}_k)^T\mathbf{p}| < \epsilon_s$, we consider $\mathbf{x}_k$ to be at a stationary point with respect to the current working set of constraints.
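A literal transcription of these two stationarity tests into C++ (Eigen-based sketch, not NPL code) follows.

    #include <Eigen/Dense>
    #include <cmath>

    // Stationarity tests sketch (not NPL code): g_red is the reduced
    // gradient Z^T g(x_k), p the current search direction. Either test
    // passing marks x_k as stationary w.r.t. the current working set.
    bool at_stationary_point(const Eigen::VectorXd& g_red,
                             const Eigen::VectorXd& g,
                             const Eigen::VectorXd& p, double eps_s)
    {
        return g_red.lpNorm<Eigen::Infinity>() < eps_s
            || std::abs(g.dot(p)) < eps_s;
    }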

References

  • [1] Jorge Nocedal and Stephen J. Wright: Numerical Optimization, Springer, 1999.
  • [2] Philip E. Gill, Walter Murray and Margaret H. Wright: Practical Optimization, Academic Press, 1981.
  • [3] Philip E. Gill and Walter Murray: Newton-Type Methods for Unconstrained and Linearly Constrained Optimization, Mathematical Programming, Volume 7, 1974, pages 311-350.
  • [4] Philip E. Gill, G. H. Golub, Walter Murray and Michael A. Saunders: Methods for Modifying Matrix Factorizations, Mathematics of Computation, Volume 28, Number 126, April 1974, pages 505-535.
  • [5] Philip E. Gill, Walter Murray and Michael A. Saunders: Methods for Computing and Modifying the LDV Factors of a Matrix, Mathematics of Computation, Volume 29, Number 132, October 1975, pages 1051-1077.
  • [6] Anders Forsgren, Philip E. Gill and Walter Murray: Computing Modified Newton Directions Using a Partial Cholesky Factorization, SIAM Journal on Scientific Computing, Volume 16, Number 1, pages 139-150.