Nonlinear Programming
Mathwrist White Paper Series

Copyright © Mathwrist LLC 2023
(January 20, 2023)
Abstract

This document presents a technical overview of the nonlinear programming (NLP) feature implemented in Mathwrist's C++ Numerical Programming Library (NPL). Sequential quadratic programming (SQP) is widely considered the state of the art in nonlinear programming. Mathwrist NPL provides an SQP solver that accepts a very general problem formulation.

1 Introduction

Let ψ(𝐱): ℝⁿ → ℝ be a general smooth function with gradient 𝐠(𝐱) and Hessian 𝐇(𝐱). In a very general setup, nonlinear programming (NLP) solves the following optimization problem,

minxβˆˆβ„n⁑ψ⁒(𝐱),Β s.t. (1)
(2)
𝐜l≀c⁒(𝐱)β‰€πœu (3)
𝐛l≀𝐀𝐱≀𝐛u (4)
𝐱l≀𝐱≀𝐱u (5)

Compared to linearly constrained (LC) optimization problems, we now have an extra set of nonlinear constraints 𝐜_l ≤ c(𝐱) ≤ 𝐜_u. The nonlinear constraint functions c(𝐱) are assumed to be at least twice differentiable.

Historically, various methods were developed to tackle NLP problems, e.g. penalty or barrier functions and the augmented Lagrangian method. Sequential quadratic programming (SQP) is considered the state of the art and one of the most powerful nonlinear programming methods.

Recall that for LC problems, we are able to move along a feasible descent direction 𝐩 wrt an active set 𝒲 of constraints such that 𝐀_𝒲𝐩 = 0. This is no longer the case in NLP. Assume 𝐱_k is a feasible point wrt an active set 𝒲 of nonlinear constraints. In general, there is no feasible direction 𝐩 that satisfies c_𝒲(𝐱_k + α𝐩) = c_𝒲(𝐱_k), even for a small step length α. Instead, we have to continuously move along the feasible arc wrt c_𝒲(𝐱). For example, in the 2-d case the feasible arc of a constraint c_i(𝐱) is its contour curve.

Let 𝐱(t) = (x_0(t), ⋯, x_{n-1}(t)) be the parameterization of the feasible arc wrt a single parameter t. Let 𝐩 be the tangent vector to the arc at the point 𝐱_k, 𝐩 = (∂x_0/∂t, ⋯, ∂x_{n-1}/∂t)^T. In order to maintain the equality c_𝒲(𝐱) = c_𝒲(𝐱_k), we need

d/dt c_𝒲(𝐱(t)) = 𝐉_𝒲(𝐱_k)𝐩 = 0

where 𝐉_𝒲(𝐱_k) is the Jacobian matrix of the nonlinear constraint functions c_𝒲(𝐱). In other words, we now seek a null space direction of 𝐉_𝒲(𝐱_k).
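
For a concrete 2-d illustration, consider the single constraint c(𝐱) = x_0² + x_1² - 1 = 0, whose feasible arc is the unit circle. Parameterize the arc as 𝐱(t) = (cos t, sin t), so the tangent vector is 𝐩 = (-sin t, cos t)^T. The Jacobian, here a single row, is 𝐉(𝐱) = (2x_0, 2x_1), and indeed

𝐉(𝐱(t))𝐩 = -2 cos t sin t + 2 sin t cos t = 0

for every t: the tangent of the contour curve is always a null space direction of the Jacobian.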

Not surprisingly, this search strategy is also applicable to general linear constraints, because if we write c(𝐱) = 𝐀𝐱 - 𝐛, the Jacobian of c(𝐱) is just 𝐀. Searching in a null space direction wrt 𝐀 is exactly how we solve LC problems.

For brevity and without loss of generality, our discussion hereinafter assumes a simplified NLP formulation as below,

minxβˆˆβ„n⁑ψ⁒(𝐱),Β s.t. ⁒c⁒(𝐱)β‰₯0 (6)

Please note that our SQP solver directly accepts the NLP constraint specification in (5). It is not necessary to explicitly convert formulation (5) to formulation (6). Please refer to our LP and QP white paper series for how we handle upper bounds and mixed constraints.

2 QP Sub Problem

Once we decide to move along a feasible arc of a working set of nonlinear constraints c_𝒲(𝐱) from the point 𝐱_k, we can approximate the objective function ψ(𝐱) by a model function ψ̂(𝐱(t)) with arc parameter t,

ψ̂(𝐱(t)) = ψ(𝐱_k) + d/dt ψ(𝐱(t))|_{t=0} t + ½ d²/dt² ψ(𝐱(t))|_{t=0} t² (7)
         = ψ(𝐱_k) + t 𝐠^T(𝐱_k)𝐩 + ½ t² 𝐩^T∇_xxℒ(𝐱_k)𝐩 (10)

where ∇_xxℒ(𝐱_k) is the Hessian matrix of the Lagrangian function ℒ(𝐱, λ) wrt 𝐱,

ℒ(𝐱, λ) = ψ(𝐱) - λ^T c_𝒲(𝐱) (11)

Define the vector 𝐝 = t𝐩. Minimizing the model function (10) subject to moving along a feasible arc wrt a working set of nonlinear constraints c_𝒲(𝐱) is equivalent to solving the following sub QP problem,

mindβˆˆβ„n⁑𝐠T⁒(𝐱k)⁒𝐝+12⁒𝐝Tβ–½x⁒xℒ⁒(𝐱k)⁒𝐝⁒ s.t. ⁒𝐉𝒲⁒(𝐱k)⁒𝐝=0 (12)

3 Sequential Quadratic Programming

3.1 Introduction

Continuing the idea of moving along a feasible arc and solving for a tangent step 𝐝 from a sub QP problem, we can develop a sequential quadratic programming strategy. At a feasible point 𝐱_k, define

𝐉_k := 𝐉(𝐱_k)
𝐜_k := c(𝐱_k)
𝐠_k := 𝐠(𝐱_k)
∇_xxℒ_k := ∇_xxℒ(𝐱_k)

We perform the following operations at each major iteration of the SQP method:

1. Linearize all nonlinear constraints c(𝐱) to ĉ(𝐱) = 𝐜_k + 𝐉_k(𝐱 - 𝐱_k).

2. Approximate c(𝐱) ≥ 0 by ĉ(𝐱) ≥ 0, or equivalently 𝐉_k𝐝 ≥ -𝐜_k for 𝐝 = 𝐱 - 𝐱_k.

3. Formulate a sub QP problem,

   min_{𝐝∈ℝⁿ} 𝐠_k^T𝐝 + ½𝐝^T∇_xxℒ_k𝐝, s.t. 𝐉_k𝐝 ≥ -𝐜_k (13)

4. Compute a step length α for moving along 𝐝.

5. Update 𝐱_{k+1} = 𝐱_k + α𝐝, recompute 𝐉_{k+1}, 𝐠_{k+1}, 𝐜_{k+1} and ∇_xxℒ_{k+1}, and continue to the (k+1)-th iteration.
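
To make these steps concrete, the following self-contained C++ sketch runs the loop on a tiny 2-d example: minimize (x_0 - 2)² + x_1² subject to c(𝐱) = 1 - x_0² - x_1² ≥ 0, whose solution is 𝐱* = (1, 0) with multiplier λ* = 1. The sketch assumes the constraint stays active, so each sub QP (13) reduces to an equality-constrained QP whose KKT system is solved in closed form; it uses the exact Lagrangian Hessian, a full step α = 1 and no merit function. It illustrates the scheme above and is not the NPL solver.

#include <cmath>
#include <cstdio>

int main() {
    double x0 = 0.5, x1 = 0.4, lambda = 0.0;
    for (int k = 0; k < 20; ++k) {
        double g0 = 2.0 * (x0 - 2.0), g1 = 2.0 * x1;  // gradient g_k
        double c = 1.0 - x0 * x0 - x1 * x1;           // constraint value c_k
        double j0 = -2.0 * x0, j1 = -2.0 * x1;        // Jacobian row J_k
        // Hessian of the Lagrangian: 2I - lambda * (-2I) = (2 + 2 lambda) I.
        double h = 2.0 + 2.0 * lambda;
        // Sub QP with the constraint active: min g'd + (h/2) d'd, s.t. J d = -c.
        // Eliminating d = (mu J' - g) / h gives mu = (J g - h c) / (J J').
        double mu = (j0 * g0 + j1 * g1 - h * c) / (j0 * j0 + j1 * j1);
        double d0 = (mu * j0 - g0) / h, d1 = (mu * j1 - g1) / h;
        // Full step for this toy; NPL uses a merit-function line search (3.4).
        x0 += d0; x1 += d1; lambda = mu;
        std::printf("k=%2d  x=(%.6f, %.6f)  lambda=%.6f\n", k, x0, x1, lambda);
        if (std::fabs(d0) + std::fabs(d1) < 1e-12) break;  // step is tiny
    }
    return 0;
}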

There are some important and subtle issues in this procedure. First, it is unclear how we compute ∇_xxℒ_k. This is not only because ∇_xxℒ_k involves the Hessians of the active constraints c_𝒲(𝐱), which could be very costly to compute, but also because the active set 𝒲 changes while we solve the sub QP problem (13) at minor iterations.

Secondly, the vector 𝐝 is the step to the optimal point wrt ĉ(𝐱). As soon as we move away from 𝐱_k, c(𝐱) ≠ ĉ(𝐱). This is called departure from linearity. In theory, we should continuously move along the tangent direction 𝐩 with infinitesimal step length. For any α > 0, though, it is possible that some nonlinear constraint is violated. So we need a merit function that trades off the reduction of ψ(𝐱) against the violation of c(𝐱) when we move along 𝐝.

Thirdly, the Lagrangian function (11) involves the Lagrange multipliers λ of the active nonlinear constraint functions. It is unclear how to obtain these Lagrange multipliers, especially between two major iterations.

Different SQP methods make different choices on these subtle issues. Our implementation is mostly based on the SNOPT method [4]. Users can refer to [4] for further details. Here we give a brief summary of the choices implemented in Mathwrist NPL.

3.2 Primal-dual Solution

Fix an active set of constraints c_𝒲(𝐱) = 0 and consider the unconstrained optimization of the Lagrangian function ℒ(𝐱, λ) formed wrt c_𝒲(𝐱),

minxβˆˆβ„n,Ξ»βˆˆβ„m⁑ℒ⁒(x,Ξ») (14)

The first order optimality condition at the solution (𝐱*, λ*) requires

𝐠(𝐱*) - 𝐉^T(𝐱*)λ* = 0
c_𝒲(𝐱*) = 0 (15)

These are exactly the stationarity and feasibility conditions of sub QP problem (12). By working in the augmented unknown space (𝐱, λ) and solving sub QP problems, we obtain an estimate of (𝐱*, λ*) that is used to update (𝐱_{k+1}, λ_{k+1}) for the (k+1)-th major SQP iteration.

NPL actually solves the inequality constrained sub QP problem (13). Let 𝒲_{k+1} be the active set obtained after we solve sub QP (13); it is used to start the (k+1)-th major iteration. Let μ be the Lagrange multipliers after solving (13). By strict complementarity, μ_i = 0, ∀i ∉ 𝒲_{k+1}.

3.3 Quasi-Newton Approximation

One major improvement of modern SQP methods is to replace the Hessian matrix ∇_xxℒ_k in the sub QP problem (13) by a quasi-Newton approximation matrix 𝐁_k. Specifically, we apply a BFGS update to a modified Lagrangian function,

ℒ_M(𝐱, λ) = ψ(𝐱) - λ^T(c(𝐱) - ĉ(𝐱))

Clearly, the Hessian of ℒ_M(𝐱, λ) is the same as that of the conventional Lagrangian function ℒ(𝐱, λ) in (11), since c(𝐱) and ĉ(𝐱) differ only by terms beyond first order. Further, because c(𝐱_k) - ĉ(𝐱_k) = 0 and the gradient of c(𝐱) - ĉ(𝐱) vanishes at 𝐱_k, the function value and gradient of ℒ_M(𝐱, λ) at 𝐱_k agree with those of the objective ψ(𝐱). Therefore ℒ_M(𝐱, λ) has the same primal-dual solution as ℒ(𝐱, λ).

Then, for the BFGS update, define

δ = 𝐱_{k+1} - 𝐱_k
𝐲 = ∇ℒ_M(𝐱_{k+1}, λ_{k+1}) - ∇ℒ_M(𝐱_k, λ_{k+1})
  = 𝐠_{k+1} - 𝐠_k - (𝐉_{k+1} - 𝐉_k)^T λ_{k+1}

To ensure positive definiteness of 𝐁_{k+1}, we must prevent 𝐲^Tδ from being negative or very small. If 𝐲^Tδ < σ, where σ = α(1-η)𝐝^T𝐁_k𝐝 for a constant 0 < η < 1, two trial modifications are attempted; see [4] for details. If both trials fail to remedy the definiteness of 𝐁_{k+1}, the Hessian approximation is not updated.
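
The curvature test above fits in a few lines of C++. Here δ, 𝐲, 𝐝 and the product 𝐁_k𝐝 are passed as plain arrays; this is an illustrative fragment with assumed names, not NPL code.

#include <cstddef>

// Curvature guard for the BFGS update: returns true when y' * delta is
// large enough to keep B_{k+1} positive definite. bd must hold B_k * d.
bool curvature_ok(const double* y, const double* delta, const double* d,
                  const double* bd, std::size_t n, double alpha, double eta) {
    double yd = 0.0, dbd = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        yd  += y[i] * delta[i];  // y' * delta
        dbd += d[i] * bd[i];     // d' * B_k * d
    }
    double sigma = alpha * (1.0 - eta) * dbd;
    return yd >= sigma;  // otherwise attempt the two modifications in [4]
}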

3.4 Slack Variable and Merit Function

After obtaining 𝐝 and μ from the sub QP (13), we want to construct a smooth merit function that balances the reduction of ψ(𝐱) and the violation of c(𝐱) as we move along 𝐝. Slack variables are introduced to incorporate the violation component.

Let ρ be a vector of constraint violation penalty factors; ρ is 0 at the first iteration and is updated at subsequent iterations. At the k-th major iteration, let 𝐬_k be the slack variables. The i-th element s_{k,i} of 𝐬_k is defined as

s_{k,i} = max(0, c_i(𝐱_k)),            if ρ_i = 0
s_{k,i} = max(0, c_i(𝐱_k) - λ_i/ρ_i),  otherwise (16)

Let ŝ_k be the slack variables of the linearization ĉ(𝐱) after QP termination, ŝ_k = 𝐜_k + 𝐉_k𝐝. With (𝐱, λ, 𝐬) as unknown variables, the merit function is formulated as

ℳ_ρ(𝐱, λ, 𝐬) = ψ(𝐱) - λ^T(c(𝐱) - 𝐬) + ½ Σ_{i=1}^{m} ρ_i(c_i(𝐱) - s_i)² (17)

This merit function has very nice theoretical properties, detailed in [5]. Most importantly, it seamlessly handles inequality constraints between major iterations using slack variables. It can be shown that the SQP method equipped with this merit function has global convergence. The definition of 𝐬_k in (16) is the local minimizer of merit function (17) wrt 𝐬 alone, subject to 𝐬 ≥ 0.
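
The slack definition (16) and merit function (17) translate directly into code. Below is a plain-array C++ sketch with illustrative names, not the NPL API; s[] receives the slacks computed per (16) and the return value is ℳ_ρ.

#include <algorithm>
#include <cstddef>

// psi is the objective value at x, c[] the constraint values c_i(x),
// lambda[] the multipliers and rho[] the penalty factors, all of length m.
double merit(double psi, const double* c, const double* lambda,
             const double* rho, double* s, std::size_t m) {
    double value = psi;
    for (std::size_t i = 0; i < m; ++i) {
        // Slack per (16): local minimizer of (17) wrt s_i, subject to s_i >= 0.
        s[i] = (rho[i] == 0.0) ? std::max(0.0, c[i])
                               : std::max(0.0, c[i] - lambda[i] / rho[i]);
        double r = c[i] - s[i];                  // constraint violation residual
        value += -lambda[i] * r + 0.5 * rho[i] * r * r;
    }
    return value;
}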

We then perform a line search to compute an acceptable step length α along the following search direction,

(𝐝, ξ, 𝐪) = (𝐝, μ - λ_k, ŝ_k - 𝐬_k) = (𝐝, μ - λ_k, 𝐉_k𝐝 + 𝐜_k - 𝐬_k) (18)

The primal-dual augmented unknown variables are updated to start the (k+1)-th major iteration,

𝐱_{k+1} = 𝐱_k + α𝐝
λ_{k+1} = λ_k + αξ
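
In code, the update along (18) looks as follows; ŝ_k is written s_hat, and the slack update implied by (25) below is included. The function name and signature are illustrative only, not the NPL API.

#include <cstddef>

// Move (x, lambda, s) by alpha times the direction (d, mu - lambda, s_hat - s).
void primal_dual_update(double alpha, std::size_t n, std::size_t m,
                        double* x, const double* d,
                        double* lambda, const double* mu,
                        double* s, const double* s_hat) {
    for (std::size_t i = 0; i < n; ++i) x[i] += alpha * d[i];
    for (std::size_t i = 0; i < m; ++i) {
        lambda[i] += alpha * (mu[i] - lambda[i]);
        s[i]      += alpha * (s_hat[i] - s[i]);
    }
}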

3.5 Update Penalty Factor ρ

In the line search along the search direction (18), we rewrite the merit function as a univariate function φ(α) wrt the step length α,

v(α) = (𝐱_k, λ_k, 𝐬_k) + α(𝐝, ξ, 𝐪) (25)
φ_ρ(α) = ℳ_ρ(v(α)) (26)

In order to make a descent move, we want φ′(0) to be significantly negative. The choice in the SNOPT method [4] is to find ρ such that

φ_ρ′(0) < -½ 𝐝^T𝐁_k𝐝 (27)

[5] proved the existence of a lower bound ρ̂ > 0 such that condition (27) is satisfied ∀ρ > ρ̂. Further, the optimal ρ* of the following linearly constrained least squares problem

min_ρ ½‖ρ‖², s.t. φ_ρ′(0) = -½ 𝐝^T𝐁_k𝐝, ρ ≥ 0

can be computed analytically. At the termination of sub QP problem (13), if condition (27) is not satisfied, we replace ρ with ρ*. On the other hand, if condition (27) is satisfied and ρ > ρ*, ρ can be reduced to ρ*.

4 Initial Working Set and Feasibility

4.1 Initial Working Set 𝒲_0

Given an initial guess of 𝐱, we first use the "crash start" procedure [2] to solve a phase-I feasibility problem subject to only the general linear constraints 𝐛_l ≤ 𝐀𝐱 ≤ 𝐛_u and simple bounds 𝐱_l ≤ 𝐱 ≤ 𝐱_u in (5).

If there is no feasible solution wrt the general linear constraints and simple bounds, the algorithm terminates immediately. Otherwise, we have found a linearly feasible point 𝐱_0 and an initial active set 𝒲_0 of only linear constraints. From this point on, the active set method guarantees that all linear constraints and simple bounds in (5) are always satisfied.

4.2 Sub QP Feasibility

After 𝐱_0 and 𝒲_0 are obtained, the algorithm starts major iterations by first linearizing all nonlinear constraints c(𝐱) at the point 𝐱_k. We solve the phase-I feasibility problem again, now with the full constraint specification in formulation (13). If a feasible point is identified, the algorithm continues to solve the sub QP as usual. Otherwise, the algorithm switches to an elastic programming (EP) mode.

4.3 Elastic Programming Mode

It is possible that no feasible point exists wrt the linearization ĉ(𝐱) at 𝐱_k. In this situation, we introduce surplus variables 𝐰 into the NLP formulation to penalize the violation of the nonlinear constraints with a given penalty factor γ,

minπ±βˆˆβ„n,π°βˆˆβ„m⁑ψ(e)⁒(𝐱,𝐰)⁒ s.t. ⁒c(e)⁒(𝐱,𝐰)β‰₯0,𝐰β‰₯0

, where

ψ(e)⁒(𝐱,𝐰)=ψ⁒(𝐱)+γ⁒𝐞T⁒𝐰
c(e)⁒(𝐱,𝐰)=c⁒(𝐱)+𝐰

The Lagrange multiplier condition for an elastic constraint c_i(𝐱) + w_i ≥ 0 requires 0 ≤ λ_i ≤ γ. Let 𝐰_k be the values of the surplus variables evaluated at 𝐱_k. Linearizing all elastic constraints c^(e)(𝐱, 𝐰) at (𝐱_k, 𝐰_k), the corresponding sub QP problem is

mindβˆˆβ„n,Δ⁒wβˆˆβ„m 𝐠kT⁒𝐝+12⁒𝐝T⁒𝐁k⁒𝐝+γ⁒𝐞T⁒Δ⁒𝐰⁒, s.t.
𝐉k⁒𝐝+Δ⁒𝐰β‰₯-𝐜k-𝐰k
Δ⁒𝐰β‰₯-𝐰k

The algorithm gradually increases the penalty factor γ if the major iterations keep running in EP mode. It quits EP mode as soon as a sub QP problem can find a feasible starting point.
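
As a small sketch, the elastic objective ψ^(e) and constraints c^(e) can be evaluated as below. The function names and plain-array signatures are illustrative assumptions, not the NPL API.

#include <cstddef>

// psi_e = psi + gamma * e'w, per the elastic formulation above.
double elastic_objective(double psi, const double* w, std::size_t m,
                         double gamma) {
    double sum = 0.0;
    for (std::size_t i = 0; i < m; ++i) sum += w[i];  // e'w
    return psi + gamma * sum;
}

// c_e_i = c_i(x) + w_i, written into c_e[].
void elastic_constraints(const double* c, const double* w, std::size_t m,
                         double* c_e) {
    for (std::size_t i = 0; i < m; ++i) c_e[i] = c[i] + w[i];
}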

5 Termination Control

5.1 Minor Iterations

The sub QP problem (13) is solved in minor iterations by a QP active-set method. We expose three control functions for the termination control of the sub QP problem.

• set_qp_max_iter() specifies the maximum number of iterations M allowed in a sub QP.

• set_qp_stationary_tolerance() specifies the tolerance level ϵ̄_s used to test whether the stationary condition is satisfied at a QP minor iteration.

• set_qp_converge_tolerance() specifies the tolerance level ϵ̄_c used to test whether the Lagrange multiplier condition is satisfied; if so, the optimum of the sub QP problem is found.

An exception is thrown when M is reached during QP minor iterations. This mostly happens at the QP phase-I feasibility stage, e.g. when the number of general linear constraints and simple bounds is very large but M is relatively small. It rarely happens at the QP optimality stage because we use an early termination strategy, described in the section below. For details of the stationary tolerance control ϵ̄_s and the convergence tolerance control ϵ̄_c, please refer to our QP white paper.
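
For illustration, a sub QP termination configuration might look as follows. The class name SQP_ActiveSet and the three setter names appear in this paper; the header path, argument types and the chosen values are assumptions for the sake of the example.

#include <mathwrist/npl.h>  // hypothetical header path

void configure_sub_qp(SQP_ActiveSet& sqp) {
    sqp.set_qp_max_iter(500);               // M: max minor iterations per sub QP
    sqp.set_qp_stationary_tolerance(1e-8);  // stationary condition tolerance
    sqp.set_qp_converge_tolerance(1e-8);    // Lagrange multiplier tolerance
}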

5.2 QP Early Termination

Effectively, the SQP algorithm still works when the sub QP (13) stops at a stationary but non-optimal point. This allows early termination of QP minor iterations for better performance. At a stationary point, we terminate early if one of the following conditions is satisfied,

• the improvement of the QP objective is greater than a pre-defined progress control function;

• the actual number of minor iterations is greater than 2M/3;

• in EP mode, the Lagrange multipliers of the linearized constraints are greater than 2γ.
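
These three tests combine into a small predicate, sketched below in C++. All names are illustrative; in particular, the progress control function is abstracted into a single threshold progress_target.

// Early-termination test at a stationary point of sub QP (13).
bool early_terminate(double qp_improvement, double progress_target,
                     int minor_iters, int M,
                     double max_multiplier, double gamma, bool ep_mode) {
    if (qp_improvement > progress_target) return true;  // enough progress made
    if (minor_iters > 2 * M / 3) return true;           // iteration budget spent
    if (ep_mode && max_multiplier > 2.0 * gamma) return true;  // EP multipliers large
    return false;
}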

5.3 Major Iterations

The SQP solver class SQP_ActiveSet is derived from the IterativeMethod base class and shares the maximum number of iterations control N and the convergence tolerance control ϵ_c defined in IterativeMethod. Both N and ϵ_c are used to terminate SQP major iterations.

The SQP algorithm is considered converged when the following two conditions are both satisfied:

1. The sub QP terminates in a normal state (not an early termination).

2. The step length α of the line search along direction (18) satisfies the test below.

   α‖𝐝‖_∞ / (1 + ‖𝐱_k‖_∞) < ϵ_c
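
Condition 2 is a simple relative step-size test; a plain-array C++ sketch with illustrative names:

#include <algorithm>
#include <cmath>
#include <cstddef>

// Returns true when alpha * |d|_inf / (1 + |x_k|_inf) < eps_c.
bool step_converged(double alpha, const double* d, const double* x,
                    std::size_t n, double eps_c) {
    double dn = 0.0, xn = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        dn = std::max(dn, std::fabs(d[i]));  // |d|_inf
        xn = std::max(xn, std::fabs(x[i]));  // |x_k|_inf
    }
    return alpha * dn / (1.0 + xn) < eps_c;
}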

References

• [1] Jorge Nocedal and Stephen J. Wright: Numerical Optimization, Springer, 1999.
• [2] Philip E. Gill, Walter Murray and Margaret H. Wright: Practical Optimization, Academic Press, 1981.
• [3] Philip E. Gill and Elizabeth Wong: Sequential Quadratic Programming Methods, UCSD Department of Mathematics, Technical Report NA-10-03.
• [4] Philip E. Gill, Walter Murray and Michael A. Saunders: SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization, SIAM Review, Volume 47, Number 1, pp. 99-131.
• [5] Philip E. Gill, Walter Murray and Michael A. Saunders: Some Theoretical Properties of an Augmented Lagrangian Merit Function, Advances in Optimization and Parallel Computing, P.M. Pardalos, ed., North-Holland, Amsterdam, 1992, pp. 101-128.
• [6] M. J. D. Powell: The Convergence of Variable Metric Methods for Nonlinearly Constrained Optimization Calculations, in Nonlinear Programming 3 (Proc. Sympos., Special Interest Group Math. Programming, University of Wisconsin, Madison, WI, 1977), Academic Press, New York, 1978, pp. 27-63.
• [7] Samuel K. Eldersveld: Large-Scale Sequential Quadratic Programming Algorithms, Technical Report SOL 92-4, September 1992.
  • [7] Samuel K. Eldersveld: Large-Scale Sequential Quadratic Programming Algorithms, Technical Report SOL 92-4, September 1992.