Nonlinear Programming
Mathwrist White Paper Series
Abstract
This document presents a technical overview of the nonlinear programming (NLP) feature implemented in Mathwrist's C++ Numerical Programming Library (NPL). Sequential Quadratic Programming (SQP) is widely considered the state-of-the-art nonlinear programming method. Mathwrist NPL provides an SQP solver in a very general formulation.
1 Introduction
Let $f(x): \mathbb{R}^n \to \mathbb{R}$ be a general smooth function with gradient $g(x)$ and Hessian $H(x)$. In a very general setup, nonlinear programming (NLP) solves the following optimization problem,

$$
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f(x) && (1)\\
\text{s.t.} \quad & c_l \le c(x) \le c_u && (2)\\
& c(x) = \big( c_1(x), \dots, c_m(x) \big)^T && (3)\\
& b_l \le A x \le b_u && (4)\\
& l \le x \le u && (5)
\end{aligned}
$$
Compared with linearly constrained (LC) optimization problems, we now have an extra set of nonlinear constraints $c(x)$. The nonlinear constraint functions $c_i(x)$ are assumed to be at least twice differentiable.
Historically, various methods were developed to tackle NLP problems, e.g. penalty or barrier functions and the augmented Lagrangian method. Sequential quadratic programming (SQP) is considered the state of the art and one of the most powerful nonlinear programming methods.
Recall that for LC problems, we are able to move along a descent feasible direction $p$ wrt an active set of constraints $\hat{A} x = \hat{b}$ such that $\hat{A} p = 0$. This is no longer the case in NLP. Assume $\bar{x}$ is a feasible point wrt an active set of nonlinear constraints $\hat{c}(\bar{x}) = \hat{b}$. In general, there is no feasible direction $p$ that satisfies $\hat{c}(\bar{x} + \alpha p) = \hat{b}$, even for a small step length $\alpha$. Instead, we have to continuously move along the feasible arc wrt $\hat{c}$. For example, in the 2-d case the feasible arc of a constraint $c_i(x) = \bar{b}_i$ is its contour curve.
Let $x(\alpha)$ be the parameterization of the feasible arc wrt a single parameter $\alpha$, with $x(0) = \bar{x}$. Let $p = x'(0)$ be the tangent vector to the arc at point $\bar{x}$. In order to maintain the equality $\hat{c}(x(\alpha)) = \hat{b}$, we need

$$ \hat{J}(\bar{x})\, p = 0, $$

where $\hat{J}(x)$ is the Jacobian matrix of the active nonlinear constraint functions $\hat{c}(x)$. In other words, we now seek a null space direction of $\hat{J}(\bar{x})$.
Not surprisingly, this search strategy is also applicable to general linear constraints, because if we write the active linear constraints as functions $\hat{A} x - \hat{b}$, their Jacobian is just $\hat{A}$. Searching in a null space direction wrt $\hat{A}$ is exactly how we solve LC problems.
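As a concrete illustration (our own sketch, not NPL code), the snippet below checks the null-space condition $\hat{J} p = 0$ on the unit-circle constraint $c(x) = x_1^2 + x_2^2 - 1$: rotating the constraint gradient by 90 degrees produces a tangent direction to the contour curve. All function names here are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Gradient (1x2 Jacobian) of the single constraint c(x) = x1^2 + x2^2 - 1.
std::array<double, 2> constraint_jacobian(const std::array<double, 2>& x) {
    return {2.0 * x[0], 2.0 * x[1]};
}

// A tangent (null-space) direction at x on the contour c(x) = const:
// rotating the gradient by 90 degrees guarantees J * p = 0.
std::array<double, 2> tangent_direction(const std::array<double, 2>& x) {
    std::array<double, 2> J = constraint_jacobian(x);
    return {-J[1], J[0]};
}

double dot(const std::array<double, 2>& a, const std::array<double, 2>& b) {
    return a[0] * b[0] + a[1] * b[1];
}
```

At any point on the circle, e.g. $(0.6, 0.8)$, the tangent direction is orthogonal to the constraint gradient, which is the 2-d picture of "moving along the feasible arc".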
For brevity and without loss of generality, our discussion hereinafter assumes a simplified NLP formulation as below,

$$ \min_{x \in \mathbb{R}^n}\; f(x) \quad \text{s.t.} \quad c(x) \ge 0. \qquad (6) $$
2 QP Sub Problem
Once we decide to move along a feasible arc of a working set of nonlinear constraints $\hat{c}$ from point $\bar{x}$, we could approximate the objective function $f$ by a model function with arc parameter $\alpha$. Writing $p = x'(0)$ and $q = x''(0)$, a second order Taylor expansion of $f$ along the arc gives

$$ f(x(\alpha)) = f(\bar{x}) + \alpha\, g(\bar{x})^T p + \tfrac{1}{2} \alpha^2 \big( p^T H(\bar{x})\, p + g(\bar{x})^T q \big) + O(\alpha^3). \qquad (7) $$

Differentiating the identity $\hat{c}(x(\alpha)) = \hat{b}$ twice gives $a_i(\bar{x})^T q = -p^T H_i(\bar{x})\, p$ for each active constraint $c_i$, where $a_i$ and $H_i$ are the gradient and Hessian of $c_i$. Combined with the first order optimality estimate $g(\bar{x}) \approx \hat{J}(\bar{x})^T \lambda$, the term $g(\bar{x})^T q$ can be replaced by $-\sum_i \lambda_i\, p^T H_i(\bar{x})\, p$, which yields the model function

$$ M(p) = f(\bar{x}) + g(\bar{x})^T p + \tfrac{1}{2}\, p^T H_L(\bar{x}, \lambda)\, p, \qquad (10) $$

where $H_L(\bar{x}, \lambda)$ is the Hessian matrix of the Lagrangian function $L(x, \lambda) = f(x) - \lambda^T \hat{c}(x)$ wrt $x$,

$$ H_L(x, \lambda) = H(x) - \sum_i \lambda_i H_i(x). \qquad (11) $$

Define vector $p = x - \bar{x}$. Minimizing the model function (10) subject to moving along a feasible arc wrt a working set of nonlinear constraints $\hat{c}$ is equivalent to solving the following sub QP problem,

$$ \min_p\; g(\bar{x})^T p + \tfrac{1}{2}\, p^T H_L(\bar{x}, \lambda)\, p \quad \text{s.t.} \quad \hat{J}(\bar{x})\, p = 0. \qquad (12) $$
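For intuition, the equality-constrained sub QP (12) can be solved in closed form in the null space of the constraint Jacobian. The sketch below (our own simplification to two unknowns and one active constraint; the names are hypothetical, not the NPL solver) minimizes the reduced model over the null-space basis and recovers the multiplier from the stationarity condition $g + H p = J^T \lambda$.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<Vec2, 2>;

// Solve  min_p g^T p + (1/2) p^T H p  s.t.  J p = 0  for n = 2 unknowns and
// one equality constraint, via the null-space form p = Z t.
struct QPResult { Vec2 p; double lambda; };

QPResult solve_null_space_qp(const Mat2& H, const Vec2& g, const Vec2& J) {
    Vec2 Z = {-J[1], J[0]};                       // null-space basis: J . Z = 0
    // Reduced Hessian Z^T H Z and reduced gradient Z^T g (scalars here).
    double ZHZ = Z[0] * (H[0][0] * Z[0] + H[0][1] * Z[1])
               + Z[1] * (H[1][0] * Z[0] + H[1][1] * Z[1]);
    double Zg  = Z[0] * g[0] + Z[1] * g[1];
    double t   = -Zg / ZHZ;                       // minimize the reduced model
    Vec2 p = {Z[0] * t, Z[1] * t};
    // Multiplier from the stationarity condition g + H p = J^T lambda.
    Vec2 r = {g[0] + H[0][0] * p[0] + H[0][1] * p[1],
              g[1] + H[1][0] * p[0] + H[1][1] * p[1]};
    double lambda = (J[0] * r[0] + J[1] * r[1]) / (J[0] * J[0] + J[1] * J[1]);
    return {p, lambda};
}
```

For example, with $H = I$, $g = (1, 1)$ and $J = (1, 0)$, the tangent step is $p = (0, -1)$ with multiplier $\lambda = 1$.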
3 Sequential Quadratic Programming
3.1 Introduction
Continuing the idea of moving along a feasible arc and solving a tangent step from a sub QP problem, we can develop a sequential quadratic programming strategy. At a feasible point $x_k$ of the $k$-th major iteration, define $g_k = g(x_k)$, $c_k = c(x_k)$, $J_k = J(x_k)$, and let $H_k$ be the Hessian of the Lagrangian at $(x_k, \lambda_k)$. We perform the following operations at each major iteration of the SQP method,
1. linearize all nonlinear constraints to $\bar{c}_k(x) = c_k + J_k (x - x_k)$.
2. approximate $f(x)$ by the model $M_k(p) = f(x_k) + g_k^T p + \tfrac{1}{2}\, p^T H_k p$, equivalently for $p = x - x_k$.
3. formulate a sub QP problem,
   $$ \min_p\; g_k^T p + \tfrac{1}{2}\, p^T H_k p \quad \text{s.t.} \quad \bar{c}_k(x_k + p) \ge 0. \qquad (13) $$
4. compute a step length $\alpha_k$ of moving along the sub QP solution $p_k$.
5. update $x_{k+1} = x_k + \alpha_k p_k$, recompute $g_{k+1}$, $c_{k+1}$ and $J_{k+1}$, and continue to the $(k+1)$-th iteration.
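The steps above can be sketched end-to-end on a toy problem. The code below is our own illustration (exact Lagrangian Hessian, unit steps, no merit-function line search): it runs SQP major iterations on $\min x_1 + x_2$ s.t. $x_1^2 + x_2^2 = 1$, whose solution is $x^* = (-\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2})$, by solving the KKT system of the sub QP at each iteration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

// Toy SQP loop on: min f(x) = x1 + x2  s.t.  c(x) = x1^2 + x2^2 - 1 = 0.
// Each major iteration solves the sub QP KKT system
//   H_L p - J^T lambda_new = -g,   J p = -c,
// where H_L = Hessian of f - lambda * c (f is linear, so H_L = -2*lambda*I),
// then takes a unit step. Returns (x1, x2, lambda) after `iters` iterations.
std::array<double, 3> sqp_solve_circle(double x1, double x2, double lambda,
                                       int iters) {
    for (int k = 0; k < iters; ++k) {
        double c  = x1 * x1 + x2 * x2 - 1.0;
        double j1 = 2.0 * x1, j2 = 2.0 * x2;   // Jacobian of c
        double h  = -2.0 * lambda;             // Lagrangian Hessian diagonal
        // 3x4 augmented KKT tableau in unknowns (p1, p2, lambda_new).
        double A[3][4] = {{h, 0.0, -j1, -1.0},
                          {0.0, h, -j2, -1.0},
                          {j1, j2, 0.0, -c}};
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < 3; ++col) {
            int piv = col;
            for (int r = col + 1; r < 3; ++r)
                if (std::fabs(A[r][col]) > std::fabs(A[piv][col])) piv = r;
            for (int cc = 0; cc < 4; ++cc) std::swap(A[col][cc], A[piv][cc]);
            for (int r = 0; r < 3; ++r) {
                if (r == col) continue;
                double m = A[r][col] / A[col][col];
                for (int cc = col; cc < 4; ++cc) A[r][cc] -= m * A[col][cc];
            }
        }
        x1 += A[0][3] / A[0][0];               // unit step along p
        x2 += A[1][3] / A[1][1];
        lambda = A[2][3] / A[2][2];
    }
    return {x1, x2, lambda};
}
```

Starting near the solution, e.g. at $(-0.8, -0.6)$ with $\lambda_0 = -0.7$, the iterates converge quadratically to $x^*$ with multiplier $\lambda^* = -\tfrac{\sqrt{2}}{2}$; this is exactly Newton's method on the KKT conditions, which is what motivates the safeguards discussed next.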
There are some important and subtle issues in this procedure. First, it is unclear how we compute $H_k$. This is not only because $H_k$ involves the Hessians of the active constraints, which could be very costly to compute, but also because the active set changes when we solve the sub QP problem (13) at minor iterations.
Secondly, vector $p_k$ is the step to the optimal point wrt the linearization $\bar{c}_k$. As soon as we move away from $x_k$, $c(x) \ne \bar{c}_k(x)$. This is called departure from linearity. In theory, we should continuously move along the tangent direction with infinitesimal step length. For any finite step $\alpha > 0$ though, it is possible that some nonlinear constraint is violated. So we need a merit function that trades off the reduction of $f$ and the violation of $c$ when we move along $p_k$.
Thirdly, the Lagrangian function (11) involves the Lagrange multipliers $\lambda$ of active nonlinear constraint functions. It is unclear how to obtain these Lagrange multipliers, especially between two major iterations.
3.2 Primal-dual Solution
Fix an active set of constraints $\hat{c}(x) = 0$ and consider the unconstrained optimization of the Lagrangian function wrt $x$,

$$ \min_x\; L(x, \lambda) = f(x) - \lambda^T \hat{c}(x). \qquad (14) $$

The first order optimality condition at the primal-dual solution $(x^*, \lambda^*)$ requires

$$ g(x^*) - \hat{J}(x^*)^T \lambda^* = 0, \qquad \hat{c}(x^*) = 0, \qquad (15) $$

which is just the stationary condition and feasibility condition of sub QP problem (12). By working on an augmented unknown space $(x, \lambda)$ and solving sub QP problems, we obtain an estimate of $\lambda_{k+1}$ that is used to update the Lagrangian Hessian for the $(k+1)$-th major SQP iteration.
3.3 Quasi-Newton Approximation
One major improvement of modern SQP methods is to replace the Hessian matrix $H_k$ in the sub QP problem (13) by a quasi-Newton approximation matrix $B_k$. Specifically, we use the BFGS update on a modified Lagrangian function,

$$ \tilde{L}(x, \lambda) = f(x) - \lambda^T \big( c(x) - \bar{c}_k(x) \big), $$

where $c(x) - \bar{c}_k(x)$ is the departure from linearity at $x$. Clearly, the Hessian of $\tilde{L}$ is the same as that of the conventional Lagrangian function in (11), because $\bar{c}_k$ is linear. Further, the gradient and function value of $\tilde{L}$ agree with those of $f$ at $x_k$. Therefore $\tilde{L}$ has the same primal-dual solution as $L$.
Then for the BFGS update, define

$$ \delta_k = x_{k+1} - x_k, \qquad y_k = \nabla_x \tilde{L}(x_{k+1}, \lambda_{k+1}) - \nabla_x \tilde{L}(x_k, \lambda_{k+1}), $$

$$ B_{k+1} = B_k - \frac{B_k \delta_k \delta_k^T B_k}{\delta_k^T B_k \delta_k} + \frac{y_k y_k^T}{y_k^T \delta_k}. $$

To ensure positive definiteness of $B_{k+1}$, we avoid $y_k^T \delta_k$ being negative or very small. If $y_k^T \delta_k < \sigma$ for a constant $\sigma > 0$, two trial modifications are attempted, details in [4]. If both trials fail to remedy the definiteness of $B_{k+1}$, the Hessian approximation is not updated.
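A widely used safeguard of this kind is Powell's damping [6], which blends $y_k$ with $B_k \delta_k$ whenever the curvature $y_k^T \delta_k$ is too small relative to $\delta_k^T B_k \delta_k$. The sketch below is a generic two-dimensional illustration of that idea, not the specific trial modifications used by NPL.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<Vec2, 2>;

double dot(const Vec2& a, const Vec2& b) { return a[0] * b[0] + a[1] * b[1]; }
Vec2 mul(const Mat2& H, const Vec2& v) {
    return {H[0][0] * v[0] + H[0][1] * v[1], H[1][0] * v[0] + H[1][1] * v[1]};
}

// BFGS update B <- B - (B d d^T B)/(d^T B d) + (y y^T)/(y^T d), guarded by
// Powell's damping: if the curvature y^T d falls below sigma * d^T B d,
// blend y with B d so the damped curvature stays bounded away from zero and
// the updated matrix remains positive definite.
Mat2 damped_bfgs(Mat2 B, const Vec2& d, Vec2 y, double sigma = 0.2) {
    Vec2 Bd = mul(B, d);
    double dBd = dot(d, Bd), yd = dot(y, d);
    if (yd < sigma * dBd) {
        double theta = (1.0 - sigma) * dBd / (dBd - yd);
        for (int i = 0; i < 2; ++i) y[i] = theta * y[i] + (1.0 - theta) * Bd[i];
        yd = dot(y, d);                 // now yd is approximately sigma * dBd
    }
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            B[i][j] += -Bd[i] * Bd[j] / dBd + y[i] * y[j] / yd;
    return B;
}
```

With $B = I$, $\delta = (1, 0)$ and a negative-curvature pair $y = (-1, 0)$, the damped update keeps the leading diagonal entry at $\sigma = 0.2 > 0$ instead of destroying positive definiteness.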
3.4 Slack Variable and Merit Function
After obtaining $p_k$ and the multiplier estimate from the sub QP (13), we want to construct a smooth merit function to move along $p_k$ that balances the reduction of $f$ and the violation of $c$. Slack variables are introduced to incorporate the violation component.
Let $\rho$ be a vector of constraint violation penalty factors. $\rho$ is 0 at the first iteration and updated at subsequent iterations. At the $k$-th major iteration, let $s_k$ be the slack variables. The $i$-th element of $s_k$ is defined as

$$ s_{k,i} = \max\big( 0,\; c_i(x_k) - \lambda_{k,i} / \rho_i \big). \qquad (16) $$
Let $\hat{s}_k$ be the slack variables of the linearization after QP termination, $\hat{s}_k = \bar{c}_k(x_k + p_k)$. With $(x, \lambda, s)$ as unknown variables, the merit function is formulated as,

$$ M_\rho(x, \lambda, s) = f(x) - \lambda^T \big( c(x) - s \big) + \tfrac{1}{2} \big( c(x) - s \big)^T D_\rho \big( c(x) - s \big), \qquad D_\rho = \mathrm{diag}(\rho). \qquad (17) $$
This merit function has very nice theoretical properties detailed in [5]. Most importantly, it seamlessly handles inequality constraints between major iterations using slack variables. It can be shown that the SQP method equipped with this merit function has global convergence. The definition of $s_k$ in (16) is the local minimum of merit function (17) wrt $s$ alone subject to $s \ge 0$.
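The relationship between (16) and (17) at a fixed $x$ can be checked numerically. The sketch below (hypothetical helper names, assuming all $\rho_i > 0$) evaluates the merit function as a function of $s$ alone and the closed-form elementwise minimizer $s_i = \max(0, c_i - \lambda_i / \rho_i)$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Merit value at fixed x, viewed as a function of the slacks s:
//   M(s) = f - sum_i lambda_i (c_i - s_i) + 0.5 * sum_i rho_i (c_i - s_i)^2.
double merit_in_s(double f, const std::vector<double>& c,
                  const std::vector<double>& lambda,
                  const std::vector<double>& rho,
                  const std::vector<double>& s) {
    double m = f;
    for (size_t i = 0; i < c.size(); ++i) {
        double v = c[i] - s[i];
        m += -lambda[i] * v + 0.5 * rho[i] * v * v;
    }
    return m;
}

// Closed-form minimizer of M over s >= 0, taken elementwise:
//   s_i = max(0, c_i - lambda_i / rho_i).
std::vector<double> optimal_slacks(const std::vector<double>& c,
                                   const std::vector<double>& lambda,
                                   const std::vector<double>& rho) {
    std::vector<double> s(c.size());
    for (size_t i = 0; i < c.size(); ++i)
        s[i] = std::max(0.0, c[i] - lambda[i] / rho[i]);
    return s;
}
```

For instance, with $c = (2, -1)$, $\lambda = (1, 0.5)$ and $\rho = (2, 2)$, the minimizer is $s = (1.5, 0)$: the second slack is clamped at the bound $s \ge 0$, and any perturbation of $s$ increases the merit value.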
We then perform a line search to compute an acceptable step length $\alpha_k$ along the following search direction,

$$ \begin{pmatrix} x(\alpha) \\ \lambda(\alpha) \\ s(\alpha) \end{pmatrix} = \begin{pmatrix} x_k \\ \lambda_k \\ s_k \end{pmatrix} + \alpha \begin{pmatrix} p_k \\ \hat{\lambda}_k - \lambda_k \\ \hat{s}_k - s_k \end{pmatrix}, \qquad (18) $$

where $\hat{\lambda}_k$ is the multiplier estimate from the sub QP (13). The primal-dual augmented unknown variables are updated at step length $\alpha_k$,

$$ x_{k+1} = x_k + \alpha_k p_k, \qquad \lambda_{k+1} = \lambda_k + \alpha_k (\hat{\lambda}_k - \lambda_k), \qquad s_{k+1} = s_k + \alpha_k (\hat{s}_k - s_k), $$

to start the $(k+1)$-th major iteration.
3.5 Update Penalty Factor
In the line search method along the search direction (18), we rewrite the merit function as a univariate function wrt step length $\alpha$,

$$ \phi(\alpha) = M_\rho\big( x(\alpha), \lambda(\alpha), s(\alpha) \big), \qquad (25) $$

with directional derivative $\phi'(\alpha)$ (26) obtained by the chain rule. In order to make a descent move, we want $\phi'(0)$ to be significantly negative. The choice in the SNOPT method [4] is to find $\rho$ such that

$$ \phi'(0) \le -\tfrac{1}{2}\, p_k^T H_k p_k. \qquad (27) $$
[5] proved the existence of a lower bound $\bar{\rho} \ge 0$ such that for all $\rho \ge \bar{\rho}$, condition (27) is satisfied. Further, the optimal $\rho^*$ of the following linearly constrained least squares problem,

$$ \min_\rho\; \|\rho\|_2^2 \quad \text{s.t.} \quad \phi'(0) \le -\tfrac{1}{2}\, p_k^T H_k p_k, \qquad \rho \ge 0, $$

can be computed analytically. At the termination of sub QP problem (13), if condition (27) is not satisfied, we increase $\rho$ based on $\rho^*$. On the other hand, if condition (27) is satisfied and $\rho$ has grown unnecessarily large, $\rho$ can be reduced.
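As a simplified stand-in for the analytic computation of $\rho^*$ (which we do not reproduce here), the sketch below increases a scalar penalty geometrically until condition (27) holds, under the assumption that $\phi'(0)$ is affine and strictly decreasing in $\rho$ with violation coefficient `viol > 0`; all names are our own.

```cpp
#include <cassert>

// Increase the penalty factor rho geometrically until the merit function's
// directional derivative satisfies the descent condition (27):
//   phi'(0) <= -0.5 * p^T H p.
// Here phi'(0) is modeled as d0 - rho * viol, i.e. affine and decreasing in
// rho, which mirrors how the violation term of (17) enters linearly in rho.
// Assumes viol > 0 so that the loop terminates.
double update_penalty(double rho, double d0, double viol, double pHp) {
    while (d0 - rho * viol > -0.5 * pHp)
        rho = (rho == 0.0) ? 1.0 : 2.0 * rho;
    return rho;
}
```

Starting from $\rho = 0$ (the first-iteration value noted above) with $d_0 = 1$, `viol` $= 0.5$ and $p^T H p = 1$, the loop doubles $\rho$ until $1 - 0.5\rho \le -0.5$, stopping at $\rho = 4$.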
4 Initial Working Set and Feasibility
4.1 Initial Working Set
Given an initial guess of $x_0$, we first use the "crash start" procedure [2] to solve a phase-I feasibility problem subject to only the general linear constraints and simple bounds in formulation (1)-(5).
If there is no feasible solution wrt the general linear constraints and simple bounds, the algorithm terminates immediately. Otherwise, we have found a linearly feasible point $x_0$ and an initial active set of only linear constraints. From then on, the active set method guarantees that all general linear constraints and simple bounds are always satisfied.
4.2 Sub QP Feasibility
After $x_0$ and the initial active set are obtained, the algorithm starts major iterations by first linearizing all nonlinear constraints at point $x_0$. We solve the phase-I feasibility problem again, now with the full constraint specification in formulation (13). If a feasible point is identified, the algorithm continues to solve the sub QP as usual. Otherwise, the algorithm switches to an elastic programming (EP) mode.
4.3 Elastic Programming Mode
It is possible that a feasible point does not exist wrt the linearization $\bar{c}_k(x) \ge 0$. In this situation, we introduce surplus variables $v \ge 0$ to the NLP formulation to penalize the violation of nonlinear constraints with a given penalty factor $\gamma$,

$$ \min_{x, v}\; f(x) + \gamma \sum_i v_i \quad \text{s.t.} \quad c(x) + v \ge 0, \qquad v \ge 0, $$

where $v_i$ is the surplus variable of the $i$-th elastic constraint. The Lagrange multiplier condition for an elastic constraint requires $0 \le \lambda_i \le \gamma$. Let $\bar{v}_k$ be the value of the surplus variables evaluated at $x_k$. Linearizing all elastic constraints at $x_k$, the corresponding sub QP problem is

$$ \min_{p, v}\; g_k^T p + \tfrac{1}{2}\, p^T H_k p + \gamma \sum_i v_i \quad \text{s.t.} \quad \bar{c}_k(x_k + p) + v \ge 0, \qquad v \ge 0. $$
The algorithm gradually increases the penalty factor $\gamma$ if the major iterations keep running in EP mode. It quits the EP mode as soon as a sub QP problem can find a feasible starting point.
5 Termination Control
5.1 Minor Iterations
The sub QP problem (13) is solved in minor iterations by a QP active-set method. We expose three control functions for the termination control of the sub QP problem.
- set_qp_max_iter() function specifies the max number of iterations allowed in a sub QP.
- set_qp_stationary_tolerance() function specifies the tolerance level to test whether the stationary condition is satisfied at a QP minor iteration.
- set_qp_converge_tolerance() function specifies the tolerance level to test whether the Lagrange multiplier condition is satisfied. If so, the optimal solution of the sub QP problem is found.
An exception will be thrown when the max iteration limit is reached at QP minor iterations. This mostly happens at the QP phase-I feasibility stage, e.g. when the number of general linear constraints and simple bounds is very large but the iteration limit is relatively small. It rarely happens at the QP optimality stage because we use an early termination strategy described in the section below. For the details of the stationary tolerance control and convergence tolerance control, please refer to our QP white paper.
5.2 QP Early Termination
Effectively, the SQP algorithm still works when sub QP (13) stops at a stationary but non-optimal point. This allows early termination of QP minor iterations for better performance. At a stationary point, we perform an early termination if one of the following conditions is satisfied,

- the improvement of the QP objective is greater than a pre-defined progress control threshold;
- the actual number of minor iterations is greater than an allowed limit;
- in EP mode, the Lagrange multipliers for linearized constraints are greater than the penalty factor $\gamma$.
5.3 Major Iterations
The SQP solver class SQP_ActiveSet is derived from the IterativeMethod base class and shares the max number of iterations control and convergence tolerance control defined in IterativeMethod. Both controls are used to terminate SQP major iterations.
The SQP algorithm is considered converged if the following two conditions are both satisfied,

1. the sub QP terminates in a normal state (not an early termination);
2. the step taken along the line search direction (18) is small enough to pass the convergence tolerance test.
References
- [1] Jorge Nocedal and Stephen J. Wright: Numerical Optimization, Springer, 1999.
- [2] Philip E. Gill, Walter Murray and Margaret H. Wright: Practical Optimization, Academic Press, 1981.
- [3] Philip E. Gill and Elizabeth Wong: Sequential Quadratic Programming Methods, UCSD Department of Mathematics, Technical Report NA-10-03.
- [4] Philip E. Gill, Walter Murray and Michael A. Saunders: SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization, SIAM Review, Volume 47, Number 1, pp. 99-131.
- [5] Philip E. Gill, Walter Murray and Michael A. Saunders: Some Theoretical Properties of an Augmented Lagrangian Merit Function, Advances in Optimization and Parallel Computing, P.M. Pardalos, ed., North-Holland, Amsterdam, 1992, pp. 101-128.
- [6] M. J. D. Powell: The Convergence of Variable Metric Methods for Nonlinearly Constrained Optimization Calculations, in Nonlinear Programming 3 (Proc. Sympos., Special Interest Group Math. Programming, University of Wisconsin, WI, 1977), Academic Press, New York, 1978, pp. 27-63.
- [7] Samuel K. Eldersveld: Large-Scale Sequential Quadratic Programming Algorithms, Technical Report SOL 92-4, September 1992.