Nonlinear Programming
Mathwrist White Paper Series

Copyright © Mathwrist LLC 2023
(January 20, 2023)
Abstract

This document presents a technical overview of the nonlinear programming (NLP) feature implemented in Mathwrist's C++ Numerical Programming Library (NPL). Sequential quadratic programming (SQP) is widely considered the state of the art in nonlinear programming. Mathwrist NPL provides an SQP solver that accepts a very general problem formulation.

1 Introduction

Let ψ(𝐱): ℝⁿ → ℝ be a general smooth function with gradient 𝐠(𝐱) and Hessian 𝐇(𝐱). In a very general setup, nonlinear programming (NLP) solves the following optimization problem,

minxβˆˆβ„n⁑ψ⁒(𝐱),Β s.t. (1)
(2)
𝐜l≀c⁒(𝐱)β‰€πœu (3)
𝐛l≀𝐀𝐱≀𝐛u (4)
𝐱l≀𝐱≀𝐱u (5)

Compared to linearly constrained (LC) optimization problems, we now have an extra set of nonlinear constraints 𝐜_l ≤ c(𝐱) ≤ 𝐜_u. The nonlinear constraint functions c(𝐱) are assumed to be at least twice differentiable.

Historically, various methods were developed to tackle NLP problems, e.g. penalty or barrier functions and the augmented Lagrangian method. Sequential quadratic programming (SQP) is considered the state of the art and one of the most powerful nonlinear programming methods.

Recall that for LC problems, we are able to move along a feasible descent direction 𝐩 wrt an active set 𝒲 of constraints such that 𝐀_𝒲𝐩 = 0. This is no longer the case in NLP. Assume 𝐱_k is a feasible point wrt an active set 𝒲 of nonlinear constraints. In general, there is no feasible direction 𝐩 that satisfies c_𝒲(𝐱_k + α𝐩) = c_𝒲(𝐱_k), even for a small step length α. Instead, we have to continuously move along the feasible arc wrt c_𝒲(𝐱). For example, in the 2-d case the feasible arc of a constraint c_i(𝐱) is its contour curve.

Let 𝐱(t) = (x_0(t), ⋯, x_{n-1}(t)) be the parameterization of the feasible arc wrt a single parameter t. Let 𝐩 be the tangent vector to the arc at the point 𝐱_k, 𝐩 = (∂x_0/∂t, ⋯, ∂x_{n-1}/∂t)^T. In order to maintain the equality c_𝒲(𝐱) = c_𝒲(𝐱_k), we need

d/dt c_𝒲(𝐱(t)) = 𝐉_𝒲(𝐱_k)𝐩 = 0

where 𝐉_𝒲(𝐱_k) is the Jacobian matrix of the nonlinear constraint functions c_𝒲(𝐱). In other words, we now seek a null space direction of 𝐉_𝒲(𝐱_k).
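
For a concrete 2-d illustration, consider the single constraint c(𝐱) = x_0² + x_1² - 1 = 0, whose feasible arc is the unit circle. Parameterize the arc as 𝐱(t) = (cos t, sin t), so the tangent vector is 𝐩 = (-sin t, cos t)^T. The Jacobian, here a single row, is 𝐉(𝐱) = (2x_0, 2x_1), and indeed

𝐉(𝐱(t))𝐩 = -2 cos t sin t + 2 sin t cos t = 0

for every t: the tangent of the contour curve is always a null space direction of the Jacobian.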

Not surprisingly, this search strategy is also applicable to general linear constraints, because if we write c(𝐱) = 𝐀𝐱 - 𝐛, the Jacobian of c(𝐱) is just 𝐀. Searching in a null space direction wrt 𝐀 is exactly how we solve LC problems.

For brevity and without loss of generality, our discussion hereinafter assumes a simplified NLP formulation as below,

minxβˆˆβ„n⁑ψ⁒(𝐱),Β s.t. ⁒c⁒(𝐱)β‰₯0 (6)

Please note that our SQP solver directly accepts the NLP constraint specification in (5). It is not necessary to explicitly convert formulation (5) to formulation (6). Please refer to our LP and QP white paper series for how we handle upper bounds and mixed constraints.

2 QP Sub Problem

Once we decide to move along a feasible arc of a working set of nonlinear constraints c_𝒲(𝐱) from the point 𝐱_k, we can approximate the objective function ψ(𝐱) by a model function ψ̂(𝐱(t)) with arc parameter t,

ψ̂(𝐱(t)) = ψ(𝐱_k) + d/dt ψ(𝐱(t))|_{t=0} t + ½ d²/dt² ψ(𝐱(t))|_{t=0} t² (7)
         = ψ(𝐱_k) + t 𝐠^T(𝐱_k)𝐩 + ½ t² 𝐩^T∇_xxℒ(𝐱_k)𝐩 (10)

where ∇_xxℒ(𝐱_k) is the Hessian matrix of the Lagrangian function ℒ(𝐱, λ) wrt 𝐱,

ℒ(𝐱, λ) = ψ(𝐱) - λ^T c_𝒲(𝐱) (11)

Define the vector 𝐝 = t𝐩. Minimizing the model function (10) subject to moving along a feasible arc wrt a working set of nonlinear constraints c_𝒲(𝐱) is equivalent to solving the following sub QP problem,

mindβˆˆβ„n⁑𝐠T⁒(𝐱k)⁒𝐝+12⁒𝐝Tβ–½x⁒xℒ⁒(𝐱k)⁒𝐝⁒ s.t. ⁒𝐉𝒲⁒(𝐱k)⁒𝐝=0 (12)

3 Sequential Quadratic Programming

3.1 Introduction

Continuing the idea of moving along a feasible arc and solving for a tangent step 𝐝 from a sub QP problem, we can develop a sequential quadratic programming strategy. At a feasible point 𝐱_k, define

𝐉_k := 𝐉(𝐱_k)
𝐜_k := c(𝐱_k)
𝐠_k := 𝐠(𝐱_k)
∇_xxℒ_k := ∇_xxℒ(𝐱_k)

We perform the following operations at each major iteration of the SQP method:

1. Linearize all nonlinear constraints c(𝐱) to ĉ(𝐱) = 𝐜_k + 𝐉_k(𝐱 - 𝐱_k).

2. Approximate c(𝐱) ≥ 0 by ĉ(𝐱) ≥ 0, or equivalently 𝐉_k𝐝 ≥ -𝐜_k for 𝐝 = 𝐱 - 𝐱_k.

3. Formulate a sub QP problem,

   min_{𝐝∈ℝⁿ} 𝐠_k^T𝐝 + ½𝐝^T∇_xxℒ_k𝐝, s.t. 𝐉_k𝐝 ≥ -𝐜_k (13)

4. Compute a step length α for moving along 𝐝.

5. Update 𝐱_{k+1} = 𝐱_k + α𝐝, recompute 𝐉_{k+1}, 𝐠_{k+1}, 𝐜_{k+1} and ∇_xxℒ_{k+1}, and continue to the (k+1)-th iteration.
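
To make these steps concrete, the following self-contained C++ sketch runs the loop on a tiny 2-d example: minimize (x_0 - 2)² + x_1² subject to c(𝐱) = 1 - x_0² - x_1² ≥ 0, whose solution is 𝐱* = (1, 0) with multiplier λ* = 1. The sketch assumes the constraint stays active, so each sub QP (13) reduces to an equality-constrained QP whose KKT system is solved in closed form; it uses the exact Lagrangian Hessian, a full step α = 1 and no merit function. It illustrates the scheme above and is not the NPL solver.

#include <cmath>
#include <cstdio>

int main() {
    double x0 = 0.5, x1 = 0.4, lambda = 0.0;
    for (int k = 0; k < 20; ++k) {
        double g0 = 2.0 * (x0 - 2.0), g1 = 2.0 * x1;  // gradient g_k
        double c = 1.0 - x0 * x0 - x1 * x1;           // constraint value c_k
        double j0 = -2.0 * x0, j1 = -2.0 * x1;        // Jacobian row J_k
        // Hessian of the Lagrangian: 2I - lambda * (-2I) = (2 + 2 lambda) I.
        double h = 2.0 + 2.0 * lambda;
        // Sub QP with the constraint active: min g'd + (h/2) d'd, s.t. J d = -c.
        // Eliminating d = (mu J' - g) / h gives mu = (J g - h c) / (J J').
        double mu = (j0 * g0 + j1 * g1 - h * c) / (j0 * j0 + j1 * j1);
        double d0 = (mu * j0 - g0) / h, d1 = (mu * j1 - g1) / h;
        // Full step for this toy; NPL uses a merit-function line search (3.4).
        x0 += d0; x1 += d1; lambda = mu;
        std::printf("k=%2d  x=(%.6f, %.6f)  lambda=%.6f\n", k, x0, x1, lambda);
        if (std::fabs(d0) + std::fabs(d1) < 1e-12) break;  // step is tiny
    }
    return 0;
}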

There are some important and subtle issues in this procedure. First, it is unclear how we compute ∇_xxℒ_k. This is not only because ∇_xxℒ_k involves the Hessians of the active constraints c_𝒲(𝐱), which could be very costly to compute, but also because the active set 𝒲 changes while we solve the sub QP problem (13) at minor iterations.

Secondly, the vector 𝐝 is the step to the optimal point wrt ĉ(𝐱). As soon as we move away from 𝐱_k, c(𝐱) ≠ ĉ(𝐱). This is called departure from linearity. In theory, we should continuously move along the tangent direction 𝐩 with infinitesimal step length. For any α > 0, though, it is possible that some nonlinear constraint is violated. So we need a merit function that trades off the reduction of ψ(𝐱) against the violation of c(𝐱) when we move along 𝐝.

Thirdly, the Lagrangian function (11) involves the Lagrange multipliers λ of the active nonlinear constraint functions. It is unclear how to obtain these Lagrange multipliers, especially between two major iterations.

Different SQP methods make different choices on these subtle issues. Our implementation is mostly based on the SNOPT method [4]. Users can refer to [4] for further details. Here we give a brief summary of the choices implemented in Mathwrist NPL.

3.2 Primal-dual Solution

Fix an active set of constraints c_𝒲(𝐱) = 0 and consider the unconstrained optimization of the Lagrangian function ℒ(𝐱, λ) formed wrt c_𝒲(𝐱),

minxβˆˆβ„n,Ξ»βˆˆβ„m⁑ℒ⁒(x,Ξ») (14)

The first order optimality condition at the solution (𝐱*, λ*) requires

𝐠(𝐱*) - 𝐉^T(𝐱*)λ* = 0
c_𝒲(𝐱*) = 0 (15)

These are exactly the stationarity and feasibility conditions of sub QP problem (12). By working in the augmented unknown space (𝐱, λ) and solving sub QP problems, we obtain an estimate of (𝐱*, λ*) that is used to update (𝐱_{k+1}, λ_{k+1}) for the (k+1)-th major SQP iteration.

NPL actually solves the inequality constrained sub QP problem (13). Let 𝒲_{k+1} be the active set obtained after we solve sub QP (13); it is used to start the (k+1)-th major iteration. Let μ be the Lagrange multipliers after solving (13). By strict complementarity, μ_i = 0, ∀i ∉ 𝒲_{k+1}.

3.3 Quasi-Newton Approximation

One major improvement of modern SQP methods is to replace the Hessian matrix ∇_xxℒ_k in the sub QP problem (13) by a quasi-Newton approximation matrix 𝐁_k. Specifically, we apply a BFGS update to a modified Lagrangian function,

ℒ_M(𝐱, λ) = ψ(𝐱) - λ^T(c(𝐱) - ĉ(𝐱))

Clearly, the Hessian of ℒ_M(𝐱, λ) is the same as that of the conventional Lagrangian function ℒ(𝐱, λ) in (11), since c(𝐱) and ĉ(𝐱) differ only by terms beyond first order. Further, because c(𝐱_k) - ĉ(𝐱_k) = 0 and the gradient of c(𝐱) - ĉ(𝐱) vanishes at 𝐱_k, the function value and gradient of ℒ_M(𝐱, λ) at 𝐱_k agree with those of the objective ψ(𝐱). Therefore ℒ_M(𝐱, λ) has the same primal-dual solution as ℒ(𝐱, λ).

Then, for the BFGS update, define

δ = 𝐱_{k+1} - 𝐱_k
𝐲 = ∇ℒ_M(𝐱_{k+1}, λ_{k+1}) - ∇ℒ_M(𝐱_k, λ_{k+1})
  = 𝐠_{k+1} - 𝐠_k - (𝐉_{k+1} - 𝐉_k)^T λ_{k+1}

To ensure positive definiteness of 𝐁_{k+1}, we must prevent 𝐲^Tδ from being negative or very small. If 𝐲^Tδ < σ, where σ = α(1-η)𝐝^T𝐁_k𝐝 for a constant 0 < η < 1, two trial modifications are attempted; see [4] for details. If both trials fail to remedy the definiteness of 𝐁_{k+1}, the Hessian approximation is not updated.
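
The curvature test above fits in a few lines of C++. Here δ, 𝐲, 𝐝 and the product 𝐁_k𝐝 are passed as plain arrays; this is an illustrative fragment with assumed names, not NPL code.

#include <cstddef>

// Curvature guard for the BFGS update: returns true when y' * delta is
// large enough to keep B_{k+1} positive definite. bd must hold B_k * d.
bool curvature_ok(const double* y, const double* delta, const double* d,
                  const double* bd, std::size_t n, double alpha, double eta) {
    double yd = 0.0, dbd = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        yd  += y[i] * delta[i];  // y' * delta
        dbd += d[i] * bd[i];     // d' * B_k * d
    }
    double sigma = alpha * (1.0 - eta) * dbd;
    return yd >= sigma;  // otherwise attempt the two modifications in [4]
}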

3.4 Slack Variable and Merit Function

After obtaining 𝐝 and μ from the sub QP (13), we want to construct a smooth merit function that balances the reduction of ψ(𝐱) and the violation of c(𝐱) as we move along 𝐝. Slack variables are introduced to incorporate the violation component.

Let ρ be a vector of constraint violation penalty factors; ρ is 0 at the first iteration and is updated at subsequent iterations. At the k-th major iteration, let 𝐬_k be the slack variables. The i-th element s_{k,i} of 𝐬_k is defined as

s_{k,i} = max(0, c_i(𝐱_k)),            if ρ_i = 0
s_{k,i} = max(0, c_i(𝐱_k) - λ_i/ρ_i),  otherwise (16)

Let ŝ_k be the slack variables of the linearization ĉ(𝐱) after QP termination, ŝ_k = 𝐜_k + 𝐉_k𝐝. With (𝐱, λ, 𝐬) as unknown variables, the merit function is formulated as

ℳ_ρ(𝐱, λ, 𝐬) = ψ(𝐱) - λ^T(c(𝐱) - 𝐬) + ½ Σ_{i=1}^{m} ρ_i(c_i(𝐱) - s_i)² (17)

This merit function has very nice theoretical properties, detailed in [5]. Most importantly, it seamlessly handles inequality constraints between major iterations using slack variables. It can be shown that the SQP method equipped with this merit function has global convergence. The definition of 𝐬_k in (16) is the local minimizer of merit function (17) wrt 𝐬 alone, subject to 𝐬 ≥ 0.
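
The slack definition (16) and merit function (17) translate directly into code. Below is a plain-array C++ sketch with illustrative names, not the NPL API; s[] receives the slacks computed per (16) and the return value is ℳ_ρ.

#include <algorithm>
#include <cstddef>

// psi is the objective value at x, c[] the constraint values c_i(x),
// lambda[] the multipliers and rho[] the penalty factors, all of length m.
double merit(double psi, const double* c, const double* lambda,
             const double* rho, double* s, std::size_t m) {
    double value = psi;
    for (std::size_t i = 0; i < m; ++i) {
        // Slack per (16): local minimizer of (17) wrt s_i, subject to s_i >= 0.
        s[i] = (rho[i] == 0.0) ? std::max(0.0, c[i])
                               : std::max(0.0, c[i] - lambda[i] / rho[i]);
        double r = c[i] - s[i];                  // constraint violation residual
        value += -lambda[i] * r + 0.5 * rho[i] * r * r;
    }
    return value;
}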

We then perform a line search to compute an acceptable step length α along the following search direction,

(𝐝, ξ, 𝐪) = (𝐝, μ - λ_k, ŝ_k - 𝐬_k) = (𝐝, μ - λ_k, 𝐉_k𝐝 + 𝐜_k - 𝐬_k) (18)

The primal-dual augmented unknown variables are updated to start the (k+1)-th major iteration,

𝐱_{k+1} = 𝐱_k + α𝐝
λ_{k+1} = λ_k + αξ
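
In code, the update along (18) looks as follows; ŝ_k is written s_hat, and the slack update implied by (25) below is included. The function name and signature are illustrative only, not the NPL API.

#include <cstddef>

// Move (x, lambda, s) by alpha times the direction (d, mu - lambda, s_hat - s).
void primal_dual_update(double alpha, std::size_t n, std::size_t m,
                        double* x, const double* d,
                        double* lambda, const double* mu,
                        double* s, const double* s_hat) {
    for (std::size_t i = 0; i < n; ++i) x[i] += alpha * d[i];
    for (std::size_t i = 0; i < m; ++i) {
        lambda[i] += alpha * (mu[i] - lambda[i]);
        s[i]      += alpha * (s_hat[i] - s[i]);
    }
}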

3.5 Update Penalty Factor ρ

In the line search along the search direction (18), we rewrite the merit function as a univariate function φ(α) wrt the step length α,

v(α) = (𝐱_k, λ_k, 𝐬_k) + α(𝐝, ξ, 𝐪) (25)
φ_ρ(α) = ℳ_ρ(v(α)) (26)

In order to make a descent move, we want φ′(0) to be significantly negative. The choice in the SNOPT method [4] is to find ρ such that

φ_ρ′(0) < -½ 𝐝^T𝐁_k𝐝 (27)

[5] proved the existence of a lower bound ρ̂ > 0 such that condition (27) is satisfied ∀ρ > ρ̂. Further, the optimal ρ* of the following linearly constrained least squares problem

min_ρ ½‖ρ‖², s.t. φ_ρ′(0) = -½ 𝐝^T𝐁_k𝐝, ρ ≥ 0

can be computed analytically. At the termination of sub QP problem (13), if condition (27) is not satisfied, we replace ρ with ρ*. On the other hand, if condition (27) is satisfied and ρ > ρ*, ρ can be reduced to ρ*.

4 Initial Working Set and Feasibility

4.1 Initial Working Set 𝒲_0

Given an initial guess of 𝐱, we first use the "crash start" procedure [2] to solve a phase-I feasibility problem subject to only the general linear constraints 𝐛_l ≤ 𝐀𝐱 ≤ 𝐛_u and simple bounds 𝐱_l ≤ 𝐱 ≤ 𝐱_u in (5).

If there is no feasible solution wrt the general linear constraints and simple bounds, the algorithm terminates immediately. Otherwise, we have found a linearly feasible point 𝐱_0 and an initial active set 𝒲_0 of only linear constraints. From this point on, the active set method guarantees that all linear constraints and simple bounds in (5) are always satisfied.

4.2 Sub QP Feasibility

After 𝐱_0 and 𝒲_0 are obtained, the algorithm starts major iterations by first linearizing all nonlinear constraints c(𝐱) at the point 𝐱_k. We solve the phase-I feasibility problem again, now with the full constraint specification in formulation (13). If a feasible point is identified, the algorithm continues to solve the sub QP as usual. Otherwise, the algorithm switches to an elastic programming (EP) mode.

4.3 Elastic Programming Mode

It is possible that no feasible point exists wrt the linearization ĉ(𝐱) at 𝐱_k. In this situation, we introduce surplus variables 𝐰 into the NLP formulation to penalize the violation of the nonlinear constraints with a given penalty factor γ,

minπ±βˆˆβ„n,π°βˆˆβ„m⁑ψ(e)⁒(𝐱,𝐰)⁒ s.t. ⁒c(e)⁒(𝐱,𝐰)β‰₯0,𝐰β‰₯0

, where

ψ(e)⁒(𝐱,𝐰)=ψ⁒(𝐱)+γ⁒𝐞T⁒𝐰
c(e)⁒(𝐱,𝐰)=c⁒(𝐱)+𝐰

The Lagrange multiplier condition for an elastic constraint c_i(𝐱) + w_i ≥ 0 requires 0 ≤ λ_i ≤ γ. Let 𝐰_k be the values of the surplus variables evaluated at 𝐱_k. Linearizing all elastic constraints c^(e)(𝐱, 𝐰) at (𝐱_k, 𝐰_k), the corresponding sub QP problem is

mindβˆˆβ„n,Δ⁒wβˆˆβ„m 𝐠kT⁒𝐝+12⁒𝐝T⁒𝐁k⁒𝐝+γ⁒𝐞T⁒Δ⁒𝐰⁒, s.t.
𝐉k⁒𝐝+Δ⁒𝐰β‰₯-𝐜k-𝐰k
Δ⁒𝐰β‰₯-𝐰k

The algorithm gradually increases the penalty factor γ if the major iterations keep running in EP mode. It quits EP mode as soon as a sub QP problem can find a feasible starting point.
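
As a small sketch, the elastic objective ψ^(e) and constraints c^(e) can be evaluated as below. The function names and plain-array signatures are illustrative assumptions, not the NPL API.

#include <cstddef>

// psi_e = psi + gamma * e'w, per the elastic formulation above.
double elastic_objective(double psi, const double* w, std::size_t m,
                         double gamma) {
    double sum = 0.0;
    for (std::size_t i = 0; i < m; ++i) sum += w[i];  // e'w
    return psi + gamma * sum;
}

// c_e_i = c_i(x) + w_i, written into c_e[].
void elastic_constraints(const double* c, const double* w, std::size_t m,
                         double* c_e) {
    for (std::size_t i = 0; i < m; ++i) c_e[i] = c[i] + w[i];
}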

5 Termination Control

5.1 Minor Iterations

The sub QP problem (13) is solved in minor iterations by a QP active-set method. We expose three control functions for the termination control of the sub QP problem.

• set_qp_max_iter() specifies the maximum number of iterations M allowed in a sub QP.

• set_qp_stationary_tolerance() specifies the tolerance level ϵ̄_s used to test whether the stationary condition is satisfied at a QP minor iteration.

• set_qp_converge_tolerance() specifies the tolerance level ϵ̄_c used to test whether the Lagrange multiplier condition is satisfied; if so, the optimum of the sub QP problem is found.

An exception is thrown when M is reached during QP minor iterations. This mostly happens at the QP phase-I feasibility stage, e.g. when the number of general linear constraints and simple bounds is very large but M is relatively small. It rarely happens at the QP optimality stage because we use an early termination strategy, described in the section below. For details of the stationary tolerance control ϵ̄_s and the convergence tolerance control ϵ̄_c, please refer to our QP white paper.
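
For illustration, a sub QP termination configuration might look as follows. The class name SQP_ActiveSet and the three setter names appear in this paper; the header path, argument types and the chosen values are assumptions for the sake of the example.

#include <mathwrist/npl.h>  // hypothetical header path

void configure_sub_qp(SQP_ActiveSet& sqp) {
    sqp.set_qp_max_iter(500);               // M: max minor iterations per sub QP
    sqp.set_qp_stationary_tolerance(1e-8);  // stationary condition tolerance
    sqp.set_qp_converge_tolerance(1e-8);    // Lagrange multiplier tolerance
}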

5.2 QP Early Termination

Effectively, the SQP algorithm still works when the sub QP (13) stops at a stationary but non-optimal point. This allows early termination of QP minor iterations for better performance. At a stationary point, we terminate early if one of the following conditions is satisfied,

• the improvement of the QP objective is greater than a pre-defined progress control function;

• the actual number of minor iterations is greater than 2M/3;

• in EP mode, the Lagrange multipliers of the linearized constraints are greater than 2γ.
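
These three tests combine into a small predicate, sketched below in C++. All names are illustrative; in particular, the progress control function is abstracted into a single threshold progress_target.

// Early-termination test at a stationary point of sub QP (13).
bool early_terminate(double qp_improvement, double progress_target,
                     int minor_iters, int M,
                     double max_multiplier, double gamma, bool ep_mode) {
    if (qp_improvement > progress_target) return true;  // enough progress made
    if (minor_iters > 2 * M / 3) return true;           // iteration budget spent
    if (ep_mode && max_multiplier > 2.0 * gamma) return true;  // EP multipliers large
    return false;
}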

5.3 Major Iterations

The SQP solver class SQP_ActiveSet is derived from the IterativeMethod base class and shares the maximum number of iterations control N and the convergence tolerance control ϵ_c defined in IterativeMethod. Both N and ϵ_c are used to terminate SQP major iterations.

The SQP algorithm is considered converged when the following two conditions are both satisfied:

1. The sub QP terminates in a normal state (not an early termination).

2. The step length α of the line search along direction (18) satisfies the test below.

   α‖𝐝‖_∞ / (1 + ‖𝐱_k‖_∞) < ϵ_c
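
Condition 2 is a simple relative step-size test; a plain-array C++ sketch with illustrative names:

#include <algorithm>
#include <cmath>
#include <cstddef>

// Returns true when alpha * |d|_inf / (1 + |x_k|_inf) < eps_c.
bool step_converged(double alpha, const double* d, const double* x,
                    std::size_t n, double eps_c) {
    double dn = 0.0, xn = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        dn = std::max(dn, std::fabs(d[i]));  // |d|_inf
        xn = std::max(xn, std::fabs(x[i]));  // |x_k|_inf
    }
    return alpha * dn / (1.0 + xn) < eps_c;
}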

References

• [1] Jorge Nocedal and Stephen J. Wright: Numerical Optimization, Springer, 1999.
• [2] Philip E. Gill, Walter Murray and Margaret H. Wright: Practical Optimization, Academic Press, 1981.
• [3] Philip E. Gill and Elizabeth Wong: Sequential Quadratic Programming Methods, UCSD Department of Mathematics, Technical Report NA-10-03.
• [4] Philip E. Gill, Walter Murray and Michael A. Saunders: SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization, SIAM Review, Volume 47, Number 1, pp. 99-131.
• [5] Philip E. Gill, Walter Murray and Michael A. Saunders: Some Theoretical Properties of an Augmented Lagrangian Merit Function, Advances in Optimization and Parallel Computing, P.M. Pardalos, ed., North-Holland, Amsterdam, 1992, pp. 101-128.
• [6] M. J. D. Powell: The Convergence of Variable Metric Methods for Nonlinearly Constrained Optimization Calculations, in Nonlinear Programming 3 (Proc. Sympos., Special Interest Group Math. Programming, University of Wisconsin, Madison, WI, 1977), Academic Press, New York, 1978, pp. 27-63.
• [7] Samuel K. Eldersveld: Large-Scale Sequential Quadratic Programming Algorithms, Technical Report SOL 92-4, September 1992.
  • [7] Samuel K. Eldersveld: Large-Scale Sequential Quadratic Programming Algorithms, Technical Report SOL 92-4, September 1992.