Quadratic Programming
Mathwrist White Paper Series

Copyright © Mathwrist LLC 2023
(January 20, 2023)
Abstract

This document presents a technical overview of the Quadratic Programming (QP) feature implemented in Mathwrist's C++ Numerical Programming Library (NPL). We use the active set method with an inertia control strategy to handle both convex and general QP problems.

1 Introduction

Quadratic programming (QP) solves the following minimization problem with a quadratic objective function subject to linear constraints.

min๐ฑโˆˆโ„nโก๐œTโข๐ฑ+12โข๐ฑTโข๐‡๐ฑ,ย s.t.ย โข๐›lโ‰ค๐€๐ฑโ‰ค๐›uโขย andย โข๐ฅโ‰ค๐ฑโ‰ค๐ฎ (1)

Compared to linear programming (LP), the QP objective function has an additional quadratic term $\tfrac{1}{2} x^T H x$ with Hessian matrix $H$. If $H$ is positive definite, the problem is known as a convex QP. Otherwise, it is called a general QP problem.

Following the same notation used in our LP white paper, we shall refer to $c$ as the cost vector, $b_l \le A x \le b_u$ as the general linear constraints and $l \le x \le u$ as the simple bound (box) constraints, respectively. It is possible that some of the linear constraints are equality constraints, in which case their lower and upper bounds are the same.
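As a small, made-up illustration of this notation (not an example taken from the library), consider

\min_{x \in \mathbb{R}^2} \; -2x_1 - 5x_2 + x_1^2 + x_2^2, \quad \text{s.t.} \quad 1 \le x_1 + x_2 \le 1, \;\; 0 \le x_1, x_2 \le 3

Here $H = 2I$ is positive definite, so this is a convex QP, and the single general linear constraint is an equality constraint because its lower and upper bounds coincide.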

Our main references include [1] by Jorge Nocedal and Stephen J. Wright, [2] by Gill et al., and [3] by Gill et al. Particularly inspired by [2], we implemented a framework that serves as the outline algorithm for all active-set based methods in NPL. Different optimization problems customize the search direction, the search step length and the optimality conditions, but they largely share the same active-set operations. Please refer to our LP white paper for an overview of the active set method.

2 Equality Constrained QP

If all constraints are equality constraints and the Hessian matrix $H$ is positive definite, then from any given point $x$ we can reach the optimal point $x^* = x + p$ with a single step $p$. This step $p$ is the unique solution of the KKT system,

(๐‡๐€T๐€)โข(-๐ฉฮป)=(๐ ๐)

, where ๐  is the gradient of the objective ๐ =๐œ+๐‡๐ฑ, ๐=๐€๐ฑ-๐›.

Let ๐€T have the following QR factorization,

A^T = \underbrace{(Y \;\; Z)}_{Q} \begin{pmatrix} R \\ 0 \end{pmatrix}

Since the columns of $Q$ form a full basis of $\mathbb{R}^n$, we can write $p = Y p_Y + Z p_Z$ in terms of a range-space component $p_Y$ and a null-space component $p_Z$. Clearly, they are individually the solutions of

R^T p_Y = -d \qquad (2)
Z^T H Z \, p_Z = -Z^T (H Y p_Y + g) \qquad (3)

Matrix ๐™Tโข๐‡๐™ is called the reduced Hessian ๐‡~. We can interpret the ๐ฉY component as a feasibility step and ๐ฉZ component as the optimality step. In order to solve ๐ฉZ from equation (3), we compute the Cholesky factorization of ๐‡~.

At ๐ฑ*, the gradient is ๐ *=๐œ+๐‡๐ฑ*=๐ +๐‡๐ฉ=๐€Tโขฮป. If we want to make any further move ๐ช yet retaining the feasibility with respect to all equality constraints, ๐ช needs be in the null space, ๐ช=๐™๐ชZ for some ๐ชZ. The change of objective function introduced by ๐ช is,

c^T q + q^T H x^* + \tfrac{1}{2} q^T H q = q^T g^* + \tfrac{1}{2} q^T H q

Because $g^* = A^T \lambda$ and $q$ is in the null space, $q^T g^* = 0$. The objective function change introduced by $q$ is just the quadratic term $\tfrac{1}{2} q^T H q = \tfrac{1}{2} q_Z^T \tilde{H} q_Z$. For positive definite $H$, any move away from $x^*$ increases the objective function, which confirms that $x^*$ is indeed the optimal point. In fact, we only need the reduced Hessian $\tilde{H}$ to be positive definite.

Finally, if $\tilde{H}$ is indefinite, we can first make a feasibility move $Y p_Y$ and then find a descent null-space direction $q = Z q_Z$ such that the quadratic term $q_Z^T \tilde{H} q_Z$ is negative. Because there are no other constraints, we can move an infinite step length along $q$; the problem has an unbounded solution.

3 Convex QP

For convenience and without loss of generality, let's now consider a standard QP formulation with only lower-bounded linear constraints.

min๐ฑโˆˆโ„nโก๐œTโข๐ฑ+12โข๐ฑTโข๐‡๐ฑ,ย s.t.ย โข๐€๐ฑโ‰ฅ๐› (4)

where the Hessian matrix $H$ is positive definite. Our QP active set implementation doesn't convert the general form (1) to this standard form (4); we will discuss how the general constraints and simple bounds are handled in a later section.

Let $\mathcal{E}$ denote the index set of linear equality constraints, $A_{\mathcal{E}} x = b_{\mathcal{E}}$. We assume $A_{\mathcal{E}}$ has full row rank and that the number of equality constraints is less than the number of unknowns $n$.

At the start of the $k$-th iteration, the method is given a feasible point $x_k$. It partitions all constraints into a working set $\mathcal{W}$ and a non-working set $\mathcal{N}$. The working index set $\mathcal{W}$ includes all equality constraints $\mathcal{E}$ and those inequality constraints that hold with equality exactly at the current point $x_k$; $\mathcal{N}$ includes all other constraints.

3.1 Sub QP

At each iteration, the active set method solves a sub QP problem subject to only the equality constraints $A_{\mathcal{W}} x = b_{\mathcal{W}}$. If $\tilde{H}$ is positive definite, we can reach the local optimum $x_k^*$ with a unit step $p$. This search step $p$ is the unique solution of the KKT system,

(๐‡๐€๐’ฒT๐€๐’ฒ)โข(-๐ฉฮป)=(๐ ๐ŸŽ) (5)

Note that the second equation in the KKT system is now $A_{\mathcal{W}} p = 0$, to retain equality with respect to $\mathcal{W}$. In other words, $p$ is a null-space (optimality) step.

The KKT system (5) simultaneously solves for the Lagrange multipliers $\lambda$ at $x_k^*$ for all working constraints. If an inequality constraint $i$ in the working set, $i \in \mathcal{W} - \mathcal{E}$, has a negative Lagrange multiplier $\lambda_i < 0$, we can remove constraint $i$ from $\mathcal{W}$ and start a new iteration. It can be shown that the search direction $p$ computed from the KKT system (5) using the updated working set is a feasible direction with respect to the deleted constraint $i$.

3.2 Objective Function Change

When we move along a null space direction $p = Z p_Z$ for some $p_Z$, the objective function change is

p_Z^T Z^T g + \tfrac{1}{2} p_Z^T Z^T H Z \, p_Z

The ๐™Tโข๐  term is oftern called reduced gradient, denoted as ๐ ~ here. The change is,

p_Z^T \tilde{g} + \tfrac{1}{2} p_Z^T \tilde{H} p_Z \qquad (6)

3.3 Working Set Operations

After solving for $p$ from the KKT system (5) and moving along this search direction, we could hit a non-working constraint before completing a unit step. Let $j \in \mathcal{N}$ be the first blocking constraint among all non-working constraints, and let $\alpha < 1$ be the step length moved along $p$ when we hit $j$. In this case we update $x_{k+1} = x_k + \alpha p$. At this point, the equality $A_j x_{k+1} = b_j$ holds exactly. We therefore remove $j$ from $\mathcal{N}$ and add it to $\mathcal{W}$. With this updated feasible point and new working set, we start the next iteration.
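For the standard form (4), this step-length computation is a simple ratio test over the non-working constraints. The sketch below (Eigen types and a hypothetical helper name, not the NPL interface) returns the largest feasible step $\alpha \le 1$ along $p$ and the index of the first blocking constraint, or $-1$ if a full step is possible.

#include <Eigen/Dense>
#include <vector>
using namespace Eigen;

// Ratio test for constraints A x >= b: how far can we move along p from the
// feasible point x before a non-working constraint becomes active?
double max_step(const MatrixXd& A, const VectorXd& b, const VectorXd& x,
                const VectorXd& p, const std::vector<int>& nonworking,
                int& blocking)
{
    double alpha = 1.0;                   // a unit step reaches the sub QP optimum
    blocking = -1;
    for (int j : nonworking) {
        double ajp = A.row(j).dot(p);
        if (ajp < 0.0) {                  // moving towards the bound A_j x >= b_j
            double step = (b(j) - A.row(j).dot(x)) / ajp;
            if (step < alpha) { alpha = step; blocking = j; }
        }
    }
    return alpha;
}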

On the other hand, if we are able to make a unit step $\alpha = 1$, the local optimum $x^*$ is reached. We test the Lagrange multipliers of all constraints $i \in \mathcal{W} - \mathcal{E}$. The constraint $i$ with the most negative Lagrange multiplier is removed from the current working set $\mathcal{W}$ and added to $\mathcal{N}$. We update $x_{k+1} = x^* = x_k + p$ and start the next iteration with the updated working set.

3.4 Upper Bounds and Mixed Constraints

The NPL QP active set solver works directly on the general formulation (1). At a local optimal point $x^*$, the Lagrange multiplier requirement is $\lambda_i \ge 0$ for lower-bounded constraints and $\lambda_i \le 0$ for upper-bounded constraints. When we move along a descent feasible direction $p$, for an inequality constraint $j \in \mathcal{N}$, if $A_j p < 0$ we check whether its lower bound blocks the move; if $A_j p > 0$ we check whether its upper bound blocks. The working set $\mathcal{W}$ tracks not only the constraint index but also which bound holds with equality.

NPL handles the mixture of general linear constraints and simple bounds in the same active set framework. When a variable is equal to one of its bound values, it is considered a fixed variable; a variable that stays strictly inside its bounds is a free variable. Let $x_b$ and $x_f$ denote the fixed and free variables respectively, and let $B_{\mathcal{W}}$ and $F_{\mathcal{W}}$ be the working constraint coefficient sub-matrices corresponding to $x_b$ and $x_f$. The set of working constraints effectively is

\begin{pmatrix} I & 0 \\ B_{\mathcal{W}} & F_{\mathcal{W}} \end{pmatrix} \begin{pmatrix} x_b \\ x_f \end{pmatrix} = \begin{pmatrix} l_b \\ b_{\mathcal{W}} \end{pmatrix}

At each sub QP iteration, conceptually we have a KKT system in block structure

\begin{pmatrix} H_{bb} & H_{bf} & I & B_{\mathcal{W}}^T \\ H_{fb} & H_{ff} & 0 & F_{\mathcal{W}}^T \\ I & 0 & 0 & 0 \\ B_{\mathcal{W}} & F_{\mathcal{W}} & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ -p_f \\ \lambda_b \\ \lambda_{\mathcal{W}} \end{pmatrix} = \begin{pmatrix} g_b \\ g_f \\ 0 \\ 0 \end{pmatrix} \qquad (8)

where we have arranged the Hessian matrix $H$ and gradient $g$ into fixed-variable and free-variable blocks. The search direction $p_f$ is now relevant only to the free variables. At a local optimum $x^*$, we also need to test the Lagrange multipliers $\lambda_b$ for the simple bound constraints $l_b \le x_b \le u_b$.
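For clarity, writing out the block rows of (8) shows how the system decouples; the third block row is trivially satisfied because the fixed variables do not move:

F_{\mathcal{W}} p_f = 0, \qquad H_{ff} p_f - F_{\mathcal{W}}^T \lambda_{\mathcal{W}} = -g_f, \qquad \lambda_b = g_b + H_{bf} p_f - B_{\mathcal{W}}^T \lambda_{\mathcal{W}}

so $p_f$ and $\lambda_{\mathcal{W}}$ are determined by the free-variable sub-system alone, and $\lambda_b$ is recovered afterwards.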

3.5 Null Space Direction and QR Update

From the KKT system (8), $p_f$ is clearly a null-space direction with respect to $F_{\mathcal{W}}$. We maintain a QR factorization of $F_{\mathcal{W}}^T$,

๐…๐’ฒT=(๐˜๐™)โŸ๐โข(๐‘0)

Initially, a full QR decomposition is performed. Subsequently, at each iteration both simple bounds and general constraints can join or leave the working set, and the block structure in (8) changes accordingly. It is cost prohibitive to recompute a full QR factorization whenever $F_{\mathcal{W}}$ changes, so an efficient QR update scheme is implemented based on [2], section 5.2.4.
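To give a flavor of such an update (a rough sketch using Eigen and a hypothetical function name; NPL's actual scheme follows [2], section 5.2.4), the fragment below appends one new column $a$ to $F_{\mathcal{W}}^T$ and restores triangular form with Givens rotations at $O(n^2)$ cost, instead of recomputing the factorization from scratch.

#include <Eigen/Dense>
using namespace Eigen;

// Append one column a to F_W^T = Q [R; 0] and update Q (n x n) and R (m x m)
// in place, so that [F_W^T  a] = Q_new [R_new; 0].  Illustrative sketch only.
void qr_append_column(MatrixXd& Q, MatrixXd& R, const VectorXd& a)
{
    const int n = Q.rows(), m = R.cols();
    VectorXd w = Q.transpose() * a;              // coordinates of a in the Q basis
    // Rotate the trailing part of w into w(m), updating Q so that Q*w stays equal to a.
    for (int i = n - 1; i > m; --i) {
        JacobiRotation<double> G;
        G.makeGivens(w(i - 1), w(i));            // G^T * (w(i-1), w(i))^T = (r, 0)^T
        w.applyOnTheLeft(i - 1, i, G.adjoint()); // rotate the pair of entries of w
        Q.applyOnTheRight(i - 1, i, G);          // Q <- Q G, so Q w is unchanged
    }
    // New R gains one column: [R  w(0:m-1); 0  w(m)].
    MatrixXd Rnew = MatrixXd::Zero(m + 1, m + 1);
    Rnew.topLeftCorner(m, m) = R;
    Rnew.col(m) = w.head(m + 1);
    R = Rnew;
}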

3.6 Cholesky Update of $\tilde{H}$

Because we use the Cholesky factorization $\tilde{H} = L L^T$ to solve equation (3), it is also cost prohibitive to refactorize whenever $\tilde{H}$ is modified by working set changes. We developed an efficient updating scheme that accommodates the working set changes and economically obtains the new Cholesky factor from the previous one.
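For instance, when a constraint leaves the working set, $Z$ gains one column $z$ and $\tilde{H}$ grows by one row and column. The new Cholesky factor then follows from the old one with a single triangular solve, as in the sketch below (Eigen types and a hypothetical function name, not the NPL routine).

#include <Eigen/Dense>
#include <cmath>
using namespace Eigen;

// Extend the Cholesky factor L of the reduced Hessian when the null space
// gains one column z, i.e. the new reduced Hessian is [Z^T H Z, h; h^T, gamma].
MatrixXd chol_append(const MatrixXd& L, const MatrixXd& H,
                     const MatrixXd& Z, const VectorXd& z)
{
    const int m = L.rows();
    VectorXd h = Z.transpose() * (H * z);            // new off-diagonal column
    double gamma = z.dot(H * z);                     // new diagonal entry
    VectorXd l = L.triangularView<Lower>().solve(h); // solve L l = h
    MatrixXd Lnew = MatrixXd::Zero(m + 1, m + 1);
    Lnew.topLeftCorner(m, m) = L;
    Lnew.row(m).head(m) = l.transpose();
    Lnew(m, m) = std::sqrt(gamma - l.squaredNorm()); // positive iff new H~ is SPD
    return Lnew;
}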

4 General QP

In this section we again assume the standard QP formulation (4) for the convenience of discussion.

First of all, given a general Hessian matrix $H$, the reduced Hessian $\tilde{H}$ could still be positive definite at an iteration. In that case, the active set method continues exactly as it does for convex QP problems.

Secondly, if the current iterate $x_k$ is a stationary point with respect to the working set and $\tilde{H}$ is positive semi-definite, then $x_k$ is a local optimal point. We cannot reduce the objective function value unless the working set changes. The active set method proceeds to test the Lagrange multiplier conditions in the same way as for convex QP problems.

Next, we consider the situations where $\tilde{H}$ is positive semi-definite but $x_k$ is not a stationary point, and where $\tilde{H}$ is indefinite. In both cases, we just need to find a descent feasible direction with respect to the working set. Moving along this direction, if there is no blocking constraint $j \in \mathcal{N}$, the problem is unbounded. Otherwise, the blocking constraint is added to the working set and $\tilde{H}$ changes accordingly. We then start a new iteration and test the definiteness of the new $\tilde{H}$.

4.1 Inertia Control

Based on [3], we implemented the inertia control strategy in combination with an extended KKT system. In summary, inertia control ensures that $\tilde{H}$ has at most one negative eigenvalue at any point in time. At a local optimal point, when a constraint $a^*$ is to be deleted from the working set, inertia control identifies whether the deletion would cause $\tilde{H}$ to become singular or indefinite. If so, $a^*$ is marked as deletion-pending but kept in the extended KKT system. A zero-curvature or negative-curvature direction $p$ is then computed accordingly.
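A standard fact underlying this strategy (see, e.g., [1] or [3]) relates the inertia of the KKT matrix to that of the reduced Hessian when $A_{\mathcal{W}}$ has full row rank $m$,

\mathrm{In} \begin{pmatrix} H & A_{\mathcal{W}}^T \\ A_{\mathcal{W}} & 0 \end{pmatrix} = \mathrm{In}(Z^T H Z) + (m, m, 0)

so counting the negative eigenvalues (or pivots) of a KKT factorization reveals directly whether $\tilde{H}$ has become singular or indefinite after a working set change.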

When we move along $p$ and hit a blocking constraint $a_i$, if adding $i$ to $\mathcal{W}$ lets the iteration reach a vertex point, then clearly that is a local optimal point and we can remove $a^*$ from the extended KKT system. If it is not a vertex, by testing the independence of $a_i$ with respect to $A_{\mathcal{W}}$ and $a^*$, inertia control can identify whether such a constraint addition restores the positive definiteness of $\tilde{H}$. If so, again $a^*$ can be physically removed from the extended KKT system. Otherwise, the algorithm keeps searching along zero or negative curvature directions and adding blocking constraints. Ultimately, either $\tilde{H}$ becomes positive definite or the current working set defines a vertex point.

Most valuable in a practical sense, [3] introduced a set of efficient schemes that economically obtain the solutions of (9), (10) and (11) by updating the solutions of these linear systems from previous iterations. These update schemes are implemented in Mathwrist NPL.

4.2 Positive Semi-definite and Non-stationary

When ๐ฑk is not a stationary point, a null space direction ๐ฉ that satisfies the following linear system is a zero curvature direction. Hence, the quadratic term in (6) vanishes.

(๐‡๐€๐’ฒT๐€๐’ฒ)โข(๐ฉฮฝ)=(๐ŸŽ๐ŸŽ) (9)

Then we just need to flip the sign of $p$ if necessary so that the first term in equation (6) is negative.

4.3 Indefinite and Non-stationary

When ๐‡~ is indefinite at a non-stationary point, it is possible to find a negative curvature direction such that the quadratic term in (6) is negative. Note that we can again flip the sign of ๐ฉZ if necessary to make the first term in (6) negative as well. Such a direction ๐ฉ can be solved by the linear system

(๐‡๐€๐’ฒT๐€๐’ฒ)โข(๐ฉฮฝ)=(๐ ๐ŸŽ) (10)

4.4 Indefinite and Stationary

If ๐‡~ is indefinite but ๐ฑk is a stationary point wrt the current working set, gradient ๐  necessarily is a linear combination of working constraints ๐€๐’ฒ. Multiplying ๐™T to the right hand side of the first equation in (10) will be 0. So we cannot use (10) to find a negative curvature direction.

It can be shown that, when an inertia control strategy [3] is followed, the only way to arrive at an indefinite $\tilde{H}$ at a stationary point is by deleting a constraint $A_i$ with $\lambda_i = 0$ at the most recent local optimal point. This observation allows us to find a negative curvature direction by solving,

(๐‡๐€๐’ฒT๐€iT๐€๐’ฒ๐€i)โข(๐ฉ๐ฐwi)=(๐ŸŽ๐ŸŽ1) (11)

4.5 Initial Working Set

Our active set framework always uses the "crash start" procedure [2] to find an initial feasible point and an initial working set $\mathcal{W}$. A full QR factorization is then performed on $F_{\mathcal{W}}^T$ to obtain the range space $Y$ and the null space $Z$. Please refer to our LP white paper for details of this procedure.

For QP problems, the inertia control strategy requires the initial reduced Hessian $Z^T H Z$ to be positive definite. If that is the case, which always holds for convex QP, we carry out an initial Cholesky factorization $\tilde{H} = L L^T$. The active set method is then ready to start its first iteration.

Given a general Hessian $H$, it is possible that this initial $\tilde{H}$ is indefinite. We then perform a partial Cholesky factorization [5],

P^T \tilde{H} P = \begin{pmatrix} L_{11} & 0 \\ L_{21} & I \end{pmatrix} \begin{pmatrix} D & 0 \\ 0 & K \end{pmatrix} \begin{pmatrix} L_{11}^T & L_{21}^T \\ 0 & I \end{pmatrix}

where the permutation $P$ effectively arranges the null space as $Z = (Z_+ \; Z_-)$ such that $Z_+^T H Z_+ = L_{11} D L_{11}^T$ is positive definite. All columns of $Z_-$ are treated as artificial constraints and added to the initial working set. The active set method then starts its first iteration. At a local optimum, these artificial constraints are deleted before any real constraint.
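A minimal sketch of a partial Cholesky with diagonal pivoting is shown below (dense Eigen types and a hypothetical function name; NPL follows [5] with additional safeguards). It accepts pivots while they remain safely positive, returns the number $r$ of columns factored, and leaves the remainder block $K$ in place.

#include <Eigen/Dense>
#include <utility>
using namespace Eigen;

// Partial Cholesky with diagonal pivoting, in the spirit of [5].  On return,
// the strict lower triangle of the first r columns of M holds the unit
// triangular factor, the first r diagonal entries hold D, and the trailing
// (n-r) x (n-r) block holds the remainder K.  perm records the permutation P.
int partial_cholesky(MatrixXd& M, VectorXi& perm, double tol)
{
    const int n = M.rows();
    perm = VectorXi::LinSpaced(n, 0, n - 1);
    for (int k = 0; k < n; ++k) {
        VectorXd diag_tail = M.diagonal().tail(n - k);   // remaining pivot candidates
        int piv;
        diag_tail.maxCoeff(&piv);
        piv += k;
        if (M(piv, piv) <= tol)
            return k;                        // remaining block is not safely positive
        M.row(k).swap(M.row(piv));           // symmetric pivot swap
        M.col(k).swap(M.col(piv));
        std::swap(perm(k), perm(piv));
        const double d = M(k, k);
        for (int i = k + 1; i < n; ++i)
            M(i, k) /= d;                    // column k of the unit triangular factor
        for (int i = k + 1; i < n; ++i)      // symmetric Schur complement update
            for (int j = k + 1; j < n; ++j)
                M(i, j) -= M(i, k) * d * M(j, k);
    }
    return n;
}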

5 Termination Control

The active set method has a control parameter $N$ for the maximum number of iterations allowed; the algorithm always terminates after $N$ iterations. Further, there is an exception-throw control. When enabled, the algorithm throws an exception after $N$ iterations; when disabled, it simply reports the "half-way" solution found after $N$ iterations.

At a local optimum, we test the Lagrange multipliers against a "convergence tolerance" parameter $\epsilon_c > 0$. If all upper-bounded constraints have Lagrange multipliers $\lambda \le \epsilon_c$ and all lower-bounded constraints have Lagrange multipliers $\lambda \ge -\epsilon_c$, the Lagrange multiplier condition is considered satisfied and the algorithm terminates.
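For illustration, the multiplier test against $\epsilon_c$ might look like the following sketch (hypothetical data layout and names, not the NPL interface):

#include <Eigen/Dense>
#include <vector>
using namespace Eigen;

// Returns true when all working inequality multipliers pass the sign test with
// tolerance eps_c.  at_lower[i] marks whether working constraint i is active at
// its lower bound (otherwise at its upper bound); equality constraints are skipped.
bool multipliers_converged(const VectorXd& lambda,
                           const std::vector<bool>& at_lower,
                           const std::vector<bool>& is_equality,
                           double eps_c)
{
    for (int i = 0; i < lambda.size(); ++i) {
        if (is_equality[i]) continue;                         // no sign restriction
        if (at_lower[i] && lambda(i) < -eps_c) return false;  // need lambda_i >= -eps_c
        if (!at_lower[i] && lambda(i) > eps_c) return false;  // need lambda_i <=  eps_c
    }
    return true;
}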

The inertia control strategy internally performs various numerical tests to determine whether the stationarity condition is satisfied. The numerical tolerances used by those tests are critical to the correctness of the algorithm and hence are not exposed as user controls. However, we do provide a user-interface-level "stationary tolerance" $\epsilon_s$ that can be used for termination control. If the search direction $p$ computed by (5), (9), (10) or (11) satisfies $\|p\|_\infty < \epsilon_s$, we consider a stationary point to have been reached with respect to the current working set.

References

  • [1] Jorge Nocedal and Stephen J. Wright: Numerical Optimization, Springer, 1999
  • [2] Philip E. Gill, Walter Murray and Margaret H. Wright: Practical Optimization, Academic Press, 1981
  • [3] Philip E. Gill, Walter Murray, Michael A. Saunders and Margaret H. Wright: Inertia-Controlling Methods For General Quadratic Programming, SIAM Review, Volume 33, Number 1, 1991, pages 1-36.
  • [4] Philip E. Gill, Walter Murray, Michael A. Saunders and Margaret H. Wright: Procedures for Optimization Problems with a Mixture of Bounds and General Linear Constraints, ACM Transactions on Mathematical Software, Vol.10, No. 3, September 1984, Pages 282-298
  • [5] Anders Forsgren, Philip E. Gill and Walter Murray: Computing Modified Newton Directions Using a Partial Cholesky Factorization, SIAM J. Sci. Comput., Vol. 16, No. 1, pp. 139-150