Unconstrained Optimization
Mathwrist C++ API Series

Copyright ©Mathwrist LLC 2023
(October 5, 2023)
Abstract

This document details the C++ programming interface of multivariate unconstrained optimization features in Mathwrist’s Numerical Programming Library (NPL). Please refer to our unconstrained optimization white paper for a technical overview.

1 Introduction

Let ψ(𝐱): ℝⁿ → ℝ be a twice continuously differentiable function with gradient 𝐠(𝐱) and Hessian 𝐇(𝐱). Unconstrained optimization solves the following minimization problem,

min_{𝐱 ∈ ℝⁿ} ψ(𝐱)

The optimality conditions at the solution 𝐱* are that 𝐠(𝐱*) = 0 and that 𝐇(𝐱*) is at least positive semi-definite.

Users need to define an objective function by deriving from the FunctionND base class and implementing the eval() virtual function, which computes function values and gradients. Depending on the actual optimization solver being used, the objective function may also need to supply an implementation of the Hessian matrix calculation. Please refer to our “1-d and n-d Functions” white paper and API documentation for the details of representing ψ(𝐱) as an n-dimensional function.
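As a minimal sketch of this pattern, the example below derives a quadratic objective from a stand-in FunctionND base class. The base class and the eval() signature here are assumptions made only so the example is self-contained; the real NPL types (e.g. Matrix arguments, optional Hessian output) may differ, so consult the “1-d and n-d Functions” API documentation for the actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for NPL's FunctionND base class; the actual eval() signature
// in NPL (Matrix types, Hessian output argument) may differ.
struct FunctionND
{
   virtual ~FunctionND() = default;
   virtual double eval(const std::vector<double>& x,
                       std::vector<double>* grad) const = 0;
};

// psi(x) = sum_i (x_i - 1)^2 with gradient g_i = 2 (x_i - 1).
struct Quadratic : FunctionND
{
   double eval(const std::vector<double>& x,
               std::vector<double>* grad) const override
   {
      double f = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i)
      {
         const double d = x[i] - 1.0;
         f += d * d;
         if (grad)
            (*grad)[i] = 2.0 * d;   // fill the gradient only when requested
      }
      return f;
   }
};
```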

In Mathwrist NPL, we provide both line search based and trust region based solvers, derived from the LineSearch and TrustRegion base classes respectively. Classes LineSearch and TrustRegion are in turn derived from the IterativeMethod class, which provides control of the max number of iterations and the convergence tolerance. Class IterativeMethod is first introduced and documented in the “Iterative Method” section of our “1-d and n-d Functions” API chapter.

Both line search and trust region algorithms iteratively generate a sequence of improved points 𝐱k, k = 0, 1, …, converging to the solution 𝐱*. The algorithm terminates at iteration k if k reaches the max number of iterations, or if the optimality conditions are satisfied. In particular, if ‖𝐠(𝐱k)‖ ≤ ϵ, where ϵ is the convergence tolerance, we consider the stationary condition satisfied.

2 Line Search Method

1 class LineSearch : public IterativeMethod
2 {
3 };

All line search based solvers are defined in the header file line_search.h. Base class LineSearch is a pure virtual class derived from IterativeMethod class that provides controls on the max number of iterations and convergence tolerance ϵ.

The LineSearch base class carries out the main work of iteratively making the descent step move 𝐱k+1 = 𝐱k + α𝐩. Concrete line search derived classes are responsible for computing a descent direction 𝐩 at each iteration point 𝐱k. The base class then calculates a step length α satisfying the Wolfe conditions and makes the step move.
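As a simplified sketch of the step-length computation, the backtracking helper below enforces only the sufficient-decrease (Armijo) half of the Wolfe conditions on the 1-d restriction φ(α) = ψ(𝐱k + α𝐩); NPL's internal line search also enforces the curvature condition and is not reproduced here.

```cpp
#include <cassert>
#include <functional>

// Backtracking search: shrink alpha until the Armijo condition
//   phi(alpha) <= phi(0) + c1 * alpha * phi'(0)
// holds, where phi(a) = psi(x + a p) and phi'(0) = g'p < 0.
// Illustrative only: the full Wolfe curvature test is omitted.
double backtrack(const std::function<double(double)>& phi,
                 double phi0, double slope0,
                 double alpha = 1.0, double c1 = 1e-4, double shrink = 0.5)
{
   while (phi(alpha) > phi0 + c1 * alpha * slope0)
      alpha *= shrink;   // step too long: reduce and retry
   return alpha;
}
```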

2.1 Objective Function

1 const FunctionND& objective() const { return m_obj; }

Description:

Return a constant reference to the objective function, which is an instance of FunctionND class. LineSearch base class holds a reference to the objective function as data member.

2.2 Solver Operator

1 double operator() (double max_step_move, Matrix& x);

Description:

Solve the unconstrained optimization problem using the line search method.

  • Parameter max_step_move specifies a control parameter Δ such that the step length α is upper bounded by ᾱ, where ᾱ‖𝐩‖₂ = Δ.

  • Parameter x holds the initial guess on call. On return, it stores the solution 𝐱* found by the solver.

  • The function returns the value of ψ(𝐱*).

2.3 Restart Control

1 void set_num_iter_restart(size_t n);
2 size_t get_num_iter_restart() const;

Description:

Line search solvers internally perform certain checks to protect the algorithm from deteriorating due to accumulated numerical noise. Once a solver is determined to be in a “dirty” state at point 𝐱k, a restart operation occurs, as if 𝐱k were supplied as the initial guess. Users can also force a restart to occur after every n iterations.

2.4 Step Convergence Control

1 void set_step_tolerance(double eps);
2 double step_tolerance() const;
  1. Set the convergence tolerance of the step length. The step length interval (αl, αu) is considered converged when αl is numerically equal to αu at this tolerance level.

  2. Return the step length convergence tolerance being used.

2.5 Steepest Descent Method

1 class LineSearchSteepestDescent : public LineSearch
2 {
3 public:
4   explicit LineSearchSteepestDescent(const FunctionND& obj);
5 };

LineSearchSteepestDescent is a concrete line search subtype that computes the search direction using the steepest descent method. The constructor takes a constant reference to a FunctionND object as the objective function.

REQUIRED: This objective function must remain alive during the lifecycle of the line search solver.
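To illustrate what the solver does, the self-contained sketch below runs steepest descent on ψ(𝐱) = (x₀ − 1)² + (x₁ + 2)², using the direction 𝐩 = −𝐠(𝐱) at each iterate. It uses a fixed step length purely for brevity; the actual solver computes α from the Wolfe conditions each iteration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Steepest descent on psi(x) = (x0 - 1)^2 + (x1 + 2)^2: the search
// direction is p = -g(x). A fixed step replaces the Wolfe line search.
std::vector<double> steepest_descent(std::vector<double> x,
                                     int max_iter, double eps)
{
   for (int k = 0; k < max_iter; ++k)
   {
      std::vector<double> g{2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)};
      if (std::sqrt(g[0] * g[0] + g[1] * g[1]) <= eps)
         break;                  // stationary condition met
      const double alpha = 0.4;  // fixed step; real solver solves for alpha
      x[0] -= alpha * g[0];
      x[1] -= alpha * g[1];
   }
   return x;
}
```

On this objective the fixed step contracts the error by a factor of |1 − 2α| = 0.2 per iteration, so the iterates converge to 𝐱* = (1, −2) quickly.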

2.6 Newton Method

1 class LineSearchNewton : public LineSearch
2 {
3 public:
4   explicit LineSearchNewton(
5     const FunctionND& obj,
6     bool discrete = false);
7 };

LineSearchNewton is a concrete line search subtype that computes the search direction using a modified Newton method. The constructor takes a constant reference to a FunctionND object as the objective function.

REQUIRED: This objective function must remain alive during the lifecycle of the line search solver.

If the second constructor parameter discrete is passed true, the solver applies a forward finite difference method to gradients to approximate the Hessian matrix. In this case, the objective function can ignore the Hessian calculation when implementing the FunctionND::eval() virtual function. If discrete is passed false, the objective function is expected to compute the Hessian.
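The forward finite difference idea can be sketched as follows: each Hessian column is approximated from two gradient evaluations. The helper below is illustrative only and not part of the NPL API; the step size h is an assumed constant.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Approximate the Hessian column by column from gradient calls:
//   H(:, j) ~ (g(x + h e_j) - g(x)) / h   (forward differences).
std::vector<Vec> fd_hessian(const std::function<Vec(const Vec&)>& grad,
                            Vec x, double h = 1e-6)
{
   const std::size_t n = x.size();
   const Vec g0 = grad(x);
   std::vector<Vec> H(n, Vec(n));
   for (std::size_t j = 0; j < n; ++j)
   {
      x[j] += h;              // perturb coordinate j
      const Vec gj = grad(x);
      x[j] -= h;              // restore
      for (std::size_t i = 0; i < n; ++i)
         H[i][j] = (gj[i] - g0[i]) / h;
   }
   return H;
}
```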

2.7 Quasi-Newton Method

1 class LineSearchQuasiNewton : public LineSearch
2 {
3 public:
4   explicit LineSearchQuasiNewton(const FunctionND& obj);
5 };

LineSearchQuasiNewton is a concrete line search subtype that computes the search direction using the quasi-Newton (BFGS) method. The objective function can ignore the Hessian calculation when implementing the FunctionND::eval() virtual function.

The constructor takes a constant reference to a FunctionND object as the objective function.

REQUIRED: This objective function must remain alive during the lifecycle of the line search solver.
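For reference, a single textbook BFGS update of the inverse Hessian approximation 𝐁 is sketched below, using s = 𝐱k+1 − 𝐱k and y = 𝐠k+1 − 𝐠k. This is an illustrative implementation of the standard formula, not NPL's internal code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// One BFGS update of the inverse Hessian approximation:
//   B <- (I - rho s y') B (I - rho y s') + rho s s',  rho = 1 / (y's).
Mat bfgs_update(const Mat& B, const Vec& s, const Vec& y)
{
   const std::size_t n = s.size();
   double ys = 0.0;
   for (std::size_t i = 0; i < n; ++i) ys += y[i] * s[i];
   const double rho = 1.0 / ys;   // assumes the curvature condition y's > 0

   // V = I - rho * y s'
   Mat V(n, Vec(n, 0.0));
   for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j)
         V[i][j] = (i == j ? 1.0 : 0.0) - rho * y[i] * s[j];

   // B_new = V' B V + rho s s'
   Mat BV(n, Vec(n, 0.0)), Bn(n, Vec(n, 0.0));
   for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j)
         for (std::size_t k = 0; k < n; ++k)
            BV[i][j] += B[i][k] * V[k][j];
   for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j)
      {
         for (std::size_t k = 0; k < n; ++k)
            Bn[i][j] += V[k][i] * BV[k][j];
         Bn[i][j] += rho * s[i] * s[j];
      }
   return Bn;
}
```

The update preserves the secant condition B_new y = s, which the test below verifies.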

2.8 Conjugate Gradient Method

1 class LineSearchConjugateGradient : public LineSearch
2 {
3 public:
4   explicit LineSearchConjugateGradient(const FunctionND& obj);
5 };

LineSearchConjugateGradient is a concrete line search subtype that computes the search directions 𝐩0, …, 𝐩k as a sequence of conjugate gradient directions using the PR+ (Polak–Ribière plus) method. The constructor takes a constant reference to a FunctionND object as the objective function.

REQUIRED: This objective function must remain alive during the lifecycle of the line search solver.
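The PR+ direction update can be sketched with the standard formula β = max(0, 𝐠kᵀ(𝐠k − 𝐠k−1) / ‖𝐠k−1‖²) and 𝐩k = −𝐠k + β𝐩k−1. The helper below is illustrative and not part of the NPL API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

static double dot(const Vec& a, const Vec& b)
{
   double s = 0.0;
   for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
   return s;
}

// PR+ coefficient: beta = max(0, g_k'(g_k - g_prev) / (g_prev'g_prev)),
// then the new direction is p_k = -g_k + beta * p_prev.
Vec pr_plus_direction(const Vec& g, const Vec& g_prev, const Vec& p_prev)
{
   const double beta =
      std::max(0.0, (dot(g, g) - dot(g, g_prev)) / dot(g_prev, g_prev));
   Vec p(g.size());
   for (std::size_t i = 0; i < g.size(); ++i)
      p[i] = -g[i] + beta * p_prev[i];
   return p;
}
```

Clamping β at zero is what distinguishes PR+ from plain Polak–Ribière; it effectively restarts the sequence along the steepest descent direction when β would be negative.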

3 Trust Region Method

1 class TrustRegion : public IterativeMethod
2 {
3 };

All trust region based solvers are defined in the header file trust_region.h. Class TrustRegion is a pure virtual class derived from the IterativeMethod class that provides controls on the max number of iterations and the convergence tolerance ϵ. Concrete trust region methods are further derived from the TrustRegion base class. These derived subclasses are responsible for computing a descent step move 𝐩, ‖𝐩‖ ≤ Δ, where Δ is the trust region radius. The TrustRegion base class iteratively makes step moves and adjusts Δ if necessary.
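The radius adjustment can be sketched with the textbook rule based on ρ, the ratio of actual to predicted reduction: shrink Δ when the model fits poorly, expand it when the model fits well and the step hits the boundary. The thresholds and factors below are the common textbook choices; NPL's internal constants may differ.

```cpp
#include <algorithm>
#include <cassert>

// Classic trust region radius update from the reduction ratio rho,
// capped at max_radius. Illustrative constants (0.25 / 0.75 / 2x).
double update_radius(double rho, double delta, double step_norm,
                     double max_radius)
{
   if (rho < 0.25)
      return 0.25 * delta;                      // poor model fit: shrink
   if (rho > 0.75 && step_norm >= 0.99 * delta)
      return std::min(2.0 * delta, max_radius); // good fit at boundary: expand
   return delta;                                // otherwise keep Delta
}
```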

3.1 Objective Function

1 const FunctionND& objective() const { return m_obj; }

Description:

Return a constant reference to the objective function, which is an instance of FunctionND class. TrustRegion base class holds a reference to the objective function as data member.

3.2 Solver Operator

1 double operator() (double radius, Matrix& x);

Description:

Solve the unconstrained optimization problem using the trust region method.

  • Parameter radius is the initial trust region radius suggested by the caller.

  • Parameter x holds the initial guess on call. On return, it stores the solution 𝐱* found by the solver.

  • The function returns the value of ψ(𝐱*).

3.3 Max Radius

1 double get_max_radius() const;
2 void set_max_radius(double max_radius);

Description:

At any iteration, the actual radius being used is upper bounded by the max radius. These two functions get and set this max trust region radius.

3.4 Ellipse Scaling

1 void enable_ellipse_scaling(bool flag = true);

Description:

This is an experimental feature as of the current release. It enables/disables search in an ellipse-scaled trust region. Once enabled, the search step 𝐩 is bounded by ‖𝐃𝐩‖ ≤ Δ for some scaling matrix 𝐃 such that 𝐃⁻¹𝐇(𝐱)𝐃⁻¹ has a better condition number than the original Hessian 𝐇(𝐱). By default, ellipse scaling is disabled.

3.5 Trust Region with Hessian

1 class TrustRegionWithHessian : public TrustRegion
2 {
3 };

Class TrustRegionWithHessian is an intermediate class derived from the TrustRegion base class. It serves only internal implementation purposes and does not have a public interface.

3.6 Exact Direction

1 class TrustRegionExact : public TrustRegionWithHessian
2 {
3 };

Class TrustRegionExact is a concrete trust region subtype that computes a “nearly” exact search step 𝐩 for medium-sized problems, where 𝐩 is computed by solving

(𝐇(𝐱k) + λ𝐈)𝐩* = −𝐠(𝐱k)  (1)

for some λ ≥ 0.
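For a fixed λ, equation (1) is just a symmetric linear solve. The sketch below solves it for a 2×2 Hessian by Cramer's rule; it is an illustration of the equation, not NPL's factorization-based implementation.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Solve (H + lambda I) p = -g for a symmetric 2x2 Hessian by Cramer's
// rule, illustrating equation (1) for a fixed, given lambda >= 0.
std::array<double, 2> exact_step(const std::array<std::array<double, 2>, 2>& H,
                                 const std::array<double, 2>& g, double lambda)
{
   const double a = H[0][0] + lambda, b = H[0][1];
   const double c = H[1][0],          d = H[1][1] + lambda;
   const double det = a * d - b * c;  // assumed nonzero (matrix nonsingular)
   return { (-g[0] * d + g[1] * b) / det,
            ( g[0] * c - g[1] * a) / det };
}
```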

3.6.1 Constructor

1 TrustRegionExact(const FunctionND& obj, double max_radius);

Description:

The constructor takes a constant reference to a FunctionND object as the objective function. REQUIRED: This objective function must remain alive during the lifecycle of the trust region solver. The second parameter max_radius specifies the max trust region radius.

3.6.2 Iterations to find λ

1 void set_max_lambda_iterations(size_t n);
2 size_t max_lambda_iterations() const;

Description:

Finding the λ in equation (1) usually involves root finding. These two functions set and get the max number of iterations used in the root finding process. By default, the root finder throws an exception when the max number of iterations is reached. This throw behavior can be disabled by calling the enable_exception_on_max_iter() function of the IterativeMethod base class. If the exception throw is disabled, the algorithm uses the “half-way” solution of λ computed from root finding.
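The root-finding step can be sketched with a simple bisection for the λ at which ‖𝐩(λ)‖ = Δ, specialized to a diagonal Hessian so the step has a closed form. This is illustrative only; NPL's actual root finder and its iteration cap (max_lambda_iterations) are internal.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bisection for the lambda in equation (1) such that ||p(lambda)|| = Delta,
// for a diagonal Hessian H = diag(h): the step is p_i = -g_i / (h_i + lambda),
// so ||p(lambda)|| decreases monotonically as lambda grows.
double find_lambda(const std::vector<double>& h, const std::vector<double>& g,
                   double Delta, double lambda_hi)
{
   auto norm_p = [&](double lam)
   {
      double s = 0.0;
      for (std::size_t i = 0; i < h.size(); ++i)
      {
         const double pi = -g[i] / (h[i] + lam);
         s += pi * pi;
      }
      return std::sqrt(s);
   };
   double lo = 0.0, hi = lambda_hi;
   for (int it = 0; it < 100; ++it)   // fixed cap for the sketch
   {
      const double mid = 0.5 * (lo + hi);
      if (norm_p(mid) > Delta)
         lo = mid;                    // step too long: need larger lambda
      else
         hi = mid;
   }
   return 0.5 * (lo + hi);
}
```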

3.7 Conjugate Gradient-Steihaug Direction

1 class TrustRegionSteihaug : public TrustRegionWithHessian
2 {
3 };

Class TrustRegionSteihaug is a concrete trust region subtype that computes a descent search step 𝐩, typically for large problems, where 𝐩 is computed by the conjugate gradient (CG-Steihaug) method.

3.7.1 Constructor

1 TrustRegionSteihaug(const FunctionND& obj, double max_radius);

Description:

The constructor takes a constant reference to a FunctionND object as the objective function. REQUIRED: This objective function must remain alive during the lifecycle of the trust region solver. The second parameter max_radius specifies the max trust region radius.

3.7.2 Early Termination Control

1 void set_cg_tolerance(double tol);
2 double cg_tolerance() const;
3 void set_cg_exactness(double level);
4 double cg_exactness() const;

Description:

The search direction is a linear combination of a sequence of conjugate gradient intermediate step directions. At each conjugate gradient intermediate step, we compute the residual vector 𝐫. If ‖𝐫‖ < δ or ‖𝐫‖ ≤ ‖𝐠(𝐱k)‖^(1−ξ), we stop the conjugate gradient sequence. Here δ > 0 is the residual tolerance control parameter and ξ ∈ (0, 1) is the “exactness” control parameter.

  1. Set the tolerance control parameter δ.

  2. Get the tolerance control parameter δ.

  3. Set the “exactness” control parameter ξ.

  4. Get the “exactness” control parameter ξ.