1-d and n-d Functions
Mathwrist White Paper Series

Copyright ©Mathwrist LLC 2023
(January 20, 2023)
Abstract

This document presents a technical overview of 1-dimensional and n-dimensional functions in Mathwrist's C++ Numerical Programming Library (NPL). We first briefly introduce the 1-d and n-d function interfaces and then discuss the 1-d interpolation, integration, root finding and minimization features in detail.

1 Hierarchy of 1-d Math Functions

We provide the Function1D class as the common interface for all 1-dimensional functions used in the library. For example, our 1-d root finding solver takes Function1D objects as the objective function.

Listing 1: 1-d Function Interface
class Function1D
{
public:
   virtual Function1D* clone() const = 0;

   virtual double eval(
      double x,
      double* _derv1 = 0,
      double* _derv2 = 0) const = 0;

   virtual double integral(
      double lo,
      double hi) const;
   ...
};

A client application needs to derive from the 1-d function interface and implement the clone(), eval() and integral() virtual functions as necessary. The eval() function returns the function value f(x) at a given point x. It also takes optional arguments _derv1 and _derv2. When non-zero pointers are passed to these parameters, the derived class is expected to compute the first and second derivatives of f(x). On return from eval(), f′(x) and f′′(x) should have been copied to the addresses provided by those non-zero pointers.
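
As a minimal sketch of this contract, the derived class below (the class F2 and its body are illustrative, not part of NPL) evaluates f(x) = x² and fills in the optional derivatives only when the caller requests them:

// A minimal sketch of a Function1D implementation for f(x) = x * x.
// The class name F2 and its body are illustrative, not part of NPL.
class F2 : public Function1D
{
public:
   Function1D* clone() const { return new F2(*this); }

   double eval(double x, double* _derv1 = 0, double* _derv2 = 0) const
   {
      if (_derv1 != 0)
         *_derv1 = 2 * x;   // f'(x) = 2x
      if (_derv2 != 0)
         *_derv2 = 2;       // f''(x) = 2
      return x * x;
   }
};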

The integral() virtual function has a default implementation that throws an unsupported-operation exception. If a client application does need to compute integrals, it can override this default behavior with its own implementation. The Function1D base class provides a protected member function that numerically approximates integrals by Gaussian quadrature. For 1-d smooth functions, a client application may choose to override integral() with the quadrature implementation, as in the example below.

Listing 2: Derived Class of 1-d Function
class F : public Function1D
{
public:
   double integral(double lo, double hi) const
   {
      // gauss_quad() is a protected member function
      // provided by Function1D base class.
      return gauss_quad(lo, hi);
   }
   ...
};

1.1 1-d Piecewise Functions

Mathematical modelling often uses piecewise functions for simplicity. FunctPiecewise is an intermediate class derived from the Function1D base class to handle piecewise lookup. Concrete piecewise functions are further derived from this intermediate class, with both the eval() and integral() functions implemented.

Listing 3: 1-d Piecewise Functions
class FunctPiecewise : public Function1D
{
   ...
};
class FunctPiecewiseConst : public FunctPiecewise
{
   ...
};
class FunctPiecewiseLinear : public FunctPiecewise
{
   ...
};
class FunctPiecewiseLinearJump : public FunctPiecewise
{
   ...
};
class BSplineCurve : public FunctPiecewise
{
   ...
};

Given $n$ knot points $(x_0, \ldots, x_{n-1})$ in ascending order, these concrete piecewise functions are strictly defined on the range $[x_0, x_{n-1}]$. When a numerical equality comparison is involved, we use $\frac{|a-b|}{1+|b|} < \epsilon$ to test whether $a$ and $b$ are equal within numerical tolerance $\epsilon$. When an $a \le b$ comparison is involved, it should be interpreted as $a$ being numerically not greater than $b$.

1.1.1 Piecewise Constant Lookup

Given $n$ knot points and $n-1$ discrete function values $(y_0, \ldots, y_{n-2})$, the FunctPiecewiseConst function computes the function value and derivatives as

$$f(x) = \begin{cases} y_i, & \text{if } x_i \le x < x_{i+1},\ i = 0, \ldots, n-2 \\ y_{n-2}, & \text{if } x_{n-2} \le x \le x_{n-1} \end{cases}$$
$$f'(x) = 0$$
$$f''(x) = 0$$
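
The lookup itself reduces to locating the interval that contains x. The standalone sketch below (illustrative, not NPL code) shows one common way to do this with a binary search over the knot points:

#include <algorithm>
#include <vector>

// Standalone illustration of piecewise constant lookup; not NPL code.
// knots holds n ascending entries; values holds n-1 entries.
// Assumes x is within the domain [knots.front(), knots.back()].
double piecewiseConstLookup(const std::vector<double>& knots,
                            const std::vector<double>& values, double x)
{
   // Index of the first knot strictly greater than x.
   std::size_t j = std::upper_bound(knots.begin(), knots.end(), x)
                   - knots.begin();

   // Clamp to the last interval so that x == x_{n-1} returns y_{n-2}.
   std::size_t i = (j == 0) ? 0 : j - 1;
   if (i > values.size() - 1)
      i = values.size() - 1;
   return values[i];
}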

1.1.2 Piecewise Linear Lookup

Given $n$ knot points and $n$ discrete function values $(y_0, \ldots, y_{n-1})$, the FunctPiecewiseLinear function computes the function value and derivatives at a point $x$, $x_i \le x < x_{i+1}$, $i \in [0, \ldots, n-2]$, as

$$f(x) = \frac{x_{i+1}-x}{x_{i+1}-x_i}\, y_i + \frac{x-x_i}{x_{i+1}-x_i}\, y_{i+1}$$
$$f'(x) = \frac{y_{i+1}-y_i}{x_{i+1}-x_i}$$
$$f''(x) = 0$$
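
Once the containing interval $[x_i, x_{i+1}]$ is located, e.g. by the same binary search as in the previous sketch, the value and first derivative follow directly from the two formulas above. A standalone sketch, not NPL code:

#include <vector>

// Standalone piecewise linear lookup on interval [knots[i], knots[i+1]].
// Returns f(x); if _derv1 is non-zero, also writes f'(x), mirroring
// the eval() convention of Function1D.
double piecewiseLinearLookup(const std::vector<double>& knots,
                             const std::vector<double>& values,
                             std::size_t i, double x, double* _derv1 = 0)
{
   double h = knots[i + 1] - knots[i];
   double slope = (values[i + 1] - values[i]) / h;
   if (_derv1 != 0)
      *_derv1 = slope;   // f'(x) is constant on the interval
   return values[i] + slope * (x - knots[i]);
}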

1.1.3 Piecewise Linear (Jump) Lookup

Given $n$ knot points and $2n-2$ discrete function values $(y_0, \ldots, y_{2n-3})$, the FunctPiecewiseLinearJump function computes the function value and derivatives at a point $x$, $x_i \le x < x_{i+1}$, $i \in [0, \ldots, n-2]$, as

$$f(x) = \frac{x_{i+1}-x}{x_{i+1}-x_i}\, y_{2i} + \frac{x-x_i}{x_{i+1}-x_i}\, y_{2i+1}$$
$$f'(x) = \frac{y_{2i+1}-y_{2i}}{x_{i+1}-x_i}$$
$$f''(x) = 0$$

1.1.4 B-spline Curve

A B-spline curve of order $k$ is defined as $f(x) = \sum_{i=0}^{N-1} \theta_i \phi_{i,k}(x)$, where $\phi_{i,k}(x)$ is the $i$-th B-spline basis function of order $k$ and $\theta_i$ is the coefficient of the $i$-th basis.

Let $(x_0, x_1, \ldots, x_{m-1})$ be the set of knot points over the domain $[x_0, x_{m-1}]$. To facilitate the definition of a B-spline basis function, it is necessary to introduce an augmented knot point vector

$$\underbrace{x_{-k+1}, \ldots, x_{-1}}_{\text{augmented}},\; x_0, \ldots, x_{m-1},\; \underbrace{x_m, \ldots, x_{m+k-2}}_{\text{augmented}}$$

where we conveniently choose $(x_{-k+1}, \ldots, x_{-1}) = x_0$ and $(x_m, \ldots, x_{m+k-2}) = x_{m-1}$. Note that we replicate the boundary points $x_0$ and $x_{m-1}$ $k-1$ times on each side, so the total number of knot points is $M = m + 2(k-1) = m + 2k - 2$.

A $k$-order B-spline basis function is a piecewise polynomial of degree $k-1$ defined by the following recurrence relation:

$$\phi_{i,k}(x) = \frac{(x-x_i)\,\phi_{i,k-1}(x)}{x_{i+k-1}-x_i} + \frac{(x_{i+k}-x)\,\phi_{i+1,k-1}(x)}{x_{i+k}-x_{i+1}}, \quad k > 1 \qquad (2)$$

$$\phi_{i,1}(x) = \begin{cases} 1, & \text{if } x_i \le x < x_{i+1} \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

From the base case definition (5), it is clear that a B-spline of order 1 is a piecewise constant function and a B-spline of order 2 is a piecewise linear function. A B-spline of higher order $k$ is a $(k-2)$-times continuously differentiable function.
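
For reference, the recurrence (2) and base case (5) translate directly into the naive recursive evaluation below (a standalone sketch, not NPL's optimized scheme); indices refer to positions in the augmented knot vector, and 0/0 terms are treated as zero by convention:

#include <vector>

// Naive Cox-de Boor recursion for the i-th B-spline basis of order k,
// following equations (2) and (5). `t` is the augmented knot vector.
double bsplineBasis(const std::vector<double>& t,
                    std::size_t i, int k, double x)
{
   if (k == 1)
      return (t[i] <= x && x < t[i + 1]) ? 1.0 : 0.0;

   double left = 0.0, right = 0.0;
   double d1 = t[i + k - 1] - t[i];
   if (d1 > 0.0)   // skip 0/0 terms at replicated knots
      left = (x - t[i]) / d1 * bsplineBasis(t, i, k - 1, x);

   double d2 = t[i + k] - t[i + 1];
   if (d2 > 0.0)
      right = (t[i + k] - x) / d2 * bsplineBasis(t, i + 1, k - 1, x);

   return left + right;
}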

B-spline basis functions are fully defined by the knot points and the order of the spline. There are in total $N = m + k - 2$ basis functions. A B-spline curve is then parameterized by the basis coefficients $\Theta^T = (\theta_0, \ldots, \theta_{N-1})$. Now let $\Phi(x)^T = (\phi_{0,k}(x), \ldots, \phi_{N-1,k}(x))$ be the vector of B-spline basis functionals. We can then write the B-spline curve as $f(x) = \Theta^T \Phi(x)$.

In order to evaluate a B-spline function efficiently, it is critical to avoid the redundant computation introduced by the recurrence of the $(k-1)$-order terms in equation (2). Given an arbitrary $x \in [x_0, x_{m-1}]$, we developed a very efficient evaluation scheme to compute the function values, derivatives and integrals of the B-spline curve function. Some good references on B-splines include [3] (chapter 5 appendix), [4] and [5].

1.2 Chebyshev Curve

A Chebyshev curve is a 1-d function strictly defined on a fixed domain $x \in [a, b]$. It is a direct subtype of the Function1D class.

Listing 4: Chebyshev Curve Function
class ChebyshevCurve : public Function1D
{
   ...
};

A Chebyshev curve of order $k$ is defined as $f(x) = \sum_{i=0}^{k} \theta_i T_i(t(x))$, where $T_i(\cdot)$ is the $i$-th order Chebyshev basis polynomial of the first kind and $\theta_i$ is the coefficient of $T_i(\cdot)$. The Chebyshev basis polynomial $T_i(t)$ is defined by the following recurrence relation over the canonical domain $t \in [-1, 1]$:

$$T_{i+1}(t) = 2tT_i(t) - T_{i-1}(t), \quad i > 0$$
$$T_1(t) = t$$
$$T_0(t) = 1$$

Mathwrist NPL maps the physical variable $x \in [a, b]$ to the canonical variable $t \in [-1, 1]$ of the Chebyshev basis functions by $t(x) = \frac{2(x-a)}{b-a} - 1$. Once the range $[a, b]$ and the order $k$ of the Chebyshev curve are determined, the function is fully parametrized by the coefficients $\Theta^T = (\theta_0, \ldots, \theta_k)$. Again, we can write $f(x) = \Theta^T \mathbf{T}(t(x))$, where $\mathbf{T}(t(x))$ is the vector of Chebyshev basis functionals. Our main references include [1], [6] and [7].
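
This parametrization can be evaluated by running the three-term recurrence term by term, as in the standalone sketch below (illustrative only; NPL's internal scheme may differ):

#include <vector>

// Evaluate f(x) = sum_{i=0}^{k} theta[i] * T_i(t(x)) on [a, b] using
// the recurrence T_{i+1}(t) = 2 t T_i(t) - T_{i-1}(t).
// Assumes theta is non-empty.
double chebyshevEval(const std::vector<double>& theta,
                     double a, double b, double x)
{
   double t = 2.0 * (x - a) / (b - a) - 1.0;  // map [a,b] -> [-1,1]

   double Tprev = 1.0;                 // T_0(t)
   double Tcur = t;                    // T_1(t)
   double sum = theta[0] * Tprev;
   if (theta.size() > 1)
      sum += theta[1] * Tcur;

   for (std::size_t i = 2; i < theta.size(); ++i)
   {
      double Tnext = 2.0 * t * Tcur - Tprev;
      sum += theta[i] * Tnext;
      Tprev = Tcur;
      Tcur = Tnext;
   }
   return sum;
}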

2 Hierarchy of n-d Math Functions

We provide the FunctionND class as the common interface for all n-dimensional functions used in the library. For example, our unconstrained and constrained optimization solvers take FunctionND objects as the objective function. A client application needs to derive from the n-d function interface and implement the clone(), eval() and integral() functions as necessary.

Listing 5: N-d Function Interface
class FunctionND
{
public:
   virtual FunctionND* clone() const = 0;

   virtual double eval(
      const Matrix& x,
      Matrix* _g = 0,
      Matrix* _H = 0) const = 0;

   virtual double integral(
      const Matrix& X_LO,
      const Matrix& X_HI) const;
   ...
};

The eval() function is a pure virtual function. A concrete derived class needs to implement it to return the function value f(x) at a given point x. This function also takes optional arguments _g and _H. When non-zero pointers are passed to these parameters, the derived class is expected to compute the gradient vector and the Hessian matrix. On return from eval(), the gradient and Hessian should have been copied to the addresses provided by those non-zero pointers.
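
A minimal sketch of this contract for f(x) = xᵀx, whose gradient is 2x and Hessian is 2I, is given below. We assume here that Matrix exposes a row-count query rows() and an element accessor operator()(i, j); the actual NPL Matrix API may differ.

// A minimal sketch of a FunctionND implementation for f(x) = x^T x.
// Assumes Matrix has rows() and operator()(i, j); the actual NPL
// Matrix interface may differ.
class SumOfSquares : public FunctionND
{
public:
   FunctionND* clone() const { return new SumOfSquares(*this); }

   double eval(const Matrix& x, Matrix* _g = 0, Matrix* _H = 0) const
   {
      const int n = x.rows();
      double f = 0;
      for (int i = 0; i < n; ++i)
      {
         f += x(i, 0) * x(i, 0);
         if (_g != 0)
            (*_g)(i, 0) = 2 * x(i, 0);         // gradient: 2x
         if (_H != 0)
            for (int j = 0; j < n; ++j)
               (*_H)(i, j) = (i == j) ? 2 : 0; // Hessian: 2I
      }
      return f;
   }
};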

Like the 1-d case, the integral() virtual function provides a default implementation that throws an unsupported-operation exception. This suits most client applications, which do not involve computing multi-dimensional integrals.

2.1 2-d Piecewise Functions

Class FunctPiecewise2D is an intermediate class derived from the general FunctionND base class. The 2-d piecewise functions work strictly within an $m \times n$ rectangular grid, defined by $m$ knot points $(x_0, \ldots, x_{m-1})$ along the x axis and $n$ knot points $(y_0, \ldots, y_{n-1})$ along the y axis. In other words, the working domain is $[x_0, x_{m-1}] \times [y_0, y_{n-1}]$.

Listing 6: 2-d Piecewise Functions
class FunctPiecewise2D : public FunctionND
{
   ...
};
class FunctPiecewise2DConst : public FunctPiecewise2D
{
   ...
};
class BSplineSurface : public FunctPiecewise2D
{
   ...
};
class BilinearSurface : public FunctPiecewise2D
{
   ...
};

2.1.1 2-d Piecewise Constant Lookup

For an $m \times n$ grid, we require $(m-1) \times (n-1)$ discrete function values $z_{i,j}$, $i = 0, \ldots, m-2$ and $j = 0, \ldots, n-2$, to build the function lookup table.

$$f(x,y) = \begin{cases}
z_{i,j}, & \text{if } x_i \le x < x_{i+1} \text{ and } y_j \le y < y_{j+1} \\
z_{i,n-2}, & \text{if } x_i \le x < x_{i+1} \text{ and } y = y_{n-1} \\
z_{m-2,j}, & \text{if } x = x_{m-1} \text{ and } y_j \le y < y_{j+1} \\
z_{m-2,n-2}, & \text{if } x = x_{m-1} \text{ and } y = y_{n-1}
\end{cases}$$

The FunctPiecewise2DConst function has zero gradient and Hessian everywhere in the grid.

2.1.2 B-spline Surface

After the knot points are determined, we then choose the order of the B-spline basis along each dimension. Let $\Phi^T(x) = (\phi_{0,k}(x), \ldots, \phi_{M-1,k}(x))$ be the vector of B-spline basis functions of order $k$ along x. Let $\Psi^T(y) = (\psi_{0,l}(y), \ldots, \psi_{N-1,l}(y))$ be the vector of B-spline basis functions of order $l$ along y. A B-spline surface function is defined as the tensor product of these two vectors of basis functionals,

$$f(x,y) = \Phi^T(x)\, \Theta\, \Psi(y)$$

where $\Theta$ is the basis coefficient matrix.
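
Given the two evaluated basis vectors, the tensor product reduces to a small double loop. The standalone sketch below (not NPL code) makes the bilinear form explicit:

#include <vector>

// Evaluate f(x,y) = phi^T * Theta * psi for already-evaluated basis
// vectors phi (length M) and psi (length N); theta is stored row-major
// as an M x N coefficient matrix. Standalone illustration, not NPL code.
double tensorProductEval(const std::vector<double>& phi,
                         const std::vector<double>& psi,
                         const std::vector<double>& theta)
{
   const std::size_t M = phi.size(), N = psi.size();
   double f = 0.0;
   for (std::size_t i = 0; i < M; ++i)
   {
      double row = 0.0;                 // (Theta * psi)_i
      for (std::size_t j = 0; j < N; ++j)
         row += theta[i * N + j] * psi[j];
      f += phi[i] * row;
   }
   return f;
}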

2.1.3 Bilinear Surface

A bilinear surface is a special case of the B-spline surface where the basis orders are both 2. For efficiency purposes, this special case is separately represented by the class BilinearSurface, derived from FunctPiecewise2D.

2.2 Chebyshev Surface

A Chebyshev surface is a 2-d function strictly defined on a fixed domain $(x, y) \in [a, b] \times [c, d]$. It is a direct subtype of the FunctionND class.

Listing 7: Chebyshev Surface Class
class ChebyshevSurface : public FunctionND
{
   ...
};

Similar to the B-spline surface, a Chebyshev surface is defined as the tensor product of two vectors of Chebyshev basis functionals (first kind),

$$f(x,y) = \mathbf{T}^T(s(x))\, \Theta\, \mathbf{T}(t(y))$$
$$\mathbf{T}(s) = (T_0(s), \ldots, T_k(s))$$
$$\mathbf{T}(t) = (T_0(t), \ldots, T_l(t))$$

where $\Theta$ is the basis coefficient matrix. Function $s(x)$ maps $x \in [a, b]$ to $s \in [-1, 1]$, and $t(y)$ maps $y \in [c, d]$ to $t \in [-1, 1]$.

3 1-d Interpolation

3.1 Introduction

For an unknown function f(x) that is continuous over a domain $[a, b]$, we observe a list of discrete data points $\{(x_k, f(x_k))\}$, $k = 0, \ldots, n$, $x_k \in [a, b]$. One way to approximate this unknown function f(x) is to construct an interpolation function that “exactly” matches all those data points $\{(x_k, f(x_k))\}$.

By the Weierstrass approximation theorem, we know that for any $\epsilon > 0$ there exists a polynomial $p(x)$ such that $|f(x) - p(x)| < \epsilon$, $\forall x \in [a, b]$. It is then a natural choice to use polynomials as the approximation functions due to their algebraic simplicity. It can be shown that there exists a unique polynomial $p(x)$ of degree at most $n$ such that $p(x_k) = f(x_k)$, $k = 0, \ldots, n$. Further, this polynomial $p(x)$ can be directly constructed by

$$p(x) = \sum_{k=0}^{n} f(x_k) L_{n,k}(x) \qquad (7)$$
$$L_{n,k}(x) = \prod_{i=0,\, i \ne k}^{n} \frac{x - x_i}{x_k - x_i} \qquad (8)$$

This representation form is called the n-th Lagrange interpolation polynomial. The interpolation polynomial p(x) has an error term

$$f(x) - p(x) = \frac{f^{(n+1)}(\xi(x))}{(n+1)!} \prod_{k=0}^{n} (x - x_k) \qquad (9)$$

where $f^{(n+1)}(\xi(x))$ is the $(n+1)$-th derivative at some $\xi(x) \in (x_0, x_n)$.
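
As a concrete illustration of (7) and (8), the short standalone routine below evaluates the Lagrange form directly at a point. It is fine for small $n$, though it is not how a production library would evaluate interpolants:

#include <vector>

// Direct evaluation of the Lagrange interpolation polynomial (7)-(8)
// at a point x, given nodes xs and values ys = f(xs).
double lagrangeEval(const std::vector<double>& xs,
                    const std::vector<double>& ys, double x)
{
   double p = 0.0;
   for (std::size_t k = 0; k < xs.size(); ++k)
   {
      double L = 1.0;  // L_{n,k}(x), the k-th Lagrange basis
      for (std::size_t i = 0; i < xs.size(); ++i)
         if (i != k)
            L *= (x - xs[i]) / (xs[k] - xs[i]);
      p += ys[k] * L;
   }
   return p;
}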

It is possible to construct an interpolation polynomial that matches both function values and derivatives. In particular, a Hermite polynomial $H(x)$ of degree at most $2n+1$ agrees with $f(x)$ and $f'(x)$ at $(x_0, \ldots, x_n)$. $H(x)$ can be constructed as

$$H(x) = \sum_{k=0}^{n} f(x_k) H_{n,k}(x) + \sum_{k=0}^{n} f'(x_k) \hat{H}_{n,k}(x)$$
$$H_{n,k}(x) = \left(1 - 2(x - x_k) L'_{n,k}(x_k)\right) L_{n,k}^2(x)$$
$$\hat{H}_{n,k}(x) = (x - x_k) L_{n,k}^2(x)$$

The Hermite polynomial has an error term

$$f(x) - H(x) = \frac{f^{(2n+2)}(\xi(x))}{(2n+2)!} \prod_{k=0}^{n} (x - x_k)^2$$

The formulation of Lagrange polynomials, equations (7) and (8), is very convenient and useful in theoretical analysis. However, matching an additional data point by Lagrange or Hermite interpolation involves increasing the degree of the interpolation polynomial. High degree polynomials with roughly equally spaced data points tend to have very large oscillating error when $x$ moves away from an interpolation data point. The error can grow extremely wild near the end points, a behavior known as the “Runge phenomenon”.

3.2 Cubic Spline Interpolation

Instead of using a single high degree polynomial, cubic spline interpolation uses piecewise cubic polynomials that jointly form a smooth curve to approximate the unknown function $f(x)$. Let $S_i(x)$ be the cubic polynomial over the interval $(x_{i-1}, x_i)$ for data points $(x_0, \ldots, x_n)$. The set of piecewise polynomials satisfies the following conditions:

$$f(x_i) = S_i(x_i), \quad i = 1, \ldots, n \qquad (10)$$
$$S_i(x_i) = S_{i+1}(x_i), \quad i = 1, \ldots, n-1 \qquad (11)$$
$$S'_i(x_i) = S'_{i+1}(x_i), \quad i = 1, \ldots, n-1 \qquad (12)$$
$$S''_i(x_i) = S''_{i+1}(x_i), \quad i = 1, \ldots, n-1 \qquad (13)$$
$$f(x_0) = S_1(x_0) \qquad (14)$$

There are a few ways to parameterize a cubic spline interpolation. Suppose we write $S_i(x)$ as

$$S_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3$$

We have $n$ piecewise polynomials $S_i(x)$, hence $4n$ coefficients $(a_i, b_i, c_i, d_i)$ to solve. Conditions (10)–(14) contribute $4n - 2$ equations. Two additional conditions are usually applied to the end points $x_0$ and $x_n$. Natural boundary conditions require $S''_1(x_0) = 0$ and $S''_n(x_n) = 0$. A clamped cubic spline has $S'_1(x_0) = f'(x_0)$ and $S'_n(x_n) = f'(x_n)$. Altogether, we have $4n$ equations to solve for the $4n$ coefficients.
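
To make the bookkeeping concrete, the standalone sketch below assembles and solves the tridiagonal system for the natural boundary case using the Thomas algorithm. This is the textbook construction (solving for the second derivatives $M_i$ at the knots), not NPL's B-spline based formulation described next; the function name is ours.

#include <vector>

// Textbook natural cubic spline: solve the tridiagonal system for the
// second derivatives M_i at the knots, with M_0 = M_n = 0. Assumes x
// has at least 2 ascending entries and y the matching values.
std::vector<double> naturalSplineSecondDerivs(
   const std::vector<double>& x, const std::vector<double>& y)
{
   const std::size_t n = x.size() - 1;       // number of intervals
   std::vector<double> M(n + 1, 0.0);        // natural ends: M_0 = M_n = 0
   if (n < 2)
      return M;                              // 2 points: spline is a line

   // Row i (i = 1..n-1):
   // h_{i-1} M_{i-1} + 2(h_{i-1}+h_i) M_i + h_i M_{i+1} = rhs_i
   std::vector<double> diag(n - 1), rhs(n - 1);
   for (std::size_t i = 1; i < n; ++i)
   {
      double h0 = x[i] - x[i - 1], h1 = x[i + 1] - x[i];
      diag[i - 1] = 2.0 * (h0 + h1);
      rhs[i - 1] = 6.0 * ((y[i + 1] - y[i]) / h1 - (y[i] - y[i - 1]) / h0);
   }

   // Thomas algorithm: forward elimination (sub- and super-diagonal
   // entries coincide and equal h_i), then back substitution.
   for (std::size_t i = 1; i < n - 1; ++i)
   {
      double h = x[i + 1] - x[i];
      double w = h / diag[i - 1];
      diag[i] -= w * h;
      rhs[i] -= w * rhs[i - 1];
   }
   M[n - 1] = rhs[n - 2] / diag[n - 2];
   for (std::size_t i = n - 1; i >= 2; --i)
      M[i - 1] = (rhs[i - 2] - (x[i] - x[i - 1]) * M[i]) / diag[i - 2];

   return M;
}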

3.3 Generalized Cubic Spline with Shape Control

In Mathwrist NPL, we generalize the idea of cubic spline interpolation to match one, two or all three of the quantities $(f(x_k), f'(x_k), f''(x_k))$ at a data point $x_k$. Effectively, users can control the level, slope and curvature of the spline curve at any data point, not just at the boundary points as in natural and clamped splines.

Let $p(x)$ be the cubic spline interpolation function. We choose a formulation using B-spline basis functions of order 4, $p(x) = \sum_{i=0}^{M-1} \theta_i \phi_{i,4}(x)$, where $M$ is the total number of B-spline basis functions, $\phi_{i,4}(x)$ is the $i$-th B-spline basis of order 4 and $\theta_i$ is the coefficient of the $i$-th basis. Here $M$ is determined from the number of unique input data points $x_k$ and the values we want to match at each $x_k$. As a minimum, we need at least $M = 4$ input data point and value combinations. We then place $M - 2$ spline knot points internally using an automatic knot-point placement scheme. Lastly, the $\theta_i$ are computed by solving an $M \times M$ linear system.

3.4 Use of InterpCubicSplineCurve Class

In order to use the generalized cubic spline interpolation, users need to create objects from the InterpCubicSplineCurve class, which is a subtype derived from the base class Function1D. Next, we need to prepare a buffer of interpolation Point objects, where the Point data type is defined as a nested struct in class InterpCubicSplineCurve:

Listing 8: Interpolation Data Point
struct Point
{
   enum Type
   {
      VALUE,
      SLOPE,
      CURVATURE,
   };

   double x;
   double v;
   Type type;

   Point() : x(0), v(0), type(VALUE) {}
   // Three-argument constructor as used in Listing 9.
   Point(double x_, double v_, Type t) : x(x_), v(v_), type(t) {}
};

A data point includes the coordinate x and one value from f(x), f′(x) or f′′(x) to match. If an application needs to match multiple values at the same x, users need to populate multiple points with different value types. It is not necessary to sort data points by x, but the min and max of x in the supplied data points are interpreted as the domain range $[a, b]$ on which $p(x)$ is defined. Once the cubic spline $p(x)$ is fit to the interpolation data points, it can be used as a regular 1-d function to compute $p(x)$, $p'(x)$ and $p''(x)$ for any $x \in [a, b]$. A function evaluation at $x \notin [a, b]$ will trigger an exception.

The code below illustrates an example of interpolating $f(x) = \sin(x)$ over $[0, \pi/2]$ with the natural boundary condition $f''(0) = 0$, the slope control $f'(\pi/2) = 0$ and three values $f(0)$, $f(\pi/4)$ and $f(\pi/2)$ to match.

Listing 9: Interpolation Example
InterpCubicSplineCurve curve;
typedef InterpCubicSplineCurve::Point Point;

// A buffer of 5 points
ftl::Buffer<Point> data_points(5);

double a = 0, m = 0.25 * PI(), b = 0.5 * PI();

// natural boundary
data_points[0] = Point(a, 0, Point::CURVATURE);

// values
data_points[1] = Point(a, 0, Point::VALUE);
data_points[2] = Point(m, std::sin(m), Point::VALUE);
data_points[3] = Point(b, 1, Point::VALUE);

// slope
data_points[4] = Point(b, 0, Point::SLOPE);

// interpolate the points
curve.interp(data_points);

// Use the interpolated curve as a regular 1-d function
double x = 0.1;
double y = curve.eval(x);

4 1-d Integration

4.1 Introduction

The idea of numerically approximating a finite integral $\int_a^b f(x)\,dx$ starts from approximating the integrand $f(x)$ by an interpolation polynomial passing through a list of data points $\{x_0, \ldots, x_n\}$. This interpolation polynomial can be constructed from (7). Consequently,

$$\int_a^b f(x)\,dx \approx \int_a^b p(x)\,dx = \sum_{k=0}^{n} w_k f(x_k)$$

This approximation is called a quadrature rule with quadrature weights

$$w_k = \int_a^b L_{n,k}(x)\,dx \qquad (15)$$

The degree of accuracy of a quadrature rule is the highest integer $d$ such that the rule produces exact results for all $f(x) = x^k$, $k = 0, \ldots, d$. By construction, the interpolation polynomial $p(x)$ has degree $n$, so the error term (9) is 0 for $f(x) = x^k$, $k \le n$. Therefore, regardless of how the data points are placed, the quadrature rule constructed from the $n+1$ points $\{x_0, \ldots, x_n\}$ has at least degree of accuracy $n$.

4.2 Composite Rule and Romberg Integration

The family of Newton-Cotes quadrature formulas, e.g. the classic trapezoidal and Simpson's rules, uses equally spaced data points. Large oscillating error occurs when we increase the number of data points under a single interpolation polynomial. To avoid this, low degree polynomials are used to approximate $f(x)$ on small subintervals of $[a, b]$. We can then sum up the quadrature results of the subintervals to derive composite quadrature rules.

Romberg integration is a further improvement of composite quadrature rules. When equally spaced data points are used, it is possible to cancel the lower order error terms by Richardson extrapolation. Practically, Romberg integration can take an accuracy order or an error tolerance as input. The method starts iterating from a low order approximation and terminates when the desired order of accuracy is reached, or when the approximation error becomes smaller than the tolerance.
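
The standalone sketch below shows the classic Romberg tableau built on the composite trapezoidal rule with Richardson extrapolation (illustrative only; NPL's Integrate1D uses adaptive Gaussian quadrature instead, as described next):

#include <cmath>
#include <functional>
#include <vector>

// Classic Romberg integration of f over [a, b]: R[i][0] is the
// composite trapezoidal rule with 2^i intervals; each extrapolation
// column cancels the next lower-order error term. Assumes levels >= 1.
double romberg(const std::function<double(double)>& f,
               double a, double b, int levels, double tol)
{
   std::vector<std::vector<double>> R(levels);
   double h = b - a;
   R[0].push_back(0.5 * h * (f(a) + f(b)));

   for (int i = 1; i < levels; ++i)
   {
      h *= 0.5;
      // Refine the trapezoidal estimate with the new midpoints.
      double sum = 0.0;
      for (int k = 1; k < (1 << i); k += 2)
         sum += f(a + k * h);
      R[i].push_back(0.5 * R[i - 1][0] + h * sum);

      // Richardson extrapolation across the row.
      for (int j = 1; j <= i; ++j)
      {
         double num = R[i][j - 1] - R[i - 1][j - 1];
         R[i].push_back(R[i][j - 1] + num / (std::pow(4.0, j) - 1.0));
      }
      if (std::fabs(R[i][i] - R[i - 1][i - 1]) < tol)
         return R[i][i];
   }
   return R[levels - 1][levels - 1];
}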

4.3 Adaptive Gaussian Quadrature

Unlike the Newton-Cotes quadrature formulas, Gaussian quadrature chooses the data points $\{x_0, \ldots, x_n\}$ in an optimal way such that the degree of accuracy becomes $2n+1$. It can be shown that $2n+1$ is the highest accuracy one can possibly reach with $n+1$ data points.

It turns out that the optimal data points $\{x_0, \ldots, x_n\}$ over the interval $[-1, 1]$ are the roots of the $(n+1)$-th Legendre polynomial. The Gaussian quadrature weights computed from (15) are strictly positive, which is a very nice property for stable numerical integration. One way to prove the $2n+1$ degree of accuracy can be found in [1], section 4.7.
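
For the 3-point rule on $[-1, 1]$, the Legendre roots and weights are known in closed form ($x = 0, \pm\sqrt{3/5}$ with weights $8/9$ and $5/9$). The standalone sketch below applies them to a general interval $[a, b]$ by an affine change of variable:

#include <cmath>
#include <functional>

// 3-point Gauss-Legendre quadrature on [a, b]. The nodes are the roots
// of the 3rd-degree Legendre polynomial on [-1, 1], mapped affinely to
// [a, b]. Exact for polynomials up to degree 2n+1 = 5.
double gauss3(const std::function<double(double)>& f, double a, double b)
{
   const double r = std::sqrt(3.0 / 5.0);
   const double node[3] = { -r, 0.0, r };
   const double weight[3] = { 5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0 };

   const double mid = 0.5 * (a + b);
   const double half = 0.5 * (b - a);
   double sum = 0.0;
   for (int i = 0; i < 3; ++i)
      sum += weight[i] * f(mid + half * node[i]);
   return half * sum;   // Jacobian of the affine map
}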

In Mathwrist NPL, we provide the Integrate1D class that implements an adaptive 3-point Gaussian quadrature method. Generally speaking, the idea of adaptive methods applies to equally spaced composite rules as well. When the integrand $f(x)$ has dramatic curvature changes, adaptive methods tend to use fewer data points over smooth areas and more data points where the curvature changes a lot. The details of adaptive quadrature can be found in [1], section 4.6.

4.4 Use of Integrate1D Class

A client application first needs to define the integrand function $f(x)$ by inheriting from the Function1D interface class. The quadrature rule only requires evaluating function values $f(x)$, not derivatives. Next we need to pass a constant reference of the integrand function to the quadrature constructor. The constructor also accepts an integration precision parameter $\epsilon$ that controls whether the adaptive method needs to refine integration intervals. Let $I(a, b)$ denote the quadrature result over an interval $[a, b]$.

At the $i$-th iteration of the adaptive method, assume we are working on a sub-interval $[a_i, b_i]$, which is further split into two smaller subintervals $[a_l, b_l]$ and $[a_r, b_r]$. If

$$\frac{|I(a_i, b_i) - I(a_l, b_l) - I(a_r, b_r)|}{\max(1.0, |I(a_i, b_i)|)} < \epsilon$$

then we consider the quadrature result over $[a_i, b_i]$ accurate enough; it does not need to be refined with more data points. The code block below illustrates an example of numerically integrating $\int_0^1 x^{-x}\,dx$.

Listing 10: Example of User-defined 1-D Function
struct F : public Function1D
{
   // Clone is not used in this example.
   Function1D* clone() const { return 0; }

   double eval(double x, double*, double*) const
   {
      return std::pow(x, -x);
   }
};

// Create the integrand function.
F f;

// Quadrature takes integrand and precision as input.
Integrate1D quad(f, 1.e-6);

// Integrate over [0, 1]
double s = quad(0, 1);
assert(equal(s, 1.291285997062548, 1.e-6));

5 1-d Root Finding

5.1 Introduction

The root finding problem for a continuous univariate function $f(x)$ is to find the root $x^*$ such that $f(x^*) = 0$. Root finding methods can be classified into two categories. The first class of methods uses only the function values $f(x)$; some well-known examples include the bisection, Brent and secant methods. The second class of methods uses both $f(x)$ and its derivatives, typically only $f'(x)$ in practice. The most powerful method in the second class is probably the Newton-Raphson method.

Let $\{x_n, n = 0, \ldots\}$ be a sequence converging to the root $x^*$. If there exist $\lambda > 0$ and $\alpha > 0$ such that

$$\lim_{n \to \infty} \frac{|x_{n+1} - x^*|}{|x_n - x^*|^\alpha} = \lambda$$

then we say the sequence converges to $x^*$ with order $\alpha$ and asymptotic error constant $\lambda$. If $\alpha = 1$ with $\lambda < 1$, the sequence has a linear convergence rate. If $\alpha = 2$, the sequence has a second order convergence rate.

The bisection method works within a bracketing interval, hence guarantees convergence; it has a linear convergence rate. The Brent method combines bisection with quadratic interpolation and has the advantages of both guaranteed convergence and a superlinear convergence rate, $1 < \alpha < 2$. The secant method also has a superlinear convergence rate but does not always converge. A secant method with false position correction (so-called “Regula Falsi”) can establish a bracketing interval on the fly, hence ensures convergence. However, it involves more function evaluations and only has a linear convergence rate.

For a smooth function $f(x)$ with $f'(x) \ne 0$, given an initial guess $x_0$, the Newton-Raphson method generates a sequence of updates by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \quad n \ge 0 \qquad (16)$$

It can be shown that the Newton-Raphson method is equivalent to a fixed point iteration problem; see [1], sections 2.2 and 2.3. By fixed point iteration convergence theorems, when $x_0$ is sufficiently close to $x^*$, the method converges at a quadratic rate. On the other hand, when $x_0$ is far away from $x^*$, the method can diverge for some functions $f(x)$.
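
A bare-bones Newton-Raphson loop makes the update (16) and its failure modes concrete (a standalone sketch; RootSolver1D, described next, wraps this step in safeguards):

#include <cmath>

// Bare Newton-Raphson iteration (16): no bracketing, no safeguards.
// f must write f'(x) into *d1 when a non-zero pointer is passed,
// mirroring the Function1D::eval() convention.
template <typename F>
double newtonRaphson(const F& f, double x0, double tol, int max_iter)
{
   double x = x0;
   for (int n = 0; n < max_iter; ++n)
   {
      double d1 = 0;
      double fx = f(x, &d1);
      if (std::fabs(fx) < tol)
         break;                 // |f(x)| small enough: accept x as root
      x -= fx / d1;             // may diverge if d1 is tiny or x0 is poor
   }
   return x;
}

With the function f(x) = x − 2 sin(x) used in section 5.3 and a starting point close to the root, this loop exhibits the quadratic convergence described above; started far from the root, it has no safeguard against divergence.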

5.2 Safeguarded Search Algorithm

In Mathwrist NPL, we provide the root-finding solver class RootSolver1D, which can use either f(x) only or both f(x) and f′(x), depending on user control. This solver uses a safeguarded search strategy that combines bisection, rational interpolation and the Newton method to benefit from the advantages of each individual method.

The solver takes 3 parameters $a < x_0 < b$ as input. Parameters $a$ and $b$ suggest a bracketing interval $[a, b]$. If this interval does not actually bracket a root, the solver may extend it until it brackets, or terminate when the maximum number of iterations is reached. For the best performance of this algorithm, it is recommended to provide a decent guess of $[a, b]$ based on the user's domain knowledge.

5.2.1 Using f′(x)

When f′(x) is used, the solver starts in Newton mode and computes trial Newton steps from equation (16) with the initial guess $x_0$. If $x_0$ is close to $x^*$, $x_n$ converges to $x^*$ exactly as in the classic Newton-Raphson method. To handle the case where $x_0$ is far from $x^*$, we perform a series of tests based on fixed point iteration theory to determine whether it is necessary to quit Newton mode. The algorithm stops generating the trial Newton sequence whenever one of the following situations occurs:

  1. The tests determine to quit Newton mode.

  2. A trial step $x_{n+1}$ exceeds the initial interval $[a, b]$.

  3. The trial sequence $\{x_n\}$ establishes a bracketing interval $[x_k, x_n]$ or $[x_n, x_k]$, $k < n$, on the fly.

In the first two cases, the initial interval $[a, b]$ is examined and extended if necessary. At this point, a true bracketing interval $[a, b]$ is obtained. The algorithm then switches to run in safeguarded mode.

In safeguarded mode, a step update $x_{n+1}$ may still be computed from the Newton equation (16) if the fixed point iteration test still permits. If the test fails, or if f′(x) is disabled, $x_{n+1}$ is computed from a rational interpolation. In either case, if $x_{n+1}$ exceeds the bracketing interval $[a, b]$, the bisection method is used.

5.2.2 Not Using f′(x)

If f′(x) is not enabled initially, the algorithm directly starts in safeguarded mode once a bracketing interval $[a, b]$ is established. Rational interpolation is used to compute the step updates in this situation.

5.3 Use of RootSolver1D Class

A client application needs to define the objective function f(x) by inheriting from the base class Function1D. When f′(x) is not available or is very expensive to compute, the client application may disable the use of f′(x) on the solver and then skip the calculation of f′(x) and f′′(x) inside the Function1D::eval() implementation. By default, RootSolver1D enables the usage of f′(x) and expects the objective function to provide the calculation of f′(x).

The following example demonstrates a use case of solving f(x)=x-2sin(x)=0.

Listing 11: Example of User-defined Rooting Function
struct Fx : public Function1D
{
   double eval(double x, double* _derv1 = 0, double* _derv2 = 0) const
   {
      if (_derv1 != 0)
         *_derv1 = 1 - 2 * std::cos(x);

      return x - 2 * std::sin(x);
   }

   // clone is not used, just provide a dummy implementation.
   Function1D* clone() const { return 0; }
};

Fx f;                  // create the objective function
RootSolver1D solve(f); // create the solver object

// the root
double root = 1.895494;

// [a, b] is a suggested search interval
double a = 1.e-5;
double b = 2 * PI();

// Initial guess
double x0 = 0.1;

// Test using derivatives
{
   solve.enable_derivative();
   double xr = x0;

   // xr holds root on return. fr holds function value.
   double fr = solve(a, xr, b);
   assert(equal(xr, root, 1.e-6));
   assert(equal(fr, 0));
}
// Test not using derivatives
{
   solve.enable_derivative(false);
   double xr = x0;
   double fr = solve(a, xr, b);
   assert(equal(xr, root, 1.e-6));
   assert(equal(fr, 0));
}

5.4 Termination Control

The RootSolver1D class is derived from the IterativeMethod base class, hence shares all the termination controls implemented in the IterativeMethod class. In particular, the IterativeMethod::converge_tolerance() method defines a tolerance quantity $\epsilon$ for f(x). When $|f(x_n)| < \epsilon$, the root finding solver terminates and returns $x_n$ as the solution.

6 1-d Minimization

6.1 Introduction

For a univariate function f(x), we want to solve the following minimization problem:

$$\min_{x \in [x_l, x_u]} f(x) \qquad (17)$$

where $x_l$ and $x_u$ are the lower and upper bounds of the domain on which f(x) is defined. Generally speaking, there could be multiple local optima in $[x_l, x_u]$. Practically, we are only interested in finding a local optimum $x^*$ within a reasonable range.

Given a bracketing interval $[a, b]$, bisection, Golden section and Fibonacci searches use an interval reduction strategy to locate a local minimum $x^* \in [a, b]$ until $a$ and $b$ converge. These methods guarantee convergence but have only a linear convergence rate.

Methods with superlinear or higher order convergence can be constructed by using a quadratic model function $\hat{f}(x)$ or cubic polynomials to approximate the objective function f(x). For example, if we choose $\hat{f}(x+p)$ to be the second order Taylor expansion at a point $x$,

$$\hat{f}(x+p) = f(x) + f'(x)p + \frac{1}{2} f''(x) p^2$$

then, when $f''(x) > 0$, it suggests a step move $p = -\frac{f'(x)}{f''(x)}$ to reach the minimum of $\hat{f}(x)$. This is just the Newton step for the equivalent root finding problem $f'(x) = 0$. The same limitations and advantages of the root-finding Newton method apply in the 1-d minimization context as well. Hence, it suggests a safeguarded approach similar to what we discussed for root finding problems.

The classic Brent method is one such safeguarded algorithm: it uses a quadratic form $\hat{f}(x)$ to predict the minimum point, safeguarded by the bracketing interval and an interval reduction strategy. It does careful bookkeeping on recent data points and uses only the function values of these points to fit the quadratic form. This method has a superlinear convergence rate.

6.2 Safeguarded Quadratic Approximation

The MiniSolver1D class in NPL combines a quadratic polynomial approximation $\hat{f}(x)$,

$$\hat{f}(x) = \frac{1}{2} a x^2 + bx + c$$

together with a Golden section method in a safeguarded strategy to solve the 1-d minimization problem (17). The minimization solver can be customized to use f(x) only, or both f(x) and f′(x). Users can also set a control on the maximum step length to avoid the step move being too large at each iteration.

The solver has an initial bracketing stage and a subsequent safeguarded searching stage. Three parameters $a < x_0 < b$ are given to the solver as input: $x_0$ is the initial guess of $x^*$, and $[a, b] \subset (x_l, x_u)$ is a good guess of the bracketing interval. Regardless of whether f′(x) is used, it is possible that we are able to move in a monotonically decreasing sequence all the way down to a boundary point $x_l$ or $x_u$. In this situation, the boundary point is returned as the solution at the end of the bracketing stage.

6.2.1 Bracketing Mode

If the initial guess parameters $(a, x_0, b)$ do not establish a bracketing interval, the solver enters bracketing mode to find such an interval. Let point $a_0$ be marked at $x_0$. Let point $b_0$ be one of the initial end points $(a, b)$ such that the direction from $a_0$ to $b_0$ is a descent direction. We then compute the step length using the quadratic approximation function $\hat{f}(x)$. This is like the line search method for multivariate optimization, except that we are now testing bracketing conditions instead of Wolfe conditions.

At each iteration $k$, if f′(x) is disabled, $\hat{f}(x)$ is fit to match $(f(a_k), f(b_k), f(c_k))$, where $c_k$ is a Golden section extension point from $(a_k, b_k)$. Otherwise, $\hat{f}(x)$ is fit to $(f(a_k), f'(a_k), f(b_k))$. If $\hat{f}(x)$ is convex, its minimum $u_k$ is tested against the other points for the bracketing conditions. When a user control disables the f′(x) calculation, the bracketing condition involves three points $(a_k, b_k, c_k)$ such that $f(a_k) > f(b_k)$ and $f(c_k) > f(b_k)$. If f′(x) is available, the bracketing condition implies that either $f(b_k) > f(a_k)$ or $f'(b_k)$ has a different sign from $f'(a_k)$. Whenever an extension point is needed, we use the Golden section method; both ingredients are sketched below.
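
The standalone sketch below shows the minimizer of a parabola fit through three points, plus a Golden section extension step (illustrative only, not MiniSolver1D internals; the function names are ours):

#include <cmath>

// Minimizer of the parabola through (a, fa), (b, fb), (c, fc).
// Returns b if the three points are collinear (degenerate fit).
double parabolicMin(double a, double fa, double b, double fb,
                    double c, double fc)
{
   double num = (b - a) * (b - a) * (fb - fc)
              - (b - c) * (b - c) * (fb - fa);
   double den = (b - a) * (fb - fc) - (b - c) * (fb - fa);
   if (std::fabs(den) < 1e-300)
      return b;                       // degenerate: no curvature
   return b - 0.5 * num / den;
}

// Golden section extension of the interval (a, b) beyond b.
double goldenExtend(double a, double b)
{
   const double golden = 1.6180339887498949;
   return b + golden * (b - a);
}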

6.2.2 Safeguarded Mode

Once a bracketing interval is identified, the algorithm switches to a safeguarded mode that searches for $x^*$ strictly inside the interval. Again, $\hat{f}(x)$ is fit to three points with function values, or to two points with derivatives, depending on whether f′(x) is available. If the minimum of $\hat{f}(x)$ lies outside the interval, or does not reduce the interval as much as a Golden section search, the Golden section point is used to replace one of the interval end points. This process repeats until the bracketing interval end points converge. The idea of safeguarded polynomial search can be found, for example, in [2], section 4.1.2.4.

6.3 Use of MiniSolver1D Class

Users need to supply an objective function f(x) in exactly the same way as in the root-finding case in section 5.3. By default, the MiniSolver1D solver uses the derivative f′(x). This behavior can be explicitly enabled or disabled by calling the solver class member function void enable_derivative(bool use_df). Users can also put a control on the maximum step length that $x$ is allowed to move between iterations by calling the void set_max_step(double step) function.

The following example demonstrates a use case of finding the minimum of the function $f(x) = -(16x^2 - 24x + 5)e^{-x}$.

Listing 12: Example of User-defined Function (1-d minimization)
struct Fx : public Function1D
{
   // Clone is not used in this example,
   // just supply a dummy implementation.
   Function1D* clone() const { return 0; }

   // Supply implementation of computing f(x) and f'(x)
   double eval(double x, double* _derv1, double* _derv2) const
   {
      double exp_x = std::exp(-x);
      double quadratic = 16 * x * x - 24 * x + 5;

      if (_derv1 != 0)
      {
         *_derv1 = quadratic * exp_x - (32 * x - 24) * exp_x;
      }

      return -quadratic * exp_x;
   }
};

// Create the objective function f(x)
Fx f;

// Create the solver to search between
// lower bound 1.9 and upper bound 3.9.
MiniSolver1D solve(f, Bound(1.9, 3.9));

// Use derivatives in the solver
solve.enable_derivative();

// A guess of bracketing interval [a,b].
double a = 2;
double b = 3;

// Initial guess 2.2
double x = 2.2;

// The actual solution
double x_star = 2.868034;

double fx = solve(a, x, b);
assert(equal(x, x_star, 1.0e-6));
assert(equal(fx, f.eval(x_star), 1.0e-6));

6.4 Termination Control

The MiniSolver1D solver class is an iterative method derived from the base class IterativeMethod. It inherits the termination controls implemented by the base class. In particular, the IterativeMethod::set_converge_tolerance() function defines a convergence tolerance $\delta$. At each iteration in the safeguarded search stage, if the bracketing interval $[a, b]$ satisfies $|a - b| < \delta$, the algorithm is considered to have converged. In practice, we don't recommend setting $\delta < \epsilon$, where $\epsilon$ is the machine precision. A convergence tolerance smaller than $\epsilon$, on the order of 1.e-8 on a 32-bit machine, generally won't reduce the function value f(x) further and hence just wastes CPU time.

References

  • [1] Richard L. Burden, Douglas J. Faires and Annette M. Burden: Numerical Analysis, 10th edition, Cengage Learning, 2016
  • [2] P. E. Gill, W. Murray and M. H. Wright: Practical Optimization, Emerald, first edition 2007
  • [3] Trevor Hastie, Robert Tibshirani and Jerome Friedman: The Elements of Statistical Learning, Springer, 2001
  • [4] P. M. Prenter: Splines and Variational Methods, Dover Publications, 2008
  • [5] Carl de Boor: A Practical Guide to Splines, Revised Edition, Applied Mathematical Sciences Volume 27, Springer, 2001
  • [6] John P. Boyd: Chebyshev and Fourier Spectral Methods, second edition (revised), Dover Publications, 2001
  • [7] C. Canuto, M. Y. Hussaini, A. Quarteroni and T. A. Zang: Spectral Methods, Springer, 2006