1-d and n-d Functions
Mathwrist White Paper Series
Abstract
This document presents a technical overview of 1-dimensional and n-dimensional functions in Mathwrist's C++ Numerical Programming Library (NPL). We first briefly introduce the 1-d and n-d function interfaces and then discuss the 1-d interpolation, integration, root finding and minimization features in detail.
1 Hierarchy of 1-d Math Functions
We provide the Function1D class as the common interface for all 1-dimensional functions used in the library. For example, our 1-d root finding solver takes Function1D type objects as the objective function.
A client application needs to derive from the 1-d function interface and implement the clone(), eval() and integral() virtual functions as necessary. The eval() function returns the function value $f(x)$ at a given point $x$. It also takes optional arguments _derv1 and _derv2. When non-zero pointers are passed to these parameters, it is expected that the derived class computes the first and second derivatives of $f$. On the return of eval(), it is expected that $f'(x)$ and $f''(x)$ are copied to the addresses provided by those non-zero pointers.
The integral() virtual function has a default implementation that throws an unsupported-operation exception. If a client application does need to compute integrals, it can override this default behavior with its own implementation. The Function1D base class provides a protected member function that numerically approximates integrals by Gaussian quadrature. For 1-d smooth functions, a client application may choose to override integral() with the quadrature implementation as in the example below.
1.1 1-d Piecewise Functions
Mathematical modelling often uses piecewise functions for simplicity. FunctPiecewise is an intermediate class derived from the Function1D base class to handle piecewise lookup. Concrete piecewise functions are further derived from this intermediate class with both eval() and integral() functions implemented.
Given knot points $x_0 < x_1 < \dots < x_{n-1}$ in ascending order, these concrete piecewise functions are strictly defined within the range $[x_0, x_{n-1}]$. When a numerical equality comparison is involved, we use $|a - b| \le \epsilon$ to test whether $a$ and $b$ are equal with numerical tolerance $\epsilon$. When a comparison $a \le b$ is involved, it should be interpreted as $a$ is numerically not greater than $b$.
1.1.1 Piecewise Constant Lookup
Given knot points $x_0 < \dots < x_{n-1}$ and a discrete function value $y_i$ for each interval $[x_i, x_{i+1})$, the FunctPiecewiseConst function computes the function value and derivatives as

$$f(x) = y_i \ \text{for } x_i \le x < x_{i+1}, \qquad f'(x) = f''(x) = 0.$$
1.1.2 Piecewise Linear Lookup
Given knot points $x_0 < \dots < x_{n-1}$ and a discrete function value $y_i$ at each knot point $x_i$, the FunctPiecewiseLinear function computes the function value and derivatives at a point $x \in [x_i, x_{i+1}]$ as

$$f(x) = y_i + \frac{y_{i+1} - y_i}{x_{i+1} - x_i}\,(x - x_i), \qquad f'(x) = \frac{y_{i+1} - y_i}{x_{i+1} - x_i}, \qquad f''(x) = 0.$$
1.1.3 Piecewise Linear(Jump) Lookup
Given knot points and discrete function values that may differ on the two sides of a knot point, the FunctPiecewiseLinearJump function computes the function value and derivatives at a point $x$ by linear interpolation within each interval, allowing the function value to jump at the knot points.
1.1.4 B-spline Curve
A B-spline curve of order $k$ is defined as $f(x) = \sum_{i=0}^{n-1} c_i B_{i,k}(x)$, where $B_{i,k}(x)$ is the $i$-th B-spline basis function of order $k$ and $c_i$ is the coefficient of the $i$-th basis.
Let $x_0 < x_1 < \dots < x_{m-1}$ be the set of knot points over domain $[x_0, x_{m-1}]$. To facilitate the definition of a B-spline basis function, it is necessary to introduce an augmented knot point vector
$$t_0 \le t_1 \le \dots \le t_{m+2k-3},$$
where we conveniently choose $t_0 = \dots = t_{k-1} = x_0$ and $t_{m+k-2} = \dots = t_{m+2k-3} = x_{m-1}$, with $t_{k-1+j} = x_j$ in between. Note that we extend the boundary points $x_0$ and $x_{m-1}$ by $k-1$ extra times at both sides. So the total number of augmented knot points is $m + 2(k-1)$.
A $k$-order B-spline basis function is a piecewise polynomial of degree $k-1$ defined by the following recurrence relation:
$$B_{i,k}(x) = \frac{x - t_i}{t_{i+k-1} - t_i}\, B_{i,k-1}(x) + \frac{t_{i+k} - x}{t_{i+k} - t_{i+1}}\, B_{i+1,k-1}(x), \tag{2}$$

$$B_{i,1}(x) = \begin{cases} 1, & t_i \le x < t_{i+1}, \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$
From the base case definition (5), it is clear that a B-spline of order 1 is a piecewise constant function. A B-spline of order 2 is a piecewise linear function. A B-spline of higher order $k \ge 3$ is a $(k-2)$-times continuously differentiable function.
B-spline basis functions are fully defined by the knot points and the order of the spline. There are $n = m + k - 2$ basis functions in total. A B-spline curve is then parameterized by the basis coefficients $c = (c_0, \dots, c_{n-1})^T$. Now let $B(x) = (B_{0,k}(x), \dots, B_{n-1,k}(x))^T$ be the vector of B-spline basis functionals. We can then write the B-spline curve as $f(x) = c^T B(x)$.
In order to evaluate a B-spline function efficiently, it is critical to avoid the double calculation introduced by the recurrence of the $(k-1)$-order terms in equation (2). Given arbitrary $x$, we developed a very efficient evaluation scheme to compute the function values, derivatives and integrals of the B-spline curve function. Some good references on B-splines include [3] (chapter 5 appendix), [4] and [5].
1.2 Chebyshev Curve
A Chebyshev curve is a 1-d function strictly defined within a fixed domain $[a, b]$. It is a direct subtype of the Function1D class.
A Chebyshev curve of order $n$ is defined as $f(x) = \sum_{i=0}^{n} c_i T_i(z)$, where $T_i$ is the $i$-th order Chebyshev basis polynomial of the first kind and $c_i$ is the coefficient of $T_i$. Chebyshev basis polynomials are defined by the following recurrence relation over the canonical domain $z \in [-1, 1]$,

$$T_0(z) = 1, \qquad T_1(z) = z, \qquad T_{i+1}(z) = 2 z\, T_i(z) - T_{i-1}(z).$$
Mathwrist NPL maps the physical variable $x \in [a, b]$ to the canonical variable $z$ of the Chebyshev basis functions by $z = (2x - a - b) / (b - a)$. Once the range $[a, b]$ and the order $n$ of a Chebyshev curve are determined, the function is fully parametrized by the coefficients $c = (c_0, \dots, c_n)^T$. Again, we can write $f(x) = c^T T(z)$, where $T(z)$ is the vector of Chebyshev basis functionals. Our main references include [1], [6] and [7].
2 Hierarchy of n-d Math Functions
We provide the FunctionND class as the common interface for all n-dimensional functions used in the library. For example, our unconstrained and constrained optimization solvers take FunctionND objects as the objective function. A client application needs to derive from the n-d function interface and implement the clone(), eval() and integral() functions as necessary.
The eval() function is a pure virtual function. A concrete derived class needs to implement it to return the function value $f(x)$ at a given point $x \in \mathbb{R}^n$. This function also takes optional arguments _g and _H. When non-zero pointers are passed to these parameters, it is expected that the derived class computes the gradient vector and Hessian matrix. On the return of eval(), it is expected that the gradient and Hessian have been copied to the addresses provided by those non-zero pointers.
Like the 1-d case, the integral() virtual function provides a default implementation that throws an unsupported-operation exception. This suits most client applications, which do not involve computing multi-dimensional integrals.
2.1 2-d Piecewise Functions
Class FunctPiecewise2D is an intermediate class derived from the general FunctionND base class. The 2-d piecewise functions strictly work within a rectangular grid, defined by $n_x$ knot points along the $x$ axis and $n_y$ knot points along the $y$ axis. In other words, its working domain is $[x_0, x_{n_x-1}] \times [y_0, y_{n_y-1}]$.
2.1.1 2-d Piecewise Constant Lookup
For an $n_x \times n_y$ grid, we require a discrete function value $z_{i,j}$ for each grid cell $[x_i, x_{i+1}) \times [y_j, y_{j+1})$ to build the function lookup table.
The FunctPiecewise2DConst function has zero gradient and zero Hessian everywhere inside the grid.
2.1.2 B-spline Surface
After the knot points are determined, we then choose the order of the B-spline basis along each dimension. Let $B_x(x)$ be the vector of B-spline basis functions of order $k_x$ along $x$. Let $B_y(y)$ be the vector of B-spline basis of order $k_y$ along $y$. A B-spline surface function is defined as the tensor product of these two vectors of basis functionals,

$$f(x, y) = B_x(x)^T C\, B_y(y),$$

where $C$ is the basis coefficient matrix.
2.1.3 Bilinear Surface
A bilinear surface is a special case of the B-spline surface where both basis orders are 2. For efficiency purposes, this special case is separately represented by the class BilinearSurface, derived from FunctPiecewise2D.
2.2 Chebyshev Surface
A Chebyshev surface is a 2-d function defined strictly within a fixed domain $[a_x, b_x] \times [a_y, b_y]$. It is a direct subtype of the FunctionND class.
Similar to the B-spline surface, a Chebyshev surface is defined as the tensor product of two vectors of Chebyshev basis functionals (first kind),

$$f(x, y) = T(z_x)^T C\, T(z_y),$$

where $C$ is the basis coefficient matrix. The function maps $x \in [a_x, b_x]$ to $z_x \in [-1, 1]$ and $y \in [a_y, b_y]$ to $z_y \in [-1, 1]$.
3 1-d Interpolation
3.1 Introduction
For an unknown function $f(x)$ that is continuous over domain $[a, b]$, we observe a list of discrete data points $(x_i, f(x_i))$, $i = 0, 1, \dots, n$. One way to approximate this unknown function is to construct an interpolation function that “exactly” matches all those data points.
By the Weierstrass approximation theorem, we know that for any $\epsilon > 0$, there exists a polynomial $P(x)$ such that $|f(x) - P(x)| < \epsilon$ for all $x \in [a, b]$. It is then a natural choice to use polynomials as the approximation functions due to their algebraic simplicity. It can be shown that there exists a unique polynomial $P_n(x)$ of degree at most $n$ such that $P_n(x_i) = f(x_i)$, $i = 0, \dots, n$. Further, this polynomial can be directly constructed by
$$P_n(x) = \sum_{i=0}^{n} f(x_i)\, L_i(x), \tag{7}$$

$$L_i(x) = \prod_{j=0,\, j \ne i}^{n} \frac{x - x_j}{x_i - x_j}. \tag{8}$$
This representation form is called the $n$-th Lagrange interpolation polynomial. The interpolation polynomial has an error term
$$f(x) - P_n(x) = \frac{f^{(n+1)}(\xi(x))}{(n+1)!} \prod_{i=0}^{n} (x - x_i), \tag{9}$$

where $f^{(n+1)}$ is the $(n+1)$-th derivative of $f$, evaluated at some $\xi(x) \in (a, b)$.
It is possible to construct an interpolation polynomial to match both function values and derivatives. In particular, a Hermite polynomial $H_{2n+1}(x)$ of degree at most $2n+1$ agrees with $f(x_i)$ and $f'(x_i)$ at $x_i$, $i = 0, \dots, n$. $H_{2n+1}(x)$ can be constructed from the Lagrange basis polynomials in (8); see [1]. The Hermite polynomial has an error term

$$f(x) - H_{2n+1}(x) = \frac{f^{(2n+2)}(\xi(x))}{(2n+2)!} \prod_{i=0}^{n} (x - x_i)^2.$$
The formulation of Lagrange polynomials, equations (7) and (8), is very convenient and useful in theoretical analysis. However, matching an additional data point by Lagrange or Hermite interpolation involves increasing the degree of the interpolation polynomial. High degree polynomials with roughly equally spaced data points tend to have very large oscillating errors when $x$ moves away from an interpolation data point. The error can grow extremely wild near the end points, a behavior known as the “Runge phenomenon”.
3.2 Cubic Spline Interpolation
Instead of using a single high degree polynomial, cubic spline interpolation uses piecewise cubic polynomials that jointly form a smooth curve to approximate the unknown function $f(x)$. Let $s_i(x)$ be the cubic polynomial over the interval $[x_i, x_{i+1}]$, $i = 0, \dots, n-1$. The set of piecewise polynomials satisfies the following conditions,
$$s_i(x_i) = f(x_i), \quad i = 0, \dots, n-1, \tag{10}$$

$$s_{n-1}(x_n) = f(x_n), \tag{11}$$

$$s_{i+1}(x_{i+1}) = s_i(x_{i+1}), \quad i = 0, \dots, n-2, \tag{12}$$

$$s'_{i+1}(x_{i+1}) = s'_i(x_{i+1}), \quad i = 0, \dots, n-2, \tag{13}$$

$$s''_{i+1}(x_{i+1}) = s''_i(x_{i+1}), \quad i = 0, \dots, n-2. \tag{14}$$
There are a few ways to parameterize a cubic spline interpolation. Suppose we write $s_i(x)$ as

$$s_i(x) = a_i + b_i (x - x_i) + c_i (x - x_i)^2 + d_i (x - x_i)^3.$$
We have $n$ piecewise polynomials, hence $4n$ coefficients to solve. Conditions (10)–(14) contribute $4n - 2$ equations. Two additional conditions are usually applied to the end points $x_0$ and $x_n$. Natural boundary conditions require $s''_0(x_0) = 0$ and $s''_{n-1}(x_n) = 0$. A clamped cubic spline has $s'_0(x_0) = f'(x_0)$ and $s'_{n-1}(x_n) = f'(x_n)$. All together, we have $4n$ equations to solve the $4n$ coefficients.
3.3 Generalized Cubic Spline with Shape Control
In Mathwrist NPL, we generalize the idea of cubic spline interpolation to be able to match one, two or all three quantities of $f(x_i)$, $f'(x_i)$ and $f''(x_i)$ at a data point $x_i$. Effectively, users can control the level, slope and curvature of the spline curve at any data point, not just at the boundary points as in natural and clamped splines.
Let $s(x)$ be the cubic spline interpolation function. We choose a formulation of $s(x)$ using a B-spline basis of order 4, $s(x) = \sum_{i=0}^{n-1} c_i B_{i,4}(x)$, where $n$ is the total number of B-spline basis functions, $B_{i,4}(x)$ is the $i$-th B-spline basis of order 4 and $c_i$ is the coefficient of the $i$-th basis. Here $n$ is determined from the number of unique input data points and the values we want to match at each point. At a minimum, enough input data point and value combinations must be supplied to determine the coefficients. We then place spline knot points internally using an automatic knot-point placement scheme. Lastly, $c$ is computed by solving a linear system.
3.4 Use of InterpCubicSplineCurve Class
In order to use the generalized cubic spline interpolation, users need to create objects from the InterpCubicSplineCurve class, which is a subtype derived from the base class Function1D. Next, we need to prepare a buffer of interpolation Point objects, where the Point data type is defined as a nested struct in class InterpCubicSplineCurve.
A data point includes the $x$ coordinate and a value from one of $f(x)$, $f'(x)$ or $f''(x)$ to match. If an application needs to match multiple values at the same $x$, users will need to populate multiple points with different value types. It is not necessary to sort data points by $x$, but the min and max of $x$ in the supplied data points are interpreted as the domain range over which $s(x)$ is defined. Once the cubic spline is fit to the interpolation data points, it can be used as a regular 1-d function to compute $s(x)$, $s'(x)$ and $s''(x)$ for any $x$ in the domain. A function evaluation at an $x$ outside the domain will trigger an exception.
The code below illustrates an example of interpolating a smooth function with a natural boundary condition, slope control and several function values to match.
4 1-d Integration
4.1 Introduction
The idea of numerically approximating a finite integral $\int_a^b f(x)\,dx$ starts from approximating the integrand function $f(x)$ by an interpolation polynomial $P_n(x)$ passing through a list of data points $(x_i, f(x_i))$, $i = 0, \dots, n$. This interpolation polynomial can be constructed from (7). Consequently,

$$\int_a^b f(x)\,dx \approx \int_a^b P_n(x)\,dx = \sum_{i=0}^{n} f(x_i) \int_a^b L_i(x)\,dx.$$
This approximation is called a quadrature rule with quadrature weights

$$w_i = \int_a^b L_i(x)\,dx, \quad i = 0, \dots, n. \tag{15}$$
The degree of accuracy of a quadrature rule is measured by the highest integer $d$ such that the rule produces exact results for all polynomials of degree up to $d$. By construction, the interpolation polynomial $P_n(x)$ has degree $n$. The error term (9) is then 0 when $f$ is a polynomial of degree up to $n$. Therefore, regardless of how the data points are placed, the quadrature rule constructed from $n + 1$ points has at least degree of accuracy $n$.
4.2 Composite Rule and Romberg Integration
The family of Newton-Cotes quadrature formulas, e.g. the classic trapezoidal and Simpson’s rules, use equally spaced data points. Large oscillating errors occur when we increase the number of data points with a single interpolation polynomial. To avoid this, low degree polynomials are used to approximate $f(x)$ in small subintervals of $[a, b]$. We can then sum up the quadrature results of the subintervals and derive composite quadrature rules.
Romberg integration is a further improvement of composite quadrature rules. When equally spaced data points are used, it is possible to cancel out the lower order error terms by Richardson’s extrapolation. Practically, Romberg integration can take an accuracy order or error tolerance as input. The method starts iterating from a low order approximation and terminates when the desired order of accuracy is reached, or when the approximation error becomes smaller than the tolerance.
4.3 Adaptive Gaussian Quadrature
Unlike Newton-Cotes quadrature formulas, Gaussian quadrature chooses the data points in an optimal way such that the degree of accuracy becomes $2n + 1$ with $n + 1$ data points. It can be shown that $2n + 1$ is the highest accuracy that one can possibly reach with $n + 1$ data points.
It turns out that the optimal data points over the interval $[-1, 1]$ are the roots of the $(n+1)$-th Legendre polynomial. Gaussian quadrature weights computed from (15) are strictly positive. This is a very nice property for stable numerical integration. One way to prove the degree of accuracy can be found in [1], section 4.7.
In Mathwrist NPL, we provide the Integrate1D class that implements an adaptive 3-point Gaussian quadrature method. Generally speaking, the idea of adaptive methods applies to equally spaced composite rules as well. When the integrand function has dramatic curvature changes, the adaptive methods tend to use fewer data points over smooth areas and more data points where the curvature changes a lot. The details of adaptive quadrature can be found in [1], section 4.6.
4.4 Use of Integrate1D Class
A client application first needs to define the integrand function by inheriting from the Function1D interface class. The quadrature rule only requires evaluating function values $f(x)$, not derivatives. Next, we need to pass a constant reference of the integrand function to the quadrature constructor. Meanwhile, the constructor also accepts an integration precision parameter $\epsilon$ that controls whether the adaptive method needs to refine integration intervals. Let $Q(l, u)$ denote the quadrature result over an interval $[l, u]$.
At the $i$-th iteration of the adaptive method, assume we are working on a sub-interval $[l, u]$, which is further split into two smaller subintervals $[l, m]$ and $[m, u]$ at the middle point $m$. If

$$|Q(l, m) + Q(m, u) - Q(l, u)| \le \epsilon,$$

then we consider the quadrature result over $[l, u]$ accurate enough; it doesn’t need to be refined with more data points. The code block below illustrates an example of numerically integrating a smooth function.
5 1-d Root Finding
5.1 Introduction
The root finding problem of a continuous univariate function $f(x)$ is to find the root $r$ such that $f(r) = 0$. Root finding methods can be classified into two categories. Methods in the first class only use the function values $f(x)$. Some well-known first class methods include bisection, Brent, secant, etc. Methods in the second class use both $f(x)$ and its derivatives, typically only $f'(x)$ in practice. The most powerful method in the second class is probably the Newton-Raphson method.
Let $\{x_k\}$ be a sequence converging to root $r$. If there exist $\alpha > 0$ and $\lambda > 0$ such that

$$\lim_{k \to \infty} \frac{|x_{k+1} - r|}{|x_k - r|^{\alpha}} = \lambda,$$

then we say the sequence converges to $r$ at the order of $\alpha$ with asymptotic error constant $\lambda$. If $\alpha = 1$ with $\lambda < 1$, the sequence has a linear convergence rate. If $\alpha = 2$, the sequence has a second order convergence rate.
The bisection method works within a bracketing interval, hence guarantees convergence. It has a linear convergence rate. The Brent method combines bisection with quadratic interpolation and has the advantages of both guaranteed convergence and a superlinear convergence rate. The secant method also has a superlinear convergence rate but does not always converge. A secant method with false position correction (so-called “Regula Falsi”) can establish a bracketing interval on the fly, hence ensures convergence. However, it involves more function evaluations and only has a linear convergence rate.
For a smooth function $f(x)$ with $f'(x) \ne 0$ near the root, given an initial guess $x_0$, the Newton-Raphson method generates a sequence of updates by

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \tag{16}$$
It can be shown that the Newton-Raphson method is equivalent to a fixed point iteration problem $x = g(x)$ with $g(x) = x - f(x)/f'(x)$; see [1], sections 2.2 and 2.3. By fixed point iteration convergence theorems, when $x_0$ is within a neighborhood of the root $r$, the method converges at a quadratic rate. On the other hand, when $x_0$ is far away from $r$, the method could diverge for some functions $f(x)$.
5.2 Safeguarded Search Algorithm
In Mathwrist NPL, we provide a root-finding solver class RootSolver1D that can use either only $f(x)$ or both $f(x)$ and $f'(x)$, depending on user control. This solver uses a safeguarded search strategy that combines bisection, rational interpolation and the Newton method together to benefit from the advantages of each individual method.
The solver takes 3 parameters as input: an initial guess $x_0$ together with parameters $a$ and $b$ that suggest a bracketing interval $[a, b]$. If this interval actually does not bracket a root, the solver may extend it until it brackets, or terminate when the max number of iterations is reached. For the best performance of this algorithm, it is recommended to provide a decent guess of $[a, b]$ based on the user’s domain knowledge.
5.2.1 Using $f'(x)$
When $f'(x)$ is used, the solver will start in Newton mode and compute trial Newton steps from equation (16) with initial guess $x_0$. If $x_0$ is close to the root $r$, $\{x_k\}$ converges to $r$ exactly the same as the classic Newton-Raphson method. To handle the case when $x_0$ is far from $r$, we perform a series of tests based on fixed point iteration theory to determine whether it is necessary to quit from Newton mode. The algorithm stops generating the trial Newton sequence whenever one of the following situations happens:
1. The tests determine to quit from Newton mode.
2. A trial step exceeds the initial interval $[a, b]$.
3. The trial sequence establishes a bracketing interval on the fly, i.e. two consecutive iterates $x_k$ and $x_{k+1}$ with $f(x_k)\, f(x_{k+1}) < 0$.
In the first two cases, the initial interval is examined and extended if necessary. At this point, a true bracketing interval is obtained. The algorithm then switches to run in safeguarded mode.
In safeguarded mode, a step update $x_{k+1}$ may still be computed from the Newton equation (16) if the fixed point iteration test still permits. If the test fails or if $f'(x)$ is disabled, $x_{k+1}$ is computed from a rational interpolation. In either case, if $x_{k+1}$ exceeds the bracketing interval, the bisection method is used.
5.2.2 Not Using $f'(x)$
If $f'(x)$ is not enabled initially, the algorithm will directly start from safeguarded mode once a bracketing interval is established. Rational interpolation is used to compute the step updates in this situation.
5.3 Use of RootSolver1D Class
A client application will need to define the objective function by inheriting from the base class Function1D. When $f'(x)$ is not available or very expensive to compute, the client application may disable using $f'(x)$ on the solver and then ignore the calculation of the derivatives inside the Function1D::eval() implementation. By default, RootSolver1D enables the usage of $f'(x)$ and expects the objective function to provide its calculation.
The following example demonstrates a use case of solving a simple nonlinear equation.
5.4 Termination Control
The RootSolver1D class is derived from the IterativeMethod base class, hence shares all the termination controls implemented in the IterativeMethod class. In particular, the IterativeMethod::converge_tolerance() method defines a tolerance quantity $\epsilon$ for $|f(x)|$. When $|f(x_k)| \le \epsilon$, the root finding solver terminates and returns $x_k$ as the solution.
6 1-d Minimization
6.1 Introduction
For a univariate function $f(x)$, we want to solve the following minimization problem

$$\min_{x} f(x), \quad l \le x \le u, \tag{17}$$

where $l$ and $u$ are the lower and upper bounds of the domain over which $f(x)$ is defined. Generally speaking, there could be multiple local optima in $[l, u]$. Practically, we are only interested in finding a local optimum within a reasonable range.
Given a bracketing interval, the bisection, Golden section and Fibonacci search methods use an interval reduction strategy to locate a local minimum until the interval end points converge. These methods guarantee convergence but have only a linear convergence rate.
Superlinear or higher order convergence methods can be constructed by using a quadratic model function $q(x)$ or cubic polynomials to approximate the objective function $f(x)$. For example, if we choose $q(x)$ to be the second order Taylor expansion at a point $x_k$,

$$q(x) = f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2} f''(x_k)(x - x_k)^2,$$

then when $f''(x_k) > 0$, it suggests a step move $\delta = -f'(x_k)/f''(x_k)$ to reach the minimum of $q(x)$. This is just the Newton step in the equivalent root finding problem $f'(x) = 0$. The same limitations and advantages of the root-finding Newton method apply in the 1-d minimization context as well. Hence, it suggests a safeguarded approach similar to what we discussed in root finding problems.
The classic Brent method is one such safeguarded algorithm that uses a quadratic form to predict the minimum point. This minimum point is safeguarded by the bracketing interval and an interval reduction strategy. It does careful bookkeeping on recent data points and uses only the function values of these points to fit the quadratic form. This method has a superlinear convergence rate.
6.2 Safeguarded Quadratic Approximation
The MiniSolver1D class in NPL combines a quadratic polynomial approximation $q(x)$ together with a Golden section method in a safeguarded strategy to solve the 1-d minimization problem (17). The minimization solver can be customized to use either only $f(x)$ or both $f(x)$ and $f'(x)$. Users can also set a control of max step length to avoid the step move being too large at each iteration.
The solver has an initial bracketing stage and a subsequent safeguarded searching stage. Three parameters are given to the solver as input: an initial guess $x_0$ of the minimum and a guess $[a, b]$ of the bracketing interval. Regardless of whether $f'(x)$ is used, it is possible that we are able to move in a monotonically decreasing sequence all the way down to the boundary point $l$ or $u$. In this situation, the boundary point is returned as the solution by the end of the bracketing stage.
6.2.1 Bracketing Mode
If the initial guess parameters do not establish a bracketing interval, the solver will enter bracketing mode to find such an interval. One point is marked at the initial guess $x_0$; a second point is taken as one of the initial end points such that moving from the first point toward it is a descent direction. We then compute the step length using the quadratic approximation function $q(x)$. This is like the line search method for multivariate optimization, except that we are now testing bracketing conditions instead of Wolfe conditions.
At each iteration, if $f'(x)$ is disabled, $q(x)$ is fit to match three function values, one of which is at a Golden section extension point. Otherwise, $q(x)$ is fit to function values and derivatives at the most recent points. If $q(x)$ is convex, its minimum is tested with the other points for bracketing conditions. When a user control disables the $f'(x)$ calculation, the bracketing condition involves three points $x_a < x_b < x_c$ such that $f(x_b) < f(x_a)$ and $f(x_b) < f(x_c)$. If $f'(x)$ is available, the bracketing condition implies that the derivatives at the two end points have different signs. Whenever an extension point is needed, we use the Golden section method.
6.2.2 Safeguarded Mode
Once a bracketing interval is identified, the algorithm switches to a safeguarded mode that searches strictly inside the interval. Again, $q(x)$ is fit to three points with function values, or to two points with derivatives, depending on whether $f'(x)$ is available. If the minimum of $q(x)$ is located outside of the interval, or does not reduce the interval as much as a Golden section search would, the Golden section point is used to replace one of the interval end points. This process repeats until the bracketing interval end points converge. The idea of safeguarded polynomial search can be found, for example, in [2], section 4.1.2.4.
6.3 Use of MiniSolver1D Class
Users need to supply an objective function exactly the same way as in the root-finding case in section 5.3. By default, the MiniSolver1D solver uses the derivative $f'(x)$. This behavior can be explicitly enabled or disabled by calling the solver class member function void enable_derivative(bool use_df). Users can also put a control on the max step length allowed per iteration by calling the void set_max_step(double step) function.
The following example demonstrates a use case of finding the minimum of a smooth function.
6.4 Termination Control
The MiniSolver1D solver class is an iterative method derived from the base class IterativeMethod. It inherits the termination controls implemented by the base class. In particular, the IterativeMethod::set_converge_tolerance() function defines a convergence tolerance $\epsilon$. At each iteration in the safeguarded search stage, if the bracketing interval $[a, b]$ satisfies $b - a \le \epsilon$, the algorithm is considered to have converged. In practice, we don’t recommend setting $\epsilon < \sqrt{\epsilon_m}$, where $\epsilon_m$ is the machine precision. A convergence tolerance smaller than $\sqrt{\epsilon_m}$, on the order of $10^{-8}$ in double precision, generally won’t reduce function values further, hence just wastes CPU time.
References
- [1] Richard L. Burden, Douglas J. Faires and Annette M. Burden: Numerical Analysis, 10th edition, Cengage Learning, 2016
- [2] P. E. Gill, W. Murray and M. H. Wright: Practical Optimization, Emerald, first edition 2007
- [3] Trevor Hastie, Robert Tibshirani and Jerome Friedman: The Elements of Statistical Learning, Springer, 2001
- [4] P. M. Prenter: Splines and Variational Methods, Dover Publications, 2008
- [5] Carl de Boor: A Practical Guide to Splines, Revised Edition, Applied Mathematical Sciences Volume 27, Springer, 2001
- [6] John P. Boyd: Chebyshev and Fourier Spectral Methods, second edition (revised), Dover Publications, 2001
- [7] C. Canuto, M. Y. Hussaini, A. Quarteroni and T. A. Zang: Spectral Methods, Springer, 2006