Data and Model Fitting
Mathwrist White Paper Series
Abstract
This document presents a technical overview of data and model fitting features implemented in Mathwrist’s C++ Numerical Programming Library (NPL).
1 Linear Least Squares
Linear regression assumes an $n$-dimensional linear model parameterized by a coefficient vector $x \in \mathbb{R}^n$. Given $m$ observations $(a_i, b_i)$, $i = 1, \dots, m$, we can fit the linear regression model by a least squares fit,

\[ \min_x \; \|Ax - b\|_2^2. \tag{1} \]

The $i$-th element of vector $b \in \mathbb{R}^m$ and the $i$-th row of matrix $A \in \mathbb{R}^{m \times n}$ are the observation $b_i$ and $a_i^T$ respectively.

The solution of the linear least squares problem (1) can be obtained by solving the linear system $A^T A x = A^T b$. Matrix $A^T A$ is at least positive semi-definite. If it has rank deficiency or is ill-conditioned, a better way of calibrating the model parameter $x$ is to add a regularization term and solve,

\[ \min_x \; \|Ax - b\|_2^2 + \lambda\, x^T B x, \tag{2} \]

where $\lambda > 0$ is a penalty factor and the regularization matrix $B$ is positive definite. If $B$ is the identity matrix, (2) becomes the standard ridge regression. Therefore problem formulation (2) is sometimes also called generalized ridge regression, which has the closed-form solution:

\[ x^* = \left( A^T A + \lambda B \right)^{-1} A^T b. \tag{3} \]
Mathwrist NPL provides a function linear_fit::grr() to solve this generalized ridge regression problem (2).
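As an illustration of the closed-form solution (3) only, the following sketch computes the generalized ridge regression coefficients with Eigen. It is not NPL's implementation or API; the function name grr_closed_form is hypothetical.

```cpp
#include <Eigen/Dense>

// Minimal sketch of the closed form (3): x* = (A'A + lambda B)^{-1} A'b.
Eigen::VectorXd grr_closed_form(const Eigen::MatrixXd& A,
                                const Eigen::VectorXd& b,
                                const Eigen::MatrixXd& B,
                                double lambda)
{
  // For lambda > 0 and B positive definite, A'A + lambda B is symmetric
  // positive definite, so a Cholesky-type (LDL') solve is applicable.
  const Eigen::MatrixXd lhs = A.transpose() * A + lambda * B;
  return lhs.ldlt().solve(A.transpose() * b);
}
```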
1.1 Generalized Cross Validation
Based on the linear regression model assumption, we can write $b = Ax + \epsilon$, where $\epsilon$ is the residual (noise) vector. In generalized ridge regression (2), the penalty factor $\lambda$ could be an experimental choice. However, it is often desirable to automatically choose the "optimal" penalty based on the noise level of $\epsilon$.

Generalized Cross Validation (GCV) is a very successful technique to compute $\lambda$ based on the assumption that $\epsilon$ follows a Gaussian distribution. In GCV, the optimal penalty factor $\lambda^*$ is the solution of the following 1-dimensional minimization problem,

\[ \min_\lambda \; \mathrm{GCV}(\lambda) = \frac{m \left\| \left( I - H(\lambda) \right) b \right\|_2^2}{\left( \operatorname{tr}\left( I - H(\lambda) \right) \right)^2}, \tag{4} \]

where $H(\lambda)$ is the unique symmetric influence matrix,

\[ H(\lambda) = A \left( A^T A + \lambda B \right)^{-1} A^T . \]
When automatic penalty selection is requested from function linear_fit::grr(), we use GCV to compute $\lambda^*$ and then solve the generalized ridge regression (2).
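To make the GCV criterion (4) concrete, here is a minimal sketch that evaluates the GCV score for one trial penalty; the dense influence matrix is formed explicitly for clarity, which a production implementation would avoid. This is illustrative Eigen code, not the code path behind linear_fit::grr(); a 1-dimensional minimizer over $\lambda$ would then locate $\lambda^*$.

```cpp
#include <Eigen/Dense>

// Evaluate the GCV objective (4) for a single trial penalty lambda.
double gcv_score(const Eigen::MatrixXd& A, const Eigen::VectorXd& b,
                 const Eigen::MatrixXd& B, double lambda)
{
  const double m = static_cast<double>(A.rows());
  // Influence matrix H(lambda) = A (A'A + lambda B)^{-1} A'.
  const Eigen::MatrixXd H =
      A * (A.transpose() * A + lambda * B).ldlt().solve(A.transpose());
  const double rss = (b - H * b).squaredNorm();  // ||(I - H) b||^2
  const double tr = m - H.trace();               // tr(I - H)
  return m * rss / (tr * tr);                    // GCV(lambda)
}
```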
1.2 Linear Constraints
The linear model parameter $x$ may be subject to general linear constraints and simple bounds. Mathwrist NPL provides a function linear_fit::lsq() to solve the following linearly constrained linear least squares problem,

\[
\begin{aligned}
\min_x \;& \|Ax - b\|_2^2 && (5)\\
\text{s.t. } \;& l_c \le Cx \le u_c, && (6)\\
& l_x \le x \le u_x, && (7)
\end{aligned}
\]

which effectively is a convex quadratic programming (QP) problem.
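Expanding the squared norm and dropping the constant term $b^T b$ shows the QP structure explicitly:

\[
\min_x \; \tfrac{1}{2}\, x^T \left( A^T A \right) x - \left( A^T b \right)^T x
\quad \text{s.t. } \; l_c \le Cx \le u_c, \;\; l_x \le x \le u_x,
\]

where the QP Hessian $A^T A$ is positive semi-definite, hence the problem is convex.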
2 Nonlinear Least Squares
Given a nonlinear mapping function $f(u; x)$ with model parameter $x \in \mathbb{R}^n$ and $m$ observations $(u_i, b_i)$, $i = 1, \dots, m$, the nonlinear least squares fit recovers model parameters by minimizing the norm of the residual vector $r(x)$,

\[ \min_x \; F(x) = \frac{1}{2} \left\| r(x) \right\|_2^2, \tag{8} \]

where the $i$-th element $r_i(x) = f(u_i; x) - b_i$. Let $J(x)$ be the Jacobian matrix of the residual vector $r(x)$. The gradient and Hessian of $F(x)$ are

\[
\begin{aligned}
\nabla F(x) &= J(x)^T r(x) = \sum_{i=1}^m r_i(x)\, \nabla r_i(x), && (9)\\
\nabla^2 F(x) &= J(x)^T J(x) + \sum_{i=1}^m r_i(x)\, \nabla^2 r_i(x) && (10)\\
&= J(x)^T J(x) + Q(x). && (11)
\end{aligned}
\]

In other words, the gradient of the nonlinear least squares objective function in (8) is the residual-weighted sum of the gradients of the individual residuals. The Hessian of $F(x)$ has a first-order term $J(x)^T J(x)$, which is at least positive semi-definite, and a second-order term $Q(x)$, which is a residual-weighted sum of the Hessians of the individual residual functions.
NPL provides a modified Gauss-Newton method and a Levenberg-Marquardt method to solve the unconstrained nonlinear least squares problem (8). The modified Gauss-Newton method uses a line search based algorithm with additional care for the second-order Hessian term $Q(x)$. Our implementation of the Levenberg-Marquardt method is a special case of the general trust region based algorithm.
2.1 Modified Gauss-Newton
The classic Gauss-Newton method approximates the true Hessian matrix in equation (11) by $J(x)^T J(x)$. It uses a line search algorithm and iteratively computes a Newton search direction $p_k$ at each step,

\[ J(x_k)^T J(x_k)\, p_k = -J(x_k)^T r(x_k). \]

This Hessian approximation has quite a few limitations. First, if the Jacobian matrix $J(x_k)$ doesn't have full rank, it produces unstable model calibration. Second, if the residual is naturally large or non-negligible at certain points of the model calibration process, ignoring the second-order term $Q(x)$ in equation (11) produces an incorrect search direction $p_k$.
To address this, we implemented a modified Gauss-Newton method that follows a similar idea as [2], chapter 4. Effectively, at the $k$-th iteration in the line search, we recompute the SVD decomposition $J_k = U S V^T$. It can be shown that the Newton direction $p_k = V q_k$ wrt the true Hessian solves

\[ \left( S^T S + V^T Q_k V \right) q_k = -S\, U^T r_k, \tag{12} \]

where $Q_k = Q(x_k)$. Let $S_r$ be the leading $r \times r$ submatrix of the $r$ dominant singular values in $S$. Accordingly, let $V_r$ be the first $r$ columns of $V$, in other words the principal components. We test whether the contribution of $Q_k$ is small enough relative to the smallest singular values in $S_r$. If so, we can ignore the second-order term from the true Hessian and just compute a direction spanned by the principal components. Let $q_r$ be the first $r$ elements of $q_k$. Equation (12) reduces to,

\[ S_r^2\, q_r = -S_r\, U_r^T r_k, \qquad p_k = V_r\, q_r. \]

If $Q_k$ cannot be ignored, we compute a finite difference approximation $\tilde{Q}_k$ of it and solve the direction in the full space of $V$, $p_k = V q_k$. Equation (12) becomes,

\[ \left( S^T S + V^T \tilde{Q}_k V \right) q_k = -S\, U^T r_k. \tag{13} \]

The second-order term $V^T \tilde{Q}_k V$ could be indefinite. We use a modified Cholesky factorization to solve equation (13). This is similar to the modified Newton method documented in our unconstrained optimization white paper.
Our modified Gauss-Newton method terminates whenever any of the following conditions is satisfied in a line search iteration (a minimal sketch of such an iteration loop follows the list),
1. the residual satisfies $\|r(x_k)\|_2 \le \epsilon_r$, where $\epsilon_r$ is the residual tolerance;

2. the max number of iterations is reached;

3. the gradient in equation (9) satisfies $\|J_k^T r_k\|_2 \le \epsilon_c$, where $\epsilon_c$ is the convergence tolerance.
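The sketch below shows a bare-bones classic Gauss-Newton loop with the three termination tests above, to make the iteration concrete. It deliberately omits the SVD-based modifications and the line search of NPL's solver; the callback types and the function name are hypothetical.

```cpp
#include <Eigen/Dense>
#include <functional>

using Residual = std::function<Eigen::VectorXd(const Eigen::VectorXd&)>;
using Jacobian = std::function<Eigen::MatrixXd(const Eigen::VectorXd&)>;

Eigen::VectorXd gauss_newton(const Residual& r_fn, const Jacobian& j_fn,
                             Eigen::VectorXd x, double eps_r, double eps_c,
                             int max_iter)
{
  for (int k = 0; k < max_iter; ++k)          // condition 2: iteration cap
  {
    const Eigen::VectorXd r = r_fn(x);
    if (r.norm() <= eps_r) break;             // condition 1: small residual
    const Eigen::MatrixXd J = j_fn(x);
    if ((J.transpose() * r).norm() <= eps_c)  // condition 3: small gradient
      break;
    // Gauss-Newton direction: least squares solution of J p = -r,
    // equivalent to J'J p = -J'r when J has full column rank.
    const Eigen::VectorXd p = J.colPivHouseholderQr().solve(-r);
    x += p;  // full step for brevity; NPL's method performs a line search
  }
  return x;
}
```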
2.2 Levenberg-Marquardt
The Levenberg-Marquardt (LM) method is a special case of the trust region algorithm that uses $J(x)^T J(x)$ to approximate the true Hessian (11). At the $k$-th trust region iteration, we solve a subproblem

\[ \min_p \; \frac{1}{2} \left\| r_k + J_k p \right\|_2^2 \quad \text{s.t. } \|p\|_2 \le \Delta_k, \tag{14} \]

where $\Delta_k$ is the trust region radius. $p^*$ is a solution of the trust region subproblem (14) if and only if there exists a scalar $\lambda \ge 0$ such that
1. $J_k^T J_k + \lambda I$ is positive semi-definite;

2. $\left( J_k^T J_k + \lambda I \right) p^* = -J_k^T r_k$; and

3. $\lambda \left( \Delta_k - \|p^*\|_2 \right) = 0$.
The first condition is automatically satisfied in the case of the LM method. Write $p(\lambda)$ as a function of $\lambda$ computed from the second condition,

\[ p(\lambda) = -\left( J_k^T J_k + \lambda I \right)^{-1} J_k^T r_k. \tag{15} \]

If $\|p(0)\|_2 \le \Delta_k$, $p(0)$ is an exact solution of the trust region subproblem (14). Otherwise we can always find a $\lambda > 0$ such that $\|p(\lambda)\|_2 = \Delta_k$. The main cost of root finding is to repeatedly solve equation (15). Let $J_k$ have a QR decomposition $J_k = QR$. Based on the idea in [1], section 10.2, we can economically obtain an upper triangular $R_\lambda$ from $R$ such that

\[ R_\lambda^T R_\lambda = R^T R + \lambda I. \]

Also, as suggested in [1] section 4.2, we perform root finding on the more numerically stable formulation

\[ \phi(\lambda) = \frac{1}{\Delta_k} - \frac{1}{\left\| p(\lambda) \right\|_2} = 0. \]

The LM method terminates whenever any of the following conditions is satisfied after solving a trust region subproblem (14) (a sketch of the $\lambda$ root finding follows the list),
1. the residual satisfies $\|r(x_k)\|_2 \le \epsilon_r$, where $\epsilon_r$ is the residual tolerance;

2. the max number of iterations is reached;

3. the gradient in equation (9) satisfies $\|J_k^T r_k\|_2 \le \epsilon_c$, where $\epsilon_c$ is the convergence tolerance.
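The following sketch illustrates the safeguarded Newton update on $\phi(\lambda)$ from [1], section 4.2, using a dense Cholesky factorization per trial $\lambda$ for clarity; NPL's implementation works with the $R_\lambda$ factor described above instead. The function name, signature and iteration cap are hypothetical.

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>

// Find lambda such that ||p(lambda)|| is (approximately) equal to delta,
// via Newton iteration on phi(lambda) = 1/delta - 1/||p(lambda)||.
double lm_lambda(const Eigen::MatrixXd& J, const Eigen::VectorXd& r,
                 double delta, double lambda0)
{
  const Eigen::MatrixXd JtJ = J.transpose() * J;
  const Eigen::VectorXd g = J.transpose() * r;
  const Eigen::MatrixXd I =
      Eigen::MatrixXd::Identity(JtJ.rows(), JtJ.cols());
  double lambda = lambda0;
  for (int i = 0; i < 20; ++i)
  {
    const Eigen::LLT<Eigen::MatrixXd> llt(JtJ + lambda * I);
    const Eigen::VectorXd p = llt.solve(-g);          // p(lambda) in (15)
    const double pn = p.norm();
    if (std::abs(pn - delta) <= 1e-8 * delta) break;  // ||p|| matches delta
    // q = L^{-1} p with L L' = J'J + lambda I; the Newton step on
    // phi(lambda) then takes the closed form below.
    const Eigen::VectorXd q = llt.matrixL().solve(p);
    lambda += (pn * pn / q.squaredNorm()) * (pn - delta) / delta;
    lambda = std::max(lambda, 0.0);  // safeguard: lambda must stay >= 0
  }
  return lambda;
}
```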
2.3 Regularization and Constraints
In practice, it is often desirable to regularize the model parameters and perhaps impose additional constraints. Mathwrist NPL offers a general nonlinear least squares fit method that recovers model parameters from the following optimization problem,

\[
\begin{aligned}
\min_x \;& \|r(x)\|_2^2 + \lambda\, x^T B x && (16)\\
\text{s.t. } \;& l_c \le Cx \le u_c, && (17)\\
& l_x \le x \le u_x, && (18)
\end{aligned}
\]

where again $r_i(x) = f(u_i; x) - b_i$ and the second term $\lambda\, x^T B x$ is a regularization term. $B$ is at least positive semi-definite and typically chosen to make the model "smooth", or close to a certain desired level. The penalty factor $\lambda$ could be a user input or automatically computed by the Generalized Cross Validation (GCV) method.
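Writing the objective (16) as $F(x) = \tfrac{1}{2}\|r(x)\|_2^2 + \tfrac{1}{2}\lambda\, x^T B x$ (the $\tfrac{1}{2}$ scaling does not change the minimizer) and assuming a symmetric $B$, the regularization term only shifts the derivatives (9)-(11) by a linear and a constant term respectively:

\[
\nabla F(x) = J(x)^T r(x) + \lambda B x,
\qquad
\nabla^2 F(x) = J(x)^T J(x) + Q(x) + \lambda B,
\]

so the same modified Newton machinery applies, with $\lambda B$ improving the conditioning of the Hessian.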
If the constraint specification in (17)-(18) is missing, our general nonlinear least squares fit uses the line search modified Newton method to solve an unconstrained optimization problem. Otherwise, it solves a linearly constrained optimization problem using a combination of the line search modified Newton method and the active-set method. The algorithm terminates whenever any of the following conditions is satisfied at a major iteration,
1. the residual satisfies $\|r(x_k)\|_2 \le \epsilon_r$, where $\epsilon_r$ is the residual tolerance;

2. the max number of iterations is reached;

3. both the stationarity and convergence conditions are satisfied with tolerance $\epsilon_c$, where $\epsilon_c$ is the convergence tolerance.
3 Curve Fitting
Curve fitting is a 1-dimensional function approximation problem. Given a set of data points $(x_i, y_i)$, $i = 1, \dots, m$, observed from an unknown function $y = f(x)$ for $x \in [a, b]$, we want to approximate $f(x)$ by a smooth curve $\hat{f}(x)$ that is parameterized by a coefficient vector $\theta \in \mathbb{R}^n$.

Let $\phi(x) = \left( \phi_1(x), \dots, \phi_n(x) \right)^T$ be a vector of basis functions of a certain form. We can then write the curve approximation function as a linear combination of the basis vector and coefficient vector $\theta$,

\[ \hat{f}(x) = \phi(x)^T \theta . \]

In order to recover the model parameter $\theta$ from the observed data points $(x_i, y_i)$, we offer two curve fitting methods. The first approach is a regularized least squares fit. The second approach minimizes some curve roughness measure subject to fitting error constraints.
3.1 Formulation
In the first approach, we construct a basis matrix $A \in \mathbb{R}^{m \times n}$ where the $(i, j)$-th element of the matrix is $A_{ij} = \phi_j(x_i)$. The sum of squares of residuals (SSR) is

\[ \left\| A\theta - y \right\|_2^2 . \]

We recover the curve model parameter $\theta$ from data with a desired "smoothness" property by solving a regularized least squares problem,

\[ \min_\theta \; \|A\theta - y\|_2^2 + \lambda\, \theta^T G\, \theta. \tag{19} \]

The second term in formulation (19) is a regularization term that penalizes the roughness of the curve $\hat{f}(x)$. The roughness matrix $G$ needs to be at least positive semi-definite. The penalty factor $\lambda$ could be a user input or automatically computed by the GCV method.

Alternatively, we can formulate the smooth curve fitting problem as minimizing the roughness of the curve subject to the constraints that the fitting errors are under a certain tolerance level,

\[
\begin{aligned}
\min_\theta \;& \theta^T G\, \theta && (20)\\
\text{s.t. } \;& -\epsilon_i \le \phi(x_i)^T \theta - y_i \le \epsilon_i, \quad i = 1, \dots, m. && (21)
\end{aligned}
\]
3.2 Choices of Basis
We provide two choices of basis functions $\phi(x)$, namely B-splines and Chebyshev polynomials of the first kind. B-spline basis functions are piecewise polynomials of the same degree but different "locality". Chebyshev polynomials are orthogonal polynomials of different degrees but on the same global canonical domain $[-1, 1]$. Internally, we map the physical domain $[a, b]$ to the canonical domain $[-1, 1]$. These two basis choices are represented by the C++ classes SmoothSplineCurve and SmoothChebyshevCurve respectively.

Because B-spline basis functions are overlapped piecewise polynomials, the placement of knot points has a significant impact on the result of fitting. In general, it is desirable to have data points uniformly scattered across all pieces. If the data implies dramatic curvature changes, adding an extra connecting piece could help smooth the transition. Based on our experience, B-splines can be vulnerable to noise in the data. Careful regularization, perhaps with additional shape constraints, is often needed.

Chebyshev polynomials have many nice theoretical properties. Most relevant to data fitting, the Chebyshev series is very close to the ideal "minimax" approximation polynomial. In other words, Chebyshev polynomials are somewhat naturally "smooth" and numerically more stable. Each Chebyshev basis function is a "global" polynomial over $[-1, 1]$. We just need to determine the number of basis functions $n$ to be used in $\phi(x)$, where $n - 1$ is the highest degree of the Chebyshev polynomial series. Increasing $n$ of course increases the freedom of fitting. Practically, a very large degree, i.e. greater than 7 or 8, is often unnecessary.
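For illustration, the Chebyshev basis vector can be evaluated by the standard three-term recurrence $T_{k+1}(t) = 2t\, T_k(t) - T_{k-1}(t)$ after mapping the physical domain onto $[-1, 1]$. This is a generic sketch, independent of the SmoothChebyshevCurve class; the function name is hypothetical.

```cpp
#include <vector>

// Evaluate phi(x) = (T_0(t), ..., T_{n-1}(t)) at a physical point x in
// [a, b], where t is the image of x on the canonical domain [-1, 1].
std::vector<double> chebyshev_basis(double x, double a, double b, int n)
{
  const double t = 2.0 * (x - a) / (b - a) - 1.0;  // map [a,b] -> [-1,1]
  std::vector<double> phi(n);
  if (n > 0) phi[0] = 1.0;  // T_0(t) = 1
  if (n > 1) phi[1] = t;    // T_1(t) = t
  for (int k = 2; k < n; ++k)
    phi[k] = 2.0 * t * phi[k - 1] - phi[k - 2];  // three-term recurrence
  return phi;
}
```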
3.3 Roughness Measure
The regularization term $\theta^T G\, \theta$ in formulation (19) and the objective function in formulation (20) are to make the curve function "smoother". If one chooses $G$ to be the identity matrix, the roughness measure reduces to the norm of the basis coefficients $\|\theta\|_2^2$, which tends to produce a flat curve closer and closer to $0$ as we increase the penalty factor $\lambda$.
3.3.1 Derivative Based
The classic definition, i.e. chapter XIV in [4], of the roughness measure of a curve function $\hat{f}(x)$ over $[a, b]$ is

\[ \int_a^b \left( \hat{f}''(x) \right)^2 dx . \]

Let $\hat{f}^{(k)}$ denote the $k$-th order derivative of $\hat{f}$. NPL offers 4 levels of derivative-based roughness matrix construction,

\[ \int_a^b \left( \hat{f}^{(k)}(x) \right)^2 dx, \quad k = 0, 1, 2, 3. \]

Once users make the choice of $k$, we internally carry out the calculations to write the roughness measure in the form $\theta^T G\, \theta$.
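The calculation follows directly from the basis expansion $\hat{f}(x) = \phi(x)^T \theta$:

\[
\int_a^b \left( \hat{f}^{(k)}(x) \right)^2 dx
= \theta^T \left( \int_a^b \phi^{(k)}(x)\, \phi^{(k)}(x)^T\, dx \right) \theta
= \theta^T G\, \theta,
\qquad
G_{ij} = \int_a^b \phi_i^{(k)}(x)\, \phi_j^{(k)}(x)\, dx .
\]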
3.3.2 Divided Difference Based
The classic roughness measure favors small magnitudes of curvature, but does not have a preference over the sign change of curvature. For this reason, we offer another regularization choice that penalizes the divided differences of derivatives,

\[ \sum_i \left( \frac{\hat{f}^{(k)}(t_{i+1}) - \hat{f}^{(k)}(t_i)}{t_{i+1} - t_i} \right)^2, \]

where the index $i$ traverses the knot points $t_i$ for the B-spline basis and predefined points for the Chebyshev basis. Again, users only need to choose the level of $k$. We internally compute $G$.
3.4 Roughness Leverage
It is possible to partition $[a, b]$ into sub-intervals and apply different roughness weights in different sub-intervals, i.e.

\[ [a, b] = \bigcup_j \left[ a_j, a_{j+1} \right], \qquad \text{with weight } w_j \text{ on } \left[ a_j, a_{j+1} \right]. \]

The total roughness measure then is a weighted sum of local roughness. The sub-interval weights $w_j$ play the role of micro leverage factors. Users can set the leverage factors through a piecewise constant function $w(x)$.
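With $G_j$ denoting the roughness matrix assembled over the $j$-th sub-interval, the weighted total remains a single quadratic form in $\theta$:

\[
\sum_j w_j\, \theta^T G_j\, \theta = \theta^T \Big( \sum_j w_j\, G_j \Big)\, \theta .
\]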
3.5 Shape Constraints
In both formulations (19) and (20)-(21), we may impose additional shape constraints at selected points $t_j$,

\[ l_j \le \hat{f}^{(k)}(t_j) \le u_j, \quad k \in \{0, 1, 2\}. \tag{22} \]

The shape constraints (22) effectively restrict the function value ($k = 0$), slope ($k = 1$) or curvature ($k = 2$) to be bounded within a certain range. For example, we could impose $\hat{f}''(a) = 0$ and $\hat{f}''(b) = 0$ constraints to fit a curve that satisfies natural boundary conditions.
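Because $\hat{f}$ is linear in $\theta$, each shape constraint in (22) is simply a linear constraint on the coefficients,

\[ l_j \le \phi^{(k)}(t_j)^T\, \theta \le u_j, \]

and therefore fits directly into the linearly constrained QP and NLP machinery above.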
3.6 Nonlinear Model Fit
Here, we consider a mathematical model whose inner function is to be approximated by a smooth curve. The dependent variable $y$ is connected to the independent variable $x$ by a known nonlinear mapping function $g$ through the model, i.e. $y = g\left(x, \hat{f}(x)\right)$.

Let $r(\theta)$ be the vector of residual functions, where the $i$-th element $r_i(\theta) = g\left(x_i, \phi(x_i)^T \theta\right) - y_i$ is the residual error for observation $i$. The formulation (19) changes to the following regularized nonlinear least squares problem,

\[ \min_\theta \; \|r(\theta)\|_2^2 + \lambda\, \theta^T G\, \theta. \tag{23} \]

And formulation (20)-(21) now becomes a nonlinear programming problem,

\[
\begin{aligned}
\min_\theta \;& \theta^T G\, \theta && (24)\\
\text{s.t. } \;& -\epsilon_i \le r_i(\theta) \le \epsilon_i, \quad i = 1, \dots, m. && (25)
\end{aligned}
\]

Again, additional curve shape constraints (22) may be imposed on both formulations (23) and (24)-(25). These two model calibration methods internally use our linearly constrained (LC) optimization solver and nonlinear programming (NLP) solver respectively. Please refer to our LC and NLP white papers for further details.
4 Surface Fitting
Given $m$ data points $(x_i, y_i, z_i)$, $i = 1, \dots, m$, which are observed from an unknown 2-dimensional function $z = f(x, y)$, we want to approximate $f$ by a smooth surface function written as the tensor product of two sets of basis functions $\phi(x)$ and $\psi(y)$,

\[ \hat{f}(x, y) = \phi(x)^T C\, \psi(y) = \sum_j \sum_k C_{jk}\, \phi_j(x)\, \psi_k(y), \tag{26} \]

where $C$ is the coefficient matrix of the tensor product. The objective of surface fitting is to recover $C$ from the observed data points.
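Evaluating the tensor product (26) is a small amount of linear algebra, as the sketch below shows with Eigen. The basis callbacks are placeholders for either the B-spline or the Chebyshev basis; this illustrates the math only, not an NPL class interface.

```cpp
#include <Eigen/Dense>
#include <functional>

// Basis callback: returns the basis vector phi(x) (or psi(y)).
using Basis = std::function<Eigen::VectorXd(double)>;

double surface_value(const Basis& phi, const Basis& psi,
                     const Eigen::MatrixXd& C, double x, double y)
{
  // Tensor product (26): f(x, y) = phi(x)' C psi(y).
  // phi(x) must have C.rows() elements and psi(y) C.cols() elements.
  return phi(x).dot(C * psi(y));
}
```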
In analogue to curve fitting, we offer two surface fitting methods in Mathwrist NPL. The first approach once again is a regularized least squares fit,

\[ \min_C \; \sum_{i=1}^m \left( \hat{f}(x_i, y_i) - z_i \right)^2 + \lambda\, R(C), \tag{27} \]

where $R(C)$ is some choice of roughness regularization term and $\lambda$ is the roughness penalty factor, either from user input or computed by the GCV method. In the alternative approach, we directly minimize the roughness measure subject to bounded fitting error constraints,

\[
\begin{aligned}
\min_C \;& R(C) && (28)\\
\text{s.t. } \;& -\epsilon_i \le \hat{f}(x_i, y_i) - z_i \le \epsilon_i, \quad i = 1, \dots, m. && (29)
\end{aligned}
\]
4.1 Choice of Basis
The basis functions $\phi(x)$ and $\psi(y)$ are of the same type, either B-splines or Chebyshev polynomials of the first kind. These two basis choices are represented by two different C++ surface class types, SmoothSplineSurface and SmoothChebyshevSurface respectively. As a special case of the B-spline basis, when we choose $\phi(x)$ and $\psi(y)$ to be piecewise linear functions, the function $\hat{f}(x, y)$ is a bi-linear surface. This special case deserves separate treatment. It is represented by a separate surface C++ class SmoothBilinearSurface for more efficient calculation.
4.2 Roughness Measure
We offer 3 different surface roughness measures: Frobenius norm, Dirichlet energy and thin-plate energy. The Frobenius norm $\|C\|_F^2$ is equivalent to the $\|\theta\|_2^2$ norm in the curve fitting situation.

Let $\Omega$ be the domain of the surface. Dirichlet energy is defined as the square integral of the gradient norm,

\[ \int_\Omega \left\| \nabla \hat{f}(x, y) \right\|_2^2 dx\, dy = \int_\Omega \left( \hat{f}_x^2 + \hat{f}_y^2 \right) dx\, dy . \]

The standard "thin-plate energy" is a rotation-invariant measure defined as the following,

\[ \int_\Omega \left( \hat{f}_{xx}^2 + 2 \hat{f}_{xy}^2 + \hat{f}_{yy}^2 \right) dx\, dy . \]

Thin-plate energy roughness is not applicable to the bilinear surface.
4.3 Roughness Leverage
Users can partition the whole surface domain $\Omega$ into sub-areas $\Omega_j$, $j = 1, \dots, J$, and supply a 2-d piecewise constant function $w(x, y)$ to surface fitting.

Let $w_j$ be the weight computed from $w(x, y)$ for area $\Omega_j$. We then compute the total roughness measure as the weighted sum of the roughness over those sub surface areas,

\[ \sum_j w_j\, R_j(C), \]

where $R_j(C)$ is the chosen roughness measure restricted to $\Omega_j$.
4.4 Shape Constraints

In analogue to the curve shape constraints (22), shape constraints may be imposed at selected surface points $(x_j, y_j)$ to bound the surface value or its partial derivatives,

\[ l_j \le D\, \hat{f}(x_j, y_j) \le u_j, \tag{30} \]

where $D$ denotes the identity or a partial derivative operator.
4.5 Nonlinear Model Fit
Similar to nonlinear curve model fitting, here we have a known 2-d nonlinear mapping function $g$ that connects the independent variables $(x, y)$ to the function value $z$ through a model $z = g\left(x, y, f(x, y)\right)$. It is the model's inner function $f(x, y)$ that is approximated by a smooth surface, i.e. $\hat{f}(x, y) = \phi(x)^T C\, \psi(y)$.

Let $r(C)$ be the vector of residual functions, where the $i$-th element $r_i(C) = g\left(x_i, y_i, \hat{f}(x_i, y_i)\right) - z_i$ is the residual error for observation $i$. Users can choose to calibrate the surface model parameter $C$ either by a regularized nonlinear least squares fit,

\[ \min_C \; \|r(C)\|_2^2 + \lambda\, R(C), \tag{31} \]

or by solving a nonlinear programming problem,

\[
\begin{aligned}
\min_C \;& R(C) && (32)\\
\text{s.t. } \;& -\epsilon_i \le r_i(C) \le \epsilon_i, \quad i = 1, \dots, m. && (33)
\end{aligned}
\]
References
- [1] Jorge Nocedal and Stephen J. Wright: Numerical Optimization, Springer, 1999
- [2] Philip E. Gill, Walter Murray and Margaret H. Wright: Practical Optimization, Academic Press, 1981
- [3] Trevor Hastie, Robert Tibshirani and Jerome Friedman: The Elements of Statistical Learning, Springer, 2001
- [4] Carl de Boor: A Practical Guide to Splines, Revised Edition, Applied Mathematical Sciences Volume 27, Springer, 2001
- [5] John P. Boyd: Chebyshev and Fourier Spectral Methods, second edition (revised), Dover Publications, 2001