Numerical Linear Algebra
Mathwrist White Paper Series
Abstract
This document presents a technical overview of the numerical linear algebra (LA) functionality implemented in Mathwrist’s C++ Numerical Programming Library (NPL). The LA portion of NPL is a mixture of in-house implementations and wrappers of third party’s optimized Blas/Lapack functions in an object-oriented approach, i.e. as member functions of the Matrix class. For the latter, we explicitly list those functions in the relevant sections. For the in-house implementation part, we give more implementation details.
1 Matrix, View and Iterator
1.1 Matrix
One fundamental building block of NPL is the Matrix class. We don’t introduce a separate data type for vectors. Instead, they are just 1-row or 1-column instances of the Matrix class. Matrix row and column indices start from 0.
A matrix object is always associated with some underlying memory storage. The actual memory allocation scheme is handled by the library as implementation details. From a user perspective, it is sufficient to know that dynamic memory is allocated to store matrix elements. When a matrix object goes out of scope, its memory is freed automatically.
The standard use case of creating a matrix is by calling the constructor Matrix::Matrix(m, n), which allocates an m×n-element buffer on the heap. In this case, matrix elements are contiguous double values placed in row-major order. Internally, the library could use different memory representation schemes for special matrices, but only the standard use case is exposed at the API level. Most of the linear algebra functionality is implemented as member functions of the Matrix class in matrix.h.
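The row-major layout can be sketched in a few lines of self-contained C++ (a stand-in for illustration, not the NPL Matrix class): element (i, j) of an m×n matrix lives at flat offset i·n + j of one contiguous buffer.

```cpp
#include <cstddef>
#include <vector>

// Minimal row-major storage sketch: element (i, j) of an m-by-n
// matrix is stored at flat offset i * n + j in one contiguous buffer.
struct DenseRowMajor {
    std::size_t m, n;
    std::vector<double> data;  // contiguous, size m * n

    DenseRowMajor(std::size_t m_, std::size_t n_)
        : m(m_), n(n_), data(m_ * n_, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i * n + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * n + j]; }
};
```

When a DenseRowMajor object goes out of scope, the std::vector member frees the buffer, mirroring the automatic cleanup described above.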
1.2 Matrix Views
Very often, we need to work on a part or a transformed representation of an original matrix. For efficiency purposes, we introduce the concept of matrix views, which can be conveniently used just like regular matrix objects. A matrix view shares the same data storage as the original matrix. Hence it is important to make sure the original matrix object is alive while its views are in use.
There are two types of matrix views, Matrix::View and Matrix::ViewConst, both nested classes of the Matrix class. From a constant matrix object, one can only obtain a view of Matrix::ViewConst type, which ensures the content of the original matrix cannot be modified through the view. On the other hand, a non-constant matrix can be modified through its views. The following code example illustrates this.
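The sharing semantics can be sketched with minimal stand-in classes (illustrative only, not the actual NPL types): a view stores a pointer into the parent matrix’s buffer, so a write through a non-constant view is visible in the original, while the view obtained from a const matrix permits reads only.

```cpp
#include <cstddef>
#include <vector>

// Stand-in classes sketching the view semantics described above.
// These are NOT the NPL types; they only illustrate shared storage.
struct Mat {
    std::size_t m, n;
    std::vector<double> data;
    Mat(std::size_t m_, std::size_t n_) : m(m_), n(n_), data(m_ * n_, 0.0) {}

    // A mutable view on one row: shares the parent matrix's storage.
    struct RowView {
        double* p;  // points into the parent matrix's buffer
        std::size_t n;
        double& operator[](std::size_t j) { return p[j]; }
    };
    // A read-only view: the only kind obtainable from a const matrix.
    struct RowViewConst {
        const double* p;
        std::size_t n;
        double operator[](std::size_t j) const { return p[j]; }
    };

    RowView row(std::size_t i) { return RowView{data.data() + i * n, n}; }
    RowViewConst row(std::size_t i) const {
        return RowViewConst{data.data() + i * n, n};
    }
};
```

Because the views hold raw pointers, they must not outlive the parent Mat, which is exactly the lifetime caveat stated above.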
Attention needs to be paid when we copy matrix and view objects using the assignment operator. The behavior of this assignment operation depends on the actual types of its left hand side (LHS) and right hand side (RHS) parameters.
1. LHS is a matrix: This matrix object is first resized to the same size as the RHS object and then an element-wise copy (deep copy) is performed.

2. LHS is a non-constant view:

   (a) RHS is a matrix: Because a view has a predefined size, an element-wise copy happens if both sides have the same size. Otherwise, an exception is thrown.

   (b) RHS is also a non-constant view: No deep copy is involved. LHS becomes the same view as the RHS view object.

   (c) RHS is a constant view: RHS is implicitly converted to a constant matrix object, then case 2(a) applies.

3. LHS is a constant view: RHS can only be a constant view as well. In this case, both LHS and RHS become the same view on the original matrix. All other RHS types cause a compilation error.
The following code example reflects the behavior of view assignment operations described above.
Matrix views can be passed as arguments anywhere a matrix object is expected. We can also call a matrix member function through a view object using the operator-> notation.
1.3 Matrix Expression
A special type of matrix view is the matrix expression, introduced as the following nested classes of Matrix that are all derived from Matrix::View.
- Matrix::Negation, a unary expression;
- Matrix::Addition, a binary expression;
- Matrix::Substraction, a binary expression;
- Matrix::Multiplication, a binary expression.
These expressions are designed with efficiency considerations in mind so as to provide some syntactic sugar for programmers. The C++ language ensures arithmetic operator precedence when multiple operators appear in one single expression. We can write cleaner code using matrix expressions, as illustrated by the code example below:
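The underlying idea can be sketched with stand-in types (not the NPL expression classes): each overloaded operator returns a temporary that owns its result, so compound expressions such as a + b - c compose under ordinary C++ operator precedence.

```cpp
#include <cstddef>
#include <vector>

// Sketch (not the NPL classes): each arithmetic operator returns a
// temporary that owns the storage for its result, so compound
// expressions such as a + b - c compose naturally.
struct Vec {
    std::vector<double> v;
};

inline Vec operator+(const Vec& a, const Vec& b) {
    Vec r{a.v};  // the temporary holds its own result buffer
    for (std::size_t i = 0; i < r.v.size(); ++i) r.v[i] += b.v[i];
    return r;
}

inline Vec operator-(const Vec& a, const Vec& b) {
    Vec r{a.v};
    for (std::size_t i = 0; i < r.v.size(); ++i) r.v[i] -= b.v[i];
    return r;
}
```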
Matrix expressions are special in the sense that each expression object has its own memory allocated to store the elements of the expression result. A regular view, on the contrary, just shares the same data storage as its original matrix source.
1.4 Permutations
Matrix permutation reflects a sequence of row interchanges or column interchanges on a target matrix. Conceptually, we can think of a permutation itself as a square matrix where there is exactly one element equal to 1 in each row and column. All other elements are 0. When this permutation matrix left or right multiplies a regular matrix A, row or column interchanges are effectively applied to A.
We represent matrix permutation as a nested class Matrix::Permutation in matrix.h. It can left multiply or right multiply a regular matrix object to generate a permuted view of the original matrix. The permuted view can then be used conveniently as a regular matrix.
Multiplication can also happen between two permutation objects. The end result is a new permutation object that reflects the accumulated row/column interchanges. Let P₁ and P₂ denote two row-interchange permutations respectively. P₂P₁A first performs the row interchanges of P₁ on matrix A and then applies the row interchanges of P₂ on the result of the first permutation. To perform the same sequence of interchanges on columns, we use the right multiplication AP₁P₂.
Finally, when a permutation is applied to a constant matrix, it results in a Matrix::ViewConst object, a constant view of the matrix, as in the code example below.
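The accumulation of interchanges described above can be sketched with plain index arrays (a stand-in for Matrix::Permutation, which this sketch does not use): a permutation maps destination row i to source row p[i], and composing two permutations just chains the lookups.

```cpp
#include <cstddef>
#include <vector>

// Row i of the permuted result comes from row p[i] of the source.
using Perm = std::vector<std::size_t>;

// Apply a row permutation to a vector of "rows" (one value per row here).
std::vector<double> apply(const Perm& p, const std::vector<double>& rows) {
    std::vector<double> out(rows.size());
    for (std::size_t i = 0; i < p.size(); ++i) out[i] = rows[p[i]];
    return out;
}

// Composition: applying p first, then q, equals applying compose(q, p) once.
Perm compose(const Perm& q, const Perm& p) {
    Perm r(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) r[i] = p[q[i]];
    return r;
}
```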
1.5 Iterators
The elements of a matrix can be accessed via the Matrix::operator() member function. Each such access is a separate function call, which could cause a lot of overhead if an application needs frequent access to matrix elements. In most cases though, we will be working with matrix objects that have elements contiguous in memory, i.e. directly on an original matrix object, or on a non-permuted view of the matrix. In these typical use cases, we can use matrix iterators for more efficient element access.

Matrix iterators are designed by the following mechanism inside the definition of the Matrix class.

Essentially, iterator support is a shared pointer to dense matrix traits that implement iterators by pointer arithmetic. When we obtain an iterator support from a non-permuted view or from a matrix directly, this shared pointer is not null. From that point on, we can simply use iterators to access matrix elements. The matrix iterator implementation consists of very light inline functions, which are more efficient than the Matrix::operator() member access function.

From a non-constant matrix we can obtain an iterator_support that allows us to both read and write matrix elements. From a constant matrix, we obtain a const_iterator_support that supports read access only. The code example below demonstrates how to use iterators.
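The pointer-arithmetic idea can be sketched in plain C++ (not the actual iterator_support interface): when elements are contiguous, a begin/end pointer pair walks the whole matrix without per-element function call overhead.

```cpp
#include <cstddef>
#include <vector>

// Sketch of pointer-based iteration over a contiguous m-by-n buffer,
// in the spirit of the iterator support described above (not the NPL API).
double sum_elements(const std::vector<double>& data) {
    const double* it = data.data();
    const double* end = it + data.size();
    double s = 0.0;
    for (; it != end; ++it)  // light pointer arithmetic, easily inlined
        s += *it;
    return s;
}
```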
2 Multiplication, Inverse and Factorization
Table 1 lists the matrix multiplication, inverse and factorization methods supported in NPL. They are all member functions of the Matrix class. Each of them takes matrix references as output arguments. On function return, these arguments hold the result. The remark column in this table indicates whether it is wrapping a third party’s optimized Blas/Lapack implementation or an in-house implementation. For the former, the corresponding Blas/Lapack routine name is given. For in-house implementations, we give further details in separate subsections.
Type | Function name | Remark |
---|---|---|
Multiplication | Matrix::multiply() | 3rd party, dgemm() |
Inverse | Matrix::inv() | 3rd party, dsytrf(), dpotrf(), dgetrf() |
Eigen | Matrix::eigen() | 3rd party, dsyev(), dgeev() |
SVD | Matrix::svd() | 3rd party, dgesvd() |
LU | Matrix::lu() | 3rd party, dgetrf() |
QR | Matrix::qr() | 3rd party, dgeqrf() |
Full QR | Matrix::full_qr() | In-house |
Cholesky | Matrix::cholesky() | In-house |
Reduced Cholesky | Matrix::cholesky_reduced() | In-house |
Modified Cholesky | Matrix::cholesky_modified() | In-house |
Partial Cholesky | Matrix::cholesky_partial() | In-house |
2.1 Inverse
Let A be the matrix object whose inverse A⁻¹ we want to compute. The Matrix::inv() function takes an optional Matrix::Form parameter that helps choose what algorithm to use, based on the form of A. There are three special forms that are relevant in this context.
2.1.1 Singular Form
First of all, if A is singular, A⁻¹ doesn’t exist. If A is near singular, computing A⁻¹ in alternative ways could introduce too much numerical noise. In this situation, passing the “SINGULAR” matrix form parameter tells the function to use the SVD method underneath.
Let A = UΣVᵀ be the SVD of A. Because U and V are unitary, we have

A⁻¹ = VΣ⁻¹Uᵀ.

The elements in Σ are non-negative and arranged in descending order. Let σₖ be the last element of Σ whose condition number σ₁/σₖ does not exceed the internal threshold. We truncate to keep the first k columns of U, the first k rows of Vᵀ and the first k diagonal elements of Σ to obtain sub-matrices Uₖ, Vₖᵀ and Σₖ respectively. Finally we compute

A⁻¹ ≈ VₖΣₖ⁻¹Uₖᵀ.
2.1.2 Symmetric Form
When A is symmetric, we can set the Matrix::Form parameter to “SYMMETRIC”. This indicates to use the LDLᵀ factorization A = LDLᵀ and compute A⁻¹ = L⁻ᵀD⁻¹L⁻¹.

The LDLᵀ factorization has a floating point operation cost of roughly n³/3.
2.1.3 Positive Definite
When A is positive definite, we can set the Matrix::Form parameter to “POS_DEFINITE”. This indicates to use the Cholesky factorization A = LLᵀ, which also has a floating point operation cost of roughly n³/3 but avoids the subsequent minor cost of the floating point divisions by D needed to compute A⁻¹ in the symmetric form case.
2.1.4 General Forms
This is the default case where A is a general non-singular matrix. The inverse function uses the LU factorization A = LU, which has a floating point operation cost of roughly 2n³/3.
2.2 Eigen
Given a matrix A, its eigen decomposition is computed in two phases. The first phase, a reduction to Hessenberg (or, in the symmetric case, tridiagonal) form, involves on the order of O(n³) floating point operations. The second phase generally requires O(n³) floating point operations in order to converge to machine precision. However, when A is symmetric, the second phase can be computed much faster, even in the order of O(n²) if only eigenvalues are required. Details of the eigen algorithms can be found in part V of the book [1] by Lloyd N. Trefethen and David Bau. Our Matrix::eigen() function uses the Matrix::Form parameter to distinguish the symmetric case from the general case.
2.3 SVD
Given a general m×n matrix A, the svd() function computes a compact form of the singular value decomposition A = UΣVᵀ. Let k = min(m, n). U is resized to m×k, Σ is resized to k×k and V is resized to n×k. The singular values in Σ are non-negative and arranged in descending order.
2.4 QR vs. Full QR
Given an m×n matrix A, let k = min(m, n). The QR decomposition computes a reduced-size m×k orthonormal basis matrix Q and a k×n upper triangular matrix R such that A = QR. In this reduced version, when m > n, Q consists of only the first n orthonormal basis vectors in columns. The full QR decomposition on the other hand computes a full m×m basis expansion matrix Q using the Householder reflector algorithm.

The reduced-size QR and the full QR have the same behavior on matrix R. When m ≥ n, R is an n×n upper triangular matrix. When m < n, R is an m×n upper trapezoidal matrix.
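The Householder reflector algorithm mentioned above can be sketched compactly against flat row-major buffers (a textbook variant for m ≥ n, not the NPL Matrix::full_qr() implementation): each step builds a reflector that zeros one column below the diagonal and accumulates it into the full m×m basis Q.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook Householder QR sketch (flat row-major storage): returns the
// full m-by-m orthogonal Q and the m-by-n R such that A = Q * R.
struct QRResult { std::vector<double> Q, R; };

QRResult householder_qr(std::vector<double> A, std::size_t m, std::size_t n) {
    std::vector<double> Q(m * m, 0.0);
    for (std::size_t i = 0; i < m; ++i) Q[i * m + i] = 1.0;  // Q = I

    for (std::size_t k = 0; k < n && k + 1 < m; ++k) {
        // Build the Householder vector v for column k.
        std::vector<double> v(m - k);
        double norm = 0.0;
        for (std::size_t i = k; i < m; ++i) {
            v[i - k] = A[i * n + k];
            norm += v[i - k] * v[i - k];
        }
        norm = std::sqrt(norm);
        if (norm == 0.0) continue;
        v[0] += (v[0] >= 0.0 ? norm : -norm);  // sign choice avoids cancellation
        double vv = 0.0;
        for (double x : v) vv += x * x;

        // Apply H = I - 2 v vᵀ / (vᵀ v) to the remaining columns of A.
        for (std::size_t j = k; j < n; ++j) {
            double dot = 0.0;
            for (std::size_t i = k; i < m; ++i) dot += v[i - k] * A[i * n + j];
            double c = 2.0 * dot / vv;
            for (std::size_t i = k; i < m; ++i) A[i * n + j] -= c * v[i - k];
        }
        // Accumulate the full basis: Q := Q * H.
        for (std::size_t i = 0; i < m; ++i) {
            double dot = 0.0;
            for (std::size_t j = k; j < m; ++j) dot += Q[i * m + j] * v[j - k];
            double c = 2.0 * dot / vv;
            for (std::size_t j = k; j < m; ++j) Q[i * m + j] -= c * v[j - k];
        }
    }
    return QRResult{Q, A};  // A now holds R, upper triangular below row n
}
```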
2.5 Variety of Cholesky
Our Cholesky factorization method is based on Algorithm 23.1 outlined in the book [1] by Lloyd N. Trefethen and David Bau. In addition, we also implemented a few other Cholesky decomposition methods of different flavors. They are used internally in our numerical optimization algorithms and could be relevant to other practical problems.
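As a rough illustration of the factorization, here is a textbook unblocked variant in plain C++ (not necessarily identical to the NPL implementation): it computes the lower triangular L with A = LLᵀ and returns false if a non-positive pivot reveals that A is not positive definite.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unblocked Cholesky sketch: factor a symmetric positive definite
// n-by-n matrix A (row-major) as A = L * Lᵀ, L lower triangular.
// Returns false if a non-positive pivot is encountered.
bool cholesky(const std::vector<double>& A, std::size_t n,
              std::vector<double>& L) {
    L.assign(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j * n + j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j * n + k] * L[j * n + k];
        if (d <= 0.0) return false;  // A is not positive definite
        L[j * n + j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i * n + j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i * n + k] * L[j * n + k];
            L[i * n + j] = s / L[j * n + j];
        }
    }
    return true;
}
```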
2.5.1 Reduced Cholesky
Very often in practice, we face the problem of computing the Cholesky factor of a reduced matrix ZᵀHZ, where H is an n×n matrix and Z is an n×k matrix, k < n. So ZᵀHZ has reduced size k×k, compared to H.

If we know H is positive definite, Matrix::cholesky_reduced() provides a more efficient factorization than first computing the matrix multiplications ZᵀHZ and then performing the standard Cholesky decomposition.

This reduced Cholesky factorization method stops and returns false as soon as it finds that H is actually indefinite. Note that when H is indefinite, the reduced matrix ZᵀHZ could still possibly be positive definite. It is just that Matrix::cholesky_reduced() only succeeds when H is positive definite.
2.5.2 Modified Cholesky
When a matrix A is indefinite, the Matrix::cholesky_modified() function finds a remedy diagonal matrix E of minimized norm such that the modified matrix A + E becomes positive definite. This function is based on the algorithm in the book by Gill, Murray and Wright [3], p. 108.
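For intuition only, the sketch below uses a much cruder strategy than the Gill-Murray-Wright algorithm the library implements: it repeatedly adds a uniform shift τ to the diagonal until a Cholesky attempt succeeds. E = τI is then a valid, though generally not norm-minimal, remedy matrix.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cholesky attempt on a copy of the matrix; true if A is positive definite.
bool is_pos_def(std::vector<double> a, std::size_t n) {
    for (std::size_t j = 0; j < n; ++j) {
        double d = a[j * n + j];
        for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
        if (d <= 0.0) return false;
        a[j * n + j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
            a[i * n + j] = s / a[j * n + j];
        }
    }
    return true;
}

// Naive diagonal-shift stand-in (NOT the Gill-Murray-Wright algorithm):
// grow a uniform shift tau until A + tau*I is positive definite.
double diagonal_shift(std::vector<double> A, std::size_t n) {
    double tau = 0.0;
    while (!is_pos_def(A, n)) {
        double bump = (tau == 0.0) ? 1e-3 : tau;  // geometric growth
        for (std::size_t i = 0; i < n; ++i) A[i * n + i] += bump;
        tau += bump;
    }
    return tau;
}
```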
2.5.3 Partial Cholesky
For a general square matrix A, we can explore the positive definiteness structure of the matrix by pivoting as much as possible and then compute the Cholesky factor of its leading sub-matrix. Matrix::cholesky_partial() is an implementation of the partial Cholesky algorithm originally proposed in the paper by A. Forsgren, P. E. Gill, and W. Murray [4].
Let P be the permutation matrix introduced by the pivoting operation in the algorithm. The permuted matrix PAPᵀ has a block structure and can be decomposed as

PAPᵀ = [ L 0 ; B I ] [ D 0 ; 0 S ] [ Lᵀ Bᵀ ; 0 I ],

where L is a unit lower triangular matrix and D is a positive definite diagonal matrix such that the leading sub-matrix of PAPᵀ has Cholesky factor LD^(1/2). The block S is the remaining Schur complement, which may be indefinite.
3 Linear System
Table 2 lists the available methodologies to solve a linear system Ax = b, depending on both the size and the matrix form of A. Please refer to the matrix inverse section 2.1 for details of the Matrix::Form parameter.
Once again, the remark column in the table indicates whether it is an in-house implementation or a wrapper of a third party’s Lapack implementation. For the latter, the Lapack routine name is provided.
Form of A | Methodology | Remark |
---|---|---|
m < n | QR | 3rd party, dgelsy() |
A is singular or m > n | SVD | 3rd party, dgelsd() |
A is symmetric | LDLᵀ | 3rd party, dsysv() |
A is positive definite | Cholesky | 3rd party, dposv() |
A is a general matrix | LU | 3rd party, dgesv() |
A is upper triangular | backward substitution | In-house |
A is lower triangular | forward substitution | In-house |
A is tridiagonal | backward/forward elimination | In-house |
A is large positive definite | conjugate gradient update | In-house |
m < n and the QR factorization of Aᵀ is known | range space solver | In-house |
A is the result of a column pivoting | LU update | In-house |
General guidelines and details of using the appropriate methodology are given in the following subsections.
3.1 m < n
When m < n, the linear system is underdetermined and has an infinite number of solutions. The QR factorization method effectively produces a minimum norm solution of Ax = b. We first compute the full QR factorization Aᵀ = QR with Q = [Qᵣ Qₙ], where Qᵣ spans the range space of Aᵀ and Qₙ spans its null space.

Since Q is an orthonormal basis, we can write the solution vector as x = Qᵣzᵣ + Qₙzₙ in terms of a range space component zᵣ and a null space component zₙ. Clearly,

Ax = Rᵀzᵣ = b,

where zᵣ can be found uniquely by forward substitution on the equation Rᵀzᵣ = b.

Different choices of zₙ produce different solutions x. In practice, we let zₙ = 0 to obtain the minimum norm solution x = Qᵣzᵣ. If the QR factorization of Aᵀ is known, we can directly use the range-space solver LinearSystem::range_space_solve().
3.2 Singular A or m > n
When m > n, the linear system is overdetermined and generally doesn’t have an exact solution. When A is singular, the linear system either doesn’t have an exact solution or has infinitely many solutions. The solver effectively computes the least squares solution of min‖Ax − b‖₂ using SVD.
3.3 Symmetric
For any symmetric matrix A, there exists a factorization A = LDLᵀ, where L is a unit lower triangular matrix and D is a diagonal matrix. Further, if A is positive definite, all elements in D are positive and A has the Cholesky factor LD^(1/2).
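The factorization can be sketched as follows (an unpivoted textbook variant; Lapack’s dsysv actually uses a pivoted, block-diagonal formulation, so this sketch is illustrative only):

```cpp
#include <cstddef>
#include <vector>

// Unpivoted LDLᵀ sketch: A = L * D * Lᵀ with L unit lower triangular
// and D diagonal, for a symmetric n-by-n matrix A stored row-major.
// Returns false on a zero pivot (which would require pivoting).
bool ldlt(const std::vector<double>& A, std::size_t n,
          std::vector<double>& L, std::vector<double>& D) {
    L.assign(n * n, 0.0);
    D.assign(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        L[j * n + j] = 1.0;  // unit diagonal
        double d = A[j * n + j];
        for (std::size_t k = 0; k < j; ++k)
            d -= L[j * n + k] * L[j * n + k] * D[k];
        if (d == 0.0) return false;  // zero pivot: pivoting needed
        D[j] = d;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i * n + j];
            for (std::size_t k = 0; k < j; ++k)
                s -= L[i * n + k] * L[j * n + k] * D[k];
            L[i * n + j] = s / d;
        }
    }
    return true;
}
```

Unlike Cholesky, no square roots are taken, and D may carry negative entries when A is indefinite.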
3.4 Large Positive Definite
When A is a very large positive definite matrix, computing the exact solution of Ax = b could be very time consuming. Instead, one might be satisfied with a “near” solution x̃ such that the residual ‖Ax̃ − b‖ is small enough.

In this situation, we can use the conjugate gradient (CG) update method to generate a sequence of improving guesses xₖ. For some initial guess x₀, we compute the residual vector r₀ = b − Ax₀. The initial search direction is taken as p₀ = r₀. We always pass (xₖ, rₖ, pₖ) as input at the k-th iteration. Subsequently, the CG method updates xₖ₊₁ as an improved “guess” with a smaller residual rₖ₊₁. It also updates the search direction pₖ₊₁ that should be taken at the next iteration. Users can stop the iterations when ‖rₖ‖ is considered small enough. In the worst case, the CG method ensures the exact solution can be found after n iterations. Our implementation is based on [2], chapter 5.
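The update scheme described above can be sketched as a plain, self-contained CG loop (not the NPL solver interface; it folds the per-iteration updates of xₖ, rₖ and pₖ into one function):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Plain CG sketch for A x = b with A symmetric positive definite
// (dense row-major n-by-n). Iterates until the residual norm falls
// below tol, or at most n iterations (exact in exact arithmetic).
std::vector<double> conjugate_gradient(const std::vector<double>& A,
                                       std::size_t n,
                                       const std::vector<double>& b,
                                       double tol) {
    auto matvec = [&](const std::vector<double>& v) {
        std::vector<double> y(n, 0.0);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) y[i] += A[i * n + j] * v[j];
        return y;
    };
    auto dot = [](const std::vector<double>& u, const std::vector<double>& v) {
        double s = 0.0;
        for (std::size_t i = 0; i < u.size(); ++i) s += u[i] * v[i];
        return s;
    };

    std::vector<double> x(n, 0.0);  // initial guess x0 = 0
    std::vector<double> r = b;      // r0 = b - A x0 = b
    std::vector<double> p = r;      // initial search direction p0 = r0
    double rr = dot(r, r);
    for (std::size_t k = 0; k < n && std::sqrt(rr) > tol; ++k) {
        std::vector<double> Ap = matvec(p);
        double alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];   // improved guess x_{k+1}
            r[i] -= alpha * Ap[i];  // smaller residual r_{k+1}
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;  // next search direction p_{k+1}
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}
```

In practice the matrix-vector product would exploit sparsity or structure; the dense loop here is only for clarity.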
3.5 LU Update of A
Suppose initially we know an LU decomposition A = LU of matrix A. Subsequently, a matrix Ā is obtained by replacing the j-th column of A with a vector a. In this situation, the LU decomposition of Ā can be efficiently computed by updating L and U. LinearSystem::LU provides a linear system solver to solve Āx = b. Our implementation is based on [5].
References
- [1] Lloyd N. Trefethen and David Bau, III: Numerical Linear Algebra, SIAM, 1997
- [2] Jorge Nocedal and Stephen J. Wright: Numerical Optimization, Springer, 1999
- [3] Philip E. Gill, Walter Murray and Margaret H. Wright: Practical Optimization, Academic Press, 1981
- [4] Anders Forsgren, Philip E. Gill and Walter Murray: Computing Modified Newton Directions Using a Partial Cholesky Factorization, SIAM J. Sci. Comput., Vol. 16, No. 1, pp. 139-150
- [5] J. J. Forrest and J. A. Tomlin: Updated Triangular Factors of the Basis to Maintain Sparsity in the Product Form Simplex Method, Mathematical Programming 2 (1972), 263-278