Numerical Linear Algebra
Mathwrist White Paper Series

Copyright ©Mathwrist LLC 2023
(January 20, 2023)
Abstract

This document presents a technical overview of the numerical linear algebra (LA) functionality implemented in Mathwrist’s C++ Numerical Programming Library (NPL). The LA portion of NPL is a mixture of in-house implementations and wrappers of third-party optimized Blas/Lapack functions, exposed in an object-oriented fashion, i.e. as member functions of the Matrix class. For the latter, we explicitly list those functions in the relevant sections. For the in-house implementations, we give more implementation details.

1 Matrix, View and Iterator

1.1 Matrix

One fundamental building block of NPL is the Matrix class. We don’t introduce a separate data type for vectors. Instead, vectors are simply 1-row or 1-column instances of the Matrix class. Matrix row and column indices start from 0.

A matrix object is always associated with some underlying memory storage. The actual memory allocation scheme is handled by the library as an implementation detail. From a user’s perspective, it is sufficient to know that dynamic memory is allocated to store matrix elements. When a matrix object goes out of scope, its memory is freed automatically.

The standard use case of creating an m×n matrix is to call the constructor Matrix::Matrix(m,n), which allocates an (m×n)-element buffer on the heap. In this case, matrix elements are contiguous double values stored in row-major order. Internally, the library may use different memory representation schemes for special matrices, but only the standard use case is exposed at the API level. Most of the linear algebra functionality is implemented as member functions of the Matrix class in matrix.h.
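
As a quick illustration of this standard use case, the small sketch below creates a matrix and a 1-row “vector” and accesses elements with operator(); it relies only on the constructor and element access described above.

// A 2 x 3 matrix; elements start as zeros.
Matrix a(2, 3);

// 0-based element access through operator().
a(0, 0) = 1.0;
a(1, 2) = -2.5;

// A "vector" is just a 1-row (or 1-column) matrix.
Matrix x(1, 3);
x(0, 1) = 4.0;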

1.2 Matrix Views

Very often, we need to work on a part of a matrix, or on a transformed representation of it. For efficiency, we introduce the concept of matrix views, which can be used conveniently just like regular matrix objects. A matrix view shares the same data storage as the original matrix. Hence it is important to make sure the original matrix object stays alive while its views are in use.

There are two types of matrix views, Matrix::View and Matrix::ViewConst, both nested classes of the Matrix class. From a constant matrix object, one can only obtain a view of type Matrix::ViewConst, which ensures that the content of the original matrix cannot be modified through the view. A non-constant matrix, on the other hand, can be modified through its views. The following code example illustrates this.

Listing 1: Constant vs Non-constant Views
// create a 5 x 5 matrix of zero elements
Matrix m(5,5);

// Obtain a 3 x 3 leading sub matrix view
Matrix::View v = m.sub(0,0,3,3);

// Changing the value through the view also
// changes the content of the original matrix.
v(1,1) = 0.5;

// Now the (1,1) element is 0.5.
std::cout << m(1,1) << std::endl;

// Let's have a constant reference to matrix m.
const Matrix& mr = m;

// From a const matrix, we can only get a constant view.
Matrix::ViewConst vc = mr.row(0);

// This triggers a compilation error.
vc(0,1) = 0.5;

Attention needs to be paid when we copy matrix and view objects using the assignment operator. The behavior of the assignment depends on the actual types of its left hand side (LHS) and right hand side (RHS) operands.

  1. LHS is a matrix: The matrix object is first resized to the same size as the RHS object and then an element-wise copy (deep copy) is performed.

  2. LHS is a non-constant view:

     (a) RHS is a matrix: Because a view has a predefined size, an element-wise copy happens if both sides have the same size. Otherwise, an exception is thrown.

     (b) RHS is also a non-constant view: No deep copy is involved. LHS becomes the same view as the RHS view object.

     (c) RHS is a constant view: RHS is implicitly converted to a constant matrix object, then 2.(a) applies.

  3. LHS is a constant view: RHS can only be a constant view as well. In this case, both LHS and RHS become the same view on the original matrix. All other RHS types cause a compilation error.

The following code example reflects the behavior of view assignment operations described above.

Listing 2: View Assignment Operator
Matrix m(3, 3);
Matrix n(2, 2);

// m is a diagonal matrix with values = 0.5.
m.diagonalize(0.5);

// set n with some values.
n(0, 1) = 0.2;
n(1, 0) = 0.3;

// v is a leading sub matrix view of m.
Matrix::View v = m.sub(0, 0, 2, 2);

// RHS is a matrix, content of n is copied to m.
v = n;

// The leading 2 x 2 sub matrix of m has the same
// values as n now.
std::cout << m(0,0) << std::endl;
std::cout << m(0,1) << std::endl;
std::cout << m(1,0) << std::endl;
std::cout << m(1,1) << std::endl;

// u is the first row of n.
Matrix::View u = n.row(0);

// Now v is also the first row view of n.
v = u;

// Changing v changes n but not m.
v(0, 0) = 1;
std::cout << m(0,0) << std::endl; // value = 0
std::cout << n(0,0) << std::endl; // value = 1

// This should trigger an exception since the size
// of v is 1 x 2, different from the size of m.
bool got_error = false;
try
{
   v = m;
}
catch (...)
{
   got_error = true;
}
assert(got_error);

Matrix views can be passed as arguments anywhere a matrix object is expected. We can also call a matrix member function through a view object using the operator-> notation.

Listing 3: Implicit Conversion of Views
Matrix m(5,5);

// v is a view of m itself
Matrix::View v = m.self();

// same way to access elements
m(0,0) = 1.0;
assert(v(0,0) == 1.0);

// using -> notation to access matrix member functions.
assert(v->row_size() == 5);

Matrix b(2,5);
Matrix c;

// pass v to a function that takes matrix as arguments.
b.multiply(v, c);

1.3 Matrix Expression

A special type of matrix view is the matrix expression, introduced via the following nested classes of Matrix, all derived from Matrix::View.

  • Matrix::Negation, unary expression;

  • Matrix::Addition, binary expression;

  • Matrix::Substraction, binary expression;

  • Matrix::Multiplication, binary expression;

These expressions are designed with efficiency in mind and provide some syntactic sugar for programmers. The C++ language ensures the usual arithmetic precedence when multiple operators appear in a single expression. We can write cleaner code using matrix expressions, as illustrated by the code example below:

Listing 4: Matrix Expressions
// A single expression
Matrix a = -b * c + (d - e) * f;

// Equivalent sequence by individual function calls.
Matrix temp1 = b;
temp1.negate();
Matrix temp2;
temp1.multiply(c, temp2);
Matrix temp3 = d - e;
Matrix temp4;
temp3.multiply(f, temp4);
a = temp2 + temp4;

Matrix expressions are special in the sense that each expression object has its own memory allocated to store the elements of the expression result. A regular view, on the contrary, just shares the same data storage as its original source matrix.

1.4 Permutations

A matrix permutation reflects a sequence of row or column interchanges on a target matrix. Conceptually, we can think of the permutation itself as a square matrix with exactly one element equal to 1 in each row and each column; all other elements are 0. When this permutation matrix left or right multiplies a regular matrix M, row or column interchanges are effectively applied to M.

We represent matrix permutation as a nested class Matrix::Permutation in matrix.h. It can left multiply or right multiply a regular matrix object to generate a permuted view of the original matrix. The permuted view can then be used conveniently as a regular matrix.

Listing 5: Matrix Permutation
// Create a matrix
//  | 1 2 |
//  | 3 4 |
Matrix a(2,2);
a(0,0) = 1;
a(0,1) = 2;
a(1,0) = 3;
a(1,1) = 4;

// A 2 x 2 permutation matrix representing
// (0,1) row or column interchange.
Matrix::Permutation p(2, 0, 1);

// (p * a) generates the (0,1) row
// interchanged view of a.
Matrix::View v = p * a;

// b has content
//  | 3 4 |
//  | 1 2 |
Matrix b = v;

// Changing v also changes a in
// permuted position.
v(0,1) = 1;

// a becomes
//  | 1 2 |
//  | 3 1 |

// (a * p) generates the (0,1) column
// interchanged view of a.
v = a * p;

// c has content
//  | 2 1 |
//  | 1 3 |
Matrix c = v;

Multiplication can also happen between two permutation objects. The result is a new permutation object that reflects the accumulated row/column interchanges. Let 𝐏_{i,j} and 𝐏_{k,l} denote the (i,j) and (k,l) interchanges respectively. 𝐏_{k,l} × 𝐏_{i,j} × 𝐌 first performs the (i,j) row interchange on matrix 𝐌, followed by the (k,l) row interchange on the result of the first permutation. To perform the same sequence of interchanges on columns, we use right multiplication 𝐌 × 𝐏_{i,j} × 𝐏_{k,l}.

Listing 6: Permutated View
// 3 x 3 permutation matrix representing
// (0,1) row or column interchange.
Matrix::Permutation p01(3,0,1);

// 3 x 3 permutation matrix representing
// (1,2) row or column interchange.
Matrix::Permutation p12(3,1,2);

// A target 3 x 4 matrix.
Matrix m(3, 4);

// First perform (0,1) row interchange and
// then (1,2) row interchange.
Matrix::View v = p12 * p01 * m;

// A target 4 x 3 matrix.
Matrix n(4, 3);

// First perform (0,1) column interchange and
// then (1,2) column interchange.
Matrix::View u = n * p01 * p12;

Finally, when a permutation is applied to a constant matrix, it results in a Matrix::ViewConst object, a constant view of the matrix, as shown in the code example below.

Listing 7: Constant Permutated View
Matrix a(2,2);
Matrix::Permutation p(2, 0, 1);

// Hold a constant reference on a.
const Matrix& ac = a;

// Permutation on a constant matrix can
// only result in a constant view.
Matrix::ViewConst v = p * ac;

1.5 Iterators

The elements of a matrix can be accessed via the Matrix::operator() member function. Each such access is a separate function call, which could cause a lot of overhead if an application needs frequent access to matrix elements. In most cases, though, we work with matrix objects whose elements are contiguous in memory, i.e. directly on an original matrix object, or on a non-permuted view of the matrix. In these typical use cases, we can use matrix iterators for more efficient element access.

Matrix iterators are designed by the following mechanism inside the definition of the Matrix class.

Listing 8: Definition of Matrix Iterator
class Matrix
{
  template<typename _Element>
  struct _DenseTraits
  {
    typedef ftl::PointerJumpingIterator<_Element> row_iterator;
    typedef ftl::PointerJumpingIterator<_Element> col_iterator;
    ...
    row_iterator row_begin(size_t i) const;
    row_iterator row_end(size_t i) const;
    col_iterator col_begin(size_t i) const;
    col_iterator col_end(size_t i) const;
  };

public:
  typedef _DenseTraits<double> iterator_traits;
  typedef _DenseTraits<const double> const_iterator_traits;
  typedef std::shared_ptr<iterator_traits> iterator_support;
  typedef std::shared_ptr<const_iterator_traits>
    const_iterator_support;

  ...

  iterator_support it_support();
  const_iterator_support it_support() const;
};

Essentially, an iterator support is a shared pointer to dense matrix traits that implement iterators by pointer arithmetic. When we obtain an iterator support from a non-permuted view or from a matrix directly, this shared pointer is not null. From that point on, we can simply use iterators to access matrix elements. The matrix iterator implementation consists of very light inline functions, which are more efficient than the Matrix::operator() member access function.

From a non-constant matrix we obtain an iterator_support that allows us to both read and write matrix elements. From a constant matrix, we obtain a const_iterator_support that supports read access only. The code example below demonstrates how to use iterators.

Listing 9: Use Matrix Iterators
Matrix m(5,5);

// Elements of m are contiguous in memory.
// Hence an iterator support is available.
Matrix::iterator_support its = m.it_support();

// its should not be null.
assert(its);

// Change the element values of the first row.
Matrix::iterator_traits::row_iterator
  it = its->row_begin(0);
Matrix::iterator_traits::row_iterator
  it_end = its->row_end(0);

for (; it != it_end; ++it)
  (*it) = 0.5;

// A constant matrix has read-only iterator support.
const Matrix& mr = m;
Matrix::const_iterator_support itcs = mr.it_support();
assert(itcs);

Matrix::const_iterator_traits::col_iterator
  itc = itcs->col_begin(0);
std::cout << *itc << std::endl; // read-only

// A permuted view doesn't have iterator support,
// because its elements are no longer contiguous in memory.
Matrix::Permutation p(5, 0, 1);
Matrix::View v = p * m;

// Now the iterator support is a null pointer.
its = v->it_support();
assert(its.get() == 0);

2 Multiplication, Inverse and Factorization

Table 1 lists the matrix multiplication, inverse and factorization methods supported in NPL. They are all member functions of the Matrix class. Each of them takes matrix references as output arguments; on function return, these arguments hold the result. The remark column in this table indicates whether a method wraps a third-party optimized Blas/Lapack implementation or is an in-house implementation. For the former, the corresponding Blas/Lapack routine name is given. For the in-house implementations, we give further details in separate subsections.

Table 1: Matrix Multiplication/Factorization Functions
Type Function name Remark
Multiplication Matrix::multiply() 3rd party, dgemm()
Inverse Matrix::inv() 3rd party, dsytrf(), dpotrf(), dgetrf()
Eigen Matrix::eigen() 3rd party, dsyev(), dgeev()
SVD Matrix::svd() 3rd party, dgesvd()
LU Matrix::lu() 3rd party, dgetrf()
QR Matrix::qr() 3rd party, dgeqrf()
Full QR Matrix::full_qr() In-house
Cholesky Matrix::cholesky() In-house
Reduced Cholesky Matrix::cholesky_reduced() In-house
Modified Cholesky Matrix::cholesky_modified() In-house
Partial Cholesky Matrix::cholesky_partial() In-house

2.1 Inverse

Let 𝐀 be the n×n matrix whose inverse 𝐀^{-1} we want to compute. The Matrix::inv() function takes an optional Matrix::Form parameter that helps choose which algorithm to use, based on the form of 𝐀. There are three special forms that are relevant in this context.

2.1.1 Singular Form

First of all, if 𝐀 is singular, 𝐀^{-1} doesn’t exist. If 𝐀 is nearly singular, computing 𝐀^{-1} by other methods could introduce too much numerical noise. In this situation, passing the “SINGULAR” matrix form parameter tells the function to use the SVD method underneath.

Let 𝐔𝐒𝐕^T be the SVD of 𝐀. Because 𝐔 and 𝐕 are unitary, we have

\mathbf{A}^{-1} = \mathbf{V}\,\mathbf{S}^{-1}\,\mathbf{U}^T

The elements in 𝐒 are non-negative and arranged in descending order. Let S_i be the last element of 𝐒 with condition number greater than the internal threshold 1.e-12. We truncate to the first i columns of 𝐕, the first i rows of 𝐔^T and the first i diagonal elements of 𝐒 to obtain sub matrices 𝐕_i, 𝐔_i^T and 𝐒_i respectively. Finally we compute

\mathbf{A}^{-1} = \mathbf{V}_i\,\mathbf{S}_i^{-1}\,\mathbf{U}_i^T

2.1.2 Symmetric Form

When 𝐀 is symmetric, we can set the Matrix::Form parameter to “SYMMETRIC”. This indicates that an 𝐋𝐃𝐋^T factorization should be used to compute

\mathbf{A} = \mathbf{L}\mathbf{D}\mathbf{L}^T
\mathbf{A}^{-1} = \mathbf{L}^{-T}\mathbf{D}^{-1}\mathbf{L}^{-1}

The 𝐋𝐃𝐋^T factorization has a floating point operation cost of roughly n^3/3.

2.1.3 Positive Definite

When 𝐀 is positive definite, we can set the Matrix::Form parameter to “POS_DEFINITE”. This indicates that a Cholesky factorization should be used, which also has a floating point operation cost of roughly n^3/3 but avoids the subsequent minor cost of n floating point divisions to compute 𝐃^{-1} in the symmetric case.

2.1.4 General Forms

This is the default case where 𝐀 is a general non-singular matrix. The inverse function uses 𝐋𝐔 factorization, which has a floating point operation cost of roughly 2n^3/3.
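
As a usage sketch, the snippet below passes the form hint to Matrix::inv(). The argument order and the exact Matrix::Form enumerator spelling are assumptions made for illustration only; please consult matrix.h for the actual signature.

Matrix a(4, 4);
// ... fill a with a symmetric positive definite matrix ...

Matrix inv_a;

// Hypothetical call: output matrix plus an optional form hint
// (enumerator spelling assumed); see matrix.h for the real signature.
a.inv(inv_a, Matrix::Form::POS_DEFINITE);

// Default general form, LU based.
a.inv(inv_a);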

2.2 Eigen

Given an n×n matrix 𝐀, its eigen decomposition is computed in two phases. The first phase requires on the order of 𝒪(n^3) floating point operations. The second phase generally also requires 𝒪(n^3) floating point operations to converge to machine precision. However, when 𝐀 is symmetric, the second phase can be computed much faster, even in the order of 𝒪(n^2) if only eigenvalues are required. Details of the eigen algorithms can be found in Part V of the book [1] by Lloyd N. Trefethen and David Bau. Our Matrix::eigen() function uses the Matrix::Form parameter to distinguish the symmetric case from the general case.

2.3 SVD

Given a general m×n matrix 𝐀, the svd() function computes a compact form of the singular value decomposition 𝐀 = 𝐔𝐒𝐕^T. Let r = min(m,n). 𝐔 is resized to m×r, 𝐒 is resized to r×1 and 𝐕 is resized to n×r. The singular values in 𝐒 are non-negative and arranged in descending order.
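
A hypothetical call sketch follows; the (𝐔, 𝐒, 𝐕) argument order of svd() is an assumption here, only the output shapes are stated above.

Matrix a(6, 4);
// ... fill a ...

Matrix u, s, v;

// Assumed argument order (u, s, v); see matrix.h for the actual
// signature. With m = 6, n = 4 and r = min(m, n) = 4, the compact
// form gives: u is 6 x 4, s is 4 x 1, v is 4 x 4.
a.svd(u, s, v);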

2.4 QR vs. Full QR

Given an m×n matrix 𝐀, let r = min(m,n). The QR decomposition computes a reduced-size m×r orthonormal basis matrix 𝐐 and an r×n upper triangular matrix 𝐑 such that 𝐐𝐑 = 𝐀. In this reduced version, when m > n, 𝐐 consists of only the first n orthonormal basis vectors in its columns. The full QR decomposition, on the other hand, computes a full m×m basis expansion matrix 𝐐 using the Householder reflector algorithm.

The reduced-size QR and full QR have the same behavior on matrix 𝐑. When m > n, 𝐑 is an n×n upper triangular matrix. When m < n, 𝐑 is an m×n upper trapezoidal matrix.
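
The sketch below contrasts the two output shapes on a tall matrix; the (𝐐, 𝐑) argument order of qr() and full_qr() is assumed for illustration and should be checked against matrix.h.

// A tall 5 x 3 matrix, so m > n.
Matrix a(5, 3);
// ... fill a ...

Matrix q, r;

// Assumed argument order (q, r); see matrix.h.
a.qr(q, r);        // reduced: q is 5 x 3, r is 3 x 3
a.full_qr(q, r);   // full:    q is 5 x 5, r is 3 x 3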

2.5 Variety of Cholesky

Our Cholesky factorization method is based on Algorithm 23.1 in the book [1] by Lloyd N. Trefethen and David Bau. In addition, we have implemented a few other Cholesky decomposition methods of different flavors. They are used internally in our numerical optimization algorithms and could be relevant to other practical problems.

2.5.1 Reduced Cholesky

Very often in practice, we face the problem of computing the Cholesky factor of 𝐖^T𝐇𝐖, where 𝐇 is an n×n matrix and 𝐖 is an n×m matrix with m < n. So 𝐖^T𝐇𝐖 has reduced dimension (and hence rank) compared to 𝐇.

If we know 𝐇 is positive definite, Matrix::cholesky_reduced() provides a more efficient factorization than first computing the matrix products and then performing the standard Cholesky decomposition.

This reduced Cholesky factorization method stops and returns false as soon as it finds that 𝐇 is actually indefinite. Note that when 𝐇 is indefinite, the reduced matrix 𝐖^T𝐇𝐖 could still be positive definite; it is just that Matrix::cholesky_reduced() only succeeds when 𝐇 is positive definite.
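
A hypothetical call sketch, assuming cholesky_reduced() takes 𝐖 and an output factor and returns the success flag described above; the argument order is an assumption and should be verified against matrix.h.

Matrix h(5, 5);   // assumed symmetric positive definite
Matrix w(5, 2);
// ... fill h and w ...

Matrix l;

// Hypothetical argument order (w, l); see matrix.h.
// Returns false as soon as h is found to be indefinite.
bool ok = h.cholesky_reduced(w, l);
if (!ok)
{
  // h is indefinite; fall back to another factorization.
}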

2.5.2 Modified Cholesky

When a matrix 𝐇 is indefinite, the Matrix::cholesky_modified() function finds a remedy diagonal matrix 𝐄 with minimized norm ‖𝐄‖ such that the modified matrix 𝐇¯ = 𝐇 + 𝐄 becomes positive definite. This function is based on the algorithm in the book by Gill, Murray and Wright [3], p. 108.

2.5.3 Partial Cholesky

For a general square matrix 𝐇, we can explore the positive definiteness structure of the matrix by pivoting as much as possible and then compute the Cholesky factor of its leading sub matrix. Matrix::cholesky_partial() is an implementation of the partial Cholesky algorithm originally proposed in the paper by A. Forsgren, P. E. Gill, and W. Murray [4].

Let 𝐏 be the permutation matrix introduced by the pivoting operations in the algorithm. The permuted matrix has a block structure and can be decomposed as

\mathbf{P}^T \mathbf{H} \mathbf{P} =
\begin{pmatrix} \mathbf{H}_{11} & \mathbf{H}_{12} \\ \mathbf{H}_{21} & \mathbf{H}_{22} \end{pmatrix} =
\begin{pmatrix} \mathbf{L}_{11} & \\ \mathbf{L}_{21} & \mathbf{I} \end{pmatrix}
\begin{pmatrix} \mathbf{D} & \\ & \mathbf{K} \end{pmatrix}
\begin{pmatrix} \mathbf{L}_{11}^T & \mathbf{L}_{21}^T \\ & \mathbf{I} \end{pmatrix}

where 𝐋_{11} is a unit lower triangular matrix and 𝐃 is a positive definite diagonal matrix, such that the leading sub matrix 𝐇_{11} has Cholesky factor

\mathbf{H}_{11} = \mathbf{L}_{11}\,\mathbf{D}\,\mathbf{L}_{11}^T

3 Linear System

Table 2 lists the available methods to solve a linear system 𝐀𝐱 = 𝐛, depending on both the (m,n) size and the matrix form of 𝐀. Please refer to the matrix inverse section 2.1 for details of the Matrix::Form parameter.

Once again, the remark column in the table indicates whether a method is an in-house implementation or wraps a third-party Lapack implementation. For the latter, the Lapack routine name is provided.

Table 2: Linear System Solvers
Form of 𝐀 Methodology Remark
m<n QR 3rd party, dgelsy()
𝐀 is singular or m>n SVD 3rd party, dgelsd()
𝐀 is symmetric 𝐋𝐃𝐋T 3rd party, dsysv()
𝐀 is positive definite Cholesky 3rd party, dposv()
𝐀 is a general matrix LU 3rd party, dgesv()
𝐀 is upper triangular backward substitution In-house
𝐀 is lower triangular forward substitution In-house
𝐀 is tridiagonal backward/forward elimination In-house
𝐀 is positive definite conjugate gradient update In-house
m<n and QR is known range space solver In-house
𝐀_i is the result of column pivoting 𝐀_{i-1} = 𝐋_{i-1}𝐔_{i-1} LU update In-house

General guidelines and details of using the appropriate methodology are given in the following subsections.

3.1 m<n

When m<n, the linear system is underdetermined and has an infinite number of solutions. The QR factorization method effectively produces a minimum norm solution 𝐱. We first compute 𝐀^T = 𝐐𝐑 = (𝐘 𝐙)𝐑, where 𝐘 is the n×m range space basis and 𝐙 is the n×(n-m) null space basis.

Since 𝐐 is an orthonormal basis, we can write the solution vector 𝐱 = 𝐘𝐱_Y + 𝐙𝐱_Z in terms of a range space component 𝐱_Y and a null space component 𝐱_Z. Clearly,

\mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{Y}\mathbf{x}_Y = \mathbf{R}^T\mathbf{x}_Y

where 𝐱_Y can be found uniquely by forward substitution on the equation 𝐑^T𝐱_Y = 𝐛.
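
To see why the null space component drops out, recall that the columns of 𝐐 = (𝐘 𝐙) are orthonormal and only the first m rows of 𝐑 are nonzero; a short derivation in the notation above is

\mathbf{A}\mathbf{x}
  = \mathbf{R}^T \mathbf{Q}^T \left(\mathbf{Y}\mathbf{x}_Y + \mathbf{Z}\mathbf{x}_Z\right)
  = \mathbf{R}^T \begin{pmatrix} \mathbf{x}_Y \\ \mathbf{x}_Z \end{pmatrix}
  = \mathbf{R}^T \begin{pmatrix} \mathbf{x}_Y \\ \mathbf{0} \end{pmatrix}
  + \mathbf{R}^T \begin{pmatrix} \mathbf{0} \\ \mathbf{x}_Z \end{pmatrix}
  = \mathbf{R}^T \mathbf{x}_Y ,

where the second term vanishes because the trailing n-m columns of 𝐑^T are zero, and 𝐑^T𝐱_Y is understood as the leading m×m lower triangular block of 𝐑^T applied to 𝐱_Y.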

Different choices of 𝐱_Z produce different solutions 𝐱. In practice, we let 𝐱_Z = 0 to obtain a minimum norm solution. If the QR factorization is already known, we can directly use the range space solver LinearSystem::range_space_solve().

3.2 Singular 𝐀 or m>n

When m>n, the linear system is overdetermined and generally doesn’t have an exact solution. When 𝐀 is singular, the linear system either has no exact solution or has infinitely many solutions. The solver effectively computes the least squares solution 𝐱 using SVD.

3.3 Symmetric 𝐀

For any symmetric matrix 𝐀, there exists an 𝐋𝐃𝐋^T factorization where 𝐋 is a unit lower triangular matrix and 𝐃 is a diagonal matrix. Further, if 𝐀 is positive definite, all elements in 𝐃 are positive and 𝐀 then has a Cholesky factor.

3.4 Large Positive Definite 𝐀

When 𝐀 is a very large n×n positive definite matrix, computing the exact solution 𝐱 could be very time consuming. Instead, one might be satisfied with a “near” solution 𝐱¯ such that the residual 𝐫 = 𝐀𝐱¯ - 𝐛 is small enough.

In this situation, we can use the conjugate gradient (CG) update method to generate a sequence of (𝐱_k, 𝐩_k, 𝐫_k). For some initial guess 𝐱_0, we compute the residual vector 𝐫_0 = 𝐀𝐱_0 - 𝐛. The initial search direction is taken as 𝐩_0 = -𝐫_0. We pass (𝐱_k, 𝐩_k, 𝐫_k) as input at the k-th iteration. The CG method then updates 𝐱_{k+1} as an improved guess with smaller residual 𝐫_{k+1}, and also updates the search direction 𝐩_{k+1} to be taken at the next iteration. Users can stop the iterations when 𝐫_k is considered small enough. In the worst case, the CG method ensures the exact solution can be found after n iterations. Our implementation is based on [2], chapter 5.
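
For reference, the sketch below spells out the standard CG update formulas from [2] in plain C++ on std::vector storage. It is not the NPL interface, just an illustration of the (𝐱_k, 𝐩_k, 𝐫_k) update step and the stopping test described above.

#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;   // dense n x n, row major

static Vec matvec(const Mat& A, const Vec& x)
{
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < A[i].size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

static double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

// Solve A x = b for symmetric positive definite A by conjugate
// gradient updates; stops when ||r_k|| <= tol or after n steps.
Vec conjugate_gradient(const Mat& A, const Vec& b, double tol = 1e-10)
{
    const std::size_t n = b.size();
    Vec x(n, 0.0);                 // initial guess x_0 = 0
    Vec r = matvec(A, x);          // r_0 = A x_0 - b
    for (std::size_t i = 0; i < n; ++i)
        r[i] -= b[i];
    Vec p(n);                      // p_0 = -r_0
    for (std::size_t i = 0; i < n; ++i)
        p[i] = -r[i];

    double rr = dot(r, r);
    for (std::size_t k = 0; k < n && std::sqrt(rr) > tol; ++k)
    {
        Vec Ap = matvec(A, p);
        double alpha = rr / dot(p, Ap);       // step length
        for (std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i];             // x_{k+1} = x_k + alpha p_k
            r[i] += alpha * Ap[i];            // r_{k+1} = r_k + alpha A p_k
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;            // beta_{k+1}
        for (std::size_t i = 0; i < n; ++i)
            p[i] = -r[i] + beta * p[i];       // p_{k+1} = -r_{k+1} + beta p_k
        rr = rr_new;
    }
    return x;
}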

3.5 LU Update of 𝐀

Suppose initially we know an LU decomposition of matrix 𝐀_0 = 𝐋_0𝐔_0. Subsequently, matrix 𝐀_i is obtained by pivoting the j-th column of 𝐀_{i-1} with a vector 𝐩_i. In this situation, the LU decomposition of 𝐀_i can be computed efficiently by updating 𝐋_{i-1} and 𝐔_{i-1}. LinearSystem::LU provides a linear system solver for 𝐀_i𝐱 = 𝐛. Our implementation is based on [5].

References

  • [1] Lloyd N. Trefethen and David Bau, III: Numerical Linear Algebra, SIAM, 1997
  • [2] Jorge Nocedal and Stephen J. Wright: Numerical Optimization, Springer, 1999
  • [3] Philip E. Gill, Walter Murray and Margaret H. Wright: Practical Optimization, Academic Press, 1981
  • [4] Anders Forsgren, Philip E. Gill and Walter Murray: Computing Modified Newton Directions Using a Partial Cholesky Factorization, SIAM J. Sci. Comput., Vol. 16, No. 1, pp. 139-150
  • [5] J. J. Forrest and J. A. Tomlin: Updated Triangular Factors of the Basis to Maintain Sparsity in the Product Form Simplex Method, Mathematical Programming 2 (1972) 263-278