Skip to the content.
« Back to homepage

Linear Algebra

Linear Algebra is a key mathematics field behind computer science and numerical computating. A thorough coverage of this topic is apparently beyond the scope of this book. Please refer to [@strang2006linear] for this subject. In this chapter we will follow the basic structure of this book, first giving you a overall picture, then focussing on how to use the functions provided in Owl to solve problems and better understand some basic linear algebra concepts.

The high level APIs of Linear Algebra are provided in the Linalg module. The module provides four types of number types: single precision, double precision, complex single precision, and complex double precision. They are included in Linalg.S, Linalg.D, Linalg.C and Linalg.Z modules respectively. Besides, the Linalg.Generic can do everything that S/D/C/Z can but needs some extra type information.

Vectors and Matrices

The fundamental problem of linear algebra: solving linear equations. This is more efficiently expressed with vectors and matrices. Therefore, we need to first get familiar with these basic structures in Owl.

Similar to the Linalg module, all the matrix functions can be accessed from the Dense.Matrix module, and support four different type of modules. The Mat module is an alias of Dense.Matrix.D. Except for some functions such as re, most functions are shared by these four submodules. Note that that matrix module is actually built on the Ndarray module, and thus the supported functions are quite similar, and matrices and ndarrays can interoperate with each other. The vectors are expressed using Matrix in Owl.

Creating Matrices

There are multiple functions to help you in creating an initial matrix to start with.

  Mat.empty 5 5;;        (* create a 5 x 5 matrix with initialising elements *)
  Mat.create 5 5 2.;;    (* create a 5 x 5 matrix and initialise all to 2. *)
  Mat.zeros 5 5;;        (* create a 5 x 5 matrix of all zeros *)
  Mat.ones 5 5;;         (* create a 5 x 5 matrix of all ones *)
  Mat.eye 5;;            (* create a 5 x 5 identity matrix *)
  Mat.uniform 5 5;       (* create a 5 x 5 random matrix of uniform distribution *)
  Mat.uniform_int 5 5;;  (* create a 5 x 5 random integer matrix *)
  Mat.sequential 5 5;;   (* create a 5 x 5 matrix of sequential integers *)
  Mat.semidef 5;;        (* create a 5 x 5 random semi-definite matrix *)
  Mat.gaussian 5 5;;     (* create a 5 x 5 random Gaussian matrix *)
  Mat.bernoulli 5 5      (* create a 5 x 5 random Bernoulli  matrix *)

Owl can create some special matrices with specific properties. For example, a magic square is a n x n matrix (where n is the number of cells on each side) filled with distinct positive integers in the range $1,2,…,n^{2}$ such that each cell contains a different integer and the sum of the integers in each row, column and diagonal is equal.

# let x = Mat.magic 5;;
val x : Mat.mat =

   C0 C1 C2 C3 C4
R0 17 24  1  8 15
R1 23  5  7 14 16
R2  4  6 13 20 22
R3 10 12 19 21  3
R4 11 18 25  2  9

We can validate this property with the following code. The summation of all the elements on each column is 65.

# Mat.sum_rows x;;
- : Mat.mat =
   C0 C1 C2 C3 C4
R0 65 65 65 65 65

You can try the similar sum_cols. The summation of all the diagonal elements is also 65.

# Mat.trace x;;
- : float = 65.

Accessing Elements

Similar to ndarray, the matrix module support set and get to access and modify matrix elements. The only difference is that instead of accessing according to an array, an element in matrix is accessed using two integers.

let x = Mat.uniform 5 5;;
Mat.set x 1 2 0.;;             (* set the element at (1,2) to 0. *)
Mat.get x 0 3;;                (* get the value of the element at (0,3) *)

For dense matrices, i.e., Dense.Matrix.*, you can also use shorthand .%{i; j} to access elements.

# open Mat;;
# x.%{1;2} <- 0.;;         (* set the element at (1,2) to 0. *);;
- : unit = ()
# let a = x.%{0;3};;       (* get the value of the element at (0,3) *);;
val a : float = 0.563556290231645107

The modifications to a matrix using set are in-place. This is always true for dense matrices. For sparse matrices, the thing can be complicated because of performance issues.

We can take some rows out of x by calling rows function. The selected rows will be used to assemble a new matrix. Similarly, we can also select some columns using cols.

Iterate, Map, Fold, and Filter

In reality, a matrix usually represents a collections of measurements (or points). We often need to go through these data over and over again for various reasons. Owl provides very convenient functions to help you to iterate these elements. There is one thing I want to emphasise: Owl uses row-major matrix for storage format in the memory, which means accessing rows are much faster than those column operations.

Let’s first create a 4 x 6 matrix of sequential numbers as below.

let x = Mat.sequential 4 6;;

You should be able to see the following output in your utop.

     C0 C1 C2 C3 C4 C5
  R0  1  2  3  4  5  6
  R1  7  8  9 10 11 12
  R2 13 14 15 16 17 18
  R3 19 20 21 22 23 24

Iterating all the elements can be done by using iteri function. The following example prints out all the elements on the screen.

Mat.iteri_2d (fun i j a -> Printf.printf "(%i,%i) %.1f\n" i j a) x;;

If you want to create a new matrix out of the existing one, you need mapi and map function. E.g., we create a new matrix by adding one to each element in x.

# ((+.) 1.) x;;
- : Mat.mat =

   C0 C1 C2 C3 C4 C5
R0  1  2  3  4  5  6
R1  7  8  9 10 11 12
R2 13 14 15 16 17 18
R3 19 20 21 22 23 24

Iterating rows and columns are similar to iterating elements, by using iteri_rows, mapi_rows, etc. The following example prints the sum of each row.

  Mat.iteri_rows (fun i r ->
    Printf.printf "row %i: %.1f\n" i (Mat.sum' r)
  ) x;;

You can also fold elements, rows, and columns. We can calculate the summation of all column vectors by using fold_cols function.

  let v = Mat.(zeros (row_num x) 1) in
  Mat.(fold_cols add v x);;

It is also possible to change a specific row or column. E.g., we make a new matrix out of x by setting row 2 to zero vector.

  Mat.map_at_row (fun _ -> 0.) x 2;;

The filter functions is also commonly used in manipulating matrix. Here are some examples. The first one is to filter out the elements in x greater than 20.

# Mat.filter ((<) 20.) x;;
- : int array = [|21; 22; 23|]

You can compare the next example which filters out the two-dimensional indices.

# Mat.filteri_2d (fun i j a -> a > 20.) x;;
- : (int * int) array = [|(3, 3); (3, 4); (3, 5)|]

The second example is to filter out the rows whose summation is less than 22.

# Mat.filter_rows (fun r -> Mat.sum' r < 22.) x;;
- : int array = [|0|]

If we want to check whether there is one or (or all) element in x satisfying some condition, then

  Mat.exists ((>) 5.) x;;      (* is there someone smaller than 5. *)
  Mat.not_exists ((>) 5.) x;;  (* is no one smaller than 5. *)
  Mat.for_all ((>) 5.) x;;     (* is everyone smaller than 5. *)

Math Operations

The math operations can be generally categorised into several groups.

Comparison Suppose we have two matrices:

let x = Mat.uniform 2 2;;
let y = Mat.uniform 2 2;;

We can compare the relationship of x and y element-wisely as below.

  Mat.(x = y);;    (* is x equal to y *)
  Mat.(x <> y);;   (* is x unequal to y *)
  Mat.(x > y);;    (* is x greater to y *)
  Mat.(x < y);;    (* is x smaller to y *)
  Mat.(x >= y);;   (* is x not smaller to y *)
  Mat.(x <= y);;   (* is x not greater to y *)

All aforementioned infix have their corresponding functions in the module, e.g., =@ has Mat.is_equal.

Matrix Arithmetic

The arithmetic operation also heavily uses infix. Similar to matrix comparison, each infix has its corresponding function in the module.

  Mat.(x + y);;    (* add two matrices *)
  Mat.(x - y);;    (* subtract y from x *)
  Mat.(x * y);;    (* element-wise multiplication *)
  Mat.(x / y);;    (* element-wise division *)
  Mat.(x *@ y);;    (* dot product of x and y *)

If you do match between a matrix and a scalar value, you need to be careful about their order. Please see the examples below. In the following examples, x is a matrix as we used before, and a is a float scalar value.

  let a = 2.5;;

  Mat.(x +$ a);;    (* add a to every element in x *)
  Mat.(a $+ x);;    (* add a to every element in x *)

Similarly, we have the following examples for other math operations.

  Mat.(x -$ a);;    (* sub a from every element in x *)
  Mat.(a $- x);;
  Mat.(x *$ a);;    (* mul a with every element in x *)
  Mat.(a $* x);;
  Mat.(x /$ a);;    (* div a to every element in x *)
  Mat.(a $/ x);;
  Mat.(x **$ a);;   (* power of every element in x *)

There are some ready-made math functions such as Mat.log and Mat.abs ect. to ease your life when operating matrices. These math functions apply to every element in the matrix.

There are other functions such as concatenation:

  Mat.(x @= y);;    (* concatenate x and y vertically *)
  Mat.(x @|| y);;   (* concatenate x and y horizontally *)

Gaussian Elimination

Solving linear equations systems is the core problem in Linear Algebra and is frequently used in scientific computation. Gaussian Elimination is a classic method to do that. With a bit of techniques, elimination works surprisingly well in modern numerical libraries as one way of implementation. Here is a simple example.

\(2x_1 + 2x_2 + 2x_3 = 4\) \(2x_1 + 2x_2 + 3x_3 = 5\) {#eq:linear-algebra:gauss01} \(3w_1 + 4x_2 + 5x_3 = 7\)

Divide the first equation by 2:

\(x_1 + x_2 + x_3 = 2\) \(2x_1 + 2x_2 + 3x_3 = 5\) {#eq:linear-algebra:gauss02} \(3w_1 + 4x_2 + 5x_3 = 7\)

Multiply the first equation by -2, then add it to the second one. Also, multiply the first equation by -3, then add it to the third one. We have:

\(x_1 + x_2 + x_3 = 2\) \(x_3 = 1\) {#eq:linear-algebra:gauss03} \(x_2 + 2x_3 = 1\)

Finally, swap the second and third line:

\(x_1 + x_2 + x_3 = 2\) \(x_2 + 2x_3 = 1\) {#eq:linear-algebra:gauss04} \(x_3 = 1\)

Here $x_3 = 1$, and we can put it back in the second equation and get $x_2 = -1$. Put both back to the first equation and we have $x_1 = 2$

This process demonstrate the basic process of elimination: eliminate unknown variables until this group of linear equations is easy to solve, and then do the back-substitution. There are three kinds of basic operations we can use: multiplication, adding one line to another, and swap two lines.

The starting [@eq:linear-algebra:gauss01] can be more concisely expressed with vector:

\[x_1\left[\begin{matrix}2\\2\\3\end{matrix} \right] + x_2\left[\begin{matrix}2\\2\\4\end{matrix} \right] + x_3\left[\begin{matrix}2\\3\\5\end{matrix} \right] = \left[\begin{matrix}4\\5\\7\end{matrix} \right]\]

or it can be expressed as $Ax=b$ using matrix notation.

\[\left[\begin{matrix}2 & 2 & 2\\2 & 2 & 3\\3 & 4 & 5\end{matrix} \right] \left[\begin{matrix}x_1\\x_2\\x_3\end{matrix} \right] = \left[\begin{matrix}4\\5\\7\end{matrix} \right] \Longrightarrow \left[\begin{matrix}x_1\\x_2\\x_3\end{matrix} \right] = \left[\begin{matrix}2\\-1\\1\end{matrix} \right]\]

Here A is a matrix, b is a column vector, and x is the unknown vector. The matrix notation is often used to describe the linear equation systems as a concise way.

LU Factorisation

Let’s check the gaussian elimination example again. The final form in [@eq:linear-algebra:gauss04] can be expressed with the matrix notation as:

\[\left[\begin{matrix}1 & 1 & 1\\0 & 1 & 2\\0 & 0 & 1\end{matrix} \right]\]

Here all the elements below the diagonal of this square matrix is zero. Such matrix is called an upper triangular matrix, usually denoted by $U$. Similarly, a square matrix that all the elements below the diagonal of this square matrix is zero is called lower triangular matrix, denoted by $L$. We can use the is_triu and is_tril to verify if a matrix is triangular.

The diagonal elements of $U$ are called pivots. The i-th pivot is the coefficient of the i-th variable in the i-th equation at the i-th step during the elimination.

In general, a square matrix can often be factorised into the dot product of a lower and a upper triangular matrices: $A = LU$. It is called the LU factorisation. It embodies the process of Gauss elimination. Back to the initial problem of solving the linear equation $Ax=b$. One reason the LU Factorisation is important is that if the matrix A in $Ax=b$ is triangular, then solving it would be straightforward, as we have seen in the previous example. Actually, we can use triangular_solve to efficiently solve the linear equations if we already know that the matrix is triangular.

For a normal square matrix that can be factorised into $LU$, we can change $Ax=b$ to $LUx=b$. First we can find column vector $c$ so that $Lc=b$, then we can find $x$ so that $Ux=c$. Both triangular equations are easy to solve.

We use the lu function to perform the LU factorisation. Let’s use the previous example.

# let a = [|2.;2.;2.;2.;2.;3.;3.;4.;5.|];;
val a : float array = [|2.; 2.; 2.; 2.; 2.; 3.; 3.; 4.; 5.|]
# let a = Arr.of_array a [|3; 3|];;
val a : Arr.arr =
   C0 C1 C2
R0  2  2  2
R1  2  2  3
R2  3  4  5

# let l, u, p = a;;
val l : Linalg.D.mat =

         C0 C1 C2
R0        1  0  0
R1 0.666667  1  0
R2 0.666667  1  1

val u : Linalg.D.mat =

   C0        C1        C2
R0  3         4         5
R1  0 -0.666667 -0.333333
R2  0         0        -1

val p : Linalg.D.int32_mat =
   C0 C1 C2
R0  3  2  3

The first two returned matrix are the lower and upper triangular matrices. However, if we try to check the correctness of this factorisation with dot product, the result does not fit:

# let a' = l u;;
val a' : Mat.mat =
   C0 C1 C2
R0  3  4  5
R1  2  2  3
R2  2  2  2

# a' = a;;
- : bool = false

It turns out that we need to some extra row exchange to get the right answer. That’s because the row exchange is required in certain cases, such as when the number we want to use as the pivot could be zero. This process is called pivoting. It is closely related to the numerical computation stability. Choosing the improper pivots can lead to wrong linear system solution. It can be expressed with a permutation matrix that has the same rows as the identity matrix, each row and column has exactly one “1” element. The full LU factorisation can be expressed as:

\[PA = LU.\]
# let p = Mat.of_array  [|0.;0.;1.;0.;1.;0.;1.;0.;0.|] 3 3;;
val p : Mat.mat =
   C0 C1 C2
R0  0  0  1
R1  0  1  0
R2  1  0  0

# p a = l u;;
- : bool = true

How do we translate the third output, the permutation vector, to the required permutation matrix? Each element $p_i$ in the vector represents a updated identity matrix. On this identity matrix, we set (i, i) and ($p_i$, $p_i$) to zero, and then (i, $p_i$) and ($p_i$, i) to one. Multiply these $n$ matrices, we can get the permutation matrix $P$. Here is a brief implementation of this process in OCaml:

let perm_vec_to_mat vec =
    let n = Array.length vec in
    let mat = ref (Mat.eye n) in
    for i = n - 1 downto 0 do
      let j = vec.(i) in
      let a = Mat.eye n in
      Arr.set a [| i; i |] 0.;
      Arr.set a [| j; j |] 0.;
      Arr.set a [| i; j |] 1.;
      Arr.set a [| j; i |] 1.;
      mat := a !mat

Note that there is more than one way to do the LU factorisation. For example, for the same matrix, we can have:

\[\left[\begin{matrix}1 & 0 & 0\\0 & 0 & 1\\0 & 1 & 0\end{matrix} \right] \left[\begin{matrix}2 & 2 & 2\\2 & 2 & 3\\3 & 4 & 5\end{matrix} \right] = \left[\begin{matrix}1 & 0 & 0\\1.5 & 1 & 0\\1 & 0 & 1\end{matrix} \right] \left[\begin{matrix}2 & 2 & 2\\0 & 1 & 2\\0 & 0 & 1\end{matrix} \right]\]

Inverse and Transpose

The concept of inverse matrix is related with the identity matrix, which can be built with $Mat.eye n$, where n is the size of the square matrix. The identity matrix is a special form of Diagonal Matrix, which is a square matrix that only contains non-zero element along its diagonal. You can check if a matrix is diagonal with is_diag function.

Mat.eye 5 |> Linalg.D.is_diag

The inverse of a $n$ by $n$ square matrix $A$ is denoted by $A^{-1}, so that $: $AA^{-1} = I_n$. Note that not all square matrix has inverse.
There are many sufficient and necessary conditions to decide if $A$ is invertible, one of them is that A has $n$ pivots.

We use function inv to do the inverse operation. It’s straightforward and easy to verify according to the definition. Here we use the semidef function to produce a matrix that is certainly invertible.

# let x = Mat.semidef 5;;
val x : Mat.mat =

        C0       C1       C2      C3       C4
R0 2.56816   1.0088  1.57793  2.6335  2.06612
R1  1.0088 0.441613 0.574465 1.02067 0.751004
R2 1.57793 0.574465  2.32838 2.41251  2.13926
R3  2.6335  1.02067  2.41251 3.30477  2.64877
R4 2.06612 0.751004  2.13926 2.64877  2.31124

# let y = Linalg.D.inv x;;
val y : Linalg.D.mat =

         C0       C1       C2       C3       C4
R0   12.229  -15.606  6.12229 -1.90254 -9.34742
R1  -15.606  33.2823 -4.01361 -4.96414  12.5403
R2  6.12229 -4.01361  7.06372 -2.62899 -7.69399
R3 -1.90254 -4.96414 -2.62899   8.1607 -3.60533
R4 -9.34742  12.5403 -7.69399 -3.60533  15.9673

# Mat.(x *@ y =~ eye 5);;
- : bool = true

The next frequently used special matrix is the Transpose Matrix. Denoted by $A^T$, its $i$th row is taken from the $i$-th column of the original matrix A. It has properties such as $(AB)^T=B^T~A^T$. We can check this property using the matrix function Mat.transpose. Note that this function is deemed basic ndarray operations and is not included in the Linalg module.

# let flag =
    let a = Mat.uniform 4 4 in
    let b = Mat.uniform 4 4 in
    let m1 = Mat.(dot a b |> transpose) in
    let m2 = Mat.(dot (transpose b) (transpose a)) in
    Mat.(m1 =~ m2);;
val flag : bool = true

A related special matrix is the Symmetric Matrix, which equals to its own transpose. This simple test can be done with the is_symmetric function.

Vector Spaces

We have talked about solving the $Ax=b$ linear equations with elimination, and A is a square matrix. Now we need to further discuss, how do we know if there exists one or maybe more than one solution. To answer such question, we need to be familiar with the concepts of vector space.

A vector space, denoted by $R^n$, contains all the vectors that has $n$ elements. In this vector space we have the add and multiplication operation. Applying them to the vectors is called linear combination. Then a subspace in a vector space is a non-empty set that linear combination of the vectors in this subspace still stays in the same subspace.

There are four fundamental subspaces concerning solving linear systems $Ax=b$, where $A$ is a $m$ by $n$ matrix. The column space consists of all the linear combinations of the columns of A. It is a subspace of $R^m$. Similarly, the row space consists of all the linear combinations of the rows of A. The nullspace contains all the vectors $x$ so that $Ax=0$, denoted by $N(A)$. It is a subspace of $R^n$. The left nullspace is similar. It is the nullspace of $A^T$.

Rank and Basis

In the Gaussian Elimination section, we assume an ideal situation: the matrix A is $n\times~n$ square, and we assume that there exists one solution. But that does not happen every time. In many cases $A$ is not an square matrix. It is possible that these $m$ equations are not enough to solve a $n$-variable linear system when $m < n$. Or there might not exist a solution when $m > n$. Besides, even it is a square matrix, the information provided by two of the equations are actually repeated. For example, one equation is simply a multiplication of the other.

For example, if we try to apply LU factorisation to such a matrix:

# let x = Mat.of_array [|1.; 2.; 3.; 0.; 0.; 1.; 0.; 0.; 2.|] 3 3;;
val x : Mat.mat =
   C0 C1 C2
R0  1  2  3
R1  0  0  1
R2  0  0  2

# x;;
Exception: Failure "LAPACKE: 2".

Obviously, we cannot have pivot in the second column, and therefore this matrix is singular and cannot be factorised into $LU$. As can be seen in this example, we cannot expect the linear algebra functions to be a magic lamb and do our bidding every time. Understanding the theory of linear algebra helps to better understand how these functions work.

To decide the general solutions to $Ax=b$, we need to understand the concept of rank. The rank of a matrix is the number of pivots in the elimination process. To get a more intuitive understanding of rank, we need to know the concept of *linear independent. In a linear combination $\sum_{i=1}^nc_iv_i$ where $v_i$ are vectors and $c_i$ are numbers, if $\sum_{i=1}^nc_iv_i = 0$ only happens when $c_i = 0$ for all the $i$’s, then the vectors $v_1, v_2, \ldots, v_n$ are linearly independent. Then the rank of a matrix is the number of independent rows in the matrix. We can understand rank as the number of “effective” rows in the matrix.

As an example, we can check the rank of the previous matrix.

Linalg.D.rank x

As can be example, the rank is 2, which means only two effective rows, and thus cannot be factorised to find the only solution.

One application of rank is in a crucial linear algebra idea: basis. A sequence of vectors is the basis of a space or subspace if: 1) these vectors are linear independent and 2) all the the vectors in the space can be represented as the linear combination of vectors in the basis.

A space can have infinitely different bases, but the number of vectors in these bases are the same. This number is called the dimension of this vector space. For example, a $m$ by $n$ matrix A has rank of $r$, then the dimension of its null space is $n-r$, and the dimension of its column space is $r$. Think about a full-rank matrix where $r=n$, then the dimension of column matrix is $n$, which means all its columns can be a basis of the column space, and that the null space dimension is zero so that the only solution of $Ax=0$ is a zero vector.


We can think of the basis of a vector space as the Cartesian coordinate system in a three-dimensional space, where every vector in the space can be represented with the three vectors ni the space: the x, y and z axis. Actually, we can use many three vectors system as the coordinate bases, but the x. y, z axis is used is because they are orthogonal to each other. An orthogonal basis can greatly reduce the complexity of problems. The same can be applied in the basis of vector spaces.

Orthogonality is not limited to vectors. Two vectors $a$ and $b$ are orthogonal are orthogonal if $a^Tb = 0$. Two subspaces A and B are orthogonal if every vector in A is orthogonal to every vector in B. For example, the nullspace and row space of a matrix are perpendicular to each other.

Among the bases of a subspace, if every vector is perpendicular to each other, it is called an orthogonal matrix. Moreover, if the length of each vector is normalised to one unit, it becomes the orthonormal basis.

For example, we can use the null function to find an orthonormal basis vector $x$ or the null space of a matrix, i.e. $Ax=0$.

# let a = Mat.magic 4;;
val a : Mat.mat =

   C0 C1 C2 C3
R0  1 15 14  4
R1 12  6  7  9
R2  8 10 11  5
R3 13  3  2 16

# let x = Linalg.D.null a;;
val x : Linalg.D.mat =

R0  0.223607
R1   0.67082
R2  -0.67082
R3 -0.223607

# a x |> Mat.l2norm';;
- : float = 3.7951119214201712e-15

Now that we know what is orthogonal basis, the next question is, how to build one? The QR Factorisation is used to construct orthogonal basis in a subspace. Specifically, it decomposes a matrix A into the product of an orthogonal matrix Q and an upper triangular matrix R, i.e. A = QR. It is provided in the linear algebra module.

val qr : ?thin:bool -> ?pivot:bool -> ('a, 'b) t -> ('a, 'b) t * ('a, 'b) t * (int32, int32_elt) t

The qr x function calculates QR decomposition for an m by n matrix x. The function returns a 3-tuple, the first two are Q and R, and the third is the permutation vector of columns. The default value of parameter pivot is false, setting pivot to true lets qr performs pivoted factorisation. Note that the returned indices are not adjusted to 0-based C layout. By default, qr performs a reduced QR factorisation, full factorisation can be enabled by setting thin parameter to false.

# let a = Mat.of_array [|12.; -51.; 4.; 6.; 167.; -68.; -4.; 24.; -41.|] 3 3;;
val a : Mat.mat =

   C0  C1  C2
R0 12 -51   4
R1  6 167 -68
R2 -4  24 -41

# let q, r, _ = Linalg.D.qr a;;
val q : Linalg.D.mat =

          C0        C1         C2
R0 -0.857143  0.394286   0.331429
R1 -0.428571 -0.902857 -0.0342857
R2  0.285714 -0.171429   0.942857

val r : Linalg.D.mat =

    C0   C1  C2
R0 -14  -21  14
R1   0 -175  70
R2   0    0 -35

Solving Ax = b

We can now discuss the general solution structure to $Ax=0$ and $Ax=b$. Again, here $A$ is a $m\times~n$ matrix. The theorems declare that, there exists non-zero solution(s) to $Ax=0$ if and only if $\textrm{rank}(a) <= n$. If $r(A) < n$, then the nullspace of $A$ is of dimension $n - r$ and the $n-r$ orthogonal basis can be found with null function. Here is an example.

# let a = Mat.of_array [|1.;5.;-1.;-1.;1.;-2.;1.;3.;3.;8.;-1.;1.;1.;-9.;3.;7.|] 4 4;;
val a : Mat.mat =

   C0 C1 C2 C3
R0  1  5 -1 -1
R1  1 -2  1  3
R2  3  8 -1  1
R3  1 -9  3  7

# Linalg.D.rank a;;
- : int = 2

This a rank 2 matrix, so the nullspace contains 4 - 2 = 2 vectors:

# Linalg.D.null a;;
- : Linalg.D.mat =

          C0        C1
R0 -0.495995  0.692163
R1 0.0375596 -0.306931
R2 -0.747852 -0.610728
R3  0.439655 -0.231766

These two vectors are called the fundamental system of solutions of $Ax=0$. All the solutions of $Ax=0$ can then be expressed using the fundamental system:

\[c_1\left[\begin{matrix}-0.85 \\ 0.27 \\ 0.07 \\0.44\end{matrix} \right] + c_2\left[\begin{matrix}0.013\\ 0.14 \\ 0.95 \\-0.23\end{matrix} \right]\]

Here $c_1$ and $c_2$ can be any constant numbers.

For solving the general form $Ax=b$ where b is $m\times~1$ vector, there exist only one solution if and only if $\textrm{rank}(A) = \textrm{rank}([A, b]) = n$. Here $[A, b]$ means concatenating $A$ and $b$ along the column. If $\textrm{rank}(A) = \textrm{rank}([A, b]) < n$, $Ax=b$ has infinite number of solutions. These solutions has a general form:

\[x_0 + c_1x_1 + c2x2 + \ldots +c_kx_k\]

Here $x_0$ is a particular solution to $Ax=b$, and $x_1, x_2, \ldots, x_k$ are the fundamental solution system of $Ax=0$.

We can use linsolve function to find one particular solution. In the Linear Algebra, the function linsolve a b -> x solves a linear system of equations a * x = b. By default, the function uses LU factorisation with partial pivoting when a is square and QR factorisation with column pivoting otherwise. The number of rows of a must be equal to the number of rows of b. If a is a upper or lower triangular matrix, the function calls the solve_triangular function.

Here is an example.

# let a = Mat.of_array [|2.;3.;1.;1.;-2.;4.;3.;8.;-2.;4.;-1.;9.|] 4 3;;
val a : Mat.mat =

   C0 C1 C2
R0  2  3  1
R1  1 -2  4
R2  3  8 -2
R3  4 -1  9

# let b = Mat.of_array [|4.;-5.;13.;-6.|] 4 1;;
val b : Mat.mat =
R0  4
R1 -5
R2 13
R3 -6

# let x0 = Linalg.D.linsolve a b;;
val x0 : Owl_dense_matrix_d.mat =

R0 -3.97999
R1  3.48999
R2  1.48999

Then we use null to find the fundamental solution system. You can verify that matrix a is of rank 2, so that the solution system for $ax=0$ should contain only 3 - 2 = 1 vector.

# let x1 = Linalg.D.null a;;
val x1 : Owl_dense_matrix_d.mat =

R0 -0.816497
R1  0.408248
R2  0.408248

So the solutions to $Ax=b$ can be expressed as:

\[\left[\begin{matrix}-1 \\ 2 \\ 0 \end{matrix} \right] + c_1\left[\begin{matrix}-0.8\\ 0.4 \\ 0.4 \end{matrix} \right]\]

So the takeaway from this chapter is that the using these linear algebra functions often requires solid background knowledge. Blindly using them could leads to wrong or misleading answers.

Matrix Sensitivity

The sensitivity of a matrix is perhaps not the most important issue in the traditional linear algebra, but is crucial in the numerical computation related problems. It answers this question: in $Ax=b$, if we change the $A$ and $b$ slightly, how much will the $x$ be affected? The Condition Number is a measurement of the sensitivity of a square matrix.

First, we need to understand the Norm of a matrix. The norm, or 2-norm of a matrix $|A|$ is calculated as square root of the maximum eigenvalue of $A^HA$. The norm of a matrix is a upper limit so that for any $x$ we can be certain that $|Ax| \leq |A||x|$. Here $|Ax|$ and $|x|$ are the L2-Norm for vectors. The $|A|$ bounds the how large the $A$ can amplify the input $x$. We can calculate the norm with norm in the linear algebra module.

The most frequently used condition number is that represent the sensitivity of inverse matrix. With the definition of norm, the condition number for inversion of a matrix can be expressed as $|A||A^{-1}|$. We can calculate it using the cond function.

Let’s look at an example:

# let a = Mat.of_array [|4.1; 2.8; 9.7; 6.6 |] 2 2;;
val a : Mat.mat =
    C0  C1
R0 4.1 2.8
R1 9.7 6.6

# let c = Linalg.D.cond a;;
val c : float = 1622.99938385646283

Its condition number for inversion is much larger than one. Therefore, a small change in $A$ should leads to a large change of $A^{-1}$.

# let a' = Linalg.D.inv a;;
val a' : Linalg.D.mat =
    C0  C1
R0 -66  28
R1  97 -41

# let a2 = Mat.of_array [|4.1; 2.8; 9.67; 6.607 |] 2 2;;
val a2 : Mat.mat =
     C0    C1
R0  4.1   2.8
R1 9.67 6.607

# let a2' = Linalg.D.inv a2;;
val a2' : Linalg.D.mat =

         C0       C1
R0  520.236 -220.472
R1 -761.417  322.835

We can see that by changing the matrix by only a tiny bit, the inverse of $A$ changes dramatically, and so is the resulting solution vector $x$.


Other than pivots, another basic quantity in linear algebra is the determinants. For a square matrix A:

\[\left[\begin{matrix}a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ldots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{matrix} \right]\]

its determinants det(A) is defined as:


Here $\tau(j_1~j_2~\ldots~j_n) = i_1 + i_2 + \ldots + i_{n-1}$, where $i_k$ is the number of $j_p$ that is smaller than $j_k$ for $p \in [k+1, n]$.

Mathematically, there are many techniques that can be used to simplify this calculation. But as far as this book is concerned, it is sufficient for us to use the det function to calculate the determinants of a matrix.

Why is the concept of determinant important? Its most important application is to using determinant to decide if a square matrix A is invertible or singular. The determinant $\textrm{det}(A) \neq 0$ if and only if $\textrm{A} = n$. Also it can be expressed as $\textrm{det}(A) \neq 0$ if and only if matrix A is invertible.

We can also use it to understand the solution of $Ax=b$: if $\textrm{det}(A) \neq 0$, then $Ax=b$ has one and only one solution. This theorem is part of the Cramer’s rule. These properties are widely used in finding eigenvalues. As will be shown in the next section.

Since sometimes we only care about if the determinant is zero or not, instead of the value itself, we can also use a similar function logdet. It computes the logarithm of the determinant, but it avoids the possible overflow or underflow problems in computing determinant of large matrices.

# let x = Mat.magic 5;;
val x : Mat.mat =

   C0 C1 C2 C3 C4
R0 17 24  1  8 15
R1 23  5  7 14 16
R2  4  6 13 20 22
R3 10 12 19 21  3
R4 11 18 25  2  9

# Linalg.D.det x;;
- : float = 5070000.

# Linalg.D.logdet x;;
- : float = 15.4388513755673671

Eigenvalues and Eigenvectors

Solving $Ax=\lambda~x$

Now we need to change the topic from $Ax=b$ to $Ax=\lambda~x$. For an $n\times~n$ square matrix, if there exist number $\lambda$ and non-zero column vector $x$ to satisfy:

\((\lambda~I - A)x = 0,\){#eq:linear-algebra:eigen}

then $\lambda$ is called eigenvalue, and $x$ is called the eigenvector of $A$.

To find the eigenvalues of A, we need to find the roots of the determinant of $\lambda~I - A$. $\textrm{det}(\lambda~I - A) = 0$ is called the characteristic equation. For example, for

\[A = \left[\begin{matrix}3 & 1 & 0 \\ -4 & -1 & 0 \\ 4 & -8 & 2 \end{matrix} \right]\]

Its characteristic matrix $\lambda~I - A$ is:

\[\left[\begin{matrix}\lambda-3 & 1 & 0 \\ -4 & \lambda+1 & 0 \\ 4 & -8 & \lambda-2 \end{matrix} \right]\]

According to the definition of determinant,

\[\textrm{det}(\lambda~I - A) = (\lambda-1)^2(\lambda-2) = 0.\]

According to the theory of polynomials, this characteristic polynomials has and only has $n$ roots in the complex space. Specifically, here we have three eigenvalues: $\lambda_1=1, \lambda_2 = 1, \lambda=2$.

Put $\lambda_1$ back to characteristic equation, we have: $(I - A)x = 0$. Therefore, we can find the fundamental solution system of $I - A$ with:

# let basis =
    let ia = Mat.((eye 3) - (of_array [|3.;1.;0.;-4.;-1.;0.;4.;-8.;2.|] 3 3)) in
    Linalg.D.null ia;;
val basis : Owl_dense_matrix_d.mat =

R0 -0.0496904
R1  0.0993808
R2   0.993808

We have a fundamental solution $x_0 = [-0.05, 0.1, 1]^T$. Therefore all the $k_0x_0$ are the corresponding eigenvector of the eigenvalue $1$. Similarly, we can calculate that eigenvectors for the eigenvalue $2$ are $k_1[0, 0, 1]^T$.

We can use eig to find the eigenvectors and eigenvalues of a matrix. eig x -> v, w computes the right eigenvectors v and eigenvalues w of an arbitrary square matrix x. The eigenvectors are column vectors in v, their corresponding eigenvalues have the same order in w as that in v.

# let eigvec, eigval =
    let a = Mat.of_array [|3.;1.;0.;-4.;-1.;0.;4.;-8.;2.|] 3 3 in
    Linalg.D.eig a;;
val eigvec : Owl_dense_matrix_z.mat =

        C0                         C1                          C2
R0 (0, 0i) (-0.0496904, 3.56977E-10i) (-0.0496904, -3.56977E-10i)
R1 (0, 0i) (0.0993808, -1.30892E-09i)   (0.0993808, 1.30892E-09i)
R2 (1, 0i)             (0.993808, 0i)              (0.993808, 0i)

val eigval : Owl_dense_matrix_z.mat =

        C0                C1                 C2
R0 (2, 0i) (1, 1.19734E-08i) (1, -1.19734E-08i)

Note that the result are expressed as complex numbers. If we only want the eigenvalues, we can use the eigvals function. Both functions provide the boolean permute and scale arguments to indicate whether the input matrix should be permuted and/or diagonally scaled. One reason that eigenvalue and eigenvector are important is that the pattern $Ax=\lambda~x$ frequently appears in scientific and engineering analysis to describe the change of dynamic system over time.

Complex Matrices

As can be seen in the previous example, complex matrices are frequently used in eigenvalues in eigenvectors. In this section we re-introduce some previous concepts in the complex space.

We have seen the Symmetric Matrix. It can be extended to the complex numbers, called Hermitian Matrix, denoted by $A^H$. Instead of requiring it to be the same as its transpose, a hermitian matrix equals to its conjugate transpose. A conjugate transpose means that during transposing, each element $a+bi$ changes to its conjugate $a-bi$. Hermitian is thus a generalisation of the symmetric matrix. We can use the is_hermitian function to check if a matrix is hermitian, as can be shown in the next example.

# let a = Dense.Matrix.Z.of_array [|{re=1.; im=0.}; {re=2.; im=(-1.)}; {re=2.; im=1.}; {re=3.; im=0.}|] 2 2;;
val a : Dense.Matrix.Z.mat =

        C0       C1
R0 (1, 0i) (2, -1i)
R1 (2, 1i)  (3, 0i)

# Linalg.Generic.is_hermitian a;;
- : bool = true

We can use the conj function of a complex matrix to perform the conjugate transpose:

# Dense.Matrix.Z.(conj a |> transpose);;
- : Dense.Matrix.Z.mat =

         C0       C1
R0 (1, -0i) (2, -1i)
R1  (2, 1i) (3, -0i)

A theorem declares that if a matrix is hermitian, then for all complex vectors $x$, $x^HAx$ is real, and every eigenvalue is real.

# Linalg.Z.eigvals a;;
- : Linalg.Z.mat =

                          C0                      C1
R0 (-0.44949, -5.06745E-17i) (4.44949, 5.06745E-17i)

A related concept is the Unitary Matrix. A matrix $U$ is unitary if $U^HU=I$. The inverse and conjugate transpose of $U$ are the same. It can be compared to the orthogonal vectors in the real space.

Similarity Transformation and Diagonalisation

For a $nxn$ matrix A, and any invertible $nxn$ matrix M, the matrix $B = M^{-1}AM$ is similar to A. One important property is that similar matrices share the same eigenvalues. The intuition is that, think of M as the change of basis matrix, and A itself is a linear transformation, so $M^{-1}AM$ means changing the basis first, applying the linear transformation, and then change the basis back. Therefore, changing from A to B actually changes the linear transformation using one set of basis to another.

In a three dimensional space, if we can change using three random vectors as the basis of linear transformation to using the standard basis $[1, 0, 0]$, $[0, 1, 0]$, $[0, 0, 1]$, the related problem can be greatly simplified. Finding the suitable similar matrix is thus important in simplifying the calculation in many scientific and engineering problems.

One possible kind of simplification is to find a triangular matrix as similar. The Schur’s Lemma declares that A can be decomposed into $UTU^{-1}$ where $U$ is a unitary function, and T is an upper triangular matrix.

# let a = Dense.Matrix.Z.of_array [|{re=1.; im=0.}; {re=1.; im=0.}; {re=(-2.); im=0.}; {re=3.; im=0.}|] 2 2;;
val a : Dense.Matrix.Z.mat =

         C0      C1
R0  (1, 0i) (1, 0i)
R1 (-2, 0i) (3, 0i)

# let t, u, eigvals = Linalg.Z.schur a;;
val t : Linalg.Z.mat =

        C0                     C1
R0 (2, 1i) (-2.21283, -0.321545i)
R1 (0, 0i)               (2, -1i)

val u : Linalg.Z.mat =

                       C0                     C1
R0 (-0.408248, 0.408248i) (-0.775215, 0.256337i)
R1        (-0.816497, 0i)  (0.515776, 0.259439i)

val eigvals : Linalg.Z.mat =
        C0       C1
R0 (2, 1i) (2, -1i)

The returned result t is apparent a upper triangular matrix, and the u can be verified to be a unitary matrix:

# Dense.Matrix.Z.(dot u (conj u |> transpose));;
- : Dense.Matrix.Z.mat =

                             C0                         C1
R0           (1, -1.70126E-17i) (7.12478E-17, 7.4864E-17i)
R1 (7.12478E-17, -8.63331E-17i)          (1, 7.21217E-18i)

Another very important similar transformation is diagonalisation. Suppose A has $n$ linear-independent eigenvectors, and make them the columns of a matrix Q, then $Q^{-1}AQ$ is a diagonal matrix $\Lambda$, and the eigenvalues of A are the diagonal elements of $\Lambda$. It’s inverse $A = Q\Lambda~Q^{-1}$ is called Eigendecomposition. Analysing A’s diagonal similar matrix $\Lambda$ instead of A itself can greatly simplify the problem.

Not every matrix can be diagonalised. If any two of the $n$ eigenvalues of A are not the same, then its $n$ eigenvectors are linear-independent ana thus A can be diagonalised. Specifically, every real symmetric matrix can be diagonalised by an orthogonal matrix. Or put into the complex space, every hermitian matrix can be diagonalised by a unitary matrix.

Positive Definite Matrices

Positive Definiteness

In this section we introduce the concept of Positive Definite Matrix, which unifies the three most basic ideas in linear algebra: pivots, determinants, and eigenvalues.

A matrix is called Positive Definite if it is symmetric and that $x^TAx > 0$ for all non-zero vectors $x$. There are several necessary and sufficient condition for testing if a symmetric matrix A is positive definite:

  1. $x^TAx>0$ for all non-zero real vectors x
  2. $\lambda_i >0$ for all eigenvalues $\lambda_i$ of A
  3. all the upper left matrices have positive determinants
  4. all the pivots without row exchange satisfy $d > 0$
  5. there exists invertible matrix B so that A=B^TB

For the last condition, we can use the Cholesky Decomposition to find the matrix B. It decompose a Hermitian positive definite matrix into the product of a lower triangular matrix and its conjugate transpose $LL^H$:

# let a = Mat.of_array [|4.;12.;-16.;12.;37.;-43.;-16.;-43.;98.|] 3 3;;
val a : Mat.mat =

    C0  C1  C2
R0   4  12 -16
R1  12  37 -43
R2 -16 -43  98

# let l = Linalg.D.chol a;;
val l : Linalg.D.mat =
   C0 C1 C2
R0  2  6 -8
R1  0  1  5
R2  0  0  3

# Mat.(dot (transpose l) l);;
- : Mat.mat =

    C0  C1  C2
R0   4  12 -16
R1  12  37 -43
R2 -16 -43  98

If in $Ax=b$ we know that $A$ is hermitian and positive definite, then we can instead solve $L^Lx=b$. As we have seen previously, solving linear system that expressed with triangular matrices is easy. The Cholesky decomposition is more efficient than the LU decomposition.

In the Linear Algebra module, we use is_posdef function to do this test. If you look at the code in Owl, it is implemented by checking if the Cholesky decomposition can be performed on the input matrix.

# let is_pos =
    let a = Mat.of_array [|4.;12.;-16.;12.;37.;-43.;-16.;-43.;98.|] 3 3 in
    Linalg.D.is_posdef a;;
val is_pos : bool = true

The definition of semi-positive definite is similar, only that it allows the “equals to zero” part. For example, $x^TAx \leq 0$ for all non-zero real vectors x.

The pattern $Ax=\lambda~Mx$ exists in many engineering analysis problems. If $A$ and $M$ are positive definite, this pattern is parallel to the $Ax=\lambda~x$ where $\lambda > 0$. For example, a linear system $y’=Ax$ where $x = [x_1, x_2, \ldots, x_n]$ and $y’ = [\frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt}]$. We will see such an example in the Ordinary Differential Equation chapter. In a linearised differential equations the matrix A is the Jacobian matrix. The eigenvalues decides if the system is stable or not. A theorem declares that this system is stable if and only if there exists positive and definite matrix $V$ so that $-(VA+A^TV)$ is semi-positive definite.

Singular Value Decomposition

The singular value decomposition (SVD) is among the most important matrix factorizations of the computational era. The SVD provides a numerically stable matrix decomposition that can be used for a variety of purposes and is guaranteed to exist.

Any m by n matrix can be factorised in the form:

\(A=U\Sigma~V^T\) {#eq:linear-algebra:svg}

Here $U$ is is a $m\times~m$ matrix. Its columns are the eigenvectors of $AA^T$. Similarly, $V$ is a $n\times~n$ matrix, and the columns of V are eigenvectors of $A^TA$. The $r$ (rank of A) singular value on the diagonal of the $m\times~n$ diagonal matrix $\Sigma$ are the square roots of the nonzero eigenvalues of both $AA^T$ and $A^TA$. It’s close related with eigenvector factorisation of a positive definite matrix. For a positive definite matrix, the SVD factorisation is the same as the $Q\Lambda~Q^T$.

The SVD has a profound intuition. A matrix $A$ represents a linear transformation. SVD states that, any such linear transformation, can be decomposed into three simple transformation: a rotation ($V$), a scaling transformation ($\Sigma$), and another rotation ($U$). These three transformations are much easier to analyse than a random transformation $A$. After applying $A$ to a domain, the columns of $V$ is and a set of orthonormal basis in the original domain, and columns of $U$ is the new set of orthonormal basis of the domain that is transferred after applying $A$. The $\Sigma$ diagonal matrix contains the scaling factors on different dimensions, and a singular value in $\Sigma$ thus represents the significance of that certain dimension in the linear space. A small singular value indicates that the information contained in a matrix is somehow redundant and can be compressed/removed without affecting the information carried in this matrix. This is why SVD can be used for Principal Component Analysis (PCA), as we will show in the NLP chapter later in this book.

We can use the svd function to perform this factorisation. Let’s use the positive definite matrix as an example:

# let a = Mat.of_array [|4.;12.;-16.;12.;37.;-43.;-16.;-43.;98.|] 3 3;;
val a : Mat.mat =

    C0  C1  C2
R0   4  12 -16
R1  12  37 -43
R2 -16 -43  98

# let u, s, vt = Linalg.D.svd ~thin:false a;;
val u : Linalg.D.mat =

          C0        C1        C2
R0 -0.163007 -0.212727  0.963419
R1 -0.457324 -0.848952  -0.26483
R2  0.874233 -0.483764 0.0410998

val s : Linalg.D.mat =

        C0     C1       C2
R0 123.477 15.504 0.018805

val vt : Linalg.D.mat =

          C0        C1        C2
R0 -0.163007 -0.457324  0.874233
R1 -0.212727 -0.848952 -0.483764
R2  0.963419  -0.26483 0.0410998

Note that the diagonal matrix s is represented as a vector. We can extend it with

# let s = Mat.diagm s;;
val s : Mat.mat =

        C0     C1       C2
R0 123.477      0        0
R1       0 15.504        0
R2       0      0 0.018805

However, it is only possible when we know that the original diagonal matrix is square, otherwise the vector contains the $min(m, n)$ diagonal elements.

Also, we can find to the eigenvectors of $AA^T$ to verify that it equals to the eigenvector factorisation.

# Linalg.D.eig Mat.(dot a (transpose a));;
- : Linalg.Z.mat * Linalg.Z.mat =
                C0              C1             C2
R0  (0.163007, 0i)  (0.963419, 0i) (0.212727, 0i)
R1  (0.457324, 0i)  (-0.26483, 0i) (0.848952, 0i)
R2 (-0.874233, 0i) (0.0410998, 0i) (0.483764, 0i)

              C0                C1            C2
R0 (15246.6, 0i) (0.000353627, 0i) (240.373, 0i)

In this example we ues the thin parameter. By default, the svd function performs a reduced SVD, where $\Sigma$ is a $m\times~m$ matrix and $V^T$ is a m by n matrix.

Besides, svd, we also provide svdvals that only returns the singular values, i.e. the vector of diagonal elements. The function gsvd performs a generalised SVD. gsvd x y -> (u, v, q, d1, d2, r) computes the generalised SVD of a pair of general rectangular matrices x and y. d1 and d2 contain the generalised singular value pairs of x and y. The shape of x is m x n and the shape of y is p x n. Here is an example:

# let x = Mat.uniform 5 5;;
val x : Mat.mat =

         C0       C1       C2       C3        C4
R0 0.548998 0.623231  0.95821 0.440292  0.551542
R1 0.406659 0.631188 0.434482 0.519169 0.0841121
R2 0.439047 0.459974 0.767078 0.148038  0.445326
R3 0.307424 0.129056 0.998469 0.163971  0.718515
R4 0.474817 0.176199 0.316661 0.476701  0.138534

# let y = Mat.uniform 2 5;;
val y : Mat.mat =

         C0       C1       C2       C3         C4
R0 0.523882 0.150938 0.718397   0.1573 0.00542669
R1 0.714052 0.874704 0.436799 0.198898   0.406196

# let u, v, q, d1, d2, r = Linalg.D.gsvd x y;;
val u : Linalg.D.mat =

          C0        C1        C2        C3        C4
R0 -0.385416 -0.294725 -0.398047 0.0383079 -0.777614
R1   0.18222 -0.404037 -0.754063 -0.206208  0.438653
R2 -0.380469 0.0913876 -0.199462  0.847599  0.297795
R3 -0.807427 -0.147819  0.194202 -0.418909  0.336172
R4  0.146816 -0.848345  0.442095  0.249201 0.0347409

val v : Linalg.D.mat =

         C0        C1
R0 0.558969  0.829189
R1 0.829189 -0.558969

val q : Linalg.D.mat =

          C0        C1        C2        C3        C4
R0 -0.436432 -0.169817  0.642272 -0.603428 0.0636394
R1 -0.124923  0.407939 -0.376937 -0.494889 -0.656494
R2  0.400859  0.207482 -0.268507 -0.567199  0.634391
R3 -0.283012 -0.758558 -0.559553 -0.173745 0.0347457
R4  0.743733 -0.431612  0.245375 -0.197629  -0.40163

val d1 : Linalg.D.mat =

   C0 C1 C2       C3        C4
R0  1  0  0        0         0
R1  0  1  0        0         0
R2  0  0  1        0         0
R3  0  0  0 0.319964         0
R4  0  0  0        0 0.0583879

val d2 : Linalg.D.mat =

   C0 C1 C2      C3       C4
R0  0  0  0 0.94743        0
R1  0  0  0       0 0.998294

val r : Linalg.D.mat =

         C0       C1        C2       C3         C4
R0 -0.91393 0.196148 0.0738038  1.45659  -0.268024
R1        0 0.463548  0.286501  1.38499 -0.0595374
R2        0        0  0.346057 0.954629   0.167467
R3        0        0         0 -1.56104  -0.124984
R4        0        0         0        0   0.555067

# Mat.(u *@ d1 *@ r *@ transpose q =~ x);;
- : bool = true
# Mat.(v *@ d2 *@ r *@ transpose q =~ y);;
- : bool = true

The SVD is not only important linear algebra concept, but also has a wide and growing applications. For example, the Moore-Penrose pseudo-inverse that works for non-invertible matrix can be implemented efficiently using SVD (we provide pinv function in the linear algebra module for the pseudo inverse). It can also be used for information compression such as in image processing. As we have said, in the Natural Language Processing chapter we will see how SVD plays a crucial role in the language processing field to perform principal component analysis.

Internal: CBLAS and LAPACKE

This section is for those of you who are eager for more low level information. The BLAS (Basic Linear Algebara Subprogramms) is a specification that describes a set of low-level routines for common linear algebra operation. The LAPACKE contains more linear algebra routines, such as solving linear systems and matrix factorisations, etc. Efficient implementation of these function has been practices for a long time in many softwares. Interfacing to them can provide easy access to high performance routines.

Low-level Interface to CBLAS & LAPACKE

Owl has implemented the full interface to CBLAS and LAPACKE. Comparing to Julia which chooses to interface to BLAS/LAPACK, you might notice the extra C in CBLAS and E in LAPACKE because they are the corresponding C-interface of Fortran implementations. It is often believed that C-interface may introduce some extra overhead. However, it turns out that we cannot really notice any difference at all in practice when dealing with medium or large problems.

The functions in Owl_cblas and Owl_lapacke_generated are very low-level, e.g., you need to deal with calculating parameters, allocating workspace, post-processing results, and many other tedious details. You do not really want to use them directly unless you have enough background in numerical analysis and chase after the performance. So for example, the LU factorisation is performed using the sgetrf or dgetrf function in the Owl_lapacke_generated module, the signature of which look like this:

val sgetrf:  layout:int -> m:int -> n:int -> a:float ptr -> lda:int -> ipiv:int32 ptr -> int

Instead of worrying about all these parameters, the getrf function in the Owl_lapacke module provides interface that are more straightforward:

val getrf : a:('a, 'b) t -> ('a, 'b) t * (int32, int32_elt) t

These low-level functions provides more general access for users. If this still looks a bit unfamiliar to your, in the Linalg module we have:

val lu : ('a, 'b) t -> ('a, 'b) t * ('a, 'b) t * (int32, int32_elt) t

Here the function lu x -> (l, u, ipiv) calculates LU decomposition of input matrix x, and returns the L, U matrix together with the pivoting index. In practice, you should always use Linalg module which gives you a high-level wrapper for frequently used functions.

Besides these function, the linear algebra module also provides some helper functions. For example, the peakflops ~n () function returns the peak number of float point operations using Owl_cblas_basic.dgemm function. The default matrix size is 2000 x 2000, but the user can change this by setting n arguments.

Sparse Matrices

What we have mentioned so far are dense matrix. But when the elements are sparsely distributed in the matrix, such as the identity matrix, the sparse structure might be more efficient.

The most intuitive way to represent a sparse matrix is to use the (row, column, value) triplet. The tuples can be pre-sorted according to row and column values so enhance random access time. Another commonly used format is the Compressed Sparse Row (CSR) format. It is similar to the triplet format with row indices compressed. Here is an example. Suppose that we need to represent the matrix below in CSR:

$\left[\begin{matrix} 10 & 0 & 9 & 8 \ 0 & 7 & 0 & 0 \ 6 & 5 & 0 & 0 \end{matrix} \right]$

The column indicies and values are still the same, but now the length of row indicies is the same as that of the row number. Each element shows the index of the first value in that row in the whole value vector, as shown in [@tbl:linear-algebra:csr].

| Row | 0 | 1 | 4 |
| — | — | —— | —- |
| Column | 0 | 2, 3, 1 | 0, 1 |
| Value | 10 | 9, 8, 7 | 6, 5 |
CSR format illustration {#tbl:linear-algebra:csr}

The benefit of CSR is that it is more efficient at accessing row-vectors or row operations. Another data structure Compressed Sparse Column (CSC) is similar, except that it compresses column vector instead of row.

The sparse matrix is proivded in the Sparse.Matrix module, and also support the four types of number in the S, D, C, and Z submodules. Currently we interfaces to the Eigen library to use its sparse matrix representation.


This chapter gives an overview of the topic of Linear Algebra. Starting from the basic vector and matrix, we briefly introduce several key ideas in this field, including Guassian elimination, vector spaces, determinatns, Eigenvalues, Positive Definite Matrices, etc. We explain these topics with example and code using Owl. Sure enough, there is no way to cover all these topics in one chapter. We refer to classic textbooks for more inforation. In the end, we introduce how the linear algebra module is implemented in a numerical library such as Owl. We close the discussion with a brief explanation of the sparse matrix and the representation formats used.