Back

# Compiler Backends

For a numerical library, it is always beneficial and a challenge to extend to multiple execution backends. We have seen how we support accelerators such as GPU by utilising symbolic representation and computation graph standard such as ONNX. In this chapter we introduce how Owl can be used on more edge-oriented backends, including JavaScript and MirageOS. We also introduce the base library in Owl, since this pure OCaml library is built to supported these backends.

## Base Library

Before we start, we need to understand how Owl enables compiling to multiple backends by providing different implementations. The Owl framework, as well as many of its external libraries, is actually divided to two parts: a Base library and a Core library. The base library is implemented with pure OCaml. For some backends such as JavaScript, we can only use the functions implemented in OCaml.

You may wonder how much we will be limited by the Base. Fortunately, the most advanced functions in Owl are often implemented in pure OCaml and they live in the Base, which includes e.g. algorithmic differentiation, optimisation, even neural networks and many others. Here is the structure of the core functor stack in Owl:

Ndarray is the core building block in Owl. As we have described in the previous chapters how we use C code to push forward the performance of Owl computation. The base library aims to implements all the necessary functions as the core library ndarray module. The stack is implemented in such way that the user can switch between these two different implementation without the modules of higher layer. In the Owl functor stack, ndarray is used to support the CGraph module to provide lazy evaluation functionalities.

You might be wondering: where is the ndarray module then? Here we use Owl_base_algodiff_primal_ops module, which is simply a wrapper around the base ndarray module. It also includes a small number of Matrix and Linear Algebra functions. By providing this wrapper instead of using the Ndarray module directly, we can avoid mixing all the function in the ndarray module and makes it a large Goliath.

Next, the Algorithmic Differentiation can depend its computation module on normal Ndarray or its lazy version. For example, you can have an AD that relies on the normal single precision base ndarray module:

module AD = Owl_algodiff_generic.Make (Owl_base_algodiff_primal_ops.S)

Or it can build on an double precision lazy evaluated core ndarray module:

module CPU_Engine = Owl_computation_cpu_engine.Make (Owl_algodiff_primal_ops.D)
module AD = Owl_algodiff_generic.Make (CPU_Engine)

Going even higher, we have the advanced modules Optimisation and Neural Network modules. They are both based on the AD module. For example, the code below shows how we can build a neural graph module by layers fo functors from the base ndarray.

module G = Owl_neural_graph.Make
(Owl_neural_neuron.Make
(Owl_optimise_generic.Make
(Owl_algodiff_generic.Make
(Owl_base_algodiff_primal_ops.S))))

Normally the users does not have to care about how these modules are constructed layer by layer, but understanding the functor stack and typing is nevertheless beneficial, especially when you are creating new modules that relies on the base ndarray module.

These examples show that once we have built a application with the core Ndarray module, we can then seamlessly switch it to base ndarray module without changing anything else. That means that all the code and examples we have seen so far can be used directly on different backends that requires pure implementation.

The base library is still an on-going work and there is still a lot to do. Though the Ndarray module is a large part in base library, there are other modules that also needs to be re-implemented in OCaml, such as Linear Algebra module. We need to add more functions such as the SVD factorisation. Even for the Ndarray itself we still cannot totally cover the core ndarray yet. Our strategy is that, we put most of the signature file in base library, and the core library signature file include its corresponding signature file from the base library, plus functions that are currently unique to core library. The target is to total coverage so that the core and base library provide exactly the same functions.

As can be expected, the pure OCaml implementation normally performs worse than the C code implemented version. For example, for the complex convolution, without the help of optimised routines from OpenBLAS ect., we can only provide the naive implementation that includes multiple for-loops. It’s performance is orders of magnitude slower than the C version. Currently our priority is to implement the functions themselves instead of caring about function optimisation, nor do we intend to out-perform C code with pure OCaml implementation.

## Backend: JavaScript

At first glance, JavaScript has very little to do with high-performance scientific computing. Then why Owl cares about it? One important reason is that browser is arguably the most widely deployed technology on various edge devices, e.g. mobile phones, tablets, laptops, and etc. More functionalities are being pushed from data centers to edge for reduced latency, better privacy and security. And JavaScript applications running in a browser are getting more complicated and powerful. Moreover, JavaScript interpreters are being increasingly optimised, and even relatively complicated computational tasks can run with reasonable performance.

This chapter uses two simple examples to demonstrate how to compile Owl applications into JavaScript code so that you can deploy the analytical code into browsers, using both native OCaml code and Facebook Reason. It additionally requires the use of dune. As you will see, this will make the compilation to JavaScript effortless.

### Use Native OCaml

We rely the tool js_of_ocaml to convert native OCaml code into JavaScript. Js_of_ocaml is a compiler from OCaml bytecode programs to JavaScript. The process can thus be divided into two phases: first, compile the OCaml source code into bytecode executables, and then apply the js_of_ocaml command to it. It support the core Bigarray module among most of the OCaml standard libraries. However, since the Sys module is not fully supported, we are careful to not use functions from this module in the base library.

We have described how Algorithm Differentiation plays a core role in the ecosystem of Owl, so now we use an example of AD to demonstrate how we convert a numerical programme into JavaScript code and then get executed.

The example comes from the Optimisation chapter, and is about optimise the mathematical function sin. The first step is writing down our application in OCaml as follows, then save it into a file demo.ml.


module AlgodiffD = Owl_algodiff_generic.Make (Owl_base_algodiff_primal_ops.D)
open AlgodiffD

let rec desc ?(eta=F 0.01) ?(eps=1e-6) f x =
let g = (diff f) x in
if (unpack_flt g) < eps then x
else desc ~eta ~eps f Maths.(x - eta * g)

let _ =
let f = Maths.sin in
let y = desc f (F 0.1) in
Owl_log.info "argmin f(x) = %g" (unpack_flt y)


The code is very simple: the desc defines a gradient descent algorithm, then we use desc to calculate the minimum value of Maths.sin function. In the end, we print out the result using Owl_log module’s info function. Note that we pass in the base Ndarray module to the AD functor to create an corresponding AD module.

In the second step, we need to create a dune file as follows. This file will instruct how the OCaml code will be first compiled into bytecode then converted into JavaScript by calling js_of_ocaml.

(executable
(name demo)
(modes byte js)
(libraries owl-base))

With these two files in the same folder, you can then simply run the following command in the terminal.

dune build demo.bc && js_of_ocaml _build/default/demo.bc

Or even better, since js_of_ocaml is natively supported by dune, so we can simply execute:

dune build

The command builds the application and generates a demo.bc.js in the _build/default/ folder. Finally, we can run the JavaScript using Node.js (or loading into a browser using an appropriate html page).

node _build/default/demo.bc.js

As a result, you should be able to see the output result shows a value that minimise the sin function, and should be similar to:

  2019-12-30 18:05:49.760 INFO : argmin f(x) = -1.5708

Even though we present a simple examples, you should keep in mind that the base library can be used to produce more complex and interactive browser applications.

Facebook Reason leverages OCaml as an backend to provide type safe JavaScript. It is gaining its momentum and becoming a popular choice of developing web applications. It actually uses another tool, BuckleScript, to convert the Reason/OCaml code to JavaScript. Since Reason is basically a syntax on top of OCaml, it is very straightforward to use Owl in Reason to develop advanced numerical applications.

In this example, we use reason code to manipulate multi-dimensional arrays, the core data structure in Owl. First, we save the following code into a reason file called demo.re. Note the the suffix is .re now. It includes several basic math and Ndarray operations in Owl.


open! Owl_base;

/* calculate math functions */
let x = Owl_base_maths.sin(5.);
Owl_log.info("Result is %f", x);

/* create random ndarray then print */
let y = Owl_base_dense_ndarray.D.uniform([|3,4,5|]);
Owl_base_dense_ndarray.D.set(y,[|1,1,1|],1.);
Owl_base_dense_ndarray.D.print(y);

/* take a slice */
let z = Owl_base_dense_ndarray.D.get_slice([[],[],[0,3]],y);
Owl_base_dense_ndarray.D.print(z);

The code above is simple, just creates a random ndarray, takes a slice, then prints them out. Owl library can be seamlessly used in Reason. Next, instead of using Reason’s own translation of this frontend syntax with bucklescript, we still turns to js_of_ocaml for help. Let’s look at the dune file, which turns out to be the same as that in the previous example.

  (executable
(name demo)
(modes js)
(libraries owl-base))

As in the previous example, you can then compile and run the code with following commands.

  dune build
node _build/default/demo.bc.js

As you can see, except that the code is written in different languages, the rest of the steps are identical in both example thanks to js_of_ocaml and dune.

## Backend: MirageOS

### MirageOS and Unikernel

Besides JavaScript, another choice of backend we aim to support is the MirageOS. It one approach to build unikernel. A unikernel is a specialised, single address space machine image constructed with library operating systems. Unlike normal virtual machine, it only contains a minimal set of libraries required for one application. It can run directly on a hypervisor or hardware without relying gon operating systems such as Linux and Windows. The unikernls is thus concise and secure, and extremely efficient for distributed and executed on either cloud or edge devices.

MirageOS is one solution to building unikernels. It utilises the high-level languages OCaml and a runtime to provide API for operating system functionalities. In using MirageOS, the users can think of the Xen hypervisor as a stable hardware platform, without worrying about the hardware details such as devices. Furthermore, since the Xen hypervisor is widely used in platforms such as Amazon EC2 and Rackspace Cloud, MirageOS-built unikernel can readily deployed on these platforms. Besides, benefiting from its efficiency and security, MirageOS also aims to form a core piece of the Nymote/MISO tool stack to power the Internet of Things.

Since MirageOS is based around the OCaml language, we can safely integrate the Owl library with it. To demonstrate how we use MirageOS as backend, we again uses the previous Algorithm Differentiation based optimisation example. Before we start, please make sure to follow the installation instruction. Let’s look at the code:

module A = Owl_algodiff_generic.Make (Owl_algodiff_primal_ops.S)
open A

let rec desc ?(eta=F 0.01) ?(eps=1e-6) f x =
let g = (diff f) x in
if (unpack_flt (Maths.abs g)) < eps then x
else desc ~eta ~eps f Maths.(x - eta * g)

let main () =
let f x = Maths.(pow x (F 3.) - (F 2.) * pow x (F 2.) + (F 2.)) in
let init = Stats.uniform_rvs ~a:0. ~b:10. in
let y = desc f (F init) in
Owl_log.info "argmin f(x) = %g" (unpack_flt y)

These part of code is mostly the same as before. By applying the the diff function of the algorithmic differentiation module, we use the gradient descent method to find the value that minimises the function $$x^3 - 2x^2 + 2$$. Then we need to add something different:

module GD = struct
let start = main (); Lwt.return_unit
end

Here the start is an entry point to the unikernel. It performs the normal OCaml function main, and the return a Lwt thread that will be evaluated to unit.

Explain LWT briefly. A concurrent programming library. Explain why using LWT in Mirage.

All the code above is written to a file called gd_owl.ml. To build a unikernel, next we need to define its configuration. In the same directory, we create a file called configure.ml:

open Mirage

let main =
foreign
~packages:[package "owl"]
"Gd_owl.GD" job

let () =
register "gd_owl" [main]

It’s not complex. First we need to open the Mirage module. Then we declare a value main (or you can name it any other name). It calls the foreign function to specify the configuration. First, in the package parameter, we declare that this unikernel requires Owl library. The next string parameter “Gd_owl.GD” specifies the name of the implementation file, and in that file the module GD that contains the start entry point. The third paramter job declares the type of devices required by a unikernel, such as network interfaces, network stacks, file systems, etc. Since here we only do the calculation, there is no extra device required, so the third parameter is a job. Finally, we register the unikernel entry file gd_owl with the main configuration value.

That’s all it takes for coding. Now we can take a look at the compiling part. MirageOS itself supports multiple backends. The crucial choice therefore is to decide which one to use at the beginning by using mirage configure. In the directory that holds the previous two files, you run mirage configure -t unix, and it configures to build the unikernel into a Unix ELF binary that can be directly executed. Or you can use mirage configure -t xen, then the resulting unikernel will use hypervisor backend like Xen or KVM. Either way, the unikernel runs as a virtual machine after starting up. In this example we choose to use Unix as backends. So we run:

mirage configure -t unix

This command generates a Makefile based on the configuration information. It includes all the building rules. Next, to make sure all the dependencies are installed, we need to run:

make depend

Finally, we can build the unikernels by simply running:

make

and it calls the mirage build command. As a result, now your current directory contains the _build/gd_owl.native executable, which is the unikernel we want. Executing it yields a similar result as before:

INFO : argmin f(x) = 1.33333

### Example: Neural Network

As a more complex example we have also built a simple neural network to perform the MNIST handwritten digits recognition task:

module N  = Owl_base_algodiff_primal_ops.S
module NN = Owl_neural_generic.Make (N)
open NN
open NN.Graph
open NN.Algodiff

let make_network input_shape =
input input_shape
|> lambda (fun x -> Maths.(x / F 256.))
|> fully_connected 25 ~act_typ:Activation.Relu
|> linear 10 ~act_typ:Activation.(Softmax 1)
|> get_network

This neural network has two hidden layer, has a small weight size (146KB), and works well in testing (92% accuracy). We can right the the weight into a text file.

This file is named simple_mnist.ml, and similar to previous example, we can add a unikernel entry point function in the file:

module Main = struct
let start = infer (); Lwt.return_unit
end

Here the infer function creates a neural network, loads the weight, and then performs inference on an input image. We also need an configuration file. Again, it’s mostly the same:

open Mirage

let main =
foreign
~packages:[package "owl-base"]
"Simple_mnist.Main" job

let () =
register "Simple_mnist" [main]

All the code are included in this gist. Once compiled to MirageOS unikernel with unix backends, the generated binary is 10MB. You can also try compiling this application to JavaScript.

By these examples we show that the Owl library can be readily deployed into unikernels via MirageOS. The numerical functionalities can then greatly enhance the express ability of possible OCaml-MirageOS applications. Of course, we cannot cover all the important topics about MirageOS, please refer to the documentation of MirageOS abd Xen Hypervisor for more information.

## Evaluation

In the evaluation section we mainly compare the performance of different backends we use. Specifically, we observe three representative groups of operations: (1) map and fold operations on ndarray; (2) using gradient descent, a common numerical computing subroutine, to get $$argmin$$ of a certain function; (3) conducting inference on complex DNNs, including SqueezeNet and a VGG-like convolution network. The evaluations are conducted on a ThinkPad T460S laptop with Ubuntu 16.04 operating system. It has an Intel Core i5-6200U CPU and 12GB RAM.

The OCaml compiler can produce two kinds of executables: bytecode and native. Native executables are compiled specifically for an architecture and are generally faster, while bytecode executables have the advantage of being portable.

For JavaScript, we use the js_of_ocaml approach as described in the previous sections. Note that for convenience we refer to the pure implementation of OCaml and the mix implementation of OCaml and C as base-lib and owl-lib separately, but they are in fact all included in the Owl library. For Mirage compilation, we use both libraries.

fig. 2(a-b) show the performance of map and fold operations on ndarray. We use simple functions such as plus and multiplication on 1-d (size $$< 1,000$$) and 2-d arrays. The log-log relationship between total size of ndarray and the time each operation takes keeps linear. For both operations, owl-lib is faster than base-lib, and native executables outperform bytecode ones. The performance of Mirage executives is close to that of native code. Generally JavaScript runs the slowest, but note how the performance gap between JavaScript and the others converges when the ndarray size grows. For fold operation, JavaScript even runs faster than bytecode when size is sufficiently large.

In fig. 3, we want to investigate if the above observations still hold in more complex numerical computation. We choose to use a Gradient Descent algorithm to find the value that locally minimise a function. We choose the initial value randomly between $$[0, 10]$$. For both $$sin(x)$$ and $$x^3 -2x^2 + 2$$, we can see that JavaScript runs the slowest, but this time the base-lib slightly outperforms owl-lib.

We further compare the performance of DNN, which requires large amount of computation. We compare SqueezeNet and a VGG-like convolution network. They have different sizes of weight and networks structure complexities.

Table 1: Inference Speed of Deep Neural Networks
Time (ms) VGG SqueezeNet
owl-native 7.96 ($$\pm$$ 0.93) 196.26($$\pm$$ 1.12)
owl-byte 9.87 ($$\pm$$ 0.74) 218.99($$\pm$$ 9.05)
base-native 792.56($$\pm$$ 19.95) 14470.97 ($$\pm$$ 368.03)
base-byte 2783.33($$\pm$$ 76.08) 50294.93 ($$\pm$$ 1315.28)
mirage-owl 8.09($$\pm$$ 0.08) 190.26($$\pm$$ 0.89)
mirage-base 743.18 ($$\pm$$ 13.29) 13478.53 ($$\pm$$ 13.29)
JavaScript 4325.50($$\pm$$ 447.22) 65545.75 ($$\pm$$ 629.10)

tbl. 1 shows that, though the performance difference between owl-lib and base-lib is not obvious, the former is much better. So is the difference between native and bytecode for base-lib. JavaScript is still the slowest. The core computation required for DNN inference is the convolution operation. Its implementation efficiency is the key to these differences. Current we are working on improving its implementation in base-lib.

We have also conducted the same evaluation experiments on RaspberryPi 3 Model B. fig. 2(c) shows the performance of fold operation on ndarray. Besides the fact that all backends runs about one order of magnitude slower than that on the laptop, previous observations still hold. This figure also implies that, on resource-limited devices such as RaspberryPi, the key difference is between native code and bytecode, instead of owl-lib and base-lib for this operation.

Table 2: Size of executables generated by backends
Size (KB) native bytecode Mirage JavaScript
base 2,437 4,298 4,602 739
native 14,875 13,102 16,987 -

Finally, we also briefly compare the size of executables generated by different backends. We take the SqueezeNet for example, and the results are shown in tbl. 2. It can be seen that owl-lib executives have larger size compared to base-lib ones, and JavaScript code has the smallest file size. There does not exist a dominant method of deployment for all these backends. It is thus imperative to choose suitable backend according to deployment environment.