26 November 2016

Four reasons why HelloData is written in Clojure

The HelloData platform aims to connect smart meters, apps and consumers with one another, while still ensuring consumers retain ownership and control of their data. Technically speaking, this creates challenges when it comes to data streams (scalability), security (access rights; protection against malicious users), and the interface between the source (e.g. the smart meter) and services (the apps that use the data).

Because the first version of HelloData, written in Ruby on Rails, had performance issues, we decided to (re)build the platform in Clojure.

But why Clojure? In this blog post I explain four reasons behind this decision:

REASON 1. Clojure is a dialect of Lisp

I first came across Lisp at the outset of my career as a software developer in the article Beating the Averages by Paul Graham (2001). In it, Graham describes how his startup (Viaweb, later Yahoo Store) became successful thanks to a "secret weapon".

This secret weapon was Lisp: a family of programming languages that has its origins back in 1958 and is characterized by extensive use of open and close parentheses.

[xkcd comic: "I've just received word that the Emperor has dissolved the MIT computer science program permanently."]

Graham’s article convinced me that Lisp is the most powerful model of computation in existence. And that by seeing things through my Java specs (Java is one of the world’s most widely used programming languages, and the one with which I, too, started out) I had been suffering from the ‘Blub Paradox’: as a programmer I always thought in the programming language Blub, and so could only understand languages that were less powerful than Blub. My Blub-induced mindset had therefore led me to dismiss any language more powerful than Blub, such as Lisp, as gibberish.

And so began my quest to learn Lisp

I filled my evenings and weekends with mind-blowing exercises that rewired my thinking. Initially, I had to get used to the parentheses. But then... I got it!¹

The power of Lisp lies in parentheses

The fundamental data structure in Lisp (short for: LISt Processor) is the list. Not only does the data that the Lisp programmer uses consist of lists enclosed by parentheses; so, too, does the code itself. This means that code can be treated as data, and data as code. This is called homoiconicity.

Homoiconicity makes it possible to create transformations out of source code, to write programs that create programs (in Lisp these programs are known as macros), and to introduce new syntax. This is what makes Lisp so powerful, and why Paul Graham described it as the "secret weapon".
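
A minimal sketch of what this looks like in Clojure. The names `form`, `product-form` and the `unless` macro are made up for illustration (Clojure's own `when-not` already covers the same ground); none of this is taken from the HelloData code base.

```clojure
;; Code is data: quoting a form yields an ordinary list that can be inspected.
(def form '(+ 1 2 3 4))

(first form)   ; => +          (a symbol)
(rest form)    ; => (1 2 3 4)  (a list of numbers)
(eval form)    ; => 10         (the same list, now run as code)

;; Data is code: build a new form by swapping the operator, then run it.
(def product-form (cons '* (rest form)))
(eval product-form)   ; => 24

;; A macro is a function from code to code that runs before evaluation.
;; `unless` is a hypothetical name chosen for this example.
(defmacro unless [condition & body]
  `(if ~condition nil (do ~@body)))

(unless false :ran)                    ; => :ran
(macroexpand-1 '(unless false :ran))   ; => (if false nil (do :ran))
```

The macro never evaluates its body itself; it only rearranges the list it receives into an `if` form, which is what introducing "new syntax" amounts to here.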

In the context of HelloData, we make use of Lisp’s power in two ways:

Within Clojure, we’re trying to write our ‘own language’ that describes our domain. The flexibility of the abstractions means that we are scarcely, if at all, hampered by the limitations of a programming language.
This same flexibility means it’s possible to integrate features of other languages into Lisp, so that they are available as libraries (code shared by others, so that developers don’t have to write it themselves). An example is goroutines from the Go programming language, which can be introduced into Clojure via the core.async library. The difference from Go itself is that, in Clojure, such features do not have to be built into the language.

So is everything with Lisp always just beer and skittles?

No.

Precisely because Lisp is so powerful and flexible, we also have the phenomenon sometimes referred to as the Curse of Lisp (for more on this, see Rudolf Winestock’s essay The Lisp Curse): because Lisp is so easy to extend, there are also many language extensions. The result is many different dialects and libraries that are only partially and poorly documented, and really only suitable for the specific domain of their creator.

Clojure is a Lisp dialect that is less affected by the Curse of Lisp, as in Clojure the consistency of the core libraries and the standard means of doing things are guarded by its author, Rich Hickey, and his company Cognitect.

What’s more, Clojure has access to the Java Virtual Machine (JVM) and, in the form of ClojureScript, to JavaScript, on which many standard libraries are available. Which neatly brings us to the second reason for opting for Clojure.

REASON 2. Clojure runs on the JVM and JavaScript runtime

The JVM is a platform-independent environment (it runs on Linux, Windows, macOS, etc.) for executing Java bytecode. It has been around since the 1990s and has been optimized by many different people ever since.

On the JVM you have access to a vast number of libraries. From web servers to user interfaces — you name it and there’s a library for it. Clojure is converted into Java bytecode, and thus benefits from the stability, popularity, speed and safety of the JVM.

Libraries on the JVM are simple to use from Clojure. For example, in our registration flow we wanted to use Macaroons: authorization credentials described in this paper by Google. We didn’t have to implement Macaroons ourselves, because a Java library already existed which, aside from one spelling mistake, we could easily use.
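
To give an impression of how little ceremony Java interop needs, here is a small sketch that uses only classes from the Java standard library (not the Macaroons library mentioned above, whose API is not shown in this post). It assumes Java 8 or later for `java.time`.

```clojure
(import '(java.util UUID)
        '(java.time LocalDate))

;; Static method call: ClassName/method
(def id (UUID/randomUUID))

;; Instance method call: (.method object args)
(def post-date (LocalDate/of 2016 11 26))
(.getYear post-date)            ; => 2016
(str (.plusDays post-date 7))   ; => "2016-12-03"

;; Constructor call: (ClassName. args)
(def sb (StringBuilder. "Hello"))
(.append sb "Data")
(str sb)                        ; => "HelloData"
```

A third-party Java library is used in the same way once it is on the classpath.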

ClojureScript does for JavaScript what Clojure does for the JVM. JavaScript is one of the world’s most widely used programming languages and runs in practically every browser.

What’s in a name?

Though their names may be very similar, Java and JavaScript have about as much in common as a car and a carp. Clojure and ClojureScript, on the other hand, are the same language. This means that, as long as you don’t use libraries specifically intended for the host, you can reuse Clojure code from the backend on the frontend. This lets you write an entire application in Clojure and, for example, use the same validation code in both the frontend and backend.
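
As an illustration, a validation function like the one below uses only functions that exist in both Clojure and ClojureScript, so it could live in a shared `.cljc` file. The namespace, the field names and the rules are hypothetical and only meant to show the idea.

```clojure
(ns hellodata.validation   ; hypothetical namespace
  (:require [clojure.string :as str]))

(defn validate-registration
  "Returns a map of field -> error message; the map is empty when the input is valid."
  [{:keys [email meter-id]}]
  (cond-> {}
    ;; Deliberately naive e-mail check, just for the example.
    (not (and (string? email) (str/includes? email "@")))
    (assoc :email "must contain an @")

    ;; str/blank? is true for nil, "" and whitespace-only strings.
    (str/blank? meter-id)
    (assoc :meter-id "is required")))

(validate-registration {:email "a@example.com" :meter-id "E123"})
;; => {}
(validate-registration {:email "nope" :meter-id ""})
;; => {:email "must contain an @", :meter-id "is required"}
```

The frontend can call this to give immediate feedback in a form, and the backend can call the very same function before storing anything.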

Best thing about ClojureScript? The unique development experience.

Normally speaking, the development of websites involves the following two steps:

Adjust code.
Refresh the browser to see the changes.

Step 1 is unavoidable, but Step 2 is tedious and slows things down. The developer has to perform an extra action and the website loses its state. If, for example, a few clicks have been made and fields filled in on the website for testing, then after a browser refresh the developer is back to square one.

The Figwheel library offers a solution that makes this continual refreshing unnecessary, as it ensures that the code that has been modified is sent directly to the browser. If you have the browser displayed on the left of your screen and the code you are adapting on the right, when you make a code adjustment on the right, you immediately see on the left its effect on the website. In this video, Figwheel’s creator gives a demo:

This perfectly describes our development experience at HelloData and is one of the reasons why the development team always looks so cheerful.

Although Clojure runs on the JVM, Clojure code does not look at all like Java, the most popular programming language on the JVM. This is partly due to the parentheses, but mainly because of Clojure’s data orientation, which is the subject of the third reason.

REASON 3. Clojure is data oriented

In Clojure much less code is needed to say the same thing as in Java. The code that is written is only about the problem it is solving. Syntax, design patterns and other ceremony that are not about the problem disappear.

This is because Clojure treats data as data.

Treating data as data: what does that mean? Programmers work with data. In many other programming languages, Java for example, data is not treated as data. Before it is used, the data is first transformed into a specific data structure created by the programmer (a class, for example) that serves only that one use case, is difficult to extend, and makes it difficult to create another implementation.

In Clojure the programmer often just uses Clojure’s built-in data structures (lists, vectors, hash maps and sets).

In practice this means that the programmer transforms these data structures with the many general-purpose functions that are freely available.

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures. – Alan Perlis (Epigrams on Programming)
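
A small sketch of what this looks like in practice. The meter readings and their field names are invented for the example; the functions used (`reduce`, `update`, `fnil`, `filter`, `map`, `assoc`) are all part of clojure.core.

```clojure
;; Meter readings as plain maps in a vector; no class definitions needed.
(def readings
  [{:meter "A" :kwh 1.5}
   {:meter "B" :kwh 0.7}
   {:meter "A" :kwh 2.0}])

;; Total consumption per meter, using only general-purpose functions.
(defn total-per-meter [rs]
  (reduce (fn [totals {:keys [meter kwh]}]
            ;; (fnil + 0) treats a missing total as 0
            (update totals meter (fnil + 0) kwh))
          {}
          rs))

(total-per-meter readings)
;; => {"A" 3.5, "B" 0.7}

;; The same core functions work on any map or vector.
(map :kwh (filter #(= "A" (:meter %)) readings))
;; => (1.5 2.0)

;; "Extending" a reading is just adding a key.
(assoc (first readings) :unit "kWh")
;; => {:meter "A", :kwh 1.5, :unit "kWh"}
```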

To drive the point home, I recommend watching a few minutes of the Clojure Made Simple video below, from 51 minutes and 15 seconds onwards. It contains a rant by Rich Hickey, the creator of Clojure, in which he compares a specific Java data structure with the same representation in Clojure.

Clojure’s data-orientation (treatment of data as data) is – besides the level of abstraction that can be reached – the reason that Clojure code bases are relatively small. Smaller programs have fewer bugs. And almost all code is about the problem that is solved. Not about something else. This makes Clojure code simple and robust.

So, its flexibility, the host platforms and the treatment of data as data, are three reasons to use Clojure. Another important point is that Clojure is made for parallel and concurrent programming.

REASON 4. Clojure embraces concurrent programming

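Moore’s Law is often summarized as: the performance of processors doubles every 18 months. (Strictly speaking, the law says that the number of transistors on a chip doubles roughly every two years.) Taking physical limitations into account, the picture becomes this: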

[Figure: Moore's law. Source: IEEE Computer Society]

In the graph above, we see how the limit on processor performance is reached. Right now, technologies are being developed to make computers even faster, namely:

1. More processors on a chip (multi-core processors, see bottom line of graph);
2. Utilizing multiple computers, for example in the cloud (distributed programming).

The future of programming is thus multi-core and distributed².

To make the most of multi-core processors and distributed machines, software developers should use concurrent programming (writing code that does things in parallel). After all, when source code is sequential, you can’t run it on multiple computers or processors simultaneously.

The problem with concurrent programming...?

Quite simply, it’s difficult to do in most programming languages.

Concurrent programming is tricky due to so-called ‘race conditions’. Race conditions occur when two processes simultaneously want to read or write shared data and the sequence or timing of this is uncontrollable. They are the cause of many bugs and often difficult to trace.

Race conditions manifest themselves as errors that occur only occasionally.

One way to avoid race conditions is to mark critical sections: parts of the source code that may only be executed by one process at a time. But programming with critical sections has its own problems. Take, for example, deadlocks, where two processes wait for each other and the software freezes. Or starvation, where a process never gets its turn and is simply kept waiting. On top of these problems, the code inside a critical section cannot be executed in parallel.

The creator of Clojure says in his presentation, Clojure Concurrency, that we can learn two things from a famous Java book with techniques for concurrent programming:

1. Concurrent programming in Java is incredibly difficult.
2. The immutability (unchangeability) of values is repeatedly cited as the solution.

Race conditions are caused by changes: one process wants to write a value at precisely the moment that another process is reading it; or two processes want to write a value at the same time. But what if assigned values were immutable? Then there’d be no more racing.

Immutability of values makes critical sections redundant and rules out race conditions on those values. So-called ‘functional programming languages’ are based on this principle. Just as with mathematical functions, a function in a functional programming language always returns the same result for the same input. In non-functional programming languages you can’t be sure about outcomes, because other processes or side effects can change a value in the background. In functional programming languages you can be certain (this is known as ‘referential transparency’), so it doesn’t matter in what order you execute independent pieces of code, which makes concurrency much easier.
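
A small sketch of why this matters. Because `square` below is a pure function, it makes no difference whether it is applied sequentially with `map` or in parallel with `pmap` (both are in clojure.core); the names `square` and `xs` are just for this example.

```clojure
;; A pure function: the result depends only on the argument.
(defn square [x] (* x x))

(def xs (range 1 1001))

;; Sequential and parallel evaluation give the same answer.
(= (map square xs) (pmap square xs))   ; => true

;; The sum of the squares of 1..1000, computed in parallel.
(reduce + (pmap square xs))            ; => 333833500
```

For a function this cheap, `pmap` is slower than `map` because of the coordination overhead; the point is only that the switch is safe. In a standalone script, call `(shutdown-agents)` at the end so the JVM exits promptly.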

Clojure supplies tools to program functionally and immutable values are fundamentally built into the language. Once assigned, a value in Clojure is ‘persistent’. If you want to change something, you need to make copies of previous data structures with the changes added. In order to avoid all these data structures occupying too much space in the computer’s memory, Clojure uses a method called ‘structural sharing’. This ensures efficient, persistent data structures, as shown in this diagram:

[Figure: Persistent data structures. Source: Functional Advantage with Clojure]

In Clojure, it seems like the programmer is working with copies of data, but under the hood Clojure reuses everything that remains unchanged. So despite all the copies, Clojure still only uses minimal memory.
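
The behaviour can be sketched with a few lines at the REPL. All names and values are invented for the example. The `identical?` check shows only that an untouched nested value is reused rather than copied; the sharing inside the tree structure of a single vector or map happens under the hood and is not directly visible.

```clojure
;; "Changing" a vector returns a new vector; the old one is untouched.
(def v1 [1 2 3])
(def v2 (conj v1 4))

v1   ; => [1 2 3]
v2   ; => [1 2 3 4]

;; The same goes for maps.
(def m1 {:meter "A" :kwh 1.5})
(def m2 (assoc m1 :kwh 2.0))

(:kwh m1)   ; => 1.5
(:kwh m2)   ; => 2.0

;; The part that did not change is reused, not copied.
(def big  {:history (vec (range 100000)) :status :ok})
(def big2 (assoc big :status :offline))

(identical? (:history big) (:history big2))   ; => true
```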

Sometimes you do need to change a value. After all, in the real world things change and in a pragmatic programming language you have to be able to work with variable values.

Happily, change is possible in Clojure, which is why the language is not purely functional. Clojure has built-in reference types (atoms, refs and agents) for changing values in a way that keeps the system consistent, without the developer having to guard updates manually with error-prone critical sections.
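
As a minimal sketch, here is one of these constructs, the atom. Ten threads each increment a shared counter a thousand times. With an unguarded mutable variable this is a textbook race condition; with `swap!` every update is applied atomically. The names `counter` and `workers` are made up for the example.

```clojure
;; An atom holds an immutable value and allows controlled updates to it.
(def counter (atom 0))

;; Start 10 threads that each increment the counter 1000 times.
(def workers
  (doall (for [_ (range 10)]
           (future (dotimes [_ 1000]
                     (swap! counter inc))))))

;; Wait for all threads to finish.
(run! deref workers)

@counter   ; => 10000
```

`swap!` applies a pure function to the current value and retries if another thread got in first, so no update is lost and no locks appear in the code. In a standalone script, call `(shutdown-agents)` at the end so the JVM exits promptly.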

Through this focus on functional programming, persistent data structures and language constructs, we can see how Clojure truly embraces concurrent programming.

AND THERE’S LOADS MORE REASONS TO LOVE CLOJURE

The origins in Lisp, the support of host platforms, the data orientation and the focus on concurrent programming are the four reasons why HelloData is built in Clojure. And I’ve not even mentioned any of these others:

Clojure’s REPL, which allows the developer to communicate with their code
the many supported programming paradigms, such as logic programming, where you can say: "this is happening, give me a solution"
practical polymorphism
integrated editors
flexibility for change
clojure.spec
the GPU (Graphics Processing Unit) tools
adoption within the industry
the build tool
and the friendly Clojure community.

The future is parallel – and Clojure is ready for it!

There’s just one problem with Clojure. And only you can solve it.

Besides the difficulty that programmers with a different background have in adapting to Clojure, there is the additional problem of actually finding good, experienced Clojure developers.

Are you an experienced Clojure developer? Do you want to contribute to a sustainable future? If so, we’d be delighted to welcome you to HelloData.

Simply get in touch with me, at erwin.rooijakkers@mpare.net, and I'll be happy to tell you more about HelloData.

This blog post originally appeared on the MPARE website. See https://www.mpare.net/en/blog-drie-redenen-waarom-hellodata-wordt-gebouwd-in-clojure/.

¹ This moment of enlightenment is wonderfully illustrated in a lecture by Gerald Jay Sussman at the point when he finishes writing out Lisp code (fitted onto a single school blackboard), in which any programming language can be expressed. See https://youtu.be/0m6hoOelZH8?t=34m34s.
² Extensive research is being done into quantum computers, with promising results. But to keep things to a reasonable length and not dive into areas where I have limited expertise, I’ve decided not to cover the quantum computer, and programming it, in this blog post.
