Compiling to Closures
This is partly a follow-up to the previous post about the RTypes library, partly some random thoughts on the topic in general.

It's been quite a while since I published a post, and not because I had nothing to share, far from it. The reason is that I have been spending almost every available minute of my off-work time on my side project. Alas, it isn't doing as well as I hoped it would. On the bright side, I finally found time to write about some interesting things related to PL topics.
TL;DR: the "compiled to closures" interpreter turned out to be 2x faster than the walk-the-tree interpreter.
So what is "compiling to closures"? To tell the truth, until recently I mentally cringed every time I heard someone put the words "compile" and "closures" together. Normally, when we talk about compiling in the context of programming languages, we refer to the process of translating abstract syntax trees (ASTs) into some lower-level, simplified, machine-readable form. For example, we can compile C code to x86 machine code, or Erlang to BEAM bytecode to be executed on the BEAM virtual machine. As opposed to that, interpreting, in its traditional sense, is the process of directly evaluating code while traversing the AST. Nowadays, when we have multi-stage JIT compilers, GraalVM, and whatnot, it is not an easy task to draw a solid line between the two. Still, the distinction becomes clearer if we look at some code that illustrates the ideas.
Let's imagine we have a type[1] that defines an AST (most examples are in Elixir)
```elixir
@type ast :: list(expr)

@typep expr ::
         {:funcall, funname, list(expr)}
         | {:binop, bopname, lhs :: expr, rhs :: expr}
         | {:unop, uopname, expr}
         | term

@typep term ::
         {:integer, integer()}
         | {:float, float()}
         | {:bool, boolean()}
         | {:string, binary()}
```
The type says that the AST for some hypothetical language is a list of expressions, where each expression can be either a function call, a binary operator, a unary operator, or a single term, and the basic types allowed in that language are integers, floating point numbers, booleans, and strings. The AST is abstract, that is, it says nothing about the concrete language and can equally well represent code in different surface languages. To highlight the idea, consider the following code fragments in some language A and some other language B
```
// language A
f(1+2, "hi there") * 512;
print(5, "hello" + "world");
```
```
# language B
(f (1 intadd 2) 'hi there') mult 512
print 5 ('hello'%'world')
```
They are likely to produce the same AST
```elixir
[
  {:binop, :*,
   {:funcall, "f",
    [{:binop, :+, {:integer, 1}, {:integer, 2}}, {:string, "hi there"}]},
   {:integer, 512}},
  {:funcall, "print",
   [{:integer, 5}, {:binop, :concat, {:string, "hello"}, {:string, "world"}}]}
]
```
The point is, ASTs represent code structure, ignoring particularities of a surface language. That's why we concentrate only on AST in the following sections.
Interpreter vs Compiler
To write an interpreter for the AST above, we can sketch a function `eval` that would recursively walk the tree, performing the necessary operations as it proceeds
```elixir
# The function takes an AST node and an environment and returns
# some other AST node and a possibly updated environment.
@spec eval(expr(), env()) :: {expr(), env()}

# A unary operator evaluates the term and then applies the op.
def eval({:unop, op, term}, env) do
  {term1, env1} = eval(term, env)
  apply_unop(op, term1, env1)
end

# A binary operator evaluates both sides and then applies the op.
def eval({:binop, op, lhs, rhs}, env) do
  {lhs1, env1} = eval(lhs, env)
  {rhs1, env2} = eval(rhs, env1)
  apply_binop(op, lhs1, rhs1, env2)
end

# A function call is a bit more involved. It first fetches
# the function from the environment, then evaluates all
# the arguments, left to right, and finally applies the
# function to the evaluated arguments.
def eval({:funcall, fname, args}, env) do
  f = get_fun(env, fname)

  {args1, env1} =
    Enum.reduce(args, {[], env}, fn arg, {acc, env} ->
      {result, env1} = eval(arg, env)
      {[result | acc], env1}
    end)

  apply(f, Enum.reverse(args1), env1)
end

# def eval(...)

# The driver function chains multiple evaluations together,
# discarding all intermediate results.
@spec interpret(ast(), env()) :: {expr(), env()}
def interpret(exprs, env) do
  Enum.reduce(exprs, {:none, env}, fn expr, {_result, env} -> eval(expr, env) end)
end
```
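To make the idea concrete, here is a minimal, runnable sketch of the walk-the-tree approach, trimmed to integers and binary operators. The module name `Interp` and the environment layout (a map holding operator funs under the `:binops` key) are illustrative choices, not something from the post.

```elixir
defmodule Interp do
  # primitive terms evaluate to themselves
  def eval({:integer, _} = t, env), do: {t, env}

  # a binary op evaluates both sides, then applies the operator fun
  def eval({:binop, op, lhs, rhs}, env) do
    {{:integer, l}, env1} = eval(lhs, env)
    {{:integer, r}, env2} = eval(rhs, env1)
    f = Map.fetch!(env2.binops, op)
    {{:integer, f.(l, r)}, env2}
  end

  # the driver: evaluate each top-level expression, keep the last result
  def interpret(exprs, env) do
    Enum.reduce(exprs, {:none, env}, fn expr, {_last, env} -> eval(expr, env) end)
  end
end

env = %{binops: %{:+ => &Kernel.+/2, :* => &Kernel.*/2}}
# (1 + 2) * 512, as in the AST example above
ast = [{:binop, :*, {:binop, :+, {:integer, 1}, {:integer, 2}}, {:integer, 512}}]
{result, _env} = Interp.interpret(ast, env)
# result is {:integer, 1536}
```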
The function recursively evaluates tree nodes, left to right, depth first. In other words, arguments to functions and operators are evaluated first. One thing to note here is the type of the `eval` function: it takes an expression and returns a reduced expression, possibly updating the environment as it proceeds.
Now let's outline a compiler which produces code for an over-simplified stack-based VM. The VM supports only two instructions: `push` and `call`. `push` puts a value on top of the stack, while `call` pops the required number of values off the stack and pushes the result of a function call back.
```elixir
@spec compile(ast(), static_env()) :: [instruction]
      when instruction: {:push, val} | {:call, ref}

def compile({:integer, val}, _static_env) do
  [{:push, {:integer, val}}]
end

def compile({:string, val}, _static_env) do
  [{:push, {:string, val}}]
end

# ... code for other primitive types

def compile({:unop, op, term}, static_env) do
  op_ref = get_unop(static_env, op)
  # `code` puts the fully evaluated term on top of the stack
  code = compile(term, static_env)
  code ++ [{:call, op_ref}]
end

def compile({:binop, op, lhs, rhs}, static_env) do
  op_ref = get_binop(static_env, op)
  code_lhs = compile(lhs, static_env)
  code_rhs = compile(rhs, static_env)
  code_lhs ++ code_rhs ++ [{:call, op_ref}]
end

def compile({:funcall, fname, args}, static_env) do
  fun_ref = get_fun(static_env, fname)
  code = Enum.flat_map(args, fn arg -> compile(arg, static_env) end)
  code ++ [{:call, fun_ref}]
end
```
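For completeness, here is a sketch of a loop that could execute the instruction lists such a compiler emits. The `{fun, arity}` encoding of call references is my own assumption for the example; the post leaves `ref` abstract.

```elixir
defmodule MiniVM do
  # Thread a value stack through the instruction list.
  def run(instructions, stack \\ []) do
    Enum.reduce(instructions, stack, &step/2)
  end

  # `push` puts a value on top of the stack
  defp step({:push, val}, stack), do: [val | stack]

  # `call` pops `arity` operands and pushes the result of applying `f`
  defp step({:call, {f, arity}}, stack) do
    {args, rest} = Enum.split(stack, arity)
    [apply(f, Enum.reverse(args)) | rest]
  end
end

add = fn {:integer, a}, {:integer, b} -> {:integer, a + b} end
# compiled form of `1 + 2`
code = [{:push, {:integer, 1}}, {:push, {:integer, 2}}, {:call, {add, 2}}]
[result] = MiniVM.run(code)
# result is {:integer, 3}
```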
We can see that the `compile` function also does a depth-first, left-to-right tree traversal, but its return value is a list of instructions. Also, the environment it is evaluated in is static (compile-time), while for the interpreter the environment is dynamic (run-time).
Interpreters walk ASTs every time they are called, producing a simplified (reduced) AST as a result, while compilers walk the trees only once, producing some low-level code. The funny thing is, the resulting low-level code can in turn be interpreted by a VM or compiled into some other representation. In real compilers the process can be multistage, combining the two approaches both at run time and ahead of time.
Compiling to Closures
What is "compiling to closures" then? The idea is to walk the tree only once, as if we were to compile it, but instead of producing a list of instructions, generate a chain of suspended function calls. Let's demonstrate it on the same AST type.
```elixir
## `cc` takes an expression and a static environment and returns
## a function that takes a dynamic environment and returns a fully
## evaluated term and a possibly updated dynamic environment.
@spec cc(expr(), static_env()) :: (env() -> {term, env()})

def cc({:integer, val}, _static_env) do
  # primitive values are returned as is
  fn env -> {{:integer, val}, env} end
end

# def cc({:string, ...}) # other primitive types

# unary operations require only one closure
def cc({:unop, op, term}, static_env) do
  op_fun = get_unop(static_env, op)
  closure = cc(term, static_env)

  fn env ->
    {val, env1} = closure.(env)
    apply(op_fun, [val], env1)
  end
end

# binary ops require two closures
def cc({:binop, op, lhs, rhs}, static_env) do
  op_fun = get_binop(static_env, op)
  closure_lhs = cc(lhs, static_env)
  closure_rhs = cc(rhs, static_env)

  fn env ->
    {lhs_val, env1} = closure_lhs.(env)
    {rhs_val, env2} = closure_rhs.(env1)
    apply(op_fun, [lhs_val, rhs_val], env2)
  end
end

# function calls require a closure for each arg
def cc({:funcall, fname, args}, static_env) do
  f = get_fun(static_env, fname)

  cls =
    Enum.reduce(args, [], fn arg, cls -> [cc(arg, static_env) | cls] end)
    |> Enum.reverse()

  fn env ->
    # values for the function application must be evaluated at run-time
    {vals, env1} =
      Enum.reduce(cls, {[], env}, fn c, {vals, env} ->
        {val, env1} = c.(env)
        {[val | vals], env1}
      end)

    apply(f, Enum.reverse(vals), env1)
  end
end
```
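Here is the same idea as a minimal runnable sketch, restricted to integers and binary operators. The static environment is assumed to be a plain map from operator names to funs; the dynamic environment is threaded through but unused in this cut-down version. The module name `CC` is illustrative.

```elixir
defmodule CC do
  # primitive values close over themselves
  def cc({:integer, _} = t, _static_env), do: fn env -> {t, env} end

  # binary ops: resolve the operator at "compile" time,
  # build one closure per operand, return a combining closure
  def cc({:binop, op, lhs, rhs}, static_env) do
    op_fun = Map.fetch!(static_env, op)
    closure_lhs = cc(lhs, static_env)
    closure_rhs = cc(rhs, static_env)

    fn env ->
      {{:integer, l}, env1} = closure_lhs.(env)
      {{:integer, r}, env2} = closure_rhs.(env1)
      {{:integer, op_fun.(l, r)}, env2}
    end
  end
end

static_env = %{:+ => &Kernel.+/2, :* => &Kernel.*/2}
# (1 + 2) * 512, compiled once ...
compiled = CC.cc({:binop, :*, {:binop, :+, {:integer, 1}, {:integer, 2}}, {:integer, 512}}, static_env)
# ... and evaluated against a (here empty) dynamic environment
{result, _env} = compiled.(%{})
# result is {:integer, 1536}
```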
As we can see, the code is very similar to the `compile` function: instead of producing instruction sequences we prepare closures at compile time, and instead of concatenating sequences with the `++` operator we return new closures. With the above in mind, I now think that the word "compile" is somewhat justified. True, we don't spit out lower-level code, but if we abstract over the "combine" operation, we get exactly the same shape of code. Compare, e.g., the `binop` clauses
```elixir
def compile({:binop, op, lhs, rhs}, static_env) do
  op_ref = get_binop(static_env, op)
  code_lhs = compile(lhs, static_env)
  code_rhs = compile(rhs, static_env)
  code_lhs ++ code_rhs ++ [{:call, op_ref}]
end

def cc({:binop, op, lhs, rhs}, static_env) do
  op_fun = get_binop(static_env, op)
  closure_lhs = cc(lhs, static_env)
  closure_rhs = cc(rhs, static_env)

  fn env ->
    {lhs_val, env1} = closure_lhs.(env)
    {rhs_val, env2} = closure_rhs.(env1)
    apply(op_fun, [lhs_val, rhs_val], env2)
  end
end
```
An interesting aspect here is that while the `cc` function is evaluated in a static environment, the returned closure must be evaluated in the run-time environment.
Why compile to closures?
The answer, I believe, is performance. Imagine that you need to pass through a large, complex maze. Interpreting an AST can be thought of as making a decision at each intersection using a rule book, e.g. "if there are three exits and the previous intersection had at least two exits, then take the one in the middle". The more rules you have in that rule book, the more time-consuming the process becomes. And if you don't have perfect memory, the next time you pass through, you repeat that tedious process again. Compiling to closures, in turn, can be thought of as if someone had already passed through that maze once and marked each exit using the rule book. You still need to pass through, but there is no need to do tedious calculations at each intersection; the path is known. The "rule book" in this analogy corresponds to the type that defines the AST, and the maze is analogous to some concrete source code.
Closures in RTypes library
I had a chance to apply the above in practice. The first iteration of the RTypes library contained a simple walk-the-tree interpreter. The `Extractor` module extracted type specifications from a `.beam` file and returned their AST, while the `Checker` module implemented the interpreter that walked the extracted AST for a given term.
```elixir
# primitive integer type
def check(term, {:type, _line, :integer, []}) do
  is_integer(term)
end

# integer range, e.g. 1..100
def check(term, {:type, _, :range, [{:integer, _, l}, {:integer, _, u}]}) do
  is_integer(term) and term >= l and term <= u
end

# list with elements of type `typ`
def check(term, {:type, _line, :list, [typ]}) when is_list(term) do
  Enum.all?(term, fn elem -> check(elem, typ) end)
end

# ... another 50 clauses handling primitive and compound types
```
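Wrapped in a module, those clauses are directly runnable. This is a sketch limited to the three type forms shown above; the module name `MiniChecker` is mine, and the type ASTs use the Erlang abstract-format shape from the snippet.

```elixir
defmodule MiniChecker do
  # primitive integer type
  def check(term, {:type, _line, :integer, []}) do
    is_integer(term)
  end

  # integer range, e.g. 1..100
  def check(term, {:type, _, :range, [{:integer, _, l}, {:integer, _, u}]}) do
    is_integer(term) and term >= l and term <= u
  end

  # list with elements of type `typ`
  def check(term, {:type, _line, :list, [typ]}) when is_list(term) do
    Enum.all?(term, fn elem -> check(elem, typ) end)
  end
end

# walks the type AST anew on every call
ok = MiniChecker.check([1, 2, 3], {:type, 0, :list, [{:type, 0, :integer, []}]})
# ok is true
```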
It worked quite nicely as a sanity checker during the active development phase (for the project that motivated it in the first place, that is). And the return on the time I invested in writing the library was tenfold, because validating complex data structures became trivial. Some time later I decided to use it as a validator for JSON requests over HTTP (in another project). When you potentially have thousands of requests per second, performance does matter.
My first thought was to generate Core Erlang and compile it directly to `.beam` files, but I remembered the "compile to functions" discussion I had a few weeks before and decided to give it a try. Modifying the `Checker` module turned out to be a straightforward, if not routine, task. The result was the `Lambda` module, which generates the closures
```elixir
def build({:type, _line, :integer, _args}) do
  fn term -> is_integer(term) end
end

def build({:type, _, :range, [{:integer, _, l}, {:integer, _, u}]}) do
  fn term -> is_integer(term) and term >= l and term <= u end
end

def build({:type, _line, :list, [typ]}) do
  typ? = build(typ)
  fn term -> is_list(term) && Enum.all?(term, typ?) end
end

## ... another 50 clauses for primitive and compound types
```
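The clauses above are self-contained enough to try out. Here is a runnable sketch: the type AST is walked once by `build/1`, and the returned closure is then applied to as many terms as needed. The module name `MiniLambda` is mine; the real `Lambda` module covers many more type forms.

```elixir
defmodule MiniLambda do
  def build({:type, _line, :integer, _args}) do
    fn term -> is_integer(term) end
  end

  def build({:type, _, :range, [{:integer, _, l}, {:integer, _, u}]}) do
    fn term -> is_integer(term) and term >= l and term <= u end
  end

  def build({:type, _line, :list, [typ]}) do
    typ? = build(typ)
    fn term -> is_list(term) && Enum.all?(term, typ?) end
  end
end

# list(1..100), built once ...
list_of_small_ints? =
  MiniLambda.build(
    {:type, 0, :list, [{:type, 0, :range, [{:integer, 0, 1}, {:integer, 0, 100}]}]}
  )

# ... then applied without re-walking the type AST
a = list_of_small_ints?.([1, 50, 100])
b = list_of_small_ints?.([0, 101])
# a is true, b is false
```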
I also created a simple benchmark to measure the performance gain on both a simple and a fairly complex structure, both containing reasonably large terms. To be honest, I wasn't completely sure which one would win. On the one hand, the "closures" approach should in theory be faster. On the other hand, pattern matching is pervasive in Erlang and thus must be heavily optimised.
It turned out, in both cases the "compiled to closures" checker was roughly 2x faster:
| Name | ips | average | deviation | median | 99th % |
|---|---|---|---|---|---|
| simple term, closures | 544.60 | 1.84 ms | ±4.32% | 1.81 ms | 2.22 ms |
| simple term, interpreter | 272.75 | 3.67 ms | ±6.23% | 3.60 ms | 4.49 ms |
| complex term, closures | 826.93 | 1.21 ms | ±9.68% | 1.20 ms | 1.61 ms |
| complex term, interpreter | 393.51 | 2.54 ms | ±10.28% | 2.54 ms | 3.06 ms |
The library is available on hex.pm and on GitHub. Running the benchmark is as simple as

```shell
$ mix deps.get
$ MIX_ENV=bench mix run bench/run.exs
```
Conclusion
A walk-the-tree interpreter is often the simplest and most straightforward solution for anything AST-related. When performance matters, however, compiling to closures may provide a quick and easy gain, and it is worth attempting before embarking on a journey to bytecode compilation, virtual machines, or JIT.
Footnotes:

[1] The AST does not allow variable binding, function definition, and other features, in order to keep the examples concise.