RTypes
Table of Contents
I few weeks ago I had an interesting chat with my brother about "Run Types". He's a die-hard Typescript zealot1 and a "frontend" developer while I dwell in the realm of "backend" services that have absolutely no relation to Web. Maybe that's why often times we have nearly opposite opinions on and vastly different experiences with the same things. We were talking about data serialisation/deserialisation and he said "you know what runtypes are?" and went on with the topic. My first reaction was "What? There are no types at run time! All the type information disappears after compilation.2"
The goal (one of) of a type system is to ensure that operations are applied to appropriate objects, e.g. integer division expects two integers as operands.
Dynamically typed languages use tagged values to solve the problem. For
instance, Erlang uses first two bits of a machine word to differentiate between
objects on heap (boxed values), lists, and immediates which use next two bits
to further differentiate between small integers, ports, pids, etc. It's
important to understand here that those bits are tags, not types. In other
words, if we have a user type foo
we can't reconstruct it using the
tags. Erlang does have types in the form of type specifications but they are
not used by the compiler.
In C or C++ objects are essentially chunks of memory which (in general case) do
not carry any information about their type. It's the job of a compiler to make
sure that a function that expects an integer is given an integer value, or that
method Blah()
is applied to the object which is an instance of class
Foo
. So the generated code has no tags and the notion of a type exist only
at compile time.
Functional languages with static type systems such as Haskell or OCaml are a bit of both. They do reserve the first bit (or two) to distinguish between boxed and unboxed values, and the values also store information (as a number) about the constructor they were created with. However, in general, the values at run-time do not store any information about their originator type.
Long story short, to my ear the phrase "runtime types" is an oxymoron.
The Use Case
I write Elixir/Erlang code as my day to day job. One of the recent projects was to design a service which consisted of "edge units" - stateless, data collecting and processing applications - and a central "controller" application which served as a control plane and the reliable configuration store for the edge units. The design has many benefits but there is a drawback: complex data structures representing a unit's state have to be serialised and sent over the network. Because the edge unit and the controller application can evolve independently (and in fact, we had different people working on them simultaneously) we spent quite a number of man-hours debugging issues resulting from some minor discrepancies in that complex, deeply nested data structure.
Had we used a statically typed language we could have relied on the compiler to ensure the correctness of that data structure. Alas we hadn't and we struggled, suffering in the process. At some point a thought came to me, "I wish we had something that given the type specification could automatically generate a function which we could use to validate the structure at run time!" That was when I recalled the "runtypes" discussion and got that "aha!" moment.
The Runtypes Typescript library uses a DSL of sort to generate validation functions. It also automatically derives the type for the compiler. But we could do better! We can use Elixir macros to generate the AST for an existing type specification, recursively expand the AST to eliminate all the user types and write checks for the basic, built-in data types. In a sense, it solves the expression problem that the aforementioned library suffers from.
While still feeling the pain having motivation I spent a few evenings on
something that resulted in the library. I called it rtypes
, partly
acknowledging the Typescript library, and partly enjoying the controversy it
implies.
Implementation Details
The idea is simple: for any basic or built-in type we are able to define a function which checks the given value at run-time. For instance, for the range type
@type t :: 0..255
we can use the function
def is_t(x) when is_integer(x) and x >= 0 and x <= 255
which ensures that the value of x
indeed belongs to the type t
.
Erlang has a finite, relatively small number of basic data types and only three kinds of compound data types - lists, tuples, and maps3. For any of the compound data types we can recursively check their elements. For example, suppose we have the following type definitions
@type list_of_ts :: [t] @type complex_state(a, b) :: %{ key1: {a, b}, key2: list_of_ts() | :none }
To check if a term corresponds to complex_state
we should
- check that it is a map
- check that it has keys
:key1
and:key2
- check that value corresponding to
:key1
is a tuple of two elements - check that value corresponding to
:key2
is a list - recursively apply the algorithm for the first and the second elements
of the tuple referred to by
:key1
using concrete typesa
andb
- recursively apply the algorithm for each element of the list under
:key2
For a union type (aka sum) type we replace the implicit logical "AND" in the steps above with "OR". For instance to check if a value corresponds to the type
@type result(a, b) :: {:ok, a} | {:error, b}
we should
- check if the value is a tuple of two elements and the first element is
atom
:ok
- OR check if the value is a tuple of two elements and the first element
is atom
:error
- recursively check the second element of the tuple for the
corresponding concrete type
a
orb
I'd like to stress that the type variables a
and b
must be instantiated to
some concrete type because at run type all the values can only be of some
concrete type.
Usage
The validation function essentially interprets the fully expanded AST. Using
it is straightforward with either derive/1
macro or derive/3
function:
iex> require RTypes iex> is_port_number = RTypes.derive(:inet.port_number()) iex> is_port_number.(8080) true iex> is_port_number.(:blah) ** (RuntimeError) term :blah does not conform to type {:type, 102, :range, [{:integer, 102, 0}, {:integer, 102, 65535}]} (rtypes) lib/rtypes/checker.ex:240: RTypes.Checker.check/3
## validate type Keyword.t(pos_integer()) iex> is_keyword = RTypes.derive(Keyword, :t, [{:type, 0, :pos_integer, []}]) iex> is_keyword.([a: 1, b: 2]) true iex> is_keyword.([a: 1, b: -2]) ** (RuntimeError) term -2 in the context [[a: 1, b: -2]] does not conform to type {:type, 0, :pos_integer, []} (rtypes) lib/rtypes/checker.ex:244: RTypes.Checker.check/3 (elixir) lib/enum.ex:2886: Enum.all_list/2
Things To Do
While the library can already be used in its current state and I used it in my projects in semi-production setting, there are things that can be improved.
Recursive Types
The library can't handle recursive types. Given a type specification
@type my_list(a) :: nil | {a, my_list(a)}
the library's derive
function will stuck in the infinite loop trying to
expand my_list
definition. One solution to the problem is to use lazy
expansion, i.e. try to expand a type's AST only when it is needed. It works
but slow to the point of not being of much practical use. The current working
idea is to keep the set of all the user types during the expansion phase to
detect cycles and use lazy checks to workaround the problem during
validation. I believe it will work reasonably well and hope to get back to it
at some point.
Error Messages
Currently the library spits out error messages that contain fully expanded ASTs
representing a type. A nicer way to handle it would be to keep a link to the
user defined type and instead of saying, "the term :blah
does not conform to
type {:type, 0, :list, [{:type, 0, :pos_integer, []]}]
in the context []
"
produce a more user-friendly message, "the term :blah
does not conform to
type list(foo())
."
Data Generators
For unit tests it might be more useful to reverse the idea and generate values that correspond to some specific type. While we can generate values using libraries like propcheck, by automatically deriving the generators we can make the process more enjoyable.
Conclusion
When the idea materialised in my head, the first thing I did was to search
through hex.pm. The closest I found was a library called optimal. It does solve
the same problem and is more featureful but can't derive validators
automatically. I am reasonably sure that a more thorough search would have
revealed a library that does precisely what rtypes
does and more. However, as
most software engineers I probably have a severe mild case of NIH4
syndrome so I wasn't that deliberate in my endeavor. I would be happy to learn
about one, though.
The source code is available on GitHub. PRs and bug reports are more than welcome.