pl-rants

Rants about programming languages

May 25, 2019

RTypes

Table of Contents

I few weeks ago I had an interesting chat with my brother about "Run Types". He's a die-hard Typescript zealot1 and a "frontend" developer while I dwell in the realm of "backend" services that have absolutely no relation to Web. Maybe that's why often times we have nearly opposite opinions on and vastly different experiences with the same things. We were talking about data serialisation/deserialisation and he said "you know what runtypes are?" and went on with the topic. My first reaction was "What? There are no types at run time! All the type information disappears after compilation.2"

The goal (one of) of a type system is to ensure that operations are applied to appropriate objects, e.g. integer division expects two integers as operands.

Dynamically typed languages use tagged values to solve the problem. For instance, Erlang uses first two bits of a machine word to differentiate between objects on heap (boxed values), lists, and immediates which use next two bits to further differentiate between small integers, ports, pids, etc. It's important to understand here that those bits are tags, not types. In other words, if we have a user type foo we can't reconstruct it using the tags. Erlang does have types in the form of type specifications but they are not used by the compiler.

In C or C++ objects are essentially chunks of memory which (in general case) do not carry any information about their type. It's the job of a compiler to make sure that a function that expects an integer is given an integer value, or that method Blah() is applied to the object which is an instance of class Foo. So the generated code has no tags and the notion of a type exist only at compile time.

Functional languages with static type systems such as Haskell or OCaml are a bit of both. They do reserve the first bit (or two) to distinguish between boxed and unboxed values, and the values also store information (as a number) about the constructor they were created with. However, in general, the values at run-time do not store any information about their originator type.

Long story short, to my ear the phrase "runtime types" is an oxymoron.

The Use Case

I write Elixir/Erlang code as my day to day job. One of the recent projects was to design a service which consisted of "edge units" - stateless, data collecting and processing applications - and a central "controller" application which served as a control plane and the reliable configuration store for the edge units. The design has many benefits but there is a drawback: complex data structures representing a unit's state have to be serialised and sent over the network. Because the edge unit and the controller application can evolve independently (and in fact, we had different people working on them simultaneously) we spent quite a number of man-hours debugging issues resulting from some minor discrepancies in that complex, deeply nested data structure.

Had we used a statically typed language we could have relied on the compiler to ensure the correctness of that data structure. Alas we hadn't and we struggled, suffering in the process. At some point a thought came to me, "I wish we had something that given the type specification could automatically generate a function which we could use to validate the structure at run time!" That was when I recalled the "runtypes" discussion and got that "aha!" moment.

The Runtypes Typescript library uses a DSL of sort to generate validation functions. It also automatically derives the type for the compiler. But we could do better! We can use Elixir macros to generate the AST for an existing type specification, recursively expand the AST to eliminate all the user types and write checks for the basic, built-in data types. In a sense, it solves the expression problem that the aforementioned library suffers from.

While still feeling the pain having motivation I spent a few evenings on something that resulted in the library. I called it rtypes, partly acknowledging the Typescript library, and partly enjoying the controversy it implies.

Implementation Details

The idea is simple: for any basic or built-in type we are able to define a function which checks the given value at run-time. For instance, for the range type

@type t :: 0..255

we can use the function

def is_t(x) when is_integer(x) and x >= 0 and x <= 255

which ensures that the value of x indeed belongs to the type t.

Erlang has a finite, relatively small number of basic data types and only three kinds of compound data types - lists, tuples, and maps3. For any of the compound data types we can recursively check their elements. For example, suppose we have the following type definitions

@type list_of_ts :: [t]
@type complex_state(a, b) :: %{
   key1: {a, b},
   key2: list_of_ts() | :none
}

To check if a term corresponds to complex_state we should

  • check that it is a map
  • check that it has keys :key1 and :key2
  • check that value corresponding to :key1 is a tuple of two elements
  • check that value corresponding to :key2 is a list
  • recursively apply the algorithm for the first and the second elements of the tuple referred to by :key1 using concrete types a and b
  • recursively apply the algorithm for each element of the list under :key2

For a union type (aka sum) type we replace the implicit logical "AND" in the steps above with "OR". For instance to check if a value corresponds to the type

@type result(a, b) :: {:ok, a} | {:error, b}

we should

  • check if the value is a tuple of two elements and the first element is atom :ok
  • OR check if the value is a tuple of two elements and the first element is atom :error
  • recursively check the second element of the tuple for the corresponding concrete type a or b

I'd like to stress that the type variables a and b must be instantiated to some concrete type because at run type all the values can only be of some concrete type.

Usage

The validation function essentially interprets the fully expanded AST. Using it is straightforward with either derive/1 macro or derive/3 function:

iex> require RTypes
iex> is_port_number = RTypes.derive(:inet.port_number())
iex> is_port_number.(8080)
true
iex> is_port_number.(:blah)
** (RuntimeError) term :blah does not conform to type
  {:type, 102, :range, [{:integer, 102, 0}, {:integer, 102, 65535}]}
    (rtypes) lib/rtypes/checker.ex:240: RTypes.Checker.check/3
## validate type Keyword.t(pos_integer())
iex> is_keyword = RTypes.derive(Keyword, :t, [{:type, 0, :pos_integer, []}])
iex> is_keyword.([a: 1, b: 2])
true
iex> is_keyword.([a: 1, b: -2])
** (RuntimeError) term -2 in the context [[a: 1, b: -2]]
   does not conform to type {:type, 0, :pos_integer, []}
    (rtypes) lib/rtypes/checker.ex:244: RTypes.Checker.check/3
    (elixir) lib/enum.ex:2886: Enum.all_list/2

Things To Do

While the library can already be used in its current state and I used it in my projects in semi-production setting, there are things that can be improved.

Recursive Types

The library can't handle recursive types. Given a type specification

@type my_list(a) :: nil | {a, my_list(a)}

the library's derive function will stuck in the infinite loop trying to expand my_list definition. One solution to the problem is to use lazy expansion, i.e. try to expand a type's AST only when it is needed. It works but slow to the point of not being of much practical use. The current working idea is to keep the set of all the user types during the expansion phase to detect cycles and use lazy checks to workaround the problem during validation. I believe it will work reasonably well and hope to get back to it at some point.

Error Messages

Currently the library spits out error messages that contain fully expanded ASTs representing a type. A nicer way to handle it would be to keep a link to the user defined type and instead of saying, "the term :blah does not conform to type {:type, 0, :list, [{:type, 0, :pos_integer, []]}] in the context []" produce a more user-friendly message, "the term :blah does not conform to type list(foo())."

Data Generators

For unit tests it might be more useful to reverse the idea and generate values that correspond to some specific type. While we can generate values using libraries like propcheck, by automatically deriving the generators we can make the process more enjoyable.

Conclusion

When the idea materialised in my head, the first thing I did was to search through hex.pm. The closest I found was a library called optimal. It does solve the same problem and is more featureful but can't derive validators automatically. I am reasonably sure that a more thorough search would have revealed a library that does precisely what rtypes does and more. However, as most software engineers I probably have a severe mild case of NIH4 syndrome so I wasn't that deliberate in my endeavor. I would be happy to learn about one, though.

The source code is available on GitHub. PRs and bug reports are more than welcome.

Footnotes:

1

No offence implied. He is an excellent software engineer.

2

Unless you want to do something like reflection in Java or type-switch in Go.

3

Erlang arrays and Elixir structs are just syntactic sugar over tuples and maps respectively.

4

Not Invented Here.