pl-rants

Rants about programming languages

Dec 25, 2019

RTypes Data Generators

This is a continuation of the previous post about RTypes library.

A few days ago I found myself with a few hours of free time and thought it was a great chance to improve the library. I've been "dogfooding" it in a few projects for a while now: first as a sanity checker during the active development phase, then as a data validation layer. But since its inception I've had an itch to close the loop by automatically deriving not only data validators, but also data generators to be used with property-based testing frameworks [1].

Here is a quick demo of the feature:

iex(1)> require RTypes.Generator, as: G
iex(2)> g = G.make(:inet.port_number(), G.StreamData)
iex(3)> Enum.take(g, 10)
=> [53289, 49615, 25526, 14765, 2391, 57424, 1399, 48755, 23668, 57176]
iex(4)> g2 = G.make(:inet.ip4_address(), G.StreamData)
iex(5)> Enum.take(g2, 10)
=> [
    {237, 57, 3, 204},
    {93, 132, 242, 86},
    {254, 226, 96, 62},
    {61, 141, 84, 51},
    {66, 182, 79, 220},
    {21, 168, 172, 155},
    {121, 240, 20, 76},
    {60, 188, 81, 14},
    {169, 112, 182, 234},
    {14, 5, 193, 161}
  ]

Two generators were constructed with the RTypes.Generator.make macro by supplying the required type and a backend, and then asked to generate ten random port numbers and IPv4 addresses. Isn't it cool?

How it works

The idea is to walk the AST that corresponds to the provided type expression and build a data generator using existing facilities from a property-based testing framework. The code looks very similar to a "compile-to-closures" interpreter (see the previous post for some context), with the only difference being that instead of producing a chain of closures it produces a chain of data generators.

import StreamData

## primitive types
def derive({:type, _line, :integer, _args}), do: integer()
# ...
## literals
def derive({:atom, _line, term}), do: constant(term)
## ranges
def derive({:type, _, :range, [{:integer, _, l}, {:integer, _, u}]}) do
  integer(l..u)
end
## compound types
def derive({:type, _line, :list, [typ]}) do
  list_of(derive(typ))
end

For the PropCheck backend the code looks even more like the "compile-to-closures" implementation

import PropCheck, only: [let: 2]
import PropCheck.BasicTypes
# primitive types
def derive({:type, _line, :any, _args}), do: &any/0
def derive({:type, _line, :atom, _args}), do: &atom/0
# ...
# compound types
def derive({:type, _line, :maybe_improper_list, [typ1, typ2]}) do
  g1 = derive(typ1)
  g2 = derive(typ2)

  fn ->
    let {h, t} <- {g1.(), g2.()} do
      oneof([[], [h | t]])
    end
  end
end
# ...

because it produces a chain of functions (closures), each returning a proper generator. Compare it with the "compile-to-closures" code:

def build({:type, _line, :atom, _args}), do: &is_atom/1
def build({:type, _line, :integer, _args}), do: &is_integer/1
# ...
def build({:type, _line, :maybe_improper_list, [typ1, typ2]}) do
  typ1? = build(typ1)
  typ2? = build(typ2)

  fn
    [] -> true
    [car | cdr] -> typ1?.(car) and typ2?.(cdr)
    _ -> false
  end
end
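
To make the parallel concrete, here is a dependency-free miniature that derives both a generator and a predicate from the same hand-written AST. It is only a sketch: the TinyTypes module is hypothetical, it covers just three AST shapes, and it swaps the StreamData/PropCheck generators for plain zero-arity closures built on :rand.uniform/1, so it does no shrinking and is not how RTypes itself is implemented.

```elixir
defmodule TinyTypes do
  # Hypothetical miniature for illustration; not the RTypes implementation.

  ## generators: each clause returns a zero-arity closure yielding a random value
  def derive({:atom, _line, term}), do: fn -> term end

  def derive({:type, _line, :range, [{:integer, _, l}, {:integer, _, u}]}) do
    # :rand.uniform(n) returns an integer between 1 and n inclusive
    fn -> l + :rand.uniform(u - l + 1) - 1 end
  end

  def derive({:type, _line, :tuple, elems}) when is_list(elems) do
    gens = Enum.map(elems, &derive/1)
    fn -> gens |> Enum.map(fn gen -> gen.() end) |> List.to_tuple() end
  end

  ## predicates: each clause returns a one-argument closure returning a boolean
  def build({:atom, _line, term}), do: fn v -> v === term end

  def build({:type, _line, :range, [{:integer, _, l}, {:integer, _, u}]}) do
    fn v -> is_integer(v) and v >= l and v <= u end
  end

  def build({:type, _line, :tuple, elems}) when is_list(elems) do
    preds = Enum.map(elems, &build/1)
    arity = length(preds)

    fn
      v when is_tuple(v) and tuple_size(v) == arity ->
        v |> Tuple.to_list() |> Enum.zip(preds) |> Enum.all?(fn {x, p} -> p.(x) end)

      _ ->
        false
    end
  end
end

# Hand-written AST for {0..255, 0..255, 0..255, 0..255}; line numbers are dummies
byte = {:type, 0, :range, [{:integer, 0, 0}, {:integer, 0, 255}]}
ip4 = {:type, 0, :tuple, [byte, byte, byte, byte]}

gen = TinyTypes.derive(ip4)
valid? = TinyTypes.build(ip4)

# Every generated value should satisfy the predicate built from the same type
samples = Stream.repeatedly(gen) |> Enum.take(100)
true = Enum.all?(samples, valid?)
```

The two families of clauses mirror each other one to one, which is why adding a generator backend on top of an existing validator is mostly a matter of swapping the leaves.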

Note that the derive and build functions above accept an AST that corresponds to a type expression. The RTypes.make_* and RTypes.Generator.make macros do some magic to allow literal type expressions. For that magic to work, the type must be either a primitive or defined in a module that has a .beam file somewhere reachable.
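
For the curious, one way to peek at the AST stored in a .beam file is Elixir's Code.Typespec module. This is only a sketch: Code.Typespec is an undocumented part of Elixir, whether RTypes uses it internally is an assumption on my part, and the call returns :error when the .beam file carries no debug information.

```elixir
# Code.Typespec is undocumented; its use here is an assumption, not RTypes' API.
result =
  case Code.Typespec.fetch_types(:inet) do
    {:ok, types} ->
      # Each entry looks like {kind, {name, ast, args}}; keep only port_number/0
      for {:type, {:port_number, ast, []}} <- types do
        IO.inspect(ast, label: "port_number AST")
      end

    :error ->
      IO.puts("no type information available (the .beam file may lack debug info)")
  end
```

The printed term is the kind of tuple that the derive and build clauses pattern match on.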

How to use it

I see the feature primarily as a testing tool. The StreamData backend makes it handy to use in fuzz testing or similar applications which require an infinite stream of random data.

require RTypes.Generator, as: G

packet_gen = G.make(MyModule.packet(), G.StreamData)

G.make(:inet.port_number(), G.StreamData)
|> Stream.map(fn port_number ->
  {:ok, sock} = open_udp_socket(some_host, port_number)
  sock
end)
|> Enum.each(fn sock ->
  # generate 1000 random packets and send them to the socket
  packet_gen
  |> Stream.take(1000)
  |> Enum.each(&send_udp_packet(sock, &1))

  close_socket(sock)
end)

And here is an example of how to use it with PropCheck, e.g. to test a function for totality. Let's suppose you have a function f defined in module M:

defmodule M do
  @type f_input_type :: list()
  @type f_result_type :: pos_integer()

  @spec f(f_input_type()) :: f_result_type()
  def f(xs) do
    # ...
  end
end

If we claim that f returns a value belonging to f_result_type for every possible input belonging to f_input_type, we are claiming that f is total. That claim can be written as a property:

defmodule MTest do
  use PropCheck
  require RTypes
  require RTypes.Generator, as: G

  property "f is total" do
    input_generator = G.make(M.f_input_type(), G.PropCheck)
    result_value? = RTypes.make_predicate(M.f_result_type())

    forall input <- input_generator.() do
      result_value?.(M.f(input))
    end
  end
end

Conversely, if we were to test a non-total function it should fail on some input. For instance, the hd/1 function from the standard library almost immediately fails on the empty list [].

defmodule MTest do
  use PropCheck
  require RTypes
  require RTypes.Generator, as: G

  property "hd is total" do
    gen = G.make(list(integer()), G.PropCheck)
    int_value? = RTypes.make_predicate(integer())

    forall val <- gen.() do
      int_value?.(hd(val))
    end
  end
end

1) property hd is total (RTypesPropCheckTest)
   test/rtypes_propcheck_test.exs:10
   Property Elixir.RTypesPropCheckTest.property hd is total() failed.
   Counter-Example is: [[]]

License issue

The PropCheck library is released under the GPL 3.0 license (as is the PropEr library on which it is based). It's totally fine to use PropCheck for testing in non-GPL projects because tests are usually not shipped with the final product. RTypes, however, is released under the Apache 2.0 license because I wanted to use it in projects where managers would be like "yeah, nah…" as soon as they hear the "GPL" acronym. And because RTypes is not only a testing library but can also be used as a data-validation layer, I couldn't just use PropCheck in release builds and keep RTypes under the Apache 2.0 license (or could I?). The solution I came up with was to introduce a plug-in system and release a separate rtypes_propcheck library under GPL 3.0. I believe it's a fine solution because it still allows using RTypes as a run-time dependency and rtypes_propcheck as a test-only dependency.
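
In a project's mix.exs the split might look roughly like the fragment below. Treat it as a sketch: the package names are assumed to match the library names, and the ">= 0.0.0" version requirements are placeholders rather than recommended versions.

```elixir
# mix.exs (fragment); package names and versions are assumptions
defp deps do
  [
    # run-time dependency, Apache 2.0
    {:rtypes, ">= 0.0.0"},
    # GPL 3.0 plug-in, only compiled and loaded in the test environment
    {:rtypes_propcheck, ">= 0.0.0", only: :test}
  ]
end
```

With only: :test, the GPL-licensed code never ends up in a production release.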

Yet maintaining both libraries simultaneously is a bit of a pain, so if anyone reading this knows how I can combine both in one package, please reach out to me on Twitter @plrants or send an email to hello@pl-rants.net. Thanks!

Footnotes:

[1] I cannot fail to mention the excellent book "Property-Based Testing with PropEr, Erlang, and Elixir" by Fred Hebert for a deep dive into the topic.