pl-rants

Rants about programming languages

Aug 31, 2018

Libraries used in Haskell vs. Go vs…

The decision process for including a particular package in the experiment consisted of answering a few questions:

In addition to the criteria above I added another one for Haskell: the package should be on Stackage. That excluded some libraries that might have been a better fit, but I wanted to reduce extra complexity as much as possible.

Let's dive in.

Command line parser

The task was to parse parameters of the following shape:

$ baz --group=Group1 --group=Group2 --group=GroupN \
      --db-name=mydb --db-host=example.com --db-port=1234 \
      --db-user=user --db-password=password \
      --input-dir=/temp/blah/foo

All the parameters except --group and --db-host were optional and --group could be repeated, resulting in a list of values.

Go

After evaluating a few libraries and discarding the standard flag package, I chose go-flags because it let me replicate the Perl version's interface with the least amount of time and effort, and it seemed mature enough both documentation- and implementation-wise.

One should define a struct

import "github.com/jessevdk/go-flags"

type Options struct {
    Groups   []string `long:"group" description:"Device group" required:"true"`
    InputDir string   `long:"input-dir" default:"/tmp"`
    DBHost   string   `long:"db-host" required:"true"`
    DBPort   int      `long:"db-port" default:"3306"`
    // ...
}

and then call flags.Parse to get the struct fields initialised.

func main() {
    opts := Options{}
    _, err := flags.Parse(&opts)
    // ...
}

go-flags automatically checks for the presence of required parameters and does type conversion. Although putting all the metadata/logic for a flag into one line seemed a bit limiting, it was totally fine as a quick solution.

Haskell

optparse-generic seemed to be the easiest way and, more importantly, had enough documentation to write working code without searching StackOverflow or scanning through the source code. Defining a record that derives Generic and an instance of ParseRecord does the job:

{-# LANGUAGE DeriveGeneric #-}
import Options.Generic

data Options = Options
    { group     :: [String]
    , input_dir :: String
    , db_host   :: String
    , db_port   :: Maybe Int
    -- ...
    } deriving (Generic, Show)

instance ParseRecord Options

getOptions :: IO Options
getOptions = getRecord "exe-name"

The names of the fields (e.g. --input_dir) directly correspond to flags, while Maybe parameters are used to define optional values.

main = do
    opts <- getOptions
    doSomethingElse opts

Magic.

OCaml

Among the three I liked cmdliner the most. It was flexible but, at the same time, easy to get started with.

open Cmdliner

type options = {
    groups: string list;
    input_dir: string;
    db_host: string;
    db_port: int;
    (* ... *)
}

let build_opts_term = Term.(
    const build_opts $
    Arg.(non_empty & opt_all string [] & info ["g"; "group"]) $
    Arg.(required & opt (some string) None & info ["db-host"]) $
    Arg.(value & opt int 3306 & info ["db-port"]) $
    Arg.(value & opt dir default_dir & info ["input-dir"]) $
    (* ... *)
)

let main_term = Term.(const main $ build_opts_term)
let () = Term.exit @@ Term.eval (main_term, Term.info "ocaml-version")

It not only parses and converts parameters to the appropriate types, but can also verify, for example, that a dir exists.

Explicit. Smart. No magic.

Database access

Go

For database access I used database/sql and github.com/go-sql-driver/mysql. The time spent getting the thing running was hardly more than three hours, and it could easily have been much more if not for the excellent Go database/sql tutorial.

Haskell

Haskell people apparently hate MySQL. There is no shortage of libraries that can be used to access a database, but most of them are designed for PostgreSQL. I tried three libraries. mysql-simple did look "simple", but for some reason I couldn't make the code that uses it compile. mysql-haskell compiled just fine and the docs were sufficient to start, but the DB server returned a protocol error. HDBC-mysql worked fine for me. Reading the chapter on HDBC in Real World Haskell was helpful.
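To give a flavour of what the working version looked like, here is a minimal HDBC-mysql sketch. The connection parameters and the query are invented for illustration, and the connect-info fields follow defaultMySQLConnectInfo as documented; treat it as a sketch rather than the exact code:

import Database.HDBC (disconnect, fromSql, quickQuery', toSql)
import Database.HDBC.MySQL
    (MySQLConnectInfo (..), connectMySQL, defaultMySQLConnectInfo)

-- A hypothetical query: fetch device ids for a given group.
deviceIds :: String -> IO [Int]
deviceIds grp = do
    conn <- connectMySQL defaultMySQLConnectInfo
        { mysqlHost     = "example.com"
        , mysqlUser     = "user"
        , mysqlPassword = "password"
        , mysqlDatabase = "mydb"
        }
    rows <- quickQuery' conn
        "SELECT id FROM devices WHERE group_name = ?" [toSql grp]
    disconnect conn
    return [fromSql v | [v] <- rows]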

OCaml

The libmysql binding did the job just fine. It also has some handy functions; for example, Mysql.values takes a list of strings and returns a (val1,val2,...,valN) string which can be used in a WHERE ... IN clause.

Globbing files

Go

The data returned by the DB queries was used to construct a glob pattern which would return a list of RRD files. I used path/filepath. It works fast and has a simple interface.

import (
    "fmt"
    "log"
    "path/filepath"
)

// ...
pattern := fmt.Sprintf("%s/%s/*/%d/*/*.rrd", opts.InputDir, d.Blah, d.Id)
files, err := filepath.Glob(pattern)
if err != nil {
    log.Fatal(err)
}
for _, file := range files {
    processFile(file)
}

Haskell

The Glob library did the job. It is easy to use and works well, but in the optimised version the library turned out to be the bottleneck - it takes up to 48% of the total run time.

import System.FilePath.Glob (compile, globDir)
-- ...
listRRDFiles :: Options -> Ifaces -> [Device] -> IO [RRDFile]
listRRDFiles opts ifaces devices = do
   fss <- globDir patterns (input_dir opts)
   return $ concatMap f (zip fss devices)
 where patterns = fmap compile -- ...

I also tried sourceDirectoryDeep from conduit-combinators, but it was much slower than Glob, presumably because it makes an fstat call on every entry.
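The conduit version was roughly of this shape (a sketch assuming the Conduit re-export module from conduit-combinators; the helper name is mine, and the Bool argument controls whether directory symlinks are followed):

import Conduit

-- Recursively list every file under dir.
listFilesDeep :: FilePath -> IO [FilePath]
listFilesDeep dir = runConduitRes $
    sourceDirectoryDeep False dir .| sinkList

A filterC stage for the *.rrd suffix would then slot in before sinkList, but the per-entry work is exactly what made this approach slower than Glob here.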

OCaml

I couldn't find a glob library and used the find command-line tool at first, later replacing it with a C binding to libc's glob function. With glob it looks almost identical to the Go version:

Printf.sprintf "%s/%s/*/%s/*/*.rrd" input_dir dc id
|> Glob.glob  (* call the glob function *)
|> Array.enum  (* convert the resulting array into Enum *)
|> Enum.filter_map (rrd_from_path ifaces (id, host, dc)) (* filter-map the result *)
|> Enum.iter (process_rrd rrdtool interval) (* and finally process each file *)

Querying RRD files

Go

I used a nice RRD library wrapper which was unobtrusive and took minimal time and effort to use:

result, err := rrd.Fetch(file.Path, "AVERAGE", start, end, 5*time.Minute)
return result.values() // returns an array of datapoints - []float64

Haskell

There were some bindings to librrd on Hackage but not on Stackage. I was short on time, with only two hours left till the presentation, so I decided to call rrdtool directly, parsing its text output into data points. It was far from ideal, because starting a process 800,000 times would certainly kill any performance. After the presentation I learnt that there was a "daemon mode" in which the tool expects a command on stdin and produces output on stdout. It did improve the performance.
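The quick-and-dirty version boiled down to something like the following, spawning rrdtool once per file via System.Process and leaving the caller to parse the "timestamp: value" lines it prints (a sketch, not the exact code):

import System.Process (readProcess)

-- One rrdtool invocation per file: simple, but repeated hundreds of
-- thousands of times it dominates the run time.
fetchAverage :: FilePath -> Int -> Int -> IO [String]
fetchAverage path start end =
    lines <$> readProcess "rrdtool"
        [ "fetch", path, "AVERAGE"
        , "--start", show start
        , "--end", show end
        ] ""

The faster variant keeps one long-lived rrdtool process (started with rrdtool -) and writes fetch commands to its stdin, avoiding the per-file start-up cost.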

OCaml

I used rrdtool directly to have a fair comparison with the Haskell version, although making bindings to librrd seemed trivial. The main difference from the Haskell version was that in OCaml each result line (a data point) was processed immediately, dumping the result on stdout, while in Haskell the results were accumulated as a [ByteString.Char8] and then processed, saving intermediate results into a ByteString.Builder. Only after that was the data dumped onto stdout using

B.hPutBuilder stdout (renderLines lines)

This explains the outstanding memory usage results for the OCaml version.

Calculating query interval

Go

Dealing with time is always tricky. Nevertheless, in this case it was not so tricky, because the standard time package had good documentation and an API that allowed me to express what I needed in three lines of code:

y, m, d := time.Now().AddDate(0, 0, -1).Date()
start := time.Date(y, m, d, 0, 0, 0, 0, time.Local)
end := time.Date(y, m, d, 23, 55, 0, 0, time.Local)

Haskell

The time library is the first one that appears in a Hackage search and seems to be the go-to tool for anything time-related.

It took me a while to figure out how to get the interval. I ended up with the following (fairly convoluted) code:

import Data.Time

getInterval :: IO (Int, Int)
getInterval = do
    ct <- getCurrentTime
    tz <- getCurrentTimeZone
    let LocalTime day _ = utcToLocalTime tz ct
    let start = localTimeToUTC tz (LocalTime (addDays (-1) day) midnight)
    let posix = round $ diffUTCTime start epoch
    return (posix, posix + 23*60*60 + 55*60)
  where epoch = UTCTime (fromGregorian 1970 1 1) 0

OCaml

I wrote a simple function that works only for positive (UTC+) time zones:

let getInterval () =
  let t = Unix.time () in
  let tm = Unix.localtime t in
  let offset = tm.Unix.tm_sec + tm.Unix.tm_min*60 + tm.Unix.tm_hour*60*60 in
  let start = int_of_float t - offset - 24*60*60 in
  (start, start + 23*60*60 + 55*60)

There is the calendar library, which could (probably) provide a more robust implementation. Alas, at that point I could not (didn't want to) invest any more time into the OCaml version.

Dumping data to CSV

Go

I used the encoding/csv package simply because it popped up first in the search. It was super-straightforward to use and it took me hardly more than thirty minutes to find and implement.

writer := csv.NewWriter(os.Stdout)
defer writer.Flush()
// ...
for ... {
    writer.Write(record)
}

Haskell

No libraries, just dumping Builder-built ByteStrings onto standard out.
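A minimal sketch of what that looks like; the record type and its fields are invented for illustration:

import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Char8 as C
import System.IO (stdout)

data Row = Row { rowName :: C.ByteString, rowValue :: Double }

-- Render one comma-separated line per record and write the whole
-- thing to stdout in a single buffered pass.
renderRow :: Row -> B.Builder
renderRow r =
    B.byteString (rowName r) <> B.char8 ',' <>
    B.doubleDec (rowValue r) <> B.char8 '\n'

dumpRows :: [Row] -> IO ()
dumpRows = B.hPutBuilder stdout . foldMap renderRow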

OCaml

Printf.fprintf to stdout.

Regex replace

Go

The regexp package did the job.

var convertRe = regexp.MustCompile("[/:\\. (),'\"]+")
// and then in the loop
for ... {
    strings.ToLower(convertRe.ReplaceAllString(name, "_"))
}

Haskell

I thought that in this century any language would support that out of the box. There has to be a replaceAll :: Regex -> String -> String -> String of sorts, right? Wrong. I couldn't believe it, and I still think there has to be something out there that I overlooked. After spending three hours looking for a solution I gave up and wrote the function myself. Maybe some crazy instance of (=~) would have done the job but I couldn't figure it out.
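For the record, one way to coax a replaceAll out of the regex-tdfa machinery is to ask (=~) for all match offsets and splice the replacement in by hand. A rough sketch, not the exact code I ended up with:

import Text.Regex.TDFA  -- (=~) plus the re-exported regex-base helpers

-- Replace every match of pat in input with sub, using the
-- AllMatches instance to get (offset, length) pairs.
replaceAll :: String -> String -> String -> String
replaceAll pat sub input = go 0 matches
  where
    matches = getAllMatches (input =~ pat) :: [(Int, Int)]
    go pos [] = drop pos input
    go pos ((off, len) : rest) =
        take (off - pos) (drop pos input) ++ sub ++ go (off + len) rest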

Speaking of (=~), if not for this ultra-mega-helpful tutorial I would have given up and written some crude regex-replace-like thing from scratch. I could never have imagined that a regex library interface could be so daunting. It appears to be highly abstract on the surface, but underneath it is all concrete and rigid, killing off any flexibility. It works only when there is an instance of a class, or more specifically, when the combination of types on the left-hand side, the right-hand side, and the return type can be combined into something that makes sense from the author's point of view. So

(=~) :: String -> String -> Bool

works, but what would the following expression mean?

(=~) :: String -> String -> (Int, Bool)

Nothing. It fails to compile, producing a terrifying error message. A super-heavily-overloaded operator could have been concise if type inference worked. Not in this case. The operator is polymorphic in all three parameters, so I ended up explicitly typing all of them.

In short, the regex interface in Haskell is as far from being user-friendly as humanly possible.

OCaml

I tried both the re and pcre packages. I found re's interface somewhat more approachable and, since there was no significant difference in performance, I went ahead and used it.

let iface_re = Re.compile @@ Re.Pcre.re "[/:\\. (),'\"]+"
let convert_iface_name name =
    String.lowercase_ascii @@ Re.replace_string iface_re ~by:"_" name

Containers

Go

Built-in maps and slices did the job.

Haskell

Not all hope is lost in Haskell land. The excellent containers package has a good introduction which is enough for most use cases one can face. It was unobtrusive and it took me a mere 10-15 minutes to code everything I needed.
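To give an idea of the scale of it, the usage boiled down to everyday Data.Map operations; the names below are illustrative rather than taken from the actual tool:

import qualified Data.Map.Strict as Map

-- Group interface names by device id, multimap-style.
groupByDevice :: [(Int, String)] -> Map.Map Int [String]
groupByDevice =
    foldr (\(dev, iface) -> Map.insertWith (++) dev [iface]) Map.empty

-- Look up with a default; another everyday containers operation.
ifacesFor :: Int -> Map.Map Int [String] -> [String]
ifacesFor = Map.findWithDefault []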

Excellent.

OCaml

batteries had all I needed and, paired with the all-powerful Enum interface, delivered a smooth developer experience. There is, however, something that differentiates the OCaml version from the other two. A concrete version of a module has to be derived from a generic implementation:

module StrKey = struct
  type t = string
  let compare = Pervasives.compare
end
(* create string->'a map module *)
module StrMap = Map.Make(StrKey)
(* create string set module *)
module StrSet = Set.Make(StrKey)

In other words, a module must be parameterised by another module. I feel that it is a powerful feature and I hope to get back to that in a follow-up post.

String handling

Go

Go's strings and fmt were all I needed.

Haskell

After witnessing the terrible performance of the rushed-in Haskell version I replaced String with ByteString. The library is mature, well documented, and comprehensive. An especially large performance boost came from plugging in ByteString.Builder wherever I could.

OCaml

batteries has many useful (byte)string-handling functions.

Conclusion

Go's godoc seems to have everything under the Sun and Moon.

Haskell's libraries, with rare exceptions, feel as if they were written with the sole purpose of being mathematically pure and highly abstract. That does not always help productivity (unless you are writing another highly sophisticated abstract library, of course). It is not that I am against using mathematics and the abstractions that stem from it. However, for a RAD tool a well-designed set of libraries that solve real-life problems is a must.

OCaml's library set is tiny compared to the other two, but it does not prevent it from being practical.

To sum it up: I breezed through Go's libraries, struggled with Haskell's, and truly enjoyed OCaml's.