Libraries used in Haskell vs. Go vs...

The decision process for a particular package in the experiment consisted of answering a few questions:

In addition to the criteria above I added another one for Haskell: a package should be on Stackage. That excluded some of the libraries that could have been a better fit but I wanted to reduce extra complexity as much as possible.

Let’s dive in.

Command line parser

The task was to parse parameters of the following shape:

$ baz --group=Group1 --group=Group2 --group=GroupN \
      --db-name=mydb --db-host=example.com --db-port=1234 \
      --db-user=user --db-password=password \
      --input-dir=/temp/blah/foo

All the parameters but --group and --db-host were optional and --group could be repeated resulting in a list of values.

Go

After evaluating some libraries and discarding the standard flag package I chose go-flags because it allowed me to replicate the Perl version’s interface with the least amount of time and effort, and it seemed mature enough both documentation- and implementation-wise.

One should define a struct

import "github.com/jessevdk/go-flags"

type Options struct {
    Groups []string `long:"group" description:"Device group" required:"true"`
    InputDir string  `long:"input-dir" default:"/tmp"`
    DBHost string `long:"db-host" required:"true"`
    DBPort int `long:"db-port" default:"3306"`
    ....
}

and then call

func main() {
   var opts = Options{}
   _, err := flags.Parse(&opts)
   ...
}

to get the struct fields initialized. go-flags automatically checks for the presence of required parameters and does the type conversion. Having to put all the metadata and logic for a flag into one line seemed a bit limiting, but as a quick solution it was totally fine.
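For comparison, here is a rough sketch of what the repeated --group flag takes with the standard flag package alone: a custom type implementing flag.Value, plus hand-written checks for required parameters. The names groupList and parseArgs are made up for this illustration and are not part of the original program.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"strings"
)

// groupList collects every occurrence of a repeated flag.
type groupList []string

// String is required by flag.Value; it renders the collected values.
func (g *groupList) String() string { return strings.Join(*g, ",") }

// Set is called once per occurrence of the flag on the command line.
func (g *groupList) Set(v string) error {
	*g = append(*g, v)
	return nil
}

// parseArgs (hypothetical helper) extracts the groups and the DB host.
func parseArgs(args []string) ([]string, string, error) {
	fs := flag.NewFlagSet("baz", flag.ContinueOnError)
	var groups groupList
	fs.Var(&groups, "group", "device group (repeatable)")
	dbHost := fs.String("db-host", "", "database host")
	if err := fs.Parse(args); err != nil {
		return nil, "", err
	}
	// The flag package has no notion of "required", so check by hand.
	if len(groups) == 0 || *dbHost == "" {
		return nil, "", errors.New("--group and --db-host are required")
	}
	return groups, *dbHost, nil
}

func main() {
	groups, host, err := parseArgs([]string{
		"--group=Group1", "--group=Group2", "--db-host=example.com",
	})
	fmt.Println(groups, host, err)
}
```

It works, but every repeated or required flag needs this kind of boilerplate, which go-flags expresses in a struct tag.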

Haskell

optparse-generic seemed to be the easiest way and, more importantly, had enough documentation to write something working without looking at StackOverflow or scanning through the source code. Defining a record that derives Generic and an instance of ParseRecord does the whole job:

{-# LANGUAGE DeriveGeneric #-}
import Options.Generic

data Options = Options
    { group :: [String]
    , input_dir :: String
    , db_host :: String
    , db_port :: Maybe Int
    -- ...
    } deriving (Generic, Show)

instance ParseRecord Options

getOptions :: IO Options
getOptions = getRecord "exe-name"

The names of the fields directly correspond to flags (e.g. --input_dir) and Maybe parameters are used to define optional values.

main = do
    opts <- getOptions
    doSomethingElse opts

Magic.

OCaml

Among the three I liked cmdliner the most. It was very flexible but at the same time easy to get started with.

open Cmdliner

type options = {
    groups: string list;
    input_dir: string;
    db_host: string;
    db_port: int;
    ...
}

let build_opts_term = Term.(
    const build_opts $
    Arg.(non_empty & opt_all string [] & info ["g"; "group"]) $
    Arg.(required & opt (some string) None & info ["db-host"]) $
    Arg.(value & opt int 3306 & info ["db-port"]) $
    Arg.(value & opt dir default_dir & info ["input-dir"]) $
    ...
)

let main_term = Term.(const main $ build_opts_term)
let () = Term.exit @@ Term.eval (main_term, Term.info "ocaml-version")

It not only parses and converts parameters to appropriate type, but can also verify e.g. that dir exists. Explicit. Smart. No magic.

Database access

Go

For database access I used database/sql and github.com/go-sql-driver/mysql. The time spent getting the thing running was hardly more than 3 hours, and it could easily have been more if not for the excellent Go database/sql tutorial.
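One detail worth knowing with database/sql is that it does not expand a slice into an IN (...) list, so the placeholders have to be generated by hand. Below is a minimal sketch, assuming the query filters devices by the list of groups given on the command line. The table and column names and the inQuery helper are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// inQuery builds a parameterised IN clause and returns the query text
// together with the arguments to pass alongside it.
// The table and column names are made up for this example.
// It expects at least one group; an empty list would yield invalid SQL.
func inQuery(groups []string) (string, []interface{}) {
	placeholders := make([]string, len(groups))
	args := make([]interface{}, len(groups))
	for i, g := range groups {
		placeholders[i] = "?"
		args[i] = g
	}
	query := fmt.Sprintf(
		"SELECT id, host FROM devices WHERE group_name IN (%s)",
		strings.Join(placeholders, ", "))
	return query, args
}

func main() {
	query, args := inQuery([]string{"Group1", "Group2"})
	fmt.Println(query)
	fmt.Println(args...)
	// With an open *sql.DB the pair would be used as db.Query(query, args...).
}
```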

Haskell

Haskell people apparently hate MySQL. There is no shortage of libraries that can be used to access a database but most of them are designed for PostgreSQL. I tried three libraries. mysql-simple did look “simple” but for some reason I couldn’t get the code that uses it to compile. mysql-haskell compiled just fine and the docs were sufficient to start; however, the DB server returned a protocol error. HDBC-mysql worked fine for me. Reading the chapter on HDBC in Real World Haskell was helpful.

OCaml

The libmysql binding did the job just fine. It also has some handy functions: for example, Mysql.values takes a list of strings and returns a (val1,val2,...,valN) string which can be used in a WHERE ... IN clause.

Go

The data returned by the DB queries was used to construct a glob pattern which would return a list of RRD files. I used path/filepath. It is fast and has a simple interface.

import (
    "path/filepath"
    "fmt"
)
// ...
pattern := fmt.Sprintf("%s/%s/*/%d/*/*.rrd", opts.InputDir, d.Blah, d.Id)
files, err := filepath.Glob(pattern)
for _, file := range files {
    processFile(file)
}

Haskell

The Glob library did the job. It is easy to use and works well, but in the optimised version it turned out to be the bottleneck, taking up to 48% of the total run time.

import System.FilePath.Glob (compile, globDir)
-- ...
listRRDFiles :: Options -> Ifaces -> [Device] -> IO [RRDFile]
listRRDFiles opts ifaces devices = do
   fss <- globDir patterns (input_dir opts)
   return $ concatMap f (zip fss devices)
 where patterns = fmap compile -- ...

I also tried sourceDirectoryDeep from conduit-combinators but it was much slower than Glob, presumably because it makes an fstat call on every entry.

OCaml

I couldn’t find a globbing library, so at first I used the find command-line tool and later replaced it with a C binding to libc’s glob function. With glob it looks almost identical to the Go version:

Printf.sprintf "%s/%s/*/%s/*/*.rrd" input_dir dc id
|> Glob.glob  (* call the glob function *)
|> Array.enum  (* convert the resulting array into Enum *)
|> Enum.filter_map (rrd_from_path ifaces (id, host, dc)) (* filter-map the result *)
|> Enum.iter (process_rrd rrdtool interval) (* and finally process each file *)

Querying RRD files

Go

I used a nice RRD library wrapper which did the job unobtrusively, with a minimum amount of time and effort spent:

result, err := rrd.Fetch(file.Path, "AVERAGE", start, end, 5*time.Minute)
return result.Values() // returns the datapoints as []float64

Haskell

There were some bindings to librrd on Hackage but not on Stackage. I was short on time, with only two hours left until the presentation, and decided to call rrdtool directly, parsing the output into data points. It was far from ideal because starting a process 800000 times would certainly kill any performance. After the presentation I learnt that there was a “daemon mode” where the tool expected a command on stdin and produced output on stdout. It did improve the performance.
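To give an idea of what parsing the tool’s output involves, here is a small sketch, written in Go like the other added examples rather than in Haskell. It assumes the usual text output of rrdtool fetch: a header line with data source names followed by one "timestamp: value" line per datapoint. The names point, parseFetch and collect are invented for this illustration, and only the first data source is read.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// point is one datapoint: a Unix timestamp and the first data source value.
type point struct {
	ts    int64
	value float64
}

// parseFetch reads "timestamp: value ..." lines and hands each datapoint to
// handle as soon as it is parsed. Lines without a numeric "timestamp:"
// prefix (the header, blank lines) are skipped.
func parseFetch(r io.Reader, handle func(point)) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		i := strings.Index(line, ":")
		if i < 0 {
			continue
		}
		ts, err := strconv.ParseInt(strings.TrimSpace(line[:i]), 10, 64)
		if err != nil {
			continue // not a data line
		}
		fields := strings.Fields(line[i+1:])
		if len(fields) == 0 {
			continue
		}
		text := fields[0]
		// Unknown values may show up as "-nan"; normalise to plain "nan".
		if strings.EqualFold(text, "-nan") {
			text = "nan"
		}
		v, err := strconv.ParseFloat(text, 64)
		if err != nil {
			return fmt.Errorf("bad value %q: %v", fields[0], err)
		}
		handle(point{ts, v})
	}
	return sc.Err()
}

// collect is a convenience wrapper for the example: it gathers all points.
func collect(output string) ([]point, error) {
	var pts []point
	err := parseFetch(strings.NewReader(output), func(p point) {
		pts = append(pts, p)
	})
	return pts, err
}

func main() {
	pts, err := collect("    ds0\n\n1500000000: 1.25e+01\n1500000300: nan\n")
	fmt.Println(pts, err)
}
```

In daemon mode the reader would be the stdout pipe of the long-running rrdtool process instead of a string.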

OCaml

I used rrdtool directly to have a fair comparison with the Haskell version, although making bindings to librrd seemed trivial. The main difference from the Haskell version was that the OCaml version processed each result line (a datapoint) immediately and dumped the result on stdout. The Haskell version accumulated the result as [ByteString.Char8], then processed it, saving the result into a ByteString.Builder, and only after that dumped it onto stdout using B.hPutBuilder stdout (renderLines lines). That probably explains the outstanding memory usage results for OCaml.

Calculating query interval

Go

Dealing with time is always tricky. In this case, however, it was not so tricky because the standard time package had good documentation and an API that allowed me to express what I needed in three lines of code:

y, m, d := time.Now().AddDate(0, 0, -1).Date()
start := time.Date(y, m, d, 0, 0, 0, 0, time.Local)
end := time.Date(y, m, d, 23, 55, 0, 0, time.Local)

Haskell

The time library is the first one that appears in the Hackage search and seems to be the go-to tool for anything time-related.

It took me a while to figure out how to get the interval. I ended up with the following:

getInterval :: IO (Int, Int)
getInterval = do
    ct <- getCurrentTime
    tz <- getCurrentTimeZone
    let LocalTime day _ = utcToLocalTime tz ct
    let start = localTimeToUTC tz (LocalTime (addDays (-1) day) midnight)
    let posix = round $ diffUTCTime start epoch
    return (posix, posix + 23*60*60 + 55*60)
 where epoch = UTCTime (fromGregorian 1970 1 1) 0

OCaml

I wrote a simple function that would work only for time zones with a positive UTC offset:

let getInterval () =
  let t = Unix.time () in
  let tm = Unix.localtime t in
  let offset = tm.tm_sec + tm.tm_min*60 + tm.tm_hour*60*60 in
  let start = int_of_float t - offset - 24*60*60 in
      (start, start + 23*60*60 + 55*60)

There is a calendar library that would probably provide a more robust implementation but at that point I did not want to invest any more time into that.
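One way to avoid depending on the sign of the UTC offset is to build both boundaries from calendar fields in the target location instead of subtracting seconds. Here is a sketch of that idea in Go, mirroring the three-line Go version above but returning Unix timestamps. The function name yesterday, the fixed zone and the sample date are assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// west is a hypothetical zone five hours behind UTC (a negative offset).
var west = time.FixedZone("UTC-5", -5*60*60)

// sampleNow is an arbitrary moment used by the example below.
var sampleNow = time.Date(2017, time.March, 10, 1, 30, 0, 0, west)

// yesterday returns the Unix timestamps of 00:00 and 23:55 of the day
// before now, interpreted in now's own time zone.
func yesterday(now time.Time) (start, end int64) {
	y, m, d := now.AddDate(0, 0, -1).Date()
	loc := now.Location()
	start = time.Date(y, m, d, 0, 0, 0, 0, loc).Unix()
	end = time.Date(y, m, d, 23, 55, 0, 0, loc).Unix()
	return start, end
}

func main() {
	start, end := yesterday(sampleNow)
	fmt.Println(time.Unix(start, 0).In(west), time.Unix(end, 0).In(west))
}
```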

Dumping data to CSV

Go

I used the encoding/csv package simply because it popped up first in the search. It was super-straightforward to use and took me hardly more than thirty minutes to find and implement.

writer := csv.NewWriter(os.Stdout)
defer writer.Flush()
// ...
for ... {
    writer.Write(record)
}

Haskell

No libraries, just dumping Builder-built ByteStrings onto the standard out.

OCaml

Printf.fprintf to stdout.

Regex replace

Go

The regexp package did the job.

var convertRe, _ = regexp.Compile("[/:\\. (),'\"]+")
// and then in the loop
for ... {
    strings.ToLower(convertRe.ReplaceAllString(name, "_"))
}

Haskell

I thought any language nowadays supports that out of the box. There has to be replaceAll Regex -> String -> String -> String of sorts, right? Wrong. I couldn’t believe it and still think that there has to be something out there that I just overlooked. After spending 3 hours looking for that I gave up and wrote the function myself. Maybe some crazy instance of (=~) would’ve done the job but I couldn’t figure it out.

Talking about the latter, if not for this ultra-mega-helpful tutorial I would have given up and written some crude regex-replace-like thing from scratch. I could never have imagined that a regex library interface could be so daunting. It appears to be highly abstract on the surface but underneath it is all concrete and rigid, killing off any flexibility. It works only when there is an instance of a class or, more specifically, when the combination of types on the left-hand side, the right-hand side, and the return type can be combined into something that makes sense from the author’s point of view. So

(=~) :: String -> String -> Bool

works, but what would the following expression mean?

(=~) :: String -> String -> (Int, Bool)

Nothing. It fails to compile, producing a terrifying error message. A super-heavily-overloaded operator could have been concise if type inference worked. Not in this case. The operator is polymorphic in all three parameters so I ended up explicitly typing all of them.

In short, the regex interface in Haskell is as far from being user-friendly as humanly possible.
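For a fixed character class like the one used in this task, a hand-written replacement does not need a regex engine at all. The sketch below is in Go, like the other added examples, and is not the Haskell function mentioned above. It only illustrates the technique of collapsing every run of separator characters into one underscore; the names separators and convertName are invented here.

```go
package main

import (
	"fmt"
	"strings"
)

// separators lists the characters matched by the class [/:\. (),'"].
const separators = "/:. (),'\""

// convertName lower-cases name and replaces every run of separator
// characters with a single underscore.
func convertName(name string) string {
	var b strings.Builder
	inRun := false
	for _, r := range name {
		if strings.ContainsRune(separators, r) {
			if !inRun {
				b.WriteByte('_')
				inRun = true
			}
			continue
		}
		inRun = false
		b.WriteRune(r)
	}
	return strings.ToLower(b.String())
}

func main() {
	fmt.Println(convertName("GigabitEthernet0/1 (uplink)"))
}
```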

OCaml

I tried both re and pcre packages. I found re’s interface somewhat more approachable and since there was no significant difference in performance I ended up using it.

let iface_re = Re.compile @@ Re.Pcre.re "[/:\\. (),'\"]+"
let convert_iface_name name =
    String.lowercase_ascii @@ Re.replace_string iface_re ~by:"_" name

Containers

Go

Built-in maps and slices did the job.
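As a rough illustration of what that looks like, here is a sketch of a string-keyed map of slices and a string set built on a plain map. The names stringSet and groupByHost and the sample data are hypothetical.

```go
package main

import "fmt"

// stringSet is a set of strings built on a map with empty-struct values.
type stringSet map[string]struct{}

// add inserts v into the set.
func (s stringSet) add(v string) {
	s[v] = struct{}{}
}

// has reports whether v is in the set.
func (s stringSet) has(v string) bool {
	_, ok := s[v]
	return ok
}

// groupByHost (hypothetical) indexes interface names by host name.
// Each pair is {host, interface}.
func groupByHost(pairs [][2]string) map[string][]string {
	byHost := make(map[string][]string)
	for _, p := range pairs {
		// append works on the nil slice returned for a missing key.
		byHost[p[0]] = append(byHost[p[0]], p[1])
	}
	return byHost
}

func main() {
	seen := stringSet{}
	seen.add("eth0")
	fmt.Println(seen.has("eth0"), seen.has("eth1"))

	byHost := groupByHost([][2]string{{"a", "eth0"}, {"a", "eth1"}, {"b", "lo"}})
	fmt.Println(byHost["a"], byHost["b"])
}
```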

Haskell

Not all hope is lost in Haskell land. The excellent containers package has a very good introduction, which is probably enough for most use cases one can imagine. It was unobtrusive and took only 10-15 minutes to code everything I needed.

OCaml

batteries had all I needed and, paired with the all-powerful Enum interface, delivered a smooth developer experience. There is, however, something that differentiates the OCaml version from the other two. A concrete version of a module has to be derived from the generic implementation:

module StrKey = struct
  type t = string
  let compare = Pervasives.compare
end
(* create string->'a map module *)
module StrMap = Map.Make(StrKey)
(* create string set module *)
module StrSet = Set.Make(StrKey)

In other words a module can be parameterised by another module. I feel that it is a very powerful feature and I hope to get back to that in a follow-up post.

String handling

Go

Go’s strings and fmt were all I needed.
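A small sketch of the kind of code this amounts to: assembling one output line with strings.Builder and fmt.Fprintf. The field layout and the renderLine name are assumptions for the example, not the program’s actual output format.

```go
package main

import (
	"fmt"
	"strings"
)

// renderLine formats one comma-separated record: host, interface name,
// then each value with two decimal places. The layout is hypothetical.
func renderLine(host, iface string, values []float64) string {
	var b strings.Builder
	b.WriteString(host)
	b.WriteByte(',')
	b.WriteString(iface)
	for _, v := range values {
		// *strings.Builder is an io.Writer, so Fprintf can append to it.
		fmt.Fprintf(&b, ",%.2f", v)
	}
	return b.String()
}

func main() {
	fmt.Println(renderLine("example.com", "eth0", []float64{1, 2.5}))
}
```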

Haskell

After witnessing the terrible performance of the rushed-in Haskell version I replaced String with ByteString. The library is mature, well documented and comprehensive. I got an especially good performance boost after plugging in ByteString.Builder wherever I could.

OCaml

batteries has many useful (byte)string-handling functions.

Conclusion

Go’s godoc seems to have everything under the Sun and Moon.

Haskell’s libraries, with rare exceptions, feel as if they were written with the sole purpose of being mathematically pure and highly abstract. And that does not always help productivity (unless you are writing another highly sophisticated abstract library, of course). It is not like I am against using mathematics and abstractions that stem from that. However for a RAD tool a well designed set of libraries that help solve real-life problems is a must.

OCaml’s library set is tiny compared to the other two, but it does not prevent it from being useful and practical.

To sum it up I segued through Go’s libraries, struggled with Haskell libraries and truly enjoyed OCaml ones.