Libraries used in Haskell vs. Go vs...

The decision process for a particular package in the experiment consisted of answering a few questions:

In addition to the criteria above, I added one more for Haskell: the package should be on Stackage. That excluded some libraries that might have been a better fit, but I wanted to reduce extra complexity as much as possible.

Let’s dive in.

Command line parser

The task was to parse parameters of the following shape:

$ baz --group=Group1 --group=Group2 --group=GroupN \
      --db-name=mydb --db-port=1234 \
      --db-user=user --db-password=password \
      ...

All the parameters but --group and --db-host were optional, and --group could be repeated, resulting in a list of values.


In Go, after evaluating some libraries and discarding the standard flag package, I chose go-flags because it allowed me to replicate the Perl version’s interface with the least amount of time and effort, and it seemed mature enough documentation- and implementation-wise.

One should define a struct

import "github.com/jessevdk/go-flags"

type Options struct {
    Groups   []string `long:"group" description:"Device group" required:"true"`
    InputDir string   `long:"input-dir" default:"/tmp"`
    DBHost   string   `long:"db-host" required:"true"`
    DBPort   int      `long:"db-port" default:"3306"`
    // ...
}

and then call

func main() {
    var opts = Options{}
    _, err := flags.Parse(&opts)
    if err != nil {
        os.Exit(1)
    }
    // ...
}

to get the struct fields initialized. go-flags automatically checks for the presence of required parameters and does type conversion. Having to put all the metadata/logic for a flag on one line seemed a bit limiting, but as a quick solution it was totally fine.


In Haskell, optparse-generic seemed the easiest way and, more importantly, had enough documentation to write something working without looking at StackOverflow or scanning through the source code. Defining a record that derives Generic and an instance of ParseRecord does all the work:

{-# LANGUAGE DeriveGeneric #-}
import Options.Generic

data Options = Options
  { group :: [String]
  , input_dir :: String
  , db_host :: String
  , db_port :: Maybe Int
  -- ...
  } deriving (Generic, Show)

instance ParseRecord Options

getOptions :: IO Options
getOptions = getRecord "exe-name"

The names of the fields directly correspond to flags (e.g. --input_dir), and Maybe fields define optional values.

main = do
    opts <- getOptions
    doSomethingElse opts



Among the three, I liked OCaml’s cmdliner the most. It is very flexible but at the same time easy to get started with.

open Cmdliner

type options = {
    groups: string list;
    input_dir: string;
    db_host: string;
    db_port: int;
}

let build_opts groups db_host db_port input_dir =
    { groups; input_dir; db_host; db_port }

let build_opts_term = Term.(
    const build_opts $
    Arg.(non_empty & opt_all string [] & info ["g"; "group"]) $
    Arg.(required & opt (some string) None & info ["db-host"]) $
    Arg.(value & opt int 3306 & info ["db-port"]) $
    Arg.(value & opt dir default_dir & info ["input-dir"]))

let main_term = Term.(const main $ build_opts_term)
let () = Term.exit @@ Term.eval (main_term, Term.info "ocaml-version")

It not only parses parameters and converts them to the appropriate types, but can also verify, for example, that a dir exists. Explicit. Smart. No magic.

Database access


For database access in Go I used database/sql. Time spent getting the thing running was hardly more than three hours, and it could easily have been more if not for the excellent Go database/sql tutorial.
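One chore that comes up quickly with database/sql is building a variable-length WHERE ... IN clause, since the package itself only provides ? placeholders. A minimal sketch; the helper name, table, and column are mine, not from the post:

```go
package main

import (
	"fmt"
	"strings"
)

// inClause builds a "(?,?,?)" placeholder group for n values,
// suitable for a WHERE ... IN query with database/sql.
func inClause(n int) string {
	return "(" + strings.TrimSuffix(strings.Repeat("?,", n), ",") + ")"
}

func main() {
	groups := []string{"Group1", "Group2", "Group3"}
	query := "SELECT id FROM devices WHERE grp IN " + inClause(len(groups))
	fmt.Println(query) // SELECT id FROM devices WHERE grp IN (?,?,?)
}
```

Each value is then passed as a separate argument to db.Query, which keeps the query safe from injection.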


Haskell people apparently hate MySQL. There is no shortage of libraries for database access, but most of them are designed for PostgreSQL. I tried three: mysql-simple did look “simple”, but for some reason I couldn’t make code using it compile. mysql-haskell compiled just fine and its docs were sufficient to start, but the DB server returned a protocol error. HDBC-mysql worked fine for me; reading the chapter on HDBC in Real World Haskell was helpful.


In OCaml, the libmysql binding did the job just fine. It also has some handy functions; for example, Mysql.values takes a list of strings and returns a (val1,val2,...,valN) string which can be used in a WHERE ... IN clause.


Listing RRD files

The data returned by the DB queries was used to construct a glob pattern matching a list of RRD files. In Go I used path/filepath. It works fast and has a simple interface.

import (
    "fmt"
    "path/filepath"
)
// ...
pattern := fmt.Sprintf("%s/%s/*/%d/*/*.rrd", opts.InputDir, d.Blah, d.Id)
files, err := filepath.Glob(pattern)
// ...
for _, file := range files {
    // ...
}

In Haskell, the Glob library did the job. It is easy to use and works well, but in the optimised version the library turned out to be the bottleneck, taking up to 48% of the total run time.

import System.FilePath.Glob (compile, globDir)
-- ...
listRRDFiles :: Options -> Ifaces -> [Device] -> IO [RRDFile]
listRRDFiles opts ifaces devices = do
   fss <- globDir patterns (input_dir opts)
   return $ concatMap f (zip fss devices)
 where patterns = fmap compile -- ...

I also tried sourceDirectoryDeep from conduit-combinators, but it was much slower than Glob, presumably because it makes an fstat call on every entry.


In OCaml, I couldn’t find a ready-made library and used the find command line tool at first, replacing it later with a C binding to libc’s glob function. With glob it looks almost identical to the Go version:

Printf.sprintf "%s/%s/*/%s/*/*.rrd" input_dir dc id
|> Glob.glob  (* call the glob function *)
|> Array.enum  (* convert the resulting array into Enum *)
|> Enum.filter_map (rrd_from_path ifaces (id, host, dc)) (* filter-map the result *)
|> Enum.iter (process_rrd rrdtool interval) (* and finally process each file *)

Querying RRD files


In Go, I used a nice RRD library wrapper which did the job unobtrusively, with a minimal amount of time and effort spent:

result, err := rrd.Fetch(file.Path, "AVERAGE", start, end, 5*time.Minute)
return result.values() // returns an array of datapoints - []float64

In Haskell, there were some bindings to librrd on Hackage but not on Stackage. Being short on time, with only two hours left till the presentation, I decided to call rrdtool directly and parse its output into data points. That was far from ideal: starting a process 800000 times would certainly kill any performance. After the presentation I learnt that rrdtool has a “daemon mode” in which it expects commands on stdin and produces output on stdout. That did improve the performance.
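For reference, the output of rrdtool fetch is simple to parse: one `timestamp: value` line per step, with unknown values printed as nan. A Go sketch of parsing one such line (the function name and error handling are mine; the format should be double-checked against your rrdtool version):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine parses one "timestamp: value" line of `rrdtool fetch` output
// into a Unix timestamp and a data point.
func parseLine(line string) (int64, float64, error) {
	parts := strings.SplitN(strings.TrimSpace(line), ":", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("malformed line: %q", line)
	}
	ts, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	// "nan" parses to NaN, which conveniently marks unknown values.
	val, err := strconv.ParseFloat(strings.TrimSpace(parts[1]), 64)
	if err != nil {
		return 0, 0, err
	}
	return ts, val, nil
}

func main() {
	ts, v, _ := parseLine("920804700: 1.2500000000e+02")
	fmt.Println(ts, v) // 920804700 125
}
```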


In OCaml I used rrdtool directly to have a fair comparison with the Haskell version, although making bindings to librrd seemed trivial. The main difference from the Haskell version: in OCaml each result line (a data point) was processed immediately, dumping the result to stdout, while in Haskell the result was accumulated as [ByteString.Char8], processed into a ByteString.Builder, and only then dumped to stdout using B.hPutBuilder stdout (renderLines lines). That probably explains the outstanding memory usage results for OCaml.

Calculating query interval


Dealing with time is always tricky. In this case, however, it was less so in Go, because the standard time package has good documentation and an API that let me express what I needed in three lines of code:

y, m, d := time.Now().AddDate(0, 0, -1).Date()
start := time.Date(y, m, d, 0, 0, 0, 0, time.Local)
end := time.Date(y, m, d, 23, 55, 0, 0, time.Local)

In Haskell, the time library is the first one that appears in a Hackage search and seems to be the go-to tool for anything time-related.

It took me a while to figure out how to get the interval. I ended up with the following:

getInterval :: IO (Int, Int)
getInterval = do
    ct <- getCurrentTime
    tz <- getCurrentTimeZone
    let LocalTime day _ = utcToLocalTime tz ct
    let start = localTimeToUTC tz (LocalTime (addDays (-1) day) midnight)
    let posix = round $ diffUTCTime start epoch
    return (posix, posix + 23*60*60 + 55*60)
 where epoch = UTCTime (fromGregorian 1970 1 1) 0

For OCaml I wrote a simple function that works correctly only for UTC+ time zones:

let getInterval () =
  let t = Unix.time () in
  let tm = Unix.localtime t in
  let offset = tm.tm_sec + tm.tm_min*60 + tm.tm_hour*60*60 in
  let start = int_of_float t - offset - 24*60*60 in
  (start, start + 23*60*60 + 55*60)

There is a calendar library that would probably provide a more robust implementation, but at that point I did not want to invest any more time into it.

Dumping data to CSV


In Go I used the encoding/csv package simply because it popped up first in the search. It was super straightforward to use and took me hardly more than thirty minutes to find and implement.

writer := csv.NewWriter(os.Stdout)
defer writer.Flush()
// ...
for _, record := range records {
    writer.Write(record)
}

In Haskell: no libraries, just dumping Builder-built ByteStrings onto the standard out.


In OCaml: Printf.fprintf to stdout.

Regex replace


In Go, the regexp package did the job.

var convertRe = regexp.MustCompile(`[/:\. (),'"]+`)
// and then in the loop
for _, name := range names {
    clean := strings.ToLower(convertRe.ReplaceAllString(name, "_"))
    // ...
}

For Haskell, I thought any language nowadays supports this out of the box. There has to be a replaceAll :: Regex -> String -> String -> String of sorts, right? Wrong. I couldn’t believe it, and I still think there must be something out there that I just overlooked. After spending three hours looking I gave up and wrote the function myself. Maybe some crazy instance of (=~) would have done the job, but I couldn’t figure it out.

Speaking of (=~): if not for this ultra-mega-helpful tutorial I would have given up and written some crude regex-replace-like thing from scratch. I could never have imagined that a regex library interface could be so daunting. It appears highly abstract on the surface, but underneath it is all concrete and rigid, killing off any flexibility. It works only when there is an instance of a class, or more specifically, when the combination of types on the left-hand side, the right-hand side, and the return type can be combined into something that makes sense from the author’s point of view. So

(=~) :: String -> String -> Bool

works, but what would the following expression mean?

(=~) :: String -> String -> (Int, Bool)

Nothing. It fails to compile, producing a terrifying error message. A super-heavily-overloaded operator could have been concise if type inference worked; not in this case. The operator is polymorphic in all three parameters, so I ended up explicitly typing all of them.

In short, the regex interface in Haskell is as far from being user-friendly as humanly possible.


In OCaml I tried both the re and pcre packages. I found re’s interface somewhat more approachable, and since there was no significant difference in performance, I ended up using it.

let iface_re = Re.compile @@ Re.Pcre.re "[/:\\. (),'\"]+"
let convert_iface_name name =
    String.lowercase_ascii @@ Re.replace_string iface_re ~by:"_" name



Maps and sets

In Go, built-in maps and slices did the job.
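As an illustration of the kind of bookkeeping involved, grouping interface names by device needs nothing beyond the built-ins (the helper and sample data here are made up):

```go
package main

import "fmt"

// groupBy collects values by key using only the built-in
// map and slice types — no container library needed.
func groupBy(pairs [][2]string) map[string][]string {
	m := map[string][]string{}
	for _, p := range pairs {
		m[p[0]] = append(m[p[0]], p[1])
	}
	return m
}

func main() {
	ifaces := groupBy([][2]string{{"dev1", "eth0"}, {"dev1", "eth1"}, {"dev2", "eth0"}})
	fmt.Println(len(ifaces["dev1"]), len(ifaces["dev2"])) // 2 1
}
```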


Not all hope is lost in Haskell land. The excellent containers package has a very good introduction, which is enough for probably most use cases one can imagine. It was unobtrusive and took only 10-15 minutes to code everything I needed.


In OCaml, batteries had all I needed and, paired with the all-powerful Enum interface, delivered a smooth developer experience. There is, however, something that differentiates the OCaml version from the other two: a concrete module is derived from the generic implementation:

module StrKey = struct
  type t = string
  let compare = String.compare
end

(* create string->'a map module *)
module StrMap = Map.Make(StrKey)
(* create string set module *)
module StrSet = Set.Make(StrKey)

In other words, a module can be parameterised by another module. I feel this is a very powerful feature and I hope to get back to it in a follow-up post.

String handling


Go’s strings and fmt were all I needed.


In Haskell, after witnessing the terrible performance of the rushed-in version, I replaced String with ByteString. The library is mature, well documented, and comprehensive. An especially good performance boost came after plugging in ByteString.Builder wherever I could.


In OCaml, batteries has many useful (byte)string-handling functions.


Conclusions

Go’s godoc seems to have everything under the Sun and Moon.

Haskell’s libraries, with rare exceptions, feel as if they were written with the sole purpose of being mathematically pure and highly abstract. That does not always help productivity (unless you are writing another highly sophisticated abstract library, of course). It is not that I am against mathematics and the abstractions that stem from it; however, for a RAD tool, a well-designed set of libraries that help solve real-life problems is a must.

OCaml’s library set is tiny compared to the other two, but it does not prevent it from being useful and practical.

To sum it up: I segued through Go’s libraries, struggled with Haskell’s, and truly enjoyed OCaml’s.