Thursday, January 3, 2013

Introduction to Result.t vs Exceptions in Ocaml

This post uses Jane St's Core suite, specifically the Result module. It assumes some basic knowledge of Ocaml. Please check out Ocaml.org for more Ocaml reading material.

There are several articles and blog posts out there arguing for or against return values over exceptions. I'll add to the discussion with my reasons for using return values in the place of exceptions in Ocaml.

What's the difference?

Why does the debate even exist? Because each side has decent arguments for why their preference is superior when it comes to writing reliable software. Pro-return-value developers, for example, argue that it is easier to tell whether their code is wrong simply by reading it (if it isn't handling the return value of a function, it's wrong), while exception-based code requires understanding all of the functions called to determine if and how they will fail. Pro-exception developers argue that it is much harder to get their program into an undefined state because an exception has to be handled or else the program fails, whereas in return-value-based code one can simply forget to check a function's return value and the program continues on in an undefined state.

I believe that Ocaml has several features that make return values the preferable way to handle errors. Specifically variants, polymorphic variants, exhaustive pattern matching, and a powerful static type system make return values attractive.

This debate is only worth your time if you are really passionate about writing software that has fairly strong guarantees about its quality in the face of errors. For a majority of software, it doesn't matter which paradigm you choose. Most errors will be stumbled upon during debugging, while writing unit and integration tests, or fairly soon after going into production. But tests cannot catch everything. In distributed and concurrent code, rare errors can become common errors, and it can be near impossible to reconstruct the conditions that caused them. In some cases, though, it is possible, with some discipline, to make whole classes of errors either impossible or catchable at compile-time. Ocaml is one language that makes this possible.

Checked exceptions

A quick aside on checked exceptions, as in Java. Checked exceptions provide some of the functionality I claim is valuable. The main problem with how they are implemented in Java (the only language I have any experience in that uses them) is that they have a very heavy syntax, to the point where using them can seem too burdensome.

The Claim

The claim is that if one cares about ensuring they are handling all failure cases in their software, return values are superior to exceptions because, with the help of a good type system, their handling can be validated at compile-time. Ocaml provides a fairly light, non-intrusive syntax to make this feasible.

Good Returns

The goal of a good return-value-based error handling system is to make sure that all errors are handled at compile-time. This is because a return value, unlike an exception, cannot enforce handling at run-time: an ignored return value is silently dropped. This is a good reason to prefer exceptions in a dynamically typed language like Python or Ruby, where static analyzers are few and far between.

In C this is generally accomplished by using a linting tool that will report an error if a function's return value is ignored in a call. This is why you might see printf cast to void in some code, to make it clear the return value is meant to be ignored. But a problem with this solution is that it only enforces that the developer handles the return value, not all possible errors. For example, POSIX functions return a value saying the function failed and put the actual failure in errno. How, then, to enforce that all of the possible failures are handled? Without encoding all of that information in a linting tool, the options in C (and most languages) are pretty weak. Linting tools are also separate from the compiler and vary in quality. Writing C code that takes proper advantage of a linting tool is a skill of its own as well.

Better Returns

Ocaml supports exceptions but the compiler provides no guarantees that the exceptions are actually handled anywhere in the code. So what happens if the documentation of a function is incomplete or a dependent function is changed to add a new exception being thrown? The compiler won't help you.
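To make the point concrete, here is a minimal sketch using only the standard library (not Core). The function names first_even and first_even_opt are made up for illustration. List.find raises Not_found when nothing matches, yet nothing in the inferred type says so, while the option-returning variant puts the failure in the type:

```ocaml
(* Inferred type: int list -> int.
   The type gives no hint that Not_found may be raised. *)
let first_even xs = List.find (fun x -> x mod 2 = 0) xs

(* Inferred type: int list -> int option.
   The possibility of "no result" is now visible to every caller. *)
let first_even_opt xs = List.find_opt (fun x -> x mod 2 = 0) xs
```

A caller of first_even compiles fine without a try block. A caller of first_even_opt has to unwrap the option before it can use the integer.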

But Ocaml's rich type system, combined with some discipline, gives you more power than a C linter. The primary strength is that Ocaml lets you encode information in your types. For example, in POSIX many functions return an integer to indicate error. But an int has no interesting meaning to the compiler other than it holds values between INT_MIN and INT_MAX. In Ocaml, we can instead create a type to represent the errors a function can return and the compiler can enforce that all possible errors are handled in some way thanks to exhaustive pattern matching.
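As a small sketch of that idea, assuming a made-up login function and a made-up login_error type (neither comes from any real library), and using the standard library's result type rather than Core's:

```ocaml
(* Hypothetical error type: each failure the function can report
   is a separate constructor. *)
type login_error =
  | No_such_user
  | Account_locked
  | Password_expired

(* Stand-in for a real lookup; the outcome is chosen by the user name. *)
let login user : (string, login_error) result =
  match user with
  | "alice" -> Ok "session-alice"
  | "bob" -> Error Account_locked
  | "carol" -> Error Password_expired
  | _ -> Error No_such_user

(* Every constructor of login_error is matched individually. *)
let describe user =
  match login user with
  | Ok session -> "logged in: " ^ session
  | Error No_such_user -> "no such user"
  | Error Account_locked -> "account locked"
  | Error Password_expired -> "password expired"
```

If one of the Error cases in describe is deleted, the compiler reports a non-exhaustive pattern match and names the missing constructor. By default that is a warning; many build setups turn it into an error.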

An Example

What does all of this look like? Below is a contrived example. The goal is to provide a function, called parse_person, that takes a string and turns it into a person record. The requirement is that if a valid person cannot be parsed out, the part of the string that failed is specified in the error message.

Here is a version using exceptions, ex1.ml:

open Core.Std

exception Int_of_string of string

exception Bad_line of string
exception Bad_name of string
exception Bad_age of string
exception Bad_zip of string

type person = { name : (string * string)
              ; age  : Int.t
              ; zip  : string
              }

(* A little helper function *)
let int_of_string s =
  try
    Int.of_string s
  with
    | Failure _ ->
      raise (Int_of_string s)

let parse_name name =
  match String.lsplit2 ~on:' ' name with
    | Some (first_name, last_name) ->
      (first_name, last_name)
    | None ->
      raise (Bad_name name)

let parse_age age =
  try
    int_of_string age
  with
    | Int_of_string _ ->
      raise (Bad_age age)

let parse_zip zip =
  try
    ignore (int_of_string zip);
    if String.length zip = 5 then
      zip
    else
      raise (Bad_zip zip)
  with
    | Int_of_string _ ->
      raise (Bad_zip zip)

let parse_person s =
  match String.split ~on:'\t' s with
    | [name; age; zip] ->
      { name = parse_name name
      ; age  = parse_age age
      ; zip  = parse_zip zip
      }
    | _ ->
      raise (Bad_line s)

let () =
  (* Pretend input came from user *)
  let input = "Joe Mama\t25\t11425" in
  try
    let person = parse_person input in
    printf "Name: %s %s\nAge: %d\nZip: %s\n"
      (fst person.name)
      (snd person.name)
      person.age
      person.zip
  with
    | Bad_line l ->
      printf "Bad line: '%s'\n" l
    | Bad_name name ->
      printf "Bad name: '%s'\n" name
    | Bad_age age ->
      printf "Bad age: '%s'\n" age
    | Bad_zip zip ->
      printf "Bad zip: '%s'\n" zip

ex2.ml is a basic translation of the above but using variants. The benefit is that the type system will ensure that all failure cases are handled. The problem is that the code is painful to read and modify. Every function that can fail has its own variant type to represent success and error. Composing the functions is painful since everything returns a different type. We have to create a type that can represent all of the failures the other functions returned. It would be nice if each function could return an error and we could use that value instead. It would also be nice if everything read as a series of steps, rather than as a pattern match on a tuple, which is hard to read.
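ex2.ml itself is not reproduced in this post. The following is only a rough, hypothetical sketch of the style described, not the actual file. It assumes the standard library instead of Core and a trimmed two-field person, and all the type and constructor names are invented for illustration:

```ocaml
type person = { name : string * string; age : int }

(* One hand-written result type per function. *)
type name_result = Name_ok of (string * string) | Name_bad of string
type age_result = Age_ok of int | Age_bad of string

(* The caller needs yet another type that repeats every failure. *)
type person_result =
  | Person_ok of person
  | Person_bad_line of string
  | Person_bad_name of string
  | Person_bad_age of string

let parse_name s =
  match String.index_opt s ' ' with
  | Some i ->
    Name_ok (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
  | None -> Name_bad s

let parse_age s =
  match int_of_string_opt s with
  | Some n -> Age_ok n
  | None -> Age_bad s

(* Composition is a match on a tuple of unrelated types. *)
let parse_person s =
  match String.split_on_char '\t' s with
  | [name; age] ->
    (match (parse_name name, parse_age age) with
     | (Name_ok name, Age_ok age) -> Person_ok { name; age }
     | (Name_bad n, _) -> Person_bad_name n
     | (_, Age_bad a) -> Person_bad_age a)
  | _ -> Person_bad_line s
```

Even with two fields, most of the code is bookkeeping: translating each helper's private failure into the caller's failure type.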

ex3.ml introduces Core's Result.t type. The useful addition is that we only need to define a type for parse_person. Every other function only has one error condition so we can just encode the error in the Error variant. This is still hard to read, though. The helper functions aren't so bad but the main function is still painful.
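Again, ex3.ml is not shown here, so this is a hypothetical sketch of the shape being described rather than the real file. It assumes the standard library's result type (which has the same Ok and Error constructors as Core's Result.t) and a trimmed two-field person. The name parse_error is invented:

```ocaml
type person = { name : string * string; age : int }

(* Only parse_person needs a dedicated error type. *)
type parse_error =
  | Bad_line of string
  | Bad_name of string
  | Bad_age of string

(* Each helper has one failure mode, so the offending string is the error. *)
let parse_name s : (string * string, string) result =
  match String.index_opt s ' ' with
  | Some i ->
    Ok (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
  | None -> Error s

let parse_age s : (int, string) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error s

(* The main function still nests one match per helper call. *)
let parse_person s : (person, parse_error) result =
  match String.split_on_char '\t' s with
  | [name; age] ->
    (match parse_name name with
     | Error n -> Error (Bad_name n)
     | Ok name ->
       (match parse_age age with
        | Error a -> Error (Bad_age a)
        | Ok age -> Ok { name; age }))
  | _ -> Error (Bad_line s)
```

The helpers are shorter than before, but parse_person grows one level of nesting for every field it parses.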

While the previous solutions have solved the problem of ensuring that all errors are handled, they introduced the problem of being painful to develop with. The main problem is that nothing composes. The helpers have their own error types, and for every call to them we have to check their return value and then wrap their error in the error type of every function above them. What would be nice is if the compiler could automatically union all of the errors a function can return, both its own and those of any function it calls. Enter polymorphic variants.
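A minimal sketch of that automatic union, assuming the standard library rather than Core. The function parse_age_and_zip and the describe handler are invented for illustration. No error type is declared anywhere, and the compiler infers the union on its own:

```ocaml
(* Error type inferred as [> `Bad_age of string ] *)
let parse_age s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (`Bad_age s)

(* Error type inferred as [> `Bad_zip of string ] *)
let parse_zip s =
  if String.length s = 5 && int_of_string_opt s <> None
  then Ok s
  else Error (`Bad_zip s)

(* Error type inferred as [> `Bad_age of string | `Bad_zip of string ],
   the union of whatever the helpers can return. *)
let parse_age_and_zip age zip =
  match parse_age age with
  | Error e -> Error e
  | Ok age ->
    (match parse_zip zip with
     | Error e -> Error e
     | Ok zip -> Ok (age, zip))

(* Matching each tag without a wildcard closes the set of accepted errors. *)
let describe age zip =
  match parse_age_and_zip age zip with
  | Ok (age, zip) -> Printf.sprintf "age %d, zip %s" age zip
  | Error (`Bad_age a) -> "bad age: " ^ a
  | Error (`Bad_zip z) -> "bad zip: " ^ z
```

Because describe lists the tags it accepts and has no wildcard case, a helper that later starts returning a new tag makes the call in describe fail to type-check until that tag is handled.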

ex4.ml shows the version with polymorphic variants. The nice bit of refactoring we were able to do is in parse_person. Rather than an ugly match, the calls to the helper functions can be sequenced:

let parse_person s =
  match String.split ~on:'\t' s with
    | [name; age; zip] ->
      let open Result.Monad_infix in
      parse_name name >>= fun name ->
      parse_age  age  >>= fun age  ->
      parse_zip  zip  >>= fun zip  ->
      Ok { name; age; zip }
    | _ ->
      Error (`Bad_line s)

Don't worry about the monad syntax, it's really just to avoid the nesting to make the sequencing easier on the eyes. Except for the >>=, this looks a lot like code using exceptions. There is a nice linear flow and only the success path is shown. But! The compiler will ensure that all failures are handled.
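For the curious, the operator can be sketched in a few lines. This is a hand-rolled definition over the standard library's result type, written to show the idea and not copied from Core's Result.Monad_infix. The helpers parse_int, safe_div and divide_strings are made up for the example:

```ocaml
(* If the left side succeeded, feed its value to the next step;
   if it failed, skip the rest and pass the error along. *)
let ( >>= ) r f =
  match r with
  | Ok x -> f x
  | Error e -> Error e

let parse_int s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (`Not_an_int s)

let safe_div a b =
  if b = 0 then Error `Div_by_zero else Ok (a / b)

(* Reads top to bottom like exception-style code, but every step
   may short-circuit with an Error. *)
let divide_strings a b =
  parse_int a >>= fun x ->
  parse_int b >>= fun y ->
  safe_div x y
```

Each `>>= fun x ->` stands in for one level of `match ... with Ok x -> ... | Error e -> Error e`, which is why the chained version stays flat.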

The final version of the code is ex5.ml. This takes ex4 and rewrites portions of it to be prettier. As a disclaimer, I'm sure someone else would write this differently even with the same restrictions I put on it. I might even write it differently on a different day. But this version of the code demonstrates the points I am making.

A few points of comparison between ex1 and ex5:

  • The body of parse_person is definitely simpler and easier to read in the exception code. It is short and concise.
  • The rest of the helper functions are a bit of a toss-up between the exception and return-value code. I think one could argue either direction.
  • The return-value code has fulfilled my requirements in terms of handling failures. The compiler will complain if any failure parse_person could return is not handled. If I add another error type the code will not compile. It also fulfilled the requirements without bloating the code. The return-value code and exception code are roughly the same number of lines. Their flows are roughly equal. But the return-value code is much safer.

Two Points

It's not all sunshine and lollipops. There are two issues to consider:

  • Performance - Exceptions in Ocaml are really, really fast. Like any performance issue, I suggest altering code only when needed based on measurements and encapsulating those changes as well as possible. This also means if you want to provide a safe and an exception version of a function, you should probably implement the safe version in terms of the exception version.
  • Discipline - I referred to discipline a few times above. This whole scheme is very easy to mess up with a single mistake: pattern matching on anything (_). To get the power of exhaustive pattern matching, you need to match on every error individually. This is bad for effectively the same reason that catching the exception base class is such a bad idea in other languages: you lose a lot of information.
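Both points can be sketched briefly. The code below assumes the standard library rather than Core, and the names parse_age_exn, careless and careful are invented for illustration. The first half wraps an exception-raising function in a safe one. The second half contrasts a wildcard match with an exhaustive one:

```ocaml
exception Bad_age of string

(* Exception version: the primitive implementation. *)
let parse_age_exn s =
  try int_of_string s with Failure _ -> raise (Bad_age s)

(* Safe version, written in terms of the exception version:
   the one known exception becomes an Error at the boundary. *)
let parse_age s =
  try Ok (parse_age_exn s) with Bad_age a -> Error (`Bad_age a)

(* Wildcard: keeps compiling no matter which errors are added later,
   and throws away which error occurred. *)
let careless r =
  match r with
  | Ok n -> string_of_int n
  | Error _ -> "something went wrong"

(* Exhaustive: each error is named, so a new error tag arriving here
   is rejected by the compiler until it is handled. *)
let careful r =
  match r with
  | Ok n -> string_of_int n
  | Error (`Bad_age a) -> "bad age: " ^ a
```

The careless version is the return-value equivalent of a catch-all exception handler.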

Conclusion

The example given demonstrates an important point: code can become much safer at compile time without detriment to its length or readability. The cost is low and the benefit is high. This is a strong reason to prefer a return-value based solution over exceptions in Ocaml.
