Two simple pattern matchers for Common Lisp

:: programming, lisp

I’ve written two pattern matchers for Common Lisp:

  • destructuring-match, or dsm, is a case-style construct which can match destructuring-bind-style lambda lists with a couple of extensions;
  • spam, the simple pattern matcher, does not bind variables but lets you match based on assertions about, for instance, the contents of lists.

Both dsm and spam strive to be simple and correct.

Simplicity

Both dsm and spam are simple: they do exactly one thing, and try to do that one thing well.

You could think of dsm as being to some other CL pattern matchers as Unix once was to Multics: dsm is the result of me looking at those other systems and thinking ‘please, not that’.

Those systems are vast, have several levels, and are extensible: some subset of them might do what I wanted to be able to do — make writing macros less unpleasant — but I’m not sure1. They are obsessed with performance.

dsm does one thing, and exports a single macro. If you know how to use destructuring-bind and case you already know almost all there is to know about dsm: it’s a case construct whose cases are destructuring-bind lambda lists. dsm doesn’t care about performance at all, because macroexpansion performance never matters.

At least one of those matchers has almost as many commits in its repo as dsm has lines of code.

Like Multics was, those hairy pattern matchers are fine systems. But there was a good reason that Thompson and Ritchie wrote something very different2.

destructuring-match / dsm

in CL destructuring-bind and, mostly equivalently, macro argument lists are both a blessing and a curse. They’re a blessing because they support destructuring, so you can write, for instance

(defmacro with-foo ((var &optional init) &body forms)
  ...)

They’re a curse because they are so fragile: with-foo can only support that syntax and will fail with an ugly error message from the implementation when it is fed anything else.

Writing robust macros in CL, especially macros which expect various different argument patterns, then turns into a great saga of manually checking argument patterns before using destructuring-bind to actually bind things. The result of that, of course, is that very many CL macros are not robust and have terrible error reporting.

destructuring-match does away with all this unpleasentness. It supports a slightly extended version of the lambda lists that destructuring-bind supports, has ‘guard’ clauses which allow additional checks, and will match a form against any number of lambda lists until one matches, with a fallback case.

As an example here is a version of with-foo which allows two patterns:

(defmacro with-foo (&body forms)
  (destructuring-match forms
    (((var &optional init) &body body)
     (:when (symbolp var))
     ...)
    ((((var &optional type) &optional init) &body body)
     (:when (symbolp var))
     ...)
    (otherwise
     (error ...))))

The guard clauses check that var is a symbol before the match succeeds, and will therefore ensure that the second match is the one chosen for (with-foo ((x y) 1) ...).

destructuring-match also supports ‘blank’ variables: any variable whose name is _ (in any package) is ignored, and all such variables are distinct. So for instance

(destructuring-match l
  ((_ _ _) ...))

will match if l is a proper list with exactly three elements.

Using destructuring-match it’s easy to write this macro3:

(defmacro define-matching-macro (name &body clauses)
  (let ((<whole> (make-symbol "WHOLE"))
        (<junk> (make-symbol "JUNK")))
    (destructuring-match clauses
      ((doc . the-clauses)
       (:when (stringp doc))
       `(defmacro ,name (&whole ,<whole> &rest ,<junk>)
          ,doc
          (destructuring-match ,<whole> ,@the-clauses)))
      (the-clauses
       `(defmacro ,name (&whole ,<whole> &rest ,<junk>)
          (destructuring-match ,<whole> ,@the-clauses))))))

And this then allows the above with-foo macro to be written like this:

(define-matching-macro with-foo
  ((_ (var &optional init) &body forms)
   (:when (symbolp var))
   ...)
  ((_ ((var &optional type) &optional init) &body forms)
   (:when (symbolp var))
   ...)
  (form
   (error "~S is bad syntax for with-foo" form)))

dsm was not written with performance in mind but it seems to be, typically, around a tenth to a half the speed of destructuring-bind while being far more powerful of course.

dsm can be found here. It will probably end up in Quicklisp in due course but currently it isn’t there, and some of its dependencies are also not up to date there.

spam, the simple pattern matcher

dsm has a lot of cases where it needs to check what the lambda list it is parsing and compiling looks like. To do this I wrote a bunch of predicate constructors and combinators, which return predicates which will check things. So for example:

  • (is 'foo) returns a function which checks its argument is eql to foo;
  • (some-of p1 ... pn) returns a function of one argument which will succeed if one of the predicates which are its arguments succeeds, so (some-of (is 'foo) (is 'bar));
  • (head-matches p1 ... pn) will succeed if the predicates which are its arguments succeed on the first elements of a list.

There are several other predicate constructrors and predicate combinators, but spam can use any predicate.

There is then a matches macro which uses these to match things, and a matchp function which simply invokes a predicate.

As an example, here’s part of a matcher for &rest specifications in lambda lists.

(matching ll
  ((head-matches (some-of (is '&rest) (is '&body))
                 (var)
                 (is '&key))
   ;; &rest x &key ...
   ...)
   ((head-matches (some-of (is '&rest) (is '&body))
                  (var)
                  (any))
    ;; &rest x with something else
    ...)
   ((list-matches (some-of (is '&rest) (is '&body))
                  (var))
    ;; &rest x and no more
    ...)
   (otherwise
    (error "oops")))

spam is pretty useful, and code written using it is much easier to read than doing the equivalent checks manually. It is used extensively in the implementation of dsm.

spam is now one of my CL hax.


  1. At the time of writing Trivia supports lambda lists I think, but not destructuring-lambda lists: (match '(1 (1)) ((lambda-list a (b)) (values a b))) will fail, for instance. I don’t know whether is it meant to support destructuring lambda lists — comments in the sources imply it is, but it clearly does not in fact. 

  2. I am aware of Gabriel’s ‘worse is better’ paper and its various afterthoughts. dsm is not like that: it is smaller and simpler, but is not intended to be worse. dsm is to these other systems perhaps as Scheme was to CL. Gabriel also talks about these two options, of course. 

  3. Note this macro is 12 lines, half of which are handling the possible docstring.