Macros in Racket, part two

:: computer, lisp

The second part of my notes on writing macros in Racket.

This is the second part of at least three: the first part is here, and the third part is here. This won’t make much sense unless you’ve read that. As before I make no claims to be an expert in Racket’s macro system although I am familiar with Lisp macros in general: this is just some more notes I wrote while learning it.

The unwashed Lisp hacker’s version of collecting

So, we can write clet: can we write collecting? Yes, we can:

(require (for-syntax racket/list))

(define-syntax (collecting stx)
  (datum->syntax
   (quote-syntax collecting)
   `(let ([r '()])
      (define (,(datum->syntax stx 'collect) it)
        (set! r (cons it r)) it)
      ,@(rest (syntax->list stx))
      (reverse r))))

This works because, in the internal definition of collect, we’ve intentionally given it a name which uses the context of the syntax object we’re transforming, not the context of the macro. It’s easy to confirm that this works the way you would expect, and in particular that it’s safe in both directions: for instance

> (let ((reverse (λ (x) x)))
    (collecting (collect 1) (collect 2)))
'(1 2)

shows that the binding of reverse when the macro is called has not ‘infected’ the macro definition.

It seems as if that should be all you need: so long as you are careful about which context you choose, and you make sure that the ‘default’ context is the one from the macro not from where it is used. In fact it isn’t, quite: see below. However even if it were, it’s clearly a pain to write macros this way.

Pattern matching

Pretty much all macros do two things:

  1. deconstruct their arguments in some more-or-less complicated way, but almost always in a way which is significantly more complicated than anything that needs to be done for the arguments of a function;
  2. construct a form which is the result of the macro and which, again, may be complicated.

The beauty of traditional Lisp macros is that since the arguments and results of the macro were just what the reader spat out — lists and symbols and so on — and since Lisp was kind of good at doing things to these structures as it was designed for that, and finally since the whole power of the language was available in the macro, this was not horrible even without special tools, although it was not particularly pleasant for complicated macros.

Hygienic macros make this much less pleasant because the objects that need to be deconstructed and constructed are now opaque syntax objects, and there is additional worrying about context to do. The answer to this is to provide special tools which do the boring bits for you: this makes everything simpler, at the cost of making it still more opaque what is actually happening. In almost all cases that’s a tradeoff worth making. Pattern matching is also a fashionable thing amongst the young and hip, of course.

The way this is done in Racket is via syntax-case, its slightly simpler friend syntax-rules, and by syntax and variants on it.

syntax-case takes a bit of syntax and matches it against patterns, binding matches, which can then be used in syntax forms lexically within it to return syntax objects, whose context is that of the syntax-case form (so hygienic). There is syntactic sugar for syntax: (syntax ...) can be written #'... in the same way that (quote ...) can be written '.... There is also quasisyntax which works the same way as quasiquote, except that the various unquoting things are preceeded with #. quasisyntax, unsurprisingly also has syntactic sugar coating: (quasisyntax ...) can be written #`....

I’m not going to describe the patterns in any detail, largely because I only understand the simple cases. However the simple cases are relatively easy to understand and pleasant to use.

Once a case has matched in syntax-case the corresponding expression is evaluated, and its value is the value of the form. Generally that wants to be a bit of syntax.

The first important thing to understand is that syntax is not quote-for-syntax: it interpolates things which matched in a lexically surrounding syntax-case, if there is one (if there isn’t, then I think it is quote-for-syntax).

The second important thing to understand is that syntax-case and syntax turn Racket into a sort of bodged Lisp–2: the things matched by syntax-case can be used only in syntax forms. But it’s not actually a separate namespace, because if you refer to them outwith such a form you get a compile-time error. I don’t know why this is — perhaps to avoid accidentally naming matches outside a syntax form — but it is certainly annoying.

So, here are some examples.

A simple while form:

(define-syntax (while stx)
  (syntax-case stx ()
    [(_ test body ...)
     #'(let loop ()
         (when test
           body ...
           (loop)))]))

A simple implementation of let, leaving out the named-let case, which shows how good the pattern matching is:

(define-syntax (with stx)
  (syntax-case stx ()
    [(_ ([var val] ...) body ...)
     #'((λ (var ...) body ...) val ...)]))

A better implementation which deals with the empty body case ((λ (...)) is illegal in Racket) and also optimises a simple case:

(define-syntax (with stx)
  (syntax-case stx ()
    [(_ () body ...)
     ;; no vars: trivial case
     #'(begin body ...)]
    [(_ ([var val] ...))
     ;; null body: make sure vars are evaluated
     #'(begin val ... (void))]
    [(_ ([var val] ...) body ...)
     #'((λ (var ...) body ...) val ...)]))

One thing which syntax-case allows is the notion of literal names which must occur in the source. So for instance let’s say I wanted to write some mutant loop macro whose syntax was (loop for x in y do ...): where for, in, do are literals. Well, I can write something to match this:

> (define-syntax (loop stx)
    (syntax-case stx (for in do)
    [(_ for v in l do body ...)
     #'(for ([v (in-list l)]) body ...)]))
> (loop for x in '(1 2 3) do (print x))
123
> (loop with x in '(1 2 3) do (print x))
loop: bad syntax in: (loop with x in (quote (1 2 3)) do (print x))

The syntax object that corresponds to stx here is the whole form: the equivalent to CL’s &WHOLE. It’s almost never necessary to worry about the car of this since it will obviously be loop. However I’m always tempted to provide it as a literal.

syntax-rules is (almost: there is some complexity I think) a wrapper around syntax-case which provides the function wrapper for it and which implicitly wraps the right hand side of the cases, which must be just one form, in a syntax form. So the above definition of with could be written:

(define-syntax with
  (syntax-rules ()
    [(_ () body ...)
     ;; no vars: trivial case
     (begin body ...)]
    [(_ ([var val] ...))
     ;; null body: make sure vars are evaluated
     (begin val ... (void))]
    [(_ ([var val] ...) body ...)
     ((λ (var ...) body ...) val ...)]))

syntax-rules can be defined something like this (this is due to bmastenbrook):

(require (for-syntax 
          (rename-in racket 
                     [syntax-rules racket:syntax-rules])))

(begin-for-syntax
  (define-syntax syntax-rules
    (racket:syntax-rules ()
      [(_ literals (pattern expansion) ...)
       (lambda (s)
         (syntax-case s literals
           (pattern #'expansion) ...))])))

define-syntax-rule combines define-syntax and a single rule for syntax-rules. I think it might be equivalent to this:

(define-syntax define-syntax-rule
  (syntax-rules ()
    [(_ (name pat ...) expansion)
     (define-syntax name
       (syntax-rules ()
         [(name pat ...) expansion]))]))

although I am probably missing some complexity here.

There is a useful variant on syntax-case called with-syntax: it looks more like let-style thing, and all the patterns in the clauses must match, when all the pattern variables will be bound.

So, what about our desirable macros?

collect is pretty easy. Here are two different versions. The first uses quasisyntax:

(define-syntax (collecting stx)
  (syntax-case stx ()
    [(_) #'(void)]
    [(_ body ...)
     #`(let ([r '()])
         (define (#,(datum->syntax stx 'collect) it)
           (set! r (cons it r)) it)
         body ...
         (reverse r))]))

The second uses with-syntax:

(define-syntax (collecting stx)
  (syntax-case stx ()
    [(_) #'(void)]
    [(_ body ...)
     (with-syntax ([collect (datum->syntax stx 'collect)])
       #'(let ([r '()])
         (define (collect it)
           (set! r (cons it r)) it)
           body ...
           (reverse r)))]))

This is pretty nice, I think. Note that you could not do this with syntax-rules, or at least I can’t see how to do it: syntax-rules is quite a lot less general than syntax-case.

clet is harder, because each element of the binding list may be either an identifier or a two-element list. If we insisted on a two-element list it would be easy (see above). Here is the best I can do:

(require racket/undefined)        

(define-syntax (clet stx)
  (syntax-case stx ()
    [(_ ()) #'(void)]
    [(_ () body ...) #'(begin body ...)]
    [(_ (b ...) body ...)
     (let-values ([(vars vals)
                   (for/lists (as vs) ([binding (syntax->list #'(b ...))])
                     (syntax-case binding ()
                       [(var val) 
                        (identifier? #'var)
                        (values #'var #'val)]
                       [var
                        (identifier? #'var)
                        (values #'var #'undefined)]
                       [_ (raise-syntax-error #f "bad binding" stx)]))])
       #`((λ #,vars body ...) #,@vals))]))

Well, this is still quite hairy, but almost all of the hair involves processing the binding list, which is done using syntax-case again, using an additional feature of it whereby it can use a ‘guard’ expression to decide whether a clause matches: identifer? returnt true if a syntax object refers to an identifier. I think there must be a way of using with-syntax to avoid the quasisyntax form.

Even with all this hair, this version of clet is far easier to read than the previous one, and not harder to read than the CL equivalent.

A better version of clet would, I think, need a proper parser for syntax. I think that is what syntax-parse is, although I have not investigated that.

Macro composition

As mentioned above, we don’t yet have quite all the tools we need to write some kinds of macros: specifically macros which are intentionally slightly unygienic, such as collecting. As an example, let’s suppose we wanted a general purpose, intentionally-unhygenic, with-abort macro which provided an abort function which would, well, abort. Without thinking too hard about the implications of call/cc we could write this as:

(define-syntax (with-abort stx)
  (syntax-case stx ()
    [(_ body ...)
     #`(call/cc (λ (#,(datum->syntax stx 'abort))
                  body ...))]))

So now (with-abort (abort 2) (end-the-world)) returns 2 and does not end the world.

Well, we might want to use this macro in another macro:

(define-syntax-rule (while/abort test body ...)
  (with-abort
    (let loop ([r test])
      (when r
        body ...
        (loop test)))))

Now something like the following will work:

> (let ([x 0])
    (while/abort (< x 10) (set! x (+ x 1)) (print x)))
12345678910

But the whole point was to be able to use abort in the body, and that doesn’t work:

> (let ([x 0])
    (while/abort (< x 10) (set! x (+ x 1)) (when (> x 1) (abort 'done))))
abort: undefined;
 cannot reference an identifier before its definition

Oh, dear. The problem here is that while/abort is hygenic, so the abort binding that is introduced by with-abort is not visible in the body.

We could fix this by better design:

(define-syntax-rule (with-named-abort (abort) body ...)
  ;; a better macro
  (call/cc (λ (abort) body ...)))

(define-syntax (with-abort stx)
  ;; backwards compatible
  (syntax-case stx ()
    [(_ body ...)
     #`(with-abort (#,(datum->syntax stx 'abort)) body ...)]))

(define-syntax (while/abort stx)
  ;; the end result
  (syntax-case stx ()
    [(_ test body ...)
     #`(with-named-abort (#,(datum->syntax stx 'abort))
         (let loop ([r test])
           (when r
             body ...
             (loop test))))]))

But that’s not the solution we’re after.

Racket’s answer to this is syntax parameters. I don’t completely understand these, but they are at least close to dynamic variables, except at macro-expansion time. What you do is to define a syntax parameter, and then rebind it during the expansion: the rebound value is visible to macros which are expanded dynamically within the rebinding form. As with Racket’s ordinary special variables these look like functions (yet another namespace in disguise).

So we can define a syntax parameter called abort using define-syntax-parameter:

(require racket/stxparam)

(define-syntax-parameter abort
  (λ (stx)
    (raise-syntax-error #f "not available" stx)))

So now any reference to abort will result in a syntax error:

> (abort)
abort: not available in: (abort)
> abort
abort: not available in: abort

And we can now try to use syntax-parameterize, to rebind abort as a macro:

(define-syntax with-abort
  (syntax-rules (with-abort)
    [(with-abort) (void)]
    [(with-abort body ...)
     (call/cc
      (λ (a)
        (syntax-parameterize ([abort
                               (syntax-rules ()
                                 [(_ ...) (a ...)])])
          body ...)))]))

And this fails horribly, because the outer syntax-rules thinks it owns the patterns and sees ...s that it does not expect. So much for that.

Well, we could at least check this works with a specific number of arguments:

(define-syntax with-abort
  (syntax-rules (with-abort)
    [(with-abort) (void)]
    [(with-abort body ...)
     (call/cc
      (λ (a)
        (syntax-parameterize ([abort
                               (λ (stx)
                                 (syntax-case stx (abort)
                                   [(abort) #'(a)]
                                   [(abort x) #'(a x)]
                                   [_ (raise-syntax-error #f "I give up" stx)]))])
          body ...)))]))

But this is obviously just a rubbish answer.

Well, there is an answer to this: all we really need to do is to make the abort macro attach itself to a, and there is a special hack, make-rename-transformer, to do this:

(define-syntax with-abort
  (syntax-rules (with-abort)
    [(with-abort) (begin)]
    [(with-abort body ...)
     (call/cc
      (λ (a)
        (syntax-parameterize ([abort (make-rename-transformer #'a)])
          body ...)))]))

And this now works:

> (with-abort (abort 1 2 3))

1
2
3

And we can use this to write a really robust version of collecting

(require racket/stxparam)

(define-syntax-parameter collect
  (λ (stx)
    (raise-syntax-error #f "not collecting" stx)))

(define-syntax collecting
  (syntax-rules ()
    [(_) (void)]
    [(_ body ...)
     (let ([r '()])
       (define (clct it)
         (set! r (cons it r)) it)
       (syntax-parameterize ([collect (make-rename-transformer #'clct)])
         body ...
         (reverse r)))]))

As far as I can see there is still a problem, however: it is very hard to write macros which expand to other macros which themselves do pattern-matching, since the patterns get acquired by the outer macros. There must be some answer to this, but I can’t see what it is.

On the other hand, this is also extremely painful in CL: here is a version of collecting where collect is a local macro:

(defmacro collecting (&body forms)
  ;; collect lists forwards using a tail pointer
  ;; local macro version
  (let ((rn (make-symbol "R"))
        (rtn (make-symbol "RT"))
        (itn (make-symbol "IT")))
    `(let ((,rn '())
           (,rtn nil))
       (macrolet ((collect (form)
                    `(let ((,',itn ,form))
                       (if (not (null ,',rn))
                           (setf (cdr ,',rtn) (cons ,',itn nil)
                                 ,',rtn (cdr ,',rtn))
                         (setf ,',rn (cons ,',itn nil)
                               ,',rtn ,',rn))
                       ,',itn)))
         ,@forms)
       ,rn)))

This is not easy to understand.

Additionally, the problem almost always comes from ellipses, and in many interesting cases they can be avoided by using dotted pairs as patterns — here is yet another version of with-abort that does this:

(require racket/stxparam)

(define-syntax-parameter abort
  (λ (stx)
    (raise-syntax-error #f "not available" stx)))

(define-syntax with-abort
  (syntax-rules (with-abort)
    [(with-abort) (void)]
    [(with-abort body ...)
     (call/ec
      (λ (a)
        (syntax-parameterize ([abort
                               (syntax-rules (abort)
                                 [(abort . args) (a . args)])])


          body ...)))]))

This is clearly better than the CL version.

Summary

Well, I think I now know enough about Racket’s macros to be going on with: I can certainly write the macros I need to be able to write now without it just being cargo-cult programming. There are still things I don’t understand, and the whole system smells to me as if, by trying remain ideologically pure, it has become vast and essentially incomprehensible. This seems to be a common problem with Scheme, unfortunately.

Small notes

Macro definitions scope properly, so you can define a local macro the same way you can define a local function, so this works:

(define (foo ...)
  (define-syntax-rule (while test body ...)
    (let loop ()
      (when test
        body ...
        (loop))))
  ... (while ... ...) ...)

This makes the equivalent of CL’s MACROLET easy to do.

For fun, here is a version of with which can deal with named-let: There must be a way of implementing this without assignment, but I can never work out what it is.

(define-syntax (with stx)
  (syntax-case stx ()
    [(_ ())
     ;; all null
     #'(void)]
    [(_ () body ...)
     ;; no vars: trivial case
     #'(begin body ...)]
    [(_ ([var val] ...))
     ;; null body: make sure vars are evaluated
     #'(begin val ... (void))]
    [(_ ([var val] ...) body ...)
     ;; normal let
     #'((λ (var ...) body ...) val ...)]
    [(_ n ())
     (identifier? #'n)
     ;; named null
     #'(void)]
    [(_ n ([var val] ...))
     (identifier? #'n)
     ;; named null body
     #'(begin val ... (void))]
    [(_ n ([var val] ...) body ...)
     ;; named let with arguments
     ;; (is there an implementation without assignment?
     (identifier? #'n)
     #'((λ (n)
          ((λ (l)
             (set! n l)
             (l val ...))
           (λ (var ...) body ...)))
        #f)]
    [_ (raise-syntax-error #f "bad syntax" stx)]))

Things I still do not know or understand

At this point I’m mostly comfortable writing macros in Racket, but there are things I still do not understand:

  • protecting and arming syntax objects — I just don’t understand what this is about at all;
  • syntax-parse is, I think, not difficult but I have not bothered to learn about it as it seems to add yet another layer.
  • there are probably other things that I don’t even know I don’t know.

At some point I might write a further part of this series on some of that.


Pointers

Eli Barilay’s paper on syntax-parameterize.

Fear of Macros, again.