Macros in Racket, part one

2015-01-13 :: computer, lisp

I’ve written in Lisp for a long time, but I’ve never used a hygienic macro system in any way other than the most simple. Here are some initial notes on my experiences learning Racket’s macro system.

This is the first part of several: see part two and part three. I’m not completely fluent with Racket macros yet: there are almost certainly mistakes and confusions here. Despite appearances, I also have no axe to grind: I’m learning Racket because I want to and I have time. Finally this is not a tutorial: look at Greg Hendershott’s Fear of Macros for something closer to that. This is just some notes which were useful to me, and might be useful to other CL people.

Macros in Common Lisp

Common Lisp’s macro system is, in essence, simple: it’s what you’d end up writing if you had to write a macro system for a Lisp. That’s not surprising because it is the descendent of the first macro systems people wrote for Lisp. In CL what happens is this:

the reader ingests the source text and produces data structures which represent the source of the program;
these structures are possibly transformed by macros, which are simply Lisp functions which are given the Lisp representation of the source and return some other representation;
once all macros are expanded, then the code is compiled, evaluated or both.

(I have missed out some subtleties here, but they don’t matter for my purposes.)

In CL, what the reader produces is exactly what you would expect. If it reads "(defun foo (a) a)" then, with standard settings, it returns a list whose car is the symbol DEFUN (in the CL package) and so on. It is this structure that macros transform.

CL provides relatively limited support for writing macros: there is backquote, which is critical to being able to write macros which are even slightly readable, limited pattern matching in the form of destructuring, and there are mechanisms to generate unique names as well a few other things. There is a semi-standard way of enquiring about bindings in the environment at macro expansion time, although this is not in the standard.

In practice, CL’s macro system has turned out to work very well; in theory it has all sorts of problems, the most important being that the programmer is entirely responsible for making sure that macros don’t introduce or accidentally use names they should not. Consider this:

(defmacro collecting (&body forms)
  ;; collect lists forwards using a tail pointer
  ;; polluting version
  `(let ((r '())
         (rt nil))
     (flet ((collect (form)
              (if (not (null r))
                  (setf (cdr rt) (cons form nil)
                        rt (cdr rt))
                (setf r (cons form nil)
                      rt r))
              form))
       ,@forms)
     r))

This intentionally introduces a function binding, collect, but also accidentally introduces bindings for r and rt.

(let ((r 2))
  (collecting
    (+ r r)))

Does not do what it should. One right way to write the collecting macro is like this:

(defmacro collecting (&body forms)
  ;; collect lists forwards using a tail pointer
  ;; non-polluting version
  (let ((rn (make-symbol "R"))
        (rtn (make-symbol "RT")))
    `(let ((,rn '())
           (,rtn nil))
       (flet ((collect (form)
                (if (not (null ,rn))
                    (setf (cdr ,rtn) (cons form nil)
                          ,rtn (cdr ,rtn))
                  (setf ,rn (cons form nil)
                        ,rtn ,rn))
                form))
         ,@forms)
       ,rn)))

And now the above form does not signal an error and correctly returns ().

Note that the problem is with names and not just bindings. Consider this CL code:

(defvar *stashes* '())
(defvar *mark* nil)

(defun stash (name thing)
  ;; Stash something under a name
  (setf *stashes* (acons name thing *stashes*))
  (values name thing))

(defun retrieve (name)
  ;; Retrieve the value of a name, dropping everything stashed more
  ;; recently, and stopping at the mark, if any.
  (let ((mark *mark*))
    (labels ((rl (tail)
               (if (or (null tail)
                       (eq (first tail) mark))
                   (values nil nil)
                 (destructuring-bind ((n . v) . r) tail
                   (if (eql n name)
                       (progn
                         (setf *stashes* r)
                         (values v t))
                     (rl r))))))
      (rl *stashes*))))

(defmacro with-marked-stash (&body forms)
  ;; mark the stack of stashes for the dynamic extent of FORMS
  (let ((mn (make-symbol "MARK")))
    `(let ((*stashes* (cons ',mn *stashes*))
           (*mark* ',mn))
       ,@forms)))

In this code the marks on the stack of stashes established by with-marked-stash are not bound anywhere: they are just names. But it’s important to the correct functioning of the code that they are unique names. (There are better ways of doing this such as using a fresh cons for the mark: I just wanted an example where a name mattered other than as the name of a variable.)

The politically correct way of saying that we’re talking about names is to talk about ‘lexical context’ or ‘lexical information’: it’s the same thing but more confusing to those not initiated into the cult, which is always good.

The disadvantages of the CL macro system are this problem with hygiene and the lack of any clever tools to do pattern matching on macro forms. The second of these is easily overcome by using any of a number of tools, while the first is generally not a problem in practice: CL being a Lisp–2 (separate namespaces for functions and variables) helps here.

The advantage of the CL macro system is that there is no magic: macros get passed the things that the source code looks like — generally a structure whose interesting parts are lists and symbols — which you process using the normal list-processing tools to produce some other structure which is the expansion of the macro. It’s easy enough that you could write it yourself: there are no special opaque objects being handed around.

That being said, having a standard set of tools for pattern matching in macros and a way of dealing with the hygiene problems which is less ugly than in CL might well be worth the cost in transparency.

Macros in Scheme

I am not a native Scheme person, but it has clearly taken the whole hygiene thing very seriously: Scheme, as a set of languages, treats purity as much more than CL, which revels in being a fairly grungy language, does. However these posts are not about Scheme: the only reason I am mentioning it is to say that I have not cared at all whether anything here applies generally to Scheme or is specific to Racket.

Macros in Racket: baby steps

For a long time the only kind of macros that I’ve really been able to define in Racket are annoyingly trivial ones using define-syntax-rule, things like:

(define-syntax-rule (while test body ...)
  (let loop ()
    (when test
      body ...
      (loop))))

That’s all very well, but the ‘obvious’ (and obviously wrong) definition of collect then looks like this:

(define-syntax-rule (collecting body ...)
  ;; horribly wrong	
  (let ([s '()])
    (define (collect it)
      (set! s (cons it s))
      it)
    body ...
    (reverse s)))

(There’s no obvious way to build lists backwards in Racket: reversing the list is probably as cheap as anything). This is either introducing a spurious binding for s or not introducing a deliberate one for collect, and in fact, of course, it’s the latter.

Quite apart from this, define-syntax-rule gives the strong impression that it lets you write only the sort of macros that would give people who write C++ great pride: simple ones. (Actually you can do reasonably hairy things even with this because the pattern matching is very competent:

(define-syntax-rule (mlet ([var val] ...) body ...)
  ((λ (var ...) body ...) val ...))

is an implementation of simple let, for instance. Indeed we can defined named let as well:

(define-syntax-rule (nlet label ([var val] ...) body ...)
  (mlet ()
    (define (label var ...) body ...)
    (label val ...)))

What I can’t work out how to do is to make mlet do both things: I think this is too hard for define-syntax-rule although I might be wrong.)

But for a long time I was stuck with that: whenever I looked at Racket macros in more detail I walked into a wall of opaque terminology and just decided that I had better things to do that year. This year, I don’t.

Two desirable macros

There are many ways people use macros in Lisp: some of them are good. I decided that if I could write two macros and understand them then I would be well on my way.

collecting / collect. This is the macro given above in CL. It’s interesting not for what it does — the tail-pointer stuff is less interesting now than it once was and is hard to implement in Racket anyway — but because it introduces a binding: it is intentionally not completely hygienic, while having an essentially trivial expansion: no complicated destructuring is needed.
CL’s let, which I’ll call clet. This is interesting because it requires destructuring of arguments which is not completely simple, but it does not present problems of hygiene. The reason it’s not just a subset of Racket’s let is that CL allows variables with no initial value, which get bound to nil and should, I think, become undefined in Racket. So (clet ((x 1) y) body ...) should expand to (let ([x 1] [y undefined]) body ...) or something equivalent to that.

Here is a simple implementation of clet in CL, missing any error checking:

(defmacro clet (bindings &body forms)
  (multiple-value-bind (args vals)
      (loop for binding in bindings
            for consp = (consp binding)
            collect (if consp (first binding) binding) into as
            collect (if consp (second binding) nil) into vs
            finally (return (values as vs)))
    `((lambda (,@args) ,@forms) ,@vals)))

Like most macros in CL it’s not particularly pretty but it is reasonably clear what it does.

I will use these two macros as examples below.

Phases

To understand macros in any Lisp you need to develop a strong idea of the various ‘times’ that things happen and the relationships between them: for CL these are things like read time, macro expansion time, compilation time (compiler-macro expansion time), load time, run time and so on. Racket has formalised the parts of this after read time into a notion of ‘phase’:

phase 0 is run-time;
phase 1 is macro expansion time;
phase 2 would, I think, be macros used in macro expansion;
and so on.

However I am not sure how this ties in to read time: is that phase 1? For CL read time is before macro expansion time although the two are, or may be, interleaved at the granularity of forms (rather than a per-file or per-compilation-unit). Also there are negative phases which I don’t understand, although I think they must be to do with code which exists at macro expansion time (phase 1) wanting to make things available at run time (phase 0). All of this is integrated into the module system (and CL gets away without it mostly because it does not have a formalised module system).

Bindings exist at a phase, and the same name can have different bindings at different phases.

Modules can say what they provide at which phase, and, importantly, the racket module does indeed provide different things at different phases: if you look at it you’ll find:

(provide ...
         (for-syntax (all-from-out racket/base)))

Which means that, at phase 1, what is available is racket/base: a significantly smaller language than racket itself. If you need things in macros which are in racket but not racket/base you need to require them:

(require (for-syntax ...))

An example of this is first & rest, both of which are provided at phase 0 by racket but not at phase one: if you want them you need to say (require (for-syntax racket/list)).

Syntax objects

As in CL, Racket macros are source-to-source functions. The difference is that in Racket the source is represented by a syntax object and a macro needs to produce another syntax object, while in CL source is represented as it looks: usually as nested lists.

So then a Racket macro is simply a function which maps from syntax objects to other syntax objects. The reason for having an opaque syntax object is that it can carry around all sorts of information around with it, and in particular it can carry information about names, which help the system maintain hygiene. (There is also information about source location and so on, but this isn’t so important.)

So the Racket macro system needs tools to transform syntax objects into other syntax objects, ultimately by digging around inside them to find out what the source code actually was. This is necessarily more complicated than it is in CL both because the objects are opaque and because they contain information which is not present at all in the objects CL macros get.

Additionally, and mostly independently, there is a layer on top of this which does not exist in CL (without libraries) at all: pattern matching and template filling. This means that for many purposes you can write macros in Racket simply by specifying patterns that the source must match and filling templates with the results of those matches. This is a very nice way of writing macros, although it renders what is actually going on even more opaque. For a CL person, used to feeling the bits between their toes, this can be quite disconcerting at first since what is actually happening can become entirely obscure.

Syntax objects for the unwashed Lisp hacker

Well, of course it is possible to ignore all this terrifyingly modern pattern matching stuff and write macros almost the way you do in CL, and it’s worth doing that at least once, perhaps. So here is clet:

(require (for-syntax racket/list)
         racket/undefined)

(define-syntax clet
  (λ (stx)
    (define ctx (quote-syntax clet))
    (define top-level (syntax->list stx))
    (define bindings (second top-level))
    (define body (rest (rest top-level)))
    (define-values (args vals)
      (for/lists (as vs) ([binding (syntax->list bindings)])
        (define it (syntax->list binding))
        (if it
            (values (first it) (second it))
            (values binding (datum->syntax ctx 'undefined)))))
    (datum->syntax 
     ctx
     `((λ (,@args) ,@body) ,@vals))))

So how does this work? Well, it uses some functions provided by Racket to look inside the syntax object (getting the ‘datum’ in the syntax object) and in turn to construct a new one:

syntax->list takes a syntax object which wraps a proper list and unpacks one level of it, returning a list of syntax objects, or #f if it does not wrap a proper list;
datum->syntax takes a context object and a datum and wraps it into a syntax object, leaving any syntax objects in the datum as they are;
quote-syntax is like quote but it creates a syntax object, and this object contains the lexical information present in the source.

So the macro pulls apart the syntax object in a fairly straightforward way: making it into a list, extracting the second element and all the remaining elements, which will be the binding specifications, and then grinding over the binding specifications, using syntax->list both to work out if the bindings are a list or not and to extract the variable and value if it is, and then reassembles everything as a call to an anonymous function.

The critical trick is that the context that datum->syntax needs is a syntax object and you need to pick the right one: you can use the syntax object you got given, which provides the context of the place where the macro was expanded, or you can use a syntax object of your own devising which provides that object’s context. And in this case we want our own context, not the context of place where the macro was expanded. This is what ctx is for: providing a suitable context.

Notice the require:

we need racket/list at phase 1 (macro expansion time) because the macro uses first and so on;
we need racket/undefined at phase 0 (run time) as the expansion of the macro uses undefined.

So we can try this:

(clet ((x 12) y) (values x y))
12
#<undefined>
> (let ((undefined 'hello)) (clet (x) x))
#<undefined>
> (clet ((undefined 'hello)) (clet (x) x))
#<undefined>
> (clet ((x 1)))
λ: bad syntax in: (λ (x))
> (clet (1) 1)
λ: not an identifier, identifier with default, or keyword in: 1

The second and third examples show why we need the macro context: we don’t want a binding of undefined to alter what the clet picks as the undefined value. The fourth and fifth examples show that the macro isn’t very robust, and has terrible error reporting.

Some notes:

I’ve deliberately written (define-syntax clet (λ (stx) ...) rather than the more pleasant (define-syntax (clet stx) ...) to make it clear that clet is a function which transforms a syntax object;
but I’ve used internal define where in CL there would be let* or nested lets — I’m not sure why other than reducing indentation;
the destructuring of the syntax object is done in a way which is primitive even by the standards of CL;
it should be evident that the macro is not very robust — something like (clet ((x 1) 2) ...) will fail horribly;
it’s not much less clear than the CL version, although I think it is a bit less clear.

I am fairly but not completely sure that this macro is right: I am slightly confused by the handling of undefined: although it is easy to check, by wrapping clet into a module, that clients of that module don’t themselves need to import racket/undefined and do get the right initial values in forms like (clet (x) ...) I am still a bit queasy about what it’s doing.

What is very clear is that this macro is just horrible: even by the standards of CL macros it’s horrible, because there is so much explcit unpacking and repacking going on. Things would be even worse if there was any significant error checking. Something better than this is needed to deal with syntax objects, in a way that it isn’t needed for CL macros. In next week’s exciting episode I’ll look at ways of making this better.

Pointers

Writing ‘syntax-case’ Macros by Eli Barzilay. This was the article that first helped me understand what was going on.

Fear of Macros by Greg Greg Hendershott. This is an introduction to macros, and macros in Racket in particular, by the author of Frog.