How to understand closures in Common Lisp

2023-02-22 :: lisp, programming

The first rule of understanding closures is that you do not talk about closures. The second rule of understanding closures in Common Lisp is that you do not talk about closures. These are all the rules.

There is a lot of elaborate bowing and scraping about closures in the Lisp community. But despite that a closure isn’t actually a thing: the thing people call a closure is just a function which obeys the language’s rules about the scope and extent of bindings. Implementors need to care about closures: users just need to understand the rules for bindings. So rather than obsessing about this magic invisible thing which doesn’t actually exist in the language, I suggest that it is far better simply to think about the rules which cover bindings.

Angels and pinheads

It’s easy to see why this has happened: the CL standard has a lot of discussion of lexical closures, lexical and dynamic environments and so on. So it’s tempting to think that this way of thinking about things is ‘the one true way’ because it has been blessed by those who went before us. And indeed CL does have objects representing part of the lexical environment which are given to macro functions. Occasionally these are even useful. But there are no objects which represent closures as distinct from functions, and no predicates which tell you if a function is a closure or not in the standard language: closures simply do not exist as objects distinct from functions at all. They were useful, perhaps, as part of the text which defined the language, but they are nowhere to be found in the language itself.

So, with the exception of the environment objects passed to macros, none of these objects exist in the language. They may exist in implementations, and might even be exposed by some implementations, but from the point of the view of the language they simply do not exist: if I give you a function object you cannot know if it is a closure or not.

So it is strange that people spend so much time worrying about these objects which, if they even exist in the implementation, can’t be detected by anyone using the standard language. This is worrying about angels and pinheads: wouldn’t it be simpler to simply understand what the rules of the language actually say should observably happen? I think it would.

I am not arguing that the terminology used by the standard is wrong! All I am arguing is that, if you think you want to understand closures, you might instead be better off understanding the rules that give rise to them. And when you have done that you may suddenly find that closures have simply vanished into the mist: all you need is the rules.

History

Common Lisp is steeped in history: it is full of traces of the Lisps which went before it. This is intentional: one goal of CL was to enable programs written in those earlier Lisps — which were all Lisps at that time of course — to run without extensive modification.

But one place where CL didn’t steep itself in history is in exactly the areas that you need to understand to understand closures. Before Common Lisp (really, before Scheme), people spent a lot of time writing papers about the funarg problem and describing and implementing more-or-less complicated ways of resolving it. Then Scheme came along and decided that this was all nonsense and that it could just be made to go away by implementing the language properly. And the Common Lisp designers, who knew about Scheme, said that, well, if Scheme can do this, then we can do this as well, and so they also made it the problem vanish, although not in quite such an extreme way as Scheme did.

And this is now ancient history: these predecessor Lisps to CL are all at least 40 years old now. I am, just, old enough to have used some of them when they were current, but for most CL programmers these questions were resolved before they were born. The history is very interesting, but you do not need to steep yourself in it to understand closures.

Bindings

So the notion of a closure is part of the history behind CL: a hangover from the time when people worried about the funarg problem; a time before they understood that the whole problem could simply be made to go away. So, again, if you think you want to understand closures, the best approach is to understand something else: to understand bindings. Just as with closures, bindings do not exist as objects in the language, although you can make some enquiries about some kinds of bindings in CL. They are also a concept which exists in many programming languages, not just CL.

A binding is an association between a name — a symbol — and something. The most common binding is a variable binding, which is an association between a name and a value. There are other kinds of bindings however: the most obvious kind in CL is a function binding: an association between a name and a function object. And for example within a (possibly implicit) block there is a binding between the name of the block and a point to which you can jump. And there are other kinds of bindings in CL as well, and the set is extensible. The CL standard only calls variable bindings ‘bindings’, but I am going to use the term more generally.

Bindings are established by some binding construct and are usually not first-class objects in CL: they are just as vaporous as closures and environments. Nevertheless they are a powerful and useful idea.

What can be bound?

By far the most common kind of binding is a variable binding: an association between a name and a value. However there are other kinds of bindings: associations between names and other things. I’ll mention those briefly at the end, but in everything else that follows it’s safe to assume that ‘binding’ means ‘variable binding’ unless I say otherwise.

Scope and extent

For both variable bindings and other kinds of bindings there are two interesting questions you can ask:

where is the binding available?
when is the binding visible?

The first question is about the scope of the binding. The second is about the extent of the binding.

Each of these questions has (at least) two possible answers giving (at least) four possibilities. CL has bindings which use three of these possibilities and the fourth in a restricted case: two and a restricted version of a third for variable bindings, the other one for some other kinds of bindings.

Scope. The two options are:

the binding may be available only in code where the binding construct is visible;
or the binding may be available during all code which runs between where the binding is established and where it ends, regardless of whether the binding construct is visible.

What does ‘visible’ mean? Well, given some binding form, it means that the bindings it establishes are visible to all the code that is inside that form in the source. So, in a form like (let ((x 1)) ...) the binding of x is visible to the code that replaces the ellipsis, including any code introduced by macroexpansion, and only to that code.

Extent. The two options are:

the binding may exist only during the time that the binding construct is active, and goes away when control leaves it;
or the binding may exist as long as there is any possibility of reference.

Unfortunately the CL standard is, I think, slightly inconsistent in its naming for these options. However I’m going to use the standard’s terms with one exception. Here they are.

Scope:

when a binding is available only when visible this called lexical scope;
when a binding available to all code within the binding construct this is called indefinite scope¹;

Extent:

when a binding ends at the end of the binding form this is called dynamic extent²;
when a binding available indefinitely this called indefinite extent.

The term from the standard I am not going to use is dynamic scope, which it defines to mean the combination of indefinite scope and dynamic extent. I am not going to use this term because I think it is confusing: although it has ‘scope’ in its name it concerns both scope and extent. Instead I will introduce better, commonly used, terms below for the interesting combinations of scope and extent.

The four possibilities for bindings are then:

lexical scope and dynamic extent;
lexical scope and indefinite extent;
indefinite scope and dynamic extent;
indefinite scope and indefinite extent.

The simplest kind of binding

So then let’s ask: what is the simplest kind of binding to understand? If you are reading some code and you see a reference to a binding then what choice from the above options will make it easiest for you to understand whether that reference is valid or not?

Well, the first thing is that you’d like to be able to know by looking at the code whether a reference is valid or not. That means that the binding construct should be visible to you, or that the binding should have lexical scope. Compare the following two fragments of code:

(defun simple (x)
  ...
  (+ x 1)
  ...)

and

(defun confusing ()
  ...
  (+ *x* 1)
  ...)

Well, in the first one you can tell, just by looking at the code, that the reference to x is valid: the function, when called, establishes a binding of x and you can see that when reading the code. In the second one you just have to assume that the reference to *x* is valid: you can’t tell by reading the code whether it is or not.

Lexical scope makes it easiest for people reading the code to understand it, and in particular it is easier to understand than indefinite scope. It is the simplest kind of scoping to understand for people reading the code.

So that leaves extent. Well, in the two examples above definite or indefinite extent makes no difference to how simple the code is to understand: once the functions return there’s no possibility of reference to the bindings anyway. To expose the difference we need somehow to construct some object which can refer to a binding after the function has returned. We need something like this:

(defun maker (x)
  ...
  <construct object which refers to binding of x>)

(let ((o (maker 1)))
  <use o somehow to cause it to reference the binding of x>)

Well, what it this object going to be? What sort of things reference bindings? Code references bindings, and the objects which contain code are functions³. What we need to do is construct and return a function:

(defun maker (x)
  (lambda (y)
    (+ x y)))

and then cause this function to reference the binding by calling it:

(let ((f (maker 1)))
  (funcall f 2))

So now we can, finally, ask: what is the choice for the extent of the binding of x which makes this code simplest to understand? Well, the answer is that unless the binding of x remains visible to the function that is created in maker, this code can’t work at all. It would have to be the case that it was simply not legal to return functions like this from other functions. Functions, in other words, would not be first-class objects.

Well, OK, that’s a possibility, and it makes the above code simple to understand: it’s not legal and it’s easy to see that it is not. Except consider this small variant on the above:

(defun maybe-maker (x return-identity-p)
  (if return-identity-p
      #'identity
    (lambda (y)
      (+ x y))))

There is no way to know from reading this code whether maybe-maker will return the nasty anonymous function or the innocuous identity function. If it is not allowed to return anonymous functions in this way then there is no way of knowing whether

(funcall (maybe-maker 1 (zerop (random 2)))
         2)

is correct or not. This is certainly not simple: in fact it is a horrible nightmare. Another way of saying this is that you’d be in a situation where

(let ((a 1))
  (funcall (lambda ()
             a)))

would work, but

(funcall (let ((a 1))
           (lambda ()
             a)))

would not. There are languages which work that way: those languages suck.

So what would be simple? What would be simple is to say that if a binding is visible, it is visible, and that’s the end of the story. In a function like maker above the binding of x established by maker is visible to the function that it returns. Therefore it’s visible to the function that maker returns: without any complicated rules or weird special cases. That means the binding must have indefinite extent.

Indefinite extent makes it easiest for people reading the code to understand it when that code may construct and return functions, and in particular it is easier to understand than dynamic extent, which makes it essentially impossible to tell in many cases whether such code is correct or not.

And that’s it: lexical scope and indefinite extent, which I will call lexical binding, is the simplest binding scheme to understand for a language which has first-class functions⁴.

And really that’s it: that’s all you need to understand. Lexical scope and indefinite extent make reading code simple, and entirely explain the things people call ‘closures’ which are, in fact, simply functions which obey these simple rules.

Examples of the simple binding rules

One thing I have not mentioned before is that, in CL, bindings are mutable, which is another way of saying that CL supports assignment: assignment to variables is mutation of variable bindings. So, as a trivial example:

(defun maximum (list)
  (let ((max (first list)))
    (dolist (e (rest list) max)
      (when (> e max)
        (setf max e)))))

This is very easy to understand and does not depend on the binding rules in detail.

But, well, bindings are mutable, so the rules which say they exist as long as they can be referred to also imply they can be mutated as long as they can be referred to: anything else would certainly not be simple. So here’s a classic example of this:

(defun make-incrementor (&optional (value 0))
  (lambda (&optional (increment 1))
    (prog1 value
      (incf value increment))))

And now:

> (let ((i (make-incrementor)))
    (print (funcall i))
    (print (funcall i))
    (print (funcall i -2))
    (print (funcall i))
    (print (funcall i))
    (values))

0
1
2
0
1

As you can see, the function returned by make-incrementor is mutating the binding that it can still see.

What happens when two functions can see the same binding?

(defun make-inc-dec (&optional (value 0))
  (values
   (lambda ()
     (prog1 value
       (incf value)))
   (lambda ()
     (prog1 value
       (decf value)))))

And now

> (multiple-value-bind (inc dec) (make-inc-dec)
    (print (funcall inc))
    (print (funcall inc))
    (print (funcall dec))
    (print (funcall dec))
    (print (funcall inc))
    (values))

0
1
2
1
0

Again, what happens is the simplest thing: you can see simply from reading the code that both functions can see the same binding of value and they are therefore both mutating this common binding.

Here is an example which demonstrates all these features: an implementation of a simple queue as a pair of functions which can see two shared bindings:

(defun make-queue ()
  (let ((head '())
        (tail nil))
    (values
     (lambda (thing)
       ;; Push thing onto the queue
       (if (null head)
           ;; It's empty currently so set it up
           (setf head (list thing)
                 tail head)
         ;; not empty: just adjust the tail
         (setf (cdr tail) (list thing)
               tail (cdr tail)))
       thing)
     (lambda ()
       (cond
        ((null head)
         ;; empty
         (values nil nil))
        ((null (cdr head))
         ;; will be empty: don't actually need this case but it is
         ;; cleaner
         (values (prog1 (car head)
                   (setf head '()
                         tail nil))
                 t))
        (t
         ;; will still have content
         (values (pop head) t)))))))

make-queue will return two functions:

the first takes one argument which it appends to the queue;
the second takes no argument and either the next element of the queue and t or nil and nil if the queue is empty.

So, with this little function to drain the queue

(defun drain-and-print (popper)
  (multiple-value-bind (value fullp) (funcall popper)
    (when fullp
      (print value)
      (drain-and-print popper))
    (values)))

we can see this in action

> (multiple-value-bind (pusher popper) (make-queue)
    (funcall pusher 1)
    (funcall pusher 2)
    (funcall pusher 3)
    (drain-and-print popper))

1
2
3

A less-simple kind of binding which is sometimes very useful

Requiring bindings to be simple usually makes programs easy to read and understand. But it also makes it hard to do some things. One of those things is to control the ‘ambient state’ of a program. A simple example would be the base for printing numbers. It’s quite natural to say that ‘in this region of the program I want numbers printed in hex’.

If all we had was lexical binding then this becomes a nightmare: every function you call in the region you want to cause printing to happen in hex needs to take some extra argument which says ‘print in hex’. And if you then decide that, well, you’d also like some other ambient parameter, you need to provide more arguments to every function⁵. This is just horrible.

You might think you can do this with global variables which you temporarily set: that is both fiddly (better make sure you set it back) and problematic in the presence of multiple threads⁶.

A better approach is to allow dynamic bindings: bindings with indefinite scope & dynamic extent. CL has these, and at this point history becomes unavoidable: rather than have some separate construct for dynamic bindings, CL simply says that some variable bindings, and some references to variable bindings, are to be treated as having indefinite scope and dynamic extent, and you tell the system which bindings this applies to withspecial declarations / proclamations. CL does this because that’s very close to how various predecessor Lisps worked, and so makes porting programs from them to CL much easier. To make this less painful there is a convention that dynamically-bound variable names have *stars* around them, of course.

Dynamic bindings are so useful that if you don’t have them you really need to invent them: I have on at least two occasions implemented a dynamic binding system in Python, for instance.

However this is not an article on dynamic bindings so I will not write more about them here: perhaps I will write another article later.

What else can be bound?

Variable bindings are by far the most common kind. But not the only kind. Other things can be bound. Here is a partial list⁷:

local functions have lexical scope and indefinite extent;
block names have lexical scope and definite extent (see below);
tag names have lexical scope and definite extent (see below);
catch tags have indefinite scope and definite extent;
condition handlers have indefinite scope and definite extent;
restarts have indefinite scope and definite extent.

The two interesting cases here are block names and tag names. Both of these have lexical scope but only definite extent. As I argued above this makes it hard to know whether references to them are valid or not. Look at this, for example:

(defun outer (x)
  (inner (lambda (r)
           (return-from outer r))
         x))

(defun inner (r rp)
  (if rp
      r
    (funcall r #'identity)))

So then (funcall (outer nil) 1) will: call inner with a function which wants to return from outer and nil, which will cause inner to call that function, returning the identity function, which is then called by funcall with argument 1: the result is 1.

But (funcall (outer t) 1) will instead return the function which wants to return from outer, which is then called by funcall which is an error since it is outside the dynamic extent of the call to outer.

And there is no way that either a human reading the code or the compiler can detect that this is going to happen: a very smart compiler might perhaps be able to deduce that the internal function might be returned from outer, but probably only because this is a rather simple case: for instance in

(defun nasty (f)
  (funcall f (lambda ()
               (return-from nasty t))))

the situation is just hopeless. So this is a case where the binding rules are not as simple as you might like.

What is simple?

For variable bindings I think it’s easy to see that the simplest rule for a person reading the code is lexical binding. The other question is whether that is simpler for the implementation. And the answer is that probably it is not: probably lexical scope and definite extent is the simplest implementationally. That certainly approximates what many old Lisps did⁸. It’s fairly easy to write a bad implementation of lexical binding, simply by having all functions retain all the bindings, regardless of whether they might refer to them. A good implementation requires more work. But CL’s approach here is that doing the right thing for people is more important than making the implementor’s job easier. And I think that approach has worked well.

On the other hand CL hasn’t done the right thing for blocks and tags: There are at least three reasons for this.

Implementational complexity. If the bindings had lexical scope and indefinite extent then you would need to be able to return from a block which had already been returned from, and go to a tag from outside the extent of the form that established it. That opens an enormous can of worms both in making such an implementation work at all but also handling things like dynamic bindings, open files and so on. That’s not something the CL designers were willing to impose on implementors.

Complexity in the specification. If CL had lexical bindings for blocks and tags then the specification of the language would need to describe what happens in all the many edge cases that arise, including cases where it is genuinely unclear what the correct thing to do is at all such as dealing with open files and so on. Nobody wanted to deal with that, I’m sure: the language specification was already seen as far too big and the effort involved would have made it bigger, later and more expensive.

Conceptual difficulty. It might seem that making block bindings work like lexical variable bindings would make things simpler to understand. Well, that’s exactly what Scheme did with call/cc and call/cc can give rise to some of the most opaque code I have ever seen. It is often very pretty code, but it’s not easy to understand.

I think the bargain that CL has struck here is at least reasonable: to make the common case of variable bindings simple for people, and to avoid the cases where doing the right thing results in a language which is harder to understand in many cases and far harder to implement and specify.

Finally, once again I think that the best way to understand how closures in CL is not to understand them: instead understand the binding rules for variables, why they are simple and what they imply.

indefinite scope is often called ‘dynamic scope’ although I will avoid this term as it is used by the standard to mean the combination of indefinite scope and dynamic extent. ↩
Dynamic extent could perhaps be called ‘definite extent’, but this is not the term that the standard uses so I will avoid it. ↩
Here and below I am using the term ‘function’ in the very loose sense that CL usually uses it: almost none of the ‘functions’ I will talk about are actually mathematical functions: they’re what Scheme would call ‘procedures’. ↩
For languages which don’t have first-class functions or equivalent constructs, lexical scope and definite extent is the same as lexical scope and indefinite extent, because it is not possible to return objects which can refer to bindings from the place those bindings were created. ↩
More likely, you would end up making every function have, for instance an ambient keyword argument whose value would be an alist or plist which mapped between properties of the ambient environment and values for them. All functions which might call other functions would need this extra argument, and would need to be sure to pass it down suitably. ↩
This can be worked around, but it’s not simple to do so. ↩
In other words ‘this is all I can think of right now, but there are probably others’. ↩
Very often old Lisps had indefinite scope and definite extent in interpreted code but lexical scope and definite extent in compiled code: yes, compiled code behaved differently to interpreted code, and yes, that sucked. ↩