Disentangling iteration from value accumulation
Iteration forms and forms which accumulate values don’t have to be the same thing. I think that it turns out that separating them works rather well.
There’s no one true way to write programs, especially in Lisp1: a language whose defining feature is that it supports and encourages the seamless construction of new programming languages2. In particular there are plenty of different approaches to iteration, and to accumulating values during iteration. In CL there are at least three approaches in the base language:
- constructs which map a function over some ‘iterable’ object, often a list or a sequence of some other kind, to build another object with the results, as by
mapcarfor instance; - constructs which just iterate, as by
dotimes; - iteration constructs which combine iteration with possible value accumulation, such as
doand of courseloop.
What CL doesn’t have is any constructs which simply accumulate values. So, for instance, if you wanted to acquire the even numbers from a list with dolist you might write
(let ((evens '()))
(dolist (e l (nreverse evens))
(when (and (realp e) (evenp e))
(push e evens))))
Of course you could do this with loop:
(loop for e in l
when (and (realp e) (evenp e)) collect e)
but loop is a construct which combines iteration and value collection.
It’s tempting to say that, well, can’t you turn all iteration into mapping? Python sort of does this: objects can be ‘iterable’, and you can iterate over anything iterable, and then comprehensions let you accumulate values. But in general this doesn’t work very well: consider a file which you want to iterate over. But how? Do you want to iterate over its characters, its bytes, its lines, its words, over some other construct in the file? You can’t just say ‘a file is iterable’: it is, but you have to specify the intent before iterating over it3. You also have the problem that you very often only want to return some values, so the notion of ‘mapping’ is not very helpful. If you try and make everything be mapping you end up with ugly things like mapcan.
You do need general iteration constructs, I think: constructs which say ‘is there more? if there is give me the next thing’. In CL both the standard general iteration constructs combine, or can combine, iteration with accumulation: there is no pure general iteration construct. And there are no pure value accumulation constructs at all.
From Maclisp to CL
An interesting thing happened in the transition from Maclisp to CL.
Maclisp had prog, which was a special operator (it would have called it a special form), and which combined the ability to use go and to say return. This is a construct which dates back to the very early days of Lisp.
Common Lisp also has prog, but now it’s a macro, not a special operator. The reason its a macro is that CL has split the functionality of prog into three parts (four parts if you include variable binding):
prognis a special operator which evaluates the forms in its body in order;tagbodyis a special operator whch allows tags andgoin its body;blockis a special operator which supportsreturnandreturn-from- and of course
letprovides binding of variables.
Maclisp had let and progn: what it didn’t have was tagbody and block.
These can be combined (you don’t in fact need progn in this case) to form prog, which is something like
(defmacro prog ((&rest bindings)
&body tags/forms)
`(block nil
(let ,@bindings
(tagbody
,@tags/forms)
nil)))
So what CL has done is to divide prog into its component parts, which then can be used individually in other ways: it has provided the components of prog as individual constructs. You can build prog from these, but you can build other things as well (defun expands to something involving block, for instance), including things which don’t exist in base CL.
A linguistic separation of concerns
What CL has achieved is a separation of concerns at the language level: it has reduced the number of concerns addressed by each construct. It hasn’t done this completely: progn is not the only special operator which sequences the forms in its body, for instance, and let is not a macro defined in terms of lambda. But it’s taken steps in this direction compared to Maclisp.
This approach is really only viable for languages which have powerful macro systems where macros are not syntactically distinguished. Without a macro system then separating concerns at the language level would make almost all programs more verbose since constructs which combine lower-level ones can’t be created. With a macro system where macros are syntactically distinguished, such as Julia’s, then such constructs are always second-class citizens. With a macro system like CL’s this is no longer a problem: CL has prog, for instance, but it’s now a macro.
It seems to me that the only reason not to take this process as far as it can go in Lisps is if it makes the compiler’s job unduly hard. It makes no difference to users of the language, so long as it provides, as CL does the old, unseparated, convenient constructs.
From CL to here knows when
I can’t redesign CL and don’t want to do that. But I can experiment with building a language I’d like to use on top of it.
In particular CL has already provided the separated constructs you need to build your own iteration constructs, and no CL iteration constructs are special operators. Just as do is constructed from (perhaps) let, block and tagbody, and loop is constructed from some horrid soup if the same things, you can build your own iteration constructs this way. And the same is true for value accumulation constructs. And you can reasonably expect these to perform as well as the ones in the base language.
This is what I’ve done, several times in fact.
The first thing I built, long ago, was a list accumulation construct called collecting: within its body there is a local function, collect, which will accumulate a value onto the list returned from collecting. It secretly maintains a tail-pointer to the list so accumulation is constant-time. This was originally built to make it simpler to accumulate values when traversing tree or graph structures, to avoid the horrid and, in those days, slow explicit push … nreverse idiom.
So, for instance
(collecting
(labels ((walk (node)
...
(when ... (collect thing))
...
(dolist (...) (walk ...))))
(walk ...)))
might walk over some structure, collecting interesting things, and returning a list of them.
collecting was originally based on some ideas in Interlisp-D, and has since metastasized into a, well, collection of related constructs: multiple named collectors (collecting itself is now defined in terms of this construct), explicit collector objects, general accumulators and most recently a construct which accumulates values into vectors. It works pretty well.
The second part of the story is high-performance iteration constructs which just iterate, which are general, which are pleasant to use and have semantics which are easy to understand. Both loopand do fail the first three of these conditions for me, and loop fails the fourth as well.
Well, I’ve written a number of iteration constructs and constructs related to iteration. Finally, last year, my friend Zyni & I (the ideas are largely hers, I wrote most of the code I think) came up with Štar which we’ve described as ‘a simple and extensible iteration construct’. Lots of other people have written iteration constructs for CL: Štar occupies a position which tries to be as extreme as possible while remaining pleasant to use. There are no special keywords, the syntax is pretty much that of let and there is no value accumulation: all it does is iterate. The core of Štar exports six names, of which the three that support nested iteration are arguably unneeded in the same way that let* is. Teaching it how to iterate over things is simple, teaching it how to optimize such iterations is usually simple enough to do when it’s worth it. And it’s within $\varepsilon$ of anything in terms of performance.
It’s simple (at least in interface) and quick because it hardly does anything, of course: it relies entirely on iterators to do anything at all and iterator optimizers to do anything quickly. Even then all it does is, well, iterate.
These two components are thus attempts at separating the two parts of something like loop, Iterate or For, or other constructs which combine iteration and value accumulation: they are to these constructs what tagbody and block are to prog.
Reinventing the wheel
I used to ride bicycles a lot. And I got interested in the surprisingly non-obvious way that bicycle wheels work. After reading The bicycle wheel I decided that I could make wheels, and I did do that.
And a strange thing happened: although I rationally understood that the wheels I had made were as good or better than any other wheel, for the first little while after building them I was terrified that they would bend or, worse, collapse. There was no rational reason for this: it was just that for some reason I trusted my own workmanship less than I trusted whoever had made the off-the-shelf wheels they’d replaced (and, indeed, some of whose parts I had cannibalised to make them).
Of course they didn’t bend or collapse, and I still rode on one of them until quite recently.
The same thing happened with Štar: for quite a while after finishing it I had to work hard to force myself to use it: even though I knew it was fast and robust. It wasn’t helped that one of the basic early iterators was overcomplex and had somewhat fragile performance. It wasn’t until I gave up on it and replaced it by a much simpler and more limited one, while also making a much more general iterator fast enough to use for the complicated cases that it felt comfortable.
This didn’t happen with collecting: I think that’s because it did something CL didn’t already have versions of, while it’s very often possible to replace a construct using Štar with some nasty thing involving do or some other iteration construct. Also Štar is much bigger than collecting and it’s hard to remember that I’m not using a machine with a few MB of memory any more. Perhaps it’s also because I first wrote collecting a very long time ago.
But I got over this, and now almost the only times I’d use any other iteration construct are either when mapcar &c are obviously right, or when I’m writing code for someone else to look at.
And writing iterators is easy, especially given that you very often do not need optimizers for them: if you’re iterating over the lines in a file two function calls per line is not hurting much. Iterators, of course, can also iterate over recursively-defined structures such as trees or DAGs: it’s easy to say (for ((leaf (in-graph ... :only-leaves t))) ...).
Would it help?
In my biased experience, yes, quite a lot. I now much prefer writing and reading code that uses for to code that uses almost any of the standard iteration constructs, and collecting, together with its friends, simply does not have a standard equivalent at all: if you don’t have it, you need either to write it, or implement it explicitly each time.
But my experience is very biased: I have hated loop almost since it arrived in CL, and I find using do for anything non-trivial clumsy enough that I’ve previously written versions of it which require less repetition. And of course I was quite involved in the design and implementation of Štar, so it’s not surprising that I like it.
I’m also very comfortable with the idea that Lisp is about language design — in 2025 I don’t see any compelling advantage of Lisp other than constructing languages — and that people who write Lisp end up writing in their own idiolects. The argument against doing this seems to be that every Lisp project ends up being its own language and this means that it is hard to recruit people. I can only assume that the people who say that have never worked on any large system written in languages other than Lisp4: Greenspun’s tenth rule very much applies to these systems.
In summary: yes, it would help.
An example
In the examples directory for Štar there is an iterator called in-graph which can iterate over any graph, if it knows how to find the neighbours of a node. For instance:
> (for ((n (in-graph (list '(a b (c b) d))
(lambda (n)
(if (atom n) '() (cdr n))))))
(print n))
(a b (c b) d)
b
(c b)
b
d
nil
> (for ((n (in-graph (list '(a b (c b) d))
(lambda (n)
(if (atom n) '() (cdr n)))
:unique t)))
(print n))
(a b (c b) d)
b
(c b)
d
nil
> (for ((n (in-graph (list '(a b (c b) d))
(lambda (n)
(if (atom n) '() (cdr n)))
:order :breadth-first)))
(print n))
(a b (c b) d)
b
(c b)
d
b
nil
> (collecting (for ((n (in-graph (list '(a b (c b) d))
(lambda (n)
(if (atom n) '() (cdr n)))
:unique t
:only-leaves t)))
(collect n)))
(b d)
or
> (setf *print-circle* t)
t
> (for ((n (in-graph (list '#1=(a #2=(b c #1#) d #2#))
(lambda (n)
(if (atom n) '() (cdr n)))
:unique t)))
(print n))
#1=(a #2=(b c #1#) d #2#)
#1=(b c (a #1# d #1#))
c
d
nil
or
> (for ((p (in-graph (list *package*) #'package-use-list
:unique t :order :breadth-first)))
(format t "~&~A~%" (package-name p)))
COMMON-LISP-USER
ORG.TFEB.DSM
ORG.TFEB.HAX.ITERATE
ORG.TFEB.HAX.COLLECTING
ORG.TFEB.STAR
ORG.TFEB.TOOLS.REQUIRE-MODULE
COMMON-LISP
HARLEQUIN-COMMON-LISP
LISPWORKS
ORG.TFEB.HAX.UTILITIES
ORG.TFEB.HAX.SIMPLE-LOOPS
ORG.TFEB.HAX.SPAM
ORG.TFEB.DSM/IMPL
nil
in-graph is fairly simple, and uses both collectors and Štar in its own implementation:
(defun in-graph (roots node-neighbours &key
(only-leaves nil)
(order ':depth-first)
(unique nil)
(test #'eql)
(key #'identity))
;; Preorder / postorder would be nice to have
"Iterate over a graph
- ROOTS are the nodes to start from.
- NODE-NEIGHBOURS is a function which, given a node, returns its
neighbours if any.
- ORDER may be :DEPTH-FIRST (default) or :BREADTH-FIRST.
- UNIQUE, if given, will iterate nodes uniquely.
- TEST is the comparison test for nodes: it must be something
acceptable to MAKE-HASH-TABLE. Default is #'EQL.
- KEY, if given, extracts a key from a node for comparison in the
usual way.
There is no optimizer.
If the graph is cyclic an iteration using this will not terminate
unless UNIQUE is true, unless some other clause stops it. If the
graph is not directed you also need to use UNIQUE."
(check-type order (member :depth-first :breadth-first))
(let ((agenda (make-collector :initial-contents roots))
(duplicate-table (if unique (make-hash-table :test test) nil))
(this nil))
(values
(thunk ;predicate does all the work
(if (collector-empty-p agenda)
nil
(for ((it (stepping (it :as (pop-collector agenda)))))
(let ((neighbours (funcall node-neighbours it))
(k (and unique (funcall key it))))
(cond
((and unique (gethash k duplicate-table))
;; It's a duplicate: skip
(if (collector-empty-p agenda)
(final nil)
(next)))
((null neighbours)
;; Leaf, add it to the duplicate table if need be and say we found something
(when unique
(setf (gethash k duplicate-table) t))
(setf this it)
(final t))
(t
;; Not a leaf: update the agenda ...
(setf agenda
(case order
(:depth-first
(nconc-collectors (make-collector :initial-contents neighbours) agenda))
(:breadth-first
(nconc-collectors agenda (make-collector :initial-contents neighbours)))))
;; .. add it to the duplicate table if need be so it's
;; skipped next time ...
(when unique
(setf (gethash k duplicate-table) t))
;; ... and decide if we found something
(cond
(only-leaves
(if (collector-empty-p agenda)
(final nil)
(next)))
(t
(setf this it)
(final t)))))))))
(thunk this))))
-
‘Lisp’ here will usually mean ‘Common Lisp’. ↩
-
Although if you use
loopyou must accept that you will certainly suffer eternal damnation. Perhaps that’s worth it: Robert Johnson thought so, anyway. ↩ -
This is the same argument that explains why a universal equality predicate is nonsensical: equality of objects depends on what they are equal as and that is often not implicit in the objects. ↩
-
Or in Lisp, more than likely. ↩