Structures of arrays
Or, second system.
A while ago, I decided that I’d like to test my intuition that Lisp (specifically implementations of Common Lisp) was not, in fact, bad at floating-point code and that the ease of designing languages in Lisp could make traditional Fortran-style array-bashing numerical code pretty pleasant to write.
I used an intentionally naïve numerical solution to a gravitating many-body system as a benchmark, so I could easily compare Lisp & C versions. The brief result is that the Lisp code is a little slower than C, but not much: Lisp is not, in fact, slow. Who knew?
The point here though, is that I wanted to dress up the array-bashing code so it looked a lot more structured. To do this I wrote a macro which hid what was in fact an array of (for instance) double floats behind a bunch of syntax which made it look like an array of structures. That macro took a couple of hours.
This was fine and pretty simple, but it only dealt with a single type for each conceptual array of objects, there was no inheritance and it was restricted in various other ways. In particular it really was syntactic sugar on a vector: there was no distinct implementational type at all. So I thought well, I could make it more general and nicer.
Big mistake.
The second system
Here is an example of what I wanted to be able to do (this is in fact the current syntax):
(define-soa-class example ()
((x :array t :type double-float)
(y :array t :type double-float)
(p :array t :type double-float :group pq)
(q :array t :type double-float :group pq)
(r :array t :type fixnum)
(s)))
This defines a class, instances of which have five array slots and one scalar slot. Of the array slots:
xandyshare an array and will be neighbouring elements;pandqshare a different array, because the group option says they must not share withxandy;rwill be in its own array, unless the upgraded element type offixnumis the same as that ofdouble-float;sis just a slot.
The implementation will tell you this:
> (describe (make-instance 'example :dimensions '(2 2)))
#<example 8010059EEB> is an example
[...]
dimensions (2 2)
total-size 4
rank 2
tick 1
its class example has a valid layout
it has 3 arrays:
index 0, element type double-float, 2 slots
index 1, element type (signed-byte 64), 1 slot
index 2, element type double-float, 2 slots
it has 5 array slots:
name x, index 0 offset 0
name y, index 0 offset 1
name r, index 1 offset 0
name p, index 2 offset 0
name q, index 2 offset 1
This is already too complicated: the ability to control sharing via groups is almost certainly never going to be useful: it’s only even there because I thought of it quite early on and never removed it.
The class definition macro then needs to arrange life so that enough information is available so that a macro can be written which turns indexed slot access into indexed array access of the underlying arrays which are secretly stored in instances, inserting declarations to make this as fast as possible: anything slower than explicit array access is not acceptable. This might (and does) look like this, for example:
(with-array-slots (x y) (thing example)
(for* ((i ...) (j ...))
(setf (x i j) (- (y i j) (y j i)))))
As you can see from this, the resulting objects should be allowed to have rank other than 1. Inheritance should also work, including for array slots. Redefinition should be supported and obsolete macro expansions and instances at least detected.
In other words there are exactly two things I should have aimed at achieving: the ability to define fields of various types and have them grouped into (generally fewer) underlying arrays, and an implementational type to hold these things. Everything else was just unnecessary baggage which made the implementation much more complicated than it needed to be.
I had not finished making mistakes. The system needs to store some metadata about how slots map onto the underlying arrays, element types and so on, so the macro can use this to compile efficient code. There are two obvious ways to do this: use the property list of the class name, or subclass standard-class and store the metadata in the class. The first approach is simple, portable, has clear semantics, but it’s ‘hacky’; the second is more complicated, not portable, has unclear semantics1, but it’s The Right Thing2. Another wrong decision I made without even trying.
The only thing that saved me was that the nature of software is that you can only make a finite number of bad decisions in a finite time.
More bad decisions
I was not done. Early on, I thought that, well, I could make this whole thing be a shim around defstruct: single inheritance was more than enough, and obviously I could store metadata on the property list of the type name as described above. And there’s no nausea with multiple accessors or any of that nonsense.
But, somehow, I found writing a thing which would process the (structure-name ...) case of defstruct too painful, so I decided to go for the shim-around-defclass version instead. I even have a partly-complete version of the defstructy code which I abandoned. Another mistake.
I also decided that The Right Thing was to have the system support objects of rank 0. That constrains the underlying array representation (it needs to use rank \(n+1\) arrays for an object of rank \(n\)) in a way which I thought for a long time might limit performance.
Things I already knew
At any point during the implementation of this I could have told you that it was too general and the implementation was going to be too complicated for no real gain. I don’t know why I made so many bad choices.
The whole process took weeks and I nearly just gave up several times.
The light at the end of the tunnel
Or: all-up testing.
Eventually, I had a thing I thought might work. The macro syntax was a bit ugly (that macro still exists, with a different name) but it seemed to work. But since the whole purpose of the thing was performance, that needed to be checked. I wasn’t optimistic.
What I did was to write a version of my naïve gravitational many-body system using the new code, based closely on the previous one. The function that updates the state of the particles looks like this:
(defun/quickly step-pvs (source destination from below dt G &aux
(n (particle-vector-length source)))
;; Step a source particle vector into a destination one.
;;
;; Operation count:
;; 3
;; + (below - from) * (n - 1) * (3 + 8 + 9)
;; + (below - from) * (12 + 6)
;; = (below - from) * (20 * (n - 1) + 18) + 3
(declare (type particle-vector source destination)
(type vector-index from)
(type vector-dimension below)
(type fpv dt G)
(type vector-dimension n))
(when (eq source destination)
(error "botch"))
(let*/fpv ((Gdt (* G dt))
(Gdt^2/2 (/ (* Gdt dt) (fpv 2.0))))
(binding-array-slots (((source particle-vector :check nil :rank 1 :suffix _s)
m x y z vx vy vz)
((destination particle-vector :check nil :rank 1 :suffix _d)
m x y z vx vy vz))
(for ((i1 (in-naturals :initially from :bound below :fixnum t)))
(let/fpv ((ax/G zero.fpv)
(ay/G zero.fpv)
(az/G zero.fpv)
(x1 (x_s i1))
(y1 (y_s i1))
(z1 (z_s i1))
(vx1 (vx_s i1))
(vy1 (vy_s i1))
(vz1 (vz_s i1)))
(for ((i2 (in-naturals n t)))
(when (= i1 i2) (next))
(let/fpv ((m2 (m_s i2))
(x2 (x_s i2))
(y2 (y_s i2))
(z2 (z_s i2)))
(let/fpv ((rx (- x2 x1))
(ry (- y2 y1))
(rz (- z2 z1)))
(let/fpv ((r^3 (let* ((r^2 (+ (* rx rx) (* ry ry) (* rz rz)))
(r (sqrt r^2)))
(declare (type nonnegative-fpv r^2 r))
(* r r r))))
(incf ax/G (/ (* rx m2) r^3))
(incf ay/G (/ (* ry m2) r^3))
(incf az/G (/ (* rz m2) r^3))))))
(setf (x_d i1) (+ x1 (* vx1 dt) (* ax/G Gdt^2/2))
(y_d i1) (+ y1 (* vy1 dt) (* ay/G Gdt^2/2))
(z_d i1) (+ z1 (* vz1 dt) (* az/G Gdt^2/2)))
(setf (vx_d i1) (+ vx1 (* ax/G Gdt))
(vy_d i1) (+ vy1 (* ay/G Gdt))
(vz_d i1) (+ vz1 (* az/G Gdt)))))))
destination)
And it not only worked, the performance was very close to the previous version, straight out of the gate. The syntax is not as nice as that of the initial, quick-and-dirty version, but it is much more general, so I think that’s worth it on the whole.
There have been problems since then: in particular the dependency on when classes get defined. It will never be as portable as I’d like because of the unnecessary MOP dependencies3, but it is usable and quick4.
Was it worth it? May be, but it should have been simpler.
-
When exactly do classes get defined? Right. ↩
-
Nothing that uses the AMOP MOP is ever The Right Thing, because the whole thing was designed by people who were extremely smart, but still not as smart as they needed to be and thought they were. It’s unclear if any MOP for CLOS can ever be satisfactory, in part because CLOS itself suffers from the same smart-but-not-smart-enough problem to a large extent not helped by bring dropped wholesale into CL at the last minute: by the time CL was standardised people had written large systems in it, but almost nobody had written anything significant using CLOS, let alone the AMOP MOP. ↩
-
A mistake I somehow managed to avoid was using the whole slot-definition mechanism the MOP wants you to use. ↩
-
I will make it available at some point. ↩