Making CLOS slot access less slow
Access to slots in CLOS instances is often very slow. It’s probably not possible for it ever to be really fast, but the AMOP MOP does provide a way of making it, at least, less slow.
How slow is it?
Here are some benchmarks for accessing fields in objects of various kinds, using SBCL. All of these tests do something equivalent to
(defclass a ()
  ((i :initform 0 :type fixnum)))

(defclass a/no-fixnum ()
  ((i :initform 0)))

(defmethod svn ((a a) n)
  (declare (type fixnum n)
           (optimize speed (safety 0)))
  (dotimes (i n)
    (incf (the fixnum (slot-value a 'i)))))

(defmethod svn ((a a/no-fixnum) n)
  (declare (type fixnum n)
           (optimize speed (safety 0)))
  (dotimes (i n)
    (incf (the fixnum (slot-value a 'i)))))
They then call svn (or equivalent) with a large value of \(n\), repeat that call \(m\) times, and divide the total elapsed time by \(2 \times n \times m\) to get an average time per access (incf accesses the slot twice: once to read it and once to write it).
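The averaging step can be sketched as below. This is not the harness used for the numbers in this post: the name seconds-per-access and its argument conventions are assumptions made for illustration, and it uses only standard Common Lisp timing functions.

```lisp
;;; Hypothetical helper, not part of the original benchmarks.
(defun seconds-per-access (thunk n m)
  "Call THUNK (a function of one argument, the iteration count) M times
with argument N, and return the average time in seconds per slot access,
assuming each iteration accesses the slot twice (as INCF does)."
  (let ((start (get-internal-real-time)))
    (dotimes (j m)
      (funcall thunk n))
    (let ((elapsed (/ (- (get-internal-real-time) start)
                      internal-time-units-per-second)))
      ;; Total accesses are 2 * n * m.
      (float (/ elapsed (* 2 n m)) 1.0d0))))
```

With the classes above, a call might look like (let ((a (make-instance 'a))) (seconds-per-access (lambda (n) (svn a n)) 100000000 10)). The resolution of get-internal-real-time is implementation-dependent, so \(n\) needs to be large enough for the elapsed time to be well above it.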
For SBCL 2.6.3.178-a190d9710 on ARM64 Apple M1, seconds per access:
- raw fixnum increment \(1.58\times 10^{-10}\), ratio \(1.0\);
- slot access with slot-value (slot type fixnum) \(1.20\times 10^{-8}\), ratio \(76\);
- slot access with slot-value (no slot type) \(1.22\times 10^{-8}\), ratio \(77\);
- slot access with slot-value (single slot-value-using-class method) \(1.69\times 10^{-8}\), ratio \(107\);
- slot access using standard-instance-access \(1.00\times 10^{-9}\), ratio \(6.4\);
- slot access, struct (slot type fixnum) \(1.57\times 10^{-10}\), ratio \(1.0\);
- slot access, struct (no type) \(1.58\times 10^{-10}\), ratio \(1.0\);
- slot access, cons (car) \(1.59\times 10^{-10}\), ratio \(1.0\).
These numbers vary slightly, but this gives a good picture of what is going on. In particular you can see that slot-value within a method specialised on the class is more than 70 times slower than access for a structure slot, but if you can use standard-instance-access it is only about 6 times slower: standard-instance-access speeds things up by a factor of about 10, which changes CLOS slot access performance from laughably slow to merely pretty slow.
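For readers who have not used standard-instance-access directly, here is a minimal sketch of the idea: find the slot's location once from its effective slot definition, then access the instance by location. It assumes SBCL, where the MOP symbols live in the SB-MOP package; the class point and the functions slot-location and bump are made up for this example and are not the code that was benchmarked.

```lisp
;;; A sketch for SBCL. POINT, SLOT-LOCATION and BUMP are hypothetical names.
(defclass point ()
  ((x :initform 0 :type fixnum)))

(defun slot-location (class slot-name)
  "Return the location of the slot named SLOT-NAME in instances of CLASS."
  (unless (sb-mop:class-finalized-p class)
    (sb-mop:finalize-inheritance class))
  (let ((slotd (find slot-name (sb-mop:class-slots class)
                     :key #'sb-mop:slot-definition-name)))
    (unless slotd
      (error "no slot ~S in ~S" slot-name class))
    (sb-mop:slot-definition-location slotd)))

(defun bump (p n)
  "Increment the X slot of P N times, looking up its location only once."
  (declare (type fixnum n))
  (let ((loc (slot-location (class-of p) 'x)))
    (dotimes (i n p)
      (incf (the fixnum (sb-mop:standard-instance-access p loc))))))
```

For example, (slot-value (bump (make-instance 'point) 10) 'x) should then be 10. The usual AMOP restrictions apply: the instance's class should be a standard-class, the slot should have :instance allocation, and the slot should be bound, since standard-instance-access does none of the checking that slot-value does.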
A macro
I’ve written a macro, called with-sia-slots, which is like with-slots but uses standard-instance-access. It therefore inherits all the constraints that standard-instance-access imposes, but it is significantly faster than with-slots or slot-value. It has some overhead, as it has to compute the slot locations dynamically: this is better done outside any inner loop. This means that, for instance, you probably want to write code that looks like
(with-sia-slots (x) o
  (dotimes (i many)
    (setf x (... x ...))))
which will mean you only pay the overhead once.
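To make the shape of such a macro concrete, here is a rough sketch of how one could be written. It is not the actual with-sia-slots: the name with-sia-slots/sketch is made up, it assumes SBCL's SB-MOP package, and it handles only plain slot names (not the (variable slot-name) form that with-slots also accepts).

```lisp
;;; A hypothetical sketch, NOT the author's with-sia-slots.
(defmacro with-sia-slots/sketch ((&rest slot-names) instance &body body)
  "Like WITH-SLOTS for plain slot names, but each access expands to
STANDARD-INSTANCE-ACCESS using a location computed once on entry."
  (let ((inst (gensym "INSTANCE"))
        (slots (gensym "SLOTS"))
        (locs (loop for s in slot-names
                    collect (gensym (symbol-name s)))))
    (flet ((location-form (name)
             ;; Form which finds the slot's location at run time.
             `(sb-mop:slot-definition-location
               (or (find ',name ,slots :key #'sb-mop:slot-definition-name)
                   (error "no slot ~S" ',name)))))
      `(let* ((,inst ,instance)
              (,slots (sb-mop:class-slots (class-of ,inst)))
              ,@(mapcar (lambda (loc name) `(,loc ,(location-form name)))
                        locs slot-names))
         (declare (ignorable ,slots))
         ;; Each slot name becomes a place backed by its location.
         (symbol-macrolet
             ,(mapcar (lambda (name loc)
                        `(,name (sb-mop:standard-instance-access ,inst ,loc)))
                      slot-names locs)
           ,@body)))))
```

The location lookup happens in the let* on entry, which is the overhead mentioned above; everything inside the body is a direct standard-instance-access, so wrapping the loop rather than the loop body keeps that lookup out of the hot path.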
The above tests don’t use with-sia-slots, as I wrote them partly to see if something like this was worth writing. However, on a current (at the time of writing) SBCL, with-sia-slots is asymptotically about 10 times faster than with-slots, as demonstrated by these tests.
Up to package names it should be portable to any CL with an AMOP-compatible MOP. It can be found in my implementation-specific hacks, linked from here.