Measuring slot access cost in Common Lisp
I’ve been interested in how slow CLOS slot access is in Common Lisp. Here’s how I measured it.
I wanted to compare the cost of access to fields of various objects in Common Lisp. In particular I wanted to get a feel for the difference between a slot in a class defined with defclass, so an instance of a subclass of standard-object, and a field in a class defined with defstruct, so an instance of a subclass of structure-object.
A naïve model of the access cost
I measured forms like
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (slot-value a 'a)))
(incf s (the fixnum (slot-value a 'b)))
...))
For \(N\) iterations you might think the time \(T\) should be
\[T = N(c_l + n(c_i + c_s)) + c_c\]
where \(c_l\) is the per-step loop overhead, \(n\) is the number of slots, \(c_i\) is the time to increment a variable, \(c_s\) is the slot read cost (the thing we want), and \(c_c\) is the overhead of calling the function to do all this. \(N\gtrapprox 10^9\), so it’s safe to treat \(c_c\) as zero. This is linear in \(n\), and \(T\) is a thing we can measure, so we can differentiate and get an expression for \(c_s\) which is what we want.
\[c_s = \frac{1}{N}\frac{dT}{dn} - c_i\]
In fact, everything works in terms of the per-step time \(t\doteq T/N\) as \(N\) varies for different classes and numbers of slots to keep the runtimes reasonable, and then \(c_s = dt/dn - c_i\).
A better model
Well, this turns out to be wrong. In particular if you estimate \(c_i\) (see below on how I did this) and use it in the above expression you will end up calculating values of \(c_s\) for structures which are either absurdly tiny (\(\sim 10^{–11}\)s for a machine with a cycle time \(\sim 10^{-9}\)s) or even negative. The reason is pretty obviously that the increment and the access are largely overlapped.
So in what follows I simply treat \(c_i\) as zero. This may overestimate \(c_s\) somewhat. But a result of that overestimation is that the factor by which slot access is slower than structure field access will be underestimated, which will make CLOS seem faster than it is, since if \(a \gt b \gt c \gt 0\) then \((a - c)/(b - c) \gt a/b\). That’s good, because what I’m trying to demonstrate is that it’s really slow, so an underestimation is safe.
The new model expression for \(c_s\) is then just \(c_s = dt/dn\).
What I did
I measured slot access time in the same way for a class with 10 slots, measuring 2, 4, 6, 8 and 10 slots, and did the same thing for a structure with 10 fields.
Because the access times and numbers of accesses per step vary widely I adjusted the number of iterations to keep the run-times sane: more than 10 seconds per test but ideally less than 60.
Each measurement was repeated 4 times.
I then fitted a linear function to the data for each class (least-squares fit), and used its gradient and the estimated variable-increment cost to estimate \(c_s\) for each type.
All the measurements were done on an M1 MacBook Air, using caffeinate to prevent it sleeping. I measured LispWorks 8.1.2 and SBCL 2.6.4. Total run times were somewhat over an hour for each implementation.
Results
SBCL slot access data and best fit
LispWorks slot access data and best fit
SBCL structure field access data and best fit
LispWorks structure field access data and best fit
From these you can see that the results are consistent between runs and the best fit is pretty good.
The per-slot cost is then the slope of the best fit curve, or perhaps slightly less.
Structure field access cost estimate
- SBCL: \(c_s \approx 3.2\times 10^{-10}\)s.
- LispWorks: \(c_s \approx 3.1\times 10^{-10}\)s.
Note that these are both almost certainly a single cycle up to rounding.
standard-instance subclass slot access cost estimate
- SBCL: \(c_s \approx 1.2\times 10^{-8}\)s.
- LispWorks: \(c_s \approx 1.0\times 10^{-8}\)s.
Ratios
The ratios between these two values for each implementation are then about 38 for SBCL and about 32 for LispWorks: this is how much slower CLOS slot access is than structure field access. In fact it is probably an underestimate of how much slower it is.
What is obvious1
CLOS slot access is really slow.
What is less obvious but almost certainly true2
This is not because multiple inheritance is inherently slow: it’s because the design of CLOS, especially if you want to take the AMOP MOP seriously, implies crappy performance.
Can this be fixed? Yes, I think so, with well-defined tradeoffs. Will it be? Up to implementors. So, probably not, sadly.
Notes
To get an estimate of the time to increment a variable, \(c_i\), first measure a large number of iterations of an empty loop and then a loop which increments a variable 100 times for each step. Both of the implementations I measured do not optimize empty loops away, intentionally I think. This estimate is now not used (see above), but if it’s not about a clock cycle (about \(3.3\times 10^{-10}\)s on M1) then probably something is wrong.
Code
This is the CL code I used.
;;;; Some slot-value benchmarks
;;;
;;; None of this code is general-purpose.
;;;
(in-package :cl-user)
(define-condition too-short (simple-error)
((seconds :initform 0 :initarg :seconds :reader too-short-seconds)))
(defmacro noting-too-short (&body forms)
`(handler-bind ((too-short (lambda (e)
(format *debug-io* "~&Too short: ~,2Fs when minimum was ~Ds~%"
(too-short-seconds e)
*minimum-seconds*)
(continue e))))
,@forms))
(defvar *minimum-seconds* 10) ;how long it must run for
(defmacro ticks (&body forms)
`(let ((start (get-internal-real-time))
(end (progn
,@forms
(get-internal-real-time))))
(let* ((elapsed-ticks (- end start))
(elapsed-seconds (/ elapsed-ticks internal-time-units-per-second)))
(when (< elapsed-seconds *minimum-seconds*)
(cerror "just return ~D (~,2F seconds)"
(make-condition
'too-short
:format-control "~D ticks (~,2F seconds) is not long enough"
:format-arguments (list elapsed-ticks (float elapsed-seconds))
:seconds (float elapsed-seconds))
elapsed-ticks (float elapsed-seconds)))
elapsed-ticks)))
(defun seconds (ticks &optional (divider 1))
(/ ticks internal-time-units-per-second divider))
(defun note (control &rest args)
(format *debug-io* "~&[~?]~%" control args)
(force-output *debug-io*))
(defmacro noting ((&rest notes) &body forms)
;; Single value only, but this is all we need
`(progn
(format *debug-io* "~&[~@{~A~^ ~}" ,@notes)
(force-output *debug-io*)
(let ((r (progn ,@forms)))
(format *debug-io* " -> ~A]~%" r)
(force-output *debug-io*)
r)))
(defun inc-n (n incs)
(declare (type fixnum n incs)
(optimize speed (safety 0)))
(case incs
(0
(dotimes (i n 0)))
(100
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s)
(incf s))))
(otherwise
(error "what even is this"))))
(defun estimate-increment-time (&key
(exponent 11)
&aux
(n (round (expt 10 exponent)))
(n/100 (round (expt 10 (- exponent 2)))))
(declare (type fixnum n n/100))
(/ (- (seconds (noting (100 n/100) (ticks (inc-n n/100 100))) n/100)
(seconds (noting (0 n) (ticks (inc-n n 0))) n))
100))
(defclass a ()
((a :initform 0 :reader a-a)
(b :initform 0 :reader a-b)
(c :initform 0 :reader a-c)
(d :initform 0 :reader a-d)
(e :initform 0 :reader a-e)
(f :initform 0 :reader a-f)
(g :initform 0 :reader a-g)
(h :initform 0 :reader a-h)
(i :initform 0 :reader a-i)
(j :initform 0 :reader a-j)))
(defstruct b
(a 0)
(b 0)
(c 0)
(d 0)
(e 0)
(f 0)
(g 0)
(h 0)
(i 0)
(j 0))
(defgeneric svn (o n count &key)
(declare (optimize speed)))
(defmethod svn ((a a) n count &key (reader nil))
(declare (type fixnum n count)
(optimize speed (safety 0)))
(if reader
(case count
(2
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (a-a a)))
(incf s (the fixnum (a-b a))))))
(4
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (a-a a)))
(incf s (the fixnum (a-b a)))
(incf s (the fixnum (a-c a)))
(incf s (the fixnum (a-d a))))))
(6
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (a-a a)))
(incf s (the fixnum (a-b a)))
(incf s (the fixnum (a-c a)))
(incf s (the fixnum (a-d a)))
(incf s (the fixnum (a-e a)))
(incf s (the fixnum (a-f a))))))
(8
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (a-a a)))
(incf s (the fixnum (a-b a)))
(incf s (the fixnum (a-c a)))
(incf s (the fixnum (a-d a)))
(incf s (the fixnum (a-e a)))
(incf s (the fixnum (a-f a)))
(incf s (the fixnum (a-g a)))
(incf s (the fixnum (a-h a))))))
(10
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (a-a a)))
(incf s (the fixnum (a-b a)))
(incf s (the fixnum (a-c a)))
(incf s (the fixnum (a-d a)))
(incf s (the fixnum (a-e a)))
(incf s (the fixnum (a-f a)))
(incf s (the fixnum (a-g a)))
(incf s (the fixnum (a-h a)))
(incf s (the fixnum (a-i a)))
(incf s (the fixnum (a-j a))))))
(otherwise
(error "what even is this")))
(case count
(2
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (slot-value a 'a)))
(incf s (the fixnum (slot-value a 'b))))))
(4
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (slot-value a 'a)))
(incf s (the fixnum (slot-value a 'b)))
(incf s (the fixnum (slot-value a 'c)))
(incf s (the fixnum (slot-value a 'd))))))
(6
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (slot-value a 'a)))
(incf s (the fixnum (slot-value a 'b)))
(incf s (the fixnum (slot-value a 'c)))
(incf s (the fixnum (slot-value a 'd)))
(incf s (the fixnum (slot-value a 'e)))
(incf s (the fixnum (slot-value a 'f))))))
(8
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (slot-value a 'a)))
(incf s (the fixnum (slot-value a 'b)))
(incf s (the fixnum (slot-value a 'c)))
(incf s (the fixnum (slot-value a 'd)))
(incf s (the fixnum (slot-value a 'e)))
(incf s (the fixnum (slot-value a 'f)))
(incf s (the fixnum (slot-value a 'g)))
(incf s (the fixnum (slot-value a 'h))))))
(10
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (slot-value a 'a)))
(incf s (the fixnum (slot-value a 'b)))
(incf s (the fixnum (slot-value a 'c)))
(incf s (the fixnum (slot-value a 'd)))
(incf s (the fixnum (slot-value a 'e)))
(incf s (the fixnum (slot-value a 'f)))
(incf s (the fixnum (slot-value a 'g)))
(incf s (the fixnum (slot-value a 'h)))
(incf s (the fixnum (slot-value a 'i)))
(incf s (the fixnum (slot-value a 'j))))))
(otherwise
(error "what even is this")))))
(defmethod svn ((b b) n count &key)
(declare (type fixnum n)
(optimize speed (safety 0)))
(case count
(2
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (b-a b)))
(incf s (the fixnum (b-b b))))))
(4
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (b-a b)))
(incf s (the fixnum (b-b b)))
(incf s (the fixnum (b-c b)))
(incf s (the fixnum (b-d b))))))
(6
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (b-a b)))
(incf s (the fixnum (b-b b)))
(incf s (the fixnum (b-c b)))
(incf s (the fixnum (b-d b)))
(incf s (the fixnum (b-e b)))
(incf s (the fixnum (b-f b))))))
(8
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (b-a b)))
(incf s (the fixnum (b-b b)))
(incf s (the fixnum (b-c b)))
(incf s (the fixnum (b-d b)))
(incf s (the fixnum (b-e b)))
(incf s (the fixnum (b-f b)))
(incf s (the fixnum (b-g b)))
(incf s (the fixnum (b-h b))))))
(10
(let ((s 0))
(declare (type fixnum s))
(dotimes (i n s)
(incf s (the fixnum (b-a b)))
(incf s (the fixnum (b-b b)))
(incf s (the fixnum (b-c b)))
(incf s (the fixnum (b-d b)))
(incf s (the fixnum (b-e b)))
(incf s (the fixnum (b-f b)))
(incf s (the fixnum (b-g b)))
(incf s (the fixnum (b-h b)))
(incf s (the fixnum (b-i b)))
(incf s (the fixnum (b-j b))))))
(otherwise
(error "what even is this"))))
(defun measure-thing (thing &key
(exponent 11)
(specs
'((2 -0.2)
(4 -0.4)
(6 -0.6)
(8 -0.8)
(10 -1)))
(sleep 0)
&aux (cn (class-name (class-of thing))))
(mapcar (lambda (spec)
(destructuring-bind (count delta &rest kws &key) spec
(let ((iterations (round (expt 10 (+ exponent delta)))))
(let ((per-step (float
(seconds (noting (cn count iterations)
(ticks (apply #'svn thing
iterations count kws)))
iterations))))
(note "~S ~D elapsed ~Ds per-step ~Ds"
cn count (* per-step iterations) per-step)
(when (> sleep 0)
(noting ("sleep" sleep)
(sleep sleep)))
(list cn count per-step)))))
specs))
(defun measure-things (&key
(things-and-exponents `((,(make-b) 11)
(,(make-instance 'a) 10)))
(log-file "thing-times.ldat")
(tries 4)
(sleep 5))
;; Dump measurements to a log file
(with-standard-io-syntax
(with-open-file (log log-file :direction :output
:if-exists :supersede)
(noting-too-short
(let ((increment-time (float (estimate-increment-time))))
(note "increment time ~Ds" increment-time)
(pprint increment-time log)
(force-output log))
(dolist (thing-and-exponent things-and-exponents)
(destructuring-bind (thing exponent) thing-and-exponent
(note "~S exponent ~D"
(class-name (class-of thing))
exponent)
(dotimes (try tries)
(pprint
(measure-thing thing
:exponent exponent
:sleep sleep)
log)
(force-output log)))))))
log-file)
This is the Racket code which plotted the data and computed the fit & cost.
#lang racket
;;;; Fit data from tsv
;;;
(require simple-polynomial
plot)
(define (snarf from)
;; Stolen from warranted (wcs.rkt): just read all the forms, safely
(call-with-default-reading-parameterization
(thunk
(parameterize ([read-accept-lang #f]
[read-accept-reader #f])
(call-with-input-file from
(λ (p)
(for/list ([form (in-port read p)])
form)))))))
(define (classify file-data)
;; The data is an increment time, followed by lists of (class-name
;; slot-count seconds) Return a hash table mapping from class names
;; and the imcrememt time
(match-let ([(cons increment-time records) file-data])
(define cmap (make-hasheqv))
(for* ([record (in-list records)]
[single (in-list record)])
(match-let ([(list class-name count seconds) single])
(hash-update! cmap class-name
(λ (c)
(cons (list count seconds) c))
'())))
(values cmap increment-time)))
(define (linear-fit class-name cmap)
(points->best-fit-polynomial (hash-ref cmap class-name) 1))
(define (slot-cost class-name cmap (increment-time 0.0))
(- (first (polynomial-terms (linear-fit class-name cmap)))
increment-time))
(define (file-slot-cost class-name file
#:use-increment-time (use-increment-time #f))
(let-values ([(cmap increment-time)
(classify (snarf file))])
(slot-cost class-name cmap (if use-increment-time
increment-time
0.0))))
(define (file-A/B-ratio file #:use-increment-time (use-increment-time #f))
(/ (file-slot-cost 'A file #:use-increment-time use-increment-time)
(file-slot-cost 'B file #:use-increment-time use-increment-time)))
(define (plot-linear-fit class-name cmap
#:to-file (to-file #f)
#:title (title #f))
(parameterize ([plot-font-family 'modern]
[plot-width 560]
[plot-x-far-axis? #f]
[plot-y-far-axis? #f]
[plot-x-ticks (linear-ticks #:number 5)])
(define pts (hash-ref cmap class-name))
((if to-file
(curryr plot-file to-file)
plot)
(list
(points pts #:sym 'plus #:label (format "~A data" class-name))
(function (points->best-fit-polynomial pts 1) #:label "linear fit"))
#:x-min 0
#:x-max 10.5
#:x-label "count"
#:y-label "seconds/step"
#:title title)))
(define (file-plot-linear-fit class-name file
#:to-file (to-file #f)
#:title (title #f))
(let-values ([(cmap _) (classify (snarf file))])
(plot-linear-fit class-name cmap
#:to-file to-file
#:title title)))