Last time, we implemented a basic framework that can generate executables, came up with a data model, and implemented a basic code generator that can handle simple constants. In this part, we're going to start implementing primitive operations so our Lisp can start doing real computations! If you're following along in An Incremental Approach to Compiler Construction, this post starts at section 3.3.
Structure of Primitive Calls
In the code our compiler eventually accepts, some expressions, like (my-fn 12 27 x)
, will refer to user-defined functions that are defined elsewhere in the
program. Others, like (+ 1 2)
, reference primitive operations that are defined
as part of the language.
We could just check the name of the function being called, but then we'd risk
turning a call to a variable named +
that shadows the global primitive into a
call to the built-in +
. Instead, we can just redefine the language: the
primcall
function is reserved, and any calls to built-in primitives will have
to look like (primcall operation-name args...)
to get compiled. Later on, we
can do the transformation automatically, so the user doesn't have to worry about
the details of what's primitive and what's not.
Now that we know what primitive calls look like, let's define some accessor functions to make our code a bit more readable:
; Check whether the passed form is a primitive call (primcall) form (define (primitive-call? form) (eq? 'primcall (car form))) ; Get the primitive operation from a passed primcall form (define (primitive-op form) (cadr form)) ; Get the Nth argument of a passed primcall form (define (primitive-op-arg1 form) (caddr form)) (define (primitive-op-arg2 form) (cadddr form)) ; Get all arguments of a passed primcall form (define (primitive-op-args form) (cddr form))
These functions all accept a primitive call expression, or 'form' in Lisp
parlance, and return various properties of it. If you're not familiar with
Lisp's naming schemes, the calls to caddr
, cddr
, and friends might seem a
bit opaque. Not to worry - the naming scheme is actually quite logical when you
know how Lisp's lists are structured. Lists (as opposed to vectors) in Scheme
are defined as chains of pairs. In memory, the list (1 2 3)
looks something
like this:
car cdr car cdr car cdr |-----|-----| |-----|-----| |-----|-----| | 1 | o--+--->| 2 | o--+--->| 3 | | |-----|-----| |-----|-----| |-----|-----|
In other words, it's a linked list whose cells are just pairs. Each pair has a
car
field, the first element, and a cdr
field, the second element. So the
first item in the list x
is (car x)
, the second item is (car (cdr x))
, and
so on. Long chains of car
and cdr
calls get annoying, though, so Scheme
defines c?r
functions for everything up through cddddr
. Those functions just
compress the corresponding sequence of car
and cdr
calls into a more
readable name - (cadr x)
is short for (car (cdr x))
, (cddr x)
is short for
(cdr (cdr x))
, and so on.
Now that the accessors are defined and we know a little more about how lists are stored, we can move on to the actual implementation.
Primitive Unary Operations
First, we modify compile-expr
to handle the new type of input form. As part of
this, we also have to add a way to check whether a given form is a primitive
value that can be passed to immediate-rep
.
(define (immediate? x) (or (integer? x) (char? x) (boolean? x) (null? x))) (define (compile-expr e) (cond ((immediate? e) (emit "movl $~a, %eax" (immediate-rep e))) ((primitive-call? e) (compile-primitive-call e)) ))
Rather than emitting a movl
instruction unconditionally, now we have to check
whether the form we're compiling is an immediate or a primitive call, and act
appropriately. Once we're dispatching on the expression type, we can implement
the body of compile-primitive-call
.
(define (compile-primitive-call form) (case (primitive-op form) ((add1) (compile-expr (primitive-op-arg1 form)) (emit "addl $~a, %eax" (immediate-rep 1))) ((sub1) (compile-expr (primitive-op-arg1 form)) (emit "subl $~a, %eax" (immediate-rep 1))) ))
Now our compiler should be able to understand the basic add1
and sub1
primitives, which increment or decrement the argument's value, respectively. The
implementations here aren't very complex, mainly due to the way we chose to
represent integers. If you'll recall, integers are stored as a 30-bit field of
their value with the two low-order bits set to zero. So the integer 1 is stored
as 4, 12 as 48, and so on. As a direct consequence, we can add and subtract
integers without any conversions at all, since addition and subtraction won't
change the lower two bits.
Time to test it! Remember that if you're entering Lisp forms like (primcall add1 2)
directly, you have to prefix them with a single-quote (') so the host
Scheme won't try to execute them as code.
scheme@(guile-user)> (load "compiler.scm") ... scheme@(guile-user)> (compile-and-run '(primcall add1 5)) 6 scheme@(guile-user)> (compile-and-run '(primcall add1 1)) 0
More Unary Operations
Incrementing and decrementing numbers isn't very exciting. Let's add some more unary primitives!
For now, this just includes the type-checking predicates: integer?
, zero?
,
boolean?
, and char?
. These are all implemented in basically the same way.
They each AND the value with its corresponding type mask (e.g. 3
for
integers), then check the result against the tag bits that are supposed to be in
that area, convert the result to a boolean, and return it. We'll also define a
helper function for the "check and convert to boolean" code sequence, since we
use it so much.
(define (emit-is-eax-equal-to val) (emit "cmpl $~a, %eax" val) ; check eax against val (emit "movl $0, %eax") ; zero eax, leaving equal flag in place (emit "sete %al") ; set low bit of eax if they were equal (emit "sall $~a, %eax" bool-shift) ; shift the bit up to the bool position (emit "orl $~a, %eax" bool-tag)) ; add boolean type tag (define (compile-primitive-call form) (case (primitive-op form) ... ; integer? - check whether the first arg is an integer ((integer?) (compile-expr (primitive-op-arg1 form)) (emit "andl $~a, %eax" fixnum-mask) (emit-is-eax-equal-to 0)) ; boolean? - check whether the first arg is a boolean ((boolean?) (compile-expr (primitive-op-arg1 form)) (emit "andl $~a, %eax" bool-mask) (emit-is-eax-equal-to bool-tag)) ; char? check whether the first arg is a character ((char?) (compile-expr (primitive-op-arg1 form)) (emit "andl $~a, %eax" char-mask) (emit-is-eax-equal-to char-tag)) ; zero? check whether the first arg is the fixnum 0 ((zero?) (compile-expr (primitive-op-arg1 form)) (emit-is-eax-equal-to 0)) ))
Binary Primitives
Unary operations are great, but now it's time to extend our compiler to
multi-argument expressions, but we run into a problem. We can't evaluate a + b
by just keeping our intermediate value in %eax
as we've been doing up to this
point. We have to evaluate a
, then evaluate b
while preserving a
's old
value, and finally add the two and store the result back in %eax
.
To solve this problem, we'll have to store the intermediary result somewhere;
the stack's already conveniently allocated for us, there's a pointer to it in
the %esp
register, and storing temporary data is its whole purpose in life.
Let's take advantage of it.
In memory, the stack is laid out starting at high addresses and growing
downwards towards 0x00000000. As far as our code is concerned, the first word of
memory it's safe to use is at %esp-4
, the next is at %esp-8
, and so on.
Anything at the stack pointer and above belongs to the code calling us and isn't
safe to overwrite.
low addresses --------------- | | | | | | | | | | ... | | %esp-24 | | %esp-20 | | %esp-16 | | %esp-12 | | %esp-8 | | %esp-4 |-------------| | saved regs | <---- %esp |-------------| | return addr | |-------------| high addresses
So to store intermediate data on the stack, we have two (actually more, but I'll
present two here) options: we can use the push
and pop
instructions to move
the stack pointer up and down, storing and retrieving data as it goes, or we can
keep track of the offsets ourselves and leave %esp
in one place. While it
might seem like more work to go with option 2, it'll help us out later when it
comes time to add local variable bindings, so we're going to go with that.
First, we have to augment our compile-primitive-call
and compile-expr
functions with a new parameter, si
(stack index). This is the first stack
offset that's safe to store data in. Initially it'll be set to -4, but as we
store more intermediate results it'll decrease to access further elements on the
stack.
(define wordsize 4) (define (compile-primitive-call form si) (case (primitive-op form) ... same contents, but every si is passed unchanged to every (compile-expr) call ... )) (define (compile-expr e si) (cond ((primitive-call? e) (compile-primitive-call e si)) )) (define (compile-program program) ... (compile-expr program (- wordsize)) ...)
After testing that everything still works, we can move on to writing basic
binary primitives. To store intermediate values, we copy them from %eax
to
offsets into the stack like -4(%esp)
, and adjust the stack offset passed to
the second call to compile-expr
. Note that the parameters to -
are evaluated
in reverse order, since the second value is getting subtracted from the first.
(define (compile-primitive-call form si) (case (primitive-op form) ... ; + - addition of two fixnums ((+) (compile-expr (primitive-op-arg1 form) si) (emit "movl %eax, ~a(%esp)" si) (compile-expr (primitive-op-arg2 form) (- si wordsize)) (emit "addl ~a(%esp), %eax" si)) ; - - subtraction of two fixnums ((-) (compile-expr (primitive-op-arg2 form) si) (emit "movl %eax, ~a(%esp)" si) (compile-expr (primitive-op-arg1 form) (- si wordsize)) (emit "subl ~a(%esp), %eax" si)) ))
We can also implement multiplication, equality, and a less-than operation.
Equality and less-than work the same way as the type predicates did, and char
equality is just a shifted version of the same, but multiplication's a little
trickier. Since we have as input what are essentially two integers 4*a
and
4*b
, and we want a result of 4*(a*b)
as output, we have to divide one of the
inputs to the multiplication by four, since otherwise we'd get 16*a*b
. We can
do this efficiently by shifting the input right by two bits.
(define (compile-primitive-call form si) (case (primitive-op form) ... ; * - multiplication of two fixnums ((*) (compile-expr (primitive-op-arg1 form) si) (emit "movl %eax, ~a(%esp)" si) (compile-expr (primitive-op-arg2 form) (- si wordsize)) (emit "shrl $~a, %eax" fixnum-shift) ; make sure only one is shifted (emit "imull ~a(%esp), %eax" si)) ; = - fixnum equality test ((=) (compile-expr (primitive-op-arg1 form) si) (emit "movl %eax, ~a(%esp)" si) (compile-expr (primitive-op-arg2 form) (- si wordsize)) (emit "cmpl %eax, ~a(%esp)" si) (emit "movl $0, %eax") (emit "sete %al") (emit "sall $~a, %eax" bool-shift) (emit "orl $~a, %eax" bool-tag)) ; = - fixnum less-than test ((<) (compile-expr (primitive-op-arg1 form) si) (emit "movl %eax, ~a(%esp)" si) (compile-expr (primitive-op-arg2 form) (- si wordsize)) (emit "cmpl %eax, ~a(%esp)" si) (emit "movl $0, %eax") (emit "setl %al") (emit "sall $~a, %eax" bool-shift) (emit "orl $~a, %eax" bool-tag)) ; char=? - character equality test ((char=?) (compile-expr (primitive-op-arg1 form) si) (emit "shrl $~a, %eax" char-shift) ; remove tag bits (emit "movl %eax, ~a(%esp)" si) (compile-expr (primitive-op-arg2 form) (- si wordsize)) (emit "shrl $~a, %eax" char-shift) ; remove tag bits (emit "cmpl %eax, ~a(%esp)" si) (emit "movl $0, %eax") (emit "sete %al") (emit "sall $~a, %eax" bool-shift) (emit "orl $~a, %eax" bool-tag)) ))
Testing
At this point our compiler has gained more features than we can easily test by hand every time we make a change. If we want to keep scaling up its feature set with confidence, we'll need a set of automated tests we can run to make sure nothing's broken. Since the exact mechanics of the test suite aren't relevant to actually building the compiler, I'm not going to go over the implementation here. However, you can download it along with the rest of the source code, and I'll be adding new test cases as the compiler gains new features.
If you have Guile installed on your system, you can run it like so:
$ guile --fresh-auto-compile tests.scm ... $
Each test case that passes will be printed along with "success", and any failures will be highlighted in ALL CAPS along with their actual and expected outputs. Next time, we'll add more basic forms and implement a simple memory allocation scheme.