In the last post, we added support for local variables, conditional expressions, and heap-allocated objects. This time around, we'll augment our compiler with the ability to call functions, handle closures, and provide tail-call optimization.
The Mighty Lambda
In scheme, functions are created by the lambda
special form, which looks like
(lambda (arg0 arg1 arg2) body-form)
. Implementing it is going to require
several new features that our compiler doesn't have yet. Let's look at the steps
which occur (from the programmer's perspective) when they evaluate a lambda
form:
- The current bindings of any variables referenced by the lambda aside from its parameters are captured and stored, so calls to the result of the lambda can use them even after the current function has completed.
- The call returns a closure object which can be returned and stored like any other data, as well as being used to actually call the function.
From a high-level perspective, our implementation of lambda
will look
something like this:
- Scan the program, annotating each
lambda
with its list of free variables that will need to be captured. - Scan the program looking for
lambda
forms. For each one, compile its body into a separately-labelled region of instructions. - Replace each actual
lambda
call with a specialclosure
form that takes the values of the free variables, packages them along with a pointer to the generated code region, and constructs a closure object on the heap. - Replace non-primitive function calls with
funcall
forms, which prepare the arguments and execute the function call.
This means we'll have to start preprocessing our input code before we actually
generate any assembly instructions. To start with, let's define an alternative
representation of the input code that makes those three steps a bit simpler.
Defining the structure by example, the expression (let ((y 4)) ((lambda (x) (+ y x)) 3))
would be converted to:
(labels ((lbl0 (code (x) (y) (primcall + y x)))) (let ((y 4)) (funcall (closure lbl0 y) 3))
In other words, we add a new root labels
form, which wraps a series of
labelled code regions and a primary "main" form that's evaluated when the
program runs. Before we move on to any actual code generation, let's work out
how to transform the input code into this augmented form.
Program Transformations
The first transformation we need to implement is annotation of free variables. We can implement this as a recursive function that takes an expression and its environment, then returns the expression's annotated form along with a list of free variables. As we run the analysis recursively, a list of bound variables flows down the call stack from the root to the leaves. At the same time, a list of free variables flows upwards from the leaves to the root. We collect all the free variables at each node and deduplicate them, then pass them upwards.
When handling the recursive traversal, there's a few cases we need to deal with:
- Lambda forms - These are what actually gets annotated, so we need to change the form to include a list of free variables that the lambda should capture. We can find that by annotating the body form with an environment consisting only of the arguments.
- Let forms - These change the inner environment, so we have to modify the environment before recursing.
- Symbols - These are actual variable references. If the var is bound, then we can just return it. Otherwise, we can return that the var is free.
- Primitive calls - We can ignore the primitive name, but we have to traverse down into the arguments.
- Normal calls - These are easy; we just traverse down into each element, including the first one.
Here's an implementation of the simple cases:
(define (lambda? x) (eq? (car x) 'lambda)) ; utility function removing duplicate elements in a list (define (remove-duplicates xs) (if (null? xs) xs (cons (car xs) (remove-duplicates (delq (car xs) (cdr xs)))))) (define (xform-annotate-free-vars e env) (cond ((null? e) (list e '())) ; if the var is bound, then don't add it to the free list ((variable? e) (list e (if (member e env) '() (list e)))) ; catch-all for immediates ((not (list? e)) (list e '())) ; handle if form so we don't end up with 'if' as a free var ((if? e) (let* ((parts (map (lambda (e) (xform-annotate-free-vars e env)) (cdr e))) (annotated (map car parts)) (free-vars (map cadr parts))) (list `(if ,@annotated) (remove-duplicates (apply append free-vars))))) ; deal with lambdas ((lambda? e) (let* ((args (cadr e)) (body-form (caddr e)) ; treat the outer environment as unbound (annotated-body (xform-annotate-free-vars body-form args)) ; free variables list shouldn't include args, even though they'll be ; output as free when analyzing the body (inner-free (filter (lambda (v) (not (memq v args))) (cadr annotated-body)))) (list `(lambda ,args ,inner-free ,(car annotated-body)) ; outgoing free vars shouldn't include parts of our input env (filter (lambda (v) (not (memq v env))) inner-free)))) ))
At this point we still have to handle primitive calls, normal calls, and let expressions. The implementations for both types of call are almost identical. For each of them, we recursively annotate the parameter forms using our own environment. Then, return the reassembled call form alongside the deduplicated list of free variables.
(define (xform-annotate-free-vars e env) (cond ... ; primitive call - ignore the function, map over args ((primitive-call? e) (let* ((results (map (lambda (p) (xform-annotate-free-vars p env)) (cddr e))) (annotated (map car results)) (free (apply append (map cadr results)))) (list `(primcall ,(cadr e) ,@annotated) (remove-duplicates free)))) (else (let* ((results (map (lambda (p) (xform-annotate-free-vars p env)) e)) (annotated (map car results)) (free (apply append (map cadr results)))) (list annotated (remove-duplicates free)))) ))
Finally, we need to handle let expressions. This one's a lot more complex than the other expression types since we need to deal with two parallel environments at the same time. The process goes something like this:
- Recursively annotate each of the binding values passed to
let
, using our input environment. Sincelet
doesn't bind any variables until all parameters have been evaluated, we have to separate the "inner" environment from the outer one. - Collect the free variables from all binding values together.
- Add all the newly-bound names to the environment, forming the "inner environment" in which the body forms will be evaluated.
- Recursively annotate each body form using the inner environment.
- Collect the free variables from the binding values together with those from each body form together, yielding the final free variable list.
- Reassemble and return the
let
expression with annotated versions in place of each binding value and body form.
In code, the whole process looks like this:
(define (xform-annotate-free-vars e env) (cond ... ; check for let form and apply bindings ((let? e) (let* ((bindings (cadr e)) (body-forms (cddr e)) (bind-names (map car bindings)) (bind-bodies (map cadr bindings)) ; analyze all binding bodies (body-free (map (lambda (e) (xform-annotate-free-vars e env)) bind-bodies)) (new-bodies (map car body-free)) (body-free (apply append (map cadr body-free))) (inner-env (append bind-names env)) (inner-annotated (map (lambda (b) (xform-annotate-free-vars b inner-env)) body-forms))) (list `(let ,(map list bind-names new-bodies) ,@(map car inner-annotated)) (remove-duplicates (append (apply append (map cadr inner-annotated)) body-free))))) ... ))
Now we can test our new annotation pass! It's not actually plugged into the code generator yet, but since it's a pure function that just transforms the input syntax tree, that's okay. We can test it interactively:
scheme@(guile-user)> (xform-annotate-free-vars '(if x (lambda (y z) (primcall + x (primcall + y z)))) '()) ((if x (lambda (y z) (x) (primcall + x (primcall + y z)))) (x)) scheme@(guile-user)> (xform-annotate-free-vars '(let ((x 1) (y 2)) (primcall + x y)) '()) ((let ((x 1) (y 2)) (primcall + x y)) ())
Now that we can tell the transformation pass attaches the free variables to the
lambda
form, we can move on to generating the top-level labels
form based on
a given input program that's already been run through the lambda-annotation
transform. The implementation here is relatively straightforward; it recursively
traverses the input form, replacing lambdas and normal function calls as needed.
; Transform a program in annotated lambda form into 'labels' form ; ; Given a program in annotated lambda form (as produced by ; xform-annotate-free-vars) this will convert it into (labels ((...)) body) ; form, by generating names for each lambda and hoisting their bodies and ; arguments into new labels, then replacing the lambdas with `funcall`s. (define (xform-labels pgm) (let () (define next-label-tag 0) (define label-forms '()) ; add a label with the given contents, returning its name as a symbol (define (make-label form) (let* ((new-tag next-label-tag) (tag-name (string->symbol (format #f "lbl~a" new-tag))) (new-label-forms (cons (list tag-name form) label-forms))) (begin (set! next-label-tag (+ 1 new-tag)) (set! label-forms new-label-forms) tag-name))) (define (xform pgm) (cond ((not (list? pgm)) pgm) ; non-list stuff is unchanged ((null? pgm) pgm) ; transform lambdas into labels ((lambda? pgm) (let* ((arguments (cadr pgm)) (free-vars (caddr pgm)) (body (xform (cadddr pgm))) (name (make-label `(code ,arguments ,free-vars ,body)))) `(closure ,name ,@free-vars))) ; recursively transform let expressions ((let? pgm) (let* ((bindings (cadr pgm)) (body (cddr pgm)) (binding-names (map car bindings)) (binding-values (map cadr bindings))) `(let ,(map list binding-names (map xform binding-values)) ,@(map xform body)))) ; handle function calls and if exprs ((if? pgm) `(if ,@(map xform (cdr pgm)))) ((primitive-call? pgm) `(primcall ,(primitive-op pgm) ,@(map xform (primitive-op-args pgm)))) ; turn normal function calls into funcall forms (else `(funcall ,(xform (car pgm)) ,@(map xform (cdr pgm)))) )) (let ((new-pgm (xform pgm))) `(labels ,label-forms ,new-pgm)) ))
The main thing about this implementation that's worth explaining is the usage of
internal definitions and mutable state to simplify the implementation. Scheme
specifies that a define
form that appears inside the body of a let*
(or one
of several other forms) is only visible from inside that form. The
implementation here uses the internal definitions next-label-tag
and
label-forms
to keep the current label tag and the list of emitted labels in
mutable state, updated via the set!
function.
Every invocation of the internal make-label
function will generate a new
label, add the passed label form to the output list, and return the name
associated with it.
Finally, we'll define a helper function to thread the two transformation passes together:
(define (precompile-transform program) (xform-labels (car (xform-annotate-free-vars program '()))))
Let's plug our example from earlier into this, just to sanity-check that our implementation is working properly. It won't be formatted as nicely, but we should get the same data structure out:
scheme@(guile-user)> (precompile-transform '(let ((y 4)) ((lambda (x) (primcall + y x)) 3))) (labels ((lbl0 (code (x) (y) (primcall + y x)))) (let ((y 4)) (funcall (closure lbl0 y) 3)))
Generating Code for Label-Structured Programs
Now that we can transform the input program, we need to change our
code-generation functions to handle the new labels
representation in place of
the raw input it was dealing with earlier. Fortunately, the whole point of the
labels
representation is to make this job easy. The first thing we need to do
is rewrite our compile-program
function to run transformation passes on the
input, then emit code for the resulting labels
form:
; emit code for the top-level labels form (define (emit-labels labels body) ... ) (define (compile-program program) (set! current-label 0) (let ((intermediate (precompile-transform program))) (emit-labels (cadr intermediate) (caddr intermediate))))
Previously, our code generator could just emit a linear sequence of assembly,
optionally with labels in the case of an if
form. Now that functions come into
play, our execution flow can become nonlinear, so we have to emit each labelled
form as its own chunk of assembly, then finally emit code for the top-level
body.
In other words, the assembly output from our example function would look something like this:
lbl0: [code to evaluate lambda] ret scheme_entry: push %esi push %edi push %edx movl 16(%esp), %esi [code to build closure] [code to evaluate 3] [code to call closure with 3] pop %edx pop %edi pop %esi ret
Before we get into how to actually generate the code, let's go over the strategy we'll be using for passing arguments and captured variables to code forms. In order to generate their code, we'll have to define some convention for how they get the needed parameters. To start with, let's deal with function arguments and leave captured closure values for later.
Since this is a Scheme, we have to keep in mind that storing and passing functions around is a perfectly valid thing for code to do. To support this, we'll define the memory layout of a closure object. This needs to store both the address of the code to evaluate it and the values of any captured variables. In memory, it'll look something like this:
CLOSURE ----------------------------------------- | code | captured | captured | | pointer | value 0 | value 1 | ... | | | | ----------------------------------------- 00 01 02 03 04 05 06 07 08 09 ...
When we evaluate a closure
form, we have to allocate one of these objects,
fill it with any captured values, and then return it. We can then pass a pointer
to this object into the called function so it can get the appropriate values for
its free variables. Since we're controlling the whole environment, we can assign
a register that will always hold a pointer to the current closure. Then, we
just need to save and restore its value around calls. We're already saving
%edx
, so we'll recycle it for this purpose. If you recall, %edx
gets
overwritten when we multiply integers, so we have to add some save/restore logic
there.
(define (compile-primitive-call form si env) (case (primitive-op form) ... ; * - multiplication of two fixnums ((*) (compile-expr (primitive-op-arg1 form) si env) (emit "movl %eax, ~a(%esp)" si) (compile-expr (primitive-op-arg2 form) (- si wordsize) env) (emit "shrl $~a, %eax" fixnum-shift) ; make sure only one is shifted (emit "movl %edx, %edi") (emit "imull ~a(%esp), %eax" si) (emit "movl %edi, %edx")) ... ; (set-car! pair obj) - mutate the car of a pair ((set-car!) ... (emit "movl ~a(%esp), %edi" si) (emit "movl %eax, ~a(%edi)" (- pair-tag))) ; (set-car! pair obj) - mutate the car of a pair ((set-cdr!) ... (emit "movl ~a(%esp), %edi" si) (emit "movl %eax, ~a(%edi)" (- 4 pair-tag))) ... ))
Similar to the way it works in C, we'll be passing function arguments on the stack. From the caller's perspective, the stack looks like this before the call takes place:
low addresses --------------- | | | ... | | | |-------------| | arg2 | \ | arg1 | >--- function arguments | arg0 | / |-------------| | return addr | ----- return address for callee |-------------| | closure ptr | ----- caller's saved closure pointer |-------------| | local2 | \ | local1 | >--- caller's locals | local0 | / |-------------| | esi | <---- %esp | edi | | edx | |-------------| | ... | |-------------| high addresses
The implementation of funcall
has to evaluate the arguments, store them into
the appropriate locations, save the closure pointer, then perform the call
instruction. Immediately after the call
instruction in the caller, the
callee's view of the stack will look like this:
low addresses --------------- | | | ... | | | ----- space for locals |-------------| | arg2 | \ | arg1 | >--- function arguments | arg0 | / |-------------| | return addr | <---- %esp |-------------| | closure ptr | <---- caller's saved closure pointer |-------------| | local2 | \ | local1 | >--- caller's locals | local0 | / |-------------| | ... | |-------------| high addresses
Now that we have some idea of the way our generated code will interact with its
caller, we can start updating the code generator. First, we define how to emit
assembly code implementing a passed code
label.
; emit code for an individual label form (define (emit-label-code label env) ; switch based on what kind of label it is (case (caadr label) ; for code labels, construct an environment and generate code under the ; appropriate label ((code) (let* ((code-form (cadr label)) (name (car label)) (args (cadr code-form)) (captured (caddr code-form)) (body (cadddr code-form)) (inner-env (append (map list args (map (lambda (i) `(var ,(* (- wordsize) (+ 1 i)))) (iota (length args)))) env)) (locals-start (- (* wordsize (+ 1 (length args)))))) (emit "~a:" name) (compile-expr body locals-start inner-env) (emit "ret"))) ))
If we were to test it, though, that implementation would fail. Since we're now
dealing with more than just stack variables in our environment, we have to add
a way to keep track of how to access something. As you might notice,
emit-label-code
populates the environment with pairs that look like
(name (var -4))
, rather than the (name -4)
we were using previously. We're
not going to deal with other types of variable yet, but we do have to change the
implementations of compile-var-load
and compile-let
to handle this new
format:
(define (compile-var-load v si env) (let ((value (cadr (assoc v env)))) (case (car value) ; stack variable tracked by offset ((var) (emit "movl ~a(%esp), %eax" (cadr value))) ))) (define (compile-let bindings body si env) (let* ((stack-offsets (map (lambda (x) `(var ,(- si (* x wordsize)))) (range 0 (length bindings)))) (names (map car bindings)) (exprs (map cadr bindings)) (inner-si (- si (* (length bindings) wordsize))) (inner-env (append (map list names stack-offsets) env))) ; evaluate exprs and assign them to stack locations (for-each (lambda (expr offset) (compile-expr expr inner-si env) (emit "movl %eax, ~a(%esp)" (cadr offset))) exprs stack-offsets) ; evaluate all body forms - this will leave the last one's output in %eax (for-each (lambda (form) (compile-expr form inner-si inner-env)) body) ))
Once we've defined how to emit code for a single label, we just have to link
that with the larger compilation process by defining emit-labels
. The contents
aren't that complex - it basically just calls emit-label-code
on each label,
then generates the scheme_entry
symbol used to run the program along with the
normal preamble and clean-up instructions.
; emit code for the top-level labels form (define (emit-labels labels body) (let* (; get an environment mapping each label to its asm string label (symbol-env (map (lambda (x) `(,(car x) (label ,(car x)))) labels))) ; emit text section preamble (emit ".text") (emit ".p2align 4,,15") (emit ".globl scheme_entry") ; emit code for each label (for-each (lambda (lbl) (emit-label-code lbl symbol-env)) labels) ; emit the top-level form ; handle incoming call from C (emit "scheme_entry:") (emit "push %esi") (emit "push %edi") (emit "push %edx") (emit "movl 16(%esp), %esi") ; our code goes here (compile-expr body (- wordsize) symbol-env) ; restore state for return to C (emit "pop %edx") (emit "pop %edi") (emit "pop %esi") (emit "ret") ))
If you want, you can test the result to verify that it still works for all our
previous test cases. Since the same result gets emitted if no labels are
present, that makes sense. However, when the code involves a call to a lambda
,
it doesn't generate the right assembly (output manually indented for
readability):
scheme@(guile-user)> (compile-program '((lambda (x) (primcall + 1 x)) 2)) .text .p2align 4,,15 .globl scheme_entry lbl0: movl $4, %eax movl %eax, -8(%esp) movl -4(%esp), %eax addl -8(%esp), %eax ret scheme_entry: push %esi push %edi push %edx movl 16(%esp), %esi pop %edx pop %edi pop %esi ret
You can see it emits the code for our lambda
's body correctly into lbl0
. The
argument is loaded from offset -4 and the first local starts at -8. However,
it's missing the actual function call from scheme_entry
because we haven't
implemented the new funcall
form yet. To do that, we follow the same process
used for implementing the other special forms.
(define (funcall? x) (eq? (car x) 'funcall)) (define (compile-expr e si env) (cond ... ((funcall? e) (compile-funcall e si env)) ))
Implementing compile-funcall
is a bit tricky, though. Let's start with a basic
version that doesn't have to worry about free variables. The generated code
should:
- Evaluate all the function arguments, storing their outputs into the argument cells on the stack.
- Evaluate the call target
- Save a copy of the closure pointer
- Replace the closure pointer with that of the call target
- Advance
%esp
- Call the new closure's code address
- Upon return, restore
%esp
- Load the old closure pointer from the stack
While evaluating the function arguments and call target, we also have to be careful not to clobber any of the arguments we're building up on the stack. To do that, we just pass a modified version of the stack index that's been shifted so none of the allocations made by those evaluations will overwrite the call parameters. In code, the whole process looks like this:
(define (compile-funcall e si env) (let* ((orig-si si) (call-target (cadr e)) ; stack offset that args start at (args-start (- si (* 2 wordsize))) ; space for return addr and closure (args (cddr e)) ; build stack offsets for each argument ; results in a list of (arg-body stack-offset) pairs (args (map list args (map (lambda (i) (- args-start (* wordsize i))) (iota (length args))))) (eval-si (- args-start (* wordsize (length args))))) ; evaluate arguments from left to right, storing into arg cells (for-each (lambda (arg) (begin (compile-expr (car arg) eval-si env) (emit "movl %eax, ~a(%esp)" (cadr arg)))) args) ; evaluate operator (compile-expr call-target eval-si env) ; store current closure pointer and switch to the new closure (emit "movl %edx, ~a(%esp)" si) (emit "movl %eax, %edx") (emit "subl $~a, %edx" closure-tag) ; advance %esp and call the function (emit "subl $~a, %esp" (- si)) (emit "call *(%edx)") ; restore the stack pointer afterwards and reload our current closure (emit "addl $~a, %esp" (- si)) (emit "movl ~a(%esp), %edx" si) ))
We'll have to come back to this later to add tail-call recursion, but it'll work
for now. There's one more thing we need to support before we can run our first
test program: the closure
special form. We need a way to actually construct
closure objects so that the funcall
implementation can load the code pointer
out of them.
Fortunately for us, this is simple compared to everything else we've been doing. Just allocate some memory, fill the code pointer, and we're done! We'll have to update it later to support free variables, but even that won't be too bad.
(define (closure? x) (eq? (car x) 'closure)) (define (compile-var-load v si env) (let ((value (cadr (assoc v env)))) (case (car value) ... ((label) (emit "movl $~a, %eax" (cadr value))) ))) (define (compile-closure label free-vars si env) ; construct closure object - fetch the label and store code ptr (emit-var-load (lookup lvar env) si env) (emit "movl %eax, 0(%esi)") ; build the tagged pointer (emit "movl %esi, %eax") (emit "orl $~a, %eax" closure-tag) ; advance allocation pointer (emit "addl $~a, %esi" wordsize)) (define (compile-expr e si env) (cond ... ((closure? e) (compile-closure (cadr e) (cddr e) si env)) ))
At this point, we should be able to run a few simple tests, even though we don't have captured variables implemented yet.
scheme@(guile-user)> (compile-and-run '(let ((f (lambda (x) (primcall + 2 x)))) (f 7))) 9 scheme@(guile-user)> (compile-and-run '((lambda (x y) (primcall + x y)) 1 6)) 7
Success! We can define functions using lambda
, then call them and get output!
Next, let's try to get free variables working.
Free/Captured Variables
Adding support for free variables doesn't actually add much complexity, even though it might seem difficult. We already have most of the infrastructure in place. If we want to enable capturing variables, we only need a few changes:
- Include values of captured variables into the closure, in addition to the closure pointer we're already storing
- Bind captured variables into the environment when generating code for the body of a lambda
- Access values of captured variables through the global closure pointer,
%edx
.
First off, we need to augment emit-label-code
to include the captured
variables in the environment. We can tag them with a new captured
symbol, so
we know to load them from the closure pointer instead of from the stack. When
compiling a lambda's body form, free variable N will be stored in the
environment as (captured [closure object offset for value])
. We know the
offset will be (N+1)*wordsize
; the first element of the closure object is the
code pointer, then 1*wordsize and on are the free variables from 0 to N.
; emit code for an individual label form (define (emit-label-code label env) ; switch based on what kind of label it is (case (caadr label) ... ((code) (let* (... (inner-env (append ; arguments ... ; captured variables (map list captured (map (lambda (i) `(captured ,(* wordsize (+ 1 i)))) (iota (length captured)))) env)) ...) ...)) ))
Adding support for this to compile-var-load
is also pretty easy:
(define (compile-var-load v si env) (let ((value (cadr (assoc v env)))) (case (car value) ... ((captured) (emit "movl ~a(%edx), %eax" (cadr value))) )))
At this point, our compiled code should be able to fetch values from its
captured environment. Now we just need to make sure we actually put them there.
To support this, we need to augment compile-closure
to fetch and store the
values of captured variables. This isn't too hard; we just insert a series of
variable loads while constructing the closure object. The captured values
can't be expressions, so we know each of them can be loaded in one
instruction.
The other thing we need to keep in mind is to enforce our memory-allocation
invariants. We're allocating wordsize*(1+len(free_vars))
bytes, but we need to
pad our allocation up to the next 2*wordsize
-aligned size, so the next pointer
we allocate can be properly tagged.
(define (alloc-align x) (logand (+ x 7) -8)) (define (compile-closure label free-vars si env) ... (for-each (lambda (free i) (compile-var-load free si env) (emit "movl %eax, ~a(%esi)" (* 4 (+ 1 i)))) free-vars ; values for free (iota (length free-vars))) ; values for i ... ; advance allocation pointer (emit "addl $~a, %esi" (alloc-align (* wordsize (+ 1 (length free-vars))))))
The implementation of alloc-align
uses some bitwise wizardry I pulled from the
book "Hacker's Delight", which is composed entirely of useful tricks like that.
Anyway, let's try out the new language feature!
scheme@(guile-user)> (compile-and-run '(let ((make-foo (lambda (y) (lambda (x) (primcall + x y))))) ((make-foo 2) 3))) 5
Success! We can define and call the higher-order function make-foo
, generating
a new function that adds 2 to its input. This example actually tests both calls
to functions without any free variables (like make-foo
) as well as calls to
functions that capture their environment (like the generated functions you get
as its output).
Just to make sure we're on the same page, here's the annotated breakdown of what that program's compiled form actually involves:
.text .p2align 4,,15 .globl scheme_entry ; Implementation of make-foo lbl1: ; Construct a closure movl $lbl0, %eax ; Load the code pointer (for the contained lambda) movl %eax, 0(%esi) ; Store it into 0(closure_ptr) movl -4(%esp), %eax ; Load the first argument (y) movl %eax, 4(%esi) ; Store it into 4(closure_ptr), the first free var slot movl %esi, %eax ; Move closure pointer from allocation ptr to eax orl $6, %eax ; Tag the pointer with the closure type tag addl $8, %esi ; Advance allocation pointer 2 words (1 code ptr + 1 free var) ret ; Return to caller ; Implementation of make-foo's contained lambda lbl0: movl -4(%esp), %eax ; Load the first argument movl %eax, -8(%esp) ; Store it into a temp location movl 4(%edx), %eax ; Load the first captured variable addl -8(%esp), %eax ; Add the first arg's value to the captured var's value ret ; Return to the caller ; Main procedure scheme_entry: ; Preamble - jump in from C push %esi push %edi push %edx movl 16(%esp), %esi ; Evaluate the lambda that's bound to make-foo movl $lbl1, %eax ; Load the code pointer (for make-foo) movl %eax, 0(%esi) ; Store it into 0(closure_ptr) movl %esi, %eax ; Copy allocation ptr/closure ptr to eax orl $6, %eax ; Tag the value in eax addl $8, %esi ; Advance alloc ptr 2 words (1 code ptr + 1 padding) movl %eax, -4(%esp) ; Store closure value to stack temporary (make-foo) ; Evaluate ((make-foo 2) 3) movl $12, %eax ; Load the integer value 3 (shifted/tagged) movl %eax, -16(%esp) ; Store into callee's argument location (outer_arg0) ; Evaluate the inner call target movl $8, %eax ; Load the integer value 2 (shifted/tagged) movl %eax, -28(%esp) ; Store into callee's argument location (inner_arg0) movl -4(%esp), %eax ; Load make-foo value from the stack movl %edx, -20(%esp) ; Store current closure pointer on the stack movl %eax, %edx ; Load make-foo as current closure and strip the tag subl $6, %edx subl $20, %esp ; Move stack pointer up above top-level local data call *(%edx) ; Call to make-foo with inner_arg0 as parameter addl $20, %esp ; Move stack pointer back down movl -20(%esp), %edx ; Restore previous closure pointer ; At this point, eax is the return value of (make-foo 2) ; Execute the outer call ; We already have 3 stored in the parameter slot from earlier movl %edx, -8(%esp) ; Store current closure pointer on the stack movl %eax, %edx ; Move the result of (make-foo 2) into closure pointer subl $6, %edx ; Strip the tag from it subl $8, %esp ; Move stack pointer up above our local data call *(%edx) ; Call to generated function with outer_arg0 as param addl $8, %esp ; Restore stack pointer movl -8(%esp), %edx ; Restore previous closure pointer ; At this point, eax is the return value of the outer call ; Postamble - clean up pop %edx pop %edi pop %esi ret
You might have noticed that the generated assembly is super inefficient. GCC
or Clang would optimize something similar down to movl $20, %eax
. We'll add
optimizations of our own later on, but for now we're primarily concerned with
getting the right answer. Computers are fast, and for the moment it's acceptable
to trade away performance in favor of maximal clarity.
There's one last thing to add here if we want to achieve a compliant Scheme
implementation, though: tail-call optimziation. When evaluating a recursive
function like (define (sum xs accum) (if (null? xs) accum (sum (cdr xs) (+ (car xs) accum))))
, the output of the recursive call (to sum
, here) is used
directly as the return value of the contained function. In such cases, the
standard requires that we not perform a call
, but instead directly jmp
to
our call target. Since the return value will be the same either way, there's no
difference from the perspective of the caller.
This optimization also allows infinite recursion without overflowing the stack, as long as the infinite recursion chain consists only of tail calls.
Tail-Call Optimization
Actually implementing tail calls requires two main pieces:
- Figuring out when to do a tail-call and when to do a normal call
- Modifying
funcall
to replace our stack frame for a tail-call
While we could pass an is-tailcall
value through along with the environment to
achieve the first part, that could get hairy and might be error-prone. Instead,
we'll use another code-transformation pass, converting funcall
to tailcall
in situations where a tail-call is appropriate. This cleanly separates the
"should we do a tail call" logic from the actual code generation, and allows us
to test the two independently.
After adding the new pass, the whole compilation process will look like this:
input program | V ---------------- | annotate | | free | | variables | ---------------- | V ---------------- | convert to | | labels form | ---------------- | V ---------------- | funcall to | | tailcall | | annotation | ---------------- | V ---------------- | code | | generator | ---------------- | V asm code
The implementation of the transformation pass isn't too complex. All we need to
do is recursively traverse through all the forms which act as return values for
their parent form, replacing funcall
with tailcall
. We don't have to worry
about situations like (primcall car (lambda ...) foo)
, since all lambda
s
will have been replaced by label/closure pairs already. Unfortunately, we can't
treat the root form as a candidate for tailcall
conversion, since it needs to
do cleanup before returning to C.
; Convert `funcall`s in tail-call position into `tailcall`s ; ; Accepts a `labels`-form program as input, and applies the described ; transformation. The two operations differ in their names, but are otherwise ; equivalent in structure. Since the input is in labels form, we don't need to ; traverse down to subtrees we know will never be tailcalls. (define (xform-tailcall-opt pgm) (let* ((label-forms (cadr pgm)) (body-form (caddr pgm)) ) ; recursively annotate forms that are the return value of their parent (define (annotate pgm) (cond ((not (list? pgm)) pgm) ; non-list stuff is unchanged ((null? pgm) pgm) ; bodies of if exprs are top-level, but cond isn't ((if? pgm) `(if ,(cadr pgm) ,@(map annotate (cddr pgm)))) ; only last body form of a let expr is top-level ((let? pgm) (let ((rev (reverse pgm))) (reverse (cons (annotate (car rev)) (cdr rev))))) ; funcall turns into tailcall at top level ; ; arg forms aren't top-level, and so remain unchanged ((funcall? pgm) (cons 'tailcall (cdr pgm))) ; primcall args aren't tailcalls, so we don't need to do anything ((primitive-call? pgm) pgm) ; closures can't have full forms as arguments ((closure? pgm) pgm) )) (define (annotate-label lbl-form) ; annotate contents of label, but leave the wrapper alone (case (caadr lbl-form) ((code) ; code label (let ((code-form (cadr lbl-form))) `(,(car lbl-form) (code ,(cadr code-form) ; args ,(caddr code-form) ; free vars ,(annotate (cadddr code-form)))))) ; annotated body (else lbl-form) ) ) `(labels ,(map annotate-label label-forms) ,body-form))) (define (precompile-transform program) (xform-tailcall-opt (xform-labels (car (xform-annotate-free-vars program '())))))
Now we need to change compile-expr
to handle the new kind of form. Since the
generated code for tailcall
and funcall
are very similar, we can use
compile-funcall
to handle both of them, and just add a parameter for whether
or not to perform tail-call optimization.
(define (tailcall? x) (eq? (car x) 'tailcall)) (define (compile-funcall e si env is-tailcall) ...) (define (compile-expr e si env) (cond ... ((funcall? e) (compile-funcall e si env #f)) ((tailcall? e) (compile-funcall e si env #t)) ... ))
Once compile-expr
knows how to handle the new type of form, we just have to
implement the tail-call operation itself. Implementing tail calls is very
similar to implementing normal function calls; the only differences are:
- We have to copy the arguments down into our own stack frame before the call
- We don't need to save the closure pointer or update
%esp
- We use
jmp
instead ofcall
- There's no cleanup code afterwards, since we never return
Since we're essentially replacing the words in our stack cells with the outputs of the argument evaluation, we also have to make sure we copy upwards along the stack. That way, even if the callee has more arguments than the caller, we can still ensure that the evaluated arguments are copied correctly, since the args stored lower in the stack will have already been copied by the time they get overwritten by the callee's args.
(define (compile-funcall e si env is-tailcall) (let* (...) ... (if is-tailcall (begin ; tailcall - copy args down and jump ; copy args (for-each (lambda (arg tgt) (begin (emit "movl ~a(%esp), %eax" (cadr arg)) (emit "movl %eax, ~a(%esp)" (- (- wordsize) (* wordsize tgt))))) args (iota (length args))) (emit "jmp *(%edx)")) (begin ; no tailcall - use normal mechanism ...)) ))
I'll omit the demonstration here, since we can't actually show the usefulness of
this feature yet. Our let
bindings aren't visible inside their own definitions
and we don't have define
, so we can't easily define the kind of recursive
functions that would benefit from tail-call optimizations. Regardless, our
Scheme can now define and call functions,
pass functions to other higher-order functions, and other similar capabilities
you'd expect from a functional programming language. Next time, we'll add
support for complex constants using the quote
form and start adding support
for higher-level forms like let*
or cond
.