Let's Build a Compiler 04: Function Calls

30 minute read Published: 2019-07-14

In the last post, we added support for local variables, conditional expressions, and heap-allocated objects. This time around, we'll augment our compiler with the ability to call functions, handle closures, and provide tail-call optimization.

The Mighty Lambda

In scheme, functions are created by the lambda special form, which looks like (lambda (arg0 arg1 arg2) body-form). Implementing it is going to require several new features that our compiler doesn't have yet. Let's look at the steps which occur (from the programmer's perspective) when they evaluate a lambda form:

  1. The current bindings of any variables referenced by the lambda aside from its parameters are captured and stored, so calls to the result of the lambda can use them even after the current function has completed.
  2. The call returns a closure object which can be returned and stored like any other data, as well as being used to actually call the function.

From a high-level perspective, our implementation of lambda will look something like this:

This means we'll have to start preprocessing our input code before we actually generate any assembly instructions. To start with, let's define an alternative representation of the input code that makes those three steps a bit simpler. Defining the structure by example, the expression (let ((y 4)) ((lambda (x) (+ y x)) 3)) would be converted to:

(labels ((lbl0 (code (x) (y) (primcall + y x))))
  (let ((y 4))
    (funcall (closure lbl0 y) 3))

In other words, we add a new root labels form, which wraps a series of labelled code regions and a primary "main" form that's evaluated when the program runs. Before we move on to any actual code generation, let's work out how to transform the input code into this augmented form.

Program Transformations

The first transformation we need to implement is annotation of free variables. We can implement this as a recursive function that takes an expression and its environment, then returns the expression's annotated form along with a list of free variables. As we run the analysis recursively, a list of bound variables flows down the call stack from the root to the leaves. At the same time, a list of free variables flows upwards from the leaves to the root. We collect all the free variables at each node and deduplicate them, then pass them upwards.

When handling the recursive traversal, there's a few cases we need to deal with:

Here's an implementation of the simple cases:

(define (lambda? x) (eq? (car x) 'lambda))

; utility function removing duplicate elements in a list
(define (remove-duplicates xs)
  (if (null? xs) xs
      (cons (car xs)
            (remove-duplicates (delq (car xs) (cdr xs))))))

(define (xform-annotate-free-vars e env)
  (cond
    ((null? e) (list e '()))

    ; if the var is bound, then don't add it to the free list
    ((variable? e)
     (list e (if (member e env) '() (list e))))

    ; catch-all for immediates
    ((not (list? e)) (list e '()))

    ; handle if form so we don't end up with 'if' as a free var
    ((if? e)
     (let* ((parts (map (lambda (e) (xform-annotate-free-vars e env)) (cdr e)))
            (annotated (map car parts))
            (free-vars (map cadr parts)))
       (list `(if ,@annotated)
             (remove-duplicates (apply append free-vars)))))

    ; deal with lambdas
    ((lambda? e)
     (let* ((args (cadr e))
            (body-form (caddr e))

            ; treat the outer environment as unbound 
            (annotated-body (xform-annotate-free-vars body-form args))

            ; free variables list shouldn't include args, even though they'll be
            ; output as free when analyzing the body
            (inner-free (filter (lambda (v) (not (memq v args)))
                                (cadr annotated-body))))
       (list `(lambda ,args ,inner-free ,(car annotated-body))

             ; outgoing free vars shouldn't include parts of our input env
             (filter (lambda (v) (not (memq v env))) inner-free))))
  ))

At this point we still have to handle primitive calls, normal calls, and let expressions. The implementations for both types of call are almost identical. For each of them, we recursively annotate the parameter forms using our own environment. Then, return the reassembled call form alongside the deduplicated list of free variables.

(define (xform-annotate-free-vars e env)
  (cond
    ...

    ; primitive call - ignore the function, map over args
    ((primitive-call? e)
     (let* ((results (map (lambda (p) (xform-annotate-free-vars p env))
                          (cddr e)))
            (annotated (map car results))
            (free (apply append (map cadr results))))
       (list `(primcall ,(cadr e) ,@annotated)
             (remove-duplicates free))))

    (else (let* ((results (map (lambda (p) (xform-annotate-free-vars p env)) e))
                 (annotated (map car results))
                 (free (apply append (map cadr results))))
            (list annotated (remove-duplicates free))))
  ))

Finally, we need to handle let expressions. This one's a lot more complex than the other expression types since we need to deal with two parallel environments at the same time. The process goes something like this:

  1. Recursively annotate each of the binding values passed to let, using our input environment. Since let doesn't bind any variables until all parameters have been evaluated, we have to separate the "inner" environment from the outer one.
  2. Collect the free variables from all binding values together.
  3. Add all the newly-bound names to the environment, forming the "inner environment" in which the body forms will be evaluated.
  4. Recursively annotate each body form using the inner environment.
  5. Collect the free variables from the binding values together with those from each body form together, yielding the final free variable list.
  6. Reassemble and return the let expression with annotated versions in place of each binding value and body form.

In code, the whole process looks like this:

(define (xform-annotate-free-vars e env)
  (cond
    ...

    ; check for let form and apply bindings
    ((let? e)
     (let* ((bindings (cadr e))
            (body-forms (cddr e))
            (bind-names (map car bindings))
            (bind-bodies (map cadr bindings))

            ; analyze all binding bodies
            (body-free (map (lambda (e) (xform-annotate-free-vars e env))
                            bind-bodies))
            (new-bodies (map car body-free))
            (body-free (apply append (map cadr body-free)))

            (inner-env (append bind-names env))
            (inner-annotated
              (map (lambda (b) (xform-annotate-free-vars b inner-env))
                   body-forms)))
       (list `(let ,(map list bind-names new-bodies) ,@(map car inner-annotated))
             (remove-duplicates (append (apply append (map cadr inner-annotated))
                                        body-free)))))

    ...
  ))

Now we can test our new annotation pass! It's not actually plugged into the code generator yet, but since it's a pure function that just transforms the input syntax tree, that's okay. We can test it interactively:

scheme@(guile-user)> (xform-annotate-free-vars '(if x (lambda (y z) (primcall + x (primcall + y z)))) '())
((if x (lambda (y z) (x) (primcall + x (primcall + y z)))) (x))
scheme@(guile-user)> (xform-annotate-free-vars '(let ((x 1) (y 2)) (primcall + x y)) '())
((let ((x 1) (y 2)) (primcall + x y)) ())

Now that we can tell the transformation pass attaches the free variables to the lambda form, we can move on to generating the top-level labels form based on a given input program that's already been run through the lambda-annotation transform. The implementation here is relatively straightforward; it recursively traverses the input form, replacing lambdas and normal function calls as needed.

; Transform a program in annotated lambda form into 'labels' form
;
; Given a program in annotated lambda form (as produced by
; xform-annotate-free-vars) this will convert it into (labels ((...)) body)
; form, by generating names for each lambda and hoisting their bodies and
; arguments into new labels, then replacing the lambdas with `funcall`s.
(define (xform-labels pgm)
  (let ()
    (define next-label-tag 0)
    (define label-forms '())

    ; add a label with the given contents, returning its name as a symbol
    (define (make-label form)
      (let* ((new-tag next-label-tag)
             (tag-name (string->symbol (format #f "lbl~a" new-tag)))
             (new-label-forms (cons (list tag-name form) label-forms)))
        (begin (set! next-label-tag (+ 1 new-tag))
               (set! label-forms new-label-forms)
               tag-name)))

    (define (xform pgm)
      (cond
        ((not (list? pgm)) pgm) ; non-list stuff is unchanged
        ((null? pgm) pgm)

        ; transform lambdas into labels
        ((lambda? pgm)
         (let* ((arguments (cadr pgm))
                (free-vars (caddr pgm))
                (body (xform (cadddr pgm)))
                (name (make-label `(code ,arguments ,free-vars ,body))))
           `(closure ,name ,@free-vars)))

        ; recursively transform let expressions
        ((let? pgm)
         (let* ((bindings (cadr pgm))
                (body (cddr pgm))
                (binding-names (map car bindings))
                (binding-values (map cadr bindings)))
           `(let ,(map list binding-names (map xform binding-values))
                 ,@(map xform body))))

        ; handle function calls and if exprs
        ((if? pgm) `(if ,@(map xform (cdr pgm))))
        ((primitive-call? pgm)
         `(primcall ,(primitive-op pgm)
                    ,@(map xform (primitive-op-args pgm))))

        ; turn normal function calls into funcall forms
        (else `(funcall ,(xform (car pgm))
                        ,@(map xform (cdr pgm))))
        ))

    (let ((new-pgm (xform pgm)))
      `(labels ,label-forms
               ,new-pgm))
    ))

The main thing about this implementation that's worth explaining is the usage of internal definitions and mutable state to simplify the implementation. Scheme specifies that a define form that appears inside the body of a let* (or one of several other forms) is only visible from inside that form. The implementation here uses the internal definitions next-label-tag and label-forms to keep the current label tag and the list of emitted labels in mutable state, updated via the set! function.

Every invocation of the internal make-label function will generate a new label, add the passed label form to the output list, and return the name associated with it.

Finally, we'll define a helper function to thread the two transformation passes together:

(define (precompile-transform program)
  (xform-labels (car (xform-annotate-free-vars program '()))))

Let's plug our example from earlier into this, just to sanity-check that our implementation is working properly. It won't be formatted as nicely, but we should get the same data structure out:

scheme@(guile-user)> (precompile-transform '(let ((y 4)) ((lambda (x) (primcall + y x)) 3)))
(labels ((lbl0 (code (x) (y) (primcall + y x)))) (let ((y 4)) (funcall (closure lbl0 y) 3)))

Generating Code for Label-Structured Programs

Now that we can transform the input program, we need to change our code-generation functions to handle the new labels representation in place of the raw input it was dealing with earlier. Fortunately, the whole point of the labels representation is to make this job easy. The first thing we need to do is rewrite our compile-program function to run transformation passes on the input, then emit code for the resulting labels form:

; emit code for the top-level labels form
(define (emit-labels labels body)
  ...
  )

(define (compile-program program)
  (set! current-label 0)
  (let ((intermediate (precompile-transform program)))
    (emit-labels (cadr intermediate)
                 (caddr intermediate))))

Previously, our code generator could just emit a linear sequence of assembly, optionally with labels in the case of an if form. Now that functions come into play, our execution flow can become nonlinear, so we have to emit each labelled form as its own chunk of assembly, then finally emit code for the top-level body.

In other words, the assembly output from our example function would look something like this:

lbl0:
  [code to evaluate lambda]
  ret

scheme_entry:
  push %esi
  push %edi
  push %edx
  movl 16(%esp), %esi

  [code to build closure]
  [code to evaluate 3]
  [code to call closure with 3]

  pop %edx
  pop %edi
  pop %esi
  ret

Before we get into how to actually generate the code, let's go over the strategy we'll be using for passing arguments and captured variables to code forms. In order to generate their code, we'll have to define some convention for how they get the needed parameters. To start with, let's deal with function arguments and leave captured closure values for later.

Since this is a Scheme, we have to keep in mind that storing and passing functions around is a perfectly valid thing for code to do. To support this, we'll define the memory layout of a closure object. This needs to store both the address of the code to evaluate it and the values of any captured variables. In memory, it'll look something like this:

                  CLOSURE
 -----------------------------------------
 |   code    |  captured |  captured |
 |  pointer  |  value 0  |  value 1  | ...
 |           |           |           |
 -----------------------------------------
  00 01 02 03 04 05 06 07 08 09 ...

When we evaluate a closure form, we have to allocate one of these objects, fill it with any captured values, and then return it. We can then pass a pointer to this object into the called function so it can get the appropriate values for its free variables. Since we're controlling the whole environment, we can assign a register that will always hold a pointer to the current closure. Then, we just need to save and restore its value around calls. We're already saving %edx, so we'll recycle it for this purpose. If you recall, %edx gets overwritten when we multiply integers, so we have to add some save/restore logic there.

(define (compile-primitive-call form si env)
  (case (primitive-op form)
    ...

    ; * - multiplication of two fixnums
    ((*)
     (compile-expr (primitive-op-arg1 form) si env)
     (emit "movl %eax, ~a(%esp)" si)
     (compile-expr (primitive-op-arg2 form) (- si wordsize) env)
     (emit "shrl $~a, %eax" fixnum-shift) ; make sure only one is shifted
     (emit "movl %edx, %edi")
     (emit "imull ~a(%esp), %eax" si)
     (emit "movl %edi, %edx"))

    ...

    ; (set-car! pair obj) - mutate the car of a pair
    ((set-car!)
     ...
     (emit "movl ~a(%esp), %edi" si)
     (emit "movl %eax, ~a(%edi)" (- pair-tag)))

    ; (set-car! pair obj) - mutate the car of a pair
    ((set-cdr!)
     ...
     (emit "movl ~a(%esp), %edi" si)
     (emit "movl %eax, ~a(%edi)" (- 4 pair-tag)))

    ...
  ))

Similar to the way it works in C, we'll be passing function arguments on the stack. From the caller's perspective, the stack looks like this before the call takes place:

   low addresses
  ---------------
  |             |
  |     ...     |
  |             |
  |-------------|
  |    arg2     | \
  |    arg1     |  >--- function arguments
  |    arg0     | /
  |-------------|
  | return addr | ----- return address for callee
  |-------------|
  | closure ptr | ----- caller's saved closure pointer
  |-------------|
  |   local2    | \
  |   local1    |  >--- caller's locals
  |   local0    | /
  |-------------|
  |     esi     | <---- %esp
  |     edi     |
  |     edx     |
  |-------------|
  |     ...     |
  |-------------|
   high addresses

The implementation of funcall has to evaluate the arguments, store them into the appropriate locations, save the closure pointer, then perform the call instruction. Immediately after the call instruction in the caller, the callee's view of the stack will look like this:

   low addresses
  ---------------
  |             |
  |     ...     |
  |             | ----- space for locals
  |-------------|
  |    arg2     | \
  |    arg1     |  >--- function arguments
  |    arg0     | /
  |-------------|
  | return addr | <---- %esp
  |-------------|
  | closure ptr | <---- caller's saved closure pointer
  |-------------|
  |   local2    | \
  |   local1    |  >--- caller's locals
  |   local0    | /
  |-------------|
  |     ...     |
  |-------------|
   high addresses

Now that we have some idea of the way our generated code will interact with its caller, we can start updating the code generator. First, we define how to emit assembly code implementing a passed code label.

; emit code for an individual label form
(define (emit-label-code label env)
  ; switch based on what kind of label it is
  (case (caadr label)
    ; for code labels, construct an environment and generate code under the
    ; appropriate label
    ((code)
     (let*
       ((code-form (cadr label))
        (name (car label))
        (args (cadr code-form))
        (captured (caddr code-form))
        (body (cadddr code-form))
        (inner-env
          (append (map list args
                       (map (lambda (i) `(var ,(* (- wordsize)
                                                  (+ 1 i))))
                            (iota (length args))))
                  env))
        (locals-start (- (* wordsize (+ 1 (length args))))))
       (emit "~a:" name)
       (compile-expr body locals-start inner-env)
       (emit "ret")))
    ))

If we were to test it, though, that implementation would fail. Since we're now dealing with more than just stack variables in our environment, we have to add a way to keep track of how to access something. As you might notice, emit-label-code populates the environment with pairs that look like (name (var -4)), rather than the (name -4) we were using previously. We're not going to deal with other types of variable yet, but we do have to change the implementations of compile-var-load and compile-let to handle this new format:

(define (compile-var-load v si env)
  (let ((value (cadr (assoc v env))))
    (case (car value)
      ; stack variable tracked by offset
      ((var) (emit "movl ~a(%esp), %eax" (cadr value)))
      )))

(define (compile-let bindings body si env)
  (let* ((stack-offsets (map (lambda (x) `(var ,(- si (* x wordsize))))
                             (range 0 (length bindings))))
         (names (map car bindings))
         (exprs (map cadr bindings))
         (inner-si (- si (* (length bindings) wordsize)))
         (inner-env (append (map list names stack-offsets) env)))
    ; evaluate exprs and assign them to stack locations
    (for-each (lambda (expr offset)
                (compile-expr expr inner-si env)
                (emit "movl %eax, ~a(%esp)" (cadr offset)))
              exprs stack-offsets)

    ; evaluate all body forms - this will leave the last one's output in %eax
    (for-each (lambda (form) (compile-expr form inner-si inner-env)) body)
    ))

Once we've defined how to emit code for a single label, we just have to link that with the larger compilation process by defining emit-labels. The contents aren't that complex - it basically just calls emit-label-code on each label, then generates the scheme_entry symbol used to run the program along with the normal preamble and clean-up instructions.

; emit code for the top-level labels form
(define (emit-labels labels body)
  (let* (; get an environment mapping each label to its asm string label
         (symbol-env (map (lambda (x) `(,(car x) (label ,(car x)))) labels)))

    ; emit text section preamble
    (emit ".text")
    (emit ".p2align 4,,15")
    (emit ".globl scheme_entry")

    ; emit code for each label
    (for-each
      (lambda (lbl) (emit-label-code lbl symbol-env))
      labels)

    ; emit the top-level form
    ; handle incoming call from C
    (emit "scheme_entry:")
    (emit "push %esi")
    (emit "push %edi")
    (emit "push %edx")
    (emit "movl 16(%esp), %esi")
    
    ; our code goes here
    (compile-expr body (- wordsize) symbol-env)
    
    ; restore state for return to C
    (emit "pop %edx")
    (emit "pop %edi")
    (emit "pop %esi")
    (emit "ret")
    ))

If you want, you can test the result to verify that it still works for all our previous test cases. Since the same result gets emitted if no labels are present, that makes sense. However, when the code involves a call to a lambda, it doesn't generate the right assembly (output manually indented for readability):

scheme@(guile-user)> (compile-program '((lambda (x) (primcall + 1 x)) 2))
.text
.p2align 4,,15
.globl scheme_entry

lbl0:
  movl $4, %eax
  movl %eax, -8(%esp)
  movl -4(%esp), %eax
  addl -8(%esp), %eax
  ret

scheme_entry:
  push %esi
  push %edi
  push %edx
  movl 16(%esp), %esi
  pop %edx
  pop %edi
  pop %esi
  ret

You can see it emits the code for our lambda's body correctly into lbl0. The argument is loaded from offset -4 and the first local starts at -8. However, it's missing the actual function call from scheme_entry because we haven't implemented the new funcall form yet. To do that, we follow the same process used for implementing the other special forms.

(define (funcall? x) (eq? (car x) 'funcall))

(define (compile-expr e si env)
  (cond
    ...
    ((funcall? e) (compile-funcall e si env))
    ))

Implementing compile-funcall is a bit tricky, though. Let's start with a basic version that doesn't have to worry about free variables. The generated code should:

  1. Evaluate all the function arguments, storing their outputs into the argument cells on the stack.
  2. Evaluate the call target
  3. Save a copy of the closure pointer
  4. Replace the closure pointer with that of the call target
  5. Advance %esp
  6. Call the new closure's code address
  7. Upon return, restore %esp
  8. Load the old closure pointer from the stack

While evaluating the function arguments and call target, we also have to be careful not to clobber any of the arguments we're building up on the stack. To do that, we just pass a modified version of the stack index that's been shifted so none of the allocations made by those evaluations will overwrite the call parameters. In code, the whole process looks like this:

(define (compile-funcall e si env)
  (let* ((orig-si si)
         (call-target (cadr e))

         ; stack offset that args start at
         (args-start (- si (* 2 wordsize))) ; space for return addr and closure

         (args (cddr e))

         ; build stack offsets for each argument
         ; results in a list of (arg-body stack-offset) pairs
         (args (map list args
                    (map (lambda (i) (- args-start (* wordsize i)))
                         (iota (length args)))))

         (eval-si (- args-start (* wordsize (length args)))))
    ; evaluate arguments from left to right, storing into arg cells
    (for-each
      (lambda (arg) (begin (compile-expr (car arg) eval-si env)
                           (emit "movl %eax, ~a(%esp)" (cadr arg))))
      args)

    ; evaluate operator
    (compile-expr call-target eval-si env)

    ; store current closure pointer and switch to the new closure
    (emit "movl %edx, ~a(%esp)" si)
    (emit "movl %eax, %edx")
    (emit "subl $~a, %edx" closure-tag)

    ; advance %esp and call the function
    (emit "subl $~a, %esp" (- si))
    (emit "call *(%edx)")

    ; restore the stack pointer afterwards and reload our current closure
    (emit "addl $~a, %esp" (- si))
    (emit "movl ~a(%esp), %edx" si)
  ))

We'll have to come back to this later to add tail-call recursion, but it'll work for now. There's one more thing we need to support before we can run our first test program: the closure special form. We need a way to actually construct closure objects so that the funcall implementation can load the code pointer out of them.

Fortunately for us, this is simple compared to everything else we've been doing. Just allocate some memory, fill the code pointer, and we're done! We'll have to update it later to support free variables, but even that won't be too bad.

(define (closure? x) (eq? (car x) 'closure))

(define (compile-var-load v si env)
  (let ((value (cadr (assoc v env))))
    (case (car value)
      ...
      ((label) (emit "movl $~a, %eax" (cadr value)))
      )))

(define (compile-closure label free-vars si env)
  ; construct closure object - fetch the label and store code ptr
  (emit-var-load (lookup lvar env) si env)
  (emit "movl %eax, 0(%esi)")

  ; build the tagged pointer
  (emit "movl %esi, %eax")
  (emit "orl $~a, %eax" closure-tag)

  ; advance allocation pointer
  (emit "addl $~a, %esi" wordsize))

(define (compile-expr e si env)
  (cond
    ...
    ((closure? e) (compile-closure (cadr e) (cddr e) si env))
    ))

At this point, we should be able to run a few simple tests, even though we don't have captured variables implemented yet.

scheme@(guile-user)> (compile-and-run '(let ((f (lambda (x) (primcall + 2 x)))) (f 7)))
9
scheme@(guile-user)> (compile-and-run '((lambda (x y) (primcall + x y)) 1 6))
7

Success! We can define functions using lambda, then call them and get output! Next, let's try to get free variables working.

Free/Captured Variables

Adding support for free variables doesn't actually add much complexity, even though it might seem difficult. We already have most of the infrastructure in place. If we want to enable capturing variables, we only need a few changes:

First off, we need to augment emit-label-code to include the captured variables in the environment. We can tag them with a new captured symbol, so we know to load them from the closure pointer instead of from the stack. When compiling a lambda's body form, free variable N will be stored in the environment as (captured [closure object offset for value]). We know the offset will be (N+1)*wordsize; the first element of the closure object is the code pointer, then 1*wordsize and on are the free variables from 0 to N.

; emit code for an individual label form
(define (emit-label-code label env)
  ; switch based on what kind of label it is
  (case (caadr label)
    ...
    ((code)
     (let*
       (...
        (inner-env
          (append
            ; arguments
            ...

            ; captured variables
            (map list captured
                 (map (lambda (i) `(captured ,(* wordsize (+ 1 i))))
                      (iota (length captured))))
            env))
        ...)
       ...))
    ))

Adding support for this to compile-var-load is also pretty easy:

(define (compile-var-load v si env)
  (let ((value (cadr (assoc v env))))
    (case (car value)
      ...
      ((captured) (emit "movl ~a(%edx), %eax" (cadr value)))
      )))

At this point, our compiled code should be able to fetch values from its captured environment. Now we just need to make sure we actually put them there. To support this, we need to augment compile-closure to fetch and store the values of captured variables. This isn't too hard; we just insert a series of variable loads while constructing the closure object. The captured values can't be expressions, so we know each of them can be loaded in one instruction.

The other thing we need to keep in mind is to enforce our memory-allocation invariants. We're allocating wordsize*(1+len(free_vars)) bytes, but we need to pad our allocation up to the next 2*wordsize-aligned size, so the next pointer we allocate can be properly tagged.

(define (alloc-align x) (logand (+ x 7) -8))

(define (compile-closure label free-vars si env)
  ...
  (for-each
    (lambda (free i)
      (compile-var-load free si env)
      (emit "movl %eax, ~a(%esi)" (* 4 (+ 1 i))))
    free-vars ; values for free
    (iota (length free-vars))) ; values for i
  ...
  ; advance allocation pointer
  (emit "addl $~a, %esi"
        (alloc-align (* wordsize (+ 1 (length free-vars))))))

The implementation of alloc-align uses some bitwise wizardry I pulled from the book "Hacker's Delight", which is composed entirely of useful tricks like that.

Anyway, let's try out the new language feature!

scheme@(guile-user)> (compile-and-run '(let ((make-foo (lambda (y) (lambda (x) (primcall + x y))))) ((make-foo 2) 3)))
5

Success! We can define and call the higher-order function make-foo, generating a new function that adds 2 to its input. This example actually tests both calls to functions without any free variables (like make-foo) as well as calls to functions that capture their environment (like the generated functions you get as its output).

Just to make sure we're on the same page, here's the annotated breakdown of what that program's compiled form actually involves:

.text
.p2align 4,,15
.globl scheme_entry

; Implementation of make-foo
lbl1:
  ; Construct a closure
  movl $lbl0, %eax      ; Load the code pointer (for the contained lambda)
  movl %eax, 0(%esi)    ; Store it into 0(closure_ptr)
  movl -4(%esp), %eax   ; Load the first argument (y)
  movl %eax, 4(%esi)    ; Store it into 4(closure_ptr), the first free var slot
  movl %esi, %eax       ; Move closure pointer from allocation ptr to eax
  orl $6, %eax          ; Tag the pointer with the closure type tag
  addl $8, %esi         ; Advance allocation pointer 2 words (1 code ptr + 1 free var)
  ret                   ; Return to caller

; Implementation of make-foo's contained lambda
lbl0:
  movl -4(%esp), %eax   ; Load the first argument
  movl %eax, -8(%esp)   ; Store it into a temp location
  movl 4(%edx), %eax    ; Load the first captured variable
  addl -8(%esp), %eax   ; Add the first arg's value to the captured var's value
  ret                   ; Return to the caller

; Main procedure
scheme_entry:
  ; Preamble - jump in from C
  push %esi
  push %edi
  push %edx
  movl 16(%esp), %esi

  ; Evaluate the lambda that's bound to make-foo
  movl $lbl1, %eax      ; Load the code pointer (for make-foo)
  movl %eax, 0(%esi)    ; Store it into 0(closure_ptr)
  movl %esi, %eax       ; Copy allocation ptr/closure ptr to eax
  orl $6, %eax          ; Tag the value in eax
  addl $8, %esi         ; Advance alloc ptr 2 words (1 code ptr + 1 padding)
  movl %eax, -4(%esp)   ; Store closure value to stack temporary (make-foo)

  ; Evaluate ((make-foo 2) 3)
  movl $12, %eax        ; Load the integer value 3 (shifted/tagged)
  movl %eax, -16(%esp)  ; Store into callee's argument location (outer_arg0)

  ; Evaluate the inner call target
  movl $8, %eax         ; Load the integer value 2 (shifted/tagged)
  movl %eax, -28(%esp)  ; Store into callee's argument location (inner_arg0)
  movl -4(%esp), %eax   ; Load make-foo value from the stack
  movl %edx, -20(%esp)  ; Store current closure pointer on the stack
  movl %eax, %edx       ; Load make-foo as current closure and strip the tag
  subl $6, %edx
  subl $20, %esp        ; Move stack pointer up above top-level local data
  call *(%edx)          ; Call to make-foo with inner_arg0 as parameter
  addl $20, %esp        ; Move stack pointer back down
  movl -20(%esp), %edx  ; Restore previous closure pointer
  ; At this point, eax is the return value of (make-foo 2)

  ; Execute the outer call
  ; We already have 3 stored in the parameter slot from earlier
  movl %edx, -8(%esp)   ; Store current closure pointer on the stack
  movl %eax, %edx       ; Move the result of (make-foo 2) into closure pointer
  subl $6, %edx         ; Strip the tag from it
  subl $8, %esp         ; Move stack pointer up above our local data
  call *(%edx)          ; Call to generated function with outer_arg0 as param
  addl $8, %esp         ; Restore stack pointer
  movl -8(%esp), %edx   ; Restore previous closure pointer
  ; At this point, eax is the return value of the outer call

  ; Postamble - clean up
  pop %edx
  pop %edi
  pop %esi
  ret

You might have noticed that the generated assembly is super inefficient. GCC or Clang would optimize something similar down to movl $20, %eax. We'll add optimizations of our own later on, but for now we're primarily concerned with getting the right answer. Computers are fast, and for the moment it's acceptable to trade away performance in favor of maximal clarity.

There's one last thing to add here if we want to achieve a compliant Scheme implementation, though: tail-call optimziation. When evaluating a recursive function like (define (sum xs accum) (if (null? xs) accum (sum (cdr xs) (+ (car xs) accum)))), the output of the recursive call (to sum, here) is used directly as the return value of the contained function. In such cases, the standard requires that we not perform a call, but instead directly jmp to our call target. Since the return value will be the same either way, there's no difference from the perspective of the caller.

This optimization also allows infinite recursion without overflowing the stack, as long as the infinite recursion chain consists only of tail calls.

Tail-Call Optimization

Actually implementing tail calls requires two main pieces:

  1. Figuring out when to do a tail-call and when to do a normal call
  2. Modifying funcall to replace our stack frame for a tail-call

While we could pass an is-tailcall value through along with the environment to achieve the first part, that could get hairy and might be error-prone. Instead, we'll use another code-transformation pass, converting funcall to tailcall in situations where a tail-call is appropriate. This cleanly separates the "should we do a tail call" logic from the actual code generation, and allows us to test the two independently.

After adding the new pass, the whole compilation process will look like this:

    input program
         |
         V
  ----------------
  |   annotate   |
  |     free     |
  |   variables  |
  ----------------
         |
         V
  ----------------
  |  convert to  |
  | labels form  |
  ----------------
         |
         V
  ----------------
  |  funcall to  |
  |   tailcall   |
  |  annotation  |
  ----------------
         |
         V
  ----------------
  |     code     |
  |   generator  |
  ----------------
         |
         V
      asm code

The implementation of the transformation pass isn't too complex. All we need to do is recursively traverse through all the forms which act as return values for their parent form, replacing funcall with tailcall. We don't have to worry about situations like (primcall car (lambda ...) foo), since all lambdas will have been replaced by label/closure pairs already. Unfortunately, we can't treat the root form as a candidate for tailcall conversion, since it needs to do cleanup before returning to C.

; Convert `funcall`s in tail-call position into `tailcall`s
;
; Accepts a `labels`-form program as input, and applies the described
; transformation. The two operations differ in their names, but are otherwise
; equivalent in structure. Since the input is in labels form, we don't need to
; traverse down to subtrees we know will never be tailcalls.
(define (xform-tailcall-opt pgm)
  (let* ((label-forms (cadr pgm))
         (body-form (caddr pgm))
         )

    ; recursively annotate forms that are the return value of their parent
    (define (annotate pgm)
      (cond
        ((not (list? pgm)) pgm) ; non-list stuff is unchanged
        ((null? pgm) pgm)

        ; bodies of if exprs are top-level, but cond isn't
        ((if? pgm) `(if ,(cadr pgm) ,@(map annotate (cddr pgm))))

        ; only last body form of a let expr is top-level
        ((let? pgm)
         (let ((rev (reverse pgm)))
           (reverse (cons (annotate (car rev))
                          (cdr rev)))))

        ; funcall turns into tailcall at top level
        ;
        ; arg forms aren't top-level, and so remain unchanged
        ((funcall? pgm) (cons 'tailcall (cdr pgm)))

        ; primcall args aren't tailcalls, so we don't need to do anything
        ((primitive-call? pgm) pgm)

        ; closures can't have full forms as arguments
        ((closure? pgm) pgm)
      ))

    (define (annotate-label lbl-form)
      ; annotate contents of label, but leave the wrapper alone
      (case (caadr lbl-form)
        ((code) ; code label
          (let ((code-form (cadr lbl-form)))
            `(,(car lbl-form)
               (code ,(cadr code-form)  ; args
                     ,(caddr code-form) ; free vars
                     ,(annotate (cadddr code-form)))))) ; annotated body

        (else lbl-form)
        )
      )

    `(labels ,(map annotate-label label-forms) ,body-form)))

(define (precompile-transform program)
  (xform-tailcall-opt
    (xform-labels (car (xform-annotate-free-vars program '())))))

Now we need to change compile-expr to handle the new kind of form. Since the generated code for tailcall and funcall are very similar, we can use compile-funcall to handle both of them, and just add a parameter for whether or not to perform tail-call optimization.

(define (tailcall? x) (eq? (car x) 'tailcall))

(define (compile-funcall e si env is-tailcall)
  ...)

(define (compile-expr e si env)
  (cond
    ...
    ((funcall? e) (compile-funcall e si env #f))
    ((tailcall? e) (compile-funcall e si env #t))
    ...
    ))

Once compile-expr knows how to handle the new type of form, we just have to implement the tail-call operation itself. Implementing tail calls is very similar to implementing normal function calls; the only differences are:

Since we're essentially replacing the words in our stack cells with the outputs of the argument evaluation, we also have to make sure we copy upwards along the stack. That way, even if the callee has more arguments than the caller, we can still ensure that the evaluated arguments are copied correctly, since the args stored lower in the stack will have already been copied by the time they get overwritten by the callee's args.

(define (compile-funcall e si env is-tailcall)
  (let* (...)
    ...
    (if is-tailcall
        (begin ; tailcall - copy args down and jump
          ; copy args
          (for-each
            (lambda (arg tgt)
              (begin (emit "movl ~a(%esp), %eax" (cadr arg))
                     (emit "movl %eax, ~a(%esp)" (- (- wordsize)
                                                    (* wordsize tgt)))))
            args (iota (length args)))

          (emit "jmp *(%edx)"))
        (begin ; no tailcall - use normal mechanism
          ...))
  ))

I'll omit the demonstration here, since we can't actually show the usefulness of this feature yet. Our let bindings aren't visible inside their own definitions and we don't have define, so we can't easily define the kind of recursive functions that would benefit from tail-call optimizations. Regardless, our Scheme can now define and call functions, pass functions to other higher-order functions, and other similar capabilities you'd expect from a functional programming language. Next time, we'll add support for complex constants using the quote form and start adding support for higher-level forms like let* or cond.