Macros

Go to the first, previous, next, last section, table of contents.

Inlab Scheme implements hygienic macros differently by introducing syntax-lambda expressions.

Macros

This appendix describes an extension to Scheme that allows programs to define and use new derived expression types. A derived expression type that has been defined using this extension is called a macro.

Derived expression types introduced using this extension have the syntax

(<keyword> <datum>*)
where <keyword> is an identifier that uniquely determines the expression type. This identifier is called the syntactic keyword, or simply keyword, of the macro. The number of the <datum>s, and their syntax, depends on the expression type.

Each instance of a macro is called a use of the macro. The set of rules, or more generally the procedure, that specifies how a use of a macro is transcribed into a more primitive expression is called the transformer of the macro.

The extension described here consists of three parts:

  • A set of expressions used to establish that certain identifiers are macro keywords, associate them with macro transformers, and control the scope within which a macro is defined,
  • a convenient pattern language that makes it easy to write transformers for most macros, and
  • a compatible low-level macro facility for writing macro transformers that cannot be expressed by the pattern language.

With this extension, there are no reserved identifiers. The syntactic keyword of a macro may shadow variable bindings, and local variable bindings may shadow keyword bindings. All macros defined using the pattern language are "hygienic" and "referentially transparent":

  • If a macro transformer inserts a binding for an identifier (variable or keyword), the identifier will in effect be renamed throughout its scope to avoid conflicts with other identifiers.
  • If a macro transformer inserts a free reference to an identifier, the reference refers to the binding that was visible where the transformer was specified, regardless of any local bindings that may surround the use of the macro.

This appendix is divided into three major sections. The first section describes the expressions and definitions used to introduce macros, i.e. to bind identifiers to macro transformers.

The second section describes the pattern language. This pattern language is sufficient to specify most macro transformers, including those for all the derived expression types from section Derived expression types. The primary limitation of the pattern language is that it is thoroughly hygienic, and thus cannot express macros that bind identifiers implicitly.

The third section describes a low-level macro facility that could be used to implement the pattern language described in the second section. This low-level facility is also capable of expressing non-hygienic macros and other macros whose transformers cannot be described by the pattern language, and is important as an example of a more powerful facility that can co-exist with the high-level pattern language.

The particular low-level facility described in the third section is but one of several low-level facilities that have been designed and implemented to complement the pattern language described in the second section. The design of such low-level macro facilities remains an active area of research, and descriptions of alternative low-level facilities will be published in subsequent documents.

Binding syntactic keywords

Define-syntax, let-syntax, and letrec-syntax are analogous to define, let, and letrec, but they bind syntactic keywords to macro transformers instead of binding variables to locations that contain values. Furthermore, there is no define-syntax analogue of the internal definitions described in section Internal definitions.

Rationale: As discussed below, the syntax and scope rules for definitions give rise to syntactic ambiguities when syntactic keywords are not reserved. Further ambiguities would arise if define-syntax were permitted at the beginning of a <body>, with scope rules analogous to those for internal definitions.

These new expression types and the pattern language described in section Pattern language are added to Scheme by augmenting the BNF in section Formal syntax with the following new productions. Note that the identifier ... used in some of these productions is not a metasymbol.

<expression> ==> <macro use>
     | <macro block>

<macro use> ==> (<keyword> <datum>*) <keyword> ==> <identifier>

<macro block> ==>
      (let-syntax (<syntax spec>*) <body>)
     | (letrec-syntax (<syntax spec>*) <body>)
<syntax spec> ==> (<keyword> <transformer spec>)
<transformer spec> ==>
      (syntax-rules (<identifier>*) <syntax rule>*)
<syntax rule> ==> (<pattern> <template>)
<pattern> ==> <pattern identifier>
     | (<pattern>*)
     | (<pattern>+ . <pattern>)
     | (<pattern>* <pattern> <ellipsis>)
     | <pattern datum>
<pattern datum> ==> <vector>
     | <string>
     | <character>
     | <boolean>
     | <number>
<template> ==> <pattern identifier>
     | (<template element>*)
     | (<template element>+ . <template>)
     | <template datum>
<template element> ==> <template>
     | <template> <ellipsis>
<template datum> ==> <pattern datum>
<pattern identifier> ==> <any identifier except ...>
<ellipsis> ==> <the identifier ...>
<command or definition> ==> <syntax definition>
<syntax definition> ==>
      (define-syntax <keyword> <transformer spec>)
     | (begin <syntax definition>*)

Although macros may expand into definitions in any context that permits definitions, it is an error for a definition to shadow a syntactic keyword whose meaning is needed to determine whether some definition in the group of top-level or internal definitions that contains the shadowing definition is in fact a definition, or is needed to determine the boundary between the group and the expressions that follow the group. For example, the following are errors:

(define define 3)
(begin (define begin list))
(let-syntax
  ((foo (syntax-rules ()
          ((foo (proc args ...) body ...)
           (define proc
             (lambda (args ...)
               body ...))))))
  (let ((x 3))
    (foo (plus x y) (+ x y))
    (define foo x)
    (plus foo x)))

syntax: syntax {let-syntax} <bindings> <body>

Syntax: <Bindings> should have the form

((<keyword> <transformer spec>) ...)
Each <keyword> is an identifier, each <transformer spec> is an instance of syntax-rules, and <body> should be a sequence of one or more expressions. It is an error for a <keyword> to appear more than once in the list of keywords being bound.

Semantics: The <body> is expanded in the syntactic environment obtained by extending the syntactic environment of the let-syntax expression with macros whose keywords are the <keyword>s, bound to the specified transformers. Each binding of a <keyword> has <body> as its region.

(let-syntax ((when (syntax-rules ()
                     ((when test stmt1 stmt2 ...)
                      (if test
                          (begin stmt1
                                 stmt2 ...))))))
  (let ((if #t))
    (when if (set! if 'now))
    if))                    =>  now
(let ((x 'outer))
  (let-syntax ((m (syntax-rules () ((m) x))))
    (let ((x 'inner))
      (m))))                =>  outer

syntax: syntax {letrec-syntax} <bindings> <body>

Syntax: Same as for let-syntax.

Semantics: The <body> is expanded in the syntactic environment obtained by extending the syntactic environment of the letrec-syntax expression with macros whose keywords are the <keyword>s, bound to the specified transformers. Each binding of a <keyword> has the <bindings> as well as the <body> within its region, so the transformers can transcribe expressions into uses of the macros introduced by the letrec-syntax expression.

(letrec-syntax
  ((or (syntax-rules ()
         ((or) #f)
         ((or e) e)
         ((or e1 e2 ...)
          (let ((temp e1))
            (if temp
                temp
                (or e2 ...)))))))
  (let ((x #f)
        (y 7)
        (temp 8)
        (let odd?)
        (if even?))
    (or x
        (let temp)
        (if y)
        y)))                =>  7

syntax: define-syntax <keyword> <transformer spec>

Syntax: The <keyword> is an identifier, and the <transformer spec> should be an instance of syntax-rules.

Semantics: The top-level syntactic environment is extended by binding the <keyword> to the specified transformer.

(define-syntax let*
  (syntax-rules ()
    ((let* () body1 body2 ...)
     (let () body1 body2 ...))
    ((let* ((name1 val1) (name2 val2) ...)
       body1 body2 ...)
     (let ((name1 val1))
       (let* ((name2 val2) ...)
         body1 body2 ...)))))

Pattern language

syntax: syntax-rules <literals> <syntax rule> ...

Syntax: <Literals> is a list of identifiers, and each <syntax rule> should be of the form

(<pattern> <template>)
where the <pattern> and <template> are as in the grammar above.

Semantics: An instance of syntax-rules produces a new macro transformer by specifying a sequence of hygienic rewrite rules. A use of a macro whose keyword is associated with a transformer specified by syntax-rules is matched against the patterns contained in the <syntax rule>s, beginning with the leftmost <syntax rule>. When a match is found, the macro use is transcribed hygienically according to the template.

Each pattern begins with the keyword for the macro. This keyword is not involved in the matching and is not considered a pattern variable or literal identifier.

Rationale: The scope of the keyword is determined by the expression or syntax definition that binds it to the associated macro transformer. If the keyword were a pattern variable or literal identifier, then the template that follows the pattern would be within its scope regardless of whether the keyword were bound by let-syntax or by letrec-syntax.

An identifier that appears in the pattern of a <syntax rule> is a pattern variable, unless it is the keyword that begins the pattern, is listed in <literals>, or is the identifier "...". Pattern variables match arbitrary input elements and are used to refer to elements of the input in the template. It is an error for the same pattern variable to appear more than once in a <pattern>.

Identifiers that appear in <literals> are interpreted as literal identifiers to be matched against corresponding subforms of the input. A subform in the input matches a literal identifier if and only if it is an identifier and either both its occurrence in the macro expression and its occurrence in the macro definition have the same lexical binding, or the two identifiers are equal and both have no lexical binding.

A subpattern followed by ... can match zero or more elements of the input. It is an error for ... to appear in <literals>. Within a pattern the identifier ... must follow the last element of a nonempty sequence of subpatterns.

More formally, an input form F matches a pattern P if and only if:

  • P is a pattern variable; or
  • P is a literal identifier and F is an identifier with the same binding; or
  • P is a pattern list (P1 ... Pn) and F is a list of n forms that match P1 through Pn, respectively; or
  • P is an improper pattern list (P1 P2 ... Pn . Q) and F is a list or improper list of n or more forms that match P1 through Pn, respectively, and whose nth "cdr" matches Q; or
  • P is of the form (P1 ... Pn Q <ellipsis>) where <ellipsis> is the identifier ... and F is a proper list of at least n elements, the first n of which match P1 through Pn, respectively, and each remaining element of F matches Q; or
  • P is a pattern datum and F is equal to P in the sense of the equal? procedure.

It is an error to use a macro keyword, within the scope of its binding, in an expression that does not match any of the patterns.

When a macro use is transcribed according to the template of the matching <syntax rule>, pattern variables that occur in the template are replaced by the subforms they match in the input. Pattern variables that occur in subpatterns followed by one or more instances of the identifier ... are allowed only in subtemplates that are followed by as many instances of .... They are replaced in the output by all of the subforms they match in the input, distributed as indicated. It is an error if the output cannot be built up as specified.

Identifiers that appear in the template but are not pattern variables or the identifier ... are inserted into the output as literal identifiers. If a literal identifier is inserted as a free identifier then it refers to the binding of that identifier within whose scope the instance of syntax-rules appears. If a literal identifier is inserted as a bound identifier then it is in effect renamed to prevent inadvertent captures of free identifiers.

(define-syntax let
  (syntax-rules ()
    ((let ((name val) ...) body1 body2 ...)
     ((lambda (name ...) body1 body2 ...)
      val ...))
    ((let tag ((name val) ...) body1 body2 ...)
     ((letrec ((tag (lambda (name ...)
                      body1 body2 ...)))
        tag)
      val ...))))
(define-syntax cond
  (syntax-rules (else =>)
    ((cond (else result1 result2 ...))
     (begin result1 result2 ...))
    ((cond (test => result))
     (let ((temp test))
       (if temp (result temp))))
    ((cond (test => result) clause1 clause2 ...)
     (let ((temp test))
       (if temp
           (result temp)
           (cond clause1 clause2 ...))))
    ((cond (test)) test)
    ((cond (test) clause1 clause2 ...)
     (or test (cond clause1 clause2 ...)))
    ((cond (test result1 result2 ...))
     (if test (begin result1 result2 ...)))
    ((cond (test result1 result2 ...)
           clause1 clause2 ...)
     (if test
         (begin result1 result2 ...)
         (cond clause1 clause2 ...)))))
(let ((=> #f))
  (cond (#t => 'ok)))       =>  ok

The last example is not an error because the local variable => is renamed in effect, so that its use is distinct from uses of the top level identifier => that the transformer for cond looks for. Thus, rather than expanding into

(let ((=> #f))
  (let ((temp #t))
    (if temp ('ok temp))))

which would result in an invalid procedure call, it expands instead into

(let ((=> #f))
  (if #t (begin => 'ok)))

A compatible low-level macro facility

Although the pattern language provided by syntax-rules is the preferred way to specify macro transformers, other low-level facilities may be provided to specify more complex macro transformers. In fact, syntax-rules can itself be defined as a macro using the low-level facilities described in this section.

The low-level macro facility described here introduces syntax as a new syntactic keyword analogous to quote, and allows a <transformer spec> to be any expression. This is accomplished by adding the following two productions to the productions in section Formal syntax and in section Binding syntactic keywords above.

<expression> ==> (syntax <datum>)
<transformer spec> ==> <expression>

The low-level macro system also adds the following procedures:

unwrap-syntax          identifier->symbol
identifier?            generate-identifier
free-identifier=?      construct-identifier
bound-identifier=?

Evaluation of a program proceeds in two logical steps. First the program is converted into an intermediate language via macro-expansion, and then the result of macro expansion is evaluated. When it is necessary to distinguish the second stage of this process from the full evaluation process, it is referred to as "execution."

Syntax definitions, either lexical or global, cause an identifier to be treated as a keyword within the scope of the binding. The keyword is associated with a transformer, which may be created implicitly using the pattern language of syntax-rules or explicitly using the low-level facilities described below.

Since a transformer spec must be fully evaluated during the course of expansion, it is necessary to specify the environment in which this evaluation takes place. A transformer spec is expanded in the same environment as that in which the program is being expanded, but is executed in an environment that is distinct from the environment in which the program is executed. This execution environment distinction is important only for the resolution of global variable references and assignments. In what follows, the environment in which transformers are executed is called the standard transformer environment and is assumed to be a standard Scheme environment.

Since part of the task of hygienic macro expansion is to resolve identifier references, the fact that transformers are expanded in the same environment as the program means that identifier bindings in the program can shadow identifier uses within transformers. Since variable bindings in the program are not available at the time the transformer is executed, it is an error for a transformer to reference or assign them. However, since keyword bindings are available during expansion, lexically visible keyword bindings from the program may be used in macro uses in a transformer.

When a macro use is encountered, the macro transformer associated with the macro keyword is applied to a representation of the macro expression. The result returned by the macro transformer replaces the original expression and is expanded once again. Thus macro expansions may themselves be or contain macro uses.

The syntactic representation passed to a macro transformer encapsulates information about the structure of the represented form and the bindings of the identifiers it contains. These syntax objects can be traversed and examined using the procedures described below. The output of a transformer may be built up using the usual Scheme list constructors, combining pieces of the input with new syntactic structures.

syntax: syntax <datum>

Syntax: The <datum> may be any external representation of a Scheme object.

Semantics: Syntax is the syntactic analogue of quote. It creates a syntactic representation of <datum> that, like an argument to a transformer, contains information about the bindings for identifiers contained in <datum>. The binding for an identifier introduced by syntax is the closest lexically visible binding. All variables and keywords introduced by transformers must be created by syntax. It is an error to insert a symbol in the output of a transformation procedure unless it is to be part of a quoted datum.

(symbol? (syntax x))        => #f
(let-syntax ((car (lambda (x) (syntax car))))
  ((car) '(0)))             => 0
(let-syntax
  ((quote-quote
    (lambda (x) (list (syntax quote) 'quote))))
  (quote-quote))            => quote
(let-syntax
  ((quote-quote
    (lambda (x) (list 'quote 'quote))))
  (quote-quote))            => error

The second quote-quote example results in an error because two raw symbols are being inserted in the output. The quoted quote in the first quote-quote example does not cause an error because it will be a quoted datum.

(let-syntax ((quote-me
              (lambda (x)
                (list (syntax quote) x))))
  (quote-me please))        => (quote-me please)
(let ((x 0))
  (let-syntax ((alpha (lambda (e) (syntax x))))
    (alpha)))               => 0
(let ((x 0))
  (let-syntax ((alpha (lambda (x) (syntax x))))
    (alpha)))               => error
(let-syntax ((alpha
              (let-syntax ((beta
                            (syntax-rules ()
                              ((beta) 0))))
                (lambda (x) (syntax (beta))))))
  (alpha))                  => error

The last two examples are errors because in both cases a lexically bound identifier is placed outside of the scope of its binding. In the first case, the variable x is placed outside its scope. In the second case, the keyword beta is placed outside its scope.

(let-syntax ((alpha (syntax-rules ()
                      ((alpha) 0))))
  (let-syntax ((beta (lambda (x) (alpha))))
    (beta)))                => 0
(let ((list 0))
  (let-syntax ((alpha (lambda (x) (list 0))))
    (alpha)))               => error

The last example is an error because the reference to list in the transformer is shadowed by the lexical binding for list. Since the expansion process is distinct from the execution of the program, transformers cannot reference program variables. On the other hand, the previous example is not an error because definitions for keywords in the program do exist at expansion time.

Note: It has been suggested that #'<datum> and #`<datum> would be felicitous abbreviations for (syntax <datum>) and (quasisyntax <datum>), respectively, where quasisyntax, which is not described in this appendix, would bear the same relationship to syntax that quasiquote bears to quote.

procedure: identifier? syntax-object

Returns #t if syntax-object represents an identifier, otherwise returns #f.

(identifier? (syntax x))    => #t
(identifier? (quote x))     => #f
(identifier? 3)             => #f

procedure: unwrap-syntax syntax-object

If syntax-object is an identifier, then it is returned unchanged. Otherwise unwrap-syntax converts the outermost structure of syntax-object into a data object whose external representation is the same as that of syntax-object. The result is either an identifier, a pair whose car and cdr are syntax objects, a vector whose elements are syntax objects, an empty list, a string, a boolean, a character, or a number.

(identifier? (unwrap-syntax (syntax x)))
                            => #t
(identifier? (car (unwrap-syntax (syntax (x)))))
                            => #t
(unwrap-syntax (cdr (unwrap-syntax (syntax (x)))))
                            => ()

procedure: free-identifier=? id1 id2

Returns #t if the original occurrences of id1 and id2 have the same binding, otherwise returns #f. free-identifier=? is used to look for a literal identifier in the argument to a transformer, such as else in a cond clause. A macro definition for syntax-rules would use free-identifier=? to look for literals in the input.

(free-identifier=? (syntax x) (syntax x))
                            => #t
(free-identifier=? (syntax x) (syntax y))
                            => r#f
(let ((x (syntax x)))
  (free-identifier=? x (syntax x)))
                            => #f
(let-syntax
  ((alpha
    (lambda (x)
      (free-identifier=? (car (unwrap-syntax x))
                         (syntax alpha)))))
  (alpha))                  => #f
(letrec-syntax
  ((alpha
    (lambda (x)
      (free-identifier=? (car (unwrap-syntax x))
                         (syntax alpha)))))
  (alpha))                  => #t

procedure: bound-identifier=? id1 id2

Returns #t if a binding for one of the two identifiers id1 and id2 would shadow free references to the other, otherwise returns #f. Two identifiers can be free-identifier=? without being bound-identifier=? if they were introduced at different stages in the expansion process. Bound-identifier=? can be used, for example, to detect duplicate identifiers in bound-variable lists. A macro definition of syntax-rules would use bound-identifier=? to look for pattern variables from the input pattern in the output template.

(bound-identifier=? (syntax x) (syntax x))
                            => #t
(letrec-syntax
  ((alpha
    (lambda (x)
      (bound-identifier=? (car (unwrap-syntax x))
                          (syntax alpha)))))
  (alpha))                  => #f

procedure: identifier->symbol id

Returns a symbol representing the original name of id. Identifier->symbol is used to examine identifiers that appear in literal contexts, i.e., identifiers that will appear in quoted structures.

(symbol? (identifier->symbol (syntax x)))
                            => #t
(identifier->symbol (syntax x))
                            => x

procedure: generate-identifier

procedure: generate-identifier symbol

Returns a new identifier. The optional argument to generate-identifier specifies the symbolic name of the resulting identifier. If no argument is supplied the name is unspecified.

Generate-identifier is used to introduce bound identifiers into the output of a transformer. Since introduced bound identifiers are automatically renamed, generate-identifier is necessary only for distinguishing introduced identifiers when an indefinite number of them must be generated by a macro.

The optional argument to generate-identifier specifies the symbolic name of the resulting identifier. If no argument is supplied the name is unspecified. The procedure identifier->symbol reveals the symbolic name of an identifier.

(identifier->symbol (generate-identifier 'x))
                            => x
(bound-identifier=? (generate-identifier 'x)
                    (generate-identifier 'x))
                            => #f
(define-syntax set*!
  ; (set*! (<identifier> <expression>) ...)
  (lambda (x)
    (letrec
      ((unwrap-exp
        (lambda (x)
          (let ((x (unwrap-syntax x)))
            (if (pair? x)
                (cons (car x)
                      (unwrap-exp (cdr x)))
                x)))))
      (let ((sets (map unwrap-exp
                       (cdr (unwrap-exp x)))))
        (let ((ids (map car sets))
              (vals (map cadr sets))
              (temps (map (lambda (x)
                            (generate-identifier))
                          sets)))
          `(,(syntax let) ,(map list temps vals)
            ,@(map (lambda (id temp)
                     `(,(syntax set!) ,id ,temp))
                   ids
                   temps)
            #f))))))

procedure: construct-identifier id symbol

Creates and returns an identifier named by symbol that behaves as if it had been introduced where the identifier id was introduced.

Construct-identifier is used to circumvent hygiene by creating an identifier that behaves as though it had been implicitly present in some expression. For example, the transformer for a structure definition macro might construct the name of a field accessor that does not explicitly appear in a use of the macro, but can be constructed from the names of the structure and the field. If a binding for the field accessor were introduced by a hygienic transformer, then it would be renamed automatically, so that the introduced binding would fail to capture any references to the field accessor that were present in the input and were intended to be within the scope of the introduced binding.

Another example is a macro that implicitly binds exit:

(define-syntax loop-until-exit
  (lambda (x)
    (let ((exit (construct-identifier
                 (car (unwrap-syntax x))
                 'exit))
          (body (car (unwrap-syntax
                      (cdr (unwrap-syntax x))))))
      `(,(syntax call-with-current-continuation)
        (,(syntax lambda)
         (,exit)
         (,(syntax letrec)
          ((,(syntax loop)
            (,(syntax lambda) ()
               ,body
               (,(syntax loop)))))
          (,(syntax loop))))))))
(let ((x 0) (y 1000))
  (loop-until-exit
   (if (positive? y)
       (begin (set! x (+ x 3))
              (set! y (- y 1)))
       (exit x))))          => 3000

Acknowledgements

The extension described in this appendix is the most sophisticated macro facility that has ever been proposed for a block-structured programming language. The main ideas come from Eugene Kohlbecker's PhD thesis on hygienic macro expansion [KOHLBECKER86], written under the direction of Dan Friedman [HYGIENIC], and from the work by Alan Bawden and Jonathan Rees on syntactic closures [BAWDEN88]. Pattern-directed macro facilities were popularized by Kent Dybvig's non-hygienic implementation of extend-syntax [DYBVIG87].

At the 1988 meeting of this report's authors at Snowbird, a macro committee consisting of Bawden, Rees, Dybvig, and Bob Hieb was charged with developing a hygienic macro facility akin to extend-syntax but based on syntactic closures. Chris Hanson implemented a prototype and wrote a paper on his experience, pointing out that an implementation based on syntactic closures must determine the syntactic roles of some identifiers before macro expansion based on textual pattern matching can make those roles apparent. William Clinger observed that Kohlbecker's algorithm amounts to a technique for delaying this determination, and proposed a more efficient version of Kohlbecker's algorithm. Pavel Curtis spoke up for referentially transparent local macros. Rees merged syntactic environments with the modified Kohlbecker's algorithm and implemented it all, twice [MACROSTHATWORK].

Dybvig and Hieb designed and implemented the low-level macro facility described above. Recently Hanson and Bawden have extended syntactic closures to obtain an alternative low-level macro facility. The macro committee has not endorsed any particular low-level facility, but does endorse the general concept of a low-level facility that is compatible with the high-level pattern language described in this appendix.

Several other people have contributed by working on macros over the years. Hal Abelson contributed by holding this report hostage to the appendix on macros.


Go to the first, previous, next, last section, table of contents.