Formal Syntax and Semantics

Go to the first, previous, next, last section, table of contents.

Formal syntax and semantics

This chapter provides formal descriptions of what has already been described informally in previous chapters of this report.

Formal syntax

This section provides a formal syntax for Scheme written in an extended BNF. The syntax for the entire language, including features which are not essential, is given here.

All spaces in the grammar are for legibility. Case is insignificant; for example, #x1A and #X1a are equivalent. <empty> stands for the empty string.

The following extensions to BNF are used to make the description more concise: <thing>* means zero or more occurrences of <thing>; and <thing>+ means at least one <thing>.

Lexical structure

This section describes how individual tokens (identifiers, numbers, etc.) are formed from sequences of characters. The following sections describe how expressions and programs are formed from sequences of tokens.

<Intertoken space> may occur on either side of any token, but not within a token.

Tokens which require implicit termination (identifiers, numbers, characters, and dot) may be terminated by any <delimiter>, but not necessarily by anything else.

<token> ==> <identifier> | <boolean> | <number>
     | <character> | <string>
     | ( | ) | #( | ' | `{} | , | , | .
<delimiter> ==> <whitespace> | ( | ) | " | ;
<whitespace> ==> <space or newline>
<comment> ==> ; \= <all subsequent characters up to a line break>
<atmosphere> ==> <whitespace> | <comment>
<intertoken space> ==> <atmosphere>*

<identifier> ==> <initial> <subsequent>*
     | <peculiar identifier>
<initial> ==> <letter> | <special initial>
<letter> ==> a | b | c | ... | z
<special initial> ==> ! | \$ | \% | \verb"&" | * | / | : | < | =
     | > | ? | \verb" " | \verb"_" | \verb"^"
<subsequent> ==> <initial> | <digit>
     | <special subsequent>
<digit> ==> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<special subsequent> ==> . | + | -
<peculiar identifier> ==> + | - | ...
<syntactic keyword> ==> <expression keyword>
     | else | => | define
     | unquote | unquote-splicing
<expression keyword> ==> quote | lambda | if
     | set! | begin | cond | and | or | case
     | let | let* | letrec | do | delay
     | quasiquote

<boolean> ==> #t | #f
<character> ==> #\ <any character>
     | #\ <character name>
<character name> ==> space | newline
<string> ==> " <string element>* "
<string element> ==> <any character other than " or \>
     | \" | \\

<number> ==> <num 2> | <num 8>
     | <num 10> | <num 16>

The following rules for <num R>, <complex R>, <real R>, <ureal R>, <uinteger R>, and <prefix R> should be replicated for R = 2, 8, 10, and 16. There are no rules for <decimal 2>, <decimal 8>, and <decimal 16>, which means that numbers containing decimal points or exponents must be in decimal radix.

<num R> ==> <prefix R> <complex R>
<complex R> ==> <real R> | <real R>  <real R>
     | <real R> + <ureal R> i | <real R> - <ureal R> i
     | <real R> + i | <real R> - i
     | + <ureal R> i | - <ureal R> i | + i | - i
<real R> ==> <sign> <ureal R>
<ureal R> ==> <uinteger R>
     | <uinteger R> / <uinteger R>
     | <decimal R>
<decimal 10> ==> <uinteger 10> <suffix>
     | . <digit 10>+ #* <suffix>
     | <digit 10>+ . <digit 10>* #* <suffix>
     | <digit 10>+ #+ . #* <suffix>
<uinteger R> ==> <digit R>+ #*
<prefix R> ==> <radix R> <exactness>
     | <exactness> <radix R>

<suffix> ==> <empty>
     | <exponent marker> <sign> <digit 10>+
<exponent marker> ==> e | s | f | d | l
<sign> ==> <empty>  | + |  -
<exactness> ==> <empty> | #i | #e

<radix 2> ==> #b
<radix 8> ==> #o
<radix 10> ==> <empty> | #d
<radix 16> ==> #x
<digit 2> ==> 0 | 1
<digit 8> ==> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
<digit 10> ==> <digit>
<digit 16> ==> <digit 10> | a | b | c | d | e | f

External representation

<Datum> is what the read procedure (section Input) successfully parses. Note that any string that parses as an <expression> will also parse as a <datum>.

<datum> ==> <simple datum> | <compound datum>
<simple datum> ==> <boolean> | <number>
     | <character> | <string> |  <symbol>
<symbol> ==> <identifier>
<compound datum> ==> <list> | <vector>
<list> ==> (<datum>*) | (<datum>+ . <datum>)
     | <abbreviation>
<abbreviation> ==> <abbrev prefix> <datum>
<abbrev prefix> ==> ' | ` | , | ,@
<vector> ==> #(<datum>*)

Expressions

<expression> ==> <variable>
     | <literal>
     | <procedure call>
     | <lambda expression>
     | <conditional>
     | <assignment>
     | <derived expression>
<literal> ==> <quotation> | <self-evaluating>
<self-evaluating> ==> <boolean> | <number>
     | <character> | <string>
<quotation> ==> '<datum> | (quote <datum>)
<procedure call> ==> (<operator> <operand>*)
<operator> ==> <expression>
<operand> ==> <expression>
<lambda expression> ==> (lambda <formals> <body>)
<formals> ==> (<variable>*) | <variable>
     | (<variable>+ . <variable>)
<body> ==> <definition>* <sequence>
<sequence> ==> <command>* <expression>
<command> ==> <expression>
<conditional> ==> (if <test> <consequent> <alternate>)
<test> ==> <expression>
<consequent> ==> <expression>
<alternate> ==> <expression> | <empty>

<assignment> ==> (set! <variable> <expression>)

<derived expression> ==>
      (cond <cond clause>+)
     | (cond <cond clause>* (else <sequence>))
     | (case <expression>
        <case clause>+)
     | (case <expression>
        <case clause>*
        (else <sequence>))
     | (and <test>*)
     | (or <test>*)
     | (let (<binding spec>*) <body>)
     | (let <variable> (<binding spec>*) <body>)
     | (let* (<binding spec>*) <body>)
     | (letrec (<binding spec>*) <body>)
     | (begin <sequence>)
     | (do (<iteration spec>*)
          (<test> <sequence>)
        <command>*)
     | (delay <expression>)
     | <quasiquotation>
<cond clause> ==> (<test> <sequence>)
     | (<test>)
     | (<test> => <recipient>)
<recipient> ==> <expression>
<case clause> ==> ((<datum>*) <sequence>)

<binding spec> ==> (<variable> <expression>)
<iteration spec> ==> (<variable> <init> <step>)
     | (<variable> <init>)
<init> ==> <expression>
<step> ==> <expression>

Quasiquotations

The following grammar for quasiquote expressions is not context-free. It is presented as a recipe for generating an infinite number of production rules. Imagine a copy of the following rules for D = 1, 2, 3, .... D keeps track of the nesting depth.

<quasiquotation> ==> <quasiquotation 1>
<template 0> ==> <expression>
<quasiquotation D> ==> `<template D>
     | (quasiquote <template D>)
<template D> ==> <simple datum>
     | <list template D>
     | <vector template D>
     | <unquotation D>
<list template D> ==> (<template or splice D>*)
     | (<template or splice D>+ . <template D>)
     | '<template D>
     | <quasiquotation D+1>
<vector template D> ==> #(<template or splice D>*)
<unquotation D> ==> ,<template D-1>
     | (unquote <template D-1>)
<template or splice D> ==> <template D>
     | <splicing unquotation D>
<splicing unquotation D> ==> ,<template D-1>
     | (unquote-splicing <template D-1>)

In <quasiquotation>s, a <list template D> can sometimes be confused with either an <unquotation D> or a <splicing unquotation D>. The interpretation as an <unquotation> or <splicing unquotation D> takes precedence.

Programs and definitions

<program> ==> <command or definition>*
<command or definition> ==> <command> | <definition>
<definition> ==> (define <variable> <expression>)
     | (define (<variable> <def formals>) <body>)
     | (begin <definition>*)
<def formals> ==> <variable>*
     | <variable>+ . <variable>

Formal semantics

This section provides a formal denotational semantics for the primitive expressions of Scheme and selected built-in procedures. The concepts and notation used here are described in [STOY77].

Note: The formal semantics section was written in LaTeX which is incompatible with TeXinfo. See pages 34--36 of [R4RS], the original document from which this was derived.

Abstract syntax

Domain equations

Semantic functions

Auxiliary functions

derived expression types

This section gives rewrite rules for the derived expression types. By the application of these rules, any expression can be reduced to a semantically equivalent expression in which only the primitive expression types (literal, variable, call, lambda, if, set!) occur.

(cond (<test> <sequence>)
      <clause 2> ...)
  ==  (if <test>
          (begin <sequence>)
          (cond <clause 2> ...))
(cond (<test>)
      <clause 2> ...)
  ==  (or <test> (cond <clause 2> ...))
(cond (<test> => <recipient>)
      <clause 2> ...)
  ==  (let ((test-result <test>)
            (thunk2 (lambda () <recipient>))
            (thunk3 (lambda () (cond <clause 2> ...))))
        (if test-result
            ((thunk2) test-result)
            (thunk3)))
(cond (else <sequence>))
  ==  (begin <sequence>)
(cond)
  ==  <some expression returning an unspecified value>
(case <key>
  ((d1 ...) <sequence>)
  ...)
  ==  (let ((key <key>)
            (thunk1 (lambda () <sequence>))
            ...)
        (cond ((<memv> key '(d1 ...)) (thunk1))
               ...))
(case <key>
  ((d1 ...) <sequence>)
  ...
  (else f1 f2 ...))
  ==  (let ((key <key>)
            (thunk1 (lambda () <sequence>))
            ...
            (elsethunk (lambda () f1 f2 ...)))
        (cond ((<memv> key '(d1 ...)) (thunk1))
               ...
              (else (elsethunk))))

where <memv> is an expression evaluating to the memv procedure.

(and)              ==  #t
(and <test>)        ==  <test>
(and <test 1> <test 2> ...)
  ==  (let ((x <test 1>)
            (thunk (lambda () (and <test 2> ...))))
        (if x (thunk) x))
(or)               ==  #f
(or <test>)         ==  <test>
(or <test 1> <test 2> ...)
  ==  (let ((x <test 1>)
            (thunk (lambda () (or <test 2> ...))))
        (if x x (thunk)))
(let ((<variable 1> <init 1>) ...)
  <body>)
  ==  ((lambda (<variable 1> ...) <body>) <init 1> ...)
(let* () <body>)
  ==  ((lambda () <body>))
(let* ((<variable 1> <init 1>)
       (<variable 2> <init 2>)
       ...)
  <body>)
  ==  (let ((<variable 1> <init 1>))
        (let* ((<variable 2> <init 2>)
               ...)
          <body>))
(letrec ((<variable 1> <init 1>)
         ...)
  <body>)
  ==  (let ((<variable 1> <undefined>)
            ...)
         (let ((<temp 1> <init 1>)
               ...)
           (set! <variable 1> <temp 1>)
           ...)
         <body>)

where <temp 1>, <temp 2>, ... are variables, distinct from <variable 1>, ..., that do not free occur in the original <init> expressions, and <undefined> is an expression which returns something that when stored in a location makes it an error to try to obtain the value stored in the location. (No such expression is defined, but one is assumed to exist for the purposes of this rewrite rule.) The second let expression in the expansion is not strictly necessary, but it serves to preserve the property that the <init> expressions are evaluated in an arbitrary order.

(begin <sequence>)
  ==  ((lambda () <sequence>))

The following alternative expansion for begin does not make use of the ability to write more than one expression in the body of a lambda expression. In any case, note that these rules apply only if <sequence> contains no definitions.

(begin <expression>)  ==  <expression>
(begin <command> <sequence>)
  ==  ((lambda (ignore thunk) (thunk))
       <command>
       (lambda () (begin <sequence>)))
The following expansion for do is simplified by the assumption that no <step> is omitted. Any do expression in which a <step> is omitted can be replaced by an equivalent do expression in which the corresponding <variable> appears as the <step>.

(do ((<variable 1> <init 1> <step 1>)
     ...)
    (<test> <sequence>)
  <command 1> ...)
  ==  (letrec ((<loop>
                (lambda (<variable 1> ...)
                  (if <test>
                      (begin <sequence>)
                      (begin <command 1>
                             ...
                             (<loop> <step 1> ...))))))
        (<loop> <init 1> ...))

where <loop> is any variable which is distinct from <variable 1>, ..., and which does not occur free in the do expression.

(let <variable 0> ((<variable 1> <init 1>) ...)
  <body>)
  ==  ((letrec ((<variable 0> (lambda (<variable 1> ...)
                             <body>)))
          <variable 0>)
       <init 1> ...)

(delay <expression>) == (<make-promise> (lambda () <expression>))

where <make-promise> is an expression evaluating to some procedure which behaves appropriately with respect to the force procedure; see section Control features.


Go to the first, previous, next, last section, table of contents.