Assignment 4: Abstract Syntax Trees

Due: Tuesday, November 17th by 11:59 PM

Update, Oct 29th - link to Scheme lecture notes

Update, Nov 6th - additional methods must be added to the Token class, due date extended to Nov 17th

Update, Nov 10th - mention that quoted literals need not be handled yet

Getting Started

Copy the following files into the src/edu/ycp/cs340/scheme folder of your YCP_Scheme project in your Eclipse workspace:

ASTBuilder.java

ASTNode.java

ASTNodeType.java

Main.java

Symbol.java

TreeNode.java

TreePrinter.java

After you have copied these files, refresh the project by right-clicking on YCP_Scheme in the Package Explorer, and choosing Refresh.

Also, add the following methods to the Token class:

@Override
public Symbol getChild(int index) {
        throw new IndexOutOfBoundsException();
}

@Override
public int getNumChildren() {
        return 0;
}

@Override
public void addChild(Symbol child) {
        throw new UnsupportedOperationException("Can't add children to a Token");
}
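Symbol presumably declares these methods (hence the @Override annotations). Implementing them in Token means code can walk any parse tree through the same child-access calls, without treating leaves as a special case. The sketch below illustrates the idea. It uses hypothetical, simplified stand-ins (TreeSym, TreeLeaf, TreeInner) rather than the project's real Symbol, Token, and Nonterminal classes, whose full APIs are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Symbol interface: any parse-tree node can report its children.
interface TreeSym {
    TreeSym getChild(int index);
    int getNumChildren();
}

// Stand-in for Token: a leaf, implemented the same way as the methods above.
class TreeLeaf implements TreeSym {
    final String lexeme;
    TreeLeaf(String lexeme) { this.lexeme = lexeme; }
    @Override public TreeSym getChild(int index) { throw new IndexOutOfBoundsException(); }
    @Override public int getNumChildren() { return 0; }
}

// Stand-in for Nonterminal: an interior node holding a list of children.
class TreeInner implements TreeSym {
    private final List<TreeSym> children = new ArrayList<>();
    TreeInner(TreeSym... kids) {
        for (TreeSym k : kids) children.add(k);
    }
    @Override public TreeSym getChild(int index) { return children.get(index); }
    @Override public int getNumChildren() { return children.size(); }
}

class TreeWalk {
    // Counts every node in the tree. Leaves need no special case:
    // they report zero children, so the loop body never runs for them.
    static int countNodes(TreeSym node) {
        int total = 1; // this node
        for (int i = 0; i < node.getNumChildren(); i++) {
            total += countNodes(node.getChild(i));
        }
        return total;
    }
}
```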

Your Task

Your task is to complete the implementation of the ASTBuilder class, which translates parse trees of Scheme programs into Abstract Syntax Trees (ASTs). This class defines a method

public ASTNode build(Symbol parseTree)

which takes a node in a parse tree and returns an AST representing the subtree rooted at that node. Code to translate a PROGRAM parse tree node into a PROGRAM AST node is already provided.

An AST contains the same semantic information as a parse tree, but in a more concise form.

The ASTNode class is used to represent the nodes of the AST. It is very similar to the Symbol, Token, and Nonterminal interfaces/classes, which are used to represent nodes in the parse tree.

The ASTNodeType enumeration represents the various types of nodes in an AST. It defines the following values:

Value                 Meaning
PROGRAM               AST node representing the overall program
VAR_REF               AST node representing a variable reference (an expression which is an identifier)
LITERAL               AST node representing a literal integer, string, or boolean value
FUNCTION              AST node representing a function (lambda expression)
FORMAL_PARAM          AST node representing a formal parameter to a function
FUNCTION_APPLICATION  AST node representing a function application expression
DEFINITION            AST node representing a definition
IF                    AST node representing an if expression
LET                   AST node representing a let expression
LET_PAIR              AST node representing a let pair
AND                   AST node representing an and expression
OR                    AST node representing an or expression
NOT                   AST node representing a not expression

Note: you do not need to handle quoted literals, i.e., expressions of the form

( quote atom )

or

( quote list )

Handling quoted literals will be an extra-credit option in the next assignment.

Example

Consider the following Scheme program (consisting of a single function definition):

(define fact
  (lambda (n)
    (if (= n 1)
        1
        (* n (fact (- n 1))))))

The description of Assignment 3 showed a possible parse tree for this program, which is quite large and complicated.

Here is a textual representation of an AST for the same program:

PROGRAM
+--DEFINITION("fact")
   +--FUNCTION
      +--FORMAL_PARAM("n")
      +--IF
         +--FUNCTION_APPLICATION
         |  +--VAR_REF("=")
         |  +--VAR_REF("n")
         |  +--LITERAL(1)
         +--LITERAL(1)
         +--FUNCTION_APPLICATION
            +--VAR_REF("*")
            +--VAR_REF("n")
            +--FUNCTION_APPLICATION
               +--VAR_REF("fact")
               +--FUNCTION_APPLICATION
                  +--VAR_REF("-")
                  +--VAR_REF("n")
                  +--LITERAL(1)

Note how much less complicated the AST is compared to the parse tree.

Consider the following simple Scheme program, consisting of two definitions and a function application:

(define make-add (lambda (n) (lambda (x) (+ x n))))
(define add1 (make-add 1))
(add1 2)

This program could be represented by the following AST:

PROGRAM
+--DEFINITION("make-add")
|  +--FUNCTION
|     +--FORMAL_PARAM("n")
|     +--FUNCTION
|        +--FORMAL_PARAM("x")
|        +--FUNCTION_APPLICATION
|           +--VAR_REF("+")
|           +--VAR_REF("x")
|           +--VAR_REF("n")
+--DEFINITION("add1")
|  +--FUNCTION_APPLICATION
|     +--VAR_REF("make-add")
|     +--LITERAL(1)
+--FUNCTION_APPLICATION
   +--VAR_REF("add1")
   +--LITERAL(2)

Note that each top-level item is a direct child of the PROGRAM node in the AST.
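When checking an AST by eye, it can help to see how indented text like the listing above is produced. The sketch below builds the make-add AST by hand and renders it in the same +-- format. It uses a hypothetical AstNode stand-in with string type names, not the provided ASTNode, ASTNodeType, or TreePrinter classes, whose actual APIs may differ. String values are printed in double quotes; any other value is printed with toString().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for ASTNode: a type name, an optional value, and children.
class AstNode {
    final String type;
    final Object value; // null when the node carries no value
    final List<AstNode> children = new ArrayList<>();

    AstNode(String type, Object value, AstNode... kids) {
        this.type = type;
        this.value = value;
        for (AstNode k : kids) children.add(k);
    }
}

class AstRender {
    // Produces TYPE, TYPE("text") for String values, or TYPE(value) otherwise.
    static String label(AstNode n) {
        if (n.value == null) return n.type;
        if (n.value instanceof String) return n.type + "(\"" + n.value + "\")";
        return n.type + "(" + n.value + ")";
    }

    static String render(AstNode root) {
        StringBuilder sb = new StringBuilder(label(root)).append('\n');
        renderChildren(root, "", sb);
        return sb.toString();
    }

    // Descendants of a non-last child are indented with "|  " so the rail
    // continues down to the next sibling; descendants of the last child use blanks.
    private static void renderChildren(AstNode n, String indent, StringBuilder sb) {
        for (int i = 0; i < n.children.size(); i++) {
            AstNode c = n.children.get(i);
            boolean last = (i == n.children.size() - 1);
            sb.append(indent).append("+--").append(label(c)).append('\n');
            renderChildren(c, indent + (last ? "   " : "|  "), sb);
        }
    }
}

class MakeAddExample {
    // The AST for the make-add / add1 program above, built by hand.
    static AstNode build() {
        AstNode makeAdd = new AstNode("DEFINITION", "make-add",
            new AstNode("FUNCTION", null,
                new AstNode("FORMAL_PARAM", "n"),
                new AstNode("FUNCTION", null,
                    new AstNode("FORMAL_PARAM", "x"),
                    new AstNode("FUNCTION_APPLICATION", null,
                        new AstNode("VAR_REF", "+"),
                        new AstNode("VAR_REF", "x"),
                        new AstNode("VAR_REF", "n")))));
        AstNode add1 = new AstNode("DEFINITION", "add1",
            new AstNode("FUNCTION_APPLICATION", null,
                new AstNode("VAR_REF", "make-add"),
                new AstNode("LITERAL", 1)));
        AstNode call = new AstNode("FUNCTION_APPLICATION", null,
            new AstNode("VAR_REF", "add1"),
            new AstNode("LITERAL", 2));
        return new AstNode("PROGRAM", null, makeAdd, add1, call);
    }
}
```

Calling AstRender.render(MakeAddExample.build()) yields the same text as the make-add AST listing above, including the | rails beside the two DEFINITION subtrees and their absence under the final FUNCTION_APPLICATION.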

Hints

This section contains hints on how to construct each kind of AST node.

It will probably be helpful to review the Scheme lecture notes in order to understand the semantics of the various features of the Scheme language.

PROGRAM

PROGRAM AST nodes are constructed from parse tree nodes representing the overall program.

An AST representing each top-level item in the program should be added as a direct child of the PROGRAM AST node.

[Code is provided to generate PROGRAM AST nodes.]

VAR_REF

VAR_REF AST nodes are constructed from IDENTIFIER tokens in the parse tree that are used as expressions.

When you construct a VAR_REF AST node, you should call the setValue method on it, passing the String containing the identifier of the variable being referred to. (This is the lexeme of the IDENTIFIER token.)

To get the lexeme of a token, you will need to cast the Symbol to a Token. This cast is safe as long as the symbol type of the Symbol is IDENTIFIER.
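A minimal sketch of the VAR_REF case, assuming hypothetical stand-ins (Sym, Tok, Ast, and a Kind enum) in place of the real Symbol, Token, and ASTNode classes, whose method names may differ. It shows the order of operations described above: check the symbol type, cast to the token class, then store the lexeme with setValue.

```java
// Hypothetical, simplified stand-ins for Symbol, Token, and ASTNode.
enum Kind { IDENTIFIER, INTEGER_LITERAL, EXPRESSION }

interface Sym {
    Kind kind();
}

class Tok implements Sym {
    private final Kind kind;
    final String lexeme;
    Tok(Kind kind, String lexeme) { this.kind = kind; this.lexeme = lexeme; }
    public Kind kind() { return kind; }
}

class Ast {
    final String type;
    Object value;
    Ast(String type) { this.type = type; }
    void setValue(Object value) { this.value = value; }
}

class VarRefSketch {
    // Builds a VAR_REF node from an IDENTIFIER symbol. The kind check comes
    // first; in this sketch only Tok objects carry the IDENTIFIER kind, so the cast succeeds.
    static Ast buildVarRef(Sym sym) {
        if (sym.kind() != Kind.IDENTIFIER) {
            throw new IllegalArgumentException("not an identifier: " + sym.kind());
        }
        Tok tok = (Tok) sym;
        Ast node = new Ast("VAR_REF");
        node.setValue(tok.lexeme); // the variable name, e.g. "make-add"
        return node;
    }
}
```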

LITERAL

LITERAL AST nodes are constructed from INTEGER_LITERAL, STRING_LITERAL, and BOOLEAN_LITERAL tokens in the parse tree.

When you construct a LITERAL AST node, you should convert the lexeme of the token into an appropriate value. E.g., an INTEGER_LITERAL token's lexeme should be converted into an Integer object.

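The Java method Integer.parseInt(String) is useful for converting a string containing a sequence of digit characters into an int value, which Java automatically boxes into an Integer object where one is needed. (Integer.valueOf(String) returns an Integer directly.)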

You will need to convert the lexeme of a BOOLEAN_LITERAL token into one of two Java values, either Boolean.TRUE or Boolean.FALSE, depending on the lexeme of the token. (I.e., a token with the lexeme "#t" should be converted to Boolean.TRUE as its literal value.)
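The lexeme-to-value conversion can be isolated in a small helper. In this sketch, LitKind is a hypothetical stand-in for the project's token types. The STRING_LITERAL branch rests on an assumption this assignment does not state: it supposes the lexeme includes its surrounding double quotes, which are stripped, and it leaves any escape sequences untouched. Check what your lexer actually produces.

```java
// Hypothetical stand-in for the project's literal token types.
enum LitKind { INTEGER_LITERAL, STRING_LITERAL, BOOLEAN_LITERAL }

class LiteralSketch {
    // Converts a literal token's lexeme into the Java value stored in a LITERAL node.
    static Object literalValue(LitKind kind, String lexeme) {
        switch (kind) {
            case INTEGER_LITERAL:
                // parseInt returns an int, which is autoboxed to an Integer here.
                return Integer.parseInt(lexeme);
            case BOOLEAN_LITERAL:
                if (lexeme.equals("#t")) return Boolean.TRUE;
                if (lexeme.equals("#f")) return Boolean.FALSE;
                throw new IllegalArgumentException("bad boolean literal: " + lexeme);
            case STRING_LITERAL:
                // ASSUMPTION: the lexeme is the quoted text, e.g. "\"hi\"".
                if (lexeme.length() < 2 || !lexeme.startsWith("\"") || !lexeme.endsWith("\"")) {
                    throw new IllegalArgumentException("unquoted string literal: " + lexeme);
                }
                return lexeme.substring(1, lexeme.length() - 1);
            default:
                throw new IllegalArgumentException("not a literal: " + kind);
        }
    }
}
```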

FUNCTION, FORMAL_PARAM

FUNCTION AST nodes are constructed from parse nodes representing lambda expressions.

A FUNCTION AST node should have as its children

  • zero or more FORMAL_PARAM AST nodes, each representing one of the formal parameters (identifiers in the formals list) of the lambda expression
  • an AST node representing the expression which is the body of the lambda expression

Each FORMAL_PARAM AST node should contain, as its value (set with the setValue method), the lexeme of the IDENTIFIER token representing the formal parameter.
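Once the formals and body have been extracted from the lambda expression's parse tree, assembling the FUNCTION node is mechanical. This sketch assumes that extraction has already happened (the body AST would come from a recursive build call) and uses a hypothetical FnNode class in place of ASTNode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for ASTNode (type name, optional value, children).
class FnNode {
    final String type;
    final Object value;
    final List<FnNode> children = new ArrayList<>();
    FnNode(String type, Object value) { this.type = type; this.value = value; }
}

class FunctionSketch {
    // formals: lexemes of the IDENTIFIER tokens in the lambda's formals list, in order.
    // body: the AST already built for the lambda body.
    static FnNode makeFunction(List<String> formals, FnNode body) {
        FnNode fn = new FnNode("FUNCTION", null);
        for (String name : formals) {
            fn.children.add(new FnNode("FORMAL_PARAM", name)); // one node per parameter
        }
        fn.children.add(body); // the body always comes after the parameters
        return fn;
    }
}
```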

FUNCTION_APPLICATION

FUNCTION_APPLICATION AST nodes are constructed from parse nodes representing function applications.

Each sub-expression in the function application should be converted into an AST node, and added as a child of the FUNCTION_APPLICATION AST node.
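A FUNCTION_APPLICATION node simply collects the sub-expression ASTs in source order, operator first. The sketch below uses a hypothetical AppNode class rather than the real ASTNode, and builds (* n (- n 1)) to show that a nested application is just one more argument child.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for ASTNode (type name, optional value, children).
class AppNode {
    final String type;
    final Object value;
    final List<AppNode> children = new ArrayList<>();
    AppNode(String type, Object value) { this.type = type; this.value = value; }
}

class AppSketch {
    // subexprs: the ASTs of every sub-expression, operator first,
    // e.g. [VAR_REF("-"), VAR_REF("n"), LITERAL(1)] for (- n 1).
    static AppNode makeApplication(List<AppNode> subexprs) {
        if (subexprs.isEmpty()) {
            throw new IllegalArgumentException("an application needs at least an operator");
        }
        AppNode app = new AppNode("FUNCTION_APPLICATION", null);
        app.children.addAll(subexprs); // source order is preserved
        return app;
    }

    // Builds the AST for (* n (- n 1)); the inner application becomes the last argument.
    static AppNode example() {
        AppNode inner = makeApplication(List.of(
                new AppNode("VAR_REF", "-"),
                new AppNode("VAR_REF", "n"),
                new AppNode("LITERAL", 1)));
        return makeApplication(List.of(
                new AppNode("VAR_REF", "*"),
                new AppNode("VAR_REF", "n"),
                inner));
    }
}
```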

DEFINITION

DEFINITION AST nodes are constructed from parse nodes representing definitions.

The DEFINITION node should contain, as its value, the lexeme of the IDENTIFIER token appearing in the definition.

The DEFINITION AST node should have a single child, which is the AST constructed from the expression appearing in the definition.
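A DEFINITION node pairs a name (its value) with exactly one child. The sketch below uses a hypothetical DefNode class in place of ASTNode and builds the node for (define add1 (make-add 1)) from the earlier example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for ASTNode (type name, optional value, children).
class DefNode {
    final String type;
    final Object value;
    final List<DefNode> children = new ArrayList<>();
    DefNode(String type, Object value) { this.type = type; this.value = value; }
}

class DefSketch {
    // name: lexeme of the IDENTIFIER being defined; expr: AST of the defining expression.
    static DefNode makeDefinition(String name, DefNode expr) {
        DefNode def = new DefNode("DEFINITION", name);
        def.children.add(expr); // a DEFINITION always has exactly one child
        return def;
    }

    // Builds the AST for (define add1 (make-add 1)).
    static DefNode example() {
        DefNode app = new DefNode("FUNCTION_APPLICATION", null);
        app.children.add(new DefNode("VAR_REF", "make-add"));
        app.children.add(new DefNode("LITERAL", 1));
        return makeDefinition("add1", app);
    }
}
```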

IF, AND, OR, NOT

IF, AND, OR, and NOT AST nodes are constructed from if expressions, and expressions, or expressions, and not expressions, respectively.

Each sub-expression appearing in the expression should be converted into an AST node and added as a child.
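Since these four forms share the same rule, one helper can cover all of them. This is a sketch with hypothetical FormNode and FormKind types. The check that not has exactly one operand is an assumption based on standard Scheme, not something this assignment states; and and or are allowed any number of operands, including zero.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for the four forms; the real project uses ASTNodeType.
enum FormKind { IF, AND, OR, NOT }

// Hypothetical, simplified stand-in for ASTNode (type name, optional value, children).
class FormNode {
    final String type;
    final Object value;
    final List<FormNode> children = new ArrayList<>();
    FormNode(String type, Object value) { this.type = type; this.value = value; }
}

class FormSketch {
    // Each sub-expression's AST becomes a child, in source order.
    static FormNode makeForm(FormKind kind, List<FormNode> subexprs) {
        // ASSUMPTION: (not e) has exactly one operand, as in standard Scheme.
        if (kind == FormKind.NOT && subexprs.size() != 1) {
            throw new IllegalArgumentException("not expects exactly one operand");
        }
        FormNode node = new FormNode(kind.name(), null); // "IF", "AND", "OR", or "NOT"
        node.children.addAll(subexprs);
        return node;
    }
}
```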

LET, LET_PAIR

LET AST nodes are constructed from parse nodes representing let expressions.

The children of a LET AST node should be:

  • one LET_PAIR AST node representing each let pair in the list of let pairs in the let expression
  • a single AST node representing the expression (body) appearing in the let expression

Each LET_PAIR AST node should contain, as its value, the lexeme of the IDENTIFIER in the let pair. It should also have a single child AST node representing the expression in the let pair.
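A sketch of assembling a LET node, assuming each let pair has already been reduced to its identifier lexeme plus the AST of its expression. LetNode is a hypothetical stand-in for ASTNode, and Map.entry (Java 9 or later) is used only as a convenient pair type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for ASTNode (type name, optional value, children).
class LetNode {
    final String type;
    final Object value;
    final List<LetNode> children = new ArrayList<>();
    LetNode(String type, Object value) { this.type = type; this.value = value; }
}

class LetSketch {
    // pairs: (identifier lexeme, AST of its expression) for each let pair, in source order.
    // body: the AST for the let expression's body.
    static LetNode makeLet(List<Map.Entry<String, LetNode>> pairs, LetNode body) {
        LetNode let = new LetNode("LET", null);
        for (Map.Entry<String, LetNode> p : pairs) {
            LetNode pair = new LetNode("LET_PAIR", p.getKey()); // value: the bound name
            pair.children.add(p.getValue());                   // single child: its expression
            let.children.add(pair);
        }
        let.children.add(body); // the body comes after all the pairs
        return let;
    }
}
```

For (let ((x 1) (y 2)) (+ x y)), this yields a LET node with two LET_PAIR children followed by the FUNCTION_APPLICATION for (+ x y).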

Testing

The Main class has been updated to print a textual representation of the AST for each Scheme program read from the keyboard. You can type Control-D (Linux) or Control-Z (Windows) in the Eclipse Console window to signal that you are done typing the Scheme program.

Submitting

Export your completed Eclipse project to a zip file by right-clicking on the name of the project (YCP_Scheme) and choosing Export->General->Archive File.

Upload the zip file to the submission server as assign4. The URL of the server is

https://camel.ycp.edu:8443/

IMPORTANT: after uploading, you should download a copy of your submission and double-check it to make sure that it contains the correct files. You are responsible for making sure your submission is correct. You may receive a grade of 0 for an incorrectly submitted assignment.