Assignment 3: Abstract Syntax Trees

Due: Friday, November 7th by 11:59 PM

Abstract Syntax Trees

In Assignment 2, you implemented a recursive-descent parser for the YCPL language. The parse trees constructed by your parser precisely match the grammar rules used to derive the input strings accepted by the parser, and contain all of the nonterminal and terminal symbols involved in the derivation.

An interpreter or compiler for a language can generally use the parse tree as its representation of the program. However, the full parse tree often contains more information than is necessary, and its structure may not be particularly convenient. For these reasons, interpreters and compilers typically construct an abstract syntax tree (AST) as a more compact representation of the program. As the name suggests, an abstract syntax tree is a higher-level representation that simplifies the structure of the program and discards extraneous details.

Your Task

Your task is to modify your parser to construct abstract syntax trees representing expressions.

Because YCPL is a functional language, expressions are the most important construct in the language. Therefore, its ASTs are based on expressions.

Each type of expression in YCPL is embodied by a grammar rule. Your ASTs should support the following types of expressions:

Assignment expression

Grammar rule:

expression →
IDENTIFIER ASSIGNMENT expression

An assignment expression represents an assignment of a value to a variable. The identifier and the sub-expression should be attached to the assignment expression AST node.

Function call expression

Grammar rule:

expression →
IDENTIFIER LEFT_PARENTHESIS opt_arg_list RIGHT_PARENTHESIS

A function call expression represents a call to a function. The identifier (function name) and the list of argument expressions should be attached to the function call expression AST node.
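In the example parse tree later in this handout, the arguments of a call appear as a right-recursive ARG_LIST: an EXPRESSION, optionally followed by a COMMA and another ARG_LIST. The following is a minimal sketch of how such a nested list might be flattened into a simple list of argument expressions. The PNode and ArgListFlattener classes are hypothetical stand-ins invented for this illustration; your own parse node classes from Assignment 2 will have different names and methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parse node (not the Assignment 2 classes).
class PNode {
    final String symbol;          // e.g. "OPT_ARG_LIST", "ARG_LIST", "EXPRESSION", "COMMA"
    final List<PNode> children;

    PNode(String symbol, PNode... children) {
        this.symbol = symbol;
        this.children = Arrays.asList(children);
    }
}

class ArgListFlattener {
    // Collects the EXPRESSION nodes under an OPT_ARG_LIST, in source order.
    // Assumes OPT_ARG_LIST has either no children or a single ARG_LIST child,
    // and that ARG_LIST is EXPRESSION or EXPRESSION COMMA ARG_LIST.
    static List<PNode> flatten(PNode optArgList) {
        List<PNode> args = new ArrayList<PNode>();
        if (optArgList.children.isEmpty()) {
            return args;                          // call with no arguments
        }
        PNode argList = optArgList.children.get(0);
        while (true) {
            args.add(argList.children.get(0));    // the EXPRESSION
            if (argList.children.size() < 3) {
                break;                            // no COMMA ARG_LIST tail
            }
            argList = argList.children.get(2);    // skip the COMMA, follow the tail
        }
        return args;
    }

    public static void main(String[] args) {
        PNode n = new PNode("EXPRESSION", new PNode("IDENTIFIER"));
        PNode one = new PNode("EXPRESSION", new PNode("INT_LITERAL"));
        PNode tree = new PNode("OPT_ARG_LIST",
            new PNode("ARG_LIST", n, new PNode("COMMA"), new PNode("ARG_LIST", one)));
        System.out.println(flatten(tree).size()); // prints 2
    }
}
```

Each EXPRESSION collected this way would still need to be converted to its own AST node before being attached to the function call node.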

Function expression

Grammar rule:

expression →
KEYWORD("func") LEFT_PARENTHESIS opt_param_list RIGHT_PARENTHESIS LEFT_BRACE statement RIGHT_BRACE

A function expression represents a function. The list of parameters should be attached to the function expression AST node. Also, the expression that is a child of the statement should be attached. (This expression is the body of the function.)

If/then/else expression

Grammar rule:

expression →
KEYWORD("if") LEFT_PARENTHESIS expression RIGHT_PARENTHESIS KEYWORD("then") expression KEYWORD("else") expression

An if/then/else expression represents an if/then/else construct. The condition expression, the "then" expression, and the "else" expression should each be attached to the if/then/else expression AST node.

Integer literal expression

Grammar rule:

expression →
INTEGER_LITERAL

An integer literal expression represents a literal integer value. The value of the literal integer should be stored in the integer literal expression AST node.

Variable reference expression

Grammar rule:

expression →
IDENTIFIER

A variable reference expression represents a variable reference. The identifier of the referenced variable should be attached to the variable reference expression AST node.
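One possible way to model the six expression types above is a small class hierarchy with one class per expression type, each holding exactly the information the descriptions say should be attached. This is only a sketch: all of the class and field names here (Expr, AssignmentExpr, FuncCallExpr, and so on) are assumptions made for illustration, not the course's starter classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical base class: every AST node is an expression with zero or more child expressions.
abstract class Expr {
    abstract List<Expr> children();
}

class AssignmentExpr extends Expr {
    final String name;   // variable being assigned
    final Expr value;    // expression whose value is assigned
    AssignmentExpr(String name, Expr value) { this.name = name; this.value = value; }
    List<Expr> children() { return Collections.singletonList(value); }
}

class FuncCallExpr extends Expr {
    final String name;       // name of the called function
    final List<Expr> args;   // argument expressions, in order
    FuncCallExpr(String name, Expr... args) { this.name = name; this.args = Arrays.asList(args); }
    List<Expr> children() { return args; }
}

class FunctionExpr extends Expr {
    final List<String> params;  // parameter names
    final Expr body;            // expression taken from the function's statement
    FunctionExpr(List<String> params, Expr body) { this.params = params; this.body = body; }
    List<Expr> children() { return Collections.singletonList(body); }
}

class IfThenElseExpr extends Expr {
    final Expr condition, thenExpr, elseExpr;
    IfThenElseExpr(Expr condition, Expr thenExpr, Expr elseExpr) {
        this.condition = condition; this.thenExpr = thenExpr; this.elseExpr = elseExpr;
    }
    List<Expr> children() { return Arrays.asList(condition, thenExpr, elseExpr); }
}

class IntLiteralExpr extends Expr {
    final int value;
    IntLiteralExpr(int value) { this.value = value; }
    List<Expr> children() { return Collections.emptyList(); }
}

class VarRefExpr extends Expr {
    final String name;
    VarRefExpr(String name) { this.name = name; }
    List<Expr> children() { return Collections.emptyList(); }
}

class ExprDemo {
    public static void main(String[] args) {
        // Builds the AST for the call fact(-(n, 1)).
        Expr call = new FuncCallExpr("fact",
            new FuncCallExpr("-", new VarRefExpr("n"), new IntLiteralExpr(1)));
        System.out.println(call.children().size()); // prints 1
    }
}
```

A single generic node class with a type tag and a list of children would work as well; separate classes simply make it harder to build a malformed node.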

Printing the AST for each expression

A YCPL program is a sequence of 0 or more statements. Each statement consists of an expression, followed by a semicolon.

You should change your parser so that, for each statement in the program, you build the AST for the statement's expression, and print a tree representation of the expression.

For example, suppose the input program contains a single statement that assigns a function to a variable:

fact ::= func(n)
  {
    if (=(n, 1))
      then 1
      else *(n, fact(-(n, 1)));
  };

Here is the full parse tree for the assignment expression:

EXPRESSION
+--IDENTIFIER("fact")
+--ASSIGN("::=")
+--EXPRESSION
   +--FUNC("func")
   +--LPAREN("(")
   +--OPT_PARAM_LIST
   |  +--PARAM_LIST
   |     +--IDENTIFIER("n")
   +--RPAREN(")")
   +--LBRACE("{")
   +--STATEMENT
   |  +--EXPRESSION
   |  |  +--IF("if")
   |  |  +--LPAREN("(")
   |  |  +--EXPRESSION
   |  |  |  +--IDENTIFIER("=")
   |  |  |  +--LPAREN("(")
   |  |  |  +--OPT_ARG_LIST
   |  |  |  |  +--ARG_LIST
   |  |  |  |     +--EXPRESSION
   |  |  |  |     |  +--IDENTIFIER("n")
   |  |  |  |     +--COMMA(",")
   |  |  |  |     +--ARG_LIST
   |  |  |  |        +--EXPRESSION
   |  |  |  |           +--INT_LITERAL("1")
   |  |  |  +--RPAREN(")")
   |  |  +--RPAREN(")")
   |  |  +--THEN("then")
   |  |  +--EXPRESSION
   |  |  |  +--INT_LITERAL("1")
   |  |  +--ELSE("else")
   |  |  +--EXPRESSION
   |  |     +--IDENTIFIER("*")
   |  |     +--LPAREN("(")
   |  |     +--OPT_ARG_LIST
   |  |     |  +--ARG_LIST
   |  |     |     +--EXPRESSION
   |  |     |     |  +--IDENTIFIER("n")
   |  |     |     +--COMMA(",")
   |  |     |     +--ARG_LIST
   |  |     |        +--EXPRESSION
   |  |     |           +--IDENTIFIER("fact")
   |  |     |           +--LPAREN("(")
   |  |     |           +--OPT_ARG_LIST
   |  |     |           |  +--ARG_LIST
   |  |     |           |     +--EXPRESSION
   |  |     |           |        +--IDENTIFIER("-")
   |  |     |           |        +--LPAREN("(")
   |  |     |           |        +--OPT_ARG_LIST
   |  |     |           |        |  +--ARG_LIST
   |  |     |           |        |     +--EXPRESSION
   |  |     |           |        |     |  +--IDENTIFIER("n")
   |  |     |           |        |     +--COMMA(",")
   |  |     |           |        |     +--ARG_LIST
   |  |     |           |        |        +--EXPRESSION
   |  |     |           |        |           +--INT_LITERAL("1")
   |  |     |           |        +--RPAREN(")")
   |  |     |           +--RPAREN(")")
   |  |     +--RPAREN(")")
   |  +--SEMICOLON(";")
   +--RBRACE("}")

It's a bit complex, isn't it?

Here is the AST for the same assignment expression:

ASSIGNMENT_EXPR("fact")
+--FUNCTION_EXPR
   +--IF_THEN_ELSE_EXPR
      +--FUNC_CALL_EXPR("=")
      |  +--VAR_REF_EXPR("n")
      |  +--INT_LITERAL_EXPR(1)
      +--INT_LITERAL_EXPR(1)
      +--FUNC_CALL_EXPR("*")
         +--VAR_REF_EXPR("n")
         +--FUNC_CALL_EXPR("fact")
            +--FUNC_CALL_EXPR("-")
               +--VAR_REF_EXPR("n")
               +--INT_LITERAL_EXPR(1)

Notice that even though it is much simpler, the AST still contains the essential information about the assignment expression.
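The tree layout used above can be produced by a short recursive routine: each child is printed with a "+--" prefix, and everything beneath a child is indented with "|  " if more siblings follow it, or with three spaces if it is the last child. The following sketch shows one way to do this. The Node and TreeSketchPrinter classes are hypothetical names used only for this illustration; they are not the TreePrinter or ASTPrinter classes mentioned elsewhere in this handout.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical generic tree node used only for this sketch.
class Node {
    final String label;
    final List<Node> children;

    Node(String label, Node... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}

class TreeSketchPrinter {
    // Returns the tree as text, one node per line, each line ending in '\n'.
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        out.append(root.label).append('\n');
        renderChildren(root, "", out);
        return out.toString();
    }

    private static void renderChildren(Node node, String indent, StringBuilder out) {
        for (int i = 0; i < node.children.size(); i++) {
            Node child = node.children.get(i);
            boolean last = (i == node.children.size() - 1);
            out.append(indent).append("+--").append(child.label).append('\n');
            // Descendants of a non-last child keep a vertical bar so later siblings stay connected.
            renderChildren(child, indent + (last ? "   " : "|  "), out);
        }
    }

    public static void main(String[] args) {
        Node ast = new Node("FUNC_CALL_EXPR(\"-\")",
            new Node("VAR_REF_EXPR(\"n\")"),
            new Node("INT_LITERAL_EXPR(1)"));
        System.out.print(render(ast));
    }
}
```

Building the text in a StringBuilder rather than printing directly makes the output easy to compare against an expected string when testing.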

Hints

Starting code

If you choose to, you may use the following classes to implement your AST:

Converting each statement's expression to an AST

My implementation used the following code to take the parse tree for the entire program (translation unit), and then convert each statement's expression to an AST, printing each AST:

ParseNode translationUnit = parser.parseTranslationUnit();

NonterminalParseNode stmtList = (NonterminalParseNode) translationUnit;

// For each statement in the translation unit (list of statements)...
while (stmtList.getNumChildren() > 0) {
        // Get the statement
        NonterminalParseNode stmt = (NonterminalParseNode)stmtList.getChild(0);

        // Get the expression from the statement
        NonterminalParseNode expr = (NonterminalParseNode) stmt.getChild(0);

        System.out.println("Parse tree:");
        TreePrinter treePrinter = new TreePrinter();
        treePrinter.print(expr);

        // Convert the expression to an AST.
        ASTBuilder astBuilder = new ASTBuilder();
        ASTNode ast = astBuilder.buildAST(expr);

        // Print the AST
        System.out.println("AST:");
        ASTPrinter astPrinter = new ASTPrinter();
        astPrinter.print(ast);

        // Go to next statement (if any)
        stmtList = (NonterminalParseNode) stmtList.getChild(1);
}

The buildAST method of the ASTBuilder class is responsible for translating parse trees into ASTs. You will need to implement a similar class.
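As a rough illustration of what such a translation might look like, the sketch below decides which kind of AST node to build by inspecting the children of an EXPRESSION parse node. It covers only integer literals, variable references, and assignments; the remaining expression types are deliberately left out. The ParseStub, AstStub, and AstBuilderSketch classes are hypothetical stand-ins, and the symbol names (IDENTIFIER, ASSIGN, INT_LITERAL) are assumed to match those shown in the example parse tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parse node: a symbol name, a lexeme (null for nonterminals), and children.
class ParseStub {
    final String symbol;
    final String lexeme;
    final List<ParseStub> children;

    ParseStub(String symbol, String lexeme, ParseStub... children) {
        this.symbol = symbol;
        this.lexeme = lexeme;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical AST node: a kind, an attached value, and child expressions.
class AstStub {
    final String kind;
    final String value;
    final List<AstStub> children = new ArrayList<AstStub>();

    AstStub(String kind, String value) {
        this.kind = kind;
        this.value = value;
    }
}

class AstBuilderSketch {
    // Translates an EXPRESSION parse node into an AST node (partial: three cases only).
    static AstStub build(ParseStub expr) {
        ParseStub first = expr.children.get(0);
        int n = expr.children.size();
        if (first.symbol.equals("INT_LITERAL")) {
            return new AstStub("INT_LITERAL_EXPR", first.lexeme);
        }
        if (first.symbol.equals("IDENTIFIER") && n == 1) {
            return new AstStub("VAR_REF_EXPR", first.lexeme);
        }
        if (first.symbol.equals("IDENTIFIER") && n == 3
                && expr.children.get(1).symbol.equals("ASSIGN")) {
            AstStub assign = new AstStub("ASSIGNMENT_EXPR", first.lexeme);
            assign.children.add(build(expr.children.get(2)));  // recurse on the right-hand side
            return assign;
        }
        throw new UnsupportedOperationException("not covered by this sketch: " + first.symbol);
    }

    public static void main(String[] args) {
        // Parse tree for: x ::= 5
        ParseStub tree = new ParseStub("EXPRESSION", null,
            new ParseStub("IDENTIFIER", "x"),
            new ParseStub("ASSIGN", "::="),
            new ParseStub("EXPRESSION", null, new ParseStub("INT_LITERAL", "5")));
        AstStub ast = build(tree);
        System.out.println(ast.kind + " " + ast.value + " <- " + ast.children.get(0).kind);
    }
}
```

The tokens that carry no meaning of their own (here, the assignment operator) are simply not copied into the AST, which is where most of the simplification comes from.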

Submitting

Submit a zip file containing your complete project (all source files, along with whatever other files are needed to compile them) to the submission server as assign3. The URL of the server is

https://camel.ycp.edu:8443/