Assignment 2: Recursive Descent Parsing

Updated: Wednesday, Oct 15th

Your implementation should print out the parse tree after an input file is successfully parsed; see example below.

Updated: Monday, Oct 20th

Added a file containing a sample source file. See section Testing below.

Due: Friday, Oct 24th by 11:59 PM

Your Task

Write a recursive descent parser for the following grammar:

translation_unit :=

          ε

        | statement translation_unit

statement :=

          expression SEMICOLON

expression :=


        | IDENTIFIER

        | IDENTIFIER ASSIGNMENT expression


          LEFT_BRACE statement RIGHT_BRACE

          KEYWORD("then") expression
          KEYWORD("else") expression

opt_arg_list :=

          ε

        | arg_list

arg_list :=

          expression

        | expression COMMA arg_list

opt_param_list :=

          ε

        | param_list

param_list :=

          IDENTIFIER

        | IDENTIFIER COMMA param_list

This grammar is specified in a slightly different form than the context-free grammars we have looked at:

  1. Nonterminal symbols are lower case words: e.g., statement
  2. Each terminal symbol in the input string is a token (sequence of characters treated as a lexical unit), not just a single character.
  3. Terminal symbols are in all upper case: e.g., INTEGER_LITERAL. The terminal symbols correspond to the token types described in Assignment 1.
  4. Some of the terminal symbols are annotated with double-quoted strings, e.g., KEYWORD("if"). This indicates a keyword token where the lexeme (the text appearing in the actual YCPL program) is "if". In other words, the parser should not allow a keyword token with lexeme "func" if it is expanding a production that requires a keyword token with the lexeme "if".
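
As a minimal sketch of point 4, a parser can check both the token type and, for keywords, the lexeme. The Token class, the TokenType constants, and the isKeyword helper below are hypothetical names chosen for illustration; substitute whatever your Assignment 1 lexer actually defines.

```java
// Hypothetical token types; use the names from your own Assignment 1 lexer.
enum TokenType { KEYWORD, IDENTIFIER, SEMICOLON }

// Hypothetical token class: a type plus the lexeme (the text from the source file).
final class Token {
    final TokenType type;
    final String lexeme;

    Token(TokenType type, String lexeme) {
        this.type = type;
        this.lexeme = lexeme;
    }
}

final class KeywordMatch {
    // True only if the token is a KEYWORD whose lexeme is exactly the required one.
    static boolean isKeyword(Token token, String requiredLexeme) {
        return token.type == TokenType.KEYWORD && token.lexeme.equals(requiredLexeme);
    }

    public static void main(String[] args) {
        Token funcToken = new Token(TokenType.KEYWORD, "func");
        System.out.println(isKeyword(funcToken, "func")); // true
        System.out.println(isKeyword(funcToken, "if"));   // false
    }
}
```

A production that requires KEYWORD("if") would call isKeyword(token, "if") and report a parse error when it returns false.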

As it parses the input source code, your parser should build a parse tree of the entire input. Your Parser class should have a main method that prompts the user for a filename, attempts to parse the code in the file, and, if the parse succeeds, prints out the parse tree.

Example: say that the file funcCall.ycpl contains the following code:

+(fact(123), 3);

When executed on funcCall.ycpl, the parser should produce something like the following output:

|  |  +--IDENTIFIER("+")
|  |  +--LPAREN("(")
|  |  +--OPT_ARG_LIST
|  |  |  +--ARG_LIST
|  |  |     +--EXPRESSION
|  |  |     |  +--IDENTIFIER("fact")
|  |  |     |  +--LPAREN("(")
|  |  |     |  +--OPT_ARG_LIST
|  |  |     |  |  +--ARG_LIST
|  |  |     |  |     +--EXPRESSION
|  |  |     |  |        +--INT_LITERAL("123")
|  |  |     |  +--RPAREN(")")
|  |  |     +--COMMA(",")
|  |  |     +--ARG_LIST
|  |  |        +--EXPRESSION
|  |  |           +--INT_LITERAL("3")
|  |  +--RPAREN(")")
|  +--SEMICOLON(";")

This is a textual representation of the parse tree.

Your output may vary: for example, you might have chosen different names for terminal and nonterminal symbols, or you might have structured your grammar differently (for example, by left-factoring it).

If the input program cannot be parsed, your program should print an error message explaining which token in the input caused the failure to parse, with its line number. Example: say that the file badInput.ycpl has the following input:

addone ::= func(n) {
  +(n, 1};

My parser implementation produced the following error message:

At line 2: Parse error: expected RPAREN, found RBRACE
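
One way to produce a message in this form is a helper that compares the token actually found against the token type the production requires. This is only a sketch: the Tok class, the TokType constants, the ParserException class, and the expect method are all hypothetical names, and it assumes your lexer records a line number in each token.

```java
// Hypothetical token types; use the names from your own lexer.
enum TokType { LPAREN, RPAREN, RBRACE }

// Hypothetical token class that remembers the line it came from.
final class Tok {
    final TokType type;
    final int line;

    Tok(TokType type, int line) {
        this.type = type;
        this.line = line;
    }
}

// Hypothetical exception used to signal a failed parse.
final class ParserException extends Exception {
    ParserException(String message) {
        super(message);
    }
}

final class Expect {
    // Return the token if it has the expected type; otherwise throw an
    // exception naming the line, the expected type, and the type found.
    static Tok expect(Tok actual, TokType expected) throws ParserException {
        if (actual.type != expected) {
            throw new ParserException("At line " + actual.line
                    + ": Parse error: expected " + expected
                    + ", found " + actual.type);
        }
        return actual;
    }

    public static void main(String[] args) {
        try {
            expect(new Tok(TokType.RBRACE, 2), TokType.RPAREN);
        } catch (ParserException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The main method of the parser can then catch the exception and print its message instead of the parse tree.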


General Hints

Your approach will be very similar to the parser you implemented in Lab 4. The lecture notes on recursive descent parsing should be useful.

You may wish to use the following classes to represent your parse trees:

Modify these classes as appropriate.

Note that in the ParseNodeType enumeration, the parse node types corresponding to terminal symbols must exactly match the names you defined for your TokenType enumeration when you implemented your lexical analyzer.
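
One reason matching names are useful, sketched under assumptions: a token type can be converted to the corresponding parse node type by name. The enum constants and the forToken helper below are hypothetical, and the provided classes may perform this conversion differently.

```java
// Hypothetical token types; substitute the constants from your own lexer.
enum TokenType { IDENTIFIER, INT_LITERAL, SEMICOLON }

// Hypothetical node types: nonterminals first, then one constant per token
// type with exactly the same name.
enum ParseNodeType {
    TRANSLATION_UNIT, STATEMENT, EXPRESSION,
    IDENTIFIER, INT_LITERAL, SEMICOLON
}

final class NodeTypes {
    // Look up the node type by name. valueOf throws IllegalArgumentException
    // if ParseNodeType has no constant with that exact name.
    static ParseNodeType forToken(TokenType tokenType) {
        return ParseNodeType.valueOf(tokenType.name());
    }

    public static void main(String[] args) {
        System.out.println(forToken(TokenType.INT_LITERAL)); // INT_LITERAL
    }
}
```

With this approach, a misspelled or missing constant in ParseNodeType shows up as an IllegalArgumentException the first time the parser sees that kind of token.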

Lookahead (peek)

Your lexer should support a peek operation that looks ahead to see what the next token in the input string is, without consuming it.

See the Lexer class in Lab 4.
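
If your lexer does not yet support peek, a one-token buffer is a common way to add it. The sketch below is not the Lab 4 Lexer: the PeekableTokens class is a hypothetical stand-in that reads tokens (represented as plain strings) from an Iterator, so that the buffering logic can be shown on its own.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical wrapper that adds peek() on top of a stream of tokens.
final class PeekableTokens {
    private final Iterator<String> source;
    private String buffered; // token read by peek() but not yet consumed; null if none

    PeekableTokens(Iterator<String> source) {
        this.source = source;
    }

    // Return the next token without consuming it, or null at end of input.
    String peek() {
        if (buffered == null && source.hasNext()) {
            buffered = source.next();
        }
        return buffered;
    }

    // Return the next token and consume it, or null at end of input.
    String next() {
        String token = peek();
        buffered = null;
        return token;
    }

    public static void main(String[] args) {
        PeekableTokens tokens = new PeekableTokens(Arrays.asList("x", ";").iterator());
        System.out.println(tokens.peek()); // x
        System.out.println(tokens.next()); // x
        System.out.println(tokens.next()); // ;
    }
}
```

Calling peek repeatedly returns the same token each time; only next advances the input.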

Argument lists and parameter lists

When expanding an opt_arg_list or opt_param_list nonterminal, peek ahead one token. If the next token in the input string is a right parenthesis, choose the epsilon production.

When expanding an arg_list or param_list nonterminal, the parser will need to decide, after parsing the initial expression or identifier, whether to continue to consume more expressions or identifiers. You can make this determination by peeking ahead one token. If the next token in the input string is a comma, then continue the list. Otherwise, complete the production.
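
The two peek decisions described above can be sketched for parameter lists as follows. To stay short, this sketch makes several simplifying assumptions: tokens are plain strings, the result is a flat list of names rather than parse tree nodes, and there is no error checking. The ParamListSketch class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParamListSketch {
    private final List<String> tokens;
    private int pos = 0;

    ParamListSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Look at the next token without consuming it; null at end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consume and return the next token (no end-of-input check in this sketch).
    private String next() {
        return tokens.get(pos++);
    }

    // opt_param_list: choose the epsilon production if a right parenthesis is next.
    List<String> parseOptParamList() {
        if (")".equals(peek())) {
            return new ArrayList<>(); // epsilon: consume nothing
        }
        return parseParamList();
    }

    // param_list: one identifier, then continue only if a comma is next.
    List<String> parseParamList() {
        List<String> names = new ArrayList<>();
        names.add(next()); // the initial identifier (not validated here)
        if (",".equals(peek())) {
            next(); // consume the comma
            names.addAll(parseParamList());
        }
        return names;
    }

    public static void main(String[] args) {
        ParamListSketch parser = new ParamListSketch(Arrays.asList("a", ",", "b", ")"));
        System.out.println(parser.parseOptParamList()); // [a, b]
    }
}
```

In a real parser each method would build and return a parse node, and the argument list methods would follow the same pattern with an expression in place of the identifier.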

The Parser class's main method

The main method of the Parser class should look something like this:

public static void main(String[] args) throws IOException {
        Scanner keyboard = new Scanner(System.in);

        // Ask the user which source file to parse
        System.out.print("Read which file? ");
        String fileName = keyboard.next();

        Lexer lexer = new Lexer(new BufferedReader(new FileReader(fileName)));
        Parser parser = new Parser(lexer);

        ParseNode translationUnit = parser.parseTranslationUnit();

        // Print the parse tree. The method name print is an assumption;
        // call whatever method your TreePrinter class actually provides.
        TreePrinter treePrinter = new TreePrinter();
        treePrinter.print(translationUnit);

        System.out.println("Successful parse!");
}

This code assumes that your lexical analyzer class is called Lexer, and can be instantiated by passing a reference to a BufferedReader object as the argument to its constructor. It also assumes that the Parser class's constructor takes a reference to a Lexer object as a parameter.

Printing the parse tree

The TreePrinter.java class can be used with the suggested ParseNode classes. You should be able to adapt this class fairly easily to work with your own parse tree classes.
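
If you write your own printer, the indentation style shown in the example output can be produced with a short recursive method. The following is a sketch, not the provided TreePrinter.java: the Node class and the render method are hypothetical, and node labels are plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node: a label plus an ordered list of children.
final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label) {
        this.label = label;
    }

    // Append a child and return this node so calls can be chained.
    Node add(Node child) {
        children.add(child);
        return this;
    }
}

final class TreeSketch {
    // Render the tree as text: the root label, then each descendant on its own line.
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        out.append(root.label).append('\n');
        renderChildren(root, "", out);
        return out.toString();
    }

    private static void renderChildren(Node node, String prefix, StringBuilder out) {
        for (int i = 0; i < node.children.size(); i++) {
            Node child = node.children.get(i);
            boolean last = (i == node.children.size() - 1);
            out.append(prefix).append("+--").append(child.label).append('\n');
            // Below a non-last child, draw a vertical bar to connect later siblings.
            renderChildren(child, prefix + (last ? "   " : "|  "), out);
        }
    }

    public static void main(String[] args) {
        Node statement = new Node("STATEMENT")
                .add(new Node("EXPRESSION").add(new Node("IDENTIFIER(\"x\")")))
                .add(new Node("SEMICOLON(\";\")"));
        System.out.print(render(statement));
    }
}
```

The prefix string grows by three characters per level, which is what keeps the vertical bars aligned under their parent's siblings.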


Testing

You can test your parser on the following source file:


It should parse successfully, and should result in the creation of a parse tree that contains eight top-level statements.


Submit a zip file containing your complete project (all source files, along with whatever other files are needed to compile them) to the submission server as assign2. The URL of the server is