CS 340 - Assignment 4

Due: Monday, October 31st by 11:59 PM

Updated: Monday, October 29th (link to fixed parserTest.bin file, extend due date)

Recursive Descent Parsing

In this assignment you will implement a parser for the YCPL programming language.

Given a grammar and an input string, parsing is the problem of constructing the syntax tree of a derivation of the input string.

Recursive descent parsing is a technique where each nonterminal symbol in the grammar is implemented by a function (method) in the parser.  The job of a nonterminal's function is to apply one of the productions having that nonterminal on the left-hand side, consuming 0 or more input symbols.  We call this recursive descent because the right-hand side of the applied production may have further nonterminal symbols which require (possibly recursive) function calls to resolve.

We examined recursive descent parsing in Lecture 8.  ExpressionParserSolution.zip contains example code that implements a recursive descent parser for an expression grammar.  (The parser you construct in this assignment will be similar.)

YCPL Grammar

Here is a grammar for the YCPL language:
translation_unit :=
    epsilon
  | statement translation_unit

statement :=
    expression SEMICOLON

expression :=
    INTEGER_LITERAL
  | IDENTIFIER
  | IDENTIFIER ASSIGNMENT expression
  | IDENTIFIER LEFT_PARENTHESIS opt_arg_list RIGHT_PARENTHESIS
  | KEYWORD("func") LEFT_PARENTHESIS opt_param_list RIGHT_PARENTHESIS
        LEFT_BRACE statement RIGHT_BRACE
  | KEYWORD("if") LEFT_PARENTHESIS expression RIGHT_PARENTHESIS
        KEYWORD("then") expression
        KEYWORD("else") expression

opt_arg_list :=
    epsilon
  | arg_list

arg_list :=
    expression
  | expression COMMA arg_list

opt_param_list :=
    epsilon
  | param_list

param_list :=
    IDENTIFIER
  | IDENTIFIER COMMA param_list

This grammar is specified in a slightly different form than the context-free grammars we have looked at:

  1. Nonterminal symbols are lower case words: e.g., statement

  2. Each terminal in the input string is a token (a sequence of characters treated as a lexical unit), not just a single character.

  3. Terminal symbols are in all upper case: e.g., INTEGER_LITERAL.  The terminal symbols correspond to the members of the TokenType enumeration in Assignment 2.

  4. Some of the terminal symbols are annotated with double-quoted strings, e.g., KEYWORD("if").  This indicates a keyword token where the lexeme (the text appearing in the actual YCPL program) is "if".  In other words, the parser should not allow a keyword token with lexeme "func" if it is expanding a production that requires a keyword token with the lexeme "if".
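
The keyword rule in item 4 can be sketched as a small check that compares both the token type and the lexeme.  This is only an illustration: the Token and TokenType classes below are hypothetical stand-ins (the real ones come from Assignment 2), and the isKeyword helper is an assumed name, not part of the provided code.

```java
class KeywordCheckSketch {
    // Hypothetical stand-ins for the Assignment 2 TokenType and Token classes.
    enum TokenType { KEYWORD, IDENTIFIER, LEFT_PARENTHESIS }

    static final class Token {
        final TokenType type;
        final String lexeme;

        Token(TokenType type, String lexeme) {
            this.type = type;
            this.lexeme = lexeme;
        }
    }

    // True only if the token is a KEYWORD whose lexeme is exactly the required text.
    static boolean isKeyword(Token token, String requiredLexeme) {
        return token != null
                && token.type == TokenType.KEYWORD
                && token.lexeme.equals(requiredLexeme);
    }

    public static void main(String[] args) {
        Token func = new Token(TokenType.KEYWORD, "func");
        Token identifier = new Token(TokenType.IDENTIFIER, "if");

        System.out.println(isKeyword(func, "func"));     // a "func" keyword is accepted where "func" is required
        System.out.println(isKeyword(func, "if"));       // a "func" keyword is rejected where "if" is required
        System.out.println(isKeyword(identifier, "if")); // an identifier is never a keyword
    }
}
```

A parsing method that requires KEYWORD("if") would run a check like this on the next token before consuming it, and report a syntax error when the check fails.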

Getting Started

The assignment is designed to be completed within the Eclipse Java IDE.

Download CS340_Assign4.zip.  Within Eclipse, choose File->Import...->Existing Projects into Workspace.  Click Select archive file, Browse..., choose CS340_Assign4.zip, and click Finish.  You should see a new project called ycpl in the Package Explorer.

You will need to modify LexerImpl.java so that it contains the changes you made as part of Assignment 2.

Your Task

Your task is to implement the parseStatement method of the ParserImpl class.  (This will involve implementing parsing methods for other nonterminals.)

Each parsing method should construct a syntax tree (using instances of the ParseNode class; see below) that exactly matches the productions of the YCPL grammar used in the derivation of the input program.  The grammar is structured so that there is exactly one leftmost derivation of any legal input program, so you don't need to worry about ambiguity.

The Lexer interface

The parser uses a Lexer object to handle the job of reading tokens from the input YCPL program.

You will use three Lexer methods:

hasMoreTokens() - returns true if there are more input tokens to read, false if the lexer has reached the end of input

peek() - returns the next input token without actually consuming it

get() - returns the next input token, and consumes it
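
To make the difference between peek() and get() concrete, here is a minimal sketch.  The ListLexer class is a hypothetical stand-in that serves plain strings from a list; the project's real Lexer returns Token objects and reads from a YCPL program, so treat this only as a model of the three methods' behavior.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class LexerUsageSketch {
    // Hypothetical stand-in for the project's Lexer; tokens are plain strings here.
    static final class ListLexer {
        private final Deque<String> tokens;

        ListLexer(List<String> tokens) {
            this.tokens = new ArrayDeque<>(tokens);
        }

        boolean hasMoreTokens() { return !tokens.isEmpty(); }

        // Look at the next token without consuming it.
        String peek() { return tokens.peekFirst(); }

        // Return the next token and consume it.
        String get() { return tokens.removeFirst(); }
    }

    // Consumes every remaining token and returns them in order.
    static List<String> readAll(ListLexer lexer) {
        List<String> lexemes = new ArrayList<>();
        while (lexer.hasMoreTokens()) {
            lexemes.add(lexer.get());
        }
        return lexemes;
    }

    public static void main(String[] args) {
        ListLexer lexer = new ListLexer(Arrays.asList("42", ";"));
        System.out.println(lexer.peek());          // 42 (not consumed)
        System.out.println(lexer.peek());          // 42 again, peek does not advance
        System.out.println(lexer.get());           // 42 (now consumed)
        System.out.println(readAll(lexer));        // [;]
        System.out.println(lexer.hasMoreTokens()); // false
    }
}
```

Because peek() does not advance the input, a parsing method can call it as often as needed to decide which production applies, then call get() only for the tokens that production actually consumes.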

ParseNode, parsing methods

The ParseNode abstract base class represents one node in the syntax tree.  It has two subclasses, NonterminalParseNode and TerminalParseNode.

The ParseNodeType enumerated type defines the possible kinds of parse nodes.  The members of the enumeration correspond to token types (types of terminal symbols) and types of nonterminal symbols.

The NonterminalParseNode class represents a single nonterminal symbol.  It may have 0 or more children, which can be either NonterminalParseNodes or TerminalParseNodes.

The TerminalParseNode class represents a single terminal symbol.  It contains a reference to a single Token, since input tokens constitute the terminal symbols of the YCPL language.

In general, each parser method you write will expand a single nonterminal symbol, and will operate as follows:

  1. Create a new instance of NonterminalParseNode

  2. Based on the sequence of tokens generated by the lexer, read terminal symbols and expand nonterminal symbols.  In some cases, you will need to decide, based on the sequence of tokens, which of several possible productions to apply.  Attach the parse node generated for each terminal and nonterminal symbol to the nonterminal parse node generated in step 1.

  3. Return a reference to the parse node created in step 1.

See the parseTranslationUnit method for an example.
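
The three steps above can be sketched for the production statement := expression SEMICOLON.  Everything in this sketch is an assumption made for illustration: Node is a simplified stand-in for NonterminalParseNode and TerminalParseNode, a queue of strings stands in for the Lexer, IllegalStateException stands in for SyntaxException, and only the INTEGER_LITERAL production of expression is handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class ParseMethodSketch {
    // Hypothetical stand-in for the ParseNode classes: a label plus children.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        // Compact rendering: a leaf prints its label, an inner node prints LABEL(child child ...).
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder(label).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final Deque<String> tokens; // stand-in for the Lexer

    ParseMethodSketch(String... tokens) {
        this.tokens = new ArrayDeque<>(Arrays.asList(tokens));
    }

    // statement := expression SEMICOLON
    Node parseStatement() {
        Node statement = new Node("STATEMENT");     // step 1: create the nonterminal node
        statement.children.add(parseExpression());  // step 2: expand a nonterminal symbol
        statement.children.add(expect(";"));        // step 2: read a terminal symbol
        return statement;                           // step 3: return the node from step 1
    }

    // expression := INTEGER_LITERAL (the only production this sketch handles)
    Node parseExpression() {
        Node expression = new Node("EXPRESSION");
        String next = tokens.peekFirst();
        if (next == null || !next.matches("[0-9]+")) {
            throw new IllegalStateException("expected an integer literal");
        }
        expression.children.add(new Node(tokens.removeFirst()));
        return expression;
    }

    // Consumes the next token if it has the required lexeme, otherwise reports an error.
    Node expect(String lexeme) {
        if (!lexeme.equals(tokens.peekFirst())) {
            throw new IllegalStateException("expected " + lexeme);
        }
        return new Node(tokens.removeFirst());
    }

    public static void main(String[] args) {
        System.out.println(new ParseMethodSketch("42", ";").parseStatement());
        // prints STATEMENT(EXPRESSION(42) ;)
    }
}
```

Note that the error check in expect is what keeps the SEMICOLON child from being silently omitted when the input is malformed.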

Hints

In general, if you are expanding a nonterminal, and either there are no more tokens to read, or the next token is not a token that can legally start the right-hand side of a non-epsilon production, then choose the epsilon production.

When expanding the opt_arg_list or opt_param_list nonterminals, and you need to choose which production to use, if there are no more tokens or the next token is a RIGHT_PARENTHESIS, choose the epsilon production.
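
As a rough illustration of this hint, the sketch below chooses between the two opt_param_list productions by peeking at the next token.  It is a simplified model under stated assumptions: tokens are plain strings in a queue rather than Token objects from the Lexer, the result is a string rendering rather than a ParseNode tree, and IllegalStateException stands in for SyntaxException.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class OptListSketch {
    final Deque<String> tokens; // stand-in for the Lexer

    OptListSketch(String... tokens) {
        this.tokens = new ArrayDeque<>(Arrays.asList(tokens));
    }

    // opt_param_list := epsilon | param_list
    String parseOptParamList() {
        String next = tokens.peekFirst();
        if (next == null || next.equals(")")) {
            // Epsilon production: a node with no children, and nothing is consumed.
            return "OPT_PARAM_LIST()";
        }
        return "OPT_PARAM_LIST(" + parseParamList() + ")";
    }

    // param_list := IDENTIFIER | IDENTIFIER COMMA param_list
    String parseParamList() {
        String id = tokens.pollFirst();
        if (id == null || !id.matches("[A-Za-z_]\\w*")) {
            throw new IllegalStateException("expected an identifier");
        }
        if (",".equals(tokens.peekFirst())) {
            tokens.removeFirst(); // consume the COMMA
            return "PARAM_LIST(" + id + " , " + parseParamList() + ")";
        }
        return "PARAM_LIST(" + id + ")";
    }

    public static void main(String[] args) {
        System.out.println(new OptListSketch(")").parseOptParamList());
        // prints OPT_PARAM_LIST()
        System.out.println(new OptListSketch("a", ",", "b", ")").parseOptParamList());
        // prints OPT_PARAM_LIST(PARAM_LIST(a , PARAM_LIST(b)))
    }
}
```

In both cases the RIGHT_PARENTHESIS is left unconsumed, so the method expanding the enclosing production can read it next.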

Throw a SyntaxException if you detect a token that is not legal for the production you are applying.  The first argument to the constructor is a message describing the error.  The second argument is a reference to the Lexer object the ParserImpl object is using.

Testing

Updated Oct 29th:

There is a bug in the tests distributed with the original CS340_Assign4.zip.

To fix the bug, download the updated parserTest.bin and copy it into the junit/edu/ycp/cs340/ycpl folder, then right-click on the ycpl project and choose the Refresh menu item.

To test your parser implementation, run the unit tests by right-clicking ParserTest.java and choosing Run As->JUnit Test.  The unit tests work by invoking your parser to parse YCPL statements and then checking to see if the correct syntax tree is generated.

When a unit test fails, the output in the Console window will show the expected syntax tree and the actual syntax tree generated by your parser implementation.  For example, when parsing the YCPL statement

42;

you might see the following output:

Expected syntax tree differs from actual syntax tree:

Expected:

STATEMENT
  EXPRESSION
    INTEGER_LITERAL("42")
  SEMICOLON(";")

Actual:

STATEMENT
  EXPRESSION
    INTEGER_LITERAL("42")

In the case above, the actual syntax tree is incorrect because the STATEMENT node does not have the SEMICOLON terminal node as a child.

Submitting

When you are done:

Export your project to a zip file.  In Eclipse, right-click on the project ycpl in the Package Explorer, and choose Export...->Archive File.  Enter the name/path of the zip file you want to save your project in.  Click Finish.

Upload your saved zip file to the Marmoset server as Project 4.