October 31st by 11:59 PM
Updated: Monday, October 29th (link to fixed parserTest.bin file; extended due date)
In this assignment you will implement a parser for the YCPL programming language.
Parsing is the problem of constructing, from a grammar and an input string, the syntax tree of a derivation of that input string.
Recursive descent parsing is a technique where each nonterminal symbol in the grammar is implemented by a function (method) in the parser. The job of a nonterminal's function is to apply one of the productions having that nonterminal on the left-hand side, consuming zero or more input symbols. We call this recursive descent because the right-hand side of the applied production may contain further nonterminal symbols, which require (possibly recursive) function calls to resolve.
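To make the one-method-per-nonterminal idea concrete, here is a minimal Java recognizer for a toy grammar that is not part of YCPL: s -> '(' s ')' s | epsilon, which matches balanced parentheses. The class name BalancedParens is hypothetical, and it works on characters rather than tokens to stay short. Notice how each occurrence of s on the right-hand side becomes a recursive call to parseS, and how the epsilon production is chosen when the next character cannot start the other production.

```java
/**
 * Hypothetical toy grammar (not part of YCPL):
 *   s -> '(' s ')' s
 *      | epsilon
 * One method per nonterminal; each s on the right-hand side becomes a recursive call.
 */
class BalancedParens {
    private final String input;
    private int pos = 0;

    private BalancedParens(String input) { this.input = input; }

    /** Returns true if the entire input can be derived from s. */
    static boolean accepts(String input) {
        BalancedParens p = new BalancedParens(input);
        try {
            p.parseS();
            return p.pos == input.length(); // all input must be consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // s -> '(' s ')' s | epsilon
    private void parseS() {
        if (pos < input.length() && input.charAt(pos) == '(') {
            pos++;        // consume '('
            parseS();     // recursive call for the inner s
            expect(')');
            parseS();     // recursive call for the trailing s
        }
        // otherwise apply the epsilon production: consume nothing
    }

    // Consume the expected character or fail.
    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalStateException("expected '" + c + "' at position " + pos);
        }
        pos++;
    }
}
```

For instance, BalancedParens.accepts("(()())") returns true, while "(()" and "())" are rejected.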
We examined recursive descent parsing in Lecture 8. ExpressionParserSolution.zip contains example code that implements a recursive descent parser for an expression grammar. (The parser you construct in this assignment will be similar.)
| statement translation_unit
| IDENTIFIER ASSIGNMENT expression
| IDENTIFIER LEFT_PARENTHESIS opt_arg_list RIGHT_PARENTHESIS
| KEYWORD("func") LEFT_PARENTHESIS opt_param_list RIGHT_PARENTHESIS
LEFT_BRACE statement RIGHT_BRACE
| KEYWORD("if") LEFT_PARENTHESIS expression RIGHT_PARENTHESIS
| expression COMMA arg_list
| IDENTIFIER COMMA param_list
This grammar is specified in a slightly different form than the context-free grammars we have looked at:
Nonterminal symbols are lowercase words, e.g., statement.
Each terminal symbol in the input string is a token (a sequence of characters treated as a lexical unit), not just a single character.
Terminal symbols are written in all uppercase, e.g., INTEGER_LITERAL. The terminal symbols correspond to the members of the TokenType enumeration in Assignment 2.
Some of the terminal symbols are annotated with double-quoted strings, e.g., KEYWORD("if"). This indicates a keyword token whose lexeme (the text appearing in the actual YCPL program) is "if". In other words, the parser should not accept a keyword token with lexeme "func" when it is expanding a production that requires a keyword token with the lexeme "if".
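A minimal sketch of the lexeme check described above, assuming a hypothetical Token with type and lexeme fields (the real Assignment 2 Token class may expose these through accessor methods instead). The point is that matching KEYWORD("if") requires both the token type and the lexeme to match:

```java
class KeywordCheck {
    // Hypothetical stand-ins for the Assignment 2 classes; the real Token API may differ.
    enum TokenType { KEYWORD, IDENTIFIER, LEFT_PARENTHESIS }

    static final class Token {
        final TokenType type;
        final String lexeme;
        Token(TokenType type, String lexeme) { this.type = type; this.lexeme = lexeme; }
    }

    /** True only if tok is a KEYWORD token whose lexeme is exactly the required text. */
    static boolean isKeyword(Token tok, String required) {
        return tok != null && tok.type == TokenType.KEYWORD && tok.lexeme.equals(required);
    }
}
```

With this check, a KEYWORD token with lexeme "func" does not satisfy KEYWORD("if"), and neither does a non-keyword token that happens to have the text "if".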
The assignment is designed to be completed within the Eclipse Java IDE.
Download CS340_Assign4.zip. Within Eclipse, choose File->Import...->Existing Projects into Workspace. Click Select archive file, click Browse..., choose CS340_Assign4.zip, and click Finish. You should see a new project called ycpl in the Package Explorer.
You will need to modify LexerImpl.java so that it contains the changes you made as part of Assignment 2.
Your task is to implement the parseStatement method of the ParserImpl class. (This will involve implementing parsing methods for other nonterminals.)
Each parsing method should construct a syntax tree (using instances of the ParseNode class; see below) that exactly matches the productions of the YCPL grammar used in the derivation of the input program. The grammar is structured so that any legal input program has exactly one leftmost derivation, so you don't need to worry about ambiguity.
The parser uses a Lexer object to handle the job of reading tokens from the input YCPL program.
You will use three Lexer methods:
hasMoreTokens() - returns true if there are more input tokens to read, false if the lexer has reached the end of input
peek() - returns the next input token without consuming it
get() - returns the next input token and consumes it
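The behavior of the three Lexer methods can be pictured with the following sketch. It assumes a hypothetical Lexer interface with these method names, a simplified Token type, and that peek() and get() return null at end of input (the provided Lexer may behave differently there). ListLexer is also hypothetical; a lexer over a fixed token list can be handy for experimenting with parse methods.

```java
import java.util.List;

class LexerSketch {
    // Hypothetical subset of token types and a simplified token.
    enum TokenType { IDENTIFIER, SEMICOLON }

    static final class Token {
        final TokenType type;
        final String lexeme;
        Token(TokenType type, String lexeme) { this.type = type; this.lexeme = lexeme; }
    }

    // The three methods named in the assignment; return types are assumptions.
    interface Lexer {
        boolean hasMoreTokens();
        Token peek();  // look at the next token without consuming it
        Token get();   // return the next token and consume it
    }

    /** A lexer over a pre-built token list; returns null once the list is exhausted. */
    static final class ListLexer implements Lexer {
        private final List<Token> tokens;
        private int pos = 0;

        ListLexer(List<Token> tokens) { this.tokens = tokens; }

        public boolean hasMoreTokens() { return pos < tokens.size(); }

        public Token peek() { return hasMoreTokens() ? tokens.get(pos) : null; }

        public Token get() {
            Token t = peek();
            if (t != null) pos++; // only advance when a token was actually returned
            return t;
        }
    }
}
```

Calling peek() repeatedly returns the same token; only get() advances to the next one.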
The ParseNode abstract base class represents one node in the syntax tree. It has two subclasses, NonterminalParseNode and TerminalParseNode.
The ParseNodeType enumerated type defines the possible kinds of parse nodes. The members of the enumeration correspond to token types (types of terminal symbols) and types of nonterminal symbols.
The NonterminalParseNode class represents a single nonterminal symbol. It may have 0 or more children, which can be either NonterminalParseNodes or TerminalParseNodes.
The TerminalParseNode class represents a single terminal symbol. It contains a reference to a single Token, since input tokens constitute the terminal symbols of the YCPL language.
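The relationship between the three classes can be sketched as follows. This is a simplified, hypothetical version: the enum lists only a few members, the terminal node stores a lexeme string instead of a Token, and render() is an invented debugging aid, not part of the provided ParseNode API.

```java
import java.util.ArrayList;
import java.util.List;

class ParseTreeSketch {
    // Hypothetical subset of ParseNodeType; the real enum covers every token and nonterminal type.
    enum ParseNodeType { STATEMENT, IDENTIFIER, SEMICOLON }

    abstract static class ParseNode {
        final ParseNodeType type;
        ParseNode(ParseNodeType type) { this.type = type; }
        abstract String render(); // s-expression style dump of the subtree
    }

    /** Interior node: one nonterminal, with zero or more children of either kind. */
    static final class NonterminalParseNode extends ParseNode {
        final List<ParseNode> children = new ArrayList<>();
        NonterminalParseNode(ParseNodeType type) { super(type); }
        void addChild(ParseNode child) { children.add(child); }

        String render() {
            StringBuilder sb = new StringBuilder("(" + type);
            for (ParseNode c : children) sb.append(' ').append(c.render());
            return sb.append(')').toString();
        }
    }

    /** Leaf node: one terminal symbol; the lexeme stands in for the Token reference. */
    static final class TerminalParseNode extends ParseNode {
        final String lexeme;
        TerminalParseNode(ParseNodeType type, String lexeme) { super(type); this.lexeme = lexeme; }
        String render() { return type + "[" + lexeme + "]"; }
    }
}
```

Building a STATEMENT node with an IDENTIFIER child "x" and a SEMICOLON child renders as (STATEMENT IDENTIFIER[x] SEMICOLON[;]), which shows the tree shape only, not a real YCPL production.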
In general, each parser method you write will expand a single nonterminal symbol, and will operate as follows:
1. Create a new instance of NonterminalParseNode.
2. Based on the sequence of tokens generated by the lexer, read terminal symbols and expand nonterminal symbols. In some cases, you will need to decide, based on the sequence of tokens, which of several possible productions to apply. Attach the parse node generated for each terminal and nonterminal symbol to the nonterminal parse node generated in step 1.
3. Return a reference to the parse node created in step 1.
See the parseTranslationUnit method for an example.
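As a further illustration of the three steps (this is not the provided parseTranslationUnit code), here is a hypothetical parse method for param_list. It assumes param_list has the productions param_list -> IDENTIFIER and param_list -> IDENTIFIER COMMA param_list; the second appears in the grammar excerpt, while the first is an assumption. It uses a simplified Node class in place of NonterminalParseNode and TerminalParseNode, and throws IllegalStateException where the real parser would throw SyntaxException. Because both productions begin with IDENTIFIER, the method reads the IDENTIFIER first and only then peeks for a COMMA to decide whether to recurse.

```java
import java.util.ArrayList;
import java.util.List;

class ParamListSketch {
    // Hypothetical stand-ins for TokenType and Token.
    enum TokenType { IDENTIFIER, COMMA }

    static final class Token {
        final TokenType type;
        final String lexeme;
        Token(TokenType type, String lexeme) { this.type = type; this.lexeme = lexeme; }
    }

    // Simplified parse node: a label plus children (terminals have no children).
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }

        @Override public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(" + label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    private final List<Token> tokens;
    private int pos = 0;

    ParamListSketch(List<Token> tokens) { this.tokens = tokens; }

    private boolean hasMoreTokens() { return pos < tokens.size(); }
    private Token peek() { return hasMoreTokens() ? tokens.get(pos) : null; }

    // Consume the next token as a terminal node, failing if it is missing or the wrong type.
    private Node expect(TokenType type) {
        Token t = peek();
        if (t == null || t.type != type) {
            throw new IllegalStateException("expected " + type + " but found "
                    + (t == null ? "end of input" : t.type));
        }
        pos++;
        return new Node(type + "[" + t.lexeme + "]");
    }

    Node parseParamList() {
        Node node = new Node("PARAM_LIST");               // step 1: create the nonterminal node
        node.children.add(expect(TokenType.IDENTIFIER));  // step 2: read terminals...
        if (hasMoreTokens() && peek().type == TokenType.COMMA) {
            node.children.add(expect(TokenType.COMMA));
            node.children.add(parseParamList());          // ...and expand nonterminals recursively
        }
        return node;                                      // step 3: return the node from step 1
    }
}
```

Parsing the tokens a , b produces (PARAM_LIST IDENTIFIER[a] COMMA[,] (PARAM_LIST IDENTIFIER[b])); a trailing comma with no identifier after it fails in the recursive call.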
In general, if you are expanding a nonterminal, and either there are no more tokens to read, or the next token is not a token that can legally start the right-hand side of a non-epsilon production, then choose the epsilon production.
When expanding the opt_arg_list or opt_param_list nonterminal, choose the epsilon production if there are no more tokens or if the next token is a RIGHT_PARENTHESIS.
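The decision rule for opt_arg_list and opt_param_list is small enough to isolate in a helper. In this sketch, OptListChoice and chooseEpsilon are hypothetical names, and TokenType is a cut-down stand-in for the Assignment 2 enum:

```java
class OptListChoice {
    // Hypothetical subset of the Assignment 2 TokenType enum.
    enum TokenType { IDENTIFIER, INTEGER_LITERAL, COMMA, RIGHT_PARENTHESIS }

    /**
     * @param hasMoreTokens the result of lexer.hasMoreTokens()
     * @param next          the type of lexer.peek(); ignored (may be null) when no tokens remain
     * @return true if the epsilon production should be applied
     */
    static boolean chooseEpsilon(boolean hasMoreTokens, TokenType next) {
        return !hasMoreTokens || next == TokenType.RIGHT_PARENTHESIS;
    }
}
```

Inside a parse method, if the helper returns true you would return the nonterminal node with no children; otherwise you would attach the node built by the arg_list or param_list method, which reports any illegal token itself.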
Throw a SyntaxException if you detect a token that is not legal for the production you are applying. The first argument to the constructor is a message describing the error. The second argument is a reference to the Lexer object the ParserImpl object is using.
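One way to centralize this check is an expect helper that each parse method calls whenever a specific terminal is required. The sketch below assumes a hypothetical Lexer interface and Token type, and defines its own SyntaxException whose constructor takes a message and the lexer, as described above; declaring it as a checked exception (extends Exception) is an assumption, not necessarily how the provided class is defined.

```java
import java.util.List;

class SyntaxErrorSketch {
    // Hypothetical stand-ins for the provided TokenType, Token, and Lexer.
    enum TokenType { IDENTIFIER, SEMICOLON, RIGHT_PARENTHESIS }

    static final class Token {
        final TokenType type;
        Token(TokenType type) { this.type = type; }
    }

    interface Lexer {
        boolean hasMoreTokens();
        Token peek();
        Token get();
    }

    /** Lexer over a fixed list; callers must check hasMoreTokens() before peek()/get(). */
    static final class ListLexer implements Lexer {
        private final List<Token> tokens;
        private int pos = 0;
        ListLexer(List<Token> tokens) { this.tokens = tokens; }
        public boolean hasMoreTokens() { return pos < tokens.size(); }
        public Token peek() { return tokens.get(pos); }
        public Token get() { return tokens.get(pos++); }
    }

    /** Hypothetical SyntaxException: (message, lexer) constructor as described in the assignment. */
    static class SyntaxException extends Exception {
        final Lexer lexer; // lets callers inspect where parsing stopped
        SyntaxException(String message, Lexer lexer) {
            super(message);
            this.lexer = lexer;
        }
    }

    /** Consume and return the next token if it has the required type; otherwise throw. */
    static Token expect(Lexer lexer, TokenType required) throws SyntaxException {
        if (!lexer.hasMoreTokens()) {
            throw new SyntaxException("Expected " + required + " but reached end of input", lexer);
        }
        Token next = lexer.peek(); // peek first so a failed check consumes nothing
        if (next.type != required) {
            throw new SyntaxException("Expected " + required + " but found " + next.type, lexer);
        }
        return lexer.get();
    }
}
```

Because expect peeks before consuming, a failed check leaves the offending token in place, so the lexer passed to the exception still points at it.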
Updated Oct 29th:
There is a bug in the tests distributed with the original CS340_Assign4.zip.
To fix the bug, download the updated parserTest.bin and copy it into the junit/edu/ycp/cs340/ycpl folder, then right-click on the ycpl project and choose the Refresh menu item.
To test your parser implementation, run the unit tests by right-clicking ParserTest.java and choosing Run As->JUnit Test. The unit tests work by invoking your parser to parse YCPL statements and then checking whether the correct syntax tree is generated.
When a unit test fails, the output in the Console window will show the expected syntax tree and the actual syntax tree generated by your parser implementation. For example, when parsing the YCPL statement
you might see the following output:
Expected syntax tree differs from actual syntax tree:
In the case above, the actual syntax tree is incorrect because the STATEMENT node does not have the SEMICOLON terminal node as a child.
When you are done:
Export your project to a zip file. In Eclipse, right-click on the project ycpl in the Package Explorer, and choose Export...->Archive File. Enter the name/path of the zip file you want to save your project in. Click Finish.
Upload your saved zip file to the Marmoset server as Project 4.