Assignment 3: Parsing

Due: Tuesday, Oct 13th by 11:59 PM

In this assignment, you will implement a recursive-descent parser for the subset of Scheme you designed a grammar for in Assignment 1.

Getting Started

Start Eclipse. Rename your YCP_Scheme project from Assignment 2 by right-clicking the project name in the Package Explorer, choosing Refactor->Rename, and changing the name in the dialog. (For example, you could rename it to YCP_Scheme_Assign2.)

Download CS340_Assign3.zip. Import it into your Eclipse workspace (File->Import->General->Existing projects into workspace->Archive File).

You should see a project called YCP_Scheme in the Package Explorer. Copy your LexicalAnalyzerImpl class from your old YCP_Scheme project (the one you renamed) into the new one.

In LexicalAnalyzerImpl.java, you will need to change all occurrences of TokenType to SymbolType. (Edit->Find/Replace will allow you to replace all occurrences in the file automatically.)

Your Task

Your task is to implement the parse method of the Parser class so that it finds a derivation of the input program using the productions of your Scheme grammar, if a derivation exists.

The approach you use will be similar to the one you used in Lab 2.

You should add new enumeration values to the SymbolType enumeration to represent each of the nonterminal symbols in your grammar.
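
As a rough illustration, the extended enumeration might look something like the sketch below. The terminal and nonterminal names are taken from the sample output later in this handout; they are assumptions here, and your own names should follow the nonterminals in your grammar.

```java
// Hypothetical sketch of an extended SymbolType enumeration.
// The names mirror the sample parse trees in this handout; the real
// SymbolType in your project may contain different values.
enum SymbolType {
    // Terminal symbols: the token types produced by the lexical analyzer
    LPAREN, RPAREN, DEFINE_KEYWORD, LAMBDA_KEYWORD, IF_KEYWORD,
    IDENTIFIER, INTEGER_LITERAL, STRING_LITERAL,

    // Nonterminal symbols: one value per left-hand side in the grammar
    PROGRAM, TOP_LEVEL_ITEM, DEFINITION, EXPRESSION,
    LAMBDA_EXPRESSION, IF_EXPRESSION, FUNCTION_APPLICATION_EXPRESSION,
    FORMALS_LIST, ARG_LIST
}
```

Keeping terminals and nonterminals in one enumeration lets every node of the parse tree, leaf or interior, be labeled with a single type.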

Preparing your grammar

Fixing mistakes

You will need to fix any mistakes in your grammar before you start working on the parser. Here are two options:

  1. Fix the grammar yourself. If you resubmit a corrected version of Assignment 1 sometime before this assignment is due, I will give you up to 75% of full credit for Assignment 1. (You can use this option to improve your grade if you received less than 75/100.) Email me if you resubmit Assignment 1 so I will know to look for it.
  2. Ask me to give you a correct grammar. (In this case you will not receive additional credit for Assignment 1.)

Other grammar adjustments

Make sure that your grammar does not use left recursion, since top-down parsers cannot handle left-recursive grammars.

For example, if you have

<program> := <top-level-item>
           | <program> <top-level-item>

you should change it to

<program> := <top-level-item>
           | <top-level-item> <program>
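
To see why the rewritten form suits a recursive-descent parser, consider the simplified sketch below. A method for the left-recursive production would call itself before consuming any input and never terminate; the right-recursive form always consumes one top-level item before recursing. The class name, the use of plain strings as tokens, and the simplified notion of a top-level item (an atom or a balanced parenthesized group) are assumptions made for illustration only, not the structure of the provided Parser class.

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch: counts top-level items using the right-recursive rule
//   <program> := <top-level-item> | <top-level-item> <program>
class RightRecursionSketch {
    private final List<String> tokens;
    private int pos = 0;

    RightRecursionSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Consumes one top-level item first, then recurses only if input remains.
    int parseProgram() {
        parseTopLevelItem();
        if (pos < tokens.size()) {
            return 1 + parseProgram();
        }
        return 1;
    }

    // Stand-in for a real top-level item: an atom or a balanced "( ... )" group.
    private void parseTopLevelItem() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        String tok = tokens.get(pos++);
        if (tok.equals(")")) {
            throw new IllegalStateException("unexpected )");
        }
        if (tok.equals("(")) {
            while (pos < tokens.size() && !tokens.get(pos).equals(")")) {
                parseTopLevelItem(); // nested item inside the parentheses
            }
            if (pos >= tokens.size()) {
                throw new IllegalStateException("missing )");
            }
            pos++; // consume the closing ")"
        }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "(", "define", "add", "+", ")", "(", "add", "2", "3", ")");
        System.out.println(new RightRecursionSketch(tokens).parseProgram()); // prints 2
    }
}
```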

Additional grammar rules

Add the following productions to your grammar:

<expression> := <and-expression>

<expression> := <or-expression>

<expression> := <not-expression>

<and-expression> := "(" "and" <expression> <expression> ")"

<or-expression> := "(" "or" <expression> <expression> ")"

<not-expression> := "(" "not" <expression> ")"
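
The sketch below shows one way the three new productions could map onto recursive-descent methods. It is deliberately cut down: tokens are plain strings, an expression is either an atom or one of the three new forms, and the result is a bracketed string rather than a parse tree. The class, the expect helper, and the output format are all hypothetical and are not part of the provided code.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of recursive-descent methods for and-, or-, and not-expressions.
class BooleanFormsSketch {
    private final List<String> tokens;
    private int pos = 0;

    BooleanFormsSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Consume the next token, which must equal the expected terminal.
    private void expect(String expected) {
        if (pos >= tokens.size() || !tokens.get(pos).equals(expected)) {
            throw new IllegalStateException("expected " + expected);
        }
        pos++;
    }

    // <expression>, restricted here to atoms and the three new forms.
    String parseExpression() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        String first = tokens.get(pos);
        if (first.equals(")")) {
            throw new IllegalStateException("unexpected )");
        }
        if (!first.equals("(")) {
            pos++;
            return first; // atom (identifier or literal)
        }
        // The token after "(" selects the production.
        String second = (pos + 1 < tokens.size()) ? tokens.get(pos + 1) : "";
        switch (second) {
            case "and":
            case "or":
                return parseBinary(second);
            case "not":
                return parseNot();
            default:
                throw new IllegalStateException("form not handled in this sketch: " + second);
        }
    }

    // "(" "and" <expression> <expression> ")"  and likewise for "or"
    private String parseBinary(String keyword) {
        expect("(");
        expect(keyword);
        String left = parseExpression();
        String right = parseExpression();
        expect(")");
        return keyword.toUpperCase() + "[" + left + "," + right + "]";
    }

    // "(" "not" <expression> ")"
    private String parseNot() {
        expect("(");
        expect("not");
        String operand = parseExpression();
        expect(")");
        return "NOT[" + operand + "]";
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("(", "and", "a", "(", "not", "b", ")", ")");
        System.out.println(new BooleanFormsSketch(tokens).parseExpression()); // prints AND[a,NOT[b]]
    }
}
```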

Parsing Scheme

One issue you will face in implementing your parser is that a single token of lookahead may not be enough for the parser to predict which production to apply.

There are two places in particular where two tokens of lookahead may be needed:

  1. expanding a top-level-item
  2. expanding an expression

Parsing a top-level-item

In the case of a top-level-item, the parser may notice that the token returned by peek is an LPAREN. In this case, the parser does not know whether to expand the top-level-item into an expression or a definition, since both of those could legally result in a string of terminal symbols beginning with an LPAREN.

Here is the problem faced by the parser, with a caret ("^") marking the parser's current position in each candidate production:

<top-level-item> := ^ <definition>

<top-level-item> := ^ <expression>

<definition> := ^ "(" "define" ...

<expression> := ^ "(" ...several terminal symbols are legal here...

In this case, a second token of lookahead allows the parser to make the correct decision. If the second lookahead token (the one that follows the LPAREN) is DEFINE_KEYWORD, then the parser should expand the top-level-item into a definition. Otherwise, it should expand the top-level-item into an expression.

In the Parser class, you will notice that the lexer field is an instance of PushbackLexicalAnalyzer. This object supports the peek and next methods as usual, but it adds an additional method called putBack. The putBack method pushes a token back into the lexer object in much the same way that the unread method pushes a character back into a PushbackReader.

Here is an example showing how you can get the second lookahead token:

// get first lookahead token
Token nextTok = lexer.peek();
if (nextTok == null) { throw new ParseException("unexpected end of input"); }

// temporarily consume first lookahead token
lexer.next();

// get second lookahead token
Token nextNextTok = lexer.peek();
if (nextNextTok == null) { throw new ParseException("unexpected end of input"); }

// "unconsume" the first lookahead token
lexer.putBack(nextTok);

// now the parser can use the second lookahead token
// to make a decision
if (nextNextTok.getSymbolType() == SymbolType.DEFINE_KEYWORD) {
        ...
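
The same peek / next / putBack pattern can be tried in isolation with the self-contained sketch below. TinyLexer is a hypothetical stand-in for PushbackLexicalAnalyzer that supports only the three methods described above, and the Kind enumeration is a cut-down substitute for SymbolType; neither is part of the provided code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class LookaheadSketch {
    // Cut-down substitute for SymbolType
    enum Kind { LPAREN, RPAREN, DEFINE_KEYWORD, IDENTIFIER }

    // Hypothetical stand-in for the pushback lexer: peek, next, putBack only.
    static class TinyLexer {
        private final Deque<Kind> pending;

        TinyLexer(List<Kind> tokens) {
            pending = new ArrayDeque<>(tokens);
        }

        Kind peek() { return pending.peekFirst(); }  // null at end of input
        Kind next() { return pending.pollFirst(); }  // null at end of input
        void putBack(Kind k) { pending.addFirst(k); }
    }

    // Returns the second lookahead token without consuming anything,
    // or null if fewer than two tokens remain.
    static Kind peekSecond(TinyLexer lexer) {
        Kind first = lexer.next();      // temporarily consume the first token
        if (first == null) {
            return null;
        }
        Kind second = lexer.peek();     // look at the token behind it
        lexer.putBack(first);           // "unconsume" the first token
        return second;
    }

    // Decide how to expand a top-level-item.
    static String chooseTopLevelProduction(TinyLexer lexer) {
        if (lexer.peek() == Kind.LPAREN && peekSecond(lexer) == Kind.DEFINE_KEYWORD) {
            return "definition";
        }
        return "expression";
    }

    public static void main(String[] args) {
        TinyLexer lexer = new TinyLexer(
            Arrays.asList(Kind.LPAREN, Kind.DEFINE_KEYWORD, Kind.IDENTIFIER));
        System.out.println(chooseTopLevelProduction(lexer)); // prints definition
        System.out.println(lexer.peek());                    // prints LPAREN
    }
}
```

Wrapping the lookahead in a helper such as peekSecond keeps the consume-and-restore steps in one place, so the decision code never leaves the lexer in a half-consumed state.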

Parsing an expression

A similar issue arises when parsing an expression: several legal expansions (e.g., lambda-expression, if-expression, function-application, etc.) can begin with an LPAREN token.

The parser can use a second token of lookahead to figure out which production to apply. Notice that in most expressions which begin with an LPAREN, the LPAREN is immediately followed by a keyword, e.g. LAMBDA_KEYWORD for a lambda-expression, IF_KEYWORD for an if-expression, etc. The only exception is a function application, so if the second lookahead token is anything other than a keyword, the parser may assume a function application is being expanded.
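
The decision described above amounts to a switch on the second lookahead token, roughly as in the following sketch. The enumeration here is a hypothetical subset of SymbolType: AND_KEYWORD, OR_KEYWORD, NOT_KEYWORD and the matching nonterminal names are assumed, so substitute whatever your lexer and grammar actually use. Rejecting DEFINE_KEYWORD in expression position and throwing IllegalArgumentException are also assumptions of this sketch; the real parser would presumably report errors through ParseException.

```java
class ExpressionDispatchSketch {
    // Hypothetical subset of SymbolType (terminals first, then nonterminals)
    enum SymbolType {
        LPAREN, RPAREN, DEFINE_KEYWORD, LAMBDA_KEYWORD, IF_KEYWORD,
        AND_KEYWORD, OR_KEYWORD, NOT_KEYWORD, IDENTIFIER, INTEGER_LITERAL,
        LAMBDA_EXPRESSION, IF_EXPRESSION, AND_EXPRESSION, OR_EXPRESSION,
        NOT_EXPRESSION, FUNCTION_APPLICATION_EXPRESSION
    }

    // Assumes the first lookahead token is LPAREN; 'second' is the token after it.
    // Returns the nonterminal that the expression should be expanded into.
    static SymbolType chooseParenthesizedExpansion(SymbolType second) {
        if (second == null) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        switch (second) {
            case LAMBDA_KEYWORD:
                return SymbolType.LAMBDA_EXPRESSION;
            case IF_KEYWORD:
                return SymbolType.IF_EXPRESSION;
            case AND_KEYWORD:
                return SymbolType.AND_EXPRESSION;
            case OR_KEYWORD:
                return SymbolType.OR_EXPRESSION;
            case NOT_KEYWORD:
                return SymbolType.NOT_EXPRESSION;
            case DEFINE_KEYWORD:
                // A definition is a top-level-item, not an expression.
                throw new IllegalArgumentException("definition is not an expression");
            default:
                // Anything that is not a keyword: assume a function application.
                return SymbolType.FUNCTION_APPLICATION_EXPRESSION;
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseParenthesizedExpansion(SymbolType.IF_KEYWORD)); // prints IF_EXPRESSION
        System.out.println(chooseParenthesizedExpansion(SymbolType.IDENTIFIER)); // prints FUNCTION_APPLICATION_EXPRESSION
    }
}
```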

Testing

You can test your parser by executing the Main class (right-click, Run As->Java Application).

This program will use your lexical analyzer and parser to build a parse tree of the input program. Type Control-D (Linux) or Control-Z (Windows) to indicate that you are done typing input.

Here is the program output on a fairly simple input (the first line is the user input):

(print "Hello, world")
PROGRAM
+--TOP_LEVEL_ITEM
   +--EXPRESSION
      +--FUNCTION_APPLICATION_EXPRESSION
         +--LPAREN("(")
         +--EXPRESSION
         |  +--IDENTIFIER("print")
         +--ARG_LIST
         |  +--EXPRESSION
         |  |  +--STRING_LITERAL(""Hello, world"")
         |  +--ARG_LIST
         +--RPAREN(")")

An example with two top-level items (the first two lines are the user input):

(define add +)
(add 2 3)
PROGRAM
+--TOP_LEVEL_ITEM
|  +--DEFINITION
|     +--LPAREN("(")
|     +--DEFINE_KEYWORD("define")
|     +--IDENTIFIER("add")
|     +--EXPRESSION
|     |  +--IDENTIFIER("+")
|     +--RPAREN(")")
+--PROGRAM
   +--TOP_LEVEL_ITEM
      +--EXPRESSION
         +--FUNCTION_APPLICATION_EXPRESSION
            +--LPAREN("(")
            +--EXPRESSION
            |  +--IDENTIFIER("add")
            +--ARG_LIST
            |  +--EXPRESSION
            |  |  +--INTEGER_LITERAL("2")
            |  +--ARG_LIST
            |     +--EXPRESSION
            |     |  +--INTEGER_LITERAL("3")
            |     +--ARG_LIST
            +--RPAREN(")")

Here is a more complicated example (the first five lines are the user input):

(define fact
  (lambda (n)
    (if (= n 1)
        1
        (* n (- n 1)))))
PROGRAM
+--TOP_LEVEL_ITEM
   +--DEFINITION
      +--LPAREN("(")
      +--DEFINE_KEYWORD("define")
      +--IDENTIFIER("fact")
      +--EXPRESSION
      |  +--LAMBDA_EXPRESSION
      |     +--LPAREN("(")
      |     +--LAMBDA_KEYWORD("lambda")
      |     +--LPAREN("(")
      |     +--FORMALS_LIST
      |     |  +--IDENTIFIER("n")
      |     |  +--FORMALS_LIST
      |     +--RPAREN(")")
      |     +--EXPRESSION
      |     |  +--IF_EXPRESSION
      |     |     +--LPAREN("(")
      |     |     +--IF_KEYWORD("if")
      |     |     +--EXPRESSION
      |     |     |  +--FUNCTION_APPLICATION_EXPRESSION
      |     |     |     +--LPAREN("(")
      |     |     |     +--EXPRESSION
      |     |     |     |  +--IDENTIFIER("=")
      |     |     |     +--ARG_LIST
      |     |     |     |  +--EXPRESSION
      |     |     |     |  |  +--IDENTIFIER("n")
      |     |     |     |  +--ARG_LIST
      |     |     |     |     +--EXPRESSION
      |     |     |     |     |  +--INTEGER_LITERAL("1")
      |     |     |     |     +--ARG_LIST
      |     |     |     +--RPAREN(")")
      |     |     +--EXPRESSION
      |     |     |  +--INTEGER_LITERAL("1")
      |     |     +--EXPRESSION
      |     |     |  +--FUNCTION_APPLICATION_EXPRESSION
      |     |     |     +--LPAREN("(")
      |     |     |     +--EXPRESSION
      |     |     |     |  +--IDENTIFIER("*")
      |     |     |     +--ARG_LIST
      |     |     |     |  +--EXPRESSION
      |     |     |     |  |  +--IDENTIFIER("n")
      |     |     |     |  +--ARG_LIST
      |     |     |     |     +--EXPRESSION
      |     |     |     |     |  +--FUNCTION_APPLICATION_EXPRESSION
      |     |     |     |     |     +--LPAREN("(")
      |     |     |     |     |     +--EXPRESSION
      |     |     |     |     |     |  +--IDENTIFIER("-")
      |     |     |     |     |     +--ARG_LIST
      |     |     |     |     |     |  +--EXPRESSION
      |     |     |     |     |     |  |  +--IDENTIFIER("n")
      |     |     |     |     |     |  +--ARG_LIST
      |     |     |     |     |     |     +--EXPRESSION
      |     |     |     |     |     |     |  +--INTEGER_LITERAL("1")
      |     |     |     |     |     |     +--ARG_LIST
      |     |     |     |     |     +--RPAREN(")")
      |     |     |     |     +--ARG_LIST
      |     |     |     +--RPAREN(")")
      |     |     +--RPAREN(")")
      |     +--RPAREN(")")
      +--RPAREN(")")

Note that whether or not your output is similar to the output above depends on how you specified your grammar. You should check the parse tree output and verify that its structure is correct according to the derivation that the parser should be using to derive the input program.

Submitting

Export your completed Eclipse project to a zip file by right-clicking on the name of the project (YCP_Scheme) and choosing Export->General->Archive File.

Upload the zip file to the submission server as assign3. The URL of the server is

https://camel.ycp.edu:8443/

IMPORTANT: after uploading, you should download a copy of your submission and double-check it to make sure that it contains the correct files. You are responsible for making sure your submission is correct. You may receive a grade of 0 for an incorrectly submitted assignment.