Lab 1: Lexical Analysis

Assigned: Sept 17th

Getting Started

Download CS340_Lab1.zip and save it somewhere (e.g., the Desktop).

Start Eclipse and import the zip file (File->Import->General->Existing projects into workspace->Archive file).

You should see a project called CS340_Lab1 in the Package Explorer.

Your Task

Your task is to implement the next and peek methods in the LexicalAnalyzerImpl class. This class implements a lexical analyzer for a simple calculator application. (This lab is very similar to Assignment 2, and should serve as a useful model for that assignment.)

The next method reads and consumes a single token of input from the source text, and returns a Token object representing the token. The next method never returns null: if called when there are no more tokens to be read, it throws a LexicalAnalyzerException.

The peek method "peeks" ahead at the next token in the source text without consuming it. If called when there are no more tokens to be read, it returns null.
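One common way to satisfy both contracts is to keep a single buffered token: peek fills the buffer if it is empty, and next hands the buffered token back and clears it. The following is a minimal sketch of that idea, not the lab's required design. It uses plain strings in place of Token objects and a pre-split list in place of the source text, and the names LookaheadSketch and SketchLexerException are hypothetical (the real LexicalAnalyzerException may be a checked exception).

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the lab's LexicalAnalyzerException; the real class may differ.
class SketchLexerException extends RuntimeException {
    SketchLexerException(String message) {
        super(message);
    }
}

// One-token lookahead over an already-split sequence of lexemes.
// Assumption: strings stand in for Token objects to keep the sketch short.
class LookaheadSketch {
    private final Iterator<String> source;
    private String buffered; // token seen by peek() but not yet consumed

    LookaheadSketch(List<String> lexemes) {
        this.source = lexemes.iterator();
    }

    // Returns the next token without consuming it, or null at end of input.
    String peek() {
        if (buffered == null && source.hasNext()) {
            buffered = source.next();
        }
        return buffered;
    }

    // Consumes and returns the next token; never returns null.
    String next() {
        String token = peek();
        if (token == null) {
            throw new SketchLexerException("no more tokens");
        }
        buffered = null;
        return token;
    }
}
```

Writing next in terms of peek keeps the token-reading logic in one place, so the two methods cannot disagree about what the next token is.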

Token Class

An instance of the Token class represents one token read from the source text.

A Token instance stores three pieces of information:

  1. the token type (see "Token Types" below)
  2. the lexeme (as an instance of String)
  3. the line number on which the token occurred in the source text (as an int)
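For orientation, a class holding these three pieces of information might look roughly like the sketch below. The Token class supplied with the lab may differ in its field names, accessors, and constructor. SketchToken and SketchTokenType are hypothetical names, the enumeration is trimmed to two constants, and the toString format is an assumption modeled on the Test program's sample output shown later in this handout.

```java
// Hypothetical, trimmed-down mirror of the lab's TokenType enumeration.
enum SketchTokenType { IDENTIFIER, INT_LITERAL }

// Immutable value object holding the three pieces of information listed above.
final class SketchToken {
    private final SketchTokenType type;
    private final String lexeme;
    private final int lineNumber;

    SketchToken(SketchTokenType type, String lexeme, int lineNumber) {
        this.type = type;
        this.lexeme = lexeme;
        this.lineNumber = lineNumber;
    }

    SketchTokenType getType() { return type; }

    String getLexeme() { return lexeme; }

    int getLineNumber() { return lineNumber; }

    // Assumed format, modeled on the sample output: TYPE("lexeme")@line
    @Override
    public String toString() {
        return type + "(\"" + lexeme + "\")@" + lineNumber;
    }
}
```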

Token Types

The calculator language has the following kinds of tokens. (These are specified in the TokenType enumeration.)

  • SET_KEYWORD - the token with the lexeme "set"
  • IDENTIFIER - a token whose lexeme is any sequence of one or more letters (as determined by the Character.isLetter method) other than "set"
  • INT_LITERAL - a token whose lexeme is any sequence of one or more digits (as determined by the Character.isDigit method)
  • PLUS_OP - the token with the lexeme "+"
  • MINUS_OP - the token with the lexeme "-"
  • MULT_OP - the token with the lexeme "*"
  • DIV_OP - the token with the lexeme "/"
  • ASSIGN_OP - the token with the lexeme "="
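The rules above can be sketched as a scanner over a single line of text. This is an illustration only, not the expected structure of LexicalAnalyzerImpl: ScanSketch, Kind, and operatorKind are hypothetical names, results are returned as "KIND:lexeme" strings rather than Token objects, and two behaviors are assumptions that the rules above do not state (whitespace only separates tokens, and an unrecognized character raises IllegalArgumentException).

```java
import java.util.ArrayList;
import java.util.List;

class ScanSketch {
    // Hypothetical mirror of the TokenType enumeration described above.
    enum Kind {
        SET_KEYWORD, IDENTIFIER, INT_LITERAL,
        PLUS_OP, MINUS_OP, MULT_OP, DIV_OP, ASSIGN_OP
    }

    // Splits one line into "KIND:lexeme" strings.
    static List<String> scan(String line) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // assumption: whitespace only separates tokens
                continue;
            }
            int start = i;
            Kind kind;
            if (Character.isLetter(c)) {
                // Take the longest run of letters, then check for the keyword.
                while (i < line.length() && Character.isLetter(line.charAt(i))) {
                    i++;
                }
                kind = line.substring(start, i).equals("set")
                        ? Kind.SET_KEYWORD : Kind.IDENTIFIER;
            } else if (Character.isDigit(c)) {
                // Take the longest run of digits.
                while (i < line.length() && Character.isDigit(line.charAt(i))) {
                    i++;
                }
                kind = Kind.INT_LITERAL;
            } else {
                kind = operatorKind(c);
                i++;
            }
            out.add(kind + ":" + line.substring(start, i));
        }
        return out;
    }

    // Maps a single-character operator to its kind.
    static Kind operatorKind(char c) {
        switch (c) {
            case '+': return Kind.PLUS_OP;
            case '-': return Kind.MINUS_OP;
            case '*': return Kind.MULT_OP;
            case '/': return Kind.DIV_OP;
            case '=': return Kind.ASSIGN_OP;
            default:
                // Assumption: the handout does not say how bad input is reported.
                throw new IllegalArgumentException("unexpected character: " + c);
        }
    }
}
```

Because the keyword check happens only after the whole run of letters has been collected, a word such as "settle" is classified as an IDENTIFIER rather than as SET_KEYWORD followed by another token.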

Testing

Two programs are included to help you test your lexer implementation.

The Test program uses your lexer to read tokens from the keyboard. Information about the tokens read is printed to System.out.

Example run (the user types the expression lines; the lexer's output follows each one):

set a = 42
SET_KEYWORD("set")@1
IDENTIFIER("a")@1
ASSIGN_OP("=")@1
INT_LITERAL("42")@1
3 - 2 - 1
INT_LITERAL("3")@2
MINUS_OP("-")@2
INT_LITERAL("2")@2
MINUS_OP("-")@2
INT_LITERAL("1")@2
a + b * 3
IDENTIFIER("a")@3
PLUS_OP("+")@3
IDENTIFIER("b")@3
MULT_OP("*")@3
INT_LITERAL("3")@3
end of input

Note that "end of input" is printed (and the program exits) when the user types Control-D (Linux systems) or Control-Z (Windows systems).
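The Test program's source is not reproduced here, but a driver with the same observable behavior can be sketched as a loop that calls peek until it returns null and calls next for each token in between. In the sketch below, DriverSketch, its Lexer interface, and the drain and fromList helpers are all hypothetical, and strings stand in for Token objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class DriverSketch {
    // Minimal hypothetical lexer interface; strings stand in for Token objects.
    interface Lexer {
        String peek(); // null when no tokens remain
        String next(); // throws when no tokens remain
    }

    // Collects every remaining token, one per line, then the end-of-input marker.
    static String drain(Lexer lexer) {
        StringBuilder out = new StringBuilder();
        while (lexer.peek() != null) {
            out.append(lexer.next()).append('\n');
        }
        out.append("end of input\n");
        return out.toString();
    }

    // Builds a lexer over a fixed list so the loop can be exercised without a keyboard.
    static Lexer fromList(List<String> tokens) {
        Deque<String> queue = new ArrayDeque<>(tokens);
        return new Lexer() {
            @Override
            public String peek() {
                return queue.peek(); // ArrayDeque.peek returns null when empty
            }

            @Override
            public String next() {
                return queue.remove(); // throws NoSuchElementException when empty
            }
        };
    }
}
```

Using peek as the loop condition means next is never called at end of input, so the exception path is reserved for genuine misuse.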

The Calculator program implements a simple integer calculator which reads infix expressions (one per line of input) and evaluates them. Variables may be assigned using the syntax

set identifier = int-literal

where identifier is an identifier and int-literal is an integer literal.

Example run (the user types each expression; the calculator prints the result on the following line):

set a = 4
4
set b = 5
5
a + b * 3
19
3 - 2 - 1
0
64 / 4 / 2
8

Type Control-D (Linux) or Control-Z (Windows) to exit the program.
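The results in the Calculator example run are consistent with the usual arithmetic conventions: multiplication and division bind tighter than addition and subtraction, and operators of equal precedence group from left to right. Java's int arithmetic follows the same conventions, so it can serve as a quick cross-check when testing. This is an observation drawn from the sample output, not a description of how the Calculator program is implemented, and ArithmeticSketch is a hypothetical name.

```java
class ArithmeticSketch {
    // Recomputes the three expressions from the example run using Java int arithmetic.
    static int[] results() {
        int a = 4;
        int b = 5;
        return new int[] {
            a + b * 3,  // multiplication first: 4 + 15
            3 - 2 - 1,  // left to right: (3 - 2) - 1
            64 / 4 / 2  // left to right: (64 / 4) / 2
        };
    }
}
```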