Due: Monday, September 23rd by 11:59 PM

Updated 9/19 - Correction for the str_literal token type; changed the due date

Getting Started

Download CS340_Assign03.zip.

You will need to edit the file Lexer.rb.

Your Task

In this assignment, you will implement a lexical analyzer for a simple programming language called MiniLang.

MiniLang has the following kinds of tokens:

Token type    Description
----------    -----------
var           The var keyword.
func          The func keyword.
identifier    A letter (A-Z or a-z) or an underscore, followed by zero or more letters, digits, or underscores. Examples: hello, abc123, _yeah
str_literal   A double quote ("), followed by zero or more occurrences of either (1) a non-double-quote, non-backslash character or (2) a backslash (\) followed by any character, terminated by a double quote. Examples: "hello world", "We are the knights who say \"Ni\""
int_literal   A sequence of digits (0-9), optionally preceded by a minus sign (-)
op_plus       The '+' character
op_minus      The '-' character
op_mul        The '*' character
op_div        The '/' character
op_exp        The '^' character
op_assign     The ':=' operator (assignment)
lparen        The '(' character
rparen        The ')' character
lcurly        The '{' character
rcurly        The '}' character
comma         The ',' character
semicolon     The ';' character
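As a starting point, the str_literal and int_literal descriptions above translate fairly directly into Ruby regular expressions. The sketch below is illustrative, not necessarily the exact patterns you will end up using:

```ruby
# Sketch: direct translations of two of the token descriptions above.
# Each pattern is anchored with ^ so it only matches at the start of
# the remaining input, matching the style of the provided patterns.
STR_LITERAL = /^"(?:[^"\\]|\\.)*"/   # quote, then non-quote/non-backslash chars or \<any>, then quote
INT_LITERAL = /^-?[0-9]+/            # digits, optionally preceded by a minus

puts STR_LITERAL.match('"hello world"')[0]     # the whole string literal
puts INT_LITERAL.match("-42 + 7")[0]           # just the leading "-42"
```

Note that in the str_literal pattern, the alternation (?:[^"\\]|\\.) mirrors the two cases in the description: an ordinary character, or a backslash followed by any character.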

The Lexical Analyzer

The lexical analyzer is a Ruby program.

You will modify the source file Lexer.rb, which includes an array of patterns corresponding to the various token types documented above. Initially this array will look like this:

# Add additional token types as necessary.
# Note that these patterns will be checked in order, so you
# will need to think about how to order them.
        [/^var\b/, :var],
        [/^[A-Za-z_][A-Za-z_0-9]*\b/, :identifier],

Your task is to add entries to the PATTERNS array to support additional token types. The var and identifier token types are already handled - you will just need to add support for the other token types.
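Ordering matters because the patterns are tried in sequence and the first match wins. The self-contained illustration below shows why the var pattern must come before the identifier pattern, and why the \b word boundary is needed (first_match here is a hypothetical helper for demonstration, not part of Lexer.rb):

```ruby
# Illustration of why pattern order matters: patterns are tried in
# order, and the first one matching the start of the input wins.
PATTERNS = [
  [/^var\b/, :var],                            # keyword first ...
  [/^[A-Za-z_][A-Za-z_0-9]*\b/, :identifier],  # ... then the general identifier rule
]

# Hypothetical helper: return the [type, lexeme] of the first matching pattern.
def first_match(input)
  PATTERNS.each do |regex, type|
    m = regex.match(input)
    return [type, m[0]] if m
  end
  nil
end

p first_match("var x")     # [:var, "var"] -- the keyword pattern wins
p first_match("variable")  # [:identifier, "variable"] -- \b blocks the keyword match
```

If the order were reversed, the identifier pattern would also claim the lexeme "var"; and without \b, the keyword pattern would incorrectly match the first three characters of "variable".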

Ruby's regular expression syntax is borrowed from Perl. Here is a link to the documentation for Perl regular expressions:



To test your lexer, run the program LexerTest.rb. This program reads text from standard input, uses a Lexer object to read tokens, and prints the type and lexeme of each token.
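The overall read-tokens-and-print loop can be sketched as follows. This is a minimal self-contained version, not the actual Lexer.rb or LexerTest.rb API (the tokenize method and the reduced pattern list are assumptions for illustration); it only demonstrates the "type: lexeme" output format:

```ruby
# Minimal self-contained sketch of the loop a driver like LexerTest.rb
# performs: repeatedly match a pattern at the front of the input,
# emit the token, and continue with the rest of the text.
PATTERNS = [
  [/^\s+/, nil],                               # whitespace: consumed but not emitted
  [/^func\b/, :func],
  [/^[A-Za-z_][A-Za-z_0-9]*\b/, :identifier],
  [/^\(/, :lparen],
]

def tokenize(text)
  tokens = []
  until text.empty?
    pattern = PATTERNS.find { |regex, _| regex.match(text) }
    raise "no pattern matches #{text.inspect}" unless pattern
    regex, type = pattern
    m = regex.match(text)
    tokens << [type, m[0]] unless type.nil?    # nil type means "skip"
    text = m.post_match                        # continue after the matched lexeme
  end
  tokens
end

tokenize("func add(").each { |type, lexeme| puts "#{type}: #{lexeme}" }
# Prints:
#   func: func
#   identifier: add
#   lparen: (
```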

Run LexerTest.rb using the command

ruby LexerTest.rb < inputfile

An example input file test.in is provided. It contains the following text:

func add(a, b) {
        var sum;
        sum := a + b;
        return sum;
}
When LexerTest.rb is run with this file as input, the output should be:

func: func
identifier: add
lparen: (
identifier: a
comma: ,
identifier: b
rparen: )
lcurly: {
var: var
identifier: sum
semicolon: ;
identifier: sum
op_assign: :=
identifier: a
op_plus: +
identifier: b
semicolon: ;
identifier: return
identifier: sum
semicolon: ;
rcurly: }


To submit, run the command

make submit

from the directory containing the lexical analyzer files. Type your Marmoset username and password when prompted.

You can also create a zip file containing the lexical analyzer files, and upload it to Marmoset as assign03 using the web interface.