Due: Thursday, September 13th by 11:59 PM

Your Task

In this assignment, you will implement a lexical analyzer for a simple programming language called MiniLang.

MiniLang has the following kinds of tokens:

Token type         Description
ident_or_keyword   A letter (A-Z or a-z) or an underscore followed by 0 or more letters, digits, or underscores. Examples: hello, abc123, _yeah
string_literal     A double quote ("), followed by zero or more occurrences of either (1) a non-double quote character or (2) a backslash (\) followed by any character, ending with a double quote. Examples: "hello world", "We are the knights who say \"Ni\""
int_literal        A sequence of digits (0-9), optionally preceded by a minus (-)
op_plus            The '+' character
op_minus           The '-' character
op_mul             The '*' character
op_div             The '/' character
lparen             The '(' character
rparen             The ')' character
comma              The ',' character
semi               The ';' character
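
To see how a row of the table maps onto a regular expression, the short sketch below checks the ident_or_keyword pattern (the one the starter program already provides) against the examples listed above. The helper name full_match? and the rejected sample strings are illustrative assumptions, not part of the assignment.

```ruby
# The ident_or_keyword pattern from the starter program.
IDENT = /^[A-Za-z_][A-Za-z_0-9]*/

# Hypothetical helper: true only if the pattern matches at the start of
# the string and nothing is left over after the match.
def full_match?(pattern, text)
  m = pattern.match(text)
  !m.nil? && m.post_match.empty?
end

# The first three strings are the examples from the table; the last one
# starts with a digit, so it should not be accepted as an identifier.
%w[hello abc123 _yeah 9lives].each do |s|
  puts "#{s}: #{full_match?(IDENT, s)}"
end
```

Reading the pattern left to right mirrors the description: one letter or underscore, then zero or more letters, digits, or underscores.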

The Lexical Analyzer

The lexical analyzer is a Ruby program. We will be learning about Ruby later in the course. For this assignment, you only need to modify the program to add regular expressions for each token type described above.

Here is the starting version of the program:

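#! /usr/bin/ruby

# Each entry pairs a regular expression (anchored at the start of the
# remaining input) with the token type it recognizes.
PATTERNS = [
        [/^[A-Za-z_][A-Za-z_0-9]*/, :ident_or_keyword],
        # add additional token types here...
]

STDIN.each_line do |line|
        while !line.empty?
                found = false
                # Try the patterns in order; the first one that matches wins.
                PATTERNS.each do |pat|
                        if m = pat[0].match(line)
                                puts "#{pat[1]}:#{m[0]}"
                                # Drop the matched text and any whitespace after it.
                                line = m.post_match().lstrip()
                                found = true
                                break
                        end
                end
                raise "Unrecognized token: #{line}" if !found
        end
end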

You can download this as a file:


To run the program, use the command

ruby Lexer.rb < inputFile

where inputFile is a file containing MiniLang tokens.

Your task is to add entries to the PATTERNS array to support additional token types. The ident_or_keyword token type is already handled; you only need to add support for the other token types.
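
As a rough illustration of how a new entry fits in, the sketch below adds one made-up token type, :lbrace, which is not part of MiniLang. It also wraps the matching loop in a hypothetical tokenize method so that it can be tried on a string instead of STDIN. In this sketch the patterns are tried in array order and the first one that matches wins, which matters when two token types can begin with the same character.

```ruby
# Sketch only: :lbrace is a made-up token type, not part of MiniLang.
DEMO_PATTERNS = [
  [/^[A-Za-z_][A-Za-z_0-9]*/, :ident_or_keyword],
  [/^\{/, :lbrace], # each entry is [regexp anchored at the start, token type]
]

# Hypothetical wrapper modeled on the starter program's loop: returns the
# "type:lexeme" strings instead of printing them.
def tokenize(line, patterns)
  tokens = []
  line = line.strip # this sketch also trims surrounding whitespace up front
  until line.empty?
    # Pick the first entry whose pattern matches the remaining input.
    entry = patterns.find { |pat| pat[0].match(line) }
    raise "Unrecognized token: #{line}" if entry.nil?
    m = entry[0].match(line)
    tokens << "#{entry[1]}:#{m[0]}"
    # Drop the matched text and any whitespace after it.
    line = m.post_match.lstrip
  end
  tokens
end

puts tokenize("foo { _bar", DEMO_PATTERNS)
```

Each entry you add to PATTERNS follows the same two-element shape as the :lbrace line: a regular expression beginning with ^, then the token type as a symbol.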

Ruby's regular expression syntax is borrowed from Perl. Here is a link to the documentation for Perl regular expressions:



You can check your solution against the expected output using the following test cases:


There are 5 test input files in the t directory. Each is accompanied by a corresponding output file in the oracle directory. Here is how you might test your solution:

ruby Lexer.rb < t/expr.mlang > actual.out
diff actual.out oracle/expr.out

If your Lexer.rb produces the expected output (it should), the diff command will not print anything.

You can try the commands above with the other input and expected output files (such as t/sum.mlang and oracle/sum.out).
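
If you would rather check every test in one go, a small Ruby script along the following lines could do it. This is only a sketch: the method name failing_tests is made up, and it assumes that every t/NAME.mlang file has its expected output in oracle/NAME.out, as the two examples above suggest. It also assumes that the ruby command is on your PATH.

```ruby
require "open3"

# Runs the lexer on every t/*.mlang file and compares what it prints with
# the matching oracle file. Returns the input files whose output differs.
# Assumption: t/NAME.mlang corresponds to oracle/NAME.out.
def failing_tests(lexer = "Lexer.rb")
  Dir.glob("t/*.mlang").sort.reject do |input|
    name = File.basename(input, ".mlang")
    # Feed the test file to the lexer on standard input and capture stdout.
    actual, _status = Open3.capture2("ruby", lexer, stdin_data: File.read(input))
    actual == File.read(File.join("oracle", "#{name}.out"))
  end
end

if __FILE__ == $PROGRAM_NAME
  failed = failing_tests
  puts failed.empty? ? "no differences found" : "differences in: #{failed.join(', ')}"
end
```

Run it from the directory that contains Lexer.rb, t, and oracle. For any file it reports, the diff command shown above will tell you exactly which lines differ.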


Submit your Lexer.rb file to Marmoset as assign2:


IMPORTANT: after uploading, you should download a copy of your submission and double-check it to make sure that it contains the correct file(s). You are responsible for making sure your submission is correct. You may receive a grade of 0 for an incorrectly submitted assignment.