CS 340 - Lecture 1

Introduce myself

Go over syllabus

Course outline

Formal languages

Language = set of strings of symbols

A string is a sequence of 0 or more symbols

The string containing 0 symbols is the empty string, denoted ε (epsilon)

Set of possible symbols is the "alphabet"

Language can be infinite
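
As a small illustration of these definitions, here is a sketch in Python. The alphabet {a, b} and the example language (strings with an even number of a's) are assumptions chosen for illustration, not part of the course material. Since a language can be infinite, the sketch represents it by a membership test and lists only its members up to a length bound.

```python
from itertools import product

ALPHABET = ("a", "b")  # assumed alphabet, for illustration only
EPSILON = ""           # the empty string, written ε in the notes


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` with at most `max_len` symbols."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def in_even_a_language(s):
    """Membership test for a hypothetical infinite language:
    all strings containing an even number of a's."""
    return s.count("a") % 2 == 0


# Every string over the alphabet with at most 2 symbols.
all_short = list(strings_up_to(ALPHABET, 2))

# The finite slice of the (infinite) language that fits in that bound.
finite_slice = [s for s in all_short if in_even_a_language(s)]
```

Raising the length bound never exhausts the language, which is why the membership test, rather than a list, is the natural representation.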

Why study formal languages?

Simplified model of computation

Specify syntax of programming languages

Regular languages

Simple but useful model of computation

In most programming languages, the set of legal tokens is a regular language

Two formalisms for specifying a regular language:

Regular expression - a "generator" for a regular language

Finite automaton - a "recognizer" for a regular language

Anyone seen the movie "Tron"?

Regular expressions

A regular expression "generates" all possible strings in a regular language

Specifying a regular expression

Any symbol in the alphabet is a regular expression generating the language that contains only that one-symbol string

The special symbol ε (epsilon) is a regular expression generating the language that contains only the empty string

Rules for combining regular expressions

More complex regular expressions are built out of simpler regular expressions.

Note that parentheses can be used for grouping.

Concatenation

If x and y are regular expressions generating regular languages Lx and Ly, then xy is a regular expression generating the language containing all strings formed by taking one member of Lx followed by one member of Ly.
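
The concatenation rule can be sketched with Python sets standing in for languages. The function name concat and the two small languages below are hypothetical, picked only to show the rule at work.

```python
def concat(lx, ly):
    """Language of all strings made of a member of lx followed by a member of ly."""
    return {x + y for x in lx for y in ly}


Lx = {"a", "ab"}  # hypothetical language generated by x
Ly = {"", "b"}    # hypothetical language generated by y ("" is ε)

L_xy = concat(Lx, Ly)
print(sorted(L_xy))
```

Here "a" + "b" and "ab" + "" both produce "ab", so the result has three members rather than four. Order matters: concat(Ly, Lx) is a different language.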

Disjunction

If x and y are regular expressions generating regular languages Lx and Ly, then (x|y) is a regular expression generating the language Lx ∪ Ly.  (I.e., the union of Lx and Ly.)
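
With languages modeled as Python sets, disjunction is just set union. This is a minimal sketch; the function name union and the example languages are assumptions for illustration.

```python
def union(lx, ly):
    """Language containing every string that is in lx, in ly, or in both."""
    return lx | ly


Lx = {"a", "aa"}  # hypothetical language generated by x
Ly = {"aa", "b"}  # hypothetical language generated by y

L_x_or_y = union(Lx, Ly)
print(sorted(L_x_or_y))
```

The string "aa" belongs to both languages but appears only once in the union, because a language is a set.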

Repetition

If x is a regular expression generating regular language Lx, then (x)* is a regular expression generating the language of all strings formed by concatenating 0 or more members of Lx.  (Kleene star operator.)

If x is a regular expression generating regular language Lx, then (x)+ is a regular expression generating the language of all strings formed by concatenating 1 or more members of Lx.  (Kleene plus operator.)
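
The two repetition operators usually generate infinite languages, so the sketch below computes only the part reachable with a bounded number of repetitions. The names repeat, star, and plus and the max_reps cutoff are assumptions made for this illustration.

```python
def repeat(lx, min_reps, max_reps):
    """Strings built by concatenating between min_reps and max_reps members of lx."""
    result = set()
    current = {""}  # concatenating 0 members gives only the empty string
    for count in range(max_reps + 1):
        if count >= min_reps:
            result |= current
        # Extend every string built so far by one more member of lx.
        current = {prefix + member for prefix in current for member in lx}
    return result


def star(lx, max_reps):
    """Bounded approximation of (x)*: 0 up to max_reps repetitions."""
    return repeat(lx, 0, max_reps)


def plus(lx, max_reps):
    """Bounded approximation of (x)+: 1 up to max_reps repetitions."""
    return repeat(lx, 1, max_reps)


print(sorted(star({"ba"}, 3)))
print(sorted(plus({"ba"}, 3)))
```

The only difference between the two results is the empty string, which comes from the zero-repetition case that star allows and plus does not.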

This may seem complicated, but some examples should clarify things.

In the examples below, the alphabet is {a, b}.

Regular expression    Language (set of strings)
a                     { a }
aa                    { aa }
a*                    { ε, a, aa, aaa, ... }
aa*                   { a, aa, aaa, ... }
a+                    { a, aa, aaa, ... }
ba+                   { ba, baa, baaa, ... }
(ba)+                 { ba, baba, bababa, ... }
(a|b)                 { a, b }
a|b*                  { a, ε, b, bb, bbb, ... }
(a|b)*                { ε, a, b, aa, ab, ba, bb, ... }
aa(ba)*bb             { aabb, aababb, aabababb, ... }
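
One way to check examples like these is with Python's re module, whose syntax for grouping, |, * and + agrees with the notation used here. This is a sketch, not part of the course tooling: the helper language_up_to is a hypothetical name, and it lists only the strings over {a, b} up to a chosen length that the whole pattern matches.

```python
import re
from itertools import product


def language_up_to(pattern, max_len, alphabet="ab"):
    """List the strings over `alphabet`, shortest first, of length at most
    `max_len` that are generated by the regular expression `pattern`."""
    compiled = re.compile(pattern)
    matches = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            # fullmatch requires the entire string to match, not just a prefix.
            if compiled.fullmatch(s):
                matches.append(s)
    return matches


print(language_up_to("a|b*", 3))
print(language_up_to("(ba)+", 4))
print(language_up_to("aa(ba)*bb", 6))
```

In the printed lists the empty string ε shows up as ''.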

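Note that the repetition operators (* and +) bind more tightly than concatenation, which in turn binds more tightly than the disjunction operator (|).  For example, ba+ means b(a)+, not (ba)+, and a|b* means a|(b)*, not (a|b)*.  Parentheses are sometimes necessary to ensure that a repetition operator applies to the regular expression you intend.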
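
The effect of parentheses can be seen by testing a few strings against both forms of an expression. The sketch below uses Python's re module, which follows the same binding rules for these operators; the helper name generates is an assumption for illustration.

```python
import re


def generates(pattern, s):
    """True if the regular expression `pattern` generates the whole string `s`."""
    return re.fullmatch(pattern, s) is not None


# a|b* is a single a, or any number of b's; it cannot mix a's and b's.
unparenthesized_ab = generates("a|b*", "ab")
parenthesized_ab = generates("(a|b)*", "ab")

# ba+ repeats only the a; (ba)+ repeats the whole group ba.
baa_without_parens = generates("ba+", "baa")
baa_with_parens = generates("(ba)+", "baa")
baba_without_parens = generates("ba+", "baba")
baba_with_parens = generates("(ba)+", "baba")
```

The first variable in each pair above differs from the second, which is exactly the difference the parentheses make.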

Lab exercise

Run RegeXeX on the problem set at the following URL:

http://faculty.ycp.edu/~dhovemey/fall2007/cs340/labs/08292007

Finite automata

A finite automaton is a "recognizer" for the strings of a regular language.

You can think of a finite automaton as a machine that takes a string of symbols as input and answers "yes" or "no", depending on whether or not the input string is a member of the regular language the automaton recognizes.

Here is a finite automaton that recognizes the language generated by the regular expression aa(ba)*bb:

[Figure: finite automaton recognizing aa(ba)*bb]
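
A recognizer for aa(ba)*bb can also be written as a transition table. The sketch below is one possible automaton for this language; the state numbers 0 through 4 are assumptions and need not match the states drawn in the lecture's diagram. Any (state, symbol) pair missing from the table means the machine answers "no".

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    (0, "a"): 1,  # first a
    (1, "a"): 2,  # second a
    (2, "b"): 3,  # a b that starts either another "ba" or the final "bb"
    (3, "a"): 2,  # completed one "ba"; ready for more
    (3, "b"): 4,  # completed the final "bb"
}
START = 0
ACCEPTING = {4}


def accepts(s):
    """Answer yes (True) or no (False): is s in the language of aa(ba)*bb?"""
    state = START
    for symbol in s:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


print(accepts("aabb"), accepts("aababb"), accepts("aab"))
```

The machine reads each input symbol exactly once and keeps track of nothing except its current state, which is what makes it "finite".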