Introduce myself

Go over syllabus

Course outline

Language = set of strings of symbols

A string is a sequence of 0 or more symbols

The string containing 0 symbols is the empty string, denoted ε (epsilon)

The set of possible symbols is the "alphabet"

A language can be infinite (e.g., the set of all strings over the alphabet {a, b})
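A program cannot store an infinite language as a finite list. It can, however, produce the members one at a time. The sketch below is an illustration that is not part of the course material, and the function name all_strings is hypothetical. It lazily enumerates every string over an alphabet, shortest first. It represents ε as the Python empty string "".

```python
from itertools import count, islice, product

def all_strings(alphabet: str):
    """Yield every string over `alphabet`, shortest first, starting with ε ("")."""
    for n in count(0):                      # n = 0, 1, 2, ... never stops
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)          # repeat=0 yields one empty tuple -> ""

# Take only the first 7 members of the (infinite) language.
print(list(islice(all_strings("ab"), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```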

Why study formal languages?

Simplified model of computation

Specify syntax of programming languages

Simple but useful model of computation

In most programming languages, the set of legal tokens is a regular
language

Two formalisms for specifying a regular language:

Regular expression - a "generator" for a regular language

Finite automaton - a "recognizer" for a regular language

Anyone seen the movie "Tron"?

A regular expression "generates" all possible strings in a regular language

Specifying a regular expression

Any symbol in the alphabet is a regular expression generating itself (i.e., the language containing just that one-symbol string)

The special symbol ε (epsilon) is a regular expression representing the empty string

Rules for combining regular expressions

More complex regular expressions are built out of simpler regular expressions.

Note that parentheses can be used for grouping.

Concatenation

If x and y are regular expressions generating regular languages Lx and Ly, then xy is a regular expression generating the language containing all strings formed by concatenating one member of Lx with one member of Ly (the member of Lx comes first).
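When Lx and Ly are finite, concatenation can be computed directly. The sketch below assumes both languages are given as Python sets of strings, and the name concat is hypothetical. It pairs every member of Lx with every member of Ly, and the Lx member always goes first.

```python
def concat(lx: set[str], ly: set[str]) -> set[str]:
    """Language generated by xy: a member of lx followed by a member of ly."""
    return {s + t for s in lx for t in ly}

# "" plays the role of ε, so "a" + ε = "a".
print(sorted(concat({"a", "aa"}, {"b", ""})))
# ['a', 'aa', 'aab', 'ab']
```

If either language is empty, no pairs can be formed, so the result is the empty set.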

Disjunction

If x and y are regular expressions generating regular languages Lx and Ly, then (x|y) is a regular expression generating the language Lx ∪ Ly. (I.e., the union of Lx and Ly.)
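For finite languages stored as Python sets, disjunction is plain set union. The sketch below uses the hypothetical name union. It shows the language of (a|b), and it also shows that a string found in both languages appears only once in the result.

```python
def union(lx: set[str], ly: set[str]) -> set[str]:
    """Language generated by (x|y): strings in lx, in ly, or in both."""
    return lx | ly

print(sorted(union({"a"}, {"b"})))             # (a|b) -> ['a', 'b']
print(sorted(union({"a", "ab"}, {"ab", "b"}))) # ['a', 'ab', 'b']
```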

Repetition

If x is a regular expression generating regular language Lx, then (x)* is a regular expression generating the language of all strings formed by concatenating 0 or more members of Lx. (Kleene star operator.) Concatenating 0 members gives ε, so ε is always in this language.
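The language of (x)* is infinite whenever Lx contains a non-empty string, so a program can only list part of it. The sketch below is illustrative. The name star and the max_len cutoff are assumptions introduced here. The function repeatedly appends members of Lx to the strings found so far, beginning with ε, and keeps every result no longer than max_len.

```python
def star(lx: set[str], max_len: int) -> set[str]:
    """Members of the language of (x)* with length <= max_len."""
    result = {""}        # 0 repetitions: the empty string ε
    frontier = {""}      # strings found in the previous round
    while frontier:
        new = set()
        for s in frontier:
            for t in lx:
                u = s + t   # one more repetition
                if len(u) <= max_len and u not in result:
                    new.add(u)
        result |= new
        frontier = new   # stops once no new short-enough strings appear
    return result

print(sorted(star({"a"}, 3)))    # ['', 'a', 'aa', 'aaa']
print(sorted(star({"ba"}, 4)))   # ['', 'ba', 'baba']
```

The loop always terminates, even when ε is a member of Lx, because the function adds only strings it has not yet seen and there are finitely many strings of length at most max_len.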

If x is a regular expression generating regular language Lx, then (x)+ is a regular expression generating the language of all strings formed by concatenating 1 or more members of Lx. (Kleene plus operator.)
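Kleene plus can be handled the same way as star, with one change: the search begins with exactly one repetition (the members of Lx themselves) rather than with ε. The sketch below is hypothetical, as are its name plus and its max_len cutoff. Under this construction ε belongs to the result only if ε is already a member of Lx.

```python
def plus(lx: set[str], max_len: int) -> set[str]:
    """Members of the language of (x)+ with length <= max_len."""
    result = {s for s in lx if len(s) <= max_len}   # exactly 1 repetition
    frontier = set(result)
    while frontier:
        new = set()
        for s in frontier:
            for t in lx:
                u = s + t   # one more repetition
                if len(u) <= max_len and u not in result:
                    new.add(u)
        result |= new
        frontier = new
    return result

print(sorted(plus({"a"}, 3)))    # ['a', 'aa', 'aaa']
print(sorted(plus({"ba"}, 6)))   # ['ba', 'baba', 'bababa']
```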

This may seem complicated, but some examples should clarify things.

In the examples below, the alphabet is {a, b}.

Regular expression    Language (set of strings)

a                     { a }
aa                    { aa }
a*                    { ε, a, aa, aaa, ... }
aa*                   { a, aa, aaa, ... }
a+                    { a, aa, aaa, ... }
ba+                   { ba, baa, baaa, ... }
(ba)+                 { ba, baba, bababa, ... }
(a|b)                 { a, b }
a|b*                  { a, ε, b, bb, bbb, ... }
(a|b)*                { ε, a, b, aa, ab, ba, bb, ... }
aa(ba)*bb             { aabb, aababb, aabababb, ... }
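Python's standard re module writes these particular examples with the same |, * and + notation, which makes it possible to spot-check rows of the table mechanically. The sketch below relies on re.fullmatch, which succeeds only when the pattern matches the whole string. Library regex engines add many features that go beyond formal regular expressions, so treat this only as a check on these simple patterns. The test strings are chosen here for illustration.

```python
import re

# (pattern, string, should the string be in the language?)
examples = [
    ("a*", "", True),            # ε is in a*
    ("aa*", "", False),          # but not in aa*
    ("ba+", "baaa", True),
    ("(ba)+", "baba", True),
    ("(ba)+", "baa", False),
    ("a|b*", "bbb", True),
    ("a|b*", "ab", False),
    ("(a|b)*", "abba", True),
    ("aa(ba)*bb", "aabababb", True),
    ("aa(ba)*bb", "aabab", False),
]

for pattern, s, expected in examples:
    in_language = re.fullmatch(pattern, s) is not None
    assert in_language == expected, (pattern, s)
print("all table checks passed")
```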

Note that the repetition operators (* and +) bind more tightly than the disjunction operator (|). Parentheses are sometimes necessary to ensure that a repetition operator applies to the regular expression you intend.
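Precedence changes which language an expression generates. The sketch below again uses Python's re.fullmatch, and the helper matches is hypothetical. It shows that a|b* is read as a|(b*), not (a|b)*. As the ba+ and (ba)+ rows of the table show, repetition also binds more tightly than concatenation.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """True if the whole of s is generated by pattern."""
    return re.fullmatch(pattern, s) is not None

print(matches("a|b*", "ab"))    # False: a|b* means a|(b*)
print(matches("(a|b)*", "ab"))  # True
print(matches("ba+", "baba"))   # False: ba+ means b(a)+
print(matches("(ba)+", "baba")) # True
```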

Run RegeXeX on the problem set at the following URL:

http://faculty.ycp.edu/~dhovemey/fall2007/cs340/labs/08292007

A finite automaton is a "recognizer" for the strings of a regular
language.

You can think of a finite automaton as a machine that takes a string
of symbols as input and answers "yes" or "no", depending on whether or
not the input string is a member of the regular language the automaton
recognizes.
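The "yes or no" machine can be simulated in a few lines when it is deterministic. This sketch makes several assumptions that are not taken from the notes. It stores the transitions in a dictionary keyed by (state, symbol), and it numbers states with integers. It rejects as soon as a state has no transition for the next symbol. The names run_dfa and BA_PLUS are hypothetical. The example machine recognizes ba+ from the table above.

```python
def run_dfa(transitions: dict[tuple[int, str], int], start: int,
            accepting: set[int], s: str) -> bool:
    """Feed s to the automaton one symbol at a time; answer yes (True) or no (False)."""
    state = start
    for symbol in s:
        next_state = transitions.get((state, symbol))
        if next_state is None:
            return False        # no transition for this symbol: reject
        state = next_state
    return state in accepting   # yes only if we stop in an accepting state

# Hypothetical automaton for ba+: b, then one or more a's.
BA_PLUS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 2,   # loop: any number of additional a's
}

for s in ["ba", "baaa", "b", "ab"]:
    print(s, run_dfa(BA_PLUS, 0, {2}, s))
```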

Here is a finite automaton that recognizes the language generated by the regular expression aa(ba)*bb
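The diagram for this automaton does not appear in the text. The sketch below is one deterministic automaton that recognizes the same language, aa(ba)*bb, and it may not match the original figure. The state numbers and the names TRANSITIONS and accepts are our own. As in a typical DFA, if no transition exists for the next symbol, the machine rejects.

```python
# One possible DFA for aa(ba)*bb (state numbering chosen here, not from the figure).
TRANSITIONS = {
    (0, "a"): 1,   # read the first a
    (1, "a"): 2,   # read "aa"; now at the start of the (ba)* loop
    (2, "b"): 3,   # read b: the start of either "ba" or "bb"
    (3, "a"): 2,   # it was "ba": ready for another repetition
    (3, "b"): 4,   # it was "bb": the string may end here
}
START = 0
ACCEPTING = {4}

def accepts(s: str) -> bool:
    """Answer yes (True) or no (False) for input string s."""
    state = START
    for symbol in s:
        if (state, symbol) not in TRANSITIONS:
            return False   # stuck: reject
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING

for s in ["aabb", "aababb", "aab", "aabbb"]:
    print(s, accepts(s))
```

State 3 carries the key decision. After aa and then a b, the next symbol determines whether the input is still inside (ba)* or has reached the final bb.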