It is possible to convert freely between regular expressions, deterministic finite automata (DFAs), and nondeterministic finite automata (NFAs).

Given one, we can convert it to any of the other forms.

In general, any regular expression X can be converted to an equivalent NFA, called NFA_{X}, containing a single start state and a single accepting state. The construction proceeds by cases on the form of X.

Case 1: sequence of symbols

output is sequence of states with transitions accepting those symbols

e.g., the regular expression abba yields the NFA

e.g., the regular expression ε yields the NFA
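A minimal Python sketch of Case 1. The representation is an assumption made for illustration, not something the lecture prescribes: an NFA is a tuple (start, accept, transitions), states are integers, each transition is a (source, label, target) triple, and the label None stands for ε. For the expression ε the sketch assumes a single ε-transition between the two states.

```python
EPSILON = None  # assumed label for ε-transitions


def nfa_for_sequence(symbols):
    """Build an NFA for a sequence of symbols (Case 1).

    State i moves to state i + 1 on the i-th symbol.
    Returns (start, accept, transitions).
    """
    if not symbols:
        # The expression ε: start and accepting state joined by one ε-transition.
        return 0, 1, [(0, EPSILON, 1)]
    transitions = [(i, sym, i + 1) for i, sym in enumerate(symbols)]
    return 0, len(symbols), transitions


start, accept, transitions = nfa_for_sequence("abba")
print(start, accept)
for transition in transitions:
    print(transition)
```

For abba this gives five states in a row, with state 0 as the start state and state 4 as the accepting state.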

Case 2: disjunction

if A and B are regular
expressions whose equivalent NFAs are NFA_{A} and
NFA_{B}, then we can construct an NFA called NFA_{A|B}
that accepts the language generated by A|B as follows:

create start and accepting
states of NFA_{A|B}

use ε-transitions
to connect the start state of NFA_{A|B} to
the start states of NFA_{A} and
NFA_{B}

change start states
of NFA_{A} and
NFA_{B} so that they are no longer start states

use ε-transitions
to connect accepting states of NFA_{A}
and
NFA_{B} to the accepting state of NFA_{A|B}

change accepting states
of NFA_{A}
and
NFA_{B} so that they are no longer accepting states

e.g., the NFA recognizing the language generated by abba|bab:
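As a rough Python sketch of the Case 2 steps, assuming each NFA is a tuple (start, accept, transitions) with (source, label, target) triples and None as the ε label. The function name `union` and the state names are hypothetical.

```python
EPSILON = None  # assumed label for ε-transitions


def union(nfa_a, nfa_b, new_start, new_accept):
    """Combine two NFAs into one that accepts A|B (Case 2).

    State names in nfa_a and nfa_b, plus new_start and new_accept,
    are assumed to be distinct.
    """
    start_a, accept_a, trans_a = nfa_a
    start_b, accept_b, trans_b = nfa_b
    transitions = trans_a + trans_b + [
        (new_start, EPSILON, start_a),    # new start -> old start of A
        (new_start, EPSILON, start_b),    # new start -> old start of B
        (accept_a, EPSILON, new_accept),  # old accept of A -> new accept
        (accept_b, EPSILON, new_accept),  # old accept of B -> new accept
    ]
    # The old start and accepting states lose their roles because only
    # new_start and new_accept are returned as start/accepting state.
    return new_start, new_accept, transitions


# Case 1 NFAs for abba and bab, written as chains of states.
nfa_abba = ("a0", "a4", [(f"a{i}", ch, f"a{i + 1}") for i, ch in enumerate("abba")])
nfa_bab = ("b0", "b3", [(f"b{i}", ch, f"b{i + 1}") for i, ch in enumerate("bab")])

start, accept, transitions = union(nfa_abba, nfa_bab, "s", "f")
print(start, accept, len(transitions))
```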

Case 3: repetition

If A is a regular expression
whose equivalent NFA is NFA_{A}, then we can
construct an NFA called NFA_{A*} which accepts the
language generated by A* as follows:

create start and accepting
states of NFA_{A*}

create an ε-transition from
start state to accepting state of NFA_{A*}

create an ε-transition from
accepting state to start state of NFA_{A*}

create ε-transition
from start state of NFA_{A*} to start
state of NFA_{A}

change start state
of NFA_{A} so that it is not a start state

create ε-transitions from
accepting state of NFA_{A} to accepting
state of NFA_{A*}

change accepting state
of NFA_{A} so that it is not an accepting
state

e.g., construct NFA that recognizes language generated by (abba|bab)*
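The Case 3 steps can be sketched in Python in the same spirit. The tuple representation (start, accept, transitions), the None label for ε, and the helper names `star` and `accepts` are all assumptions made for this illustration. The small simulator is included only to check that the result accepts zero or more repetitions; for brevity the example repeats the expression ab rather than abba|bab.

```python
EPSILON = None  # assumed label for ε-transitions


def star(nfa_a, new_start, new_accept):
    """Build an NFA for A* from an NFA for A (Case 3)."""
    start_a, accept_a, trans_a = nfa_a
    transitions = trans_a + [
        (new_start, EPSILON, new_accept),  # allows zero repetitions
        (new_accept, EPSILON, new_start),  # allows another repetition
        (new_start, EPSILON, start_a),     # enter NFA_A
        (accept_a, EPSILON, new_accept),   # leave NFA_A
    ]
    return new_start, new_accept, transitions


def accepts(nfa, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    start, accept, transitions = nfa

    def closure(states):
        # Add every state reachable through ε-transitions alone.
        stack, seen = list(states), set(states)
        while stack:
            state = stack.pop()
            for src, label, dst in transitions:
                if src == state and label is EPSILON and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        moved = {dst for src, label, dst in transitions
                 if src in current and label == ch}
        current = closure(moved)
    return accept in current


nfa_ab = (0, 2, [(0, "a", 1), (1, "b", 2)])  # Case 1 NFA for ab
nfa_ab_star = star(nfa_ab, "s", "f")
for text in ["", "ab", "abab", "a", "aba"]:
    print(repr(text), accepts(nfa_ab_star, text))
```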

Case 4: concatenation

if A and B are regular
expressions whose equivalent NFAs are NFA_{A} and
NFA_{B}, then we can construct an NFA called NFA_{AB}
that accepts the language generated by AB as follows:

create start and accepting
states of NFA_{AB}

create ε-transition from
start state of NFA_{AB} to start state
of NFA_{A}

change start state
of NFA_{A} so it is not a start state

create ε-transition from
accepting state of NFA_{A} to start state
of NFA_{B}

change accepting state
of NFA_{A} so it is not an accepting state

change start state
of NFA_{B} so it is not a start state

create ε-transition from
accepting state of NFA_{B} to accepting
state of NFA_{AB}

change accepting state
of NFA_{B} so it is not an accepting state

e.g.: construct NFA that recognizes (a|b)c

first part: (a|b)

second part: c

overall NFA: (a|b)c
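A short Python sketch of the Case 4 steps applied to (a|b)c. As assumptions for illustration, NFAs are tuples (start, accept, transitions), None labels an ε-transition, and the state numbers are made up. The NFA for (a|b) is written out by hand in the shape that Case 2 produces.

```python
EPSILON = None  # assumed label for ε-transitions


def concat(nfa_a, nfa_b, new_start, new_accept):
    """Build an NFA for AB from NFAs for A and B (Case 4)."""
    start_a, accept_a, trans_a = nfa_a
    start_b, accept_b, trans_b = nfa_b
    transitions = trans_a + trans_b + [
        (new_start, EPSILON, start_a),    # new start -> start of A
        (accept_a, EPSILON, start_b),     # accept of A -> start of B
        (accept_b, EPSILON, new_accept),  # accept of B -> new accept
    ]
    return new_start, new_accept, transitions


# First part: (a|b), with start state 1 and accepting state 6.
nfa_a_or_b = (1, 6, [
    (1, EPSILON, 2), (1, EPSILON, 4),
    (2, "a", 3), (4, "b", 5),
    (3, EPSILON, 6), (5, EPSILON, 6),
])
# Second part: c.
nfa_c = (7, 8, [(7, "c", 8)])

# Overall NFA: (a|b)c, with new start state 0 and new accepting state 9.
nfa = concat(nfa_a_or_b, nfa_c, 0, 9)
print(nfa[0], nfa[1], len(nfa[2]))
```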

Here is a sketch of the algorithm to convert an NFA into a DFA:

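Rule: in an NFA, if there is an ε-transition from state p to state q, then whenever the NFA is in p it can also be in q without consuming any input. The set of NFA states "equivalent" to a state p is therefore p together with every state reachable from p using only ε-transitions (the ε-closure of p).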

Define a table mapping sets of NFA states to corresponding DFA states.

-- this function converts an NFA to a DFA

function Convert_NFA_To_DFA() {
    work list := new empty queue
    Start := set of NFA states equivalent to NFA start state
    enqueue Start on to work list
    while (work list is not empty) {
        dequeue a set of NFA states S from the work list
        if (S has not been processed yet) {
            mark S as processed
            D := Map_NFA_States_To_DFA_State(S)
            for each symbol Y in alphabet {
                T := set of NFA states reachable from a state in S on Y,
                     plus all NFA states equivalent to them
                E := Map_NFA_States_To_DFA_State(T)
                create DFA transition from D to E on symbol Y
                enqueue T on to the work list
            }
        }
    }
    mark the first DFA state created as the DFA start state
}

-- this function returns the DFA state corresponding to a set of NFA states, creating the DFA state if necessary

function Map_NFA_States_To_DFA_State(U) {
    if (table contains entry for U) {
        return the DFA state in table corresponding to U
    }
    create new DFA state F in table corresponding to U
    if (U contains an NFA accepting state) {
        make F an accepting state
    }
    return F
}
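One way the sketch above might look as runnable Python. Several things here are assumptions rather than part of the lecture: the NFA is given as a dictionary from (state, symbol) to a set of states, None labels ε-transitions, and the function names are hypothetical. The sample NFA for (aa|ab)* is reconstructed from the state sets in the table of the example that follows, since the figure itself is not reproduced here. Unlike the pseudocode, this version skips empty target sets, so it builds no dead state and the resulting DFA is partial.

```python
from collections import deque

EPSILON = None  # assumed label for ε-transitions


def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, EPSILON), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def nfa_to_dfa(start, accepting, delta, alphabet):
    """Subset construction. Returns (table, dfa_delta, dfa_accepting).

    table maps each frozenset of NFA states to a DFA state number;
    DFA state 0 is the start state.
    """
    table, dfa_delta, dfa_accepting = {}, {}, set()

    def dfa_state_for(nfa_states):
        # Look up the DFA state for this set, creating it if necessary.
        if nfa_states not in table:
            table[nfa_states] = len(table)
            if nfa_states & accepting:
                dfa_accepting.add(table[nfa_states])
        return table[nfa_states]

    start_set = epsilon_closure({start}, delta)
    dfa_state_for(start_set)
    work_list, processed = deque([start_set]), set()
    while work_list:
        current = work_list.popleft()
        if current in processed:
            continue
        processed.add(current)
        for symbol in alphabet:
            moved = set()
            for state in current:
                moved |= delta.get((state, symbol), set())
            target = epsilon_closure(moved, delta)
            if not target:
                continue  # design choice: no dead state is created
            dfa_delta[(table[current], symbol)] = dfa_state_for(target)
            work_list.append(target)
    return table, dfa_delta, dfa_accepting


# Assumed NFA for (aa|ab)*: start state 0, accepting state 7.
delta = {
    (0, EPSILON): {1, 4, 7},
    (1, "a"): {2}, (2, "a"): {3},
    (4, "a"): {5}, (5, "b"): {6},
    (3, EPSILON): {7}, (6, EPSILON): {7},
    (7, EPSILON): {0},
}
table, dfa_delta, dfa_accepting = nfa_to_dfa(0, {7}, delta, "ab")
for nfa_states, dfa_state in table.items():
    print(sorted(nfa_states), "->", dfa_state)
```

Using frozensets as dictionary keys plays the role of the table mapping sets of NFA states to DFA states.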

Example: convert the NFA produced by translating the regular expression (aa|ab)* into a DFA.

Input NFA:

Output DFA:

Table of NFA state sets to DFA states:

NFA state set        DFA state
{0, 1, 4, 7}         0
{2, 5}               1
{0, 1, 3, 4, 7}      2
{0, 1, 4, 6, 7}      3

The algorithm to convert a DFA to a regular expression is left as an exercise for the reader :-)

Given the existence of all 3 algorithms (regexp -> NFA, NFA -> DFA, DFA -> regexp), we can easily see that regular expressions, DFAs, and NFAs are equivalent: each formalism describes exactly the same class of languages, the regular languages.

Ok, you say, we have 3 equivalent formalisms for describing regular languages. What's the big deal?

It turns out that the algorithms described in this lecture have great practical importance in the implementation of programming languages.

The rules describing how to form the tokens of a programming language are, for most well-known programming languages, a regular language. For example, in C, an identifier token must begin with a letter or an underscore, after which we can have any sequence of letters, digits, or underscore characters.

Let us consider an extended regular expression syntax where

[A-Z] means any capital letter

[a-z] means any lower case letter

[0-9] means any digit

So, a regular expression describing C identifiers is

([A-Z]|[a-z]|_)([A-Z]|[a-z]|[0-9]|_)*
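For comparison, here is how such an expression might be tried out with Python's standard `re` module. This is only an illustration: Python's syntax lets the alternatives be merged into single character classes, and, following the C language itself, the pattern also lets an identifier begin with an underscore. The name `is_identifier` is hypothetical.

```python
import re

# Character classes replace the (x|y|z) alternations of the notation above.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """True if the whole string has the shape of a C identifier."""
    return C_IDENTIFIER.fullmatch(text) is not None


for candidate in ["count", "_tmp1", "x_2", "2fast", "a-b", ""]:
    print(repr(candidate), is_identifier(candidate))
```

Note that the pattern also matches keywords such as while; a scanner typically separates keywords from identifiers in a later step.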

Similar regular expressions can be constructed for other kinds of tokens, such as numeric literals, string literals, etc. Regular expressions are a very convenient format for specifying the lexical structure of a language.

Once we have defined a regular expression for each kind of token, we can combine all of the regular expressions using disjunction into a single regular expression that generates all of the tokens of the language.
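The idea of combining per-token expressions by disjunction can be sketched with Python's `re` module. The three token kinds and their patterns are hypothetical and far simpler than those of a real language; named groups are used so that we can tell which alternative matched.

```python
import re

# Hypothetical token kinds; a real language would have many more.
TOKEN_PATTERNS = [
    ("NUMBER", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("STRING", r'"[^"\n]*"'),
]

# Join the per-token expressions with | into one expression, wrapping
# each alternative in a named group.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def classify(lexeme):
    """Return the kind of token the whole lexeme forms, or None."""
    match = COMBINED.fullmatch(lexeme)
    return match.lastgroup if match else None


for lexeme in ["42", "x1", '"hi"', "4x"]:
    print(repr(lexeme), classify(lexeme))
```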

An implementation of a programming language must take a source program as input and translate it into executable form. The first phase of this process, performed by the scanner, takes the sequence of characters in the source program and turns them into a sequence of tokens.

We can simplify this part of the programming language implementation using a tool called a scanner generator. A scanner generator allows the language designer to specify the legal tokens of the language using regular expressions. Then, the scanner generator translates the regular expressions into an NFA, which is further translated into a DFA. DFAs have the nice characteristic that they can be implemented as a table-driven state machine which can process any sequence of characters quickly while using a finite amount of memory. The scanner generator creates source code for a table-driven DFA which recognizes the language described by the original regular expressions.
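To make "table-driven" concrete, here is a hand-written Python sketch of a DFA for identifiers. It is not the output of any scanner generator; the state numbers, character classes, and table layout are assumptions chosen for illustration, and the underscore is grouped with the letters, as C allows.

```python
# Character classes (columns of the table).
LETTER, DIGIT, OTHER = 0, 1, 2
# DFA states (rows of the table).
START, IN_IDENT, REJECT = 0, 1, 2

TABLE = [
    # LETTER    DIGIT     OTHER
    [IN_IDENT, REJECT,   REJECT],  # START
    [IN_IDENT, IN_IDENT, REJECT],  # IN_IDENT
    [REJECT,   REJECT,   REJECT],  # REJECT (dead state)
]
ACCEPTING = {IN_IDENT}


def char_class(ch):
    """Map a character to its column in the transition table."""
    if ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z":
        return LETTER
    if "0" <= ch <= "9":
        return DIGIT
    return OTHER


def dfa_accepts(text):
    """Run the DFA: one table lookup per input character."""
    state = START
    for ch in text:
        state = TABLE[state][char_class(ch)]
    return state in ACCEPTING


for text in ["abc1", "_x", "1abc", "a b", ""]:
    print(repr(text), dfa_accepts(text))
```

The only memory the loop needs beyond the fixed table is the current state, which is why the input can be arbitrarily long.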

One popular scanner generator which creates C/C++ source code for fast scanners is flex: