CS 340 - Lecture 5

Context-free languages and pushdown automata

The strings in a context-free language can be generated by a context-free grammar.  We also discussed the problem of parsing: given an input string and a context-free grammar, find a derivation (sequence of productions) that derives the input string from the grammar's start symbol.  Parsing is thus a problem of recognizing whether or not an input string is a member of the language defined by a grammar.  Various parsing algorithms exist to solve this problem.

You might wonder whether or not there is a kind of automaton that can serve as a recognizer for context-free languages.  Pushdown automata (PDAs) are a class of automata that are powerful enough to serve as recognizers for context-free languages.

Pushdown automata are similar to finite automata: they consist of states and transitions.  However, an infinitely long "tape" is added to the automaton.  The tape can serve as storage for data that the automaton needs to remember.  In any state, the automaton chooses to do one of three things:

  1. Read a symbol from the input string

  2. Push a symbol on the tape

  3. Pop a symbol from the tape

Symbols are written to and read from the tape in last-in, first-out order.  In other words, the symbol popped from the tape by the automaton will be the one most recently pushed.  This is exactly equivalent to a stack data structure.

Let's describe a PDA that recognizes the language

a^n b^n

(All strings of the form n a's followed by n b's, for n >= 0.)

Note that the PDA uses a special symbol Δ, which is not part of the language's alphabet, in two ways:

  1. It is pushed onto the tape as the automaton's first action

  2. It is appended to the input string, meaning that it will be the last symbol read from the input string

Here is the state diagram for the PDA:


The PDA works as follows:

The special symbol Δ is pushed onto the tape

If the first symbol read is Δ, then the string is accepted (the empty string is a member of the language)

For each a symbol read, an a symbol is pushed onto the tape

When the first b symbol is encountered, the PDA attempts to pop a matching a symbol from the tape

Subsequent b symbols must be matched with a symbols popped from the tape

When the terminating Δ symbol is read, a matching Δ symbol must be popped, in which case the input string is accepted

Any time an unexpected symbol is encountered (from the input string or the tape), the input string is rejected.
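The steps above can be sketched in Python.  This is a minimal simulation under stated assumptions, not a transcription of the state diagram: the function name accepts_anbn, the two state names, and the use of a Python list as the tape (stack) are all made up for illustration.

```python
DELTA = "Δ"  # end marker; not part of the language's alphabet


def accepts_anbn(s):
    """Return True if s is n a's followed by n b's, for n >= 0."""
    if DELTA in s:
        return False                 # Δ may only appear as the appended marker
    stack = [DELTA]                  # first action: push Δ onto the tape
    state = "read_a"                 # hypothetical state names
    for symbol in s + DELTA:         # Δ is appended to the input string
        if state == "read_a" and symbol == "a":
            stack.append("a")        # push one a for each a read
        elif symbol == "b":
            if stack.pop() != "a":   # each b must match a pushed a
                return False
            state = "read_b"         # after the first b, no more a's allowed
        elif symbol == DELTA:
            # end of input: the matching Δ must be on top of the stack
            return stack.pop() == DELTA
        else:
            return False             # unexpected input symbol
    return False


print(accepts_anbn("aabb"), accepts_anbn("aab"))
```

A string such as "aab" is rejected at the final step: when the terminating Δ is read, the symbol popped is a leftover a rather than the matching Δ.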

The PDA shown above is a deterministic pushdown automaton (D-PDA) because each state has only one transition per input symbol.  Note that nondeterministic pushdown automata (N-PDAs) are possible; they may have states with multiple transitions on the same input symbol.  Interestingly, the power of D-PDAs and N-PDAs is not equivalent; N-PDAs can recognize some languages that D-PDAs cannot.  (Contrast this with DFAs and NFAs, which have exactly the same expressive power.)

Deterministic context-free languages are the subset of context-free languages that can be recognized by a D-PDA.  N-PDAs can recognize any context-free language, and thus are more general.

Limits to the expressive power of context-free languages

We saw that some interesting languages were not regular languages.  For example, arbitrary palindromes and other languages with balanced symbols are not regular.

Context-free languages are a more powerful class of languages, and do include languages with balanced constructs.  However, are there languages that are not context-free?

The answer is yes.  An example is the language

a^n b^n c^n

i.e., the language containing strings of the form n repetitions of a, followed by n repetitions of b, followed by n repetitions of c, for n >= 0.

Turing Machines

A Turing machine (TM) is the most powerful kind of automaton that we will discuss.  In fact, Turing machines are capable of solving any problem that can be solved by computation.  (There are some problems that are not solvable by a computation, as we will see shortly.)

Turing machines are named after Alan Turing, the mathematician who invented them.

Turing machines are similar to PDAs: they consist of states and transitions, and use an infinite tape for storage.  Unlike PDAs, however, the use of the tape by a Turing machine is not limited to pushing and popping symbols: it can move the tape either left or right after writing a symbol on the tape.  Unlike a PDA, the tape is used as BOTH the input string and the temporary storage.  In addition, when the Turing machine terminates we can consider the contents of the tape to be output.  In this way, a Turing machine is more than a recognizer for strings in a language.

A Turing machine, like a PDA, has a "tape head" indicating the current location on the tape that the Turing machine is looking at.  The symbol underneath the tape head is used to determine which transition will be followed.

Each transition in the state diagram of a Turing machine is labeled with three symbols:

(i, o, d)

"i" is an input symbol.  The transition will be taken if the symbol underneath the tape head matches this symbol.

"o" is an output symbol.  If the transition is taken, then this symbol is written to the location underneath the tape head, overwriting whatever symbol was there previously.

"d" is a direction: L (left) or R (right).  If the transition is taken the tape head is moved one position in the specified direction.

The Turing machine completes its computation if it reaches a state labeled "Halt".
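To make the (i, o, d) labels concrete, here is a small Python sketch of how a single transition could be applied.  The representation is an assumption made for illustration: transitions are stored in a dictionary keyed by (state, i), the tape is a finite Python list standing in for the infinite tape, and the names step and TOY are hypothetical.

```python
def step(transitions, state, tape, head):
    """Apply one (i, o, d) transition and return the new (state, head).

    Returns None when no transition matches the symbol under the head.
    """
    key = (state, tape[head])              # "i": symbol under the tape head
    if key not in transitions:
        return None
    output, direction, next_state = transitions[key]
    tape[head] = output                    # "o": overwrite the current cell
    head += 1 if direction == "R" else -1  # "d": move one position
    return next_state, head


# Hypothetical toy machine: rewrite every a as A, then halt at Δ.
TOY = {
    ("scan", "a"): ("A", "R", "scan"),
    ("scan", "Δ"): ("Δ", "R", "Halt"),
}

tape, state, head = list("aaΔ"), "scan", 0
while state != "Halt":
    result = step(TOY, state, tape, head)
    if result is None:
        break                              # no matching transition
    state, head = result

print("".join(tape), state)
```

Because the tape is the only storage, the rewritten tape contents double as the machine's output once the Halt state is reached.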

Turing machine that can recognize the language a^n b^n c^n

Here is a description of a Turing machine that can recognize the language a^n b^n c^n, which we have noted is not a context-free language.

The Turing machine will operate by scanning from left to right, replacing one set of a,b,c symbols with upper case symbols A,B,C.  When it reaches the Δ symbol marking the end of the string, it will "rewind" from right to left to work on the next set of a,b,c symbols.  If no more a symbols remain in the string, it will scan from left to right to verify that no more b or c symbols remain.  If the verification succeeds (no b or c symbols are encountered), then the original string has been accepted as a member of the language.

Example: we will start out with a tape that looks like this, representing the string "aabbcc".  (The tape head is positioned at the symbol in square brackets.)

[a]abbccΔ

Here is how the TM progresses:

A[a]bbccΔ

Aa[b]bccΔ

AaB[b]ccΔ

AaBb[c]cΔ

AaBbC[c]Δ

AaBbCc[Δ]

("rewind" by moving left until the Turing machine encounters an A symbol)

[A]aBbCcΔ    (move right)

A[a]BbCcΔ    (replace one more a,b,c set with A,B,C)

AA[B]bCcΔ

AAB[b]CcΔ

AABB[C]cΔ

AABBC[c]Δ

AABBCC[Δ]

("rewind" by moving left until the Turing machine encounters an A symbol)

A[A]BBCCΔ    (move right)

AA[B]BCCΔ    (because a B was encountered, there are no more a symbols, so we verify that no b or c symbols remain)

AAB[B]CCΔ

AABB[C]CΔ

AABBC[C]Δ

AABBCC[Δ]    (Halt)

If at any step in the process the Turing machine encounters an unexpected symbol (one with no matching transition), it never reaches the Halt state, and the original input string is considered rejected.
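The procedure described above can be simulated in Python.  The transition table below is a reconstruction from the prose description and the example trace, not a copy of the state diagram; the state names (find_a, find_b, find_c, to_end, rewind, verify), the helper skip, and the function accepts_anbncn are hypothetical.  The rule that accepts the empty string directly is also an assumption, added so that n = 0 is handled.

```python
DELTA = "Δ"  # marks the end of the input on the tape


def skip(state, symbols, direction="R"):
    """Transitions that pass over `symbols` unchanged, staying in `state`."""
    return {(state, s): (s, direction, state) for s in symbols}


# (state, i) -> (o, d, next state)
RULES = {
    ("find_a", "a"): ("A", "R", "find_b"),    # mark one a
    ("find_a", "B"): ("B", "R", "verify"),    # no a symbols remain
    ("find_a", DELTA): (DELTA, "R", "Halt"),  # empty string (assumed rule)
    **skip("find_b", "aB"),
    ("find_b", "b"): ("B", "R", "find_c"),    # mark one b
    **skip("find_c", "bC"),
    ("find_c", "c"): ("C", "R", "to_end"),    # mark one c
    **skip("to_end", "c"),
    ("to_end", DELTA): (DELTA, "L", "rewind"),
    **skip("rewind", "aBbCc", "L"),           # move left until an A
    ("rewind", "A"): ("A", "R", "find_a"),
    **skip("verify", "BC"),                   # only B and C may remain
    ("verify", DELTA): (DELTA, "R", "Halt"),
}


def accepts_anbncn(s):
    """Run the machine on s; True if it reaches the Halt state."""
    if any(ch not in "abc" for ch in s):
        return False                          # input alphabet is {a, b, c}
    tape, state, head = list(s) + [DELTA], "find_a", 0
    while state != "Halt":
        rule = RULES.get((state, tape[head]))
        if rule is None:
            return False                      # unexpected symbol: reject
        tape[head], direction, state = rule   # write o, then change state
        head += 1 if direction == "R" else -1
    return True


print(accepts_anbncn("aabbcc"), accepts_anbncn("aabbc"))
```

On the input "aabbcc" this table visits the same tape configurations as the example above.  On "aabbc" the second pass runs into Δ while still looking for a c, finds no matching transition, and rejects.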

Here is a state diagram of this Turing Machine:


Turing Completeness

It may seem surprising, but Turing machines have been shown to be at least as powerful as every "reasonable" known model of computation.  For example, if we wanted to we could translate a C++ or Java program into a Turing machine.

Any model of computation that can simulate an arbitrary Turing machine is said to be Turing complete.