The strings in a context-free language can be generated by a
context-free grammar. We also discussed the problem of
parsing:
given an input string and a context-free grammar, find a derivation
(sequence of productions) that derives the input string from the
grammar's start symbol. Parsing is thus a problem of
recognizing
whether or not an input string is a member of the language defined by a
grammar. Various parsing algorithms exist to solve this
problem.
You might wonder whether or not there is a kind of automaton
that
can serve as a recognizer for context-free languages. Pushdown automata
(PDAs) are a
class of automata powerful enough to serve as recognizers for
context-free languages.
Pushdown automata are similar to finite automata: they consist
of
states and transitions. However, an infinitely long "tape" is
added to the automaton. The tape can serve as storage for
data
that the automaton needs to remember. In any state, the
automaton
chooses to do one of three things:
Read a symbol from the input string
Push a symbol on the tape
Pop a symbol from the tape
Symbols are written to and read from the tape in last-in,
first-out
order. In other words, the symbol popped from the tape by the
automaton will be the one most recently pushed. This is
exactly
equivalent to a stack
data structure.
Let's describe a PDA that recognizes the language
a^{n}b^{n}
(all strings of the form n a's followed by n b's, for n >= 0).
Note that the PDA uses a special symbol Δ, which is not part of the
language's alphabet, in two ways:
It is pushed onto the tape as the automaton's first action
It is appended to the input string, meaning that it will be the last symbol read from the input string
Here is the state diagram for the PDA:
The PDA works as follows:
The special symbol Δ is pushed onto the tape
If the first symbol read is Δ, then the string is accepted (the empty string is a member of the language)
For each a symbol read, an a symbol is pushed onto the tape
When the first b symbol is encountered, the PDA attempts to pop a matching a symbol from the tape
Subsequent b symbols must be matched with a symbols popped from the tape
When the terminating Δ symbol is read, a matching Δ symbol must be popped, in which case the input string is accepted
Any time an unexpected symbol is encountered (from the input
string
or the tape), the input string is rejected.
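The steps above can be sketched in Python. This is a minimal simulation, not a transcription of the state diagram: the two state names (`read_a`, `read_b`) and the function name `accepts_anbn` are assumptions made for illustration, and a Python list stands in for the tape used as a stack.

```python
DELTA = "Δ"  # special marker; assumed not to be part of the input alphabet


def accepts_anbn(s: str) -> bool:
    """Simulate a deterministic PDA for a^n b^n, n >= 0 (illustrative sketch)."""
    if DELTA in s:
        return False                 # Δ may only appear as the appended end marker
    stack = [DELTA]                  # first action: push Δ
    state = "read_a"                 # hypothetical state names
    for symbol in s + DELTA:         # Δ is appended to the input string
        if state == "read_a":
            if symbol == "a":
                stack.append("a")    # remember each a
            elif symbol == "b":
                if stack.pop() != "a":
                    return False     # a b with no a to match
                state = "read_b"
            else:                    # symbol is Δ: only valid if no a is pending
                return stack.pop() == DELTA
        else:                        # state == "read_b"
            if symbol == "b":
                if stack.pop() != "a":
                    return False     # more b's than a's
            elif symbol == DELTA:
                return stack.pop() == DELTA   # all a's must have been matched
            else:
                return False         # an a after a b is unexpected
    return False
```

For instance, `accepts_anbn("aabb")` returns `True`, while `accepts_anbn("aab")` returns `False` because an unmatched a is still on the stack when Δ is read.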
The PDA shown above is a deterministic
pushdown automaton (D-PDA) because each state has only one
transition per input symbol. Note that nondeterministic
pushdown
automata (N-PDAs) are possible; they may have states that have multiple
transitions on the same input symbol. Interestingly, the
power of
D-PDAs and N-PDAs is not equivalent; N-PDAs can recognize some
languages that D-PDAs cannot. (Contrast this with DFAs and
NFAs,
which have exactly the same expressive power.)
Deterministic context-free languages are the subset of
context-free
languages that can be recognized by a D-PDA. N-PDAs can
recognize any context-free language, and thus are more general.
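To make nondeterminism concrete, here is a small Python sketch that simulates an N-PDA by trying every choice. The language used, even-length palindromes over {a, b}, is a standard textbook example of a context-free language that no D-PDA recognizes; it is chosen here for illustration and is not taken from the state diagram above. The function name and the use of recursion to explore choices are likewise assumptions.

```python
def accepts_even_palindrome(s: str) -> bool:
    """Simulate an N-PDA for even-length palindromes over {a, b} (sketch)."""
    if any(ch not in "ab" for ch in s):
        return False

    def run(i: int, stack: tuple, popping: bool) -> bool:
        if popping:
            if i == len(s):
                return not stack            # accept only with an empty stack
            # Popping phase: the input symbol must match the top of the stack.
            return bool(stack) and stack[-1] == s[i] and run(i + 1, stack[:-1], True)
        # Pushing phase: the automaton has two possible moves here.
        # Choice 1: guess that the middle of the string has been reached.
        if run(i, stack, True):
            return True
        # Choice 2: push the next input symbol and keep going.
        return i < len(s) and run(i + 1, stack + (s[i],), False)

    return run(0, (), False)
```

The input is accepted if any sequence of choices leads to acceptance. A deterministic machine would have to know where the middle of the string is without guessing, which is what it cannot do for this language.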
We saw that some interesting languages were not regular
languages. For example, arbitrary palindromes and other
languages
with balanced symbols are not regular.
Context-free languages are a more powerful class of languages,
and
do include languages with balanced constructs. However, are
there
languages that are not context-free?
The answer is yes. An example is the language
a^{n}b^{n}c^{n}
i.e., the language containing strings of the form n
repetitions of
a, followed by n repetitions of b, followed by n repetitions of c, for
n >= 0.
A Turing machine (TM) is the most powerful kind of automaton
that we
will
discuss. In fact, Turing machines are capable of solving any
problem that can be solved by computation. (There are some
problems that are not solvable by a computation, as we will see
shortly.)
Turing machines are named after Alan Turing,
the
mathematician who invented them.
Turing machines are similar to PDAs: they consist of states
and
transitions, and use an infinite tape for storage. Unlike
PDAs,
however, the use of the tape by a Turing machine is not limited to
pushing and popping symbols: it can move the tape either left or right
after writing a symbol on the tape. Also unlike a PDA, a Turing machine
uses the tape as both the input string and the temporary storage. In
addition, when the Turing machine terminates we can consider the
contents of the tape to be output. In this way, a Turing
machine
is more than a recognizer for strings in a language.
A Turing machine, like a PDA, has a "tape head" indicating the
current location on the tape that the Turing machine is looking
at. The symbol underneath the tape head is used to determine
which transition will be followed.
Each transition in the state diagram of a Turing machine is
labeled
with three symbols (i, o, d):
"i" is an input symbol. The transition will be taken if the symbol underneath the tape head matches this symbol.
"o" is an output symbol. If the transition is taken, then this symbol is written to the location underneath the tape head, overwriting whatever symbol was there previously.
"d" is a direction: L (left) or R (right). If the transition is taken, the tape head is moved one position in the specified direction.
The Turing machine completes its computation if it reaches a
state
labeled "Halt".
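The (i, o, d) transition rule can be sketched as a short Python loop. Everything named here is an assumption for illustration: the function `run_tm`, the dictionary layout `(state, i) -> (o, d, next_state)`, the use of Δ as the blank symbol, the step limit, and the toy machine `FLIP`, which swaps a's and b's and is not a machine from the text.

```python
def run_tm(transitions, tape, state="start", blank="Δ", max_steps=10_000):
    """Run a Turing machine; return the final tape, or None if Halt is not reached."""
    cells = list(tape)
    head = 0
    steps = 0
    while state != "Halt":
        if steps >= max_steps:
            return None                       # give up: treated as not halting
        # Grow the finite list on demand to imitate an infinite tape.
        if head < 0:
            cells.insert(0, blank)
            head = 0
        elif head >= len(cells):
            cells.append(blank)
        rule = transitions.get((state, cells[head]))   # match on input symbol i
        if rule is None:
            return None                       # no transition applies
        output, direction, state = rule
        cells[head] = output                  # write output symbol o
        head += 1 if direction == "R" else -1 # move one cell in direction d
        steps += 1
    return "".join(cells)


# Hypothetical example machine: swap a and b, halt at the end marker.
FLIP = {
    ("start", "a"): ("b", "R", "start"),
    ("start", "b"): ("a", "R", "start"),
    ("start", "Δ"): ("Δ", "R", "Halt"),
}
```

With these definitions, `run_tm(FLIP, "abbaΔ")` returns `"baabΔ"`: the tape contents at the Halt state serve as the machine's output.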
Here is a description of a Turing machine that can recognize the language a^{n}b^{n}c^{n}, which we have noted is not a context-free language.
The Turing machine will operate by scanning from left to right,
replacing one set of a,b,c symbols with upper case symbols A,B,C.
When it reaches the Δ symbol marking the end of the string, it will
"rewind" from right to left to work on the next set of a,b,c symbols.
If no more a symbols remain in the string, it will scan from left to
right to verify that no more b or c symbols remain. If the
verification succeeds (no b or c symbols are encountered),
then the original string has been accepted as a member of the language.
Example: we will start out with a
tape that looks like this, representing the string "aabbcc".
(The
tape head is positioned at the bracketed symbol.)
[a]abbccΔ
Here is how the TM progresses:
A[a]bbccΔ
Aa[b]bccΔ
AaB[b]ccΔ
AaBb[c]cΔ
AaBbC[c]Δ
AaBbCc[Δ]
("rewind" by moving left until the Turing machine encounters an A symbol)
[A]aBbCcΔ
(move right)
A[a]BbCcΔ
(replace one more a,b,c set with A,B,C)
AA[B]bCcΔ
AAB[b]CcΔ
AABB[C]cΔ
AABBC[c]Δ
AABBCC[Δ]
("rewind" by moving left until the Turing machine encounters an A symbol)
A[A]BBCCΔ
(move right)
AA[B]BCCΔ (because a B was encountered, there are no more a symbols, so we verify that no b or c symbols remain)
AAB[B]CCΔ
AABB[C]CΔ
AABBC[C]Δ
AABBCC[Δ] (Halt)
If at any step in the process the Turing machine encounters an
unexpected symbol, then it does not reach the Halt state (and is
considered to have rejected the original input string).
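The procedure described above can be simulated in Python. The transition table below is a reconstruction from the prose description, not a copy of the state diagram; the state names (`start`, `find_b`, `find_c`, `find_end`, `rewind`, `verify`) and the function names are assumptions. A missing transition is treated as rejection, which corresponds to the machine never reaching Halt.

```python
BLANK = "Δ"  # end-of-string marker from the text


def build_transitions():
    """(state, symbol read) -> (symbol written, direction, next state)."""
    t = {
        ("start", "a"): ("A", "R", "find_b"),     # mark one a
        ("start", "B"): ("B", "R", "verify"),     # no a's left
        ("start", BLANK): (BLANK, "R", "Halt"),   # empty string (n = 0)
        ("find_b", "a"): ("a", "R", "find_b"),
        ("find_b", "B"): ("B", "R", "find_b"),
        ("find_b", "b"): ("B", "R", "find_c"),    # mark one b
        ("find_c", "b"): ("b", "R", "find_c"),
        ("find_c", "C"): ("C", "R", "find_c"),
        ("find_c", "c"): ("C", "R", "find_end"),  # mark one c
        ("find_end", "c"): ("c", "R", "find_end"),
        ("find_end", BLANK): (BLANK, "L", "rewind"),
        ("rewind", "A"): ("A", "R", "start"),     # stop at the rightmost A
        ("verify", "B"): ("B", "R", "verify"),
        ("verify", "C"): ("C", "R", "verify"),
        ("verify", BLANK): (BLANK, "R", "Halt"),  # nothing unmarked remains
    }
    for sym in "abcBC":                           # rewind skips everything but A
        t[("rewind", sym)] = (sym, "L", "rewind")
    return t


def accepts_anbncn(s: str) -> bool:
    """Return True if the sketched machine reaches Halt on input s."""
    if set(s) - set("abc"):
        return False                  # input alphabet is assumed to be {a, b, c}
    transitions = build_transitions()
    tape = list(s) + [BLANK]
    head, state = 0, "start"
    while state != "Halt":
        rule = transitions.get((state, tape[head]))
        if rule is None:
            return False              # unexpected symbol: input rejected
        output, direction, state = rule
        tape[head] = output
        head += 1 if direction == "R" else -1
    return True
```

For example, `accepts_anbncn("aabbcc")` returns `True`, and `accepts_anbncn("aabbc")` returns `False` because the second pass finds no c left to mark.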
Here is a state diagram of this Turing Machine:
It may seem surprising, but Turing machines have been shown to
be at
least as powerful as every "reasonable" known model of
computation. For
example, if we wanted to we could translate a C++ or Java program into
a Turing machine.
A model of computation that can itself simulate any Turing machine is said to be Turing complete.