Copyright (C) 2005-2008, David H. Hovemeyer

This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

A tree is a data structure consisting of *nodes* and *edges*.
There is a single *root node* of the tree. Every node in the tree
has 0 or more *children*: an edge exists between the *parent*
node and each child node. Each node has exactly one parent,
except the root, which does not have a parent. Another property
of trees is that there are no *cycles*: by following edges from
parent to child, it is not possible to visit the
same node more than once. Nodes which have no children are
called *leaf nodes*.

Example tree:

In this example, the root is colored red and the leaves are colored green.

A tree is a *recursive data structure* because each child of a node
in the tree is the root of a smaller tree (a *subtree*) in its own right.

Because of this property, many of the important algorithms to access and manipulate trees are most easily expressed using recursion.

When representing a tree which will have a small, fixed number of
children per node, the edges of the tree are generally
represented as a direct reference from the parent node to
the child node. For example, trees with at most two
children per node are called *binary trees*. Here is
one way that a binary tree might be represented in Java:

    class BinaryTreeNode<E> {
        E payload;                      // value stored in this node
        BinaryTreeNode<E> left, right;  // references to the two children
    }

    class BinaryTree<E> {
        BinaryTreeNode<E> root;         // reference to the root node
    }

Each node in the tree is an instance of the **BinaryTreeNode**
class.
References to a node's children are stored in the **left** and
**right** fields. The **BinaryTree** class simply
stores a reference to the root node of the tree.
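
Because each child is itself a tree, many computations reduce to handling one node and recursing on its children. Here is a minimal sketch of that idea using the node class above. The constructor, the **BinaryTreeDemo** class, and the **size**, **height**, and **sample** methods are illustrative additions, and the sketch assumes that a missing child is stored as **null**.

```java
// Sketch only: the constructor and the helper methods below are
// illustrative additions, not part of the original node class.
class BinaryTreeNode<E> {
    E payload;
    BinaryTreeNode<E> left, right;

    BinaryTreeNode(E payload, BinaryTreeNode<E> left, BinaryTreeNode<E> right) {
        this.payload = payload;
        this.left = left;
        this.right = right;
    }
}

class BinaryTreeDemo {
    // Number of nodes in the tree rooted at node (0 for the empty tree).
    static <E> int size(BinaryTreeNode<E> node) {
        if (node == null) return 0;
        return 1 + size(node.left) + size(node.right);
    }

    // Number of levels in the tree rooted at node (0 for the empty tree).
    static <E> int height(BinaryTreeNode<E> node) {
        if (node == null) return 0;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    // Builds a small sample tree:
    //       A
    //      / \
    //     B   C
    //    /
    //   D
    static BinaryTreeNode<String> sample() {
        BinaryTreeNode<String> d = new BinaryTreeNode<>("D", null, null);
        BinaryTreeNode<String> b = new BinaryTreeNode<>("B", d, null);
        BinaryTreeNode<String> c = new BinaryTreeNode<>("C", null, null);
        return new BinaryTreeNode<>("A", b, c);
    }
}
```

For the sample tree, **size** returns 4 and **height** returns 3. Each method handles the empty tree first and then combines the results for the two subtrees.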

For trees in which the nodes may have a large number of children,
it is often better to represent the tree using a
*first child/next sibling* representation. In this representation,
each node contains a link to its first child and its next *sibling*:

The solid and dashed black lines represent the tree edges.
The solid lines (black and blue) represent the actual direct links
from parent to child or sibling to sibling.
The advantage of this representation is that a node can have
any number of children using only two fields per node.
The disadvantage is that appending a child or finding the
*n*th child requires walking the list of siblings, since the
children of each node are essentially stored in a singly-linked list.

Here is a Java node class for trees represented in this way:

    class TreeNode<E> {
        E payload;                // value stored in this node
        TreeNode<E> firstChild;   // leftmost child of this node
        TreeNode<E> nextSibling;  // next child of this node's parent
    }

Usually, the empty tree is represented using the value **null**.
Therefore, if a child or sibling is **null**, that means that the
child or sibling does not exist.
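
To make the cost of the linked-list layout concrete, here is a minimal sketch of appending a child and finding the *n*th child. The constructor and the **addChild** and **getChild** methods are hypothetical helpers added for illustration; **null** marks a missing child or sibling, as described above.

```java
// Sketch only: the constructor, addChild, and getChild are
// illustrative additions to the node class shown above.
class TreeNode<E> {
    E payload;
    TreeNode<E> firstChild;
    TreeNode<E> nextSibling;

    TreeNode(E payload) {
        this.payload = payload;
    }

    // Appends child as the last child of this node by walking
    // to the end of the sibling list.
    void addChild(TreeNode<E> child) {
        if (firstChild == null) {
            firstChild = child;
            return;
        }
        TreeNode<E> last = firstChild;
        while (last.nextSibling != null) {
            last = last.nextSibling;
        }
        last.nextSibling = child;
    }

    // Returns the n-th child, counting from 0 and assuming n >= 0,
    // or null if this node has fewer than n + 1 children.
    TreeNode<E> getChild(int n) {
        TreeNode<E> child = firstChild;
        for (int i = 0; i < n && child != null; i++) {
            child = child.nextSibling;
        }
        return child;
    }
}
```

Both methods take time proportional to the number of children they step over. Inserting a new child at the front of the list instead would take constant time, at the cost of reversing the order of the children.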

Sometimes it is advantageous to use a special *sentinel* node
to represent the empty tree. Use of a sentinel node can
simplify the implementation of some of the balanced search tree
algorithms we will see later.

A node's *ancestors* include the node itself, its parent,
its parent's parent, and so forth up to the root of the tree.
Its *descendants* include the node itself, its children,
its children's children, and so forth down to the leaves.
It follows that the root of a tree is an ancestor of
every node in the tree.

The *proper ancestors* and *proper descendants* of a node
are defined the same way, except that they do not include the node
itself.
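
These definitions translate directly into a recursive test. Here is a minimal sketch for binary trees; the constructor, the **Ancestry** class, and both method names are hypothetical, and nodes are compared by reference, with **null** assumed to mean a missing child.

```java
// Sketch only: the constructor and the Ancestry helpers are
// illustrative additions, not part of the original node class.
class BinaryTreeNode<E> {
    E payload;
    BinaryTreeNode<E> left, right;

    BinaryTreeNode(E payload, BinaryTreeNode<E> left, BinaryTreeNode<E> right) {
        this.payload = payload;
        this.left = left;
        this.right = right;
    }
}

class Ancestry {
    // True if target is a descendant of node, that is, if target is
    // node itself or appears somewhere in one of node's subtrees.
    static <E> boolean isDescendant(BinaryTreeNode<E> node, BinaryTreeNode<E> target) {
        if (node == null) return false;
        if (node == target) return true;
        return isDescendant(node.left, target) || isDescendant(node.right, target);
    }

    // Same test, but a node is not a proper descendant of itself.
    static <E> boolean isProperDescendant(BinaryTreeNode<E> node, BinaryTreeNode<E> target) {
        return node != target && isDescendant(node, target);
    }
}
```

Since a node is an ancestor of exactly those nodes that are its descendants, the same methods answer ancestor queries with the arguments swapped.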

A *traversal* of a tree is an algorithm that *visits* each
node in the tree exactly once. Visiting a node just means that
some action is performed on the node; the action itself is
generally supplied as a *functor*, an object that encapsulates a function.
(Use of functors is an important technique in Generic Programming.)
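
In Java, a functor can be sketched as an object that implements a one-method interface. The **Visitor** interface, the **SumVisitor** class, and the **FunctorDemo.visitAll** method below are hypothetical names chosen for illustration; the standard **java.util.function.Consumer** interface could play the same role.

```java
// Hypothetical functor interface: one method, applied to each payload visited.
interface Visitor<E> {
    void visit(E payload);
}

// A functor that carries state between visits: it sums the payloads it sees.
class SumVisitor implements Visitor<Integer> {
    int total = 0;

    @Override
    public void visit(Integer payload) {
        total += payload;
    }
}

class FunctorDemo {
    // Stand-in for a traversal: applies the functor to each payload in turn.
    static <E> void visitAll(Iterable<E> payloads, Visitor<E> visitor) {
        for (E payload : payloads) {
            visitor.visit(payload);
        }
    }
}
```

The code that walks the payloads knows nothing about summing; it only calls **visit**. Passing a different functor changes what the walk does without changing the walk itself.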

Traversals are generally specified (and often implemented) recursively. Two common traversals are pre-order and post-order:

Both pre-order and post-order traversals start at the root:

    preorderTraversal(node) {
        if (node does not exist) return;
        visit(node);
        for each child of node
            preorderTraversal(child);
    }

    postorderTraversal(node) {
        if (node does not exist) return;
        for each child of node
            postorderTraversal(child);
        visit(node);
    }
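
Here is one way the two traversals might be written in Java for the first child/next sibling representation. This is a sketch under stated assumptions: the node constructor, the **Traversals** class, and the **sample** method are illustrative additions, and the visit action is passed as a standard **java.util.function.Consumer** applied to each payload.

```java
import java.util.function.Consumer;

// Node class in first child/next sibling form; the constructor is an addition.
class TreeNode<E> {
    E payload;
    TreeNode<E> firstChild;
    TreeNode<E> nextSibling;

    TreeNode(E payload) {
        this.payload = payload;
    }
}

class Traversals {
    // Visits node first, then each of its subtrees from left to right.
    static <E> void preorder(TreeNode<E> node, Consumer<E> visitor) {
        if (node == null) return;
        visitor.accept(node.payload);
        for (TreeNode<E> c = node.firstChild; c != null; c = c.nextSibling) {
            preorder(c, visitor);
        }
    }

    // Visits each subtree of node from left to right, then node itself.
    static <E> void postorder(TreeNode<E> node, Consumer<E> visitor) {
        if (node == null) return;
        for (TreeNode<E> c = node.firstChild; c != null; c = c.nextSibling) {
            postorder(c, visitor);
        }
        visitor.accept(node.payload);
    }

    // Builds a sample tree: A has children B, C, D; B has one child E.
    static TreeNode<String> sample() {
        TreeNode<String> a = new TreeNode<>("A");
        TreeNode<String> b = new TreeNode<>("B");
        TreeNode<String> c = new TreeNode<>("C");
        TreeNode<String> d = new TreeNode<>("D");
        TreeNode<String> e = new TreeNode<>("E");
        a.firstChild = b;
        b.nextSibling = c;
        c.nextSibling = d;
        b.firstChild = e;
        return a;
    }
}
```

On the sample tree, the pre-order traversal visits A, B, E, C, D and the post-order traversal visits E, B, C, D, A.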

A level-order traversal visits each *level* of the tree
in order: first the root, then each child of the root, then
each grandchild, and so on down to the lowest level of the tree.
It is defined using a queue:

    levelOrderTraversal(root) {
        if (root does not exist) return;
        q = new queue of nodes;
        q.enqueue(root);
        while (q is not empty) {
            node = q.dequeue();
            visit(node);
            for each child of node
                q.enqueue(child);
        }
    }

This traversal is also known as a *breadth-first* traversal.
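
Here is a minimal Java sketch of the queue-based traversal, again for the first child/next sibling representation. The node constructor, the **LevelOrder** class, and the **sample** method are illustrative additions; the queue is a standard **java.util.ArrayDeque**, which does not accept **null** elements, so the empty tree is handled before anything is enqueued.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Node class in first child/next sibling form; the constructor is an addition.
class TreeNode<E> {
    E payload;
    TreeNode<E> firstChild;
    TreeNode<E> nextSibling;

    TreeNode(E payload) {
        this.payload = payload;
    }
}

class LevelOrder {
    // Visits nodes level by level, left to right within each level.
    static <E> void levelOrder(TreeNode<E> root, Consumer<E> visitor) {
        if (root == null) return;  // empty tree: nothing to visit
        Queue<TreeNode<E>> q = new ArrayDeque<>();
        q.add(root);
        while (!q.isEmpty()) {
            TreeNode<E> node = q.remove();
            visitor.accept(node.payload);
            // Children join the back of the queue, behind every node
            // already waiting on the current level.
            for (TreeNode<E> c = node.firstChild; c != null; c = c.nextSibling) {
                q.add(c);
            }
        }
    }

    // Builds a sample tree: A has children B, C, D; B has one child E.
    static TreeNode<String> sample() {
        TreeNode<String> a = new TreeNode<>("A");
        TreeNode<String> b = new TreeNode<>("B");
        TreeNode<String> c = new TreeNode<>("C");
        TreeNode<String> d = new TreeNode<>("D");
        TreeNode<String> e = new TreeNode<>("E");
        a.firstChild = b;
        b.nextSibling = c;
        c.nextSibling = d;
        b.firstChild = e;
        return a;
    }
}
```

On the sample tree, the nodes are visited in the order A, B, C, D, E: the root, then its three children, then the single grandchild.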

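The *in-order* traversal is a special traversal defined only for
binary trees. In a binary tree, each node has a left subtree and
a right subtree (either or both of which may be empty). An in-order
traversal visits a node's left subtree, then the node itself, then
its right subtree.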

An in-order traversal starts at the root:

    inorderTraversal(node) {
        if (node does not exist) return;
        inorderTraversal(left child of node);
        visit(node);
        inorderTraversal(right child of node);
    }
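
Here is one way the in-order traversal might look in Java for a binary tree node with **left** and **right** fields. The constructor, the **InOrder** class, and the **sample** method are illustrative additions, and the visit action is passed as a standard **java.util.function.Consumer** applied to each payload.

```java
import java.util.function.Consumer;

// Binary tree node; the constructor is an illustrative addition.
class BinaryTreeNode<E> {
    E payload;
    BinaryTreeNode<E> left, right;

    BinaryTreeNode(E payload, BinaryTreeNode<E> left, BinaryTreeNode<E> right) {
        this.payload = payload;
        this.left = left;
        this.right = right;
    }
}

class InOrder {
    // Visits the left subtree, then node, then the right subtree.
    static <E> void inorder(BinaryTreeNode<E> node, Consumer<E> visitor) {
        if (node == null) return;
        inorder(node.left, visitor);
        visitor.accept(node.payload);
        inorder(node.right, visitor);
    }

    // Builds a small sample tree:
    //       4
    //      / \
    //     2   5
    //    / \
    //   1   3
    static BinaryTreeNode<Integer> sample() {
        BinaryTreeNode<Integer> one = new BinaryTreeNode<>(1, null, null);
        BinaryTreeNode<Integer> three = new BinaryTreeNode<>(3, null, null);
        BinaryTreeNode<Integer> two = new BinaryTreeNode<>(2, one, three);
        BinaryTreeNode<Integer> five = new BinaryTreeNode<>(5, null, null);
        return new BinaryTreeNode<>(4, two, five);
    }
}
```

On the sample tree, the payloads are visited in the order 1, 2, 3, 4, 5.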