Trees

Copyright (C) 2005-2008, David H. Hovemeyer

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

A tree is a data structure consisting of nodes and edges.  There is a single root node of the tree.  Every node in the tree has 0 or more children: an edge exists between the parent node and each child node.  Each node has exactly one parent, except the root, which does not have a parent.  Another property of trees is that there are no cycles: by following edges from parent to child, it is not possible to visit the same node more than once.  Nodes which have no children are called leaf nodes.

Example tree:

In this example, the root is colored red and the leaves are colored green.

A tree is a recursive data structure because each child of a node in the tree is a tree in its own right:

Because of this property, many of the important algorithms to access and manipulate trees are most easily expressed using recursion.

Representing trees

When representing a tree which will have a small, fixed number of children per node, the edges of the tree are generally represented as a direct reference from the parent node to the child node.  For example, trees with at most two children per node are called binary trees.  Here is one way that a binary tree might be represented in Java:

class BinaryTreeNode<E> {
    E payload;
    BinaryTreeNode<E> left, right;
}

class BinaryTree<E> {
    BinaryTreeNode<E> root;
}

Each node in the tree is an instance of the BinaryTreeNode class.  References to a node's children are stored in the left and right fields.  The BinaryTree class simply stores a reference to the root node of the tree.
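As a rough illustration of how these nodes link together, the sketch below builds a three-node tree by hand and counts its nodes recursively.  The node and size helpers and the BinaryTreeDemo class are illustrative names only, not part of the classes above, and a missing child is assumed to be stored as null.

```java
// Node and tree classes repeated from the text so the sketch compiles on its own.
class BinaryTreeNode<E> {
    E payload;
    BinaryTreeNode<E> left, right;
}

class BinaryTree<E> {
    BinaryTreeNode<E> root;
}

class BinaryTreeDemo {
    // Hypothetical helper: create a node with the given payload and children.
    static <E> BinaryTreeNode<E> node(E payload, BinaryTreeNode<E> left, BinaryTreeNode<E> right) {
        BinaryTreeNode<E> n = new BinaryTreeNode<>();
        n.payload = payload;
        n.left = left;
        n.right = right;
        return n;
    }

    // Hypothetical helper: count the nodes in the subtree rooted at n.
    // Assumption: a null reference stands for a missing child.
    static <E> int size(BinaryTreeNode<E> n) {
        if (n == null) return 0;
        return 1 + size(n.left) + size(n.right);
    }

    public static void main(String[] args) {
        BinaryTree<String> tree = new BinaryTree<>();
        tree.root = node("A", node("B", null, null), node("C", null, null));
        System.out.println(size(tree.root)); // prints 3
    }
}
```

The size method mirrors the recursive structure of the tree: the size of a subtree is one (for its root) plus the sizes of its left and right subtrees.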

For trees in which the nodes may have a large number of children, it is often better to represent the tree using a first child/next sibling representation.  In this representation, each node contains a link to its first child and its next sibling:

The solid and dashed black lines represent the tree edges.  The solid lines (black and blue) represent the actual direct links from parent to child or sibling to sibling.  The advantage of this representation is that a node can have any number of children using only two fields per node.  The disadvantage is that adding children or finding the n'th child requires more work, since the children of each node are essentially stored in a singly-linked list.

Here is a Java node class for trees represented in this way:

class TreeNode<E> {
    E payload;
    TreeNode<E> firstChild;
    TreeNode<E> nextSibling;
}
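To make the extra work concrete, here is a minimal sketch of appending a child and finding the nth child in this representation.  The SiblingListOps class and its addChild and nthChild methods are hypothetical names chosen for illustration, and a missing child or sibling is assumed to be null.  Both operations walk the sibling list, so their cost grows with the number of children.

```java
// Node class repeated from the text so the sketch compiles on its own.
class TreeNode<E> {
    E payload;
    TreeNode<E> firstChild;
    TreeNode<E> nextSibling;
}

class SiblingListOps {
    // Append child as the last child of parent by walking to the end
    // of the sibling list.
    static <E> void addChild(TreeNode<E> parent, TreeNode<E> child) {
        if (parent.firstChild == null) {
            parent.firstChild = child;
            return;
        }
        TreeNode<E> last = parent.firstChild;
        while (last.nextSibling != null) {
            last = last.nextSibling;
        }
        last.nextSibling = child;
    }

    // Return the n-th child of parent (counting from 0), or null if
    // n is negative or the node has too few children.
    static <E> TreeNode<E> nthChild(TreeNode<E> parent, int n) {
        if (n < 0) return null;
        TreeNode<E> c = parent.firstChild;
        while (c != null && n > 0) {
            c = c.nextSibling;
            n--;
        }
        return c;
    }
}
```

If appending is frequent, one possible variation is to insert new children at the front of the list instead, which takes constant time but reverses the order of the children.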

Representing the empty tree

Usually, the empty tree is represented using the value null.  Therefore, if a child or sibling is null, that means that the child or sibling does not exist.

Sometimes it is advantageous to use a special sentinel node to represent the empty tree.  Use of a sentinel node can simplify the implementation of some of the balanced search tree algorithms we will see later.
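One way a sentinel might look is sketched below.  This is only an illustration under stated assumptions: the SentinelTreeNode class, its NIL object, the cached height field, and the updateHeight method are all hypothetical and do not come from the classes above.  The point is that code can read a field of an empty child without first checking for null.

```java
class SentinelTreeNode {
    // One shared sentinel object stands in for every empty subtree.
    static final SentinelTreeNode NIL = new SentinelTreeNode(null);
    static {
        // The sentinel's children point back to itself, and its height is 0.
        NIL.left = NIL;
        NIL.right = NIL;
        NIL.height = 0;
    }

    final String payload;
    int height;                    // cached height of this subtree; 0 for NIL
    SentinelTreeNode left, right;

    SentinelTreeNode(String payload) {
        this.payload = payload;
        // While NIL itself is being constructed these are still null;
        // the static block above fixes them up.
        this.left = NIL;
        this.right = NIL;
        this.height = 1;
    }

    // Recompute the cached height from the children.  No null checks are
    // needed because an empty child is the NIL node, whose height is 0.
    void updateHeight() {
        height = 1 + Math.max(left.height, right.height);
    }
}
```

With null as the empty tree, updateHeight would need a separate check for each child before reading its height.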

Tree terminology

A node's ancestors include the node itself, its parent, its parent's parent, and so forth up to the root of the tree.  Its descendants include the node itself, its children, its children's children, and so forth down to the leaves.  It follows that the root of a tree is an ancestor of every node in the tree.

The proper ancestors and proper descendants of a node are defined the same way, except that they do not include the node itself.
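These definitions can be checked in code if each node also stores a reference to its parent.  The sketch below assumes such a parent field, which the node classes shown earlier do not have; ParentLinkedNode and AncestorQueries are hypothetical names used only for illustration.

```java
class ParentLinkedNode<E> {
    E payload;
    ParentLinkedNode<E> parent;   // assumed field; null for the root
}

class AncestorQueries {
    // True if a is an ancestor of n.  A node counts as its own ancestor,
    // so the walk starts at n itself and follows parent links to the root.
    static <E> boolean isAncestor(ParentLinkedNode<E> a, ParentLinkedNode<E> n) {
        for (ParentLinkedNode<E> cur = n; cur != null; cur = cur.parent) {
            if (cur == a) return true;
        }
        return false;
    }

    // True if a is a proper ancestor of n, which excludes n itself.
    static <E> boolean isProperAncestor(ParentLinkedNode<E> a, ParentLinkedNode<E> n) {
        return a != n && isAncestor(a, n);
    }
}
```

Because "a is an ancestor of n" is the same statement as "n is a descendant of a", the same two methods also answer descendant queries with the arguments swapped.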

Tree Traversals

A traversal of a tree is an algorithm that visits each node in the tree exactly once.  Visiting a node just means performing some action on the node; the action itself is generally supplied as a functor.  (Use of functors is an important technique in generic programming.)

Traversals are generally specified (and often implemented) recursively.  Two common traversals are pre-order and post-order:

Both pre-order and post-order traversals start at the root:

preorderTraversal(node) {
    if (node does not exist) return;
    visit(node);
    for each child of node
        preorderTraversal(child);
}

postorderTraversal(node) {
    if (node does not exist) return;
    for each child of node
        postorderTraversal(child);
    visit(node);
}
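The pseudocode above could be written in Java roughly as follows.  This is a sketch, not a definitive implementation: it assumes the first child/next sibling node class from earlier, uses null for a node that does not exist, and uses java.util.function.Consumer as the functor.  The DepthFirstTraversals class and the leaf helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Node class repeated from the text so the sketch compiles on its own.
class TreeNode<E> {
    E payload;
    TreeNode<E> firstChild;
    TreeNode<E> nextSibling;
}

class DepthFirstTraversals {
    // Visit the node first, then each of its children from first to last.
    static <E> void preorder(TreeNode<E> node, Consumer<? super E> visit) {
        if (node == null) return;
        visit.accept(node.payload);
        for (TreeNode<E> c = node.firstChild; c != null; c = c.nextSibling) {
            preorder(c, visit);
        }
    }

    // Visit each of the node's children first, then the node itself.
    static <E> void postorder(TreeNode<E> node, Consumer<? super E> visit) {
        if (node == null) return;
        for (TreeNode<E> c = node.firstChild; c != null; c = c.nextSibling) {
            postorder(c, visit);
        }
        visit.accept(node.payload);
    }

    // Hypothetical helper: create a node with no children or siblings.
    static <E> TreeNode<E> leaf(E payload) {
        TreeNode<E> n = new TreeNode<>();
        n.payload = payload;
        return n;
    }

    public static void main(String[] args) {
        // A has children B and C; B has the single child D.
        TreeNode<String> a = leaf("A"), b = leaf("B"), c = leaf("C"), d = leaf("D");
        a.firstChild = b;
        b.nextSibling = c;
        b.firstChild = d;

        List<String> pre = new ArrayList<>();
        preorder(a, pre::add);
        System.out.println(pre);   // prints [A, B, D, C]

        List<String> post = new ArrayList<>();
        postorder(a, post::add);
        System.out.println(post);  // prints [D, B, C, A]
    }
}
```

Here the visit action simply records each payload in a list; any other Consumer could be passed instead.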

A level-order traversal visits each level of the tree in order: first the root, then each child of the root, then each grandchild, etc. down to the lowest level of the tree.  It is defined using a queue:

q = new queue of nodes
if (root exists)
    q.enqueue(root);
while (q is not empty) {
    node = q.dequeue();
    visit(node);
    for each child of node
        q.enqueue(child);
}

This traversal is also known as a breadth-first traversal.
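A possible Java rendering of the queue-based traversal is sketched below.  It assumes the first child/next sibling node class from earlier, uses java.util.ArrayDeque as the queue and java.util.function.Consumer as the functor; the LevelOrderTraversal class and the leaf helper are hypothetical names.  Since ArrayDeque does not accept null elements, the sketch returns early for an empty tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Node class repeated from the text so the sketch compiles on its own.
class TreeNode<E> {
    E payload;
    TreeNode<E> firstChild;
    TreeNode<E> nextSibling;
}

class LevelOrderTraversal {
    static <E> void levelOrder(TreeNode<E> root, Consumer<? super E> visit) {
        if (root == null) return;            // empty tree: nothing to visit
        Queue<TreeNode<E>> q = new ArrayDeque<>();
        q.add(root);
        while (!q.isEmpty()) {
            TreeNode<E> node = q.remove();   // oldest node in the queue
            visit.accept(node.payload);
            // Enqueue the children so they are visited after every node
            // already waiting in the queue.
            for (TreeNode<E> c = node.firstChild; c != null; c = c.nextSibling) {
                q.add(c);
            }
        }
    }

    // Hypothetical helper: create a node with no children or siblings.
    static <E> TreeNode<E> leaf(E payload) {
        TreeNode<E> n = new TreeNode<>();
        n.payload = payload;
        return n;
    }

    public static void main(String[] args) {
        // A has children B and C; B has the single child D.
        TreeNode<String> a = leaf("A"), b = leaf("B"), c = leaf("C"), d = leaf("D");
        a.firstChild = b;
        b.nextSibling = c;
        b.firstChild = d;

        List<String> visited = new ArrayList<>();
        levelOrder(a, visited::add);
        System.out.println(visited);  // prints [A, B, C, D]
    }
}
```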

The in-order traversal is a special traversal defined only for binary trees.  In a binary tree, each node has a left subtree and a right subtree (either or both of which may be empty).

An in-order traversal also starts at the root:

inorderTraversal(node) {
    if (node does not exist) return;
    inorderTraversal(left child of node);
    visit(node);
    inorderTraversal(right child of node);
}
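In Java, the in-order traversal might be sketched as follows, assuming the BinaryTreeNode class from earlier, null for an empty subtree, and java.util.function.Consumer as the functor.  The InorderTraversal class and the node helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Node class repeated from the text so the sketch compiles on its own.
class BinaryTreeNode<E> {
    E payload;
    BinaryTreeNode<E> left, right;
}

class InorderTraversal {
    // Traverse the left subtree, visit the node, then traverse the right subtree.
    static <E> void inorder(BinaryTreeNode<E> node, Consumer<? super E> visit) {
        if (node == null) return;
        inorder(node.left, visit);
        visit.accept(node.payload);
        inorder(node.right, visit);
    }

    // Hypothetical helper: create a node with the given payload and children.
    static <E> BinaryTreeNode<E> node(E payload, BinaryTreeNode<E> left, BinaryTreeNode<E> right) {
        BinaryTreeNode<E> n = new BinaryTreeNode<>();
        n.payload = payload;
        n.left = left;
        n.right = right;
        return n;
    }

    public static void main(String[] args) {
        // Root 2 with left child 1 and right child 3.
        BinaryTreeNode<Integer> root = node(2, node(1, null, null), node(3, null, null));
        List<Integer> visited = new ArrayList<>();
        inorder(root, visited::add);
        System.out.println(visited);  // prints [1, 2, 3]
    }
}
```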