CMPU 102 - Assignment 10

Due: by 11:59 PM on Friday, December 9th

$Revision: 1.5 $

Dictionary Data Structures

In this lab, you will implement simplified versions of several dictionary data structures.

Each dictionary data structure implementation is a class that implements the Dictionary interface:

public interface Dictionary {
    public boolean add(String key);
    public boolean contains(String key);
}

These methods are defined to have the following behavior:

public boolean add(String key)
Adds the given key to the dictionary.  Returns true if successful, or false if the key already exists in the dictionary.

public boolean contains(String key)
Returns true if the given key is in the dictionary, or false if the key is not in the dictionary.
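Because add must report whether the key was new, its contract happens to match java.util.HashSet.add, which returns true only if the set did not already contain the element. The sketch below uses that fact to build a small reference dictionary that can serve as a correctness oracle while you test your own classes. The class name ReferenceDictionary is hypothetical and is not part of the assignment; the interface is redeclared without public so the sketch compiles as a single file.

```java
import java.util.HashSet;
import java.util.Set;

// Same two operations as the assignment's Dictionary interface
// (package-private here so the whole sketch fits in one file).
interface Dictionary {
    boolean add(String key);
    boolean contains(String key);
}

// Hypothetical reference implementation for cross-checking results.
// HashSet.add returns false when the key is already present, which is
// exactly the behavior required of Dictionary.add.
class ReferenceDictionary implements Dictionary {
    private final Set<String> keys = new HashSet<>();

    @Override
    public boolean add(String key) {
        return keys.add(key);
    }

    @Override
    public boolean contains(String key) {
        return keys.contains(key);
    }
}
```

Adding "apple" twice returns true and then false, and contains("apple") returns true afterward; any of your three classes should behave the same way on the same sequence of calls.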

As a simplification, String objects are used as the only kind of data that can be stored in a dictionary.  Another simplification is that an operation to delete a key from the dictionary is not supported.

Your Task

Your task is to implement the BinarySearchTree, HashTable, and UnorderedList classes, each of which implements the Dictionary interface.

The BinarySearchTree should implement the dictionary operations using a binary search tree.  You do not need to implement any balancing algorithm.  There is one important restriction: rather than implementing insertion using a recursive method, you should use a loop to find the correct place to insert a new node in the tree.  The reason is that the dictionary will contain a large number of values, and because the tree is not balanced it can become very deep: inserting keys in sorted order, for example, produces a tree shaped like a linked list, with one level per key.  A recursive method that descends that many levels can cause a StackOverflowError.  When comparing two String objects, you can use their compareTo method to determine the order of the two Strings:

public int compareStrings(String lhs, String rhs) {
  return lhs.compareTo(rhs);
}
In the above example, compareStrings will return a negative value if lhs is less than rhs, 0 if they are equal, and a positive value if lhs is greater than rhs.
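The loop-based insertion described above can be sketched as follows. This is a minimal, hypothetical implementation, assuming a private Node class with left and right children and a root field (these names are illustrative, not taken from the provided code). Both add and contains walk down from the root with a while loop, so the depth of the tree never affects the call stack.

```java
// Same two operations as the assignment's Dictionary interface.
interface Dictionary {
    boolean add(String key);
    boolean contains(String key);
}

// Unbalanced BST that uses loops instead of recursion.
class BinarySearchTree implements Dictionary {
    // Hypothetical node type: one key plus left/right children.
    private static class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    private Node root;

    @Override
    public boolean add(String key) {
        if (root == null) {
            root = new Node(key);
            return true;
        }
        Node current = root;
        while (true) {
            int cmp = key.compareTo(current.key);
            if (cmp == 0) {
                return false;                 // key already in the tree
            } else if (cmp < 0) {
                if (current.left == null) {   // empty slot found: attach here
                    current.left = new Node(key);
                    return true;
                }
                current = current.left;
            } else {
                if (current.right == null) {
                    current.right = new Node(key);
                    return true;
                }
                current = current.right;
            }
        }
    }

    @Override
    public boolean contains(String key) {
        Node current = root;
        while (current != null) {
            int cmp = key.compareTo(current.key);
            if (cmp == 0) return true;
            current = (cmp < 0) ? current.left : current.right;
        }
        return false;
    }
}
```

Feeding this tree keys in sorted order still works, but each insertion walks the entire right spine, so building the tree from a sorted file takes time proportional to the square of the number of words.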

The HashTable class should implement the dictionary operations using a hash table, handling collisions using chaining.  You should initially allocate a small hash table of 10 buckets, and expand the hash table any time the load factor becomes higher than 0.75.  The load factor is computed as follows:

load factor = ( number of items in hash table / number of hash buckets )

You can use the hash method included in the class to compute a hash code for a particular String object based on the current size of your hash table.  Use the equals method to determine if two String values are the same.
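A chained hash table following these rules might look like the sketch below. The provided hash method's exact signature is not shown here, so the private hash(String, int) helper is a hypothetical stand-in based on String.hashCode and Math.floorMod. Doubling the bucket count on expansion is also an assumption; the assignment only says to expand. As a worked example of the threshold: with 10 buckets, 7 keys give a load factor of 0.7 (no growth), while the 8th key gives 0.8, which exceeds 0.75 and triggers a rehash.

```java
// Same two operations as the assignment's Dictionary interface.
interface Dictionary {
    boolean add(String key);
    boolean contains(String key);
}

// Chained hash table: 10 initial buckets, grown when size/buckets > 0.75.
class HashTable implements Dictionary {
    private static final int INITIAL_BUCKETS = 10;
    private static final double MAX_LOAD_FACTOR = 0.75;

    // Hypothetical singly-linked chain node for one bucket.
    private static class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    private Node[] buckets = new Node[INITIAL_BUCKETS];
    private int size = 0;

    // Hypothetical stand-in for the provided hash method: maps a key to a
    // bucket index in [0, numBuckets). floorMod keeps the index non-negative.
    private static int hash(String key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    @Override
    public boolean contains(String key) {
        for (Node n = buckets[hash(key, buckets.length)]; n != null; n = n.next) {
            if (n.key.equals(key)) return true;   // equals, not ==
        }
        return false;
    }

    @Override
    public boolean add(String key) {
        if (contains(key)) return false;
        int index = hash(key, buckets.length);
        buckets[index] = new Node(key, buckets[index]);   // prepend to chain
        size++;
        if ((double) size / buckets.length > MAX_LOAD_FACTOR) {
            rehash(buckets.length * 2);                   // assumed growth policy
        }
        return true;
    }

    // Reinserts every key into a larger array; indices must be recomputed
    // because they depend on the number of buckets.
    private void rehash(int newBucketCount) {
        Node[] old = buckets;
        buckets = new Node[newBucketCount];
        for (Node head : old) {
            for (Node n = head; n != null; n = n.next) {
                int index = hash(n.key, newBucketCount);
                buckets[index] = new Node(n.key, buckets[index]);
            }
        }
    }

    // Hypothetical accessor, used only to observe resizing in tests.
    int bucketCount() { return buckets.length; }
}
```

Every key must be rehashed during expansion, since a key's bucket index changes when the number of buckets changes; copying the old chains into the same positions of a bigger array would leave keys where contains cannot find them.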

The UnorderedList class should implement the dictionary operations using an unordered list data structure: either an array or a singly-linked list.  If you use an array, start the array at 10 elements and expand the array whenever necessary to store a new element.
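A sketch of the array option follows, assuming the array doubles when full (the assignment says only to expand it "whenever necessary", so the growth factor is an assumption). Keys are stored in insertion order and every lookup is a linear scan. Because add must first check for duplicates, each insertion is also a linear scan. The capacity() method is a hypothetical accessor added for testing.

```java
import java.util.Arrays;

// Same two operations as the assignment's Dictionary interface.
interface Dictionary {
    boolean add(String key);
    boolean contains(String key);
}

// Unordered array-backed dictionary: starts at 10 slots, doubles when full.
class UnorderedList implements Dictionary {
    private String[] items = new String[10];
    private int count = 0;   // number of slots actually in use

    @Override
    public boolean contains(String key) {
        for (int i = 0; i < count; i++) {
            if (items[i].equals(key)) return true;
        }
        return false;
    }

    @Override
    public boolean add(String key) {
        if (contains(key)) return false;           // duplicate check: O(n)
        if (count == items.length) {
            items = Arrays.copyOf(items, items.length * 2);   // assumed doubling
        }
        items[count++] = key;                      // append at the end
        return true;
    }

    // Hypothetical accessor, used only to observe growth in tests.
    int capacity() { return items.length; }
}
```

Doubling, rather than growing by a fixed number of slots, keeps the total cost of copying proportional to the number of keys. The duplicate check in add still makes building the whole list quadratic, however.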

Testing Your Project

Test your dictionary classes using the provided TestDriver class.  This class has a main method that takes three command line arguments.  The first argument indicates which dictionary class to use, and should be either bst, hash, or list.  The second and third arguments are names of files containing a list of words, one word per line.  The second file should contain a subset of the words in the first file.  The test driver works by first inserting all of the words in the first file into a dictionary, then checking the dictionary to make sure it contains (1) all of the words in the second (subset) file, and (2) all of the words in the first (complete) file.
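The three phases described above (build, check subset, check full list) can be reproduced on small in-memory word lists when you want a quick check before running the real files. This is a hypothetical sketch, not the provided TestDriver: MiniDriver, containsAll, and the HashSet-backed SetDictionary are illustrative names, and the timings come from System.nanoTime.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Same two operations as the assignment's Dictionary interface.
interface Dictionary {
    boolean add(String key);
    boolean contains(String key);
}

// Hypothetical mini driver mirroring the described phases on in-memory lists.
class MiniDriver {
    // True only if every word in 'words' is found in the dictionary.
    static boolean containsAll(Dictionary dict, List<String> words) {
        for (String w : words) {
            if (!dict.contains(w)) return false;
        }
        return true;
    }

    // Builds the dictionary from 'full', then checks 'subset' and 'full'.
    static boolean run(Dictionary dict, List<String> full, List<String> subset) {
        long t0 = System.nanoTime();
        for (String w : full) dict.add(w);
        long t1 = System.nanoTime();
        boolean subsetOk = containsAll(dict, subset);
        long t2 = System.nanoTime();
        boolean fullOk = containsAll(dict, full);
        long t3 = System.nanoTime();
        System.out.printf("Built dictionary in %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("Checked subset in %d ms%n", (t2 - t1) / 1_000_000);
        System.out.printf("Checked full dictionary in %d ms%n", (t3 - t2) / 1_000_000);
        return subsetOk && fullOk;
    }
}

// Hypothetical HashSet-backed dictionary so the sketch runs on its own;
// swap in BinarySearchTree, HashTable, or UnorderedList to test them.
class SetDictionary implements Dictionary {
    private final Set<String> keys = new HashSet<>();
    public boolean add(String key) { return keys.add(key); }
    public boolean contains(String key) { return keys.contains(key); }
}
```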

Three large text files are provided for testing in the "data" subdirectory:

"words.txt" contains approximately 25,000 words, in sorted order.

"randomWords.txt" contains the same words as "words.txt", but in random order.

"subset.txt" contains a randomly chosen subset of about 10% of the words in "words.txt" and "randomWords.txt".

So to execute the hash table dictionary implementation using the random words file, you would use the following arguments:

hash data/randomWords.txt data/subset.txt

You can run the TestDriver program with specified arguments by right-clicking on "", choosing "Run->Run...", choosing the "Arguments" tab, and changing the value of the "Program Arguments" field.

When you run the test driver program you will see output like the following:

Testing HashTable
Read full and subset files in 160 ms
Built dictionary in 35 ms
Checked subset in 5 ms
Checked full dictionary in 11 ms

Performance Analysis

Once you have finished implementing the three dictionary classes, open the file "report.txt" and answer the questions it contains.  You can open and edit this file from within Eclipse.  In the report, you will run the test driver on each combination of dictionary classes and input files and then comment on the performance of the different data structures.

Getting Started, Submitting

To get started, import into your Eclipse workspace.

When you are ready to submit, run the following commands from a terminal window:

cd eclipse-workspace
submit102 dictionary