CMPU 375 - Project 3
Due: by 11:59 PM on Wednesday, November 30th

$Revision: 1.6 $

A *Hamming code* is a technique for encoding fixed-size chunks
of binary data so that single-bit errors can be corrected.

Encoding *n* bits of data using a Hamming code requires
*n* bits for the original data, plus *r* *parity bits*,
where *r* is the smallest integer satisfying

2^{r} >= n + r + 1

The data and parity bits are arranged in
a special way. Assuming that the bits of the encoded string
are numbered starting from 1, all of the bits whose positions
are powers of 2 are the parity bits, and all other bits
are data bits. For example, say we want to encode bytes
of data (8 bits)
using a Hamming code. We will need 4 parity bits
(2^{4} = 16 >= 8 + 4 + 1 = 13), so 12 bits total.
Bits 1, 2, 4, and 8 are the parity bits. The remaining bits
3, 5, 6, 7, 9, 10, 11, 12 are the data bits.
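
The layout can be checked with a short sketch. It counts parity bits using the standard condition for a single-error-correcting Hamming code (the smallest r with 2^r >= n + r + 1), which gives 4 parity bits for 8 data bits, and then sorts the positions 1 to 12 into parity and data positions. The class and method names are hypothetical and are not part of the project code.

```java
final class HammingLayout {
    // Smallest r such that 2^r >= dataBits + r + 1.
    static int parityBitCount(int dataBits) {
        int r = 0;
        while ((1 << r) < dataBits + r + 1) {
            r++;
        }
        return r;
    }

    // A position (numbered from 1) holds a parity bit exactly when it is a power of 2.
    static boolean isParityPosition(int position) {
        return position > 0 && (position & (position - 1)) == 0;
    }

    public static void main(String[] args) {
        int dataBits = 8;
        int total = dataBits + parityBitCount(dataBits);
        StringBuilder parity = new StringBuilder();
        StringBuilder data = new StringBuilder();
        for (int pos = 1; pos <= total; pos++) {
            (isParityPosition(pos) ? parity : data).append(pos).append(' ');
        }
        System.out.println("total bits:  " + total);                    // 12
        System.out.println("parity bits: " + parity.toString().trim()); // 1 2 4 8
        System.out.println("data bits:   " + data.toString().trim());   // 3 5 6 7 9 10 11 12
    }
}
```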

Here is how the parity bits are computed. The parity bit
at position *p* is computed by starting at position *p*
and alternately checking *p* bits, skipping *p* bits,
checking *p* bits, and so forth for all of the bits
in the word. We use even parity: each parity bit is set so
that the bits it checks, itself included, contain an even
number of 1s. So, parity bit 1 stores the parity for
bits 1, 3, 5, 7, 9, and 11. Parity bit 2 stores the parity
for bits 2-3, 6-7, 10-11.
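
The "check, skip, check" rule is equivalent to a simple bit test: the parity bit at a given position checks every position whose binary representation has that same bit set. The following sketch (hypothetical names, not part of the project code) lists the positions checked by each parity bit in a 12 bit word.

```java
import java.util.ArrayList;
import java.util.List;

final class ParityCoverage {
    // Positions (numbered from 1) checked by the parity bit at parityPos:
    // exactly those positions that have the parityPos bit set.
    static List<Integer> coveredPositions(int parityPos, int totalBits) {
        List<Integer> covered = new ArrayList<>();
        for (int pos = 1; pos <= totalBits; pos++) {
            if ((pos & parityPos) != 0) {
                covered.add(pos);
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        for (int p = 1; p <= 12; p <<= 1) {
            System.out.println("parity bit " + p + " checks " + coveredPositions(p, 12));
        }
        // parity bit 1 checks [1, 3, 5, 7, 9, 11]
        // parity bit 2 checks [2, 3, 6, 7, 10, 11]
        // parity bit 4 checks [4, 5, 6, 7, 12]
        // parity bit 8 checks [8, 9, 10, 11, 12]
    }
}
```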

Here is a concrete example. Say we are encoding the binary data 01000001. (We will assume that bits are numbered with position 1 at the left and 8 at the right.) The first step is to copy each bit into a corresponding data bit position:

_ _ 0 _ 1 0 0 _ 0 0 0 1

First we will compute parity bit 1 (marked P below), which records the parity for bits 1, 3, 5, 7, 9, and 11.

P _ 0 _ 1 0 0 _ 0 0 0 1

The parity bit should have the value 1, since exactly one of the data bits it checks (bit 5) is set to 1. Next, parity bit 2:

1 P 0 _ 1 0 0 _ 0 0 0 1

In this case, all of the checked data bits (3, 6, 7, 10, and 11) are 0, so the parity bit is 0. Next, parity bit 4:

1 0 0 P 1 0 0 _ 0 0 0 1

Two of the checked data bits (bits 5 and 12) are 1, so the parity is 0. Finally, parity bit 8:

1 0 0 0 1 0 0 P 0 0 0 1

One of the checked data bits (bit 12) is 1, so the parity is 1. After computing the parity bits, the encoded byte has the value

1 0 0 0 1 0 0 1 0 0 0 1
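
The worked example can be reproduced with a short sketch. This is not the project's **encode** method: for readability it keeps the 12 bits in an int array indexed by position (an assumption made here), and it says nothing about how the bits should be packed into a short. The class and method names are hypothetical.

```java
final class HammingEncodeSketch {
    static final int TOTAL_BITS = 12;
    static final int[] DATA_POSITIONS = {3, 5, 6, 7, 9, 10, 11, 12};

    // Returns an array indexed 1..12 (index 0 unused) holding the encoded word.
    static int[] encode(int dataByte) {
        int[] word = new int[TOTAL_BITS + 1];
        // Data bit 1 is the leftmost (most significant) bit of the byte.
        for (int i = 0; i < DATA_POSITIONS.length; i++) {
            word[DATA_POSITIONS[i]] = (dataByte >> (7 - i)) & 1;
        }
        // Even parity: each parity bit is the XOR of the data bits it checks.
        for (int p = 1; p <= TOTAL_BITS; p <<= 1) {
            int parity = 0;
            for (int pos = 1; pos <= TOTAL_BITS; pos++) {
                if (pos != p && (pos & p) != 0) {
                    parity ^= word[pos];
                }
            }
            word[p] = parity;
        }
        return word;
    }

    // Formats positions 1..12 from left to right, separated by spaces.
    static String format(int[] word) {
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos <= TOTAL_BITS; pos++) {
            if (pos > 1) {
                sb.append(' ');
            }
            sb.append(word[pos]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(encode(0b01000001)));
        // prints: 1 0 0 0 1 0 0 1 0 0 0 1
    }
}
```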

Detecting and correcting a single bit error is surprisingly simple: recompute each parity check the same way as in the encoding process. (Using even parity, each parity bit and the data bits it checks should sum to 0 modulo 2.) If a check fails (the sum is 1), one of the bits it covers, either a data bit or the parity bit itself, has been corrupted. The exact position of the corrupted bit in the encoded bit string can be found by summing the positions of the failed parity bits. For example, if parity bits 2 and 4 do not check, then the error is in the bit at position 6.
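
As a minimal sketch of the check, the code below recomputes each parity over a 12 bit word and sums the positions of the parity bits that fail. The word is held in an int array indexed by position, which is an assumption made for illustration; the names are hypothetical and not part of the project code.

```java
final class HammingSyndromeSketch {
    static final int TOTAL_BITS = 12;

    // word is indexed 1..12 (index 0 unused). Returns the sum of the positions
    // of the parity bits whose even-parity check fails; 0 means every check passes.
    static int failedParitySum(int[] word) {
        int sum = 0;
        for (int p = 1; p <= TOTAL_BITS; p <<= 1) {
            int parity = 0;
            for (int pos = 1; pos <= TOTAL_BITS; pos++) {
                if ((pos & p) != 0) {
                    parity ^= word[pos]; // includes the parity bit itself
                }
            }
            if (parity != 0) {
                sum += p;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The encoded word 1 0 0 0 1 0 0 1 0 0 0 1, with index 0 unused.
        int[] word = {0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1};
        System.out.println(failedParitySum(word)); // prints 0: no error
        word[6] ^= 1;                              // corrupt the bit at position 6
        System.out.println(failedParitySum(word)); // prints 6: checks 2 and 4 fail
    }
}
```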

Why does this work? Each bit position is checked by a unique combination of parity bits: the parity bits that check position *k* are exactly those whose positions are the powers of 2 appearing in the binary representation of *k*. For example, 6 = 4 + 2, so the bit at position 6 is checked by parity bits 4 and 2.

Parity Bit 8  -  -  -  -  -  -  -  *  *  *  *  *
Parity Bit 4  -  -  -  *  *  *  *  -  -  -  -  *
Parity Bit 2  -  *  *  -  -  *  *  -  -  *  *  -
Parity Bit 1  *  -  *  -  *  -  *  -  *  -  *  -
Bit position  1  2  3  4  5  6  7  8  9 10 11 12

To correct a 1 bit error, simply sum the positions of the parity bits that are incorrect, and toggle the bit at the position indicated by the sum. Note that only 1 bit errors can be corrected this way: if two or more bits of a word are corrupted, the sum no longer identifies the corrupted bits.

After checking the parity bits and correcting a 1 bit error, the original byte value can be recovered by copying the data bits at positions 3, 5, 6, 7, 9, 10, 11, 12 into consecutive positions in a single byte.
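
Putting the last steps together, here is a sketch that corrects a single bit error and then recovers the byte. It is not the project's **decode** method: the word is held in an int array indexed by position (an assumption made for illustration, since the packing into a short is not covered here), and the names are hypothetical.

```java
final class HammingDecodeSketch {
    static final int TOTAL_BITS = 12;
    static final int[] DATA_POSITIONS = {3, 5, 6, 7, 9, 10, 11, 12};

    // word is indexed 1..12 (index 0 unused) and is corrected in place.
    // Returns the recovered byte value in the range 0..255.
    static int decode(int[] word) {
        // Sum the positions of the parity bits whose even-parity check fails.
        int errorPos = 0;
        for (int p = 1; p <= TOTAL_BITS; p <<= 1) {
            int parity = 0;
            for (int pos = 1; pos <= TOTAL_BITS; pos++) {
                if ((pos & p) != 0) {
                    parity ^= word[pos];
                }
            }
            if (parity != 0) {
                errorPos += p;
            }
        }
        // A sum of 0 means no error. A sum above 12 cannot result from a
        // single bit error, so this sketch leaves the word unchanged then.
        if (errorPos >= 1 && errorPos <= TOTAL_BITS) {
            word[errorPos] ^= 1;
        }
        // Copy the data bits into consecutive bits, position 3 leftmost.
        int value = 0;
        for (int pos : DATA_POSITIONS) {
            value = (value << 1) | word[pos];
        }
        return value;
    }

    public static void main(String[] args) {
        // 1 0 0 0 1 0 0 1 0 0 0 1 with the bit at position 10 flipped.
        int[] word = {0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1};
        int value = decode(word);
        String bits = String.format("%8s", Integer.toBinaryString(value)).replace(' ', '0');
        System.out.println(bits); // prints 01000001
    }
}
```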

The file hammingcode.zip contains the code
for the project.
You can import it into Eclipse, or you can just unpack it and use the
**ant** command to compile the source code.

Your task is to implement the **encode**
and **decode** methods of the **HammingEncoderDecoder** class.
The **encode** method takes a byte value and returns a short value in which
the byte is encoded along with its parity bits. The **decode**
method takes an encoded short value and returns the original byte value,
correcting a single bit error if necessary.

You can test your implementation using the **HammingCode** class.
It has a **main** method that will encode and decode files.
To run it from the command line, first add the "bin" directory of
the project to your **CLASSPATH** environment variable
by running the commands

cd hammingcode/bin
setenv CLASSPATH `pwd`:$CLASSPATH

After setting the CLASSPATH, you can invoke the HammingCode program using the following commands:

java edu.vassar.cs.cs375.hammingcode.HammingCode -encode inputFile outputFile
java edu.vassar.cs.cs375.hammingcode.HammingCode -encodeNoisy inputFile outputFile
java edu.vassar.cs.cs375.hammingcode.HammingCode -decode inputFile outputFile

The -encode and -encodeNoisy options take an input file and encode it, saving the result in an output file. Using -encodeNoisy introduces a 1 bit error in the encoded 12 bit words with a 25% probability per word.

Note: by default, a different random seed will be used on every execution when the -encodeNoisy option is given. If you want to get repeatable results, you can add the command line option -Dhamming.seed=12345 to the command line immediately after "java". (You can specify any seed value, not just 12345.)
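
If you want to exercise your decoder on individual words without going through files, you can inject the same kind of noise yourself. The sketch below is a hypothetical stand-in, not the code in hammingcode.zip: it assumes the 12 bit word sits in the low 12 bits of an int, and it flips one randomly chosen bit with probability 0.25. Passing a seeded Random makes the results repeatable.

```java
import java.util.Random;

final class NoiseSketch {
    static final int WORD_BITS = 12;
    static final double ERROR_PROBABILITY = 0.25;

    // Returns the word unchanged, or with exactly one of its low 12 bits flipped.
    static int maybeCorrupt(int word, Random rng) {
        if (rng.nextDouble() < ERROR_PROBABILITY) {
            int bit = rng.nextInt(WORD_BITS); // 0..11
            word ^= 1 << bit;
        }
        return word;
    }

    public static void main(String[] args) {
        Random rng = new Random(12345L); // fixed seed for repeatable runs
        int original = 0b100010010001;
        int trials = 1000;
        int corrupted = 0;
        for (int i = 0; i < trials; i++) {
            if (maybeCorrupt(original, rng) != original) {
                corrupted++;
            }
        }
        System.out.println(corrupted + " of " + trials + " words corrupted");
    }
}
```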

The -decode option takes an encoded input file and decodes it to an output file. You should try encoding and decoding some input files of various sizes and contents and make sure that when decoded their contents are identical to the original versions.

Additional information about Hamming Codes may be found at the following sites:

- http://www.ee.unb.ca/tervo/ee4253/hamming.htm
- http://www.cs.fiu.edu/~downeyt/cop3402/hamming.html
- http://en.wikipedia.org/wiki/Hamming_code

Many additional sites can be found by a Google search for "hamming code".

When you are done, submit by running the command

submit375 hammingcode