CMPU 375 - Project 3

Due: by 11:59 PM on Wednesday, November 30th

\$Revision: 1.6 \$

## Project 3: Error correction with Hamming Codes

A Hamming code is a technique for encoding fixed-size chunks of binary data so that single-bit errors can be detected and corrected.

To encode n bits of data using a Hamming code requires n bits for the original data, plus r parity bits, where r is the smallest integer such that

2^r >= n + r + 1

The data and parity bits are arranged in a special way.  Assuming that the bits of the encoded word are numbered starting from 1, all positions that are powers of 2 hold parity bits, and all other positions hold data bits.  For example, say we want to encode bytes of data (8 bits) using a Hamming code.  The smallest r satisfying the inequality is 4 (2^4 = 16 >= 8 + 4 + 1 = 13), so we will need 12 bits total.  Bits 1, 2, 4, and 8 are the parity bits.  The remaining bits 3, 5, 6, 7, 9, 10, 11, 12 are the data bits.
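The layout rule can be checked with a few lines of Java.  This is a minimal sketch, not part of the project code: the class and method names (`HammingLayout`, `parityBitCount`, `isParityPosition`) are made up for illustration, and the parity bit count is taken to be the smallest r with 2^r >= n + r + 1, which gives 4 for n = 8 and so agrees with the 12-bit example.

```java
// Illustrative sketch only; names are hypothetical and not part of hammingcode.zip.
class HammingLayout {
    // Smallest r with 2^r >= dataBits + r + 1: enough parity bits for dataBits data bits.
    static int parityBitCount(int dataBits) {
        int r = 0;
        while ((1 << r) < dataBits + r + 1) {
            r++;
        }
        return r;
    }

    // A position (numbered from 1) holds a parity bit exactly when it is a power of 2.
    static boolean isParityPosition(int position) {
        return position > 0 && (position & (position - 1)) == 0;
    }

    public static void main(String[] args) {
        int dataBits = 8;
        int total = dataBits + parityBitCount(dataBits);
        StringBuilder parity = new StringBuilder();
        StringBuilder data = new StringBuilder();
        for (int pos = 1; pos <= total; pos++) {
            (isParityPosition(pos) ? parity : data).append(pos).append(' ');
        }
        System.out.println("total bits:  " + total);                    // 12
        System.out.println("parity bits: " + parity.toString().trim()); // 1 2 4 8
        System.out.println("data bits:   " + data.toString().trim());   // 3 5 6 7 9 10 11 12
    }
}
```

The power-of-2 test `(position & (position - 1)) == 0` works because a power of 2 has exactly one 1 bit, and subtracting 1 clears it.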

## Encoding

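Here is how the parity bits are computed.  The parity bit at position p is computed by starting at position p and alternately checking p bits, skipping p bits, checking p bits, and so forth for all of the bits in the word.  So, parity bit 1 stores the parity for bits 1, 3, 5, 7, 9, and 11.  Parity bit 2 stores the parity for bits 2-3, 6-7, 10-11.  Parity bit 4 stores the parity for bits 4-7 and 12, and parity bit 8 stores the parity for bits 8-12.  Using even parity, each parity bit is set so that the bits it covers, including the parity bit itself, contain an even number of 1s.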
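The "check, skip, check, skip" rule is equivalent to a bit test: the parity bit at position p (a power of 2) covers exactly the positions whose binary representation has the p bit set.  The sketch below lists the covered positions for a 12-bit word.  The class and method names are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical and not part of hammingcode.zip.
class ParityCoverage {
    // Positions (numbered from 1) covered by the parity bit at position p,
    // where p is a power of 2, in a word of wordLength bits.
    static List<Integer> coveredPositions(int p, int wordLength) {
        List<Integer> covered = new ArrayList<>();
        for (int pos = 1; pos <= wordLength; pos++) {
            if ((pos & p) != 0) {
                covered.add(pos);
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        for (int p = 1; p <= 12; p <<= 1) {
            System.out.println("parity bit " + p + " covers " + coveredPositions(p, 12));
        }
        // parity bit 1 covers [1, 3, 5, 7, 9, 11]
        // parity bit 2 covers [2, 3, 6, 7, 10, 11]
        // parity bit 4 covers [4, 5, 6, 7, 12]
        // parity bit 8 covers [8, 9, 10, 11, 12]
    }
}
```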

Here is a concrete example.  Say we are encoding the binary data 01000001.  (We will assume that bits are numbered with position 1 at the left and 8 at the right.)  The first step is to copy each bit into a corresponding data bit position:

_ _ 0 _ 1 0 0 _ 0 0 0 1

First we will compute parity bit 1 (marked P), which records the parity of the data bits at positions 3, 5, 7, 9, and 11.

P _ 0 _ 1 0 0 _ 0 0 0 1

The parity bit should have the value 1, since exactly 1 of those data bits (position 5) is set to 1.  Next, parity bit 2, which covers the data bits at positions 3, 6, 7, 10, and 11:

1 P 0 _ 1 0 0 _ 0 0 0 1

In this case, all of the covered data bits are 0, so the parity bit is 0.  Next, parity bit 4, which covers the data bits at positions 5, 6, 7, and 12:

1 0 0 P 1 0 0 _ 0 0 0 1

Two of the covered data bits (positions 5 and 12) are 1, so the parity is 0.  Finally, parity bit 8, which covers the data bits at positions 9, 10, 11, and 12:

1 0 0 0 1 0 0 P 0 0 0 1

One of the covered data bits (position 12) is 1, so the parity is 1.  After computing the parity bits, the encoded byte data has the value

1 0 0 0 1 0 0 1 0 0 0 1
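The worked example can be reproduced with the sketch below.  It is only an illustration under stated assumptions: the encoded word is kept in an `int` array indexed by position (index 0 unused) rather than packed into a `short`, and the names `HammingEncodeSketch`, `encode`, and `toBitString` are hypothetical.  How the 12 bits should be laid out inside the `short` used by the project is not specified here, so that part is left open.

```java
// Illustrative sketch only; names and the int[] representation are assumptions.
class HammingEncodeSketch {
    static final int WORD_LENGTH = 12;

    // Returns the encoded word in bits[1..12]; index 0 is unused so that
    // array indices match the 1-based bit positions used in the text.
    static int[] encode(int data) {
        int[] bits = new int[WORD_LENGTH + 1];

        // Copy the 8 data bits, leftmost (most significant) first,
        // into the positions that are not powers of 2.
        int shift = 7;
        for (int pos = 1; pos <= WORD_LENGTH; pos++) {
            if ((pos & (pos - 1)) != 0) {
                bits[pos] = (data >> shift) & 1;
                shift--;
            }
        }

        // Even parity: each parity bit is the XOR of the bits it covers.
        // bits[p] is still 0 at this point, so including it is harmless.
        for (int p = 1; p <= WORD_LENGTH; p <<= 1) {
            int parity = 0;
            for (int pos = 1; pos <= WORD_LENGTH; pos++) {
                if ((pos & p) != 0) {
                    parity ^= bits[pos];
                }
            }
            bits[p] = parity;
        }
        return bits;
    }

    static String toBitString(int[] bits) {
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos < bits.length; pos++) {
            sb.append(bits[pos]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBitString(encode(0b01000001))); // 100010010001
    }
}
```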

## Error Detection, Correction, and Decoding

Detecting and correcting a single bit error is surprisingly simple: recompute each parity bit the same way as in the encoding process.  (Using even parity, the parity bit and the data bits it covers should sum to 0 modulo 2.)  If the parity does not check (the sum is 1 modulo 2), one of the covered data bits (or the parity bit itself) has been corrupted.  The exact position of the corrupted bit in the encoded bit string can be found by summing the positions of the failed parity bits.  For example, if parity bits 2 and 4 do not check, then the error is in the bit at position 6.

Why does this work?  Each bit contributes to a unique combination of parity bits: exactly those parity bits whose positions correspond to the 1 digits in the binary representation of the bit's position.  For example, position 6 is 110 in binary (4 + 2), so bit 6 is covered by parity bits 4 and 2 and by no others.

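| Bit position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|--------------|---|---|---|---|---|---|---|---|---|----|----|----|
| Parity Bit 8 | - | - | - | - | - | - | - | * | * | *  | *  | *  |
| Parity Bit 4 | - | - | - | * | * | * | * | - | - | -  | -  | *  |
| Parity Bit 2 | - | * | * | - | - | * | * | - | - | *  | *  | -  |
| Parity Bit 1 | * | - | * | - | * | - | * | - | * | -  | *  | -  |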

To correct a 1 bit error, simply sum the positions of the parity bits that are incorrect, and toggle the bit at the position indicated by the sum.  Note that only 1 bit errors can be detected and corrected; if two or more bits of a word are corrupted, this procedure will not recover the original data.
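The check-and-toggle procedure can be sketched as follows.  As before, this is an illustration rather than the project's implementation: the word is assumed to be an `int` array indexed by position (index 0 unused), and the names `HammingCorrectSketch`, `syndrome`, and `correct` are hypothetical.  The example corrupts bit 6 of the encoded word 100010010001 from the encoding example, so parity bits 2 and 4 fail and the sum is 6.

```java
// Illustrative sketch only; names and the int[] representation are assumptions.
class HammingCorrectSketch {
    static final int WORD_LENGTH = 12;

    // Recomputes every parity check and returns the sum of the positions of
    // the checks that fail. A result of 0 means all checks pass.
    static int syndrome(int[] bits) {
        int sum = 0;
        for (int p = 1; p <= WORD_LENGTH; p <<= 1) {
            int parity = 0;
            for (int pos = 1; pos <= WORD_LENGTH; pos++) {
                if ((pos & p) != 0) {
                    parity ^= bits[pos]; // includes the parity bit at position p
                }
            }
            if (parity != 0) {
                sum += p;
            }
        }
        return sum;
    }

    // Toggles the bit named by the syndrome, if it names a valid position.
    // Returns the syndrome (0 if no error was found). A value above 12 cannot
    // come from a single-bit error, so the word is left unchanged in that case.
    static int correct(int[] bits) {
        int s = syndrome(bits);
        if (s >= 1 && s <= WORD_LENGTH) {
            bits[s] ^= 1;
        }
        return s;
    }

    public static void main(String[] args) {
        int[] word = {0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1}; // index 0 unused
        word[6] ^= 1;                        // introduce a single-bit error
        System.out.println(syndrome(word));  // 6
        correct(word);
        System.out.println(syndrome(word));  // 0
    }
}
```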

After checking the parity bits and correcting a 1 bit error, the original byte value can be recovered by copying the data bits at positions 3, 5, 6, 7, 9, 10, 11, 12 into consecutive positions in a single byte.
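Recovering the byte is the reverse of the copy step used in encoding.  The sketch below assumes the same hypothetical representation as a position-indexed `int` array (index 0 unused) and that the word has already been checked and corrected; `HammingExtractSketch` and `extractData` are illustrative names only.

```java
// Illustrative sketch only; names and the int[] representation are assumptions.
class HammingExtractSketch {
    // Collects the bits at the non-power-of-2 positions, in order, so that
    // the bit at position 3 becomes the most significant bit of the result.
    static int extractData(int[] bits) {
        int value = 0;
        for (int pos = 1; pos < bits.length; pos++) {
            if ((pos & (pos - 1)) != 0) {
                value = (value << 1) | bits[pos];
            }
        }
        return value;
    }

    public static void main(String[] args) {
        int[] word = {0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1}; // index 0 unused
        int value = extractData(word);
        String binary = String.format("%8s", Integer.toBinaryString(value)).replace(' ', '0');
        System.out.println(binary); // 01000001
    }
}
```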

## The Project

The file hammingcode.zip contains the code for the project.  You can import it into Eclipse, or you can just unpack it and use the ant command to compile the source code.

Your task is to implement the encode and decode methods of the HammingEncoderDecoder class.  The encode method takes a byte value and returns a short value in which the byte is encoded along with its parity bits.  The decode method takes an encoded short value and returns the original byte value, correcting a single bit error if necessary.

You can test your implementation using the HammingCode class.  It has a main method that will encode and decode files.  To run it from the command line, first add the "bin" directory of the project to your CLASSPATH environment variable.  In csh or tcsh, you can do this with the commands

cd hammingcode/bin
setenv CLASSPATH `pwd`:\$CLASSPATH

After setting the CLASSPATH, you can invoke the HammingCode program using the following commands:

java edu.vassar.cs.cs375.hammingcode.HammingCode -encode inputFile outputFile
java edu.vassar.cs.cs375.hammingcode.HammingCode -encodeNoisy inputFile outputFile
java edu.vassar.cs.cs375.hammingcode.HammingCode -decode inputFile outputFile

The -encode and -encodeNoisy options take an input file and encode it, saving the result in an output file.  Using -encodeNoisy introduces a 1 bit error into each encoded 12 bit word with a probability of 25%.

Note: by default, a different random seed will be used on every execution when the -encodeNoisy option is given.  If you want to get repeatable results, you can add the option -Dhamming.seed=12345 to the command line immediately after "java".  (You can specify any seed value, not just 12345.)

The -decode option takes an encoded input file and decodes it to an output file.  You should try encoding and decoding some input files of various sizes and contents and make sure that when decoded their contents are identical to the original versions.