Assignment 1: Embarrassing Parallelism

Due: Tuesday, February 3rd by 11:59 PM

Getting Started

Download CS497_Assign1.zip. Extract the contents of the archive into a directory.

Using a Unix shell, use the cd command to navigate into the directory containing the extracted contents.

Using a text editor, open the file sumu16_par.c.

When you run the make command, all of the programs in this directory will be compiled.

Your Task

The program sumu16_par.c takes a single command line argument: the name of a file containing a large number of unsigned 16-bit integer values (stored in binary format). When executed, the program reads the file into memory and then computes the sum of all of the values.
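As a rough illustration of the sequential computation, here is a minimal sketch that assumes the values have already been read into an in-memory array. The function name sum_u16 is hypothetical and is not taken from the provided source. Note the 64-bit accumulator: the largest data file holds 2^31 values of up to 65,535 each, so the total can far exceed what a 32-bit integer can hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the provided code): sum n unsigned
 * 16-bit values into a 64-bit total so the result cannot overflow
 * for the file sizes used in this assignment. */
uint64_t sum_u16(const uint16_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```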

Your task is to modify this program to use two threads to perform an equivalent computation in parallel.

Extra credit: support a command line option to perform the computation using any number of threads. Each machine in the cluster has a quad-core CPU, so 4-way parallelism is possible.

The basic idea is to use two worker threads, each of which computes the sum of half of the array. When both workers have completed, their individual sums can be added to produce the overall sum. [If you do the extra credit, then given n threads, each thread should compute the sum of a region of the array 1/n the size of the overall array.]
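The partitioning described above can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the expected solution: it assumes POSIX threads (typically compiled with -pthread) and data that is already in memory, and the names Work, worker, parallel_sum_u16, and MAX_WORKERS are hypothetical rather than taken from the provided sumu16_par.c. The last worker picks up any leftover elements when the array size is not evenly divisible by the number of workers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_WORKERS 64 /* arbitrary cap chosen for this sketch */

/* Hypothetical per-worker record: the region to sum and its result. */
typedef struct {
    const uint16_t *data;
    size_t begin;  /* first index of the region (inclusive) */
    size_t end;    /* one past the last index of the region */
    uint64_t sum;  /* written only by the worker that owns this record */
} Work;

/* Thread entry point: sum the half-open range [begin, end). */
static void *worker(void *arg)
{
    Work *w = arg;
    uint64_t local = 0;
    for (size_t i = w->begin; i < w->end; i++) {
        local += w->data[i];
    }
    w->sum = local;
    return NULL;
}

/* Split the array into nworkers regions, sum each region in its own
 * thread, then add the partial sums once every worker has finished. */
uint64_t parallel_sum_u16(const uint16_t *data, size_t n, int nworkers)
{
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);

    pthread_t threads[MAX_WORKERS];
    Work work[MAX_WORKERS];
    size_t chunk = n / (size_t)nworkers;

    for (int i = 0; i < nworkers; i++) {
        work[i].data = data;
        work[i].begin = (size_t)i * chunk;
        /* The last worker also takes the remainder, if any. */
        work[i].end = (i == nworkers - 1) ? n : (size_t)(i + 1) * chunk;
        work[i].sum = 0;
        if (pthread_create(&threads[i], NULL, worker, &work[i]) != 0) {
            abort();
        }
    }

    uint64_t total = 0;
    for (int i = 0; i < nworkers; i++) {
        if (pthread_join(threads[i], NULL) != 0) {
            abort();
        }
        total += work[i].sum;
    }
    return total;
}
```

Calling parallel_sum_u16(data, n, 2) corresponds to the two-worker case. Each worker writes only to its own Work record, and the main thread reads the partial sums only after joining, so no locking is needed in this sketch.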

Each node in the cluster has a directory containing several example data files for you to use in testing and benchmarking your program.

For example, you could run the command

./sumu16_par /usr/local/share/cs497_data/1G.dat

to run the program on a file containing 2^30 (about one billion) data values.


Collect running times for both the sequential (sumu16) and parallel (sumu16_par) programs on each of the following files:

File        Number of data elements
256M.dat    268,435,456
512M.dat    536,870,912
1G.dat      1,073,741,824
2G.dat      2,147,483,648

Here is an example of running the two-way parallel program on the 1G.dat data file:

[dhovemey@marvin]$ time ./sumu16_par /usr/local/share/cs497_data/1G.dat
File is 2147483648 bytes in size
Using 2 workers
Sum is 35184138343749

real    0m1.064s
user    0m1.048s
sys     0m1.020s

The real time is the time you should include in your benchmarking data.

You can log into any of the cluster machines (hitchhiker01, hitchhiker02, etc.) to run your benchmark; find one that is idle.

Important: Before doing an "official" benchmarking run on a particular data file, do at least one "warm-up" run so that the operating system can cache the file's contents in memory. Otherwise, the time spent reading the file from disk will dominate the computation time.

Use a spreadsheet to plot your data as follows:

  • The X-axis is the number of data elements
  • The Y-axis is the running time
  • Plot one series for the sequential times and one series for the two-way parallel times
  • Extra credit: If your program supports an arbitrary number of threads, plot a third series for four-way parallel times

You should end up with a plot that looks something like this:


Save the spreadsheet file with both your raw data and your running time plot in the directory containing your source code.

Question: Did the two-way parallel version of the program execute twice as fast as the sequential version? If not, what are some reasons why the program failed to achieve a speedup proportional to the number of threads? [You don't have to submit an answer to this question: just think about it.]
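One way to make "speedup proportional to the number of threads" concrete is to compute the speedup and the per-thread efficiency from your measured real times. The sketch below is only an illustration; the function names speedup and efficiency are hypothetical, and the inputs are whatever times you recorded for the same data file.

```c
#include <assert.h>

/* Speedup: how many times faster the parallel run was than the
 * sequential run on the same data file. */
double speedup(double t_seq, double t_par)
{
    assert(t_seq > 0.0 && t_par > 0.0);
    return t_seq / t_par;
}

/* Efficiency: speedup divided by the number of threads. A value of
 * 1.0 would mean the speedup was exactly proportional to the number
 * of threads. */
double efficiency(double t_seq, double t_par, int nthreads)
{
    assert(nthreads > 0);
    return speedup(t_seq, t_par) / nthreads;
}
```

A two-way parallel run that is exactly twice as fast as the sequential run has a speedup of 2.0 and an efficiency of 1.0; anything lower means some of the potential gain was lost.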


From the directory containing your source files and your spreadsheet file, run the command

make submit

When prompted, enter your Marmoset username and password. You should see a message indicating that the submission was successfully uploaded to the server.

Important: You should log into the server and download your submitted files. Check to make sure that the files you submitted were the ones you intended. The server URL is