Due: Tuesday, February 3rd by 11:59 PM
Download CS497_Assign1.zip. Extract the contents of the archive into a directory.
In a Unix shell, use the cd command to navigate into the directory containing the extracted contents.
Using a text editor, open the file sumu16_par.c.
When you run the make command, all of the programs in this directory will be compiled.
The program sumu16_par.c takes a single command line argument, which is a file containing a large number of unsigned 16-bit integer values (stored in binary format). When executed, the program reads the file into memory, and then computes the sum of all of the values.
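As a rough illustration of the sequential computation described above, here is a minimal sketch. The function names sum_u16 and read_u16_file are hypothetical and are not taken from the provided source; the sketch also assumes the file stores values in the machine's native byte order and that the whole file fits in memory.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Sum n unsigned 16-bit values. A 64-bit accumulator is used because
 * the sum of a billion or more 16-bit values does not fit in 32 bits. */
uint64_t sum_u16(const uint16_t *data, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Hypothetical helper: read an entire binary file of uint16_t values.
 * Returns a malloc'd buffer (the caller frees it) and stores the element
 * count in *n_out, or returns NULL on any error. */
uint16_t *read_u16_file(const char *path, size_t *n_out)
{
    struct stat st;
    if (stat(path, &st) != 0 || st.st_size <= 0) {
        return NULL;
    }
    size_t n = (size_t)st.st_size / sizeof(uint16_t);
    if (n == 0) {
        return NULL;
    }
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return NULL;
    }
    uint16_t *data = malloc(n * sizeof(uint16_t));
    if (data == NULL) {
        fclose(f);
        return NULL;
    }
    size_t got = fread(data, sizeof(uint16_t), n, f);
    fclose(f);
    if (got != n) {
        free(data);
        return NULL;
    }
    *n_out = n;
    return data;
}
```

A main function would call read_u16_file on the command line argument, pass the result to sum_u16, and print the sum. The provided program may organize these steps differently.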
Your task is to modify this program to use two threads to perform an equivalent computation in parallel.
Extra credit: support a command line option to perform the computation using any number of threads. Each machine in the cluster has a quad-core CPU, so 4-way parallelism is possible.
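If you attempt the extra credit, the thread count has to be read from the command line and validated. The sketch below shows one possible way to do that with strtol. The helper name parse_thread_count and the max_threads limit are assumptions for illustration; the assignment does not prescribe a particular option format.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a thread count from a command line string.
 * Returns the count, or -1 if the string is not a whole number in the
 * range [1, max_threads]. */
int parse_thread_count(const char *s, int max_threads)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    /* Reject overflow, empty input, and trailing non-digit characters. */
    if (errno != 0 || end == s || *end != '\0') {
        return -1;
    }
    if (v < 1 || v > max_threads) {
        return -1;
    }
    return (int)v;
}
```

Rejecting invalid input up front keeps the worker setup code free of special cases such as zero or negative thread counts.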
The basic idea is to use two worker threads, each of which computes the sum of half of the array. When both workers have completed, their individual sums can be added to produce the overall sum. [If you do the extra credit, then given n threads, each thread should compute the sum of a region of the array 1/n the size of the overall array.]
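The idea above can be sketched with POSIX threads as follows. This is one possible structure, not the required one: the Work struct and the names sum_region and parallel_sum_u16 are hypothetical, and the sketch assumes the data is already in memory. Each worker writes only to its own Work record, so no locking is needed; the partial sums are combined after the workers have been joined.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-worker record: a half-open region [begin, end) of the
 * array and a slot for that region's partial sum. */
typedef struct {
    const uint16_t *data;
    size_t begin;
    size_t end;
    uint64_t sum;
} Work;

/* Thread entry point: sum one region into the worker's own record. */
static void *sum_region(void *arg)
{
    Work *w = arg;
    uint64_t sum = 0;
    for (size_t i = w->begin; i < w->end; i++) {
        sum += w->data[i];
    }
    w->sum = sum;
    return NULL;
}

/* Sum n values using nthreads workers. Returns 0 on success and stores
 * the result in *total; returns -1 on failure. */
int parallel_sum_u16(const uint16_t *data, size_t n, int nthreads,
                     uint64_t *total)
{
    if (nthreads < 1) {
        return -1;
    }
    pthread_t *threads = malloc((size_t)nthreads * sizeof(*threads));
    Work *work = malloc((size_t)nthreads * sizeof(*work));
    if (threads == NULL || work == NULL) {
        free(threads);
        free(work);
        return -1;
    }

    /* Each region has n / nthreads elements; the first n % nthreads
     * regions take one extra element so that nothing is left over. */
    size_t chunk = n / (size_t)nthreads;
    size_t extra = n % (size_t)nthreads;
    size_t begin = 0;
    int started = 0;
    int status = 0;
    for (int t = 0; t < nthreads; t++) {
        size_t len = chunk + ((size_t)t < extra ? 1 : 0);
        work[t] = (Work){ data, begin, begin + len, 0 };
        begin += len;
        if (pthread_create(&threads[t], NULL, sum_region, &work[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Wait for every worker that was started, then combine partial sums. */
    uint64_t sum = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        sum += work[t].sum;
    }
    free(threads);
    free(work);
    if (status == 0) {
        *total = sum;
    }
    return status;
}
```

Calling parallel_sum_u16 with nthreads set to 2 corresponds to the basic two-worker version. With gcc, programs that use pthreads are typically compiled with the -pthread flag.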
On each node in the cluster is a directory, /usr/local/share/cs497_data, containing several example data files for you to use in testing and benchmarking your program.
For example, you could run the command
    ./sumu16_par /usr/local/share/cs497_data/1G.dat
to run the program on a file containing 2^30 (about one billion) data values.
Collect running times for both the sequential (sumu16) and parallel (sumu16_par) programs on each of the following files:
File        Number of data elements
256M.dat    268,435,456
512M.dat    536,870,912
1G.dat      1,073,741,824
2G.dat      2,147,483,648
Here is an example of running the two-way parallel program on the 1G.dat data file:
    [dhovemey@marvin]$ time ./sumu16_par /usr/local/share/cs497_data/1G.dat
    File is 2147483648 bytes in size
    Using 2 workers
    Sum is 35184138343749

    real    0m1.064s
    user    0m1.048s
    sys     0m1.020s
The real time is the time you should include in your benchmarking data.
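The real time reported by the time command is wall-clock time for the whole process, including reading the file. If you are curious how much of that is spent in the summing loop alone, one optional approach is to time that region inside the program. The sketch below uses the POSIX clock_gettime function with CLOCK_MONOTONIC; the helper names now_monotonic and elapsed_seconds are hypothetical. Such measurements are a supplement only; the benchmarking data should still use the real time from the time command.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Read the monotonic clock, which is not affected by changes to the
 * system's wall-clock setting. */
struct timespec now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Elapsed wall-clock time in seconds between two clock readings. */
double elapsed_seconds(const struct timespec *start,
                       const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}
```

To use it, take one reading with now_monotonic before the computation and one after, then pass both readings to elapsed_seconds.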
You can log into any of the cluster machines (hitchhiker01, hitchhiker02, etc.) to run your benchmark; find one that is idle.
Important: Before doing an "official" benchmarking run on a particular data file, do at least one "warm-up" run. If the operating system has to spend time reading the contents of the file into memory from disk, that time will dominate the computation time.
Use a spreadsheet to plot your data as follows:
You should end up with a plot that looks something like this:
Save the spreadsheet file with both your raw data and your running time plot in the directory containing your source code.
Question: did the two-way parallel version of the program execute twice as fast as the sequential version? If not, what are some reasons why the program failed to achieve a speedup proportional to the number of threads? [You don't have to submit an answer to this question: just think about it.]
From the directory containing your source files and your spreadsheet file, run the command
When prompted, enter your Marmoset username and password. You should see a message indicating that the submission was successfully uploaded to the server.
Important: You should log into the server and download your submitted files. Check to make sure that the files you submitted were the ones you intended. The server URL is