# Lecture 18: Fork/Join Parallelism

Example code for this lecture: ForkJoin.zip

## Fork/join parallelism

Fork/join parallelism is a style of parallel programming useful for exploiting the parallelism inherent in divide and conquer algorithms on shared memory multiprocessors.

The idea is quite simple: a larger task can be divided into smaller tasks whose solutions can then be combined. As long as the smaller tasks are independent, they can be executed in parallel.

Example: finding the maximum value in a large array of integers. Each task is given three data:

• array to search
• start and end indices to search

The computation starts with the creation of a "root task", which represents the overall problem.

```rootTask = new Task(arr, 0, arr.length)
```

Assume that task class has a compute method which searches the specified range. The compute method will check to see how many elements are in the range it needs to search. If the number of elements in the range is below a threshold value, then a sequential computation is performed. Otherwise, the range is split into two sub-ranges containing half of the elements. Subtasks are created to search the sub-ranges. The maximum of the maximum values in the subranges is the maximum value in the overall range.

Pseudocode for the compute method:

```// returns maximum integer in range
int compute() {
if ((end - start) <= THRESHOLD) {
return result of sequential search for maximum element in range
}

// find middle index in range
int mid = start + (end - start) / 2

left = new Task(array, start, mid)
right = new Task(array, mid, end)

left.fork()
right.fork()

// wait for parallel tasks to complete, and get their results
int leftMax = left.join()
int rightMax = right.join()

// max of sub-ranges is max of overall range
return maximum of leftMax, rightMax
}
```

### Choosing a threshold

One of the main things to consider when implementing an algorithm using fork/join parallelism is chosing the threshold which determines whether a task will execute a sequential computation rather than forking parallel sub-tasks.

If the threshold is too large, then the program might not create enough tasks to fully take advantage of the available processors/cores.

If the threshold is too small, then the overhead of task creation and management could become significant.

In general, some experimentation will be necessary to find an appropriate threshold value.

In practice, fork/join frameworks use a thread pool in which a fixed number of threads are created. Each thread has a queue of tasks that are awaiting a chance to execute. When a task is started (forked), it is added to the queue of the thread that is executing its parent task.

By using a thread pool with work stealing, a fork/join framework can allow a relatively fine-grained division of the problem, but only create the minimum number of threads needed to fully exploit the available CPU cores. Typically, the thread pool will have one thread per available CPU core.

### The java.util.concurrent fork/join framework

As of version 7, Java has a high-performance fork/join framework. It is based on the fork/join framework described in Doug Lea's paper.

(We are using Java version 6, but an implementation exists in the form of jsr166y.jar.)

Task classes are subclasses of RecursiveTask<T> if they have a compute method which returns type T, or RecursiveAction if they have a compute method which does not return a value.

Here is an example of a Task for finding the maximum integer in an array of integers. Note that it closely resembles the pseudo-code described above.

```class FindMaxTask extends RecursiveTask<Integer> {

private int[] arr;
private int start, end;

public FindMaxTask(int[] arr, int start, int end) {
this.arr = arr;
this.start = start;
this.end = end;
}

@Override
protected Integer compute() {
if (end - start <= THRESHOLD) {
// number of elements is at or below the threshold - compute directly
return computeDirectly();
} else {
// number of elements is above the threshold - split into subtasks
int mid = start + (end-start) / 2;

// invoke the tasks in parallel and wait for them to complete
invokeAll(left, right);

// maximum of overall range is maximum of sub-ranges
return Math.max(left.join(), right.join());
}
}

private Integer computeDirectly() {
int max = Integer.MIN_VALUE;
for (int i = start; i < end; i++) {
if (arr[i] > max) {
max = arr[i];
}
}
return max;
}
}
```

Note that the invokeAll method may be called to execute two or more tasks in parallel (waiting for them to complete). The join method waits for a task to complete (if it has not completed yet) and returns the result of its compute method.

To do the overall computation on an entire array, we just need to create a root task and execute it using a ForkJoinPool object:

```int[] data = some array of integer values

ForkJoinPool pool = new ForkJoinPool();

System.out.println("Max is " + result);
```

### Parallel Merge Sort

As a more realistic example, let's consider using fork/join parallelism to speed up sorting. Merge sort is a good candidate for fork/join because it is a divide and conquer algorithm: at each step, the sorting of the halves of the array can be done in parallel. We'll leave the merge as a sequential step, so it may limit our parallel speedup somewhat.

How do we set the parallel threshold? One fairly obvious idea is that the threshold should be set to the size of the array divided by the number of available CPU cores. That way, if we have 4 cores, each will be assigned one quarter of the overall array to sort sequentially.

Here's the code for the merge sort task class:

```class MergeSortTask extends RecursiveAction {
private int[] arr;
private int start, end;
private int threshold;

public MergeSortTask(int[] arr, int start, int end, int threshold) {
this.arr = arr;
this.start = start;
this.end = end;
this.threshold = threshold;
}

@Override
protected void compute() {
if (end - start <= threshold) {
// sequential sort
Arrays.sort(arr, start, end);
return;
}

// Sort halves in parallel
int mid = start + (end-start) / 2;
invokeAll(
);

// sequential merge
sequentialMerge(mid);
}

private void sequentialMerge(int mid) {
... code ...
}
}
```

We've omitted the implementation of the sequentialMerge method, but it's just a straightforward sequential merge.

Important note: notice that in the sequential case, when the problem size is at or below the threshold, we are calling the Arrays.sort method, which is a very efficient sequential sorting algorithm. For the best performance, it is important to choose the most efficient sequential algorithm possible.

A complete array can be sorted as follows:

```int[] data = some array of integer values
int threshold = data.length / Runtime.getRuntime().availableProcessors();