Fork/Join framework

What’s the fork/join framework

The fork/join framework was designed to recursively split a parallelizable task into smaller tasks and then combine the results of each subtask to produce the overall result.

Basic Components

RecursiveTask

As start section, a ForkJoinPool contain worker threads used to run subtasks. To submit tasks to this pool, you have to create a subclass of RecursiveTask<V> where V is the type of the result produced by the parallelized task (and each of its subtasks) or of RecursiveAction if the task returns no result (it could be updating other nonlocal structures, though).

UML of RecursiveTask

To define RecursiveTasks you need only implement its single abstract method, compute:

protected abstract R compute();

This method defines both the logic of splitting the task at hand into subtasks and the algorithm to produce the result of a single subtask when it’s no longer possible or convenient to further divide it. For this reason an implementation of this method often resembles the following pseudocode:

if (task is small enough or no longer divisible) {
    compute task sequentially
} else {
    split task in two subtasks
    call this method recursively possibly further splitting each subtask
    wait for the completion of all subtasks
    combine the results of each subtask
}

Divide and conquer algorithms

A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

divide and conquer algorithm

work-stealing algorithm

The work-stealing algorithm is used to redistribute and balance the tasks among the worker threads in the pool.

work-stealing algorithm

Best practices for using the fork/join framework

  • Invoking the join method on a task blocks the caller until the result produced by that task is ready. For this reason, it’s necessary to call it after the computation of both subtasks has been started. Otherwise, you’ll end up with a slower and more complex version of your original sequential algorithm because every subtask will have to wait for the other one to complete before starting.
  • The invoke method of a ForkJoinPool shouldn’t be used from within a RecursiveTask . Instead, you should always call the methods compute or fork directly; only sequential code should use invoke to begin parallel computation.