quickpool  1.5.0
An easy-to-use, header-only work stealing thread pool in C++11
quickpool

build status Codacy Badge codecov License: MIT Documentation DOI

Fast and easy parallel computing in C++11

Why quickpool?

Developer friendly

The library consists of a single header file with permissive license. It requires only C++11 and is otherwise self-contained. Just drop quickpool.hpp in your project folder and enjoy.

User friendly API

Loops can be nested, see the examples below. All functions dispatch to a global thread pool instantiated only once with as many threads as there are cores. Optionally, one can create a local ThreadPool exposing the functions above. See also the API documentation.

Cutting edge algorithms

All scheduling uses work stealing synchronized by cache-aligned atomic operations.

The thread pool assigns each worker thread a task queue. The workers process first their own queue and then steal work from others. The algorithm is lock-free in the standard case where only a single thread pushes work to the pool.

Parallel loops assign each worker part of the loop range. When a worker completes its own range, it steals half the range of another worker. This perfectly balances the load and only requires a logarithmic number of steals (= points of contention). The algorithm uses double-wide compare-and-swap, which is lock-free on most modern processor architectures.

Examples

Thread pool

push([] { /* some work */ });
wait(); // wait for current jobs to finish
auto f = async([] { return 1 + 1; }); // get future for result
// do something else ...
auto result = f.get(); // waits until done and returns result

Both push() and async() can be called with extra arguments passed to the function.

auto work = [] (const std::string& title, int i) {
std::cout << title << ": " << i << std::endl;
};
push(work, "first title", 5);
async(work, "other title", 99);
wait();

Parallel loops

Existing sequential loops are easy to parallelize:

std::vector<double> x(10, 1);
// sequential version
for (int i = 0; i < x.size(); ++i)
x[i] *= 2;
// parallel version
parallel_for(0, x.size(), [&] (int i) { x[i] *= 2; };
// sequential version
for (auto& xx : x)
xx *= 2;
// parallel version
parallel_for_each(x, [] (double& xx) { xx *= 2; };

The loop functions automatically wait for all jobs to finish, but only when called from the main thread.

Nested parallel loops

It is possible to nest parallel for loops, provided that we don't need to wait for inner loops.

std::vector<double> x(10, 1);
// sequential version
for (int i = 0; i < x.size(); ++i) {
for (int j = 4; j < 9; j++) {
x[i] *= j;
}
}
// parallel version
parallel_for(0, x.size(), [&] (int i) {
// *important*: capture i by copy
parallel_for(4, 9, [&, i] (int j) { x[i] *= j; }); // doesn't wait
}; // does wait

Local thread pool

A ThreadPool can be set up manually, with an arbitrary number of threads. When the pool goes out of scope, all threads joined.

ThreadPool pool(2); // thread pool with two threads
pool.push([] {});
pool.async([] {});
pool.wait();
pool.parallel_for(2, 5, [&] (int i) {});
auto x = std::vector<double>{10};
pool.parallel_for_each(x, [] (double& xx) {});
quickpool::parallel_for
void parallel_for(int begin, int end, UnaryFunction &&f)
computes an index-based parallel for loop.
Definition: quickpool.hpp:1072
quickpool::parallel_for_each
void parallel_for_each(Items &items, UnaryFunction &&f)
computes a iterator-based parallel for loop.
Definition: quickpool.hpp:1089
quickpool::async
auto async(Function &&f, Args &&... args) -> std::future< decltype(f(args...))>
executes a job asynchronously the global thread pool.
Definition: quickpool.hpp:1023
quickpool::ThreadPool::parallel_for
void parallel_for(int begin, int end, UnaryFunction f)
computes an index-based parallel for loop.
Definition: quickpool.hpp:893
quickpool::ThreadPool::wait
void wait(size_t millis=0)
waits for all jobs currently running on the thread pool. Has no effect when called from threads other...
Definition: quickpool.hpp:926
quickpool::ThreadPool::parallel_for_each
void parallel_for_each(Items &items, UnaryFunction f)
computes a iterator-based parallel for loop.
Definition: quickpool.hpp:915
quickpool::wait
void wait()
waits for all jobs currently running on the global thread pool. Has no effect when not called from ma...
Definition: quickpool.hpp:1032
quickpool::ThreadPool::async
auto async(Function &&f, Args &&... args) -> std::future< decltype(f(args...))>
executes a job asynchronously on the global thread pool.
Definition: quickpool.hpp:872
quickpool::ThreadPool::push
void push(Function &&f, Args &&... args)
pushes a job to the thread pool.
Definition: quickpool.hpp:858
quickpool::push
void push(Function &&f, Args &&... args)
Free-standing functions (main API)
Definition: quickpool.hpp:1010