quickpool
1.8.0
An easy-to-use, header-only work stealing thread pool in C++11
Fast and easy parallel computing in C++
The library consists of a single header file with a permissive license. It requires only C++11 and is otherwise self-contained. Just drop quickpool.hpp in your project folder and enjoy.
The library provides these functions:
- push(f, args...) schedules a task running f(args...) with no return value,
- async(f, args...) schedules a task running f(args...) and returns a std::future,
- wait() waits for all scheduled tasks to finish,
- parallel_for(b, e, f) runs f(i) for all b <= i < e,
- parallel_for_each(x, f) runs f(*it) for all iterators std::begin(x) <= it < std::end(x).
Loops can be nested; see the examples below. All functions dispatch to a global thread pool that is instantiated only once, with as many threads as there are cores. Optionally, one can create a local ThreadPool exposing the functions above. See also the API documentation.
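As a rough model of what push() and wait() promise, the sketch below keeps a counter of unfinished tasks: push() increments it, a worker decrements it after running a task, and wait() blocks until it reaches zero. This is not quickpool's implementation, which uses per-worker queues and work stealing; MiniPool and demo_push_wait are hypothetical names, and the single mutex-protected queue is a deliberate simplification.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal pool: one shared queue guarded by one mutex,
// plus a counter of tasks that have been pushed but not yet finished.
class MiniPool {
public:
    explicit MiniPool(std::size_t n_threads) {
        for (std::size_t i = 0; i < n_threads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~MiniPool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_task_.notify_all();
        for (auto& t : workers_) t.join();  // queued tasks still run first
    }

    // push(): schedule f() with no return value.
    void push(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> lk(m_);
            ++pending_;
            tasks_.push_back(std::move(f));
        }
        cv_task_.notify_one();
    }

    // wait(): block until every scheduled task has finished.
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_done_.wait(lk, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> f;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_task_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop_ is set and no work is left
                f = std::move(tasks_.front());
                tasks_.pop_front();
            }
            f();  // run the task outside the lock
            std::lock_guard<std::mutex> lk(m_);
            if (--pending_ == 0) cv_done_.notify_all();
        }
    }

    std::vector<std::thread> workers_;
    std::deque<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_task_, cv_done_;
    std::size_t pending_ = 0;
    bool stop_ = false;
};

// Usage: sum 1..100 in 100 separate tasks.
int demo_push_wait() {
    std::atomic<int> sum{0};
    MiniPool pool(4);
    for (int i = 1; i <= 100; ++i)
        pool.push([&sum, i] { sum += i; });
    pool.wait();  // all 100 tasks have finished after this line
    return sum.load();  // 5050
}
```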
All scheduling uses work stealing synchronized by cache-aligned atomic operations.
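Cache alignment matters because atomics that share a cache line slow each other down even when they are logically independent (false sharing). The sketch below shows the general technique with alignas, assuming a 64-byte cache line; PaddedCounter and demo_padded are hypothetical names, not part of quickpool.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size of 64 bytes (common on x86-64, not universal).
constexpr std::size_t kCacheLine = 64;

// Each counter occupies its own cache line, so threads updating
// neighbouring counters do not invalidate each other's lines.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");
static_assert(alignof(PaddedCounter) == kCacheLine, "line-aligned");

// Four threads each hammer their own counter.
long demo_padded() {
    PaddedCounter counters[4];
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&counters, t] {
            for (int k = 0; k < 100000; ++k)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : threads) th.join();
    long total = 0;
    for (auto& c : counters) total += c.value.load();
    return total;  // 400000
}
```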
The thread pool assigns each worker thread its own task queue. Workers first process their own queue and then steal work from the others. The algorithm is lock-free in the standard case where only a single thread pushes work to the pool.
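The queue-per-worker idea can be sketched as follows. Each queue here is a std::deque guarded by its own mutex, so unlike quickpool's queues this sketch is not lock-free. The owner takes the newest task from the back of its queue and thieves take the oldest task from the front, so the two usually work at opposite ends. StealingQueues, push_to and try_pop are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <vector>

using Task = std::function<void()>;

// One queue per worker, each with its own mutex (a simplification).
struct WorkerQueue {
    std::mutex m;
    std::deque<Task> q;
};

class StealingQueues {
public:
    explicit StealingQueues(std::size_t n_workers) : queues_(n_workers) {}

    void push_to(std::size_t worker, Task t) {
        std::lock_guard<std::mutex> lk(queues_[worker].m);
        queues_[worker].q.push_back(std::move(t));
    }

    // Worker `self` first takes from the back of its own queue; if that is
    // empty, it steals from the front of the other queues in round-robin order.
    bool try_pop(std::size_t self, Task& out) {
        const std::size_t n = queues_.size();
        for (std::size_t k = 0; k < n; ++k) {
            WorkerQueue& wq = queues_[(self + k) % n];
            std::lock_guard<std::mutex> lk(wq.m);
            if (wq.q.empty()) continue;
            if (k == 0) {  // own queue: newest task
                out = std::move(wq.q.back());
                wq.q.pop_back();
            } else {       // victim's queue: oldest task
                out = std::move(wq.q.front());
                wq.q.pop_front();
            }
            return true;
        }
        return false;
    }

private:
    std::vector<WorkerQueue> queues_;
};

// Usage, single-threaded so the order is deterministic.
int demo_steal() {
    StealingQueues qs(2);
    int ran = 0;
    qs.push_to(0, [&ran] { ran = 1; });
    qs.push_to(0, [&ran] { ran = 2; });
    Task t;
    if (qs.try_pop(1, t)) t();  // worker 1 has nothing, so it steals task 1
    return ran;                 // 1
}
```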
Parallel loops assign each worker part of the loop range. When a worker completes its own range, it steals half the range of another worker. This perfectly balances the load and requires only a logarithmic number of steals (each steal is a point of contention). The algorithm uses double-wide compare-and-swap, which is lock-free on most modern processor architectures.
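A simplified version of the range-splitting loop is sketched below. To stay within portable std::atomic, it packs a worker's remaining range into one 64-bit word (two 32-bit bounds) and uses an ordinary single-width compare-and-swap in place of the double-wide CAS described above. stealing_for, take_one and steal_half are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Remaining range [begin, end) packed as (begin << 32) | end.
inline std::uint64_t pack(std::uint32_t b, std::uint32_t e) {
    return (static_cast<std::uint64_t>(b) << 32) | e;
}
inline std::uint32_t hi(std::uint64_t r) { return static_cast<std::uint32_t>(r >> 32); }
inline std::uint32_t lo(std::uint64_t r) { return static_cast<std::uint32_t>(r); }

// Owner: claim the first index of its own range.
inline bool take_one(std::atomic<std::uint64_t>& r, std::uint32_t& idx) {
    std::uint64_t cur = r.load();
    for (;;) {
        std::uint32_t b = hi(cur), e = lo(cur);
        if (b >= e) return false;
        if (r.compare_exchange_weak(cur, pack(b + 1, e))) { idx = b; return true; }
    }
}

// Thief: take the upper half [mid, end) of a victim's range.
inline bool steal_half(std::atomic<std::uint64_t>& r, std::uint32_t& sb, std::uint32_t& se) {
    std::uint64_t cur = r.load();
    for (;;) {
        std::uint32_t b = hi(cur), e = lo(cur);
        if (b >= e) return false;
        std::uint32_t mid = b + (e - b) / 2;
        if (r.compare_exchange_weak(cur, pack(b, mid))) { sb = mid; se = e; return true; }
    }
}

// Runs f(i) for begin <= i < end on n threads with range stealing.
template <class F>
void stealing_for(std::uint32_t begin, std::uint32_t end, std::size_t n, F f) {
    if (end <= begin || n == 0) return;
    std::vector<std::atomic<std::uint64_t>> ranges(n);
    const std::uint64_t len = end - begin;
    for (std::size_t w = 0; w < n; ++w)  // initial equal split
        ranges[w].store(pack(static_cast<std::uint32_t>(begin + len * w / n),
                             static_cast<std::uint32_t>(begin + len * (w + 1) / n)));
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < n; ++w)
        threads.emplace_back([&ranges, &f, n, w] {
            for (;;) {
                std::uint32_t i;
                while (take_one(ranges[w], i)) f(i);  // drain own range
                bool stole = false;
                for (std::size_t k = 1; k < n && !stole; ++k) {
                    std::uint32_t sb, se;
                    if (steal_half(ranges[(w + k) % n], sb, se)) {
                        ranges[w].store(pack(sb, se));  // adopt stolen half
                        stole = true;
                    }
                }
                if (!stole) return;  // nothing left anywhere to steal
            }
        });
    for (auto& t : threads) t.join();
}

// Usage: every index in [0, 1000) must be visited exactly once.
bool demo_stealing_for() {
    std::vector<std::atomic<int>> hits(1000);
    stealing_for(0, 1000, 4, [&hits](std::uint32_t i) { ++hits[i]; });
    for (auto& h : hits)
        if (h.load() != 1) return false;
    return true;
}
```

Because both bounds live in one atomic word, an owner claiming an index and a thief splitting the range cannot both succeed on the same snapshot; whichever CAS loses retries with the fresh value.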
Both push() and async() can be called with extra arguments passed to the function.
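One way to forward extra arguments is to bind them into a nullary callable; async() additionally wraps that callable in a std::packaged_task to obtain the future. The sketch below assumes this approach for illustration only. TaskList is a hypothetical, single-threaded stand-in for a pool that runs its queued tasks in run_all().

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical task list standing in for a pool; tasks run in run_all().
class TaskList {
public:
    // push(f, args...): store a task calling f(args...), result discarded.
    template <class F, class... Args>
    void push(F&& f, Args&&... args) {
        tasks_.emplace_back(std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    }

    // async(f, args...): like push(), but return a future for the result.
    template <class F, class... Args>
    auto async(F&& f, Args&&... args) -> std::future<decltype(f(args...))> {
        using R = decltype(f(args...));
        // packaged_task is move-only while std::function needs a copyable
        // callable, so the task is shared through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(
            std::bind(std::forward<F>(f), std::forward<Args>(args)...));
        tasks_.emplace_back([task] { (*task)(); });
        return task->get_future();
    }

    void run_all() {
        for (auto& t : tasks_) t();
        tasks_.clear();
    }

private:
    std::vector<std::function<void()>> tasks_;
};

int add_ints(int a, int b) { return a + b; }

int demo_args() {
    TaskList pool;
    int seen = 0;
    pool.push([&seen](int x) { seen = x; }, 7);         // will run f(7)
    std::future<int> fut = pool.async(add_ints, 2, 3);  // will run add_ints(2, 3)
    pool.run_all();
    return seen * 10 + fut.get();  // 7 * 10 + 5 = 75
}
```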
Existing sequential loops are easy to parallelize:
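The conversion amounts to moving the loop body into a lambda and passing the bounds separately. The sketch below contrasts a sequential loop with a call to naive_parallel_for, a hypothetical stand-in that mimics the parallel_for(b, e, f) calling convention by giving each std::thread one static chunk; it has none of quickpool's work stealing or load balancing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <thread>
#include <vector>

// Hypothetical stand-in: runs f(i) for all b <= i < e, one contiguous
// chunk per hardware thread, and joins all threads before returning.
template <class F>
void naive_parallel_for(int b, int e, F f) {
    if (e <= b) return;
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    long long len = e - b;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n; ++t) {
        int lo = b + static_cast<int>(len * t / n);
        int hi = b + static_cast<int>(len * (t + 1) / n);
        threads.emplace_back([lo, hi, &f] {
            for (int i = lo; i < hi; ++i) f(i);
        });
    }
    for (auto& th : threads) th.join();
}

std::vector<double> sqrt_table_sequential(int n) {
    std::vector<double> x(n);
    for (int i = 0; i < n; ++i) x[i] = std::sqrt(static_cast<double>(i));
    return x;
}

std::vector<double> sqrt_table_parallel(int n) {
    std::vector<double> x(n);
    // Same body, moved into a lambda; each i writes a distinct element.
    naive_parallel_for(0, n, [&x](int i) { x[i] = std::sqrt(static_cast<double>(i)); });
    return x;
}
```

Because every iteration writes to a different element of x, the body needs no synchronization; loops whose iterations depend on each other cannot be converted this directly.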
The loop functions automatically wait for all jobs to finish, but only when called from the main thread.
It is possible to nest parallel for loops, provided that we don't need to wait for inner loops.
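The interplay between waiting and nesting can be modeled with a per-thread flag: a loop called from a worker only schedules its iterations, while a loop called from the main thread also waits. Since nested tasks are counted before their parent task finishes, the outer wait covers them too. This is a hypothetical sketch (NestPool, is_worker and demo_nested are illustrative names), not quickpool's code. One plausible reason for skipping the wait on workers is that a worker blocked in wait() would occupy a thread the pending tasks may need.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical pool in which tasks may push further tasks.
class NestPool {
public:
    explicit NestPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                is_worker() = true;  // mark this thread as a pool worker
                run();
            });
    }
    ~NestPool() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_task_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void push(std::function<void()> f) {
        { std::lock_guard<std::mutex> lk(m_); ++pending_; tasks_.push_back(std::move(f)); }
        cv_task_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_done_.wait(lk, [this] { return pending_ == 0; });
    }
    // Schedules f(i) for b <= i < e; blocks only if the caller is not a worker.
    template <class F>
    void parallel_for(int b, int e, F f) {
        for (int i = b; i < e; ++i) push([f, i] { f(i); });
        if (!is_worker()) wait();
    }
    // Per-thread flag (shared by all pools in this sketch).
    static bool& is_worker() {
        thread_local bool flag = false;
        return flag;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> f;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_task_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and no work left
                f = std::move(tasks_.front());
                tasks_.pop_front();
            }
            f();
            std::lock_guard<std::mutex> lk(m_);
            if (--pending_ == 0) cv_done_.notify_all();
        }
    }
    std::vector<std::thread> workers_;
    std::deque<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_task_, cv_done_;
    std::size_t pending_ = 0;
    bool stop_ = false;
};

int demo_nested() {
    std::atomic<int> count{0};
    NestPool pool(4);
    pool.parallel_for(0, 10, [&pool, &count](int) {
        // Runs on a worker: the inner call schedules 10 tasks and returns.
        pool.parallel_for(0, 10, [&count](int) { ++count; });
    });
    // The outer call came from a non-worker thread, so it waited for the
    // 10 outer tasks and all 100 inner tasks.
    return count.load();  // 100
}
```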
A ThreadPool can be set up manually, with an arbitrary number of threads. When the pool goes out of scope, all threads are joined.
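The join-on-scope-exit behavior follows the usual RAII pattern. The small wrapper below is not a pool; it is a hypothetical JoiningThreads class that only illustrates how a destructor joining every thread makes leaving the scope block until all threads have finished.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical RAII owner of n threads; the destructor joins them all.
class JoiningThreads {
public:
    template <class F>
    JoiningThreads(std::size_t n, F f) {
        for (std::size_t i = 0; i < n; ++i) threads_.emplace_back(f, i);
    }
    ~JoiningThreads() {
        for (auto& t : threads_)
            if (t.joinable()) t.join();
    }
    JoiningThreads(const JoiningThreads&) = delete;
    JoiningThreads& operator=(const JoiningThreads&) = delete;

private:
    std::vector<std::thread> threads_;
};

int demo_scope() {
    std::atomic<int> done{0};
    {
        JoiningThreads pool(3, [&done](std::size_t) { ++done; });
    }  // all three threads are joined here
    return done.load();  // 3
}
```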
Unit tests are enabled by default when configuring the project:
The test suite exercises task submission, async(), parallel_for(), parallel_for_each(), nested loops, exception paths, queue growth, and concurrent task producers.
Optional sanitizer builds are available through CMake options:
For ThreadSanitizer, use -DQUICKPOOL_TEST_THREAD_SANITIZE=ON instead. The regular sanitizer and ThreadSanitizer options cannot be enabled at the same time.
The benchmark suite is optional and has no external dependencies:
For a quick smoke run, use:
The output is comma-separated and covers task submission, parallel_for(), nested loops, uneven loop bodies, and parallel_for_each() on std::vector and std::list.