Hi, I just wrote A quick introduction to data parallelism in Julia!
For a quick flavor of the tutorial, here is the table of contents:
- Getting julia and libraries
- Starting julia
- Starting julia with multiple worker processes
- Mapping
- Practical example: Stopping time of Collatz function
- Iterator comprehensions
- Pre-defined reductions
- Practical example: Maximum stopping time of Collatz function
- OnlineStats.jl
- Manual reductions
- Parallel findmin/findmax with `@reduce() do`
- Parallel findmin/findmax with `ThreadsX.reduce` (tedious!)
- Histogram with reduce
- Practical example: Histogram of stopping time of Collatz function
- Quick notes on `@threads` and `@distributed`
- Next steps
Although Julia has been a wonderful playground for parallel computing since the release of 1.3, there is no easy-to-access, entry-level resource for data-parallel programming. As a result, Q&A on Discourse often focuses on the non-composable `@threads` or the low-level `@spawn`, sometimes with sub-optimal or questionable coding patterns (please, no locks/atomics for sum!). I’ve been building up tooling for data parallelism in JuliaFolds, but it’s now a bit scattered across a few packages and it’s hard to get the big picture. So, I’m hoping that a quick tutorial helps with this.
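To make the "no locks/atomics for sum" point concrete, here is a minimal sketch using only Base. It is not taken from the tutorial: `chunked_sum` and its `ntasks` keyword are hypothetical names chosen for illustration, and the input is assumed to be non-empty. Each task reduces its own chunk and the partial results are combined at the end, so no task ever writes to shared state.

```julia
using Base.Threads: @spawn

# Hypothetical helper (an illustration, not the tutorial's code):
# sum `f(x)` over a non-empty vector `xs` without locks or atomics.
function chunked_sum(f, xs::AbstractVector; ntasks::Integer = Threads.nthreads())
    # Split the input into at most `ntasks` contiguous chunks.
    chunksize = max(1, cld(length(xs), ntasks))
    chunks = Iterators.partition(xs, chunksize)
    # Each task computes a partial sum over its own chunk only.
    tasks = [@spawn sum(f, chunk) for chunk in chunks]
    # Combine the partial sums on the calling task.
    return sum(fetch, tasks)
end

total = chunked_sum(abs2, 1:1000)
```

The result does not depend on how many threads Julia was started with, because the combination step is an ordinary sequential `sum` over the fetched partial results. Packages in JuliaFolds wrap this split-reduce-combine pattern behind higher-level functions, so code like the above is rarely needed by hand.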
If you have a question, feedback, a request for new topics, or any other comment, please feel free to post it here, suggest a change on GitHub, or open an issue!