ANN: Data parallelism tutorial

tkf · October 6, 2020, 2:04am

Hi, I just wrote "A quick introduction to data parallelism in Julia"!

For a quick flavor of the tutorial, here is the table of contents:

- Getting julia and libraries
- Starting julia
- Starting julia with multiple worker processes
- Mapping
- Practical example: Stopping time of Collatz function
- Iterator comprehensions
- Pre-defined reductions
- Practical example: Maximum stopping time of Collatz function
- OnlineStats.jl
- Manual reductions
- Parallel findmin/findmax with @reduce() do
- Parallel findmin/findmax with ThreadsX.reduce (tedious!)
- Histogram with reduce
- Practical example: Histogram of stopping time of Collatz function
- Quick notes on @threads and @distributed
- Next steps

Since the release of 1.3, Julia has been a wonderful playground for parallel computing, but there is still no easy-to-access, entry-level resource for data-parallel programming. As a result, Q&A on Discourse often focuses on the non-composable @threads or the low-level @spawn, sometimes with sub-optimal or questionable coding patterns (please, no locks/atomics for sum!). I’ve been building up tooling for data parallelism in JuliaFolds, but it is currently scattered across a few packages, which makes it hard to get the big picture. I’m hoping this quick tutorial helps with that.
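To make the "no locks/atomics for sum" point concrete, here is a minimal sketch that uses only `Base.Threads` (whose `@spawn` arrived in Julia 1.3). The function name `chunked_sum` and its `nchunks` keyword are hypothetical, not a JuliaFolds API. Each task reduces its own chunk into a local value and the partial sums are combined at the end, so no shared accumulator is ever mutated.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper (not a JuliaFolds API): parallel sum with no locks
# or atomics. Each task reduces its own chunk into a task-local value,
# and the partial sums are combined once all tasks have finished.
function chunked_sum(xs::AbstractVector; nchunks::Integer = nthreads())
    isempty(xs) && return zero(eltype(xs))
    chunk_len = cld(length(xs), max(1, nchunks))
    # One task per contiguous chunk; no task writes to shared state.
    tasks = [@spawn sum(chunk) for chunk in Iterators.partition(xs, chunk_len)]
    # Combine the per-chunk results sequentially.
    return sum(fetch(t) for t in tasks)
end

chunked_sum(1:1_000_000)  # same result as sum(1:1_000_000)
```

Because every task owns its accumulator, there is nothing to protect with a lock, and the result does not depend on how the tasks happen to be scheduled (floating-point inputs may still differ slightly from a sequential `sum` because the additions are regrouped).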

If you have a question, feedback, a request for new topics, or any other comment, please feel free to post it here, suggest a change on GitHub, or open an issue!

johnh · October 6, 2020, 7:05am

A README file on the top-level JuliaFolds page would be useful.
Not being rude, but if I land on a page like that and there is no clear description of what I can find in the repositories, I will quickly leave. Sorry, really not being rude.

tkf · October 6, 2020, 7:29am

That’s a good suggestion! Thanks! I don’t know how to use a README for an organization profile page, so I just tweaked the pinned packages and added a quick description.

(From a quick search, I can find how to use a README for a user profile, but it looks like it’s not usable for organizations as of Feb 2020.)

ImreSamu · October 6, 2020, 7:41am

Thank you!

I also like this table: "Libraries for parallelism in Julia".

Request: would it be possible to add more visualizations?

IMHO, they can help people understand new paradigms, ideas, and concepts.

My favorite example:
Golang Visualization examples (divan.dev)

tkf · October 6, 2020, 8:19am

Haha, thanks. That’s me playing with Franklin.jl, as it’s my first time using it. It’s a fantastic tool for something like this.

Thanks for the link. Wow, there are tons of cool visuals. I’ll watch the GopherCon talk!

By visualization, do you mean some kind of task or call tree? For example, the parallel collect tutorial in Transducers.jl has this low-tech diagram:

          [1,2,3,4] <------------- append!!([1,2], [3,4]) == [1,2,3,4]
         /         \
    [1,2]           [3,4] <------- append!!([3], [4]) == [3, 4]
   /     \         /     \
 [1]     [2]     [3]     [4] <---- append!!([], [4]) == [4]
 / \     / \     / \     / \
[] [1]  [] [2]  [] [3]  [] [4]

Are you thinking something like this?
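For readers who prefer code, the same tree can be sketched in plain Julia. This is a minimal illustration assuming only `Base.Threads`; it uses Base's `vcat` in place of BangBang.jl's `append!!`, and the name `dc_collect` and its `basesize` keyword are hypothetical, not Transducers.jl's actual implementation.

```julia
using Base.Threads: @spawn

# Hypothetical divide-and-conquer `collect(f(x) for x in xs)`, mirroring
# the tree above: split the input in half, handle the left half in a new
# task and the right half in the current task, then concatenate.
function dc_collect(f, xs::AbstractVector; basesize::Integer = 1)
    if length(xs) <= max(basesize, 1)
        return map(f, xs)  # leaf: sequential work at the bottom of the tree
    end
    lo = firstindex(xs)
    mid = lo + length(xs) ÷ 2
    left = @spawn dc_collect(f, view(xs, lo:mid-1); basesize = basesize)
    right = dc_collect(f, view(xs, mid:lastindex(xs)); basesize = basesize)
    # Combine step: plays the role of `append!!(left, right)` in the diagram.
    return vcat(fetch(left), right)
end

dc_collect(x -> x^2, collect(1:4))  # returns [1, 4, 9, 16]
```

Spawning only the left half and recursing on the right half in the current task roughly halves the number of tasks created, and because the left result is always placed first, the output order matches the input order.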

I think it’s definitely a good idea to have a low-level explanation like this in a different, more in-depth tutorial. But I am a bit torn about mentioning something like this in the introduction-level tutorial. Ideally, a casual user should not have to care about low-level execution strategies. That said, I don’t think the current version is perfect in this respect, since I discuss the implementation a bit when explaining how @floop works.

Also, I think it’s important to note that concurrency is not parallelism. In a way, concurrency is all about controlling how things are executed, so it is important to know how tasks interact, and having a mental model via visualization is very useful. (High-level) data parallelism, on the other hand, is all about not thinking about how things are executed and letting the libraries (and possibly the compilers) find an adequate execution strategy.

Obviously, there are multiple ways to start understanding data parallelism, and starting from a very low-level description could be a good option for some people. Probably the best way is to provide a few different tutorials for people with different tastes.

PetrKryslUCSD · October 6, 2020, 4:56pm

This was precisely my point in https://discourse.julialang.org/t/the-juliadebug-repo-how-to-be-helpful-to-newbies-and-old-hands-as-well/47618: I really don’t see a way of posting a general information splash at the top of the organization landing page. That is a very unfortunate omission which makes organizations on github much less useful than they could be.

This was the reason I conceived of the JuliaPDE organization as consisting of a single repo, the survey of the PDE-development landscape. This makes the only repo act as a kind of guide (readme).

ImreSamu · October 6, 2020, 8:49pm

By visualization, do you mean some kind of task or call tree?

Nothing concrete…
Something that is both eye candy and useful, so the first impression is: "This is interesting and cool, and I want to learn it."

So my original comment was just a trigger for brainstorming.

tkf · October 6, 2020, 9:00pm

Yeah, I totally get it! That’s kinda why I included the plots about the Collatz conjecture, even though they have nothing to do with parallelism.

(OK, to be honest, the initial plan was to discuss how to use basesize for load-balancing parallel reduction. Since I knew the stopping time (≈ compute time) is unpredictable and variable, I thought it might be interesting to show the effect of basesize on run-time performance. It turned out that the stopping time does not have a “fat enough” tail to allow a meaningful discussion of load-balancing.)
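For concreteness, here is a minimal sketch of the kind of workload described above, using only `Base.Threads`. It assumes "stopping time" means the number of Collatz steps needed to reach 1; the names `collatz`, `stopping_time`, and `dc_maximum` are illustrative, and `dc_maximum` is a hand-written divide-and-conquer reduction with a `basesize` cutoff, not the tutorial's or Transducers.jl's implementation.

```julia
using Base.Threads: @spawn

# One step of the Collatz function.
collatz(x) = iseven(x) ? x ÷ 2 : 3x + 1

# Number of Collatz steps needed to reach 1 (assumed definition of
# "stopping time"; x must be a positive integer).
function stopping_time(x::Integer)
    n = 0
    while x != 1
        x = collatz(x)
        n += 1
    end
    return n
end

# Hypothetical divide-and-conquer maximum of f over xs. Inputs with at
# most `basesize` elements are reduced sequentially; larger ones are
# split in half, with the left half handled by a new task.
function dc_maximum(f, xs::AbstractVector; basesize::Integer = 1024)
    if length(xs) <= max(basesize, 1)
        return maximum(f, xs)
    end
    lo = firstindex(xs)
    mid = lo + length(xs) ÷ 2
    left = @spawn dc_maximum(f, view(xs, lo:mid-1); basesize = basesize)
    right = dc_maximum(f, view(xs, mid:lastindex(xs)); basesize = basesize)
    return max(fetch(left), right)
end

dc_maximum(stopping_time, 1:100_000; basesize = 512)
```

A larger `basesize` creates fewer, coarser tasks (less scheduling overhead), while a smaller one gives the scheduler more room to balance uneven per-element costs, which is exactly the effect that variable stopping times were supposed to expose.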

etorkia · October 8, 2020, 6:33pm

Thanks for sharing… the tutorial looks awesome.
