Recommended way to do work stealing in Julia 1.0?

question
performance

#1

I have a Python system which is a mixture of C++ code, pandas/numpy code running on top of BLAS/MKL, and plain Python code binding the whole thing together. “It ain’t a pretty sight”.

It can be viewed as a nested tree (actually DAG) of tasks, with some tasks invoking several dozen sub-tasks, with a total of hundreds of tasks, which may grow to thousands as the project develops.

I got this to work with Python multi-processing by picking a specific level in the task tree to distribute across multiple CPUs, but this is sub-optimal because the execution times of the sub-trees vary. Sharing Pandas data frames between processes through shared memory is also possible, but very fragile.

It is obvious I’m pushing the platform beyond what it is intended for… So I am looking for alternatives. I have been tracking Julia for a while, and now that v1.0 has just been released, it sounds like it might be a practical choice. The libraries I make use of mostly do basic linear algebra stuff, nothing fancy, so porting shouldn’t be that hard. Using a single language instead of having to dance between my Python code, pandas/numpy, and custom C++ extensions sounds really tempting.

What I need is almost exactly what is described in https://www.youtube.com/watch?v=YdiZa0Y3F3c - however, looking at https://github.com/JuliaLang/julia/pull/22631 I see this didn’t make it into v1.0. That said, reading through https://docs.julialang.org/en/v1/manual/parallel-computing/index.html I see that the low-level building blocks seem to exist (I’m not interested in async I/O issues - my project is almost pure computation).

So: What is the current Julia v1.0 best practice for writing a system that tries to compute a DAG of compute tasks across multiple threads, using shared memory between the tasks for optimal performance, with depth-first work stealing?

In particular, should I:

  • Stick with my Python system (which mostly works) until PR22631 is merged into a Julia point release. This would be reasonable for my project if it happens in the next, say, one or two quarters. If it won’t be merged within, say, a year, then just waiting for it wouldn’t make sense.

  • Roll my own system? I understand this was done in https://juliaimages.github.io/latest/imagefiltering.html for example. I only expect hundreds or thousands of not-too-fast tasks, so a naive thread-safe priority queue might not be a bottleneck for me, and there are probably other relaxations a tailored system could take advantage of. But if Julia will provide a built-in solution “soon”, this work would be wasted.

  • Use some package that provides similar functionality? Come to think of it, it isn’t clear to me why PR22631 is a PR instead of a package building on top of the mechanisms already provided by Julia v1.0. That is not to say I don’t appreciate that such a generic, high-performance, production-quality package would be far from trivial to build. It is certainly beyond the scope of what I can afford to create within my own project.

    [Edit] The reason this is a PR is that a ton of basic Julia functionality isn’t thread-safe. A PR is therefore needed to make things work, regardless of the scheduler issue. This makes me appreciate the immense task the PR needs to tackle; multi-threading is very fundamental, and adding it to a language late in its design is extremely difficult.

    Given the current lack of thread safety of basic constructs in Julia v1.0, I wouldn’t use it to write multi-threaded code for anything other than exploratory purposes until the PR is merged.

    Even when the PR is merged, Julia lacks any mechanisms for thread safety in user code or libraries. In particular, there is no language assistance in avoiding data races. There’s a good reason languages such as Rust (and maybe Pony) resorted to enforcing thread safety at the language level. Writing multi-threaded code in languages that lack such safety features (such as C++) is notoriously difficult even for experts - and most scientific code is not written by software development experts. However, I doubt such language extensions are practical in Julia v1.x.

  • Some other option?
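To make the “roll my own” option concrete, here is a minimal sketch of what a naive thread-safe work queue could look like on Julia v1.0, using only `Base.Threads` primitives (`@threads`, `SpinLock`, `Atomic`). All the names here (`WorkQueue`, `push_work!`, `run!`) are hypothetical, not part of any existing package; `pop!` from the end of the vector gives LIFO order, which approximates depth-first processing:

```julia
using Base.Threads

# A naive lock-protected work queue. Tasks are closures, and a running
# task may push further sub-tasks onto the same queue.
struct WorkQueue
    items::Vector{Any}
    lock::SpinLock
    pending::Atomic{Int}   # tasks queued or still running
end

WorkQueue() = WorkQueue(Any[], SpinLock(), Atomic{Int}(0))

function push_work!(q::WorkQueue, f)
    atomic_add!(q.pending, 1)
    lock(q.lock)
    push!(q.items, f)
    unlock(q.lock)
end

function pop_work!(q::WorkQueue)
    lock(q.lock)
    item = isempty(q.items) ? nothing : pop!(q.items)  # LIFO ~ depth-first
    unlock(q.lock)
    return item
end

# Each thread loops, grabbing work until every task (including any
# sub-tasks pushed along the way) has finished.
function run!(q::WorkQueue)
    @threads for t in 1:nthreads()
        while q.pending[] > 0
            f = pop_work!(q)
            f === nothing && continue
            f()                      # the task may call push_work! itself
            atomic_sub!(q.pending, 1)
        end
    end
end
```

Note the busy-wait in `run!` is acceptable only because the workload here is compute-bound; a production version would need back-off and a real priority ordering. Requires starting Julia with `JULIA_NUM_THREADS` > 1 to actually run in parallel.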


#2

Maybe not exactly what you are looking for (as it uses distributed memory) but there is a Julia package to run computations represented as directed acyclic graphs efficiently: https://github.com/JuliaParallel/Dagger.jl
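For reference, a minimal sketch of expressing a DAG with Dagger, assuming its lazy `delayed`/`collect` API (each `delayed` call builds a node; arguments form the edges; `collect` schedules the whole graph):

```julia
using Dagger

# Build a small DAG lazily: nothing executes until collect() is called.
a = Dagger.delayed(+)(1, 2)     # a = 3
b = Dagger.delayed(*)(a, 10)    # depends on a
c = Dagger.delayed(-)(b, a)     # depends on both a and b
collect(c)                      # schedules the DAG; b - a = 30 - 3 = 27
```

For actual parallelism you would first add worker processes with `addprocs`, since Dagger distributes the graph across Julia processes rather than threads.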


#3

Thanks, Dagger looks nice! I might use it for distributing coarse-grained work across a cluster. Its lack of a plain shared-memory mode restricts its applicability for me, though. I have large-ish data sets (100s of MBs) which don’t have a natural partition into shards, so they are much better suited to a shared-memory than a distributed-memory implementation. And I have plenty of machines with >50 threads to play with…


#4

It would be nice if there were something like Apache Airflow or Nextflow in Julia. It looks like JuliaRun is probably comparable, but it’s hard to justify paying for something like that when there are these other free alternatives.


#5

JuliaRun seems to be overkill for what I’m looking for… But it is an interesting option if I want to scale things to a cluster setting - and if they have some sort of free license for academic projects.


#6

PR22631 (aka partr) will almost certainly be merged before the end of the year. However, it will not be the default task runtime for some time after that. There is still a good bit of work left to do before parallel tasks are well tested, stable, and broadly available in a binary release. Your use case would be helpful to that end, if you can share the code together with validation logic and are willing to help us debug. :slight_smile:
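For readers landing here later: the composable, depth-first nested task parallelism that partr provides eventually shipped as `Threads.@spawn` in Julia 1.3. A hedged sketch of the nested-task style it enables (the function name and the base-case cutoff here are illustrative, not from the PR):

```julia
using Base.Threads

# Recursively sum an array by spawning a task for the left half while the
# current thread recurses into the right half. The depth-first scheduler
# tends to keep each thread working on the subtree it started.
function tree_sum(xs, lo=1, hi=length(xs))
    hi - lo < 1024 && return sum(view(xs, lo:hi))   # small base case: run serially
    mid = (lo + hi) >>> 1
    left = Threads.@spawn tree_sum(xs, lo, mid)     # scheduled on the thread pool
    right = tree_sum(xs, mid + 1, hi)               # current thread continues
    return fetch(left) + right
end
```

Because spawned tasks can themselves spawn, the whole task tree composes without picking one level to parallelize, which is exactly the limitation described in the original post.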

Edit: whoops! I didn’t notice the date on this one.