I have a Python system which is a mixture of C++ code, pandas/numpy code running on top of BLAS/MKL, and plain Python code binding the whole thing together. “It ain’t a pretty sight”.
It can be viewed as a nested tree (actually DAG) of tasks, with some tasks invoking several dozen sub-tasks, with a total of hundreds of tasks, which may grow to thousands as the project develops.
I got this to work with Python multi-processing by picking a specific level in the tasks tree to distribute across multiple CPUs, but this is sub-optimal as there is variance between the execution times of the sub-trees. Also getting shared-memory Pandas data frames is possible, but very fragile.
It is obvious I’m pushing the platform beyond what it is intended for… So I am looking for alternatives. I have been tracking Julia for a while and now it as just released v1.0 it sounded like it might be a practical choice. The libraries I make use of mostly do basic linear algebra stuff, nothing fancy, so porting shouldn’t be that hard. Using a single language instead of having to dance between my python code, pandas/numpy, and custom C++ extensions sounds really tempting.
What I need is almost exactly is described in https://www.youtube.com/watch?v=YdiZa0Y3F3c - however, looking at https://github.com/JuliaLang/julia/pull/22631 I see this didn’t make it to v1.0. However, reading through https://docs.julialang.org/en/v1/manual/parallel-computing/index.html I see that the low-level building blocks seem to exist (I’m not interested in async I/O issues - my project is almost pure computations).
So: What is the current Julia v1.0 best practice for writing a system that tries to compute a DAG of compute tasks across multiple threads, using shared memory between the tasks for optimal performance, with depth-first work stealing?
In particular, should I:
Stick with my Python system (which mostly works) until PR22631 is merged into a Julia point release. This would be reasonable for my project if it happens in the next, say, 1 or 2 quarters. If it wouldn’t be merged within, say, a year, then just waiting for it wouldn’t make sense.
Roll my own system? I understand this was done in https://juliaimages.github.io/latest/imagefiltering.html for example. I only expect hundreds or thousands of not-too-fast tasks so a naive thread-safe priority queue might not be a bottleneck for me. And there are probably other relaxations I can take advantage of with a tailored system. But if Julia will provide a built-in solution “soon”, this work would be a pure waste.
Use some package that provides a similar functionality? Come to think of it, it isn’t clear to me why PR22631 is a PR instead of a package building on top of the mechanisms already provided by Julia v1.0. This is not to say I don’t appreciate such a generic, high-performance production-quality package would be far from trivial. It is certainly beyond the scope of what I can afford creating within my own project.
[Edit] The reason this is a PR is because a ton of basic Julia functionality isn’t thread-safe. A PR is therefore needed to make things work, regardless of the scheduler issue. This makes me appreciate the immense task the PR needs to tackle; multi-threading is very basic and adding it to a language at a late design stage is extremely difficult.
Given the current lack of type safety of basic constructs in Julia v1.0, I wouldn’t use it to write multi-threaded code for anything other than exploratory purposes until the PR is merged.
Even when the PR is merged, Julia lacks any mechanisms for thread-safety in user code or libraries. In particular there is no language assistance in avoiding data races. There’s a good reason languages such as rust (and maybe pony) resorted to enforcing thread-safety at the language level. Writing multi-threaded code in languages that lack such safety features (such as C++) is notoriously difficult even for experts - and most scientific code is not written by software development experts. However I doubt such language extensions are practical in Julia v1.x.
Some other option?