Propagation of available/assigned worker-IDs in hierarchical computations?

I’ve been thinking about ways to propagate information about available workers (and possibly other resources) in scenarios with nested computations. Say we have 10000 workers available and want to run a high-level distributed algorithm which will scale up to 100 workers. Each parallel part of the high-level algorithm will in turn use a lower-level parallel algorithm internally that can also scale up to 100 workers. So all 10000 workers could be utilized - the question is, how will each instance of the low-level algorithm know which workers it’s allowed to use, when coding this in a modular fashion: There could be different low-level algorithms to choose from, and there might also be methods that are used stand-alone in other situations - we don’t want to code it as a monolithic thing.

On the thread-level, the partr scheduler has pretty much solved these problem now (since Julia v1.3). To my knowledge, we currently don’t have such a scheduler for (possibly distributed) worker processes. I was thinking about some kind of simple solution we could use until we have a fancy scheduler for workers, like we have for threads now.

Maybe task_local_storage could be used to propagate information about available/assigned resources (mainly workers) in a hierarchy of tasks? We’d need a way to pass it along when spawning new local/remote tasks, of course.

If so, could we come up with a community standard on which key names/values to use in task_local_storage, so that we can pass resource availability information through different parallel computing packages (e.g. Transducers.jl, CC @tkf) in a hierarchical computation?

1 Like

It’d be awesome if you can come up with partr-like abstraction for distributed computation! Naively thinking, some ideas behind partr like worker-local queues (work stealing) can work in distributed setting? I’m too noob to guess anything about it ATM, though.

IIUC, this doesn’t work because task_local_storage is not propagated to sub-tasks:

julia> @sync @async begin
           task_local_storage(:mykey, "hello")
           @show task_local_storage(:mykey)
           @async try
               @show task_local_storage(:mykey)
           catch err
               @show err
           end
       end;
task_local_storage(:mykey) = "hello"
err = KeyError(:mykey)

I think we need something like task_local_storage but that are inherited by sub-tasks do this (ref Context Variables in Python). It’s also useful for defining something like logging system and reproducible parallel PRNG in “user space.”

Yes, that’s the tricky part … one could use custom spawn/etc. constructs that do this, but I guess long term would require a change in Julia … that would make it an official standard, then. :slight_smile:

It’d be awesome if you can come up with partr-like abstraction for distributed computation

I asked about that at JuliaCon 2019 - I think @jeff.bezanson said that something may be in the works, long term, but I don’t know any details.

Oops, sorry, you are already aware of the problem with task_local_storage. Yeah, custom spawn sounds like the way to go ATM. If you want a “big gun” to solve this I guess you can also use something like IRTools.jl/Cassette.jl.

Soooo I tried kicking off a project to do pretty much this… I ended up quitting on it because I didn’t have the time to follow it through. It’s basically a scheduling problem. Doubtful but maybe you can see how I tried doing this sorta thing and glean some insights or something, maybe not though

1 Like

Uhm, yes … let’s hope we can avoid to go for the heavy artillery. :slight_smile:

1 Like

Actually, it wasn’t super hard to use IRTools to implement context variables

Current API/implementation is horrible as it copies Dict all the time. I think I need to look at PEP 567 for some ideas. Maybe using HAMT like they do.

I guess using Cassette could make propagation of context information seamless - but I’m worried about a high compile-time cost when using this on a largish code base.

Yeah, that’s true.

Can a task get it’s parent task? Maybe we can look up worker assignments recursively, up the task hierarchy, without propagating explicitly (except when they change)?

Parent task can be in any thread. So accessing its task_local_storage is a data race.

Talking about schedulers, this new project called Flux may help :slight_smile:
Cpuld this be what you are looking for?

In Flux, each job is a complete instance of the framework, meaning the individual task can support parallel tools, monitoring, and even launch sub-jobs that are, like fractals, smaller images of the parent job.

https://computing.llnl.gov/projects/flux-building-framework-resource-management

Oh, that I had planned to handle by keeping a global dict with task-ids and resources, with a lock to prevent race conditions. But how do I get the parent of a task?

Ah, OK. I didn’t realize that. Yeah it sounds like it’d work if we had access to the parent task. But, no, I don’t think there is a way to access parent task. I don’t think it’s a good idea in general as you’d want to GC the task as soon as it finishes.

Also, why a global lock? Propagating immutable persistent data structure seems to be a much better option for this to me (e.g., for reducing the contention).

Let’s see if building task-context propagation into Julia itself could be acceptable: https://github.com/JuliaLang/julia/issues/35757

1 Like

Thanks for opening an issue!

Indeed - but I was thinking that there might need to be a central place to look up resource allocation, but mutable since tasks pop up and close all the time. The information would be immutable for each task, of course. In any case, I assume quite a few locks are locked and unlocked somewhere every time a task is spawned … but it was just a rough idea, not a design concept. :slight_smile:

Thanks!

I guess the response to that would determine a bit how to proceed (temporary workaround vs. long-term independent solution).

Is Flux (very confusing name in Julia world :slight_smile: ) something like Slurm but with more programmable API?

I believe so. Flux is quite new. It is intended to implement more fine grained control of how jobs are run.
So yes I think it will fit in very well with the use case here.

Also if you use Slurm at the moment, Flux will run ‘on top’ of Slurm.

1 Like