Propagation of available/assigned worker-IDs in hierarchical computations?

oschulz · May 3, 2020, 1:06pm

I’ve been thinking about ways to propagate information about available workers (and possibly other resources) in scenarios with nested computations. Say we have 10000 workers available and want to run a high-level distributed algorithm which will scale up to 100 workers. Each parallel part of the high-level algorithm will in turn use a lower-level parallel algorithm internally that can also scale up to 100 workers. So all 10000 workers could be utilized - the question is, how will each instance of the low-level algorithm know which workers it’s allowed to use, when coding this in a modular fashion: There could be different low-level algorithms to choose from, and there might also be methods that are used stand-alone in other situations - we don’t want to code it as a monolithic thing.
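As a rough illustration of the scenario above, here is a minimal sketch of dividing a pool of worker IDs into disjoint groups, one per high-level part. The helper `split_workers` is hypothetical (not part of Distributed), and `all_workers` is a stand-in for what `Distributed.workers()` would return on a real cluster.

```julia
# Hypothetical helper: split a list of worker IDs into `n` nearly equal,
# disjoint groups (earlier groups get one extra ID if the split is uneven).
function split_workers(worker_ids::Vector{Int}, n::Integer)
    n > 0 || throw(ArgumentError("n must be positive"))
    base, extra = divrem(length(worker_ids), n)
    groups = Vector{Vector{Int}}()
    start = 1
    for i in 1:n
        sz = base + (i <= extra ? 1 : 0)
        push!(groups, worker_ids[start:start+sz-1])
        start += sz
    end
    return groups
end

# Stand-in for `Distributed.workers()`: 10000 worker IDs starting at 2,
# since ID 1 is the main process.
all_workers = collect(2:10001)

# One group of 100 workers for each of the 100 high-level parts.
groups = split_workers(all_workers, 100)
```

Each high-level part could then hand its group to the low-level algorithm, for example wrapped in a `WorkerPool`. How the low-level code learns about its group without an explicit argument is the open question of this thread.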

On the thread level, the partr scheduler has pretty much solved this problem now (since Julia v1.3). To my knowledge, we currently don’t have such a scheduler for (possibly distributed) worker processes. I was thinking about some kind of simple solution we could use until we have a fancy scheduler for workers, like we have for threads now.

Maybe task_local_storage could be used to propagate information about available/assigned resources (mainly workers) in a hierarchy of tasks? We’d need a way to pass it along when spawning new local/remote tasks, of course.

If so, could we come up with a community standard on which key names/values to use in task_local_storage, so that we can pass resource availability information through different parallel computing packages (e.g. Transducers.jl, CC @tkf) in a hierarchical computation?

tkf · May 3, 2020, 3:14pm

It’d be awesome if you can come up with a partr-like abstraction for distributed computation! Naively thinking, maybe some ideas behind partr, like worker-local queues (work stealing), can work in a distributed setting? I’m too noob to guess anything about it ATM, though.

IIUC, this doesn’t work because task_local_storage is not propagated to sub-tasks:

julia> @sync @async begin
           task_local_storage(:mykey, "hello")
           @show task_local_storage(:mykey)
           @async try
               @show task_local_storage(:mykey)
           catch err
               @show err
           end
       end;
task_local_storage(:mykey) = "hello"
err = KeyError(:mykey)

I think we need something like task_local_storage, but one that is inherited by sub-tasks, to do this (ref Context Variables in Python). It’s also useful for defining something like a logging system and a reproducible parallel PRNG in “user space.”

oschulz · May 3, 2020, 4:20pm

Yes, that’s the tricky part … one could use custom spawn/etc. constructs that do this, but I guess long term would require a change in Julia … that would make it an official standard, then.
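A minimal sketch of what such a custom spawn construct could look like for local tasks. The key name `:assigned_workers` and the helper `spawn_inheriting` are hypothetical; no standard exists for either. Only `task_local_storage` and `@async` come from Base.

```julia
# Hypothetical key name; there is no community standard for this.
const WORKERS_KEY = :assigned_workers

# Hypothetical helper: start a child task that inherits the parent's
# worker assignment, if the parent has one.
function spawn_inheriting(f)
    # Read the value in the parent task, before the child exists.
    inherited = get(task_local_storage(), WORKERS_KEY, nothing)
    return @async begin
        # Now running in the child task: install the inherited value.
        if inherited !== nothing
            task_local_storage(WORKERS_KEY, inherited)
        end
        f()
    end
end

# Usage: the child sees the assignment made in the parent task.
result = fetch(@async begin
    task_local_storage(WORKERS_KEY, (2, 3, 4))
    child = spawn_inheriting() do
        task_local_storage(WORKERS_KEY)
    end
    fetch(child)
end)
```

This only covers tasks started through the helper. Code that calls plain `@async` or `Threads.@spawn` would not inherit anything, and a remote call would have to receive the value as an explicit argument.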

tkf wrote: “It’d be awesome if you can come up with partr-like abstraction for distributed computation”

I asked about that at JuliaCon 2019 - I think @jeff.bezanson said that something may be in the works, long term, but I don’t know any details.

tkf · May 3, 2020, 4:26pm

Oops, sorry, you are already aware of the problem with task_local_storage. Yeah, custom spawn sounds like the way to go ATM. If you want a “big gun” to solve this I guess you can also use something like IRTools.jl/Cassette.jl.

anon92994695 · May 3, 2020, 5:00pm

Soooo I tried kicking off a project to do pretty much this… I ended up quitting on it because I didn’t have the time to follow it through. It’s basically a scheduling problem. Doubtful but maybe you can see how I tried doing this sorta thing and glean some insights or something, maybe not though

oschulz · May 3, 2020, 5:30pm

Uhm, yes … let’s hope we can avoid going for the heavy artillery.

tkf · May 5, 2020, 6:39pm

Actually, it wasn’t super hard to use IRTools to implement context variables

github.com/tkf/ContextVariablesX.jl/blob/623f83c40511eeb4a9c7446913d92c783f75bcf5/test/runtests.jl#L5-L17

          ok = Ref(0)
          @sync @async begin
              with_variables() do
                  context_storage(:mykey, "hello")
                  @test context_storage(:mykey) == "hello"
                  ok[] += 1
                  @async begin
                      @test context_storage(:mykey) == "hello"
                      ok[] += 1
                  end
              end
          end
          @test ok[] == 2

The current API/implementation is horrible, as it copies a Dict all the time. I think I need to look at PEP 567 for some ideas. Maybe using a HAMT like they do.

oschulz · May 5, 2020, 6:57pm

I guess using Cassette could make propagation of context information seamless - but I’m worried about a high compile-time cost when using this on a largish code base.

tkf · May 5, 2020, 7:07pm

Yeah, that’s true.

oschulz · May 5, 2020, 7:25pm

Can a task get its parent task? Maybe we can look up worker assignments recursively, up the task hierarchy, without propagating explicitly (except when they change)?

tkf · May 5, 2020, 7:37pm

The parent task can be running on any thread, so accessing its task_local_storage is a data race.

johnh · May 5, 2020, 7:49pm

Talking about schedulers, this new project called Flux may help.
Could this be what you are looking for?

In Flux, each job is a complete instance of the framework, meaning the individual task can support parallel tools, monitoring, and even launch sub-jobs that are, like fractals, smaller images of the parent job.

Flux | Computing

oschulz · May 5, 2020, 7:55pm

Oh, that I had planned to handle by keeping a global dict with task-ids and resources, with a lock to prevent race conditions. But how do I get the parent of a task?
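A minimal sketch of such a lock-protected global registry. All names (`TASK_WORKERS`, `assign_workers!`, `lookup_workers`, `release_workers!`) are hypothetical. It covers only the storage side, since finding a task's parent is exactly the missing piece.

```julia
# Hypothetical global registry mapping tasks to their assigned worker IDs.
const REGISTRY_LOCK = ReentrantLock()
const TASK_WORKERS = IdDict{Task,Vector{Int}}()

# Record the worker IDs assigned to task `t`.
function assign_workers!(t::Task, ids::Vector{Int})
    lock(REGISTRY_LOCK) do
        TASK_WORKERS[t] = ids
    end
    return ids
end

# Return the worker IDs assigned to `t`, or `nothing` if there are none.
function lookup_workers(t::Task)
    lock(REGISTRY_LOCK) do
        get(TASK_WORKERS, t, nothing)
    end
end

# Drop the entry for `t` so the registry no longer references the task.
function release_workers!(t::Task)
    lock(REGISTRY_LOCK) do
        delete!(TASK_WORKERS, t)
    end
    return nothing
end

# Usage with the current task.
t = current_task()
assign_workers!(t, [2, 3, 4])
found = lookup_workers(t)
release_workers!(t)
gone = lookup_workers(t)
```

Entries have to be released explicitly in this sketch. Otherwise the dictionary keeps finished tasks reachable, and they cannot be garbage-collected.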

tkf · May 5, 2020, 8:13pm

Ah, OK. I didn’t realize that. Yeah it sounds like it’d work if we had access to the parent task. But, no, I don’t think there is a way to access parent task. I don’t think it’s a good idea in general as you’d want to GC the task as soon as it finishes.

Also, why a global lock? Propagating immutable persistent data structure seems to be a much better option for this to me (e.g., for reducing the contention).
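A minimal sketch of the immutable-propagation idea, with a NamedTuple standing in for a real persistent data structure. The context layout and the helper `restrict` are hypothetical. Each child receives its own immutable value, so nothing is shared mutably and no lock is needed.

```julia
# Hypothetical context: an immutable NamedTuple describing assigned resources.
parent_ctx = (workers = (2, 3, 4, 5), label = "outer")

# Hypothetical helper: derive a child context with a narrower worker set.
# `merge` builds a new NamedTuple; the parent context is left untouched.
restrict(ctx, ids) = merge(ctx, (workers = ids,))

left = restrict(parent_ctx, parent_ctx.workers[1:2])
right = restrict(parent_ctx, parent_ctx.workers[3:4])

# Each child task captures its own context value through the closure.
tasks = [@async(sum(ctx.workers)) for ctx in (left, right)]
sums = fetch.(tasks)
```

The contexts are passed explicitly through closures here. Making that propagation implicit is what the context-variable approach discussed earlier in the thread is about.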

oschulz · May 5, 2020, 8:16pm

Let’s see if building task-context propagation into Julia itself could be acceptable: Propagating context information to child-tasks and remote calls? · Issue #35757 · JuliaLang/julia · GitHub

tkf · May 5, 2020, 8:18pm

Thanks for opening an issue!

oschulz · May 5, 2020, 8:18pm

Indeed - but I was thinking that there might need to be a central place to look up resource allocation, but mutable since tasks pop up and close all the time. The information would be immutable for each task, of course. In any case, I assume quite a few locks are locked and unlocked somewhere every time a task is spawned … but it was just a rough idea, not a design concept.

oschulz · May 5, 2020, 8:20pm

Thanks!

I guess the response to that would determine a bit how to proceed (temporary workaround vs. long-term independent solution).

tkf · May 5, 2020, 8:50pm

Is Flux (a very confusing name in the Julia world) something like Slurm, but with a more programmable API?

johnh · May 6, 2020, 9:31am

I believe so. Flux is quite new. It is intended to implement more fine grained control of how jobs are run.
So yes I think it will fit in very well with the use case here.

Also if you use Slurm at the moment, Flux will run ‘on top’ of Slurm.
