Automatically fusing together several for-loops

No.

It should be possible to get speedups from parallelism. How did you try doing this before?

I mean technically yes, using @cfunction, though I get that it probably shouldn’t be recommended in general.

foo(a, b) = a + b
bar(a, b) = a - b

funcs = [foo, bar]
cfuncs = [
    @cfunction(foo, Cdouble, (Ref{Cdouble}, Ref{Cdouble})),
    @cfunction(bar, Cdouble, (Ref{Cdouble}, Ref{Cdouble}))
]

function sum_funcs(funcs, a, b)
    ret = 0.0
    for func in funcs
        ret += func(a, b)
    end
    ret
end

function sum_cfuncs(cfuncs, a, b)
    ret = 0.0
    for func in cfuncs
        ret += ccall(func, Cdouble, (Ref{Cdouble}, Ref{Cdouble}), a, b)
    end
    ret
end

using BenchmarkTools
@btime sum_funcs($funcs, a, b) setup = (a = rand(); b = rand()) # 88.750 ns (8 allocations: 128 bytes)
@btime sum_cfuncs($cfuncs, a, b) setup = (a = rand(); b = rand()) # 13.612 ns (0 allocations: 0 bytes)

I think there is. It's not exactly that they have methods with the same signature, but that you know the signatures (edit: the concrete types of the functions) upfront.

I do something similar in Plugins.jl; it can merge methods defined in an array into a single method body.

It was not designed for this case, as it works on plugin types which implement methods of the same function, not separate functions. So you have to create a plugin from every function, but then it should run much faster than with @cfunction.

I doubt you can automatically roll these loops together, but you could define a “foreachnode” function (or maybe there already is one?) which takes a tuple of collector functions and iterates once over the graph, handing the nodes to your collector functions; it would then return a tuple of results, one for each collector function.
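A rough sketch of that idea (hypothetical names: foreachnode itself, and nodes(graph) standing in for however you traverse your graph):

function foreachnode(graph, collectors::Tuple, inits::Tuple)
    states = inits
    for node in nodes(graph)  # `nodes` = however you iterate your graph's nodes
        # advance each collector with its own state; `map` over same-length
        # tuples returns a tuple, so this stays type-stable for small tuples
        states = map((f, s) -> f(s, node), collectors, states)
    end
    return states
end

# usage sketch: count nodes and sum degrees in a single pass (`degree` is also hypothetical)
# nnodes, degsum = foreachnode(g, ((n, _) -> n + 1, (d, v) -> d + degree(g, v)), (0, 0))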

Besides using @cfunction as pointed out by @tkoolen, you can use FunctionWrappers.jl.

using BenchmarkTools
using StaticArrays
using FunctionWrappers: FunctionWrapper

foo(a, b, c) = a .+ b .- c
bar(a, b, c) = a .- b .+ c

function sum_funcs(funcs, a, b, c)
    r = zeros(SVector{3})
    for func in funcs
        r += func(a, b, c)
    end
    return r
end

Signature = FunctionWrapper{
    SVector{3, Float64},
    Tuple{SVector{3, Float64}, Float64, Float64}
}

fs = [foo, bar]
wfs = Signature[foo, bar]

@btime sum_funcs($fs, a, b, c) setup = (a = rand(SVector{3}); b = rand(); c = rand())
# 74.816 ns (10 allocations: 256 bytes)
@btime sum_funcs($wfs, a, b, c) setup = (a = rand(SVector{3}); b = rand(); c = rand())
# 13.896 ns (0 allocations: 0 bytes)

Thanks for all the helpful replies!

I’ve tried Threads.@threads, @sync and @distributed, as well as pmap; all ended up being considerably slower. It’s quite likely I’ve been using them too naively - any tips would be much appreciated! I think one of the issues here is that change_scores is relatively fast as it is - roughly 1μs, for the size of graphs I’m interested in; the problem is that I’m calling it many many times from a function which can’t be parallelised (it mutates the graph after each call to change_scores).

I’ve tried using Tullio in delta_ttriple (as well as replacing the for loop with dot products, etc) and it actually yielded slightly worse performance, not sure why. And Transducers seems very interesting, but I’m not sure I understand the benefits - should decomposing the computation in this way (assuming it’s possible, I’ll have to try) yield any performance improvements?

This gave me 20%-30% speedup with basically zero extra work - that’s incredible, thank you very much! By the way, is there any specific reason to use SVectors here? I’ve tried storing the graph’s adjacency matrix as an MMatrix (since it’s mutated by the MCMC algorithm), but these seem to be inadequate for graphs of the size I’m interested in (~200 nodes).

This is not without some work, as Plugins.jl thinks in an overly structured way, but it seems like a 5x improvement over FunctionWrappers in the same sample.

Julia Version 1.5.1 (2020-08-25), official https://julialang.org/ release:

julia> using BenchmarkTools

julia> using StaticArrays

julia> using Plugins

julia> foo(a, b, c) = a .+ b .- c
foo (generic function with 1 method)

julia> bar(a, b, c) = a .- b .+ c
bar (generic function with 1 method)

julia> function calc end
calc (generic function with 0 methods)

julia> mutable struct Foo <: Plugin
           result::SVector{3, Float64}
           Foo() = new(zeros(SVector{3}))
       end

julia> mutable struct Bar <: Plugin
           result::SVector{3, Float64}
           Bar() = new(zeros(SVector{3}))
       end

julia> struct Results end # Shared state, not used

julia> calc(pfoo::Foo, result, a, b, c) = pfoo.result = foo(a, b, c)
calc (generic function with 1 method)

julia> calc(pbar::Bar, result, a, b, c) = pbar.result = bar(a, b, c)
calc (generic function with 2 methods)

julia> const pfoo = Foo()
Foo([0.0, 0.0, 0.0])

julia> const pbar = Bar()
Bar([0.0, 0.0, 0.0])

julia> calculators = PluginStack([pfoo, pbar])
PluginStack(Plugin[Foo([0.0, 0.0, 0.0]), Bar([0.0, 0.0, 0.0])], Any[], Dict{Symbol,Plugin}(:nothing => Bar([0.0, 0.0, 0.0])), nothing)

julia> const results = Results()
Results()

julia> const mergedcalc = hooklist(calculators, calc, results)
Plugins.HookList{Plugins.HookList{Plugins.HookList{Nothing,Nothing,Nothing,Nothing},typeof(calc),Bar,Results},typeof(calc),Foo,Results}(Plugins.HookList{Plugins.HookList{Nothing,Nothing,Nothing,Nothing},typeof(calc),Bar,Results}(Plugins.HookList{Nothing,Nothing,Nothing,Nothing}(nothing, nothing, nothing, nothing), calc, Bar([0.0, 0.0, 0.0]), Results()), calc, Foo([0.0, 0.0, 0.0]), Results())

julia> @btime begin mergedcalc(a, b, c); zeros(SVector{3}) + pfoo.result + pbar.result; end setup = (a = rand(SVector{3}); b = rand(); c = rand())
  2.157 ns (0 allocations: 0 bytes)
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
 0.12151616008457733
 1.8758013798301496
 0.21778100588128257

edit1: I remember seeing a package that was exactly for merging functions, but cannot find it now.
edit2: Edited the code to collect the results in plugin states, not in shared state

Yes, if calling change_scores takes 1μs, I imagine the overhead of threading is going to get in the way here.

It’s still worth a try though. I think you could try out ThreadPools: https://tro3.github.io/ThreadPools.jl/build/index.html with the @qthreads macro.

using ThreadPools: @qthreads  # needed for the @qthreads macro below

function change_scores(g::E, i, j) where {E <: AbstractGraph}
    x = zeros(g.n_funcs)
    func_list = g.fs
    @qthreads for k = 1:g.n_funcs
        x[k] = func_list[k](g, i, j)
    end
    return x
end

Since there is a lot of variation in the speed of these functions, this will generally work better than @threads, which is designed mainly for the case where each iteration takes a similar amount of time.

Also, not sure what the “outer” algorithm is. But if it’s an MCMC type procedure as you mention, you could of course run multiple chains on separate threads.
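For instance, a minimal sketch (run_chain and nsamples are placeholders for your sampler, not anything defined in this thread):

nchains = Threads.nthreads()
chains = Vector{Any}(undef, nchains)   # tighten the element type to your chain's output
Threads.@threads for c in 1:nchains
    # each chain mutates its own copy of the graph, so the chains stay independent
    chains[c] = run_chain(deepcopy(g), nsamples)
end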

Also, something like this might make sense.

x = ThreadPools.qmap(f -> f(g, i, j), func_list)

What’s the typical length of g.fs? Why not use a tuple like

struct erg{G <: AbstractArray{Bool}, Functions <: Tuple} <: AbstractGraph{Int}
    m::G
    fs::Functions
    ...

as mentioned in other comments? Are they completely different functions, or some parametrized callables of the same type (e.g., closures)?

Is it possible that this is essentially the cost of dynamic dispatch? I think profiling change_scores could help.
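For instance, something along these lines (a sketch using the Profile stdlib; it assumes a graph g and valid indices i, j are in scope) gives a flat per-function breakdown:

using Profile

Profile.clear()
@profile for _ in 1:100_000   # repeat so the ~1μs call collects enough samples
    change_scores(g, i, j)
end
Profile.print(format = :flat, sortedby = :count)   # sample counts per file/function/line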

Also, 1μs is rather too short for using threads. It takes a few μs to start a task in current Julia. So, you'd need to chunk more computations together to get a good speedup.
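Something like this minimal chunking sketch with the built-in scheduler (work(k) is a stand-in for one unit of your computation, not a function from this thread):

function chunked_sum(work, n; nchunks = Threads.nthreads())
    # one task per chunk, so each task carries several μs of work
    chunks = Iterators.partition(1:n, cld(n, nchunks))
    tasks = map(chunks) do ks
        Threads.@spawn sum(work, ks)
    end
    return sum(fetch, tasks)   # reduce the per-chunk partial sums
end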

The length of fs can vary from a couple of functions to 100-200 at most. All functions have the same signature and the same return type (see below). What would be the pros/cons of using a tuple instead of an array of functions?

I’ve tried to measure this using Profile/ProfileView but couldn’t generate the kind of summary I was looking for (basically % of time spent in each of g.fs, as well as a break-down of different overhead times). Is there a way to generate such a summary? I’d be happy to share the jlprof file if it helps.

Chunking seemed like a good option to me as well, but I wasn't able to get it to work:

  1. For a 100-node graph and ~10 functions in g.fs, change_scores currently takes roughly 0.5-1μs, using @pabloferz's excellent tip. I still haven't implemented @tisztamo's Plugins approach, but it seems super relevant as well.
  2. subgraphcount iterates over the edges of the graph; for each existing edge, it calls change_scores, accumulates the result, and then removes the edge. This can theoretically be fully parallelised (by calling change_scores separately for each edge and, within each call, deleting all the edges before it). However, since this function acts on a copy of the original graph, it has some copying overhead, so chunking indeed seems reasonable. For a 100-node graph, the sequential version takes 350-500μs, while a chunked version that operates on 1/10th of the graph takes ~70μs. However, accumulating the results over different chunks using @qthreads (with Threads.nthreads()=8) takes ~3ms and returns non-deterministic results, so I'm obviously doing something wrong. Using ThreadPools.qmap instead of @qthreads solves the non-determinism, but is also considerably slower than doing everything sequentially.

Here are the original functions:

function change_scores(g::E, i, j) where {E <: AbstractGraph}
    x = zeros(g.n_funcs)
#     x = ThreadPools.qmap(f -> f(g,i,j),g.fs)
#     @qthreads for k in 1:g.n_funcs
#         x[k] = g.fs[k](g,i,j)
#     end
    for k in 1:g.n_funcs
        x[k] = g.fs[k](g,i,j)
    end
    return x
end

function subgraphcount(g::E) where {E <: AbstractGraph}
    x = zeros(g.n_funcs)
    g2 = deepcopy(g)

    for j in vertices(g2)
        for i in vertices(g2)
            if i == j
                continue
            elseif !has_edge(g2, i, j)
                continue
            else
                x -= change_scores(g2, i, j)
                edge_toggle!(g2, i, j)
            end
        end
    end
    return x
end

Here's the chunked version of subgraphcount and the @qthreads function that calls it:

function ch_subgraphcount(g::E, start_col, end_col) where {E <: AbstractGraph}
    x = zeros(g.n_funcs)
    g2 = deepcopy(g)
    if start_col > 1
       g2.m[:,1:(start_col-1)].=0
       g2.trans[1:(start_col-1),:].=0
       g2.indegree[:] = vec(sum(g2.m, dims=1)) 
       g2.outdegree[:] = vec(sum(g2.m, dims=2)) 
    end
    for j in start_col:(end_col-1)
        for i in vertices(g2)
            if i == j
                continue
            elseif !has_edge(g2, i, j)
                continue
            else
                x -= change_scores(g2, i, j)
                edge_toggle!(g2, i, j)
            end
        end
    end
    return x
end

function p_subgraphcount(g::E, chunksize) where {E <: AbstractGraph}
    n = nv(g)
    if n % chunksize != 0
        throw("graph size is not a whole multiple of chunksize")
    end
    x = zeros(g.n_funcs)
    # NOTE: `x += ...` from multiple threads is a data race; this is the likely
    # source of the non-deterministic results mentioned above.
    @qthreads for i=1:(n÷chunksize)
        x += ch_subgraphcount(g, 1+(i-1)*chunksize, 1+i*chunksize)
    end
    return x
    #return sum(ThreadPools.qmap(i -> ch_subgraphcount(g, 1+(i-1)*chunksize, 1+i*chunksize), 1:(n÷chunksize)))
end
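For reference, one way to avoid mutating the shared x from several threads is to let each task return its own partial vector and sum them afterwards (a sketch reusing ch_subgraphcount from above; the function name is just for illustration):

function p_subgraphcount2(g::E, chunksize) where {E <: AbstractGraph}
    n = nv(g)
    n % chunksize == 0 || throw(ArgumentError("graph size is not a whole multiple of chunksize"))
    # each task builds and returns its own vector; no shared mutable state
    tasks = [Threads.@spawn ch_subgraphcount(g, 1 + (i - 1) * chunksize, 1 + i * chunksize)
             for i in 1:(n ÷ chunksize)]
    return sum(fetch, tasks)
end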

This is the erg data structure:

struct erg{G <: AbstractArray{Bool}} <: AbstractGraph{Int}
    m::G
    fs::Array{FunctionWrapper{Float64,Tuple{erg,Int64,Int64}},1}
    trans::G
    realnodecov::Union{Dict{String,Array{Float64,1}},Nothing}
    catnodecov::Union{Dict{String,Array{Any,1}},Nothing}
    edgecov::Union{Array{Array{Float64,2}},Nothing}
    indegree::Vector{Int64}
    outdegree::Vector{Int64}
    n_funcs::Int
    kstars_precomputed::Array{Float64,2}
end

Signature = FunctionWrapper{
    Float64,
    Tuple{erg, Int, Int}
}

I’ve put all the delta_foo functions in this gist to avoid unnecessary clutter, in case anyone’s interested.

Yes, subgraphcount is used to compute the MH transition probability; I'm currently trying to optimise it (and the lower-level change_scores), but parallelising the higher-level MCMC function using multithreading is definitely the next step!

I got excited by this topic and created a micropackage while my son is sleeping :slight_smile:

Julia Version 1.5.1 (2020-08-25), official https://julialang.org/ release:

julia> using FunctionWranglers

julia> create_adder(value) = (x) -> x + value
create_adder (generic function with 1 method)

julia> adders = [create_adder(i) for i = 1:5]
5-element Array{var"#1#2"{Int64},1}:
 #1 (generic function with 1 method)
 #1 (generic function with 1 method)
 #1 (generic function with 1 method)
 #1 (generic function with 1 method)
 #1 (generic function with 1 method)

julia> w = FunctionWrangler(adders)
FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{Nothing,Nothing}}}}}}(var"#1#2"{Int64}(1), FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{Nothing,Nothing}}}}}(var"#1#2"{Int64}(2), FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{Nothing,Nothing}}}}(var"#1#2"{Int64}(3), FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{Nothing,Nothing}}}(var"#1#2"{Int64}(4), FunctionWrangler{var"#1#2"{Int64},FunctionWrangler{Nothing,Nothing}}(var"#1#2"{Int64}(5), FunctionWrangler{Nothing,Nothing}(nothing, nothing))))))

julia> result = zeros(Float64, length(adders))
5-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0

julia> smap!(result, w, 10.0)

julia> result
5-element Array{Float64,1}:
 11.0
 12.0
 13.0
 14.0
 15.0

julia> @btime smap!($result, $w, d) setup = (d = rand())
  2.892 ns (0 allocations: 0 bytes)

It works well for 200 functions (<1s compile time at the first call), and seems to be usable up to at least 1000 functions.

Your approach got me excited as well! :smile:
I think you could get the same functionality by wrapping Tuples and using tail recursion. Using the tuples directly, you can do

apply_f_tuple(ft::Tuple{}, args...) = ()
apply_f_tuple(ft::Tuple, args...) = (first(ft)(args...), apply_f_tuple(Base.tail(ft), args...)...)

julia> adders = ntuple(i->create_adder(i), 5);

julia> apply_f_tuple(adders, 10.0)
(11.0, 12.0, 13.0, 14.0, 15.0)

The advantage here is that you get a tuple back, which in some cases can be desirable (it also matches up with how map works on tuples, which is nice).

For larger tuples though, your smap! API makes more sense (returning long tuples isn't generally a good idea…). You can do this with tuples as well:

_smap!(ft::Tuple{}, dest, i, args...) = dest

function _smap!(ft::Tuple, dest, i, args...)
    dest[i] = first(ft)(args...)
    _smap!(Base.tail(ft), dest, i+1, args...)
end

smap!(ft::Tuple, dest, args...) = _smap!(ft, dest, firstindex(dest), args...)

And my favorite part is you can infer the narrowest return type and allocate the container based on that.

_container_type(ft::Tuple, args...) = mapreduce(f -> Base._return_type(f, Tuple{typeof.(args)...}), typejoin, ft)

function smap(ft::Tuple, args...)
    CT = _container_type(ft, args...)
    dest = Vector{CT}(undef, length(ft))
    smap!(ft, dest, args...)
end

However, taking either of our approaches, I see a stack overflow being printed from the compiler for very large tuples/FunctionWranglers (~1000 elements); the output is actually still correct despite the error. I think that might just be par for the course when dealing with huge tuple/nested types, though.


Edit: Turns out apply_f_tuple is literally just map (for tuples) under a different name…

If it is 100-200 functions, I don't think using a tuple makes sense. I'd use a tuple only if there are a handful of elements (let's say < 16, as that's the heuristic Base uses, though the compiler can handle more).

Note that there is no way to express “signature” (input types and output type) in Julia’s type system. That’s why you need the ccall hack to get some decent performance.

This is why I asked if you have closures or some callable objects. That is to say, do you have 100 functions with completely different implementations? Or, are they actually some parameterized functions? If they are closures generated by the same function, their type is identical:

julia> create_adder(value) = (x) -> x + value;

julia> typeof(create_adder(1)) === typeof(create_adder(2))
true

So, you can put them in a vector without invoking run-time dispatch:

julia> callfirst(fs, x) = first(fs)(x);

julia> @code_warntype callfirst([create_adder(1), create_adder(2)], 0)
Variables
  #self#::Core.Const(callfirst, false)
  fs::Vector{var"#1#2"{Int64}}
  x::Int64

Body::Int64
1 ─ %1 = Base.getindex(fs, 1)::var"#1#2"{Int64}
│   %2 = (%1)(x)::Int64
└──      return %2

Even if not all function types are identical, I'd imagine there are only a handful of function types. If that's the case, you can use Iterators.flatten to group closures/functions by their type, e.g., Iterators.flatten(([create_adder(1), create_adder(2)], [create_adder(1im)])).

Unfortunately, Julia’s native for loop is not powerful enough to completely optimize a complex iterator like Iterators.flatten. You’d need to use Base.foldl or some external packages like FLoops.jl (ref [RFC/ANN] FLoops.jl: fast generic for loops (foldl for humans™)) to eliminate dynamic dispatches.
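A small sketch of the grouping idea (reusing create_adder from above; each group holds a single concrete closure type, so the inner loops avoid dynamic dispatch):

create_adder(value) = x -> x + value

int_adders   = [create_adder(1), create_adder(2)]   # one concrete closure type
float_adders = [create_adder(0.5)]                  # another concrete closure type
fs = Iterators.flatten((int_adders, float_adders))

# foldl recurses into each homogeneous group of the flattened iterator
sum_applied(fs, x) = foldl((acc, f) -> acc + f(x), fs; init = zero(x))
sum_applied(fs, 10.0)   # 33.5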

Yes, I think it'd be a better way to parallelize the computation. FWIW, if your graph object already defines Base.iterate (and, preferably, Base.length), you can simply add a SplittablesBase.halve method to support parallel computation via Transducers.jl, the aforementioned FLoops.jl, ThreadsX.jl, etc.
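For example, a hypothetical edge-list wrapper could be made splittable like this (EdgeList is an assumed type for illustration, not something from this thread):

using SplittablesBase

struct EdgeList
    edges::Vector{Tuple{Int,Int}}
end

Base.iterate(el::EdgeList, state...) = iterate(el.edges, state...)
Base.length(el::EdgeList) = length(el.edges)

# the one extra method that lets Transducers.jl / FLoops.jl / ThreadsX.jl
# split the work between tasks
function SplittablesBase.halve(el::EdgeList)
    mid = length(el.edges) ÷ 2
    return EdgeList(el.edges[1:mid]), EdgeList(el.edges[mid + 1:end])
end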


Regarding ThreadPools.jl… I hope I don't sound like I'm trivializing @tro3's hard work, but I think it's important to understand that the primary motivation for ThreadPools.jl is to “undo” the design of composable multi-threading in Julia (see Announcing composable multi-threaded parallelism in Julia). IIUC, ThreadPools.jl exists for separating out latency-critical code (executed on the primary thread) from throughput-oriented code (executed on non-primary threads). This is very useful if you are writing, e.g., a GUI application, but it is not desirable in a library or in throughput-oriented user code. ThreadPools.jl is a very clever and useful workaround for the current state of multi-threading in Julia. However, I think it's a good idea to avoid using it if you mainly care about the “overall speed” (i.e., throughput) of your computation and composability with the rest of the ecosystem.

@tkf is correct (wiping tears away… j/k :slight_smile: ) as to the latency-vs-throughput use case. The other niche case is when your jobs vary widely in size - the queued pool will chew through the fast ones while the slow ones run. For any other usage besides those two niches, the existing Julia infrastructure is superior. I’ll plow through the rest of this thread in a bit. (For some reason, I am no longer getting any emails from Discourse, so I am missing things until days later.)

This is actually a common enough case. For example, you're running some kind of simulation whose duration depends on random number generation, or the stiffness of an ODE depends on the initial conditions you give it, or something like that, so that some of the instances take much longer than others. I really like ThreadPools because I think this use case is going to be common enough for me.

I don't think you need a “user-land” task pool mechanism if it is just for load balancing. The Julia runtime already has a “thread pool” and we can just use it. For example, Transducers.foldxt provides a simple parameter basesize to control the load balancing while directly using Julia's task scheduler (I think basesize=1 corresponds to ThreadPools.tmap and the default basesize = length(data) ÷ nthreads() corresponds to Threads.@threads).
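For instance (a sketch using foldxt; sin is just a stand-in workload):

using Transducers

xs = rand(1_000)

foldxt(+, Map(sin), xs; basesize = 1)   # one task per element, maximal load balancing
foldxt(+, Map(sin), xs; basesize = length(xs) ÷ Threads.nthreads())   # coarse chunks, minimal overhead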

A potential benefit of the ThreadPools.jl-like approach is that, when you have compute-intensive tasks mixed with I/O, it can be used to limit the number of concurrently executing tasks (I'm not sure if this is already implemented in ThreadPools.jl, but I've been wanting to add it to Transducers.jl). This is sometimes useful because you can limit the resources (e.g., memory) required for the entire process even when there is a lot of I/O. However, it comes with a cost, because communication with a Channel (used in ThreadPools.jl and handy for implementing this kind of thing) is more expensive than simply spawning a task.

So, imagine you want to run 1000 simulations, and they take anywhere from 1 ms to 40 minutes, with an average of, say, 1 minute. Suppose you have, say, 6 cores. If you use ThreadPools you can queue up 6 simulations at a time; as soon as one of them finishes, a new one will spawn. You'll always have 6 of them running at any one time, and you'll never have a situation like you would with @threads for i = 1:1000 ... where the first 100 of them all take 40 minutes because the parameter you're sweeping through is similar for all of them… and then this one thread takes 4000 minutes while the entire rest of the calculation across all the other threads takes 2000/5 = 400 minutes.

With thread pools, instead of waiting 4000 minutes for the first thread to finish, you’d wait 4000/6 + 2000/6 = 1000 minutes

What you are describing is just a limitation of @threads, not of Julia's task system. It's also not the task pool provided by ThreadPools.jl that gives you the load-balancing behavior; that comes from the implementation of higher-level APIs like ThreadPools.tmap, which happen to use one task per element. This can be done without implementing a task pool. I'm pretty sure there are a few other packages (besides Transducers.jl and ThreadsX.jl) with a similar implementation that directly uses Julia's thread pool (e.g., Parallelism.tmap uses basesize=1 by default).

Just to be clear, I'm not discouraging anyone from using ThreadPools.jl in application (= non-library) code, especially if you don't care about the throughput of task spawns (some tasks taking 40 minutes while others finish in 1 ms is a good example). It is tested in the wild and comes with an excellent profiling facility. I just wanted to discuss its properties so that we can understand the pros and cons.