[ANN] FileTrees.jl -- easy everyday parallelism on trees of files

Hi, thanks for your quick intro to Dagger.jl! I’ve been wanting to look into it. I think I can now see how it is out-of-core friendly.

BTW, I think it was a bit of an exaggeration when I said “scheduler”. It was just a mechanism to hook different implementations of reduce (sequential, threaded, distributed, unordered variants of them, etc.) into the for loop syntax. So it’s not a scheduler in the sense of, e.g., partr.

Just as a fair warning(?), I don’t think halve has been tested outside of my own packages yet. But if you don’t mind giving it a shot, it would be fantastic!

If you implement halve + iterate (or halve + __foldl__), then it should work well with Transducers.jl and all of its related packages (e.g., ThreadsX.jl, FLoops.jl, LazyGroupBy.jl, …). I’m not sure it provides nice out-of-core facilities at the moment, though. I just don’t have enough experience mixing it with a lot of I/O (and I know there are several possible improvements for this). But for a smaller-scale problem, a thread-based reduce could be nice to have?
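To make the halve + iterate idea a bit more concrete, here is a minimal sketch of the divide-and-conquer reduce such a protocol enables, in plain Julia with no package dependencies. The `halve` and `parallel_reduce` functions below are hypothetical stand-ins written for illustration only; they are not the SplittablesBase.jl or Transducers.jl APIs, which differ in detail. The sketch assumes `op` is associative and the input is non-empty.

```julia
# Hypothetical stand-in for a `halve`-style protocol: split a vector into
# two contiguous, non-overlapping views (left half, right half).
function halve(xs::AbstractVector)
    mid = firstindex(xs) + length(xs) ÷ 2 - 1
    return (view(xs, firstindex(xs):mid), view(xs, mid+1:lastindex(xs)))
end

# Hypothetical divide-and-conquer reduce built only on `halve` + iteration.
# Assumes `op` is associative and `xs` is non-empty.
function parallel_reduce(op, xs::AbstractVector; basesize::Int = 2)
    if length(xs) <= basesize
        # Base case: plain sequential fold, which only needs `iterate`.
        return foldl(op, xs)
    end
    left, right = halve(xs)
    # Reduce the right half on another task (runs in parallel when
    # Julia is started with more than one thread).
    task = Threads.@spawn parallel_reduce(op, right; basesize = basesize)
    l = parallel_reduce(op, left; basesize = basesize)
    # Combine in left-to-right order so the result matches foldl.
    return op(l, fetch(task))
end

println(parallel_reduce(+, collect(1:10)))
```

With Julia started with several threads (e.g. `julia -t 4`), the two halves are reduced concurrently; with a single thread it still gives the same result, just sequentially. Because results are combined left to right, an associative but non-commutative `op` still works.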
