[ANN] FileTrees.jl -- easy everyday parallelism on trees of files

is the FileTree idea based on any prior art? It’s a really elegant solution and I am wondering if there are more I can read about.

Nothing particularly similar that I know of.

@mbauman has written something about the problems he was trying to solve: https://github.com/mbauman/Rafts.jl and outlines some of the same ideas.

There is something called “dataset” in the R arrow package, which creates tables whose columns come from the directory structure; see “Working with Arrow Datasets and dplyr” in the Arrow R package documentation.

Wrote some docs about how to split a file into many files: see the FileTrees.jl documentation.

1 Like

Thanks for the very nice package/documentation/ANN post!

I’ve been thinking about parallel (including distributed) processing in Julia and I wonder if we can have a common high- and low-level interface across the ecosystem.

  • High-level API. One of my (rather “high-level”) attempts is SplittablesBase.jl, which provides a very minimal interface (a halve function) between collections and parallel processing functions. Coupling this with recursively scheduled @async invoking remotecall at the bottom, I wonder if we can decouple the scheduler from the collection and make halve (or something similar but minimal) the API between them (see the sketch after this list). A minimalist approach is beneficial because many data structures are amenable to parallel processing: dictionaries, sets, strings, many iterator transformations, and also “virtual” data structures such as lazily joined ones.

  • Low-level API. Even if a halve-based solution does not work, perhaps due to the locality problem emphasized in the OP and the comments, I think it might be possible to define a “low-level” API around a variant of the reduce function. This makes many optimizations possible thanks to the tight coupling of reduce and the data structure. There is still a lot of flexibility, because we can do a lot on the reducing-function (op) side, as long as we allow a slightly richer interface for op (e.g., Transducers.jl’s start, next, complete, and Reduced).

    I think many Julia users agree that defining a common API across the ecosystem would be beneficial, because we can derive a lot on top of the common API. For example, there is no need for a separate map function because it can be derived from reduce. Now that I added a parallel syntax and extensible scheduler mechanism in FLoops.jl (I’m hoping to do a proper ANN at some point), such a collection can be reduced over using a familiar for-loop syntax as long as there is a common, powerful enough reduce-like interface. This is because FLoops.jl is “just” a way to construct the aforementioned extended “op” interface.
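
To make the high-level idea concrete, here is a minimal sketch of a parallel reduction written only against the halve interface (psum and basesize are made-up names, and Threads.@spawn stands in for the @async + remotecall scheduling mentioned above):

using SplittablesBase: amount, halve

function psum(xs; basesize = 1024)
    # Fall back to a sequential sum for small (sub)collections.
    amount(xs) <= basesize && return sum(xs)
    left, right = halve(xs)
    # Recurse on one half in another task; any scheduler could go here.
    task = Threads.@spawn psum(right; basesize = basesize)
    return psum(left; basesize = basesize) + fetch(task)
end

psum(1:1_000_000) == sum(1:1_000_000)  # true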

These are all just examples from my libraries, so I hope it does not sound too ad-like… I just wanted to mention some concrete examples of what can be done with a high-level API like halve and a low-level one like reduce, while being (I think) applicable to a framework like FileTrees.jl.

7 Likes

Thank you for your post!

Coupling this with recursively scheduled @async invoking remotecall at the bottom, I wonder if we can decouple the scheduler from the collection and make halve (or something similar but minimal) the API between them. A minimalist approach is beneficial because many data structures are amenable to parallel processing: dictionaries, sets, strings, many iterator transformations, and also “virtual” data structures such as lazily joined ones.

That is a totally awesome idea!

You can hook into Dagger.jl’s scheduler, it is pretty decent, let me say a bit more about that:

You can create tasks in Dagger by calling y = delayed(f)(x...), which returns a Thunk representation of f(x...). If any of the arguments x is a Thunk, that implicitly sets up a dependency relationship between the thunks. Then compute(y::Thunk) will compute it and all of its dependencies using the scheduler; collect on that result will fetch the result to the caller’s process. The benefit of using something like Dagger over @async and remotecall is that Dagger schedules the work in an out-of-core-friendly way.
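
For reference, the pattern just described looks like this in miniature (toy functions, assuming the delayed/compute/collect calls described above):

using Dagger

a = delayed(sum)(1:100)    # a Thunk representing sum(1:100)
b = delayed(x -> 2x)(a)    # passing a Thunk sets up the dependency b -> a
c = compute(b)             # run the graph via the scheduler
collect(c)                 # fetch the result to the caller's process: 10100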

It’d be interesting to see if we can easily swap out schedulers in a package like FileTrees! I’d like to know more about your scheduler in FLoops.jl

For FileTrees, do you recommend implementing a halve method which would give a tree with approximately half of the files? It would be interesting to explore that more. I will need your help with that though!
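
For instance (purely hypothetical, assuming FileTrees’ files function and that iterating a tree visits its files), a halve method could just split the file list:

using SplittablesBase, FileTrees

function SplittablesBase.halve(t::FileTree)
    fs = files(t)                 # the leaf File nodes of the tree
    mid = length(fs) ÷ 2
    # halve's contract: the two halves together cover the collection.
    return (view(fs, 1:mid), view(fs, mid+1:length(fs)))
end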

Now that I added a parallel syntax and extensible scheduler mechanism in FLoops.jl (I’m hoping to do a proper ANN at some point)

I’ll take a closer look soon! Looking forward to the ANN!

Hi, thanks for your quick intro to Dagger.jl! I’ve been wanting to look into it. I think now I can see how it is out-of-core friendly.

BTW, I think it was a bit of an exaggeration when I said “scheduler”. It is just a mechanism for hooking different implementations of reduce (sequential, threaded, distributed, unordered variants of them, etc.) into the for-loop syntax, as in the example below. So it’s not a scheduler in the sense of, e.g., partr.
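
Concretely, this is roughly how FLoops’ executor argument works (a small example using its current API):

using FLoops

# Swap SequentialEx(), ThreadedEx(), or DistributedEx() to change how
# the reduction is executed, without touching the loop body.
@floop ThreadedEx() for x in 1:100
    @reduce s += x
end
s  # == 5050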

Just as a fair warning(?), I don’t think halve is tested outside of my packages yet. But if you don’t mind giving a shot at this, it would be fantastic!

If you have halve + iterate (or halve + __foldl__), then it should work well with Transducers.jl and all of its related packages (e.g., ThreadsX.jl, FLoops.jl, LazyGroupBy.jl, …). I’m not sure it provides a nice out-of-core facility ATM, though; I just don’t have enough experience with mixing it with a lot of I/O (and I know there are several possible improvements for this). But for a somewhat smaller-scale problem, a threading-based reduce could be nice to have?

2 Likes

Thanks for the reply,

Yeah, I agree this is somewhat out of scope, as it has more to do with the insides of the files, and is perhaps more in line with the dataset stuff @c42f mentioned. I decided to bring it up just because I reckon logfiles like the ones I have are spat out by many kinds of systems, and FileTrees seems like a very good fit for helping to analyze them.

To add some more context to the problem: the files don’t have a header describing the contents, nor do they have any exploitable internal structure. They are just text files, and one has to parse them line by line to find out which dataframes to create. The main use case for me is data exploration, so the time it takes to parse far outweighs the time for any processing afterwards.

I have experimented with just having a set of possible file_lines and initializing each logfile as something like maketree((name=filename, value=delayed(parsefile)(filename)) => dummyinits), where dummyinits is just an array of (name = "somefileline", value=delayed(nothing)). I have not yet gotten this to work, as I get an array out-of-bounds error in compute (I will post an issue once I have made sure the error is not on me). Even if I do get it to work, it seems quite painful to dig through the Thunks to see if there is a nothing (or NoValue) in there to replace, or whether one should just wrap it in another Thunk when mapping, especially since the value to replace with is produced by the parent.

Another somewhat painful option I could think of is to set all the values to the same instance of some mutable lazy struct which parses the whole file the first time one tries to access a key from it, and blocks if one tries to access it while it is parsing. If I come up with anything that could be useful, I will make an issue about it as well.
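
For what it’s worth, a minimal sketch of that second option might look like this (LazyLogFile is a made-up name; parsefile is assumed to return a Dict of the parsed tables):

mutable struct LazyLogFile
    path::String
    lock::ReentrantLock
    tables::Union{Nothing, Dict{String, Any}}  # nothing until parsed
end
LazyLogFile(path) = LazyLogFile(path, ReentrantLock(), nothing)

function Base.getindex(f::LazyLogFile, key::String)
    # Parse the whole file on first access; the lock makes concurrent
    # accessors block until parsing is done.
    lock(f.lock) do
        if f.tables === nothing
            f.tables = parsefile(f.path)
        end
        return f.tables[key]
    end
end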

The inserttree solution is indeed a bit overengineered, as it allows the parsing function to decide, based on the contents of the file, whether it needs to be a new subtree or whether there is just one value (i.e., a single dataframe was produced).

This looks like a great tool, @shashi. Thanks for putting it up. I was playing around with the package, but when I tried to run it with multiple processes, I got an error.

Without multiple processes it works fine

~/testtree> julia -q --project
(testtree) pkg> st
Status `~/testtree/Project.toml`
  [336ed68f] CSV v0.7.7
  [72696420] FileTrees v0.1.0
  [8ba89e20] Distributed
julia> using FileTrees
julia>

With multiple processes

~/testtree> julia -q --project -p 2
julia> using Distributed
julia> @everywhere using CSV
julia> @everywhere using FileTrees
ERROR: On worker 2:
ArgumentError: Package FileTrees [72696420-646e-6120-6e77-6f6420746567] is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.

_require at ./loading.jl:999
require at ./loading.jl:928
#1 at /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/Distributed.jl:78
#103 at /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:290
run_work_thunk at /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:79
run_work_thunk at /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:88
#96 at ./task.jl:356

...and 1 more exception(s).

Stacktrace:
 [1] sync_end(::Channel{Any}) at ./task.jl:314
 [2] macro expansion at ./task.jl:333 [inlined]
 [3] _require_callback(::Base.PkgId) at /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/Distributed.jl:75
 [4] #invokelatest#1 at ./essentials.jl:710 [inlined]
 [5] invokelatest at ./essentials.jl:709 [inlined]
 [6] require(::Base.PkgId) at ./loading.jl:931
 [7] require(::Module, ::Symbol) at ./loading.jl:923
 [8] top-level scope at /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/macros.jl:200

I have run Pkg.instantiate(), but it doesn’t fix the issue. I realize that this may not be caused by FileTrees, but I can run @everywhere using CSV successfully. To the extent that I understand the error message, there seem to be some weird references to files which are definitely not located on my machine, e.g.
at /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/macros.jl:

Is this an issue with FileTrees?

Thanks for your feedback, issues and PRs!

I just saw your experiments (https://github.com/shashi/FileTrees.jl/issues/25); that’s a good solution, although right now it seems a bit forceful. I’m interested in making this experience better and more natural.

I wonder if one can create metadata for your files pretty quickly with awk (or readlines()) and then load each slice lazily using grep on the line prefixes obtained from awk…
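
In plain Julia instead of awk, the metadata pass could be as simple as this sketch (scan_prefixes and load_slice are made-up names; the space delimiter is an assumption about the log format):

# One cheap pass to record which line prefixes occur in a file...
function scan_prefixes(filename; delim = ' ')
    prefixes = Set{String}()
    for line in eachline(filename)
        push!(prefixes, first(split(line, delim)))
    end
    return prefixes
end

# ...then each slice can be loaded lazily, grep-style, on demand.
load_slice(filename, prefix) =
    [line for line in eachline(filename) if startswith(line, prefix)]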

That’s really weird, I have no idea what’s going on off the top of my head.

Is there a FileTrees entry in your Manifest.toml file? What does it look like?

Just as a fair warning(?), I don’t think halve is tested outside of my packages yet. But if you don’t mind giving a shot at this, it would be fantastic!

I think first we should make Dagger a consumer of this API. It’d be amazing to try distributed LazyGroupBy on a bunch of files. I think that’s a solid example to try and implement.

cc @piever

1 Like

Yes, out-of-core group-by sounds like a nice application. I think it shouldn’t be too hard to implement reduce with the Dagger API. I might try this myself at some point.
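
A tree-shaped reduce over delayed thunks might be as simple as this sketch (dreduce is a made-up name, built on the delayed/compute pattern shown earlier):

using Dagger

function dreduce(op, xs)
    length(xs) == 1 && return delayed(identity)(xs[1])
    mid = length(xs) ÷ 2
    # Each pair of halves becomes a thunk; Dagger schedules the tree.
    return delayed(op)(dreduce(op, xs[1:mid]), dreduce(op, xs[mid+1:end]))
end

collect(compute(dreduce(+, collect(1:8))))  # == 36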

1 Like

Hehe, “forceful” is really the right word to describe it 🙂

Skimming through the files first might be an option as well, although it might end up being too slow. I was also a bit worried that reading the same file from multiple workers would make them block each other, but maybe it doesn’t work that way?

1 Like

I had some issues getting distributed working as well. Is there a chance that CSV works because it is in your shared/default environment?

I could get it to work using the method posted here: Packages and workers - #10 by ksmcreynolds

I think that the flags you used should be equivalent but I don’t know enough about it to say for sure.

2 Likes

Thanks @DrChainsaw! You’re right.

1 Like

Thanks for the answer, @shashi. The problem was that I needed to explicitly activate the environment for all workers. Now my script runs, but I still don’t see any parallelization.

Consider the following script, test.jl:

@everywhere using Pkg
@everywhere Pkg.activate(".")
@everywhere using Distributed, FileTrees, .Threads

@show nthreads()
@show nprocs()

@everywhere function create_tree()
    t = maketree("test_file_tree" => [])
    for c in 'A':'Z'
        node_file = joinpath(string(c), "nodefile")
        t = touch(t, node_file, value=1)
    end
    t
end

t = create_tree()
FileTrees.save(t) do file
    println("pid : $(myid()), threadid : $(threadid()), $(path(file))")
end |> exec

Running Julia 1.5 and FileTrees 0.1.2 with JULIA_NUM_THREADS=4, I get:

> julia -p 2  test.jl
 Activating environment at `~/testtree/Project.toml`
      From worker 3:	 Activating environment at `~/testtree/Project.toml`
      From worker 2:	 Activating environment at `~/testtree/Project.toml`
nthreads() = 4
nprocs() = 3
pid : 1, threadid : 1, test_file_tree/A/nodefile
pid : 1, threadid : 1, test_file_tree/B/nodefile
pid : 1, threadid : 1, test_file_tree/C/nodefile
pid : 1, threadid : 1, test_file_tree/D/nodefile
pid : 1, threadid : 1, test_file_tree/E/nodefile
pid : 1, threadid : 1, test_file_tree/F/nodefile
pid : 1, threadid : 1, test_file_tree/G/nodefile
pid : 1, threadid : 1, test_file_tree/H/nodefile
pid : 1, threadid : 1, test_file_tree/I/nodefile
pid : 1, threadid : 1, test_file_tree/J/nodefile
pid : 1, threadid : 1, test_file_tree/K/nodefile
pid : 1, threadid : 1, test_file_tree/L/nodefile
pid : 1, threadid : 1, test_file_tree/M/nodefile
pid : 1, threadid : 1, test_file_tree/N/nodefile
pid : 1, threadid : 1, test_file_tree/O/nodefile
pid : 1, threadid : 1, test_file_tree/P/nodefile
pid : 1, threadid : 1, test_file_tree/Q/nodefile
pid : 1, threadid : 1, test_file_tree/R/nodefile
pid : 1, threadid : 1, test_file_tree/S/nodefile
pid : 1, threadid : 1, test_file_tree/T/nodefile
pid : 1, threadid : 1, test_file_tree/U/nodefile
pid : 1, threadid : 1, test_file_tree/V/nodefile
pid : 1, threadid : 1, test_file_tree/W/nodefile
pid : 1, threadid : 1, test_file_tree/X/nodefile
pid : 1, threadid : 1, test_file_tree/Y/nodefile
pid : 1, threadid : 1, test_file_tree/Z/nodefile

Moreover, if I comment out the value=1 line in the tree constructor, nothing is printed out. The save loop seems to skip nodes without values. Is that intentional?

If these are issues with FileTrees I can create issues there.

1 Like

Ah. I used save and not load with lazy=true. Works now. Sorry for the noise.
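
For anyone else hitting this, the pattern that does distribute the work looks like this (a sketch of the fix just described, reusing t from the script above):

lazy_t = FileTrees.load(t; lazy=true) do file
    println("pid : $(myid()), threadid : $(threadid()), $(path(file))")
end
exec(lazy_t)  # forces the lazy values, scheduling them across workers/threads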

1 Like

Forgive me for saying this: FileTrees looks great, but it assumes a POSIX-compatible filesystem (I think). Systems admins like me always howl when people read and write many small files - I suppose we should really suck it up and start engineering for it.

Is there any thought in Julia about using S3 object stores?
I guess that would be a completely separate module from FileTrees.jl.

1 Like

Thank you, @shashi. This is a great and useful tool.

I wrote a not-so-small example, because I have a doubt.
I have been using it for some ML preprocessing in which the files have a certain structure, "test/<category>/<file>".

    # All files
    all = FileTree(dataset_dir)

    # Apply the function apply to each file
    data = FileTrees.load(all; lazy=true) do file
        # Get name
        fname_str = convert(String, path(file))
        # Apply the model to the image
        apply(fname_str)
    end

    # Recover the categories
    categories = FileTrees.load(all; lazy=true) do file
        # Get name
        category = path(file).segments[end-1]
        category
    end

    for type in ["test", "train"]
        # I love that part, that you can so easily filter train and test files
        sel = GlobMatch("$type/*/*.$ext")
        values = reducevalues(hcat, data[sel]) |> exec
        cats = reducevalues(hcat, categories[sel]) |> exec
        writedlm(..., values)
        writedlm(..., cats)
    end

Is it right? For me it is working well, but I do not know whether the order
of categories and data stays correct, even when run in parallel. Could it run in parallel without problems?

1 Like

@tkf nice! I’d be down to pair program on that; it might be much quicker to do it together than on our own.

@johnh
I’m looking for users on all platforms! If you give it a go, let me know.

Right now we actually don’t tie into POSIX, by means of FilePathsBase.jl… I think the idea is that, depending on which platform you’re on, the path will be of a different type. But I don’t have a good use case to test it on; see “prefixes”, Issue #8 on shashi/FileTrees.jl.

I think at least we would need to change how path works. https://github.com/shashi/FileTrees.jl/blob/master/src/datastructure.jl#L288

Possibly path(parent(d)) / Path(d.name) should become path(parent(d)) / d.name, and we should store the right path type in the root node, as stated in issue #8.
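
In other words, roughly this (a hypothetical paraphrase of the linked line, not an actual diff):

# Join the plain name onto the parent's typed path instead of wrapping it
# in Path(...), so the root's path type (PosixPath, an S3 path, ...) propagates.
path(d) = parent(d) === nothing ? d.name : path(parent(d)) / d.name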

I’m actually not sure if there’s an AbstractPath implementation for S3 that is open source and based on FilePaths.jl. cc @RoyiAvital (sorry for pinging 🙂)

@jonalm Thanks! That makes sense! We should definitely document that.

skip nodes without values. Is that intentional?

That’s right! But I’m open to doing the “right thing”; please do open an issue where we can discuss.

@dmolina that looks good!

You can also try name(parent(file)) to get the category instead of reaching into the path segments…

I think lazy in the second case (getting the categories) is overkill; I’d just skip that, because you don’t need to run it in parallel.
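
Putting both suggestions together, the categories pass could simply be (a sketch: non-lazy, and using name(parent(file)) from above):

categories = FileTrees.load(all) do file
    name(parent(file))   # the category directory's name
end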