I am fairly sure the OOM problem you observe can be avoided by changing the `_reduce` function to not spawn tasks recursively, but instead to chunk the work first and then spawn all tasks at once (like I did in my previous example). Earlier you said you were willing to try this. Did it help at all?
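
For reference, here is a minimal sketch of what I mean by chunk-first-then-spawn (the name `chunked_reduce` and the chunk count are made up for illustration; your actual `_reduce` will look different):

```julia
using Base.Threads: @spawn

# Hypothetical chunked reduction: split the input into nthreads() chunks up
# front, spawn one task per chunk all at once, then combine the partial
# results on the calling task.
function chunked_reduce(op, data; nchunks = Threads.nthreads())
    chunksize = max(1, cld(length(data), nchunks))
    tasks = map(Iterators.partition(data, chunksize)) do chunk
        @spawn reduce(op, chunk)   # every task is spawned before any fetch
    end
    return mapreduce(fetch, op, tasks)
end

chunked_reduce(+, rand(10^7))  # example usage
```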
I tested this again with the latest MWE you provided. Spawning recursively puts me at around ~30 GB when I run with 6 or 12 threads. Using the map approach from my other example above, I only use ~11 GB, which agrees with what the debug message predicts. From the bit of testing I did, I also conclude that the non-recursive version finishes much faster, since it spawns all tasks at once and they can immediately start their work, whereas in the recursive version task spawning is delayed quite a lot (tested by inserting a `println` call before `@spawn`; see the sketch below).
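
This is roughly how I instrumented the recursive version (a hypothetical sketch; the names and the base-case cutoff are my assumptions, not your actual code):

```julia
using Base.Threads: @spawn

# Recursive divide-and-conquer reduction with a println before each @spawn,
# to see how late the tasks are actually created: further spawns only happen
# as the recursion unwinds, so they are staggered in time.
function recursive_reduce(op, data)
    length(data) <= 1024 && return reduce(op, data)  # assumed cutoff
    mid = length(data) ÷ 2
    println("spawning task for ", mid, " elements at t = ", time())
    t = @spawn recursive_reduce(op, view(data, 1:mid))
    right = recursive_reduce(op, view(data, mid+1:length(data)))
    return op(fetch(t), right)
end
```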
None of this answers what causes your OOM crashes and the excessive memory use I see when running the recursive version, whether it is a GC bug or a problem with `@spawn`, or whether there is some shared state/data race that we are all not seeing. (Regarding the latter, there can be weird bugs with shared state due to Julia's scoping rules, cf. Inconsistent results when using Threads.@threads in a loop - #2 by fatteneder.)
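
To illustrate the kind of shared-state problem I mean (a generic example, not taken from your code):

```julia
# Classic data race: every iteration does an unsynchronized
# read-modify-write on the shared global `acc`, so with more than
# one thread the result is nondeterministic and usually too small.
acc = 0
Threads.@threads for i in 1:1_000_000
    global acc += 1
end
@show acc  # typically < 1_000_000 when Threads.nthreads() > 1
```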
EDIT: The previous answer was a reply to the wrong post, and now I have to insert some more text to make Discourse's 'similar reply' check go away …