Hi everyone! I am trying to update an old project of mine where I have three nested for loops like this:
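```julia
using Base.Threads: @threads, nthreads, threadid

# One scratch column per thread, allocated once up front
scratch = zeros(Float32, tree_nodes, nthreads())
collect_here = zeros(Float32, nthreads())
for i in eachindex(V)
    # Indexing by threadid() is only safe with the :static schedule
    # (the default schedule since Julia 1.8 lets tasks migrate between threads)
    @threads :static for j in eachindex(M)
        @views collect_here[threadid()] = tree_traversal!(scratch[:, threadid()], tree)
    end
    result = do_stuff(collect_here)
    ...
end
```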
Here, `tree_traversal!` does considerable work on a binary tree and needs a relatively large amount of scratch memory. In the old setup I could allocate all of that memory once, beforehand, and avoid relying on any locking. After moving to OhMyThreads and setting up the scratch memory with `@local`, roughly 20% of the time in each task (according to `@btime`) now goes to allocating this memory, which is a penalty I cannot really afford…
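For comparison, the "allocate once, no locks" property of the old code can in principle be kept without `threadid()` or the `:static` schedule: allocate one buffer per chunk of the inner index range, once, outside the outer loop, and give each chunk its own task. The sketch below uses only Base (`Threads.@spawn`, `Iterators.partition`); `outer_loop_chunked`, its `nchunks` keyword, and the bodies of `tree_traversal!` and `do_stuff` are hypothetical stand-ins. It also assumes the real `tree_traversal!` does not expect a zeroed buffer on entry, which the old code already relied on.

```julia
using Base.Threads: nthreads, @spawn

# Hypothetical stand-in for the real tree_traversal!: it overwrites the
# scratch buffer and returns a number computed from it.
function tree_traversal!(scratch::AbstractVector{Float32}, tree)
    scratch .= Float32(tree)
    return sum(scratch)
end

# Hypothetical stand-in for the real do_stuff
do_stuff(xs) = sum(xs)

function outer_loop_chunked(V, M, tree, tree_nodes; nchunks = nthreads())
    # Allocated once and reused by every iteration of the outer loop
    buffers = [zeros(Float32, tree_nodes) for _ in 1:nchunks]
    js = collect(eachindex(M))
    # Produces at most nchunks chunks, so chunk c can always use buffers[c]
    chunk_len = max(1, cld(length(js), nchunks))
    results = Float32[]
    for i in eachindex(V)
        collect_here = similar(M, Float32)
        # One task per chunk; each task owns exactly one buffer, so no lock is needed
        @sync for (c, chunk) in enumerate(Iterators.partition(js, chunk_len))
            @spawn begin
                scratch = buffers[c]
                for j in chunk
                    collect_here[j] = tree_traversal!(scratch, tree)
                end
            end
        end
        push!(results, do_stuff(collect_here))
    end
    return results
end
```

Because each chunk is tied to a fixed buffer, load balancing is coarser than with OhMyThreads' default scheduling; raising `nchunks` above `nthreads()` (which also allocates that many buffers) trades memory for finer-grained work.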
I am aware that I could use the `:static` scheduler, but I would like to avoid it. My current OhMyThreads version looks like this:
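```julia
using OhMyThreads

for i in eachindex(V)
    collect_here = @tasks for j in eachindex(M)
        @set collect = true
        # Task-local scratch buffer: allocated anew every time @tasks runs,
        # i.e. once per task in every iteration of the outer loop
        @local scratch = zeros(Float32, tree_nodes)
        tree_traversal!(scratch, tree)
    end
    result = do_stuff(collect_here)
    ...
end
```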
Is there a well-supported way to set up a pool of scratch memory, so that buffers allocated once can be reused by the tasks in different iterations of the outer loop? Or am I stuck allocating this memory in every such iteration?
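One possible shape for such a pool, sketched here under stated assumptions rather than as an OhMyThreads feature: a `Channel` created once outside the outer loop and pre-filled with buffers; each task `take!`s a buffer (blocking until one is free) and `put!`s it back in a `finally` block. `outer_loop_pool`, its `npool` keyword, and the bodies of `tree_traversal!` and `do_stuff` are hypothetical, and the sketch again assumes `tree_traversal!` does not need a zeroed buffer. Note that `Channel` uses a lock internally, held only briefly per `take!`/`put!`. It spawns one task per `j` purely for illustration; the same `take!`/`try`/`finally put!` lines could presumably stand in for the `@local` line inside the `@tasks` body.

```julia
using Base.Threads: nthreads, @spawn

# Hypothetical stand-in for the real tree_traversal!
function tree_traversal!(scratch::AbstractVector{Float32}, tree)
    scratch .= Float32(tree)
    return sum(scratch)
end

# Hypothetical stand-in for the real do_stuff
do_stuff(xs) = sum(xs)

function outer_loop_pool(V, M, tree, tree_nodes; npool = nthreads())
    # The pool is filled once and survives across outer iterations
    pool = Channel{Vector{Float32}}(npool)
    for _ in 1:npool
        put!(pool, zeros(Float32, tree_nodes))
    end
    results = Float32[]
    for i in eachindex(V)
        collect_here = similar(M, Float32)
        @sync for j in eachindex(M)
            @spawn begin
                scratch = take!(pool)      # waits until some buffer is free
                try
                    collect_here[j] = tree_traversal!(scratch, tree)
                finally
                    put!(pool, scratch)    # always hand the buffer back
                end
            end
        end
        push!(results, do_stuff(collect_here))
    end
    close(pool)
    return results
end
```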