I have a workload that I want to multithread. This involves calling, in parallel, a function that needs some workspace for its computation. The following rules apply:
There may never be two computations using the same workspace at the same time.
If one computation is done, its workspace can be reused by another computation.
New workspaces can be created at will, but this is an expensive operation.
What is the recommended way to do this? Currently I use the following pattern, but I don’t like the try/catch here:
workspace = try
    task_local_storage(:workspace)
catch err
    task_local_storage(:workspace, create_workspace())
end
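For context, this lookup sits at the top of the function that I call in parallel. A minimal sketch of how the whole thing is driven (get_workspace is a hypothetical helper wrapping the try/catch above, and configs stands in for my real inputs):

function compute(config)
    workspace = get_workspace()  # hypothetical helper containing the try/catch above
    # ... do the actual computation with `workspace` ...
end

Threads.@threads for config in configs
    compute(config)
end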
workspaces = [create_workspace() for _ in 1:Threads.nthreads()]
function compute(workspaces, config)
    workspace = workspaces[Threads.threadid()]
    #... Do some compute
end
As long as you use something like Threads.@threads, two computations shouldn’t ever overlap on the same workspace.
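For example, a minimal usage sketch (configs stands in for whatever collection of inputs you iterate over):

Threads.@threads for config in configs
    compute(workspaces, config)
end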
You could also use a Channel:
# setup workspaces channel
workspaces = Channel{YourWorkspaceType}(Threads.nthreads())
for _ in 1:Threads.nthreads()
    put!(workspaces, create_workspace())
end
function compute(workspaces, config)
    workspace = take!(workspaces)
    #... Do some compute
    put!(workspaces, workspace)
end
function compute(workspaces, config)
    workspace = workspaces[Threads.threadid()]
    #... Do some compute
end
This one makes me nervous. AFAIU, the scheduler is free to launch new tasks on threads that still have unfinished tasks, which would mean workspace clashes.
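For illustration, the kind of interleaving I’m worried about would look something like this (yield() stands in for any yield point inside the computation, e.g. I/O, locks, or channel operations):

function compute(workspaces, config)
    workspace = workspaces[Threads.threadid()]
    # ... first half of the computation ...
    yield()  # the scheduler may now run another task on this thread,
             # and that task would index the very same workspace
    # ... second half of the computation ...
end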
# setup workspaces channel
workspaces = Channel{YourWorkspaceType}(Threads.nthreads())
for _ in 1:Threads.nthreads()
    put!(workspaces, create_workspace())
end
function compute(workspaces, config)
    workspace = take!(workspaces)
    #... Do some compute
    put!(workspaces, workspace)
end
With this, I think there are no workspace clashes, great!
This might deplete the channel if the scheduler decides to run more tasks than there are threads, in which case the extra tasks block on take! until a workspace is returned. But that may even be desirable: in contrast to my solution, this ensures no more workspaces than strictly necessary are created.
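To convince myself, here is a small self-contained demo of that blocking behaviour (the Vector{Float64} workspaces and the sleep are just stand-ins): with two workspaces and four tasks, at most two tasks hold a workspace at any time, and the others wait in take! until one is returned.

workspaces = Channel{Vector{Float64}}(2)
for _ in 1:2
    put!(workspaces, zeros(1_000))
end

@sync for _ in 1:4
    Threads.@spawn begin
        workspace = take!(workspaces)
        try
            sleep(0.1)  # stand-in for the actual computation
        finally
            put!(workspaces, workspace)  # hand the workspace back even if the computation errors
        end
    end
end

Putting the put! in a finally block also means a workspace is never lost if a computation throws.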