I have a workload that I want to multithread. This involves calling, in parallel, a function that needs some workspace for its computation. The following rules apply:
There may never be two computations using the same workspace at the same time.
If one computation is done, its workspace can be reused by another computation.
New workspaces can be created at will, but this is an expensive operation.
What is the recommended way to do this? Currently I use the following pattern, but I don’t like the try/catch here:
workspace = try
    task_local_storage(:workspace)
catch err
    task_local_storage(:workspace, create_workspace())
end
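For context, this lookup sits at the top of the function that I call in parallel. A minimal sketch of how the whole thing is driven (get_workspace is a hypothetical helper wrapping the try/catch above, and configs stands in for my real inputs):

function compute(config)
    workspace = get_workspace()  # hypothetical helper containing the try/catch above
    # ... do the actual computation with `workspace` ...
end

Threads.@threads for config in configs
    compute(config)
end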
workspaces = [create_workspace() for _ in 1:Threads.nthreads()]
function compute(workspaces, config)
    workspace = workspaces[Threads.threadid()]
    #... Do some compute
end
As long as you use something like Threads.@threads, two computations shouldn’t ever overlap on the same workspace.
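For example, a minimal usage sketch (configs stands in for whatever collection of inputs you iterate over):

Threads.@threads for config in configs
    compute(workspaces, config)
end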
You could also use a Channel:
# setup workspaces channel
workspaces = Channel{YourWorkspaceType}(Threads.nthreads())
for _ in 1:Threads.nthreads()
    put!(workspaces, create_workspace())
end
function compute(workspaces, config)
    workspace = take!(workspaces)
    #... Do some compute
    put!(workspaces, workspace)
end
function compute(workspaces, config)
    workspace = workspaces[Threads.threadid()]
    #... Do some compute
end
This one makes me nervous. AFAIU, the scheduler is free to launch new tasks on threads that still have unfinished tasks, which would mean workspace clashes.
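For illustration, the kind of interleaving I’m worried about would look something like this (yield() stands in for any yield point inside the computation, e.g. I/O, locks, or channel operations):

function compute(workspaces, config)
    workspace = workspaces[Threads.threadid()]
    # ... first half of the computation ...
    yield()  # the scheduler may now run another task on this thread,
             # and that task would index the very same workspace
    # ... second half of the computation ...
end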
# setup workspaces channel
workspaces = Channel{YourWorkspaceType}(Threads.nthreads())
for _ in 1:Threads.nthreads()
    put!(workspaces, create_workspace())
end
function compute(workspaces, config)
    workspace = take!(workspaces)
    #... Do some compute
    put!(workspaces, workspace)
end
With this, I think there are no workspace clashes, great!
This might deplete the channel if the scheduler decides to run more tasks than there are threads, in which case the extra tasks block on take! until a workspace is returned. But that may even be desirable: in contrast to my solution, this ensures no more workspaces than strictly necessary are created.
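To convince myself, here is a small self-contained demo of that blocking behaviour (the Vector{Float64} workspaces and the sleep are just stand-ins): with two workspaces and four tasks, at most two tasks hold a workspace at any time, and the others wait in take! until one is returned.

workspaces = Channel{Vector{Float64}}(2)
for _ in 1:2
    put!(workspaces, zeros(1_000))
end

@sync for _ in 1:4
    Threads.@spawn begin
        workspace = take!(workspaces)
        try
            sleep(0.1)  # stand-in for the actual computation
        finally
            put!(workspaces, workspace)  # hand the workspace back even if the computation errors
        end
    end
end

Putting the put! in a finally block also means a workspace is never lost if a computation throws.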