Sharp edge with `Threads.threadid()` and task migration

Ah, sorry, I must have misunderstood. I’ve only used Threads.@threads for loops where each iteration is an independent, fairly simple calculation. Thanks for the clarification!


There are also the OncePerProcess / OncePerThread / OncePerTask constructs that were introduced recently (Julia 1.12) and help with many of the issues mentioned here.
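As a rough illustration of the per-task construct mentioned above, here is a minimal sketch. It assumes Julia 1.12 or newer and that the construct meant is `Base.OncePerTask`; the names `scratch` and `fill_scratch!` are hypothetical and only serve the example.

```julia
# Assumption: Julia 1.12+, where Base.OncePerTask is available.
# `scratch` is a hypothetical per-task buffer, created lazily once per task.
const scratch = Base.OncePerTask{Vector{Float64}}(() -> zeros(4))

function fill_scratch!(x)
    buf = scratch()   # the same buffer on every call made from the same task
    fill!(buf, x)
    return sum(buf)
end
```

Each task that calls `scratch()` gets its own buffer, so there is nothing to index by `threadid()` and task migration cannot cause two tasks to share state.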

Apologies for posting 20 days after the discussion was concluded. Just wanted to flag something that was not mentioned in the thread but seems pertinent for readers in the future…


I finally got some free time to play around with using task_local_storage() for the context retrieval. I was concerned that task_local_storage() would introduce significant overhead for fast operations, so I made a benchmark before committing to the change. I think this will be interesting for those who may still be hesitant to port their code from the threadid() approach with a cache.

The old approach for retrieving context used threadid() to select elements from a globally allocated cache:

function get_ctx()
    return THREAD_CTXS[].ctx
end

where threadid() is implicitly hidden behind getindex. I compared it with the task_local_storage() approach:

function get_ctx()
    ctx = get!(OpenSSLContext, task_local_storage(), :ctx)::OpenSSLContext
    return ctx.ctx
end

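The added type assertion ::OpenSSLContext was essential: without it there was about a 2x slowdown. task_local_storage() returns an IdDict{Any,Any}, so the result of get! is inferred as Any unless it is asserted.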
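To make the effect of the assertion concrete, here is a self-contained sketch of the same pattern. `DummyContext` is a hypothetical stand-in for the library's OpenSSLContext, and `get_dummy_ctx` is an illustrative name; neither comes from the original code.

```julia
# Hypothetical stand-in for OpenSSLContext, used only for illustration.
mutable struct DummyContext
    ctx::Vector{UInt8}
    DummyContext() = new(zeros(UInt8, 16))
end

function get_dummy_ctx()
    # task_local_storage() is an IdDict{Any,Any}, so get! alone is inferred as Any.
    # The assertion tells the compiler the concrete type for everything that follows.
    ctx = get!(DummyContext, task_local_storage(), :dummy_ctx)::DummyContext
    return ctx.ctx
end
```

Repeated calls from the same task return the same context, while a task started with Threads.@spawn lazily creates its own. Running `@code_warntype get_dummy_ctx()` with and without the assertion should show the difference in the inferred return type.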

For the benchmark, I used the fastest operation in the library, comparing two points for equality:

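# Parallel comparison
using BenchmarkTools          # provides @btime
using Base.Threads: @threads

result = Vector{Bool}(undef, length(list))  # one slot per element, written by index
@btime begin
    @threads for i in eachindex($list)
        $result[i] = $list[i] == $gn
    end
end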

for which I used a list with 10^6 elements. By varying the number of threads I got the following results:

NThreads   ThreadID (ms)   LocalStorage (ms)
10         40.6            41.5
100        38.3            41.4
400        40.3            45.5

There is a slight overhead with the task_local_storage() approach, as can be seen in the table, but in practice it is negligible. This convinced me that even for very fast operations in the nanosecond range (in this case about 40 ns per comparison), one should use task-local storage.


Is there a way to fix Threads so that it can have a new function such as

Threads.uniqueandstickythreadid()

That would solve all our problems.

Or maybe I am wishing for too much.

It’s the other way around: Before Julia 1.7, all tasks were sticky, but this is a bit broken because sticky tasks don’t compose well, so task migration was implemented as the fix. Still, if you really want sticky tasks, here’s how:

Threads.@threads :static for i in 1:N
    # stuff
end

You’ll get an error if you try to do this from within another task or @threads loop.
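For readers who do want the sticky behaviour, here is a minimal sketch of a per-thread accumulator that relies on `:static`. The function name `static_sum` is hypothetical, and the use of `Threads.maxthreadid()` assumes Julia 1.9 or newer.

```julia
using Base.Threads

function static_sum(xs)
    # One slot per possible thread id (maxthreadid() assumes Julia 1.9+).
    partial = zeros(eltype(xs), Threads.maxthreadid())
    @threads :static for i in eachindex(xs)
        # With :static, each thread runs exactly one task that never migrates,
        # so threadid() is stable here and no two tasks share a slot.
        partial[threadid()] += xs[i]
    end
    return sum(partial)
end
```

Calling `static_sum(collect(1:1000))` from the top level works. Calling it from inside a spawned task or another @threads loop triggers the error described above, which is the composability cost of this approach.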
