No more 1st class support for Threads? [thread-local storage]

I am implementing multi-threaded scientific software, and so far it works quite well.

But now I am told that thread-local storage is no longer supported by Julia. Well, it still works, even under Julia 1.12rc1, but I am told not to use it.

What is the suggested alternative?

Shall I replace @threads with @spawn?

I don’t think I can put @spawn in front of a for loop.

So how can I use task local storage in a for loop? Or shall I not use for loops any longer?

Very confused.

Can you elaborate on that?

This was widely advertised already over 2 years ago: PSA: Thread-local state is no longer recommended; Common misconceptions about threadid() and nthreads()

Consider using OhMyThreads.jl: Thread-Safe Storage · OhMyThreads.jl


Well, I tried it and it had very bad performance.

You can’t use an array indexed by threadid(), but there are other mechanisms for task-local storage.

For example, the Base function task_local_storage() gives you a task-local IdDict. You can also use other patterns, e.g. a Channel as described in Pattern for managing thread local storage? - #2 by tkf.
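To make the Channel pattern concrete, here is a hedged Base-only sketch (the buffer size and workload are invented for illustration): a pool of preallocated buffers that tasks check out and return, so no buffer is ever used by two tasks at once.

```julia
using Base.Threads

# Pool of preallocated scratch buffers (made-up size 100)
nbuf = nthreads()
bufpool = Channel{Vector{Float64}}(nbuf)
for _ in 1:nbuf
    put!(bufpool, zeros(100))
end

results = zeros(20)
@threads for i in 1:20
    buf = take!(bufpool)      # check a buffer out; blocks if none is free
    try
        buf .= i              # scratch work using the buffer
        results[i] = sum(buf)
    finally
        put!(bufpool, buf)    # return the buffer for other tasks to reuse
    end
end
```

Because the buffer travels with the task (not the thread), this stays correct even when tasks migrate.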


What did you try? There’s a lot of different suggestions in there for different situations.


This is too complicated for me. I do not have a degree in computer science, only in electrical engineering. Thread-local storage is an easy-to-understand pattern for engineers and researchers.

For example, instead of using a value mydata[threadid()], you could use task_local_storage(mydata), or e.g. get(task_local_storage(), mydata, default) if you want a default value, where mydata is some constant key (the same for all tasks), typically a globally unique symbol (e.g. const mydata = gensym(@__MODULE__)). Ideally, append ::T to tell Julia the type T of the returned value (since the task-local storage is an Any dictionary).
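A minimal sketch of that keyed pattern (the key name and buffer size here are invented for illustration):

```julia
# Unique key shared by all tasks; each task stores its own value under it
const MYDATA = gensym(:mydata)

# Fetch (or lazily create) this task's buffer; the ::Vector{Float64}
# annotation narrows the Any-typed dictionary lookup for the compiler
scratch() = get!(() -> zeros(4), task_local_storage(), MYDATA)::Vector{Float64}

a = scratch()
b = scratch()
# Within one task, repeated calls return the very same buffer: a === b
```

The key being a gensym avoids accidental collisions with other packages' task-local entries.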


I don’t think you can achieve good performance with dictionaries. Arrays are much faster.

Depends on how performance-critical access to the task-local variable is. What is your application where the cost of a dictionary lookup is significant compared to your other calculations in the task? Maybe in your case there is a better abstraction.

This is the function we are talking about: FLORIDyn.jl/src/visualisation/calc_flowfield.jl at 31081b8e695f5b05f0c3282c365d3eb348f48999 · ufechner7/FLORIDyn.jl · GitHub

In other words, you’d be replacing the line

GP = buffers.thread_buffers[tid]

with something like

GP = get!(task_local_storage(), :FLORIDyn_buffer) do
   # create new buffer if it doesn't exist yet for this task
end::WindFarm

Since this is only executed once per iteration, and the rest of the iteration (the rest of your for loop body) seems to do a lot of other work, why would the cost of a dictionary lookup matter?
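Put together in a runnable form (with a plain Vector standing in for the WindFarm buffer, and invented sizes), the loop would look something like:

```julia
using Base.Threads

results = zeros(50)
@threads for i in 1:50
    # Created once per task on first use, then reused every iteration
    buf = get!(task_local_storage(), :FLORIDyn_buffer) do
        zeros(8)
    end::Vector{Float64}
    buf .= i                # scratch computation in the task-local buffer
    results[i] = sum(buf)
end
```

The dictionary lookup happens once per iteration; the allocation of the buffer happens only once per task.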


I think the way to make the fewest modifications to code that previously used threadid is ChunkSplitters.jl: just replace the threaded loop with a threaded loop over chunks of the data:

julia> using ChunkSplitters, Base.Threads

julia> my_arr = rand(10_000);

julia> nchunks = 10
       my_sum = zeros(10)
       @threads for (ichunk, inds) in enumerate(index_chunks(my_arr; n=nchunks))
           my_sum[ichunk] += sum(@view(my_arr[inds]))
       end
       sum(my_sum)
5033.886812176603

# as a replacement for
julia> my_sum = zeros(10)
       @threads for i in eachindex(my_arr)
           my_sum[threadid()] += my_arr[i]
       end
       sum(my_sum)
5033.886812176624


but OhMyThreads.jl is a higher-level alternative and is probably, most times, a better option after some initial small effort to rewrite the structure of the parallel code.
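Under the hood, index_chunks just partitions the index range into nearly equal contiguous pieces; a Base-only sketch of roughly what it computes (my approximation, not ChunkSplitters' actual code):

```julia
# Rough Base-only approximation of ChunkSplitters' index_chunks:
# split 1:length(v) into n nearly equal contiguous ranges
function naive_index_chunks(v, n)
    len = length(v)
    [(1 + div((k - 1) * len, n)):div(k * len, n) for k in 1:n]
end

c = naive_index_chunks(rand(10), 3)
# c == [1:3, 4:6, 7:10] — disjoint ranges covering all indices
```

Each chunk index then plays the role the thread id used to play, but it is tied to the data partition, not to the scheduler.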

ps: In your case you would do:

    buffers = create_thread_buffers(wf, nth)
    # Parallel loop using @threads
    using ChunkSplitters: chunks
    @threads for (tid, iGP_range) in enumerate(chunks(1:length(mx); n=nth))
        # Get thread-local buffers
        GP = buffers.thread_buffers[tid] # tid is now the chunk index
        comp_buffers = buffers.thread_comp_buffers[tid]
        for iGP in iGP_range
            # current calculations using iGP
        end
    end

(Note that nth need not equal nthreads(): setting nth < nthreads() can be useful to limit the number of threads used, while nth >> nthreads() increases the number of tasks, which sometimes improves workload balance.)
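The oversubscription idea (nth >> nthreads()) can also be sketched with plain @spawn, letting the scheduler balance many small tasks across the available threads (the sizes here are invented):

```julia
using Base.Threads

items = 1:100
ntasks = 4 * nthreads()          # more tasks than threads → dynamic balancing
ranges = [(1 + div((k - 1) * length(items), ntasks)):div(k * length(items), ntasks)
          for k in 1:ntasks]
# One task per chunk; the scheduler distributes them over the threads
tasks = [@spawn sum(@view items[r]) for r in ranges]
total = sum(fetch.(tasks))
# total == 5050
```

With many small chunks, a thread that finishes early simply picks up the next pending task, which is exactly the dynamic load balancing that pinned thread-local state would prevent.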


All of these solutions look much more complex than what I have now. Is there any good reason to deprecate a simple, performant, working solution in favor of complex, confusing ones? Why are you making it more and more difficult to write performant code in Julia?

Yes: allowing tasks to migrate between threads allows for much more flexible and performant parallelism, especially for irregular parallelism where you don’t know in advance how to equally divide the work among threads (i.e. where you need dynamic load balancing).

(In OpenMP, the same thing happens if you use an “untied” task, and their documentation warns against using the thread number in this case.)

(Caveat: I’m not involved in the details of Julia’s task scheduler; task migration is just a general principle of composable parallelism in my understanding: when a thread is idle, it might need to “steal” work from another thread.)


So can I conclude that it is discouraged to use @threads in front of a for loop? At least the “good” example at Multi-Threading · The Julia Language no longer uses @threads but @spawn.

On the other hand, you suggested to continue to use @threads together with ChunkSplitters.

And the @threads macro no longer creates threads, but tasks that can move between threads?

Still very confused.

I guess I will try this suggestion and see how it performs.

The thing that’s discouraged is using Threads.threadid() as an index into preallocated storage. Not just discouraged—you may get incorrect results!

Threads.@threads is still a nice and easy solution in many cases, as long as you avoid that pattern.
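For instance, @threads remains perfectly safe when each iteration writes only to its own slot, with no threadid() in sight:

```julia
using Base.Threads

a = zeros(Int, 100)
@threads for i in eachindex(a)
    a[i] = i^2        # each iteration owns index i exclusively → no race
end
# a == (1:100).^2
```

The rule of thumb: index shared storage by the loop variable (or a chunk index), never by the thread a task happens to be running on.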

That said, when your algorithm needs some kind of task local storage, you may actually find it easier to obtain a correct and concise solution using the primitives in OhMyThreads.jl instead.


Wait, if you have @threads :static for iGP in 1:length(mx), it’s fine, right?

No, it’s not explicitly discouraged, but using threadid() is. Note, though, that @threads has always been a very primitive and limited API.
Using it in conjunction with ChunkSplitters.jl makes it a bit more versatile, but OhMyThreads.jl tries to provide a better API so that you don’t have to.

It never created threads. It always created tasks. What has changed is that tasks created by @threads used to be sticky and thus were “pinned” to threads and couldn’t migrate. They now can (since Julia 1.10, IIRC). Hence the one-to-one task-to-thread mapping is generally gone (unless you actively try to restore it).
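For completeness: the :static schedule asked about above pins one task per thread, which is the one case where a threadid()-indexed accumulator remains valid (a hedged sketch):

```julia
using Base.Threads

acc = zeros(Int, nthreads())
@threads :static for i in 1:1000
    # Under :static, each task stays pinned to one thread, and no two
    # iterations on the same thread run concurrently → no race here
    acc[threadid()] += 1
end
sum(acc)   # == 1000
```

The trade-off is that :static gives up composability and dynamic load balancing, which is why the chunk-index patterns above are generally preferred.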
