Why has simple threading using `threadid` become so complex in v1.12...?

Hi, several of my repos had an issue opened about the upcoming v1.12 and its changes to multithreading, flagging erroneous usage of `threadid`. E.g., Likely erroneous use of `Threads.nthreads` and `Threads.threadid` · Issue #169 · JuliaDynamics/RecurrenceAnalysis.jl · GitHub

I didn’t really understand why our usage was wrong, but in any case we tried to find the simplest way to address the issue. What I am rather unhappy with is that in v1.12 threading code has to be so much more complex. One of the big strengths of Julia was how simple it was to parallelize existing code, but now in v1.12 it is significantly more complex, with lots of manual book-keeping required from the user. This gives vibes more like C than like Julia.

This is how we were using threadid, and how we now have to use it in v1.12:

# setup
ds = mutable_datastructure()
dss = [deepcopy(ds) for _ in 1:Threads.nthreads()]
outputs = zeros(length(some_iterable))

# pre v1.12 way:
Threads.@threads for j in some_iterable
    i = Threads.threadid()
    ds = dss[i]
    outputs[j] = computation!(ds, j)
end

# post v1.12 way:
threadchannel = Channel{Int}(Threads.nthreads())
for i in 1:Threads.nthreads()  # one token per buffer in `dss`
    put!(threadchannel, i)
end

Threads.@threads for j in some_iterable
    i = take!(threadchannel)
    ds = dss[i]
    outputs[j] = computation!(ds, j)
    put!(threadchannel, i)
end
close(threadchannel)

I guess my question is why couldn’t we make the first version work in v1.12 just out of the box? Or, is there a simpler and more elegant way to make version 1 work in v1.12 that is not as verbose and book-keeping heavy as the v1.12 version?

I would write this as

using OhMyThreads
outputs = @tasks for j in some_iterable
    @set collect=true # makes this into a `map`-like operation
    @local ds = mutable_datastructure() # Creates 1-instance of your mutable data structure per task
    computation!(ds, j)
end

For what it’s worth, what you were doing was incorrect long before v1.12, it’s just that it broke even more with 1.12. There’s a blogpost about this that explains why uses of `threadid` like this cause race conditions: PSA: Thread-local state is no longer recommended. It should really be updated to mention OhMyThreads.jl though.
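To make one of the hazards concrete, here is a minimal sketch, assuming (as I understand the scheduler) that a task may resume on a different thread after a yield point. The function name `count_thread_changes` is made up for illustration, and `sleep` stands in for any yield point inside `computation!`. The count is nondeterministic and may well be 0, especially with a single thread.

```julia
using Base.Threads: @threads, threadid, Atomic, atomic_add!

# Hypothetical helper: counts iterations whose thread id differs
# before and after a yield point.
function count_thread_changes(n)
    changes = Atomic{Int}(0)
    @threads for j in 1:n
        before = threadid()
        sleep(0.001)           # yield point; the task may be rescheduled here
        after = threadid()
        before == after || atomic_add!(changes, 1)
    end
    return changes[]
end

println("iterations that changed thread: ", count_thread_changes(200))
```

Any nonzero count means an iteration that indexed `dss[threadid()]` at its start could have finished on a thread that "owns" a different buffer. Even a count of 0 does not make the pattern safe, since two tasks can also interleave on the same thread at a yield point.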


Instead of using `threadid`, just create your own “taskid”, as below:

using Base.Threads

numofdata = 100
@show rawdata = rand(numofdata)
# prepare the processed_data
processed_data = zeros(numofdata)

NumOfThreads = Threads.nthreads()
# Manually set the number of threads to 4
# NumOfThreads = 4

# Sanity check
if NumOfThreads > length(rawdata) 
    NumOfThreads = length(rawdata)
end

startpos = 1
endpos = numofdata

function CalcChunk(NumOfThreads,i,startpos,endpos)
    numofdata = endpos - (startpos - 1)
    chunksize = numofdata ÷ NumOfThreads
    firstpos = startpos + chunksize * (i - 1)
    lastpos  = startpos + chunksize * i - 1
    if i == NumOfThreads
        # The last task takes up any remaining elements
        lastpos = endpos
    end
    return (firstpos,lastpos)
end

Threads.@threads for taskid = 1:NumOfThreads
    global NumOfThreads, startpos, endpos
    for pos = range( CalcChunk(NumOfThreads,taskid,startpos,endpos)... )
        println("Task $(taskid), Working with rawdata[$pos] = $(rawdata[pos])")
        individualdata = rawdata[pos]
        processed_data[pos] = individualdata * individualdata
    end
end


println("Done")

Actually, the idea behind the change (which happened already in Julia 1.7, not 1.12) is to make it simpler, not more complex. It’s unfortunate that so much of idiomatic Julia code used threadid, when it was never a good fit for Julia’s threading model.

A core idea behind Julia’s threading model is that the user is supposed to think in terms of tasks, and not threads. Threads are a resource provided by the operating system, which is handled transparently by the runtime. It is analogous to how a user should think of memory: Of course we can think about how much we consume, but we don’t have any control of where in memory an object is allocated or in which order on the heap a collection of variables are placed. Similar with threads: We can think about how well our program makes use of N threads provided by the operating system, but we should not reason about which task runs on which thread - and ideally, we shouldn’t even write code that depends on N threads being available.
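As a small illustration of thinking in tasks rather than threads, here is a sketch using only `Threads.@spawn` and `fetch` from Base. The names `work` and `squares_with_tasks` are hypothetical stand-ins. Nothing in it asks which thread runs what, or how many threads exist.

```julia
using Base.Threads: @spawn

# Hypothetical stand-in for real work; any pure function would do.
work(x) = x^2

# Spawn one task per input. The runtime decides which thread runs each task.
function squares_with_tasks(xs)
    tasks = [@spawn work(x) for x in xs]
    return fetch.(tasks)   # results come back in input order
end

println(squares_with_tasks(1:5))
```

The same code runs unchanged with `julia -t 1` or `julia -t 8`; only the wall-clock time should differ.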

Generally, this simplifies Julia code, but there are some cases where the user is forced to think about threads explicitly. E.g. if you call into some C code where each thread needs to be initialized independently. That’s a rare case though.

In your case, it IS especially tricky because you need to instantiate as few copies of ds as possible, while still guaranteeing that each concurrent task operates on a distinct copy (that’s how I take it, anyway). For this, I would do the same as @Mason suggests above.
Note that the underlying implementation in OhMyThreads still doesn’t express its abstraction in terms of threads, AFAICT, but instead expresses it as a number of iterations shared across a smaller number of tasks.
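For readers who want to see the "iterations shared across a smaller number of tasks" idea without a package, here is a rough Base-only sketch. It is not how OhMyThreads is implemented; it is just the same idea written by hand. `run_chunked` and its `ntasks` keyword are names I made up, and `mutable_datastructure` and `computation!` are toy stand-ins for the ones in the question.

```julia
using Base.Threads: @spawn, nthreads

# Toy stand-ins for the question's `mutable_datastructure` and `computation!`.
mutable_datastructure() = zeros(3)
function computation!(ds, j)
    ds .= j          # scribble on the scratch buffer
    return sum(ds)   # result depends only on j
end

function run_chunked(some_iterable; ntasks = nthreads())
    items = collect(some_iterable)
    isempty(items) && return Float64[]
    chunksize = cld(length(items), ntasks)
    # One task per chunk; each task builds and owns its private `ds`.
    tasks = map(Iterators.partition(items, chunksize)) do chunk
        @spawn begin
            ds = mutable_datastructure()
            [computation!(ds, j) for j in chunk]
        end
    end
    # Chunks are contiguous and fetched in order, so input order is kept.
    return reduce(vcat, fetch.(tasks))
end

outputs = run_chunked(1:10)
```

At most `ntasks` copies of `ds` are created, and no copy is ever visible to more than one task, so there is no need for `threadid` or a channel of buffer indices.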


The documentation for both OhMyThreads.jl and ChunkSplitters.jl does a great job of explaining how “simple” multithreading can work.

As an example, using ChunkSplitters.jl, you could do:

using ChunkSplitters

# setup
n = Threads.nthreads()
ds = mutable_datastructure()
dss = [deepcopy(ds) for _ in 1:n]
outputs = zeros(length(some_iterable))

Threads.@threads for (i, c) in enumerate(chunks(some_iterable; n=n))
    local ds = dss[i]
    for j in c
        outputs[j] = computation!(ds, j)
    end
end