No more threadid indexing? [thread-local storage]

I think the least invasive way to update code that previously used threadid() is to use ChunkSplitters.jl: replace the threaded loop over the data with a threaded loop over chunks of the data:

julia> using ChunkSplitters, Base.Threads

julia> my_arr = rand(10_000);

julia> nchunks = 10
       my_sum = zeros(nchunks)  # one accumulator per chunk
       @threads for (ichunk, inds) in enumerate(index_chunks(my_arr; n=nchunks))
           my_sum[ichunk] += sum(@view(my_arr[inds]))
       end
       sum(my_sum)
5033.886812176603

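# this replaces the old pattern below, which is no longer safe: a task can
# migrate between threads, so threadid() is not a stable buffer index
julia> my_sum = zeros(nthreads())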
       @threads for i in eachindex(my_arr)
           my_sum[threadid()] += my_arr[i]
       end
       sum(my_sum)
5033.886812176624
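For comparison, the same idea (one accumulator per chunk instead of per thread) can be written with Base alone. This is a minimal sketch, not what ChunkSplitters does internally; the function name chunked_sum is hypothetical. Each task returns its partial sum and the results are combined after fetch, so no shared array indexed by threadid() is needed:

```julia
using Base.Threads: @spawn, nthreads

# Sum `x` by splitting its indices into at most `nchunks` contiguous ranges,
# one task per range; each task returns its own partial sum.
function chunked_sum(x::AbstractVector{<:Real}; nchunks::Int = nthreads())
    n = length(x)
    nchunks = clamp(nchunks, 1, max(n, 1))
    # ceil division so every index lands in exactly one chunk
    # (max(..., 1) because Iterators.partition needs a positive length)
    chunk_len = max(cld(n, nchunks), 1)
    tasks = map(Iterators.partition(eachindex(x), chunk_len)) do inds
        @spawn sum(@view(x[inds]); init = zero(eltype(x)))
    end
    # wait for each task and combine the partial sums
    return sum(fetch, tasks; init = zero(eltype(x)))
end
```

Returning the partial results from the tasks, instead of writing into adjacent slots of a shared array, also avoids false sharing between threads.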


But OhMyThreads.jl is a higher-level alternative and is probably the better option in most cases, after a small initial effort to restructure the parallel code.
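As a rough sketch of what that restructuring can look like for the sum above (assuming OhMyThreads.jl is installed), the loop becomes a single parallel map-reduce, and the per-task partial results are handled by the library:

```julia
using OhMyThreads: tmapreduce

my_arr = rand(10_000)

# tmapreduce(f, op, itr) applies f to the elements in parallel tasks and
# combines the results with op (which should be associative); there is no
# threadid()-indexed buffer to manage.
my_sum = tmapreduce(x -> x, +, my_arr)
```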

PS: In your case, you would do something like:

    using Base.Threads: @threads
    using ChunkSplitters: chunks  # `using` must be at top level, not inside a function

    buffers = create_thread_buffers(wf, nth)
    # Parallel loop over nth chunks of the point indices
    @threads for (tid, iGP_range) in enumerate(chunks(1:length(mx); n=nth))
        # Get this chunk's buffers
        GP = buffers.thread_buffers[tid] # tid is now the chunk index, not threadid()
        comp_buffers = buffers.thread_comp_buffers[tid]
        for iGP in iGP_range
            # current calculations using iGP
        end
    end

(Note that with this approach nth does not need to equal nthreads(). Choosing nth < nthreads() limits the number of threads used, while nth >> nthreads() creates more, smaller tasks, which sometimes improves load balancing.)
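A self-contained sketch of the same buffer pattern, for experimenting with different values of nth. Here plain vectors stand in for create_thread_buffers, and work! and run_chunked are hypothetical names; it assumes ChunkSplitters v3, where index_chunks is available as in the first example. Because each chunk index owns exactly one buffer, the result is the same whatever nth is:

```julia
using ChunkSplitters: index_chunks
using Base.Threads: @threads

# Hypothetical per-point work: overwrites the scratch buffer, returns a value from it.
function work!(buf::Vector{Float64}, ipoint::Int)
    for k in eachindex(buf)
        buf[k] = sin(ipoint * k)
    end
    return sum(buf)
end

# `nth` is the number of chunks (and of buffers), independent of nthreads().
function run_chunked(npoints::Int, nth::Int; buflen::Int = 8)
    buffers = [zeros(buflen) for _ in 1:nth]   # one scratch buffer per chunk
    results = zeros(npoints)
    @threads for (ichunk, inds) in enumerate(index_chunks(results; n = nth))
        buf = buffers[ichunk]                  # safe: each chunk runs in one task
        for ipoint in inds
            results[ipoint] = work!(buf, ipoint)
        end
    end
    return results
end
```

For example, run_chunked(1000, 2) uses at most two threads, while run_chunked(1000, 64) splits the work into 64 smaller tasks.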