How to create private arrays for each thread before a loop?

I have a loop that does some linear algebra. Each iteration is independent of each other. I need some helper arrays in each iteration. How can I make each thread allocate its helper arrays once before the loop?

In the example below, I would like to make each thread allocated a single time its own copy of a and b. How can that be done?

function foo(n)
    out = Array{Float64}(undef, n)
    Threads.@threads for i = 1:n
        a = Array{Float64}(undef, n, n)
        b = Array{Float64}(undef, n)
        for j = 1:n
            for k = 1:n
                a[k,j] = j^k
            end
            b[j] = i * j
        end
        out[i] = (a\b)[1]
    end
    return out
end

I allocate buffers outside of the loop. https://github.com/PetrKryslUCSD/FinEtoolsDeforNonlinear.jl/blob/9f9704e78420f4325f6ad6bd431894dc4df574cf/examples/dynamics/transient/3-d/cantilever_dyn_examples.jl#L190

For the moment you may be able to do something like

function foo(n)
    out = Array{Float64}(undef, n)
    abufs = [Array{Float64}(undef, n, n) for _ in 1:Threads.nthreads()]
    bbufs = [Array{Float64}(undef, n) for _ in 1:Threads.nthreads()]
    Threads.@threads for i = 1:n
        a = abufs[Threads.threadid()]
        b = bbufs[Threads.threadid()]

although this depends on the internal detail on how tasks are scheduled (so it may not be safe in the future and there are usecases it’s already unsafe). See also: Task affinity to threads

FYI, note also that BLAS and Julia threading do not play very well ATM:

2 Likes