Could somebody suggest the right way to work in a parallel loop and write data to the same DataFrame?

Thank you in advance.

Is using a lock the right way?

m = ReentrantLock()
Threads.@threads for …
    lock(m)
    push!(df, data)   # df: the shared DataFrame, data: the new row
    unlock(m)
end

Locks (ReentrantLock, SpinLock, etc.) have high overhead, so I would rather give each thread its own DataFrame and concatenate them at the end:

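using BenchmarkTools, DataFrames
using Base.Threads  # exports @threads, nthreads, threadid

@btime let
    df = DataFrame(x=1:5, y='a':'e')
    # One empty buffer per thread, so no push! ever needs a lock.
    dfthr = [similar(df, 0) for i in 1:nthreads()]

    # Caveat: indexing by threadid() assumes an iteration never changes threads;
    # on recent Julia versions, `@threads :static` is the way to guarantee that.
    @threads for i in 1:1000
        push!(dfthr[threadid()], (rand(1:10), rand('a':'e')))
    end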

    df2 = vcat(df, dfthr...)
end
# 125.201 μs (1466 allocations: 96.47 KiB)

Compare that to the lock-based version:

@btime let
    df = DataFrame(x=1:5, y='a':'e')
    m = ReentrantLock()
    @threads for i in 1:1000
        lock(m)
        push!(df, (rand(1:10), rand('a':'e')))
        unlock(m)
    end
end
# 1.923 ms (6033 allocations: 148.08 KiB)
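
For what it's worth, the same "build locally, concatenate once" idea can also be written without relying on threadid(), by splitting the index range into chunks and giving each chunk its own task. This is only a rough sketch and has not been benchmarked; build_chunk and chunk_size are names made up for illustration, and the row contents are the same random placeholders as above.

```julia
using DataFrames
using Base.Threads: @spawn, nthreads

# Hypothetical helper: fills a task-local DataFrame with one row per index.
function build_chunk(template::DataFrame, idxs)
    local_df = similar(template, 0)   # empty DataFrame with the same columns
    for i in idxs
        push!(local_df, (rand(1:10), rand('a':'e')))
    end
    return local_df
end

df = DataFrame(x=1:5, y='a':'e')
n = 1000
chunk_size = cld(n, nthreads())       # roughly one chunk per thread

# Each task owns its DataFrame, so no lock is needed while pushing.
tasks = [@spawn build_chunk(df, idxs) for idxs in Iterators.partition(1:n, chunk_size)]

# Combine once, after every task has finished.
df2 = vcat(df, fetch.(tasks)...)
```

Since each task only touches its own DataFrame, the only synchronization left is the fetch at the end.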

You can use Transducers.jl for this (disclaimer: I’m the author):

Thank you so much