Could somebody suggest the right way to work in a parallel loop and write data to the same DataFrame?
Thank you in advance.
Is using a lock the right way?
m = ReentrantLock()
Threads.@threads for …
    …
    lock(m)
    push!(DataFrame, Data)
    unlock(m)
    …
end
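For reference, the locking pattern asked about above could be written out as the following minimal sketch. The column names `x` and `y` and the way each row is computed are made up for illustration. It uses the `lock(m) do ... end` form, which releases the lock even if `push!` throws, and builds the row outside the lock so that only the write itself is serialized.

```julia
using DataFrames
using Base.Threads: @threads

# Hypothetical target table; the columns are placeholders for illustration.
df = DataFrame(x = Int[], y = Char[])
m = ReentrantLock()

@threads for i in 1:1000
    # Do the per-iteration work outside the lock.
    row = (x = i, y = 'a' + i % 5)
    # Only the push! runs under the lock; the do-block form always unlocks.
    lock(m) do
        push!(df, row)
    end
end
```

The rows arrive in whatever order the threads reach the lock, so sort afterwards if the order matters.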
Locks (ReentrantLock, SpinLock, etc.) have high overhead, so I would rather give each thread its own DataFrame and concatenate them at the end:
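using BenchmarkTools, DataFrames
using Base.Threads: @threads, nthreads, threadid

@btime let
    df = DataFrame(x=1:5, y='a':'e')
    # one empty DataFrame per thread, so no lock is needed while pushing
    dfthr = [similar(df, 0) for i in 1:nthreads()]
    @threads for i in 1:1000
        push!(dfthr[threadid()], (rand(1:10), rand('a':'e')))
    end
    # merge the per-thread results once at the end
    df2 = vcat(df, dfthr...)
end
# 125.201 μs (1466 allocations: 96.47 KiB)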
Compare that to the lock-based version:
@btime let
    df = DataFrame(x=1:5, y='a':'e')
    m = ReentrantLock()
    @threads for i in 1:1000
        lock(m)
        push!(df, (rand(1:10), rand('a':'e')))
        unlock(m)
    end
end
# 1.923 ms (6033 allocations: 148.08 KiB)
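A variation on the lock-free idea, sketched here as an assumption rather than as anything posted in this thread: instead of indexing buffers by `threadid()`, split the range into chunks and let each spawned task build and return its own DataFrame. As far as I know, recent Julia versions discourage relying on `threadid()` for per-thread state, and this form avoids it. The chunk size (roughly one chunk per thread) is an arbitrary choice.

```julia
using DataFrames
using Base.Threads: @spawn, nthreads

df = DataFrame(x = 1:5, y = 'a':'e')

# Split the iteration range into roughly one chunk per thread.
chunks = Iterators.partition(1:1000, cld(1000, nthreads()))

# Each task fills a private DataFrame, so no lock and no threadid() is needed.
tasks = map(chunks) do chunk
    @spawn begin
        part = similar(df, 0)   # empty DataFrame with the same columns as df
        for i in chunk
            push!(part, (rand(1:10), rand('a':'e')))
        end
        part
    end
end

# Collect the per-task results and concatenate once.
df2 = vcat(df, fetch.(tasks)...)
```

Since each task owns its `part` until it is fetched, the only synchronization is the final `fetch`.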
You can use Transducers.jl for this (disclaimer: I’m the author):
Thank you so much