Hi all
I use the DataFrames.jl package to define and manipulate arrays across several nested functions.
I plan to parallelize some of the tasks (e.g., for loops with data allocations) using the Distributed.jl package.
To let all workers modify these DataFrame objects simultaneously in each iteration, I am planning to use the SharedArrays library.
Toy example:
using Distributed
addprocs(4) # add the workers first, so the @everywhere lines below reach them
@everywhere using DataFrames
@everywhere using SharedArrays
ne = 100 # number of rows
nx = 10 # number of columns
# Empty DataFrame (ne rows, nx columns, every cell set to nothing)
A = DataFrame(Matrix{Any}(Nothing(), ne, nx), :auto)
# Non-parallelized loop:
for ie = 1:ne
    for ix = 1:nx
        A[ie, ix] = 10 + ie # random calculation
    end
end
# Parallelized version:
for ie = 1:ne
    @sync @distributed for ix = 1:nx
        A[ie, ix] = 10 + ie # random calculation
    end
end
For the parallelized version, the matrix A first needs to be declared as shared, so that every parallel iteration of the for loop can modify it, by typing something like:
A = SharedArray(DataFrame(Matrix{Any}(Nothing(), ne, nx), :auto))
However, that produces the following error message:
julia> A = SharedArray(DataFrame(Matrix{Any}(Nothing(), ne, nx), :auto))
ERROR: MethodError: no method matching SharedArray(::DataFrame)
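For comparison, here is a minimal sketch of a variant that I would expect to work. It assumes the values can be stored as Float64 instead of Any, since as far as I understand SharedArray only accepts bits element types. The names S, ne and nx are only for illustration. The idea is to fill a plain SharedArray in parallel and wrap it in a DataFrame afterwards, so this is a workaround and not a shared DataFrame:

```julia
using Distributed
addprocs(4)                       # start the workers before loading packages on them
@everywhere using SharedArrays
using DataFrames                  # only the main process builds the DataFrame here

ne, nx = 100, 10                  # assumed toy dimensions

# Shared numeric matrix; the element type must be a bits type such as Float64
S = SharedArray{Float64}(ne, nx)

# Each worker fills a subset of the columns; @sync waits until all are done
@sync @distributed for ix = 1:nx
    for ie = 1:ne
        S[ie, ix] = 10 + ie       # same toy calculation as above
    end
end

# Wrap the filled data in a DataFrame on the main process
A = DataFrame(sdata(S), :auto)
```

This keeps the parallel writes on the SharedArray and only uses DataFrames once the loop has finished.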
Any ideas are much appreciated!