for (name, security) in dataset.securities
    push!(arg_array, (name, data[name]))
end
tuples = pmap(extract_variables, arg_array)
In the above code, data[name] is a Dict{Int64, Float32} that contains a lot of data.
The parallel computing manual (https://docs.julialang.org/en/stable/manual/parallel-computing) is a very hard read, and it makes me think that arg_array will always be copied to each new process that runs extract_variables unless I resort to something such as SharedArrays.
If that is true, how can I share data[name] across all processes in pmap so that it is not copied, which costs efficiency?
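Yes, you read it correctly: data is not shared between processes for pmap or @parallel for loops, so each argument is copied to the worker that handles it. You also found one of the answers yourself: SharedArrays. Note that a SharedArray can only hold bits types, so a Dict would first have to be flattened into plain arrays. Another option is DistributedArrays.jl. Finally, you can try Threads.@threads, which runs in a single process and therefore shares data.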
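To make the threading option concrete, here is a minimal sketch. It assumes the work per security only reads from data and that Julia was started with more than one thread (for example by setting the JULIA_NUM_THREADS environment variable). The function summarize is a hypothetical stand-in for extract_variables, and the two tickers are made-up sample data, not anything from the original post.

```julia
using Base.Threads

# Hypothetical stand-in for extract_variables: reduces one security's
# series to a (name, total) tuple. It only reads from `series`.
summarize(name, series) = (name, sum(values(series)))

function extract_all(data)
    tickers = collect(keys(data))
    # Preallocate one slot per security so each iteration writes to its own index.
    results = Vector{Tuple{String, Float32}}(undef, length(tickers))
    @threads for i in eachindex(tickers)
        name = tickers[i]
        # All threads see the same `data` object; nothing is copied or serialized.
        results[i] = summarize(name, data[name])
    end
    return results
end

# Made-up sample data with the same shape as in the question.
data = Dict(
    "AAA" => Dict{Int64, Float32}(1 => 1.0f0, 2 => 2.0f0),
    "BBB" => Dict{Int64, Float32}(1 => 3.0f0, 2 => 4.0f0),
)

results = extract_all(data)
```

Concurrent reads of a Dict are fine as long as no thread modifies it while the loop runs. If the per-security work needs to write to shared state, that state needs a lock or a per-iteration slot like results above.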
Reading files in parallel is probably not the best approach. That is usually limited by disk access, not CPU. I don’t know whether that could cause the error too.