Pmap: does it copy or share arguments across processes?

for (name, security) in dataset.securities
    push!(arg_array, (name, data[name]))
end
tuples = pmap(extract_variables, arg_array)

In the above code, data[name] is a Dict{Int64, Float32} that contains a lot of data.

The parallel computing manual (https://docs.julialang.org/en/stable/manual/parallel-computing) is a very hard read, and it makes me think that arg_array will always be copied to each new process that runs extract_variables unless I resort to something such as SharedArrays.

If that is true, how can I share data[name] across all processes in pmap so that it is not copied at a cost to efficiency?

Yes, you read it correctly: data is not shared by pmap or @parallel for loops. You also found one of the answers: SharedArrays. Another is DistributedArrays.jl. Finally, you can try Threads.@threads, which does share data.
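To illustrate the threads option, here is a minimal sketch, assuming Julia 1.x started with several threads (for example by setting JULIA_NUM_THREADS). The names `data`, `summarize`, and `key_list` are hypothetical stand-ins, not the code from the question. Every thread reads the same Dict in place, so nothing is copied; this is only safe because no thread writes to the Dict.

```julia
using Base.Threads

# Hypothetical stand-in for the large per-security data in the question.
data = Dict(name => Dict{Int64, Float32}(i => Float32(i) for i in 1:1000)
            for name in ["A", "B", "C", "D"])

# Hypothetical stand-in for extract_variables: it only reads its input.
summarize(name, series) = (name, sum(values(series)))

key_list = collect(keys(data))
results = Vector{Tuple{String, Float32}}(undef, length(key_list))

# All threads see the same `data`; each one writes to its own slot of `results`.
@threads for i in eachindex(key_list)
    results[i] = summarize(key_list[i], data[key_list[i]])
end
```

Preallocating `results` and writing by index avoids calling push! on a shared array from several threads, which would not be safe.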


Thank you. No wonder pmap was not faster than map in all my benchmarking.

And unfortunately, neither SharedArrays nor DistributedArrays.jl will work with a Dict.

DistributedArrays can now hold any type of data with Julia>v0.6.

After reading the documentation, it appears that Threads.@threads is what I am looking for as it does not spawn a new process and can share memory.

len = length(arg_array)
secs = Array{Data.Security, 1}(len)
Threads.@threads for i = 1:len
    secs[i] = read_csv_and_init(arg_array[i])
end

I am getting Bus error: 10 from the code above.

It is interesting because without Threads.@threads this code runs fine.

read_csv_and_init does not cause any side effect either as it does not modify any of its input arguments nor access global variables.

I am on Mac OS High Sierra.

Reading files in parallel is probably not the best approach. Usually that is limited by disk access, not the CPU. I don’t know whether it could also cause the error.
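One way to act on this is to keep the disk reads on a single thread and only parallelize the CPU-bound work afterwards. The following is a rough sketch under assumptions, not the original read_csv_and_init: it assumes Julia 1.x, and the temporary files and the `parse_and_sum` helper are made up for illustration.

```julia
using Base.Threads

# Hypothetical setup: write a few small CSV files to a temporary directory.
dir = mktempdir()
for k in 1:4
    write(joinpath(dir, "file$k.csv"), "1,2,3\n4,5,6\n")
end

# Step 1: do all disk I/O serially, on one thread.
filenames = sort(readdir(dir))
contents = [read(joinpath(dir, f), String) for f in filenames]

# Hypothetical CPU-bound step: sum every number in a CSV string.
parse_and_sum(text) = sum(parse(Float64, x)
                          for line in split(text, '\n'; keepempty=false)
                          for x in split(line, ','))

# Step 2: process the in-memory strings in parallel; no file I/O happens here.
totals = Vector{Float64}(undef, length(contents))
@threads for i in eachindex(contents)
    totals[i] = parse_and_sum(contents[i])
end
```

Whether this helps depends on how much of the time is spent parsing rather than waiting on the disk.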

This code gets Bus error: 10

function test(filename)
    readdlm(string("data/yahoo/", filename), ',')
end

Threads.@threads for filename = readdir("data/yahoo/")
    test(filename)
end

This code, however, works fine:

function test(filename)
    readdlm(string("data/yahoo/", "2S.csv"), ',')
end

Threads.@threads for filename in readdir("data/yahoo/")
    test(filename)
end

It appears that the error occurs when readdlm accesses the filename variable, whereas the string literal “2S.csv” works fine.

I don’t believe IO is thread-safe yet (ever?). Printing in threads will do the same.


For printing, Core.println works.
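A small sketch of what that could look like inside a threaded loop, assuming Julia 1.x started with more than one thread. The `done` counter is hypothetical and is there only to confirm that every iteration ran.

```julia
using Base.Threads

# Hypothetical counter, used only to check that all iterations completed.
done = Atomic{Int}(0)

@threads for i in 1:4
    # Core.println is the low-level printing function suggested above.
    Core.println("iteration ", i, " on thread ", threadid())
    atomic_add!(done, 1)
end
```

The order of the printed lines is not deterministic, since the iterations run concurrently.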