Iterating over dict in parallel

I expected the following to work:

using Distributed
addprocs(4)

d = Dict("a" => 1, "b" => 2)
@sync @distributed for (k, v) in d
    println(k)
end

But I receive this error:

ERROR: On worker 2:
KeyError: key 1:1 not found
getindex at ./dict.jl:478 [inlined]

Iterating over a vector in parallel or iterating over the dict sequentially works fine.
Could someone explain what’s going on here?

A Dict is not shared between processes automatically the way, e.g., a SharedArray is:
https://docs.julialang.org/en/v1/stdlib/SharedArrays/index.html
I can't give an exact solution, but you may find the proper way in the docs:
https://docs.julialang.org/en/v1/manual/parallel-computing/
especially
https://docs.julialang.org/en/v1/manual/parallel-computing/#Data-Movement-1

Looking at what code @distributed generates:

@macroexpand @distributed for (k, v) in d
    println(k)
end

gives a hint about what’s going on, in particular the following line from the output:

for (k, v) = #25#R[#26#lo:#27#hi]

It looks like the @distributed macro uses length(d) to split the object into index ranges lo:hi (in this case 1:1 and 2:2) and then indexes the object with each range. That kind of slicing doesn’t work with dicts: d[1:1] looks up the range 1:1 as a key, hence the KeyError. If you put the dict’s items into an array using collect, it should work:

julia> @sync @distributed for (k, v) in collect(d)
           println(k)
       end
      From worker 2:	b
      From worker 3:	a
Task (done) @0x00007f4b98df8010
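
To see the difference without starting any workers, here is a minimal sketch that indexes both objects with a range by hand. It assumes the same d as in the question; the names err and pairs_vec are only for illustration, and this is not the exact code the macro generates.

```julia
d = Dict("a" => 1, "b" => 2)

# Indexing a Dict with a range treats the range itself as a key,
# which reproduces the KeyError from the original post.
err = try
    d[1:1]
catch e
    e
end
println(err isa KeyError)

# collect(d) returns a Vector of key => value pairs, which supports
# positional slicing, so each worker can be handed its own chunk.
pairs_vec = collect(d)
for (k, v) in pairs_vec[1:1]
    println(k, " => ", v)
end
```

Which pair ends up in the first chunk depends on the Dict's internal ordering, so don't rely on a particular key being handled by a particular worker.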

I think this should arguably work without the collect, so you could file an issue if there isn’t a similar one already.


Thank you! That explains a lot.