Bug in DistributedArrays?


#1

Hello all,

I ran into this in an application, and it took a long time to debug. I think it is a bug in DistributedArrays on Julia v0.6.4. Here is an MWE:

julia> addprocs(2)
2-element Array{Int64,1}:
 2
 3

julia> using DistributedArrays
julia> a=[rand(2) for i=1:2];

julia> b,c=copy(a),deepcopy(a);

julia> da=@DArray [rand(2) for i=1:2];

julia> db,dc=copy(da),deepcopy(da);

julia> da.pids==db.pids==dc.pids
true

julia> da.indexes==db.indexes==dc.indexes
true

julia> @everywhere f(a,b)=pointer(a),pointer(b),pointer(a)==pointer(b)

julia> f.(a,b) #should be true
2-element Array{Tuple{Ptr{Float64},Ptr{Float64},Bool},1}:
 (Ptr{Float64} @0x00002b397b97b1d0, Ptr{Float64} @0x00002b397b97b1d0, true)
 (Ptr{Float64} @0x00002b397b97b2f0, Ptr{Float64} @0x00002b397b97b2f0, true)

julia> f.(a,c) #should be false
2-element Array{Tuple{Ptr{Float64},Ptr{Float64},Bool},1}:
 (Ptr{Float64} @0x00002b397b97b1d0, Ptr{Float64} @0x00002b397b989f70, false)
 (Ptr{Float64} @0x00002b397b97b2f0, Ptr{Float64} @0x00002b397b989fd0, false)

julia> @spawnat 2 @show pointer(da[:L][1]),pointer(db[:L][1]),pointer(dc[:L][1])
Future(2, 1, 116, Nullable{Any}())

julia>  From worker 2:  (pointer((da[:L])[1]), pointer((db[:L])[1]), pointer((dc[:L])[1])) = (Ptr{Float64} @0x00002b5fb31db410, Ptr{Float64} @0x00002b5fb31db410, Ptr{Float64} @0x00002b5fb31db410)
julia> 

julia> @spawnat 2 @show pointer(da[:L][1])==pointer(db[:L][1])==pointer(dc[:L][1])
Future(2, 1, 115, Nullable{Any}())

julia>  From worker 2:  pointer((da[:L])[1]) == pointer((db[:L])[1]) == pointer((dc[:L])[1]) = true

The last comparison should evaluate to false, because dc is a deep copy and its elements should live at different addresses, but it returns true. Is this a bug or expected behavior? Is it worth filing an issue?
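
For reference, the same check on plain arrays can be written without pointer by using === for object identity. This is a minimal sketch that only uses Base and does not touch DistributedArrays; the variable names shared_with_copy and shared_with_deepcopy are just for illustration:

```julia
# Plain nested arrays; no packages required.
a = [rand(2) for _ in 1:2]
b, c = copy(a), deepcopy(a)

# For mutable objects such as arrays, `===` tests object identity.
shared_with_copy     = map(===, a, b)   # copy reuses the same inner arrays
shared_with_deepcopy = map(===, a, c)   # deepcopy allocates new inner arrays

println(all(shared_with_copy))       # every inner array is shared with `a`
println(any(shared_with_deepcopy))   # no inner array is shared with `a`
```

Comparing with === avoids reasoning about raw addresses, which are only meaningful inside the process that owns the memory.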

Cheers!


#2

The returned pointers seem to be all zero, even on the workers:

julia> pointer.(da)
2-element DistributedArrays.DArray{Ptr{Float64},1,Array{Ptr{Float64},1}}:
 Ptr{Float64} @0x0000000000000000
 Ptr{Float64} @0x0000000000000000

julia> fetch(@spawnat 2 pointer.(da))
2-element DistributedArrays.DArray{Ptr{Float64},1,Array{Ptr{Float64},1}}:
 Ptr{Float64} @0x0000000000000000
 Ptr{Float64} @0x0000000000000000

Actually, what should pointer return for a DArray? The data is not necessarily stored on the process that invokes pointer.


#3

Sorry, those were the wrong examples to show; I had tested it differently. The following is correct, and the issue still holds:

julia> @spawnat 2 @show pointer(da[:L][1]),pointer(db[:L][1]),pointer(dc[:L][1])

From worker 2:  (pointer((da[:L])[1]), pointer((db[:L])[1]), pointer((dc[:L])[1])) = (Ptr{Float64} @0x00002b5fb31db410, Ptr{Float64} @0x00002b5fb31db410, Ptr{Float64} @0x00002b5fb31db410)

julia> @spawnat 2 @show pointer(da[:L][1])==pointer(db[:L][1])==pointer(dc[:L][1])
Future(2, 1, 115, Nullable{Any}())

julia>  From worker 2:  pointer((da[:L])[1]) == pointer((db[:L])[1]) == pointer((dc[:L])[1]) = true

I will update the original post.

Your two examples return null pointers because pointer.(da) is executed on the master, and worker 2 does not “own” da.

I am not sure why broadcasting did not work correctly for the distributed arrays, though. I thought it would.


#4

If I understand correctly, the issue is that deepcopy(::DArray) creates a shallow copy instead of a deep one, while copy(::DArray) works as intended. Is that correct?

At least, that is what I see when I try the following (with DistributedArrays master, Julia 0.6.4):

julia> da=@DArray [rand(2) for i=1:2];
julia> db=copy(da); dc=deepcopy(da);
julia> fetch(@spawnat 2 da[:L][1] .= 0);
julia> db
2-element DistributedArrays.DArray{Array{Float64,1},1,Array{Array{Float64,1},1}}:
 [0.0, 0.0]          
 [0.224562, 0.483087]
julia> dc
2-element DistributedArrays.DArray{Array{Float64,1},1,Array{Array{Float64,1},1}}:
 [0.0, 0.0]          
 [0.224562, 0.483087]

The outcome for copy looks correct (it is a shallow copy, so it shares data with the source), but the outcome for deepcopy looks wrong (it should be independent of the source). So it is worth filing an issue.

I have a fork of DistributedArrays with updates for Julia 0.7. It currently passes all tests without deprecation warnings, and I'll open a PR soon. The issue you found was also present there, so I have added an explicit method for deepcopy(::DArray) that fixes it, along with tests. It should also be possible to backport that fix to the current upstream master.
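
As an illustration only, here is a toy sketch of how a handle-based container can end up with a shallow deepcopy, and how an explicit method can change that. It assumes the container stores just an identifier that is resolved through a registry; whether DArray behaves this way internally is my assumption. Handle, STORE, NEXT_ID, and payload are hypothetical names, and this is not the code from the fork. The extension point used is Base.deepcopy_internal, which the deepcopy docstring describes for user-defined types.

```julia
# Toy stand-in for a container that refers to its data by id.
# None of these names come from DistributedArrays.
const STORE = Dict{Int,Vector{Vector{Float64}}}()   # hypothetical registry: id => data
const NEXT_ID = Ref(0)

mutable struct Handle
    id::Int
end

# Register `data` under a fresh id and return a handle to it.
function Handle(data::Vector{Vector{Float64}})
    NEXT_ID[] += 1
    STORE[NEXT_ID[]] = data
    return Handle(NEXT_ID[])
end

payload(h::Handle) = STORE[h.id]

h = Handle([rand(2) for _ in 1:2])

# Default behavior: deepcopy copies the `id` field, not the registered data,
# so both handles still resolve to the same arrays.
shallow = deepcopy(h)
println(payload(shallow) === payload(h))

# Explicit method: deep-copy the registered data and register it under a new id.
function Base.deepcopy_internal(h::Handle, dict::IdDict)
    haskey(dict, h) && return dict[h]
    new_h = Handle(Base.deepcopy_internal(payload(h), dict))
    dict[h] = new_h
    return new_h
end

deep = deepcopy(h)
println(payload(deep) === payload(h))   # the data is now independent
println(payload(deep) == payload(h))    # but the values are equal
```

The type is declared mutable on purpose: deepcopy returns isbits values unchanged, so a custom deepcopy_internal method would never be reached for an immutable struct that holds only an Int.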


#5

Correct, that is the issue. Thank you for working on it; that is great!

It may also be worth understanding why broadcasting does not work correctly with the pointer function above. When pointer is replaced with any other function that works on the arrays, the broadcast is applied to the local parts correctly and returns a distributed result.
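
One possible piece of the puzzle, stated as an assumption and not a diagnosis: as far as I know, Julia's serializer replaces Ptr values with null pointers, since an address is only meaningful inside the process that produced it. If that is right, any pointer sent from a worker to the master would be displayed as 0x0, even if the broadcast itself ran correctly on the local parts. The sketch below uses only the Serialization standard library (Julia 0.7 and later) and involves no workers:

```julia
using Serialization

x = rand(2)
p = pointer(x)          # a valid address inside this process

# Round-trip the pointer through the serializer, as would happen
# when a value is sent between processes.
io = IOBuffer()
serialize(io, p)
seekstart(io)
q = deserialize(io)

@show p == C_NULL       # the original pointer is not null
@show q == C_NULL       # compare the round-tripped pointer against null
@show typeof(q)
```

If q comes back null here, the zeros reported earlier in the thread may say more about how the result is transported than about what pointer returned on the workers.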