How does a variable get copied to another process?

In parallel computing with Julia, I know that variables defined outside of a @spawn code block but referenced inside it are copied over to the worker process. Is this a shallow copy or a deep copy?

Say I have a type, kind of like a view of a vector, and several instances of it:

struct MyType
    a::Vector{UInt64}
    first::Int 
    last::Int
end
vec = zeros(10)
a = MyType(vec, 1, 5)
b = MyType(vec, 6, 10)

Say a or b then gets used in a @spawn statement and gets copied over. Does a.a or b.a, i.e. the vector vec, also get copied over (a deep copy)?

As written, I think there’s already a copy when you construct MyType because the vector needs to be converted from Vector{Float64} to Vector{UInt64}. So a.a and b.a no longer alias vec.
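To make the conversion point concrete, here is a minimal sketch. The struct names ConvertingView and PlainView are hypothetical stand-ins for MyType, and data plays the role of vec; none of these names come from the original post. It relies on the default constructor calling convert on each argument, which allocates a new array only when the element type has to change.

```julia
# Hypothetical stand-ins for MyType; the names are not from the original post.
struct ConvertingView
    a::Vector{UInt64}   # field type differs from the Vector{Float64} passed in
    first::Int
    last::Int
end

struct PlainView
    a::Vector{Float64}  # field type matches, so no conversion is needed
    first::Int
    last::Int
end

data = zeros(10)                        # Vector{Float64}, plays the role of vec
converted = ConvertingView(data, 1, 5)
same = PlainView(data, 1, 5)

println(converted.a === data)  # false: convert built a new Vector{UInt64}
println(same.a === data)       # true: the struct holds the original array
```

So with the UInt64 field the aliasing is already lost at construction time, before any worker is involved.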

If I modify your example slightly:

julia> workers()
1-element Array{Int64,1}:
 2

julia> @everywhere struct MyType
           a::Vector{Float64}
       end

julia> vec = rand(10)
       a = MyType(vec)
       b = MyType(vec);

# the values are unchanged after copying to a worker
julia> println(a)
MyType([0.228585, 0.723633, 0.393861, 0.16252, 0.507443, 0.286907, 0.862922, 0.977845, 0.213451, 0.0352139])

julia> remotecall_fetch(println, 2, a)
    From worker 2:    MyType([0.228585, 0.723633, 0.393861, 0.16252, 0.507443, 0.286907, 0.862922, 0.977845, 0.213451, 0.0352139])

# furthermore a.a and b.a are still aliases after copying to a worker
julia> @everywhere function check(a, b)
           a.a === b.a
       end

julia> check(a, b)
true

julia> remotecall_fetch(check, 2, a, b)
true

Oh yes, my original example was not supposed to copy or do the conversion! So I see from your example that if a and/or b get sent to a process, they end up sharing the same array there, as demonstrated by your check function when it runs on a worker. But the question I then had is: is the array vec a different array in each process, i.e. does the master have its own unique copy of vec, and does each worker also have its own vec, with the copies equal but not identical? One way I thought I might answer this is with pointer addresses (correct me if I’m wrong):

julia> @everywhere struct MyType
           a::Vector{Float64}
       end

julia> vec = rand(10)
       a = MyType(vec)
       b = MyType(vec);

julia> pointer(a.a)
Ptr{Float64} @0x000000011ea2ba70

julia> pointer(b.a)
Ptr{Float64} @0x000000011ea2ba70

julia> remotecall_fetch(x -> println(pointer(x.a)), 2, a)
	From worker 2:	Ptr{Float64} @0x000000010ea87f70

The addresses are different, suggesting each process gets its own vec, which is shared between its own a and b.

No, I don’t think that’s right.

julia> @everywhere function change_something(a, b)
           a.a[1] = 1
           b.a[1] == 1
       end

julia> remotecall_fetch(change_something, 2, a, b)
true

julia> a
MyType([0.64119, 0.344774, 0.448053, 0.625478, 0.0786488, 0.766661, 0.462587, 0.425674, 0.157289, 0.465477])

julia> b
MyType([0.64119, 0.344774, 0.448053, 0.625478, 0.0786488, 0.766661, 0.462587, 0.425674, 0.157289, 0.465477])

This example shows that if we mutate a.a on a worker process, the change is not reflected on the master process. Even though a.a and b.a are aliases on the worker, they do not alias vec on the master process.
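The same behaviour can be reproduced in a single process with a serialization round trip. This is only a sketch: it assumes that arguments sent to a worker travel through the same mechanism as the Serialization standard library, and SharedView and data are hypothetical stand-ins for MyType and vec.

```julia
using Serialization

# Hypothetical stand-in for MyType; the name is not from the original post.
struct SharedView
    a::Vector{Float64}
end

data = zeros(3)          # plays the role of vec
a = SharedView(data)
b = SharedView(data)

# Serialize both objects in one message, then read them back,
# roughly as if they had been shipped to another process together.
io = IOBuffer()
serialize(io, (a, b))
seekstart(io)
a2, b2 = deserialize(io)

a2.a[1] = 1.0            # mutate the deserialized copy only

println(a2.a === b2.a)   # aliasing inside the message should be preserved
println(a2.a === a.a)    # the copy should not alias the original array
println(data[1])         # the original should still hold 0.0
```

If a worker-side change needs to be visible on the master, the usual approach is to return the modified data from the remote call and fetch it.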

Ah yes! I see that now, I was editing my reply to show a little experiment with pointer addresses which also demonstrates what your example shows.
