How does a variable get copied to another process?

parallel

#1

In parallel computing with Julia, I know that variables defined outside a @spawn block but referenced inside it are copied over to the process that runs the block. Is this a shallow copy or a deep copy?

Say I have a type, kind of like a view of a vector, and several instances of it:

struct MyType
    a::Vector{UInt64}
    first::Int 
    last::Int
end
vec = zeros(10)
a = MyType(vec, 1, 5)
b = MyType(vec, 6, 10)

Say a or b is then used in a @spawn statement and gets copied over. Does a.a or b.a, i.e. the vector vec, also get copied over (a deep copy)?


#2

As written, I think there’s already a copy when you construct MyType because the vector needs to be converted from Vector{Float64} to Vector{UInt64}. So a.a and b.a no longer alias vec.
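To make the conversion point concrete, here is a minimal sketch. The type names Converted and Unconverted are hypothetical, chosen only for this illustration. It relies on the default constructor calling convert on each field: converting to a different element type allocates a new array, while converting to the type the array already has returns the same object.

```julia
# Hypothetical types for illustration only.
struct Converted
    a::Vector{UInt64}   # element type differs from the input below
end

struct Unconverted
    a::Vector{Float64}  # element type matches the input below
end

v = zeros(10)           # a Vector{Float64}

c = Converted(v)        # convert(Vector{UInt64}, v) allocates a new array
u = Unconverted(v)      # convert(Vector{Float64}, v) returns v itself

c.a === v               # false: the field holds a converted copy
u.a === v               # true: the field aliases v
```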

If I modify your example slightly:

julia> workers()
1-element Array{Int64,1}:
 2

julia> @everywhere struct MyType
           a::Vector{Float64}
       end

julia> vec = rand(10)
       a = MyType(vec)
       b = MyType(vec);

# the values are unchanged after copying to a worker
julia> println(a)
MyType([0.228585, 0.723633, 0.393861, 0.16252, 0.507443, 0.286907, 0.862922, 0.977845, 0.213451, 0.0352139])

julia> remotecall_fetch(println, 2, a)
    From worker 2:    MyType([0.228585, 0.723633, 0.393861, 0.16252, 0.507443, 0.286907, 0.862922, 0.977845, 0.213451, 0.0352139])

# furthermore a.a and b.a are still aliases after copying to a worker
julia> @everywhere function check(a, b)
           a.a === b.a
       end

julia> check(a, b)
true

julia> remotecall_fetch(check, 2, a, b)
true
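
The same behaviour can be reproduced without any workers. This is only a sketch, and it assumes that a local round trip through the Serialization standard library is a fair stand-in for what happens to the arguments of a single remote call. The variable name v is used instead of vec to avoid shadowing Base.vec.

```julia
using Serialization

struct MyType
    a::Vector{Float64}
end

v = rand(10)
a = MyType(v)
b = MyType(v)

# Write both objects into one serialized message, then read them back.
io = IOBuffer()
serialize(io, (a, b))
seekstart(io)
a2, b2 = deserialize(io)

a2.a == a.a     # true: the values survive the round trip
a2.a === a.a    # false: the round trip produced a new array
a2.a === b2.a   # true: aliasing inside the one message is preserved
```

So the copy is deep with respect to the original, but references that were shared inside the message stay shared on the receiving side.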

#3

Oh yes, my original example was not supposed to copy or do the conversion! So I see from your example that if a and/or b are sent to a process, they end up sharing the same array there, as your check function demonstrates when it runs on a worker. But the question I then had is: is the array vec a different array on each process? That is, does the master have its own unique copy of vec, and does each worker also have its own vec, with the copies equal but not identical? One way I thought I might answer this is with pointer addresses (correct me if I’m wrong):

julia> @everywhere struct MyType
           a::Vector{Float64}
       end

julia> vec = rand(10)
       a = MyType(vec)
       b = MyType(vec);

julia> pointer(a.a)
Ptr{Float64} @0x000000011ea2ba70

julia> pointer(b.a)
Ptr{Float64} @0x000000011ea2ba70

julia> remotecall_fetch(x -> println(pointer(x.a)), 2, a)
	From worker 2:	Ptr{Float64} @0x000000010ea87f70

The addresses are different, suggesting each process gets its own vec, which is shared between its own a and b.


#4

No, I don’t think that’s right.

julia> @everywhere function change_something(a, b)
           a.a[1] = 1
           b.a[1] == 1
       end

julia> remotecall_fetch(change_something, 2, a, b)
true

julia> a
MyType([0.64119, 0.344774, 0.448053, 0.625478, 0.0786488, 0.766661, 0.462587, 0.425674, 0.157289, 0.465477])

julia> b
MyType([0.64119, 0.344774, 0.448053, 0.625478, 0.0786488, 0.766661, 0.462587, 0.425674, 0.157289, 0.465477])

This example shows that mutating a on a worker process is not reflected on the master process. Even though a.a and b.a are aliases of each other on the worker, they do not alias vec on the master process.
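
If the change is needed on the master, one option is to return the modified object from the remote call so that it is sent back. A minimal sketch, assuming a single local worker started with addprocs(1); the helper set_first! is a hypothetical name introduced just for this illustration:

```julia
using Distributed

# Assumption: one extra local worker is enough for the illustration.
addprocs(1)
w = first(workers())

# The type must be defined on every process before it can be sent.
@everywhere struct MyType
    a::Vector{Float64}
end

# Hypothetical helper: mutate the worker-side copy and return it.
@everywhere function set_first!(x)
    x.a[1] = 1.0
    return x
end

v = zeros(10)
a = MyType(v)

# The argument is copied to the worker, and the result is copied back.
a_back = remotecall_fetch(set_first!, w, a)

a.a[1]       # 0.0: the master's array was never touched
a_back.a[1]  # 1.0: the modified copy returned by the worker
```

Note that a_back holds yet another array, so a.a and a_back.a do not alias each other either.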


#5

Ah yes! I see that now. I was editing my reply to add a little experiment with pointer addresses, which demonstrates the same thing as your example.