In parallel computing with Julia, I know that variables defined outside a @spawn block but referenced inside it are copied over to the worker process. Is this a shallow copy or a deep copy?
Say I have a type that acts somewhat like a view of a vector, and I have several instances of it:
struct MyType
    a::Vector{UInt64}
    first::Int
    last::Int
end
vec = zeros(10)
a = MyType(vec, 1, 5)
b = MyType(vec, 6, 10)
Say a or b then gets used in a @spawn statement and is copied over. Does a.a or b.a, i.e. the vector vec, also get copied over (a deep copy)?
As written, I think there’s already a copy when you construct MyType, because the vector needs to be converted from Vector{Float64} to Vector{UInt64}. So a.a and b.a no longer alias vec.
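To make the conversion point concrete, here is a minimal sketch that needs no worker processes. The names ViewLike, floats and uints are hypothetical stand-ins for MyType and vec. It assumes the default constructor, which converts each argument to the declared field type.

```julia
struct ViewLike              # hypothetical stand-in for the original MyType
    a::Vector{UInt64}
    first::Int
    last::Int
end

floats = zeros(10)                       # a Vector{Float64}, as in the question
v1 = ViewLike(floats, 1, 5)
v2 = ViewLike(floats, 6, 10)
converted_is_copy = v1.a !== floats      # conversion allocated a new array
copies_are_separate = v1.a !== v2.a      # each constructor call converts on its own

uints = zeros(UInt64, 10)                # already the declared field type
w1 = ViewLike(uints, 1, 5)
w2 = ViewLike(uints, 6, 10)
no_copy_when_types_match = w1.a === uints && w1.a === w2.a
```

So the aliasing the question had in mind only exists when the vector already has the field's element type.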
If I modify your example slightly:
julia> workers()
1-element Array{Int64,1}:
2
julia> @everywhere struct MyType
           a::Vector{Float64}
       end
julia> vec = rand(10)
a = MyType(vec)
b = MyType(vec);
# the values are unchanged after copying to a worker
julia> println(a)
MyType([0.228585, 0.723633, 0.393861, 0.16252, 0.507443, 0.286907, 0.862922, 0.977845, 0.213451, 0.0352139])
julia> remotecall_fetch(println, 2, a)
From worker 2: MyType([0.228585, 0.723633, 0.393861, 0.16252, 0.507443, 0.286907, 0.862922, 0.977845, 0.213451, 0.0352139])
# furthermore a.a and b.a are still aliases after copying to a worker
julia> @everywhere function check(a, b)
           a.a === b.a
       end
julia> check(a, b)
true
julia> remotecall_fetch(check, 2, a, b)
true
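The same behaviour can be sketched without any workers by round-tripping through the Serialization standard library. This assumes that the copy to a worker goes through the same kind of serialization, with both arguments written as part of one message. Wrapper and data are hypothetical stand-ins for MyType and vec.

```julia
using Serialization

struct Wrapper               # hypothetical stand-in for MyType
    a::Vector{Float64}
end

data = rand(10)
a = Wrapper(data)
b = Wrapper(data)

# Serialize both objects together, then read them back as a "remote" copy.
io = IOBuffer()
serialize(io, (a, b))
seekstart(io)
a2, b2 = deserialize(io)

aliased_after_copy = a2.a === b2.a        # the shared array is still shared in the copy
equal_to_original = a2.a == data          # same values...
distinct_from_original = a2.a !== data    # ...but a different array object
```

The copy keeps the internal aliasing between a and b, while being a separate object from the original vector.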
Oh yes, my original example was not supposed to do the conversion and copy! So I see from your example that if a and/or b get sent to a process, they still point to the same shared array there, as demonstrated by your check function when it runs on a worker. But the question I then had is: is that array a different array from vec? That is, does the master have its own unique copy of vec, and does each worker also have its own vec, so that they are equal but not identical? One way I thought I might answer this is with pointer addresses (correct me if I’m wrong):
julia> @everywhere struct MyType
           a::Vector{Float64}
       end
julia> vec = rand(10)
a = MyType(vec)
b = MyType(vec);
julia> pointer(a.a)
Ptr{Float64} @0x000000011ea2ba70
julia> pointer(b.a)
Ptr{Float64} @0x000000011ea2ba70
julia> remotecall_fetch(x -> println(pointer(x.a)), 2, a)
From worker 2: Ptr{Float64} @0x000000010ea87f70
The addresses are different, suggesting each process gets its own vec, which is shared between its own a and b.
No, I don’t think that’s right.
julia> @everywhere function change_something(a, b)
           a.a[1] = 1
           b.a[1] == 1
       end
julia> remotecall_fetch(change_something, 2, a, b)
true
julia> a
MyType([0.64119, 0.344774, 0.448053, 0.625478, 0.0786488, 0.766661, 0.462587, 0.425674, 0.157289, 0.465477])
julia> b
MyType([0.64119, 0.344774, 0.448053, 0.625478, 0.0786488, 0.766661, 0.462587, 0.425674, 0.157289, 0.465477])
This example shows that if we mutate a on a worker process, the change is not reflected on the master process. Even though a.a and b.a are aliases on the worker, they do not alias vec on the master process.
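For anyone who wants to reproduce this as a script, here is a minimal sketch. It assumes a single local worker started with addprocs(1). Holder, set_first! and data are hypothetical stand-ins for MyType, change_something and vec. It also returns the mutated array, since that is one way to get the worker's changes back.

```julia
using Distributed

# Assumption: one local worker is enough for the demonstration.
pids = addprocs(1)

@everywhere struct Holder    # hypothetical stand-in for MyType
    a::Vector{Float64}
end

# Mutate the worker's copy and send the mutated array back explicitly.
@everywhere function set_first!(h)
    h.a[1] = 1.0
    return h.a
end

data = fill(0.5, 10)
h = Holder(data)

returned = remotecall_fetch(set_first!, pids[1], h)

master_unchanged = data[1] == 0.5     # the master's array was not touched
worker_changed = returned[1] == 1.0   # the worker's copy was modified

rmprocs(pids)
```

As far as I know, this only holds when the target is a different process: a remote call addressed to the calling process's own id runs locally without copying, so a mutation there would be visible in the original array.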
Ah yes! I see that now. I was editing my reply to show a little experiment with pointer addresses, which also demonstrates what your example shows.